NEP: Update NEP-18 to include the __numpy_implementation__ attribute.

Fix GH12974 This includes two minor changes to NEP-18 that will need to be implemented in NumPy: - The ``__numpy_implementation__`` attribute is documented and officially supported as a way to defer to NumPy's own implementation. - I have noted that dispatcher functions should be given the same name as implementations, which should result in more interprettable error messages.
author: Stephan Hoyer <shoyer@google.com> 2019-04-11 11:17:31 -0700
committer: Stephan Hoyer <shoyer@google.com> 2019-04-11 11:17:31 -0700
commit: a386e8cd2ad068ba6783e0a8cd38383d3b89ba63 (patch)
tree: 632c49170db253c261d72205fb93cf47192d00d4 /doc
parent: d7a73f8c700edcf150d59a570e0173b60f84c7a7 (diff)
download: numpy-a386e8cd2ad068ba6783e0a8cd38383d3b89ba63.tar.gz
1 files changed, 169 insertions, 118 deletions
diff --git a/doc/neps/nep-0018-array-function-protocol.rst b/doc/neps/nep-0018-array-function-protocol.rst
index ffe780c79..40adab7ce 100644
--- a/doc/neps/nep-0018-array-function-protocol.rst
+++ b/doc/neps/nep-0018-array-function-protocol.rst
@@ -10,6 +10,7 @@ NEP 18 — A dispatch mechanism for NumPy's high level array functions
 :Status: Provisional
 :Type: Standards Track
 :Created: 2018-05-29
+:Updated: 2019-04-11
 :Resolution: https://mail.python.org/pipermail/numpy-discussion/2018-August/078493.html
 
 Abstact
@@ -199,6 +200,68 @@ include *all* of the corresponding NumPy function's optional arguments
 Optional arguments are only passed in to ``__array_function__`` if they
 were explicitly used in the NumPy function call.
 
+.. note::
+
+    Just like the case for builtin special methods like ``__add__``, properly
+    written ``__array_function__`` methods should always return
+    ``NotImplemented`` when an unknown type is encountered. Otherwise, it will
+    be impossible to correctly overrides NumPy functions from another object
+    if the operation also includes one of your objects.
+
+Defaulting to NumPy's implementation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some projects may prefer to default to NumPy's implementation, rather than
+explicitly defining implementing a supported API. This allows for incrementally
+overriding NumPy's API in projects that already support it implicitly by
+allowing their objects to be converted into NumPy arrays (e.g., because they
+implemented special methods such as ``__array__``). We don't recommend this
+for most new projects ("Explicit is better than implicit"), but in some cases
+it is the most expedient option.
+
+This is possible by making use of the special ``__numpy_implementation__``
+attribute found on NumPy functions that that support overrides with
+``__array_function__``. ``__numpy_implementation__`` always points back to
+the original NumPy-array specific implementation of a function. These functions
+do not check for a overrides, and instead usually coerce all of their
+array-like arguments to NumPy arrays.
+
+Adapting the previous example:
+
+.. code:: python
+
+    class MyArray:
+        def __array_function__(self, func, types, args, kwargs):
+            # It is still best practice to defer to unrecognized types
+            if not all(issubclass(t, (MyArray, np.ndarray)) for t in types):
+                return NotImplemented
+
+            my_func = HANDLED_FUNCTIONS.get(func)
+            if my_func is None:
+                return func.__numpy_implementation__(*args, **kwargs)
+            return my_func(*args, **kwargs)
+
+        def __array__(self, dtype):
+            # convert this object into a NumPy array
+
+Now, if a NumPy function that isn't explicitly handled is called on
+``MyArray`` object, the operation will act (almost) as if MyArray's
+``__array_function__`` method never existed.
+
+The one difference in behavior is in cases where multiple objects implement
+``__array_function__`` but don't know about each other: NumPy will raise
+``TypeError`` if all methods return ``NotImplemented``, whereas if no arguments
+defined ``__array_function__`` methods it would attempt to coerce all of them
+to NumPy arrays. Of course, you could "fix" this for your class by never
+returning ``NotImplemented`` from your ``__array_function__`` method, but in
+general we think this is a good change: fallback implementations often result
+in unpredictable and undesired behavior.
+
+.. note::
+
+    ``__numpy_implementation__`` was not included as part of the initial opt-in
+    only preview of ``__array_function__`` in NumPy 1.16.
+
 Necessary changes within the NumPy codebase itself
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -310,31 +373,29 @@ between these two dispatch protocols, we should
 `also update <https://github.com/numpy/numpy/issues/11306>`_
 ``__array_ufunc__`` to match this behavior.
 
-Special handling of ``numpy.ndarray``
-'''''''''''''''''''''''''''''''''''''
+The ``__array_function__`` method on ``numpy.ndarray``
+''''''''''''''''''''''''''''''''''''''''''''''''''''''
 
 The use cases for subclasses with ``__array_function__`` are the same as those
-with ``__array_ufunc__``, so ``numpy.ndarray`` should also define a
-``__array_function__`` method mirroring ``ndarray.__array_ufunc__``:
+with ``__array_ufunc__``, so ``numpy.ndarray`` also defines a
+``__array_function__`` method.
+
+``ndarray.__array_function__`` is a trivial case of the "Defaulting to NumPy's
+implementation" strategy described above: *every* NumPy function on NumPy
+arrays is defined by calling NumPy's own implementation if there are other
+overrides:
 
 .. code:: python
 
     def __array_function__(self, func, types, args, kwargs):
-        # Cannot handle items that have __array_function__ other than our own.
-        for t in types:
-            if (hasattr(t, '__array_function__') and
-                    t.__array_function__ is not ndarray.__array_function__):
-                return NotImplemented
-
-        # Arguments contain no overrides, so we can safely call the
-        # overloaded function again.
-        return func(*args, **kwargs)
+        if not all(issubclass(t, ndarray) for t in types):
+            # Defer to any non-subclasses that implement __array_function__
+            return NotImplemented
+        return func.__numpy_implementation__(*args, **kwargs)
 
-To avoid infinite recursion, the dispatch rules for ``__array_function__`` need
-also the same special case they have for ``__array_ufunc__``: any arguments with
-an ``__array_function__`` method that is identical to
-``numpy.ndarray.__array_function__`` are not be called as
-``__array_function__`` implementations.
+Notice that the ``__numpy_implementation__`` function attribute allows us to
+avoid the special cases for NumPy arrays that were needed in
+``ndarray.__array_ufunc__``.
 
 Changes within NumPy functions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -347,12 +408,12 @@ without effect if no arguments implement the ``__array_function__``
 protocol.
 
 In most cases, these functions should written using the
-``array_function_dispatch`` decorator, which also associates dispatcher
-functions:
+``array_function_dispatch`` decorator. Error checking aside, here's what the
+core implementation looks like:
 
 .. code:: python
 
-    def array_function_dispatch(dispatcher):
+    def array_function_dispatch(dispatcher, module=None):
         """Wrap a function for dispatch with the __array_function__ protocol."""
         def decorator(implementation):
             @functools.wraps(implementation)
@@ -360,58 +421,76 @@ functions:
                 relevant_args = dispatcher(*args, **kwargs)
                 return implement_array_function(
                     implementation, public_api, relevant_args, args, kwargs)
+            if module is not None:
+                public_api.__module__ = module
+            public_api.__numpy_implementation__ = implementation
             return public_api
         return decorator
 
     # example usage
-    def _broadcast_to_dispatcher(array, shape, subok=None):
+    def broadcast_to(array, shape, subok=None):
         return (array,)
 
-    @array_function_dispatch(_broadcast_to_dispatcher)
+    @array_function_dispatch(broadcast_to, module='numpy')
     def broadcast_to(array, shape, subok=False):
         ...  # existing definition of np.broadcast_to
 
 Using a decorator is great! We don't need to change the definitions of
 existing NumPy functions, and only need to write a few additional lines
-for the dispatcher function. We could even reuse a single dispatcher for
-families of functions with the same signature (e.g., ``sum`` and ``prod``).
-For such functions, the largest change could be adding a few lines to the
-docstring to note which arguments are checked for overloads.
+to define dispatcher function. We originally thought that we might want to
+implement dispatching for some NumPy functions without the decorator, but
+so far it seems to every case.
 
-It's particularly worth calling out the decorator's use of
+Within NumPy's implementation, it's worth calling out the decorator's use of
 ``functools.wraps``:
 
 - This ensures that the wrapped function has the same name and docstring as
   the wrapped NumPy function.
 - On Python 3, it also ensures that the decorator function copies the original
   function signature, which is important for introspection based tools such as
-  auto-complete. If we care about preserving function signatures on Python 2,
-  for the `short while longer <http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_
-  that NumPy supports Python 2.7, we do could do so by adding a vendored
-  dependency on the (single-file, BSD licensed)
-  `decorator library <https://github.com/micheles/decorator>`_.
+  auto-complete.
 - Finally, it ensures that the wrapped function
   `can be pickled <http://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html>`_.
 
-In a few cases, it would not make sense to use the ``array_function_dispatch``
-decorator directly, but override implementation in terms of
-``implement_array_function`` should still be straightforward.
-
-- Functions written entirely in C (e.g., ``np.concatenate``) can't use
-  decorators, but they could still use a C equivalent of
-  ``implement_array_function``. If performance is not a
-  concern, they could also be easily wrapped with a small Python wrapper.
-- ``np.einsum`` does complicated argument parsing to handle two different
-  function signatures. It would probably be best to avoid the overhead of
-  parsing it twice in the typical case of no overrides.
-
-Fortunately, in each of these cases so far, the functions already has a generic
-signature of the form ``*args, **kwargs``, which means we don't need to worry
-about potential inconsistency between how functions are called and what we pass
-to ``__array_function__``. (In C, arguments for all Python functions are parsed
-from a tuple ``*args`` and dict ``**kwargs``.) This shouldn't stop us from
-writing overrides for functions with non-generic signatures that can't use the
-decorator, but we should consider these cases carefully.
+The example usage illustrates several best practices for writing dispatchers
+relevant to NumPy contributors:
+
+- We gave the "dispatcher" function ``broadcast_to`` the exact same name and
+  arguments as the "implementation" function. The matching arguments are
+  required, because the function generated by ``array_function_dispatch`` will
+  call the dispatcher in *exactly* the same way as it was called. The matching
+  function name isn't strictly necessary, but ensures that Python reports the
+  original function name in error messages if invalid arguments are used, e.g.,
+  ``TypeError: broadcast_to() got an unexpected keyword argument``.
+
+- We passed the ``module`` argument, which in turn sets the  ``__module__``
+  attribute on the generated function. This is for the benefit of better error
+  messages, here for errors raised internally by NumPy when no implementation
+  is found, e.g.,
+  ``TypeError: no implementation found for 'numpy.broadcast_to'``. Setting
+  ``__module__`` to the canonical location in NumPy's public API encourages
+  users to use NumPy's public API for identifying functions in
+  ``__array_function__``.
+
+- The dispatcher is a function that returns a tuple, rather than an equivalent
+  (and equally valid) generator using ``yield``:
+
+  .. code:: python
+
+    # example usage
+    def broadcast_to(array, shape, subok=None):
+        yield array
+
+  This is no accident: NumPy's implementation of dispatch for
+  ``__array_function__`` is fastest when dispatcher functions return a builtin
+  sequence type (``tuple`` or ``list``).
+
+  On a related note, it's perfectly fine for dispatchers to return arguments
+  even if in some cases you *know* that they cannot have an
+  ``__array_function__`` method. This can arise for functions with default
+  arguments (e.g., ``None``) or complex signatures. NumPy's dispatching logic
+  sorts out these cases very quickly, so it generally is not worth the trouble
+  of parsing them on your own.
 
 .. note::
 
@@ -426,10 +505,10 @@ An important virtue of this approach is that it allows for adding new
 optional arguments to NumPy functions without breaking code that already
 relies on ``__array_function__``.
 
-This is not a theoretical concern. The implementation of overrides *within*
-functions like ``np.sum()`` rather than defining a new function capturing
-``*args`` and ``**kwargs`` necessitated some awkward gymnastics to ensure that
-the new ``keepdims`` argument is only passed in cases where it is used, e.g.,
+This is not a theoretical concern. NumPy's older, haphazard implementation of
+overrides *within* functions like ``np.sum()`` necessitated some awkward
+gymnastics when we decided to add new optional arguments, e.g., the new
+``keepdims`` argument is only passed in cases where it is used:
 
 .. code:: python
 
@@ -439,11 +518,12 @@ the new ``keepdims`` argument is only passed in cases where it is used, e.g.,
             kwargs['keepdims'] = keepdims
         return array.sum(..., **kwargs)
 
-This also makes it possible to add optional arguments to ``__array_function__``
-implementations incrementally and only in cases where it makes sense. For
-example, a library implementing immutable arrays would not be required to
-explicitly include an unsupported ``out`` argument. Doing this properly for all
-optional arguments is somewhat onerous, e.g.,
+For ``__array_function__`` implementors, this also means that it is possible
+to implement even existing optional arguments incrementally, and only in cases
+where it makes sense. For example, a library implementing immutable arrays
+would not be required to explicitly include an unsupported ``out`` argument in
+the function signature. This can be somewhat onerous to implement properly,
+e.g.,
 
 .. code:: python
 
@@ -486,6 +566,34 @@ concerned about performance differences measured in microsecond(s) on NumPy
 functions, because it's difficult to do *anything* in Python in less than a
 microsecond.
 
+For rare cases where NumPy functions are called in performance critical inner
+loops on small arrays or scalars, it is possible to avoid the overhead of
+dispatching by calling the un-dispatched *implementations* of NumPy functions
+available in the ``__numpy_implementation__`` attribute. For example:
+
+.. code:: python
+
+    dot = getattr(np.dot, '__numpy_implementation__', np.dot)
+
+    def naive_matrix_power(x, n):
+        x = np.array(x)
+        for _ in range(n):
+            dot(x, x, out=x)
+        return x
+
+NumPy will use this internally to minimize overhead for NumPy functions
+defined in terms of other NumPy functions, but
+**we do not recommend it for most users**:
+
+- The specific implementation of overrides is still provisional, so the
+  ``__numpy_implementation__`` attribute on particular functions might go away.
+  For this reason, access to ``__numpy_implementation__`` attribute outside of
+  ``__array_function__`` methods should always be guarded by using
+  ``getattr()`` with a default value.
+- In cases where this makes a difference, you will get far greater speed-ups
+  rewriting your inner loops in a compiled language, e.g., with Cython or
+  Numba.
+
 Use outside of NumPy
 ~~~~~~~~~~~~~~~~~~~~
 
@@ -669,63 +777,6 @@ However, to work well this would require the possibility of implementing
 *some* but not all functions with ``__array_function__``, e.g., as described
 in the next section.
 
-Coercion to a NumPy array as a catch-all fallback
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-With the current design, classes that implement ``__array_function__``
-to overload at least one function implicitly declare an intent to
-implement the entire NumPy API. It's not possible to implement *only*
-``np.concatenate()`` on a type, but fall back to NumPy's default
-behavior of casting with ``np.asarray()`` for all other functions.
-
-This could present a backwards compatibility concern that would
-discourage libraries from adopting ``__array_function__`` in an
-incremental fashion. For example, currently most numpy functions will
-implicitly convert ``pandas.Series`` objects into NumPy arrays, behavior
-that assuredly many pandas users rely on. If pandas implemented
-``__array_function__`` only for ``np.concatenate``, unrelated NumPy
-functions like ``np.nanmean`` would suddenly break on pandas objects by
-raising TypeError.
-
-With ``__array_ufunc__``, it's possible to alleviate this concern by
-casting all arguments to numpy arrays and re-calling the ufunc, but the
-heterogeneous function signatures supported by ``__array_function__``
-make it impossible to implement this generic fallback behavior for
-``__array_function__``.
-
-We could resolve this issue by change the handling of return values in
-``__array_function__`` in either of two possible ways:
-
-1. Change the meaning of all arguments returning ``NotImplemented`` to indicate
-   that all arguments should be coerced to NumPy arrays and the operation
-   should be retried. However, many array libraries (e.g., scipy.sparse) really
-   don't want implicit conversions to NumPy arrays, and often avoid implementing
-   ``__array__`` for exactly this reason. Implicit conversions can result in
-   silent bugs and performance degradation.
-
-   Potentially, we could enable this behavior only for types that implement
-   ``__array__``, which would resolve the most problematic cases like
-   scipy.sparse. But in practice, a large fraction of classes that present a
-   high level API like NumPy arrays already implement ``__array__``. This would
-   preclude reliable use of NumPy's high level API on these objects.
-2. Use another sentinel value of some sort, e.g.,
-   ``np.NotImplementedButCoercible``, to indicate that a class implementing part
-   of NumPy's higher level array API is coercible as a fallback. This is a more
-   appealing option.
-
-With either approach, we would need to define additional rules for *how*
-coercible array arguments are coerced. The only sane rule would be to treat
-these return values as equivalent to not defining an
-``__array_function__`` method at all, which means that NumPy functions would
-fall-back to their current behavior of coercing all array-like arguments.
-
-It is not yet clear to us yet if we need an optional like
-``NotImplementedButCoercible``, so for now we propose to defer this issue.
-We can always implement ``np.NotImplementedButCoercible`` at some later time if
-it proves critical to the NumPy community in the future. Importantly, we don't
-think this will stop critical libraries that desire to implement most of the
-high level NumPy API from adopting this proposal.
-
 A magic decorator that inspects type annotations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
author	Stephan Hoyer <shoyer@google.com>	2019-04-11 11:17:31 -0700
committer	Stephan Hoyer <shoyer@google.com>	2019-04-11 11:17:31 -0700
commit	a386e8cd2ad068ba6783e0a8cd38383d3b89ba63 (patch)
tree	632c49170db253c261d72205fb93cf47192d00d4 /doc
parent	d7a73f8c700edcf150d59a570e0173b60f84c7a7 (diff)
download	numpy-a386e8cd2ad068ba6783e0a8cd38383d3b89ba63.tar.gz