summaryrefslogtreecommitdiff
path: root/doc/neps
diff options
context:
space:
mode:
authorStephan Hoyer <shoyer@google.com>2019-04-11 11:17:31 -0700
committerStephan Hoyer <shoyer@google.com>2019-04-11 11:17:31 -0700
commita386e8cd2ad068ba6783e0a8cd38383d3b89ba63 (patch)
tree632c49170db253c261d72205fb93cf47192d00d4 /doc/neps
parentd7a73f8c700edcf150d59a570e0173b60f84c7a7 (diff)
downloadnumpy-a386e8cd2ad068ba6783e0a8cd38383d3b89ba63.tar.gz
NEP: Update NEP-18 to include the __numpy_implementation__ attribute.
Fix GH12974 This includes two minor changes to NEP-18 that will need to be implemented in NumPy: - The ``__numpy_implementation__`` attribute is documented and officially supported as a way to defer to NumPy's own implementation. - I have noted that dispatcher functions should be given the same name as implementations, which should result in more interprettable error messages.
Diffstat (limited to 'doc/neps')
-rw-r--r--doc/neps/nep-0018-array-function-protocol.rst287
1 files changed, 169 insertions, 118 deletions
diff --git a/doc/neps/nep-0018-array-function-protocol.rst b/doc/neps/nep-0018-array-function-protocol.rst
index ffe780c79..40adab7ce 100644
--- a/doc/neps/nep-0018-array-function-protocol.rst
+++ b/doc/neps/nep-0018-array-function-protocol.rst
@@ -10,6 +10,7 @@ NEP 18 — A dispatch mechanism for NumPy's high level array functions
:Status: Provisional
:Type: Standards Track
:Created: 2018-05-29
+:Updated: 2019-04-11
:Resolution: https://mail.python.org/pipermail/numpy-discussion/2018-August/078493.html
Abstact
@@ -199,6 +200,68 @@ include *all* of the corresponding NumPy function's optional arguments
Optional arguments are only passed in to ``__array_function__`` if they
were explicitly used in the NumPy function call.
+.. note::
+
+ Just like the case for builtin special methods like ``__add__``, properly
+ written ``__array_function__`` methods should always return
+ ``NotImplemented`` when an unknown type is encountered. Otherwise, it will
+ be impossible to correctly overrides NumPy functions from another object
+ if the operation also includes one of your objects.
+
+Defaulting to NumPy's implementation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some projects may prefer to default to NumPy's implementation, rather than
+explicitly defining implementing a supported API. This allows for incrementally
+overriding NumPy's API in projects that already support it implicitly by
+allowing their objects to be converted into NumPy arrays (e.g., because they
+implemented special methods such as ``__array__``). We don't recommend this
+for most new projects ("Explicit is better than implicit"), but in some cases
+it is the most expedient option.
+
+This is possible by making use of the special ``__numpy_implementation__``
+attribute found on NumPy functions that that support overrides with
+``__array_function__``. ``__numpy_implementation__`` always points back to
+the original NumPy-array specific implementation of a function. These functions
+do not check for a overrides, and instead usually coerce all of their
+array-like arguments to NumPy arrays.
+
+Adapting the previous example:
+
+.. code:: python
+
+ class MyArray:
+ def __array_function__(self, func, types, args, kwargs):
+ # It is still best practice to defer to unrecognized types
+ if not all(issubclass(t, (MyArray, np.ndarray)) for t in types):
+ return NotImplemented
+
+ my_func = HANDLED_FUNCTIONS.get(func)
+ if my_func is None:
+ return func.__numpy_implementation__(*args, **kwargs)
+ return my_func(*args, **kwargs)
+
+ def __array__(self, dtype):
+ # convert this object into a NumPy array
+
+Now, if a NumPy function that isn't explicitly handled is called on
+``MyArray`` object, the operation will act (almost) as if MyArray's
+``__array_function__`` method never existed.
+
+The one difference in behavior is in cases where multiple objects implement
+``__array_function__`` but don't know about each other: NumPy will raise
+``TypeError`` if all methods return ``NotImplemented``, whereas if no arguments
+defined ``__array_function__`` methods it would attempt to coerce all of them
+to NumPy arrays. Of course, you could "fix" this for your class by never
+returning ``NotImplemented`` from your ``__array_function__`` method, but in
+general we think this is a good change: fallback implementations often result
+in unpredictable and undesired behavior.
+
+.. note::
+
+ ``__numpy_implementation__`` was not included as part of the initial opt-in
+ only preview of ``__array_function__`` in NumPy 1.16.
+
Necessary changes within the NumPy codebase itself
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -310,31 +373,29 @@ between these two dispatch protocols, we should
`also update <https://github.com/numpy/numpy/issues/11306>`_
``__array_ufunc__`` to match this behavior.
-Special handling of ``numpy.ndarray``
-'''''''''''''''''''''''''''''''''''''
+The ``__array_function__`` method on ``numpy.ndarray``
+''''''''''''''''''''''''''''''''''''''''''''''''''''''
The use cases for subclasses with ``__array_function__`` are the same as those
-with ``__array_ufunc__``, so ``numpy.ndarray`` should also define a
-``__array_function__`` method mirroring ``ndarray.__array_ufunc__``:
+with ``__array_ufunc__``, so ``numpy.ndarray`` also defines a
+``__array_function__`` method.
+
+``ndarray.__array_function__`` is a trivial case of the "Defaulting to NumPy's
+implementation" strategy described above: *every* NumPy function on NumPy
+arrays is defined by calling NumPy's own implementation if there are other
+overrides:
.. code:: python
def __array_function__(self, func, types, args, kwargs):
- # Cannot handle items that have __array_function__ other than our own.
- for t in types:
- if (hasattr(t, '__array_function__') and
- t.__array_function__ is not ndarray.__array_function__):
- return NotImplemented
-
- # Arguments contain no overrides, so we can safely call the
- # overloaded function again.
- return func(*args, **kwargs)
+ if not all(issubclass(t, ndarray) for t in types):
+ # Defer to any non-subclasses that implement __array_function__
+ return NotImplemented
+ return func.__numpy_implementation__(*args, **kwargs)
-To avoid infinite recursion, the dispatch rules for ``__array_function__`` need
-also the same special case they have for ``__array_ufunc__``: any arguments with
-an ``__array_function__`` method that is identical to
-``numpy.ndarray.__array_function__`` are not be called as
-``__array_function__`` implementations.
+Notice that the ``__numpy_implementation__`` function attribute allows us to
+avoid the special cases for NumPy arrays that were needed in
+``ndarray.__array_ufunc__``.
Changes within NumPy functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -347,12 +408,12 @@ without effect if no arguments implement the ``__array_function__``
protocol.
In most cases, these functions should written using the
-``array_function_dispatch`` decorator, which also associates dispatcher
-functions:
+``array_function_dispatch`` decorator. Error checking aside, here's what the
+core implementation looks like:
.. code:: python
- def array_function_dispatch(dispatcher):
+ def array_function_dispatch(dispatcher, module=None):
"""Wrap a function for dispatch with the __array_function__ protocol."""
def decorator(implementation):
@functools.wraps(implementation)
@@ -360,58 +421,76 @@ functions:
relevant_args = dispatcher(*args, **kwargs)
return implement_array_function(
implementation, public_api, relevant_args, args, kwargs)
+ if module is not None:
+ public_api.__module__ = module
+ public_api.__numpy_implementation__ = implementation
return public_api
return decorator
# example usage
- def _broadcast_to_dispatcher(array, shape, subok=None):
+ def broadcast_to(array, shape, subok=None):
return (array,)
- @array_function_dispatch(_broadcast_to_dispatcher)
+ @array_function_dispatch(broadcast_to, module='numpy')
def broadcast_to(array, shape, subok=False):
... # existing definition of np.broadcast_to
Using a decorator is great! We don't need to change the definitions of
existing NumPy functions, and only need to write a few additional lines
-for the dispatcher function. We could even reuse a single dispatcher for
-families of functions with the same signature (e.g., ``sum`` and ``prod``).
-For such functions, the largest change could be adding a few lines to the
-docstring to note which arguments are checked for overloads.
+to define dispatcher function. We originally thought that we might want to
+implement dispatching for some NumPy functions without the decorator, but
+so far it seems to every case.
-It's particularly worth calling out the decorator's use of
+Within NumPy's implementation, it's worth calling out the decorator's use of
``functools.wraps``:
- This ensures that the wrapped function has the same name and docstring as
the wrapped NumPy function.
- On Python 3, it also ensures that the decorator function copies the original
function signature, which is important for introspection based tools such as
- auto-complete. If we care about preserving function signatures on Python 2,
- for the `short while longer <http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_
- that NumPy supports Python 2.7, we do could do so by adding a vendored
- dependency on the (single-file, BSD licensed)
- `decorator library <https://github.com/micheles/decorator>`_.
+ auto-complete.
- Finally, it ensures that the wrapped function
`can be pickled <http://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html>`_.
-In a few cases, it would not make sense to use the ``array_function_dispatch``
-decorator directly, but override implementation in terms of
-``implement_array_function`` should still be straightforward.
-
-- Functions written entirely in C (e.g., ``np.concatenate``) can't use
- decorators, but they could still use a C equivalent of
- ``implement_array_function``. If performance is not a
- concern, they could also be easily wrapped with a small Python wrapper.
-- ``np.einsum`` does complicated argument parsing to handle two different
- function signatures. It would probably be best to avoid the overhead of
- parsing it twice in the typical case of no overrides.
-
-Fortunately, in each of these cases so far, the functions already has a generic
-signature of the form ``*args, **kwargs``, which means we don't need to worry
-about potential inconsistency between how functions are called and what we pass
-to ``__array_function__``. (In C, arguments for all Python functions are parsed
-from a tuple ``*args`` and dict ``**kwargs``.) This shouldn't stop us from
-writing overrides for functions with non-generic signatures that can't use the
-decorator, but we should consider these cases carefully.
+The example usage illustrates several best practices for writing dispatchers
+relevant to NumPy contributors:
+
+- We gave the "dispatcher" function ``broadcast_to`` the exact same name and
+ arguments as the "implementation" function. The matching arguments are
+ required, because the function generated by ``array_function_dispatch`` will
+ call the dispatcher in *exactly* the same way as it was called. The matching
+ function name isn't strictly necessary, but ensures that Python reports the
+ original function name in error messages if invalid arguments are used, e.g.,
+ ``TypeError: broadcast_to() got an unexpected keyword argument``.
+
+- We passed the ``module`` argument, which in turn sets the ``__module__``
+ attribute on the generated function. This is for the benefit of better error
+ messages, here for errors raised internally by NumPy when no implementation
+ is found, e.g.,
+ ``TypeError: no implementation found for 'numpy.broadcast_to'``. Setting
+ ``__module__`` to the canonical location in NumPy's public API encourages
+ users to use NumPy's public API for identifying functions in
+ ``__array_function__``.
+
+- The dispatcher is a function that returns a tuple, rather than an equivalent
+ (and equally valid) generator using ``yield``:
+
+ .. code:: python
+
+ # example usage
+ def broadcast_to(array, shape, subok=None):
+ yield array
+
+ This is no accident: NumPy's implementation of dispatch for
+ ``__array_function__`` is fastest when dispatcher functions return a builtin
+ sequence type (``tuple`` or ``list``).
+
+ On a related note, it's perfectly fine for dispatchers to return arguments
+ even if in some cases you *know* that they cannot have an
+ ``__array_function__`` method. This can arise for functions with default
+ arguments (e.g., ``None``) or complex signatures. NumPy's dispatching logic
+ sorts out these cases very quickly, so it generally is not worth the trouble
+ of parsing them on your own.
.. note::
@@ -426,10 +505,10 @@ An important virtue of this approach is that it allows for adding new
optional arguments to NumPy functions without breaking code that already
relies on ``__array_function__``.
-This is not a theoretical concern. The implementation of overrides *within*
-functions like ``np.sum()`` rather than defining a new function capturing
-``*args`` and ``**kwargs`` necessitated some awkward gymnastics to ensure that
-the new ``keepdims`` argument is only passed in cases where it is used, e.g.,
+This is not a theoretical concern. NumPy's older, haphazard implementation of
+overrides *within* functions like ``np.sum()`` necessitated some awkward
+gymnastics when we decided to add new optional arguments, e.g., the new
+``keepdims`` argument is only passed in cases where it is used:
.. code:: python
@@ -439,11 +518,12 @@ the new ``keepdims`` argument is only passed in cases where it is used, e.g.,
kwargs['keepdims'] = keepdims
return array.sum(..., **kwargs)
-This also makes it possible to add optional arguments to ``__array_function__``
-implementations incrementally and only in cases where it makes sense. For
-example, a library implementing immutable arrays would not be required to
-explicitly include an unsupported ``out`` argument. Doing this properly for all
-optional arguments is somewhat onerous, e.g.,
+For ``__array_function__`` implementors, this also means that it is possible
+to implement even existing optional arguments incrementally, and only in cases
+where it makes sense. For example, a library implementing immutable arrays
+would not be required to explicitly include an unsupported ``out`` argument in
+the function signature. This can be somewhat onerous to implement properly,
+e.g.,
.. code:: python
@@ -486,6 +566,34 @@ concerned about performance differences measured in microsecond(s) on NumPy
functions, because it's difficult to do *anything* in Python in less than a
microsecond.
+For rare cases where NumPy functions are called in performance critical inner
+loops on small arrays or scalars, it is possible to avoid the overhead of
+dispatching by calling the un-dispatched *implementations* of NumPy functions
+available in the ``__numpy_implementation__`` attribute. For example:
+
+.. code:: python
+
+ dot = getattr(np.dot, '__numpy_implementation__', np.dot)
+
+ def naive_matrix_power(x, n):
+ x = np.array(x)
+ for _ in range(n):
+ dot(x, x, out=x)
+ return x
+
+NumPy will use this internally to minimize overhead for NumPy functions
+defined in terms of other NumPy functions, but
+**we do not recommend it for most users**:
+
+- The specific implementation of overrides is still provisional, so the
+ ``__numpy_implementation__`` attribute on particular functions might go away.
+ For this reason, access to ``__numpy_implementation__`` attribute outside of
+ ``__array_function__`` methods should always be guarded by using
+ ``getattr()`` with a default value.
+- In cases where this makes a difference, you will get far greater speed-ups
+ rewriting your inner loops in a compiled language, e.g., with Cython or
+ Numba.
+
Use outside of NumPy
~~~~~~~~~~~~~~~~~~~~
@@ -669,63 +777,6 @@ However, to work well this would require the possibility of implementing
*some* but not all functions with ``__array_function__``, e.g., as described
in the next section.
-Coercion to a NumPy array as a catch-all fallback
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-With the current design, classes that implement ``__array_function__``
-to overload at least one function implicitly declare an intent to
-implement the entire NumPy API. It's not possible to implement *only*
-``np.concatenate()`` on a type, but fall back to NumPy's default
-behavior of casting with ``np.asarray()`` for all other functions.
-
-This could present a backwards compatibility concern that would
-discourage libraries from adopting ``__array_function__`` in an
-incremental fashion. For example, currently most numpy functions will
-implicitly convert ``pandas.Series`` objects into NumPy arrays, behavior
-that assuredly many pandas users rely on. If pandas implemented
-``__array_function__`` only for ``np.concatenate``, unrelated NumPy
-functions like ``np.nanmean`` would suddenly break on pandas objects by
-raising TypeError.
-
-With ``__array_ufunc__``, it's possible to alleviate this concern by
-casting all arguments to numpy arrays and re-calling the ufunc, but the
-heterogeneous function signatures supported by ``__array_function__``
-make it impossible to implement this generic fallback behavior for
-``__array_function__``.
-
-We could resolve this issue by change the handling of return values in
-``__array_function__`` in either of two possible ways:
-
-1. Change the meaning of all arguments returning ``NotImplemented`` to indicate
- that all arguments should be coerced to NumPy arrays and the operation
- should be retried. However, many array libraries (e.g., scipy.sparse) really
- don't want implicit conversions to NumPy arrays, and often avoid implementing
- ``__array__`` for exactly this reason. Implicit conversions can result in
- silent bugs and performance degradation.
-
- Potentially, we could enable this behavior only for types that implement
- ``__array__``, which would resolve the most problematic cases like
- scipy.sparse. But in practice, a large fraction of classes that present a
- high level API like NumPy arrays already implement ``__array__``. This would
- preclude reliable use of NumPy's high level API on these objects.
-2. Use another sentinel value of some sort, e.g.,
- ``np.NotImplementedButCoercible``, to indicate that a class implementing part
- of NumPy's higher level array API is coercible as a fallback. This is a more
- appealing option.
-
-With either approach, we would need to define additional rules for *how*
-coercible array arguments are coerced. The only sane rule would be to treat
-these return values as equivalent to not defining an
-``__array_function__`` method at all, which means that NumPy functions would
-fall-back to their current behavior of coercing all array-like arguments.
-
-It is not yet clear to us yet if we need an optional like
-``NotImplementedButCoercible``, so for now we propose to defer this issue.
-We can always implement ``np.NotImplementedButCoercible`` at some later time if
-it proves critical to the NumPy community in the future. Importantly, we don't
-think this will stop critical libraries that desire to implement most of the
-high level NumPy API from adopting this proposal.
-
A magic decorator that inspects type annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~