diff options
author | Stephan Hoyer <shoyer@google.com> | 2019-04-11 11:17:31 -0700 |
---|---|---|
committer | Stephan Hoyer <shoyer@google.com> | 2019-04-11 11:17:31 -0700 |
commit | a386e8cd2ad068ba6783e0a8cd38383d3b89ba63 (patch) | |
tree | 632c49170db253c261d72205fb93cf47192d00d4 /doc | |
parent | d7a73f8c700edcf150d59a570e0173b60f84c7a7 (diff) | |
download | numpy-a386e8cd2ad068ba6783e0a8cd38383d3b89ba63.tar.gz |
NEP: Update NEP-18 to include the __numpy_implementation__ attribute.
Fix GH12974
This includes two minor changes to NEP-18 that will need to be implemented in
NumPy:
- The ``__numpy_implementation__`` attribute is documented and officially
supported as a way to defer to NumPy's own implementation.
- I have noted that dispatcher functions should be given the same name as
implementations, which should result in more interprettable error messages.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/neps/nep-0018-array-function-protocol.rst | 287 |
1 files changed, 169 insertions, 118 deletions
diff --git a/doc/neps/nep-0018-array-function-protocol.rst b/doc/neps/nep-0018-array-function-protocol.rst index ffe780c79..40adab7ce 100644 --- a/doc/neps/nep-0018-array-function-protocol.rst +++ b/doc/neps/nep-0018-array-function-protocol.rst @@ -10,6 +10,7 @@ NEP 18 — A dispatch mechanism for NumPy's high level array functions :Status: Provisional :Type: Standards Track :Created: 2018-05-29 +:Updated: 2019-04-11 :Resolution: https://mail.python.org/pipermail/numpy-discussion/2018-August/078493.html Abstact @@ -199,6 +200,68 @@ include *all* of the corresponding NumPy function's optional arguments Optional arguments are only passed in to ``__array_function__`` if they were explicitly used in the NumPy function call. +.. note:: + + Just like the case for builtin special methods like ``__add__``, properly + written ``__array_function__`` methods should always return + ``NotImplemented`` when an unknown type is encountered. Otherwise, it will + be impossible to correctly overrides NumPy functions from another object + if the operation also includes one of your objects. + +Defaulting to NumPy's implementation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Some projects may prefer to default to NumPy's implementation, rather than +explicitly defining implementing a supported API. This allows for incrementally +overriding NumPy's API in projects that already support it implicitly by +allowing their objects to be converted into NumPy arrays (e.g., because they +implemented special methods such as ``__array__``). We don't recommend this +for most new projects ("Explicit is better than implicit"), but in some cases +it is the most expedient option. + +This is possible by making use of the special ``__numpy_implementation__`` +attribute found on NumPy functions that that support overrides with +``__array_function__``. ``__numpy_implementation__`` always points back to +the original NumPy-array specific implementation of a function. These functions +do not check for a overrides, and instead usually coerce all of their +array-like arguments to NumPy arrays. + +Adapting the previous example: + +.. code:: python + + class MyArray: + def __array_function__(self, func, types, args, kwargs): + # It is still best practice to defer to unrecognized types + if not all(issubclass(t, (MyArray, np.ndarray)) for t in types): + return NotImplemented + + my_func = HANDLED_FUNCTIONS.get(func) + if my_func is None: + return func.__numpy_implementation__(*args, **kwargs) + return my_func(*args, **kwargs) + + def __array__(self, dtype): + # convert this object into a NumPy array + +Now, if a NumPy function that isn't explicitly handled is called on +``MyArray`` object, the operation will act (almost) as if MyArray's +``__array_function__`` method never existed. + +The one difference in behavior is in cases where multiple objects implement +``__array_function__`` but don't know about each other: NumPy will raise +``TypeError`` if all methods return ``NotImplemented``, whereas if no arguments +defined ``__array_function__`` methods it would attempt to coerce all of them +to NumPy arrays. Of course, you could "fix" this for your class by never +returning ``NotImplemented`` from your ``__array_function__`` method, but in +general we think this is a good change: fallback implementations often result +in unpredictable and undesired behavior. + +.. note:: + + ``__numpy_implementation__`` was not included as part of the initial opt-in + only preview of ``__array_function__`` in NumPy 1.16. + Necessary changes within the NumPy codebase itself ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -310,31 +373,29 @@ between these two dispatch protocols, we should `also update <https://github.com/numpy/numpy/issues/11306>`_ ``__array_ufunc__`` to match this behavior. -Special handling of ``numpy.ndarray`` -''''''''''''''''''''''''''''''''''''' +The ``__array_function__`` method on ``numpy.ndarray`` +'''''''''''''''''''''''''''''''''''''''''''''''''''''' The use cases for subclasses with ``__array_function__`` are the same as those -with ``__array_ufunc__``, so ``numpy.ndarray`` should also define a -``__array_function__`` method mirroring ``ndarray.__array_ufunc__``: +with ``__array_ufunc__``, so ``numpy.ndarray`` also defines a +``__array_function__`` method. + +``ndarray.__array_function__`` is a trivial case of the "Defaulting to NumPy's +implementation" strategy described above: *every* NumPy function on NumPy +arrays is defined by calling NumPy's own implementation if there are other +overrides: .. code:: python def __array_function__(self, func, types, args, kwargs): - # Cannot handle items that have __array_function__ other than our own. - for t in types: - if (hasattr(t, '__array_function__') and - t.__array_function__ is not ndarray.__array_function__): - return NotImplemented - - # Arguments contain no overrides, so we can safely call the - # overloaded function again. - return func(*args, **kwargs) + if not all(issubclass(t, ndarray) for t in types): + # Defer to any non-subclasses that implement __array_function__ + return NotImplemented + return func.__numpy_implementation__(*args, **kwargs) -To avoid infinite recursion, the dispatch rules for ``__array_function__`` need -also the same special case they have for ``__array_ufunc__``: any arguments with -an ``__array_function__`` method that is identical to -``numpy.ndarray.__array_function__`` are not be called as -``__array_function__`` implementations. +Notice that the ``__numpy_implementation__`` function attribute allows us to +avoid the special cases for NumPy arrays that were needed in +``ndarray.__array_ufunc__``. Changes within NumPy functions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -347,12 +408,12 @@ without effect if no arguments implement the ``__array_function__`` protocol. In most cases, these functions should written using the -``array_function_dispatch`` decorator, which also associates dispatcher -functions: +``array_function_dispatch`` decorator. Error checking aside, here's what the +core implementation looks like: .. code:: python - def array_function_dispatch(dispatcher): + def array_function_dispatch(dispatcher, module=None): """Wrap a function for dispatch with the __array_function__ protocol.""" def decorator(implementation): @functools.wraps(implementation) @@ -360,58 +421,76 @@ functions: relevant_args = dispatcher(*args, **kwargs) return implement_array_function( implementation, public_api, relevant_args, args, kwargs) + if module is not None: + public_api.__module__ = module + public_api.__numpy_implementation__ = implementation return public_api return decorator # example usage - def _broadcast_to_dispatcher(array, shape, subok=None): + def broadcast_to(array, shape, subok=None): return (array,) - @array_function_dispatch(_broadcast_to_dispatcher) + @array_function_dispatch(broadcast_to, module='numpy') def broadcast_to(array, shape, subok=False): ... # existing definition of np.broadcast_to Using a decorator is great! We don't need to change the definitions of existing NumPy functions, and only need to write a few additional lines -for the dispatcher function. We could even reuse a single dispatcher for -families of functions with the same signature (e.g., ``sum`` and ``prod``). -For such functions, the largest change could be adding a few lines to the -docstring to note which arguments are checked for overloads. +to define dispatcher function. We originally thought that we might want to +implement dispatching for some NumPy functions without the decorator, but +so far it seems to every case. -It's particularly worth calling out the decorator's use of +Within NumPy's implementation, it's worth calling out the decorator's use of ``functools.wraps``: - This ensures that the wrapped function has the same name and docstring as the wrapped NumPy function. - On Python 3, it also ensures that the decorator function copies the original function signature, which is important for introspection based tools such as - auto-complete. If we care about preserving function signatures on Python 2, - for the `short while longer <http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_ - that NumPy supports Python 2.7, we do could do so by adding a vendored - dependency on the (single-file, BSD licensed) - `decorator library <https://github.com/micheles/decorator>`_. + auto-complete. - Finally, it ensures that the wrapped function `can be pickled <http://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html>`_. -In a few cases, it would not make sense to use the ``array_function_dispatch`` -decorator directly, but override implementation in terms of -``implement_array_function`` should still be straightforward. - -- Functions written entirely in C (e.g., ``np.concatenate``) can't use - decorators, but they could still use a C equivalent of - ``implement_array_function``. If performance is not a - concern, they could also be easily wrapped with a small Python wrapper. -- ``np.einsum`` does complicated argument parsing to handle two different - function signatures. It would probably be best to avoid the overhead of - parsing it twice in the typical case of no overrides. - -Fortunately, in each of these cases so far, the functions already has a generic -signature of the form ``*args, **kwargs``, which means we don't need to worry -about potential inconsistency between how functions are called and what we pass -to ``__array_function__``. (In C, arguments for all Python functions are parsed -from a tuple ``*args`` and dict ``**kwargs``.) This shouldn't stop us from -writing overrides for functions with non-generic signatures that can't use the -decorator, but we should consider these cases carefully. +The example usage illustrates several best practices for writing dispatchers +relevant to NumPy contributors: + +- We gave the "dispatcher" function ``broadcast_to`` the exact same name and + arguments as the "implementation" function. The matching arguments are + required, because the function generated by ``array_function_dispatch`` will + call the dispatcher in *exactly* the same way as it was called. The matching + function name isn't strictly necessary, but ensures that Python reports the + original function name in error messages if invalid arguments are used, e.g., + ``TypeError: broadcast_to() got an unexpected keyword argument``. + +- We passed the ``module`` argument, which in turn sets the ``__module__`` + attribute on the generated function. This is for the benefit of better error + messages, here for errors raised internally by NumPy when no implementation + is found, e.g., + ``TypeError: no implementation found for 'numpy.broadcast_to'``. Setting + ``__module__`` to the canonical location in NumPy's public API encourages + users to use NumPy's public API for identifying functions in + ``__array_function__``. + +- The dispatcher is a function that returns a tuple, rather than an equivalent + (and equally valid) generator using ``yield``: + + .. code:: python + + # example usage + def broadcast_to(array, shape, subok=None): + yield array + + This is no accident: NumPy's implementation of dispatch for + ``__array_function__`` is fastest when dispatcher functions return a builtin + sequence type (``tuple`` or ``list``). + + On a related note, it's perfectly fine for dispatchers to return arguments + even if in some cases you *know* that they cannot have an + ``__array_function__`` method. This can arise for functions with default + arguments (e.g., ``None``) or complex signatures. NumPy's dispatching logic + sorts out these cases very quickly, so it generally is not worth the trouble + of parsing them on your own. .. note:: @@ -426,10 +505,10 @@ An important virtue of this approach is that it allows for adding new optional arguments to NumPy functions without breaking code that already relies on ``__array_function__``. -This is not a theoretical concern. The implementation of overrides *within* -functions like ``np.sum()`` rather than defining a new function capturing -``*args`` and ``**kwargs`` necessitated some awkward gymnastics to ensure that -the new ``keepdims`` argument is only passed in cases where it is used, e.g., +This is not a theoretical concern. NumPy's older, haphazard implementation of +overrides *within* functions like ``np.sum()`` necessitated some awkward +gymnastics when we decided to add new optional arguments, e.g., the new +``keepdims`` argument is only passed in cases where it is used: .. code:: python @@ -439,11 +518,12 @@ the new ``keepdims`` argument is only passed in cases where it is used, e.g., kwargs['keepdims'] = keepdims return array.sum(..., **kwargs) -This also makes it possible to add optional arguments to ``__array_function__`` -implementations incrementally and only in cases where it makes sense. For -example, a library implementing immutable arrays would not be required to -explicitly include an unsupported ``out`` argument. Doing this properly for all -optional arguments is somewhat onerous, e.g., +For ``__array_function__`` implementors, this also means that it is possible +to implement even existing optional arguments incrementally, and only in cases +where it makes sense. For example, a library implementing immutable arrays +would not be required to explicitly include an unsupported ``out`` argument in +the function signature. This can be somewhat onerous to implement properly, +e.g., .. code:: python @@ -486,6 +566,34 @@ concerned about performance differences measured in microsecond(s) on NumPy functions, because it's difficult to do *anything* in Python in less than a microsecond. +For rare cases where NumPy functions are called in performance critical inner +loops on small arrays or scalars, it is possible to avoid the overhead of +dispatching by calling the un-dispatched *implementations* of NumPy functions +available in the ``__numpy_implementation__`` attribute. For example: + +.. code:: python + + dot = getattr(np.dot, '__numpy_implementation__', np.dot) + + def naive_matrix_power(x, n): + x = np.array(x) + for _ in range(n): + dot(x, x, out=x) + return x + +NumPy will use this internally to minimize overhead for NumPy functions +defined in terms of other NumPy functions, but +**we do not recommend it for most users**: + +- The specific implementation of overrides is still provisional, so the + ``__numpy_implementation__`` attribute on particular functions might go away. + For this reason, access to ``__numpy_implementation__`` attribute outside of + ``__array_function__`` methods should always be guarded by using + ``getattr()`` with a default value. +- In cases where this makes a difference, you will get far greater speed-ups + rewriting your inner loops in a compiled language, e.g., with Cython or + Numba. + Use outside of NumPy ~~~~~~~~~~~~~~~~~~~~ @@ -669,63 +777,6 @@ However, to work well this would require the possibility of implementing *some* but not all functions with ``__array_function__``, e.g., as described in the next section. -Coercion to a NumPy array as a catch-all fallback -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -With the current design, classes that implement ``__array_function__`` -to overload at least one function implicitly declare an intent to -implement the entire NumPy API. It's not possible to implement *only* -``np.concatenate()`` on a type, but fall back to NumPy's default -behavior of casting with ``np.asarray()`` for all other functions. - -This could present a backwards compatibility concern that would -discourage libraries from adopting ``__array_function__`` in an -incremental fashion. For example, currently most numpy functions will -implicitly convert ``pandas.Series`` objects into NumPy arrays, behavior -that assuredly many pandas users rely on. If pandas implemented -``__array_function__`` only for ``np.concatenate``, unrelated NumPy -functions like ``np.nanmean`` would suddenly break on pandas objects by -raising TypeError. - -With ``__array_ufunc__``, it's possible to alleviate this concern by -casting all arguments to numpy arrays and re-calling the ufunc, but the -heterogeneous function signatures supported by ``__array_function__`` -make it impossible to implement this generic fallback behavior for -``__array_function__``. - -We could resolve this issue by change the handling of return values in -``__array_function__`` in either of two possible ways: - -1. Change the meaning of all arguments returning ``NotImplemented`` to indicate - that all arguments should be coerced to NumPy arrays and the operation - should be retried. However, many array libraries (e.g., scipy.sparse) really - don't want implicit conversions to NumPy arrays, and often avoid implementing - ``__array__`` for exactly this reason. Implicit conversions can result in - silent bugs and performance degradation. - - Potentially, we could enable this behavior only for types that implement - ``__array__``, which would resolve the most problematic cases like - scipy.sparse. But in practice, a large fraction of classes that present a - high level API like NumPy arrays already implement ``__array__``. This would - preclude reliable use of NumPy's high level API on these objects. -2. Use another sentinel value of some sort, e.g., - ``np.NotImplementedButCoercible``, to indicate that a class implementing part - of NumPy's higher level array API is coercible as a fallback. This is a more - appealing option. - -With either approach, we would need to define additional rules for *how* -coercible array arguments are coerced. The only sane rule would be to treat -these return values as equivalent to not defining an -``__array_function__`` method at all, which means that NumPy functions would -fall-back to their current behavior of coercing all array-like arguments. - -It is not yet clear to us yet if we need an optional like -``NotImplementedButCoercible``, so for now we propose to defer this issue. -We can always implement ``np.NotImplementedButCoercible`` at some later time if -it proves critical to the NumPy community in the future. Importantly, we don't -think this will stop critical libraries that desire to implement most of the -high level NumPy API from adopting this proposal. - A magic decorator that inspects type annotations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |