author: Stephan Hoyer <shoyer@google.com> 2019-04-14 15:52:49 -0700
committer: Stephan Hoyer <shoyer@google.com> 2019-04-14 15:52:49 -0700
commit: 6c5e22dcaf6736177e9100442d7a6e6ff97f106d
tree: 12cbd3fe74a51aaf48a542591cf04703fe522bcb /doc/neps
parent: a386e8cd2ad068ba6783e0a8cd38383d3b89ba63
Add clarification and fixes per PR discussion
Diffstat (limited to 'doc/neps')
-rw-r--r-- doc/neps/nep-0018-array-function-protocol.rst 105
1 files changed, 84 insertions, 21 deletions
diff --git a/doc/neps/nep-0018-array-function-protocol.rst b/doc/neps/nep-0018-array-function-protocol.rst
index 40adab7ce..63444f168 100644
--- a/doc/neps/nep-0018-array-function-protocol.rst
+++ b/doc/neps/nep-0018-array-function-protocol.rst
@@ -205,7 +205,7 @@ were explicitly used in the NumPy function call.
Just like the case for builtin special methods like ``__add__``, properly
written ``__array_function__`` methods should always return
``NotImplemented`` when an unknown type is encountered. Otherwise, it will
- be impossible to correctly overrides NumPy functions from another object
+ be impossible to correctly override NumPy functions from another object
if the operation also includes one of your objects.
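
The rule above can be illustrated with a minimal sketch. The names ``MyArray``, ``OtherArray``, ``HANDLED_FUNCTIONS`` and ``fake_sum`` are hypothetical and exist only for this illustration; only the ``__array_function__(self, func, types, args, kwargs)`` signature comes from the protocol itself.

```python
# Hypothetical registry mapping a function to MyArray's own implementation.
HANDLED_FUNCTIONS = {}


def fake_sum(array):
    """Stand-in for a NumPy function (an assumption, not a real NumPy API)."""
    raise NotImplementedError


class OtherArray:
    """An unrelated array type that MyArray knows nothing about."""


class MyArray:
    """Hypothetical duck array used only for illustration."""

    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        # Defer as soon as an unknown type takes part in the call, so that
        # the other object gets a chance to handle the operation.
        if not all(issubclass(t, MyArray) for t in types):
            return NotImplemented
        handler = HANDLED_FUNCTIONS.get(func)
        if handler is None:
            return NotImplemented
        return handler(*args, **kwargs)


# Register MyArray's implementation of the stand-in function.
HANDLED_FUNCTIONS[fake_sum] = lambda array: sum(array.data)
```

Calling the method with ``types=(MyArray, OtherArray)`` returns ``NotImplemented``, while ``types=(MyArray,)`` reaches the registered handler.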
Defaulting to NumPy's implementation
@@ -248,19 +248,16 @@ Now, if a NumPy function that isn't explicitly handled is called on
``MyArray`` object, the operation will act (almost) as if MyArray's
``__array_function__`` method never existed.
-The one difference in behavior is in cases where multiple objects implement
-``__array_function__`` but don't know about each other: NumPy will raise
-``TypeError`` if all methods return ``NotImplemented``, whereas if no arguments
-defined ``__array_function__`` methods it would attempt to coerce all of them
-to NumPy arrays. Of course, you could "fix" this for your class by never
-returning ``NotImplemented`` from your ``__array_function__`` method, but in
-general we think this is a good change: fallback implementations often result
-in unpredictable and undesired behavior.
+The name ``__numpy_implementation__`` refers to this being the implementation
+intended to be used *on* NumPy arrays. It does not necessarily imply that the
+implementation is also implemented *by* the NumPy library, though of course
+this will be the case unless external projects such as SciPy adopt the same
+protocol for their own APIs.
.. note::
- ``__numpy_implementation__`` was not included as part of the initial opt-in
- only preview of ``__array_function__`` in NumPy 1.16.
+ ``__numpy_implementation__`` was not included as part of the initial
+ opt-in-only preview of ``__array_function__`` in NumPy 1.16.
Necessary changes within the NumPy codebase itself
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -363,6 +360,13 @@ In particular:
- If all ``__array_function__`` methods return ``NotImplemented``,
NumPy will raise ``TypeError``.
+If no ``__array_function__`` methods exist, NumPy will default to calling its
+own implementation, intended for use on NumPy arrays. This case arises, for
+example, when all array-like arguments are Python numbers or lists.
+(NumPy arrays do have a ``__array_function__`` method, given below, but it
+always returns ``NotImplemented`` if another argument implements
+``__array_function__``.)
+
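The dispatch rules listed above might be sketched in pure Python roughly as follows. This is a simplified illustration, not NumPy's actual code: the ``dispatch`` helper and its parameters are hypothetical, and the sketch omits the rule that subclasses are tried before superclasses.

```python
def dispatch(func, relevant_args, args, kwargs):
    """Simplified sketch of the dispatch rules; not NumPy's real code."""
    # Keep only the first argument of each unique type that defines
    # ``__array_function__``.
    types = []
    overloaded_args = []
    for arg in relevant_args:
        arg_type = type(arg)
        if hasattr(arg_type, "__array_function__") and arg_type not in types:
            types.append(arg_type)
            overloaded_args.append(arg)

    # No overrides at all (for example, only Python numbers or lists):
    # fall back to the default implementation.
    if not overloaded_args:
        return func(*args, **kwargs)

    # Try each override in order; the first result that is not
    # NotImplemented wins.
    for arg in overloaded_args:
        result = arg.__array_function__(func, tuple(types), args, kwargs)
        if result is not NotImplemented:
            return result

    # Every override declined.
    raise TypeError(
        "no implementation found for {!r} on types {!r}".format(func, types)
    )
```

In this sketch a call whose arguments are all plain lists goes straight to ``func``, while a call in which every override returns ``NotImplemented`` ends in ``TypeError``.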
One deviation from the current behavior of ``__array_ufunc__`` is that NumPy
will only call ``__array_function__`` on the *first* argument of each unique
type. This matches Python's
@@ -393,10 +397,13 @@ overrides:
return NotImplemented
return func.__numpy_implementation__(*args, **kwargs)
-Notice that the ``__numpy_implementation__`` function attribute allows us to
-avoid the special cases for NumPy arrays that were needed in
+Notice that the ``__numpy_implementation__`` function attribute allowed us
+to avoid the special cases for NumPy arrays that were needed in
``ndarray.__array_ufunc__``.
+This method matches NumPy's dispatching rules, so for the most part it is
+possible to pretend that ``ndarray.__array_function__`` does not exist.
+
Changes within NumPy functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -439,7 +446,7 @@ Using a decorator is great! We don't need to change the definitions of
existing NumPy functions, and only need to write a few additional lines
to define dispatcher function. We originally thought that we might want to
implement dispatching for some NumPy functions without the decorator, but
-so far it seems to every case.
+so far it seems to cover every case.
Within NumPy's implementation, it's worth calling out the decorator's use of
``functools.wraps``:
@@ -739,7 +746,7 @@ would be straightforward to write a shim for a default
Implementations in terms of a limited core API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The internal implementations of some NumPy functions is extremely simple.
+The internal implementation of some NumPy functions is extremely simple.
For example:
- ``np.stack()`` is implemented in only a few lines of code by combining
@@ -773,9 +780,64 @@ nearly every public function in NumPy's API. This does not preclude the future
possibility of rewriting NumPy functions in terms of simplified core
functionality with ``__array_function__`` and a protocol and/or base class for
ensuring that arrays expose methods and properties like ``numpy.ndarray``.
-However, to work well this would require the possibility of implementing
-*some* but not all functions with ``__array_function__``, e.g., as described
-in the next section.
+
+Coercion to a NumPy array as a catch-all fallback
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+With the current design, classes that implement ``__array_function__``
+to overload at least one function can opt out of overriding other functions
+by using the ``__numpy_implementation__`` function, as described above under
+"Defaulting to NumPy's implementation."
+
+However, this still results in different behavior than not implementing
+``__array_function__`` in at least one edge case. If multiple objects implement
+``__array_function__`` but don't know about each other, NumPy will raise
+``TypeError`` if all methods return ``NotImplemented``, whereas if no arguments
+defined ``__array_function__`` methods it would attempt to coerce all of them
+to NumPy arrays.
+
+Alternatively, this could be "fixed" by writing a ``__array_function__``
+method that always calls ``__numpy_implementation__()`` instead of returning
+``NotImplemented`` for some functions, but that would result in a type
+whose implementation cannot be overridden by other arguments -- like NumPy
+arrays themselves prior to the introduction of this protocol.
+
+Either way, it is not possible to *exactly* maintain the current behavior of
+all NumPy functions if at least one more function is overridden. If preserving
+this behavior is important, we could potentially solve it by changing the
+handling of return values in ``__array_function__`` in either of two ways:
+
+1. Change the meaning of all arguments returning ``NotImplemented`` to indicate
+ that all arguments should be coerced to NumPy arrays and the operation
+ should be retried. However, many array libraries (e.g., scipy.sparse) really
+ don't want implicit conversions to NumPy arrays, and often avoid implementing
+ ``__array__`` for exactly this reason. Implicit conversions can result in
+ silent bugs and performance degradation.
+
+ Potentially, we could enable this behavior only for types that implement
+ ``__array__``, which would resolve the most problematic cases like
+ scipy.sparse. But in practice, a large fraction of classes that present a
+ high level API like NumPy arrays already implement ``__array__``. This would
+ preclude reliable use of NumPy's high level API on these objects.
+2. Use another sentinel value of some sort, e.g.,
+ ``np.NotImplementedButCoercible``, to indicate that a class implementing part
+ of NumPy's higher level array API is coercible as a fallback. If all
+ arguments return ``NotImplementedButCoercible``, arguments would be coerced
+ and the operation would be retried.
+
+ Unfortunately, correct behavior after encountering
+ ``NotImplementedButCoercible`` is not always obvious. Particularly
+ challenging is the "mixed" case where some arguments return
+ ``NotImplementedButCoercible`` and others return ``NotImplemented``.
+ Would dispatching be retried after only coercing the "coercible" arguments?
+ If so, then conceivably we could end up looping through the dispatching
+ logic an arbitrary number of times. Either way, the dispatching rules would
+ definitely get more complex and harder to reason about.
+
+At present, neither of these alternatives looks like a good idea. Reusing
+``__numpy_implementation__()`` looks like it should suffice for most purposes.
+Arguably this loss in flexibility is a virtue: fallback implementations often
+result in unpredictable and undesired behavior.
A magic decorator that inspects type annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -874,7 +936,8 @@ There are two other arguments that we think *might* be important to pass to
- Access to the non-dispatched implementation (i.e., before wrapping with
``array_function_dispatch``) in ``ndarray.__array_function__`` would allow
us to drop special case logic for that method from
- ``implement_array_function``.
+ ``implement_array_function``. *Update: This has been implemented, as the
+ ``__numpy_implementation__`` attributes.*
- Access to the ``dispatcher`` function passed into
``array_function_dispatch()`` would allow ``__array_function__``
implementations to determine the list of "array-like" arguments in a generic
@@ -911,7 +974,7 @@ a descriptor.
Given the complexity and the limited use cases, we are also deferring on this
issue for now, but we are confident that ``__array_function__`` could be
-expanded to accomodate these use cases in the future if need be.
+expanded to accommodate these use cases in the future if need be.
Discussion
----------
@@ -928,7 +991,7 @@ it was discussed at a `NumPy developer sprint
Berkeley Institute for Data Science (BIDS) <https://bids.berkeley.edu/>`_.
Detailed discussion of this proposal itself can be found on the
-`the mailing list <https://mail.python.org/pipermail/numpy-discussion/2018-June/078127.html>`_ and relvant pull requests
+`the mailing list <https://mail.python.org/pipermail/numpy-discussion/2018-June/078127.html>`_ and relevant pull requests
(`1 <https://github.com/numpy/numpy/pull/11189>`_,
`2 <https://github.com/numpy/numpy/pull/11303#issuecomment-396638175>`_,
`3 <https://github.com/numpy/numpy/pull/11374>`_)