author     Peter Andreas Entschev <peter@entschev.com>   2020-08-17 14:37:07 -0700
committer  Peter Andreas Entschev <peter@entschev.com>   2020-08-17 14:50:00 -0700
commit     9b660e445bb19331b8d4308223b9da418166ef80 (patch)
tree       2c929e2ab3ca38870bb8c48a624947a85f8049a1 /doc/neps
parent     7d7b46c763a86897d254fafcca31e61d08c83ba5 (diff)
download   numpy-9b660e445bb19331b8d4308223b9da418166ef80.tar.gz
NEP: Simplify NEP-35 further with reviewer's suggestions
Diffstat (limited to 'doc/neps')
-rw-r--r-- | doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst | 56
1 files changed, 39 insertions, 17 deletions
diff --git a/doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst b/doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
index 2613ebea2..8554cbccd 100644
--- a/doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
+++ b/doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
@@ -27,34 +27,49 @@ Motivation and Scope
 Many are the libraries implementing the NumPy API, such as Dask for graph
 computing, CuPy for GPGPU computing, xarray for N-D labeled arrays, etc. All
 the libraries mentioned have yet another thing in common: they have also adopted
-the ``__array_function__`` protocol. The protocol defines a mechanism allowing a
-user to directly use the NumPy API as a dispatcher based on the input array
-type. In essence, dispatching means users are able to pass a downstream array,
-such as a Dask array, directly to one of NumPy's compute functions, and NumPy
-will be able to automatically recognize that and send the work back to Dask's
-implementation of that function, which will define the return value. For
-example:
+the ``__array_function__`` protocol; a protocol that allows NumPy to understand
+and treat downstream objects as if they are the native ``numpy.ndarray`` object.
+Hence the community while using various libraries still benefits from a unified
+NumPy API. This not only brings great convenience for standardization but also
+removes the burden of learning a new API and rewriting code for every new
+object. In more technical terms, this mechanism of the protocol is called a
+"dispatcher", which is the terminology we use from here onwards when referring
+to that.
+
 
 .. code:: python
 
     x = dask.array.arange(5)  # Creates dask.array
-    np.sum(a)  # Returns dask.array
+    np.diff(x)  # Returns dask.array
 
 Note above how we called Dask's implementation of ``sum`` via the NumPy
-namespace by calling ``np.sum``, and the same would apply if we had a CuPy
+namespace by calling ``np.diff``, and the same would apply if we had a CuPy
 array or any other array from a library that adopts ``__array_function__``.
 This allows writing code that is agnostic to the implementation library, thus
 users can write their code once and still be able to use different array
 implementations according to their needs.
 
-Unfortunately, ``__array_function__`` has limitations, one of them being array
-creation functions. In the example above, NumPy was able to call Dask's
-implementation because the input array was a Dask array. The same is not true
-for array creation functions, in the example the input of ``arange`` is simply
-the integer ``5``, not providing any information of the array type that should
-be the result, that's where a reference array passed by the ``like=`` argument
-proposed here can be of help, as it provides NumPy with the information
-required to create the expected type of array.
+Obviously, having a protocol in-place is useful if the arrays are created
+elsewhere and let NumPy handle them. But still these arrays have to be started
+in their native library and brought back. Instead if it was possible to create
+these objects through NumPy API then there would be an almost complete
+experience, all using NumPy syntax. For example, say we have some CuPy array
+``cp_arr`` , and want a similar CuPy array with identity matrix. We could still
+write the following:
+
+.. code:: python
+    x = cupy.identity(3)
+
+Instead, the better way would be using to only use the NumPy API, this could now
+be achieved with:
+
+.. code:: python
+    x = np.identity(3, like=cp_arr)
+
+As if by magic, ``x`` will also be a CuPy array, as NumPy was capable to infer
+that from the type of ``cp_arr``. Note that this last step would not be possible
+without ``like=``, as it would be impossible for the NumPy to know the user
+expects a CuPy array based only on the integer input.
 
 The new ``like=`` keyword proposed is solely intended to identify the downstream
 library where to dispatch and the object is used only as reference, meaning that
@@ -150,6 +165,13 @@ impossible to ensure ``my_pad`` creates a padding array with a type matching
 that of the input array, which would cause cause a ``TypeError`` exception to
 be raised by CuPy, as discussed above would happen to the CuPy case alone.
 
+Current NumPy users who don't use other arrays from downstream libraries should
+have no impact in their current usage of the NumPy API. In the event of the
+user passing a NumPy array to ``like=``, that will continue to work as if no
+array was passed via that argument. However, this is advised against, as
+internally there will be additional checks required that will have an impact in
+performance.
+
 Backward Compatibility
 ----------------------
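
To make the dispatching mechanism described in the first hunk concrete, here is a
minimal illustrative sketch; it is not part of the commit or of NumPy's own code.
The ``MyDuckArray`` class is hypothetical, and the example assumes NumPy >= 1.17,
where the ``__array_function__`` protocol is enabled by default.

.. code:: python

    import numpy as np

    class MyDuckArray:
        """Hypothetical downstream array type that opts into dispatch."""

        def __init__(self, data):
            self.data = np.asarray(data)

        def __array_function__(self, func, types, args, kwargs):
            # Unwrap MyDuckArray arguments, let NumPy compute on the plain
            # ndarrays, and re-wrap the result so the downstream type is
            # preserved; a much simplified version of what Dask or CuPy do.
            unwrapped = tuple(
                a.data if isinstance(a, MyDuckArray) else a for a in args
            )
            return MyDuckArray(func(*unwrapped, **kwargs))

    x = MyDuckArray([1, 2, 4, 7])
    y = np.diff(x)            # dispatched to MyDuckArray.__array_function__
    print(type(y).__name__)   # MyDuckArray

Real libraries such as Dask or CuPy implement the same hook, but map each NumPy
function onto their own implementation rather than falling back to NumPy as this
toy class does.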
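The behaviour added in the second hunk, where a plain NumPy array passed to
``like=`` keeps today's semantics, can be sketched with NumPy alone. Again this
is only an illustration rather than part of the commit, and it assumes
NumPy >= 1.20, the release in which the ``like=`` keyword proposed by this NEP
became available.

.. code:: python

    import numpy as np  # ``like=`` requires NumPy >= 1.20

    ref = np.empty(3)

    # The reference is a plain NumPy array, so the result stays a regular
    # numpy.ndarray and behaviour is unchanged, aside from the extra dispatch
    # overhead the hunk mentions.
    x = np.identity(3, like=ref)
    print(type(x))   # <class 'numpy.ndarray'>

    # With a reference array from a library that implements
    # ``__array_function__`` (e.g. ``like=cp_arr`` for a CuPy array, as in the
    # NEP's own example), the same call would instead return that library's
    # array type.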