author Peter Andreas Entschev <peter@entschev.com> 2020-08-14 15:20:35 -0700
committer Peter Andreas Entschev <peter@entschev.com> 2020-08-17 12:40:22 -0700
commit 7d7b46c763a86897d254fafcca31e61d08c83ba5
tree 1376b27f3ecc2620d61dc25a2c8c35261a54ad20 /doc/neps
parent 986e533f72c91f8a356c499c2e499c2894c94235
NEP: Adjust NEP-35 to make it more user-accessible
Diffstat (limited to 'doc/neps')
-rw-r--r-- doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst 263
1 files changed, 190 insertions, 73 deletions
diff --git a/doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst b/doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
index 18a00ae6a..2613ebea2 100644
--- a/doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
+++ b/doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
@@ -8,16 +8,154 @@ NEP 35 — Array Creation Dispatching With __array_function__
:Status: Draft
:Type: Standards Track
:Created: 2019-10-15
-:Updated: 2020-08-06
+:Updated: 2020-08-17
:Resolution:
Abstract
--------
We propose the introduction of a new keyword argument ``like=`` to all array
-creation functions to permit dispatching of such functions by the
-``__array_function__`` protocol, addressing one of the protocol shortcomings,
-as described by NEP-18 [1]_.
+creation functions. This argument permits the creation of an array based on
+a non-NumPy reference array passed via ``like=``, resulting in an array
+defined by the downstream library that implements the reference array's type
+and the ``__array_function__`` protocol. With this we address one of that
+protocol's shortcomings, as described by NEP 18 [1]_.
+
+Motivation and Scope
+--------------------
+
+Many libraries implement the NumPy API, such as Dask for graph computing,
+CuPy for GPGPU computing, xarray for N-D labeled arrays, etc. All the
+libraries mentioned have yet another thing in common: they have also adopted
+the ``__array_function__`` protocol. The protocol defines a mechanism allowing a
+user to directly use the NumPy API as a dispatcher based on the input array
+type. In essence, dispatching means users are able to pass a downstream array,
+such as a Dask array, directly to one of NumPy's compute functions, and NumPy
+will be able to automatically recognize that and send the work back to Dask's
+implementation of that function, which will define the return value. For
+example:
+
+.. code:: python
+
+    import numpy as np
+    import dask.array
+    x = dask.array.arange(5)  # Creates dask.array
+    np.sum(x)                 # Returns dask.array
+
+Note above how we called Dask's implementation of ``sum`` via the NumPy
+namespace by calling ``np.sum``, and the same would apply if we had a CuPy
+array or any other array from a library that adopts ``__array_function__``.
+This allows writing code that is agnostic to the implementation library, so
+users can write their code once and still be able to use different array
+implementations according to their needs.
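To make the dispatch mechanism concrete without depending on NumPy or Dask, here is a toy model in plain Python. It is a minimal sketch, not NumPy's implementation: ``toy_sum`` and ``DuckArray`` are hypothetical names, and the handler lookup only loosely mirrors the ``__array_function__(self, func, types, args, kwargs)`` signature described in NEP 18.

```python
# Toy model of __array_function__-style dispatch (not NumPy's real code).
# toy_sum and DuckArray are hypothetical names used only for illustration.


def toy_sum(x):
    """Send the work to x's __array_function__ if it defines one."""
    handler = getattr(type(x), "__array_function__", None)
    if handler is not None:
        # Same (func, types, args, kwargs) arguments as described in NEP 18.
        result = handler(x, toy_sum, (type(x),), (x,), {})
        if result is not NotImplemented:
            return result
        raise TypeError("no implementation found for toy_sum")
    # Default path, standing in for NumPy's own implementation.
    return sum(x)


class DuckArray:
    """Hypothetical downstream array type, e.g. a stand-in for a Dask array."""

    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is toy_sum:
            # The downstream library decides the type of the return value.
            return DuckArray([sum(self.data)])
        return NotImplemented


plain = toy_sum([0, 1, 2, 3, 4])     # handled locally
duck = toy_sum(DuckArray(range(5)))  # dispatched to DuckArray
print(plain, type(duck).__name__, duck.data)  # 10 DuckArray [10]
```

The caller only ever uses ``toy_sum``; the type of the argument decides which implementation runs and what type comes back.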
+
+Unfortunately, ``__array_function__`` has limitations, one of them being array
+creation functions. In the example above, NumPy was able to call Dask's
+implementation because the input array was a Dask array. The same is not true
+for array creation functions: in the example, the input of ``arange`` is simply
+the integer ``5``, which provides no information about the array type that
+should be returned. That is where a reference array passed via the ``like=``
+argument proposed here can help, as it provides NumPy with the information
+required to create the expected type of array.
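The gap can be illustrated by listing which arguments of a call could serve as a dispatch target. This plain-Python sketch is a simplification: it assumes every argument is inspected, whereas under NEP 18 each function declares a dispatcher that selects the relevant arguments. ``DuckArray`` and ``dispatch_candidates`` are hypothetical names.

```python
# Simplified sketch: which arguments could a call dispatch on?
# DuckArray and dispatch_candidates are hypothetical illustration names.


class DuckArray:
    """Hypothetical downstream array type implementing the protocol."""

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented  # the body does not matter for this sketch


def dispatch_candidates(*args, **kwargs):
    """Names of the argument types that implement __array_function__."""
    values = list(args) + list(kwargs.values())
    return [type(v).__name__ for v in values
            if hasattr(type(v), "__array_function__")]


x = DuckArray()
compute_call = dispatch_candidates(x)       # shaped like np.sum(x)
creation_call = dispatch_candidates(5)      # shaped like np.arange(5)
with_like = dispatch_candidates(5, like=x)  # shaped like np.arange(5, like=x)
print(compute_call, creation_call, with_like)  # ['DuckArray'] [] ['DuckArray']
```

With only the integer ``5`` there is nothing to dispatch on; the reference object restores a candidate without being part of the computation itself.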
+
+The new ``like=`` keyword is solely intended to identify the downstream library
+to which to dispatch, and the object is used only as a reference, meaning that
+no modifications, copies or processing will be performed on that object.
+
+We expect that this functionality will be mostly useful to library developers,
+allowing them to create new arrays for internal usage based on arrays passed
+by the user, preventing unnecessary creation of NumPy arrays that will
+ultimately lead to an additional conversion into a downstream array type.
+
+Since support for Python 2.7 was dropped in NumPy 1.17, we make use of the
+keyword-only argument standard described in PEP 3102 [2]_ to implement
+``like=``, thus preventing it from being passed by position.
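A keyword-only parameter is declared by placing it after a bare ``*`` in the signature. The hypothetical ``toy_array`` function below only mimics the shape of a creation function's signature (its ``dtype`` parameter is illustrative) and shows that ``like=`` is accepted by name but rejected by position.

```python
# Hypothetical signature sketch: ``like`` follows the bare ``*``, so it is
# keyword-only as described in PEP 3102.


def toy_array(obj, dtype=None, *, like=None):
    """Return the received arguments so the binding can be inspected."""
    return {"obj": obj, "dtype": dtype, "like": like}


by_keyword = toy_array([1, 2, 3], like="reference")

try:
    toy_array([1, 2, 3], None, "reference")  # a third positional argument
    positional_rejected = False
except TypeError:
    # Python refuses to bind "reference" to the keyword-only ``like``.
    positional_rejected = True

print(by_keyword["like"], positional_rejected)  # reference True
```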
+
+.. _neps.like-kwarg.usage-and-impact:
+
+Usage and Impact
+----------------
+
+To understand the intended use for ``like=``, and before we move to more complex
+cases, consider the following illustrative example consisting only of NumPy and
+CuPy arrays:
+
+.. code:: python
+
+    import numpy as np
+    import cupy
+
+    def my_pad(arr, padding):
+        padding = np.array(padding, like=arr)
+        return np.concatenate((padding, arr, padding))
+
+    my_pad(np.arange(5), [-1, -1])    # Returns np.ndarray
+    my_pad(cupy.arange(5), [-1, -1])  # Returns cupy.core.core.ndarray
+
+Note how ``arr`` is used in the ``my_pad`` function above as a reference to
+dictate which array type ``padding`` should have before the arrays are
+concatenated to produce the result. If ``like=`` weren't used, the NumPy case
+would still work, but CuPy wouldn't allow this kind of automatic conversion,
+ultimately raising a ``TypeError: Only cupy arrays can be concatenated``
+exception.
+
+Now we should look at how a library like Dask could benefit from ``like=``.
+Before we understand that, it's important to understand a bit about Dask basics
+and how it ensures correctness with ``__array_function__``. Note that Dask can
+compute different sorts of objects, like dataframes, bags and arrays; here we
+focus strictly on arrays, which are the objects we can use with
+``__array_function__``.
+
+Dask uses a graph computing model, meaning it breaks down a large problem into
+many smaller problems and merges their results to reach the final result. To
+break the problem down into smaller ones, Dask also breaks arrays into smaller
+arrays, which it calls "chunks". A Dask array can thus consist of one or more
+chunks, and they may be of different types. However, in the context of
+``__array_function__``, Dask only allows chunks of the same type; for example,
+a Dask array can be formed of several NumPy arrays or several CuPy arrays, but
+not a mix of both.
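The chunking rule can be sketched in plain Python. This is a simplification that assumes one-dimensional data and uses Python lists and tuples as stand-ins for chunk types such as NumPy and CuPy arrays; ``split_into_chunks`` and ``common_chunk_type`` are hypothetical helpers, not Dask API.

```python
# Hypothetical helpers modelling Dask-style chunking on 1-D sequences.


def split_into_chunks(data, chunksize):
    """Split a sequence into consecutive chunks of at most chunksize items."""
    return [data[i:i + chunksize] for i in range(0, len(data), chunksize)]


def common_chunk_type(chunks):
    """Return the single chunk type, rejecting a mix of types."""
    types = {type(chunk) for chunk in chunks}
    if len(types) != 1:
        names = sorted(t.__name__ for t in types)
        raise TypeError(f"expected exactly one chunk type, got {names}")
    return types.pop()


chunks = split_into_chunks(list(range(10)), 4)
print(chunks)                     # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(common_chunk_type(chunks))  # <class 'list'>

try:
    common_chunk_type([[0, 1], (2, 3)])  # a list chunk and a tuple chunk
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
```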
+
+To avoid mismatched types during computation, Dask keeps an attribute ``_meta``
+as part of its array throughout computation. This attribute is used both to
+predict the output type at graph creation time and to create any intermediary
+arrays that are necessary within some function's computation. Going back to our
+previous example, we can use ``_meta`` information to identify what kind of
+array we would use for padding, as seen below:
+
+.. code:: python
+
+    import numpy as np
+    import cupy
+    import dask.array as da
+    from dask.array.utils import meta_from_array
+
+    def my_pad(arr, padding):
+        padding = np.array(padding, like=meta_from_array(arr))
+        return np.concatenate((padding, arr, padding))
+
+    # Returns dask.array<concatenate, shape=(9,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>
+    my_pad(da.arange(5), [-1, -1])
+
+    # Returns dask.array<concatenate, shape=(9,), dtype=int64, chunksize=(5,), chunktype=cupy.ndarray>
+    my_pad(da.from_array(cupy.arange(5)), [-1, -1])
+
+Note how ``chunktype`` in the return value above changes from
+``numpy.ndarray`` in the first ``my_pad`` call to ``cupy.ndarray`` in the
+second.
+
+To enable proper identification of the array type we use Dask's utility function
+``meta_from_array``, which was introduced as part of the work to support
+``__array_function__``, allowing Dask to handle ``_meta`` appropriately. That
+function is primarily targeted at the library's internal usage to ensure chunks
+are created with correct types. Without the ``like=`` argument, it would be
+impossible to ensure ``my_pad`` creates a padding array with a type matching
+that of the input array, which would cause CuPy to raise a ``TypeError``
+exception, just as discussed above for the case of CuPy alone.
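The role of ``_meta`` can be modelled with a small plain-Python sketch. Everything here is hypothetical: ``GPUList`` stands in for a non-NumPy chunk type such as a CuPy array, ``ToyDaskArray`` for a Dask array, ``toy_meta_from_array`` for ``meta_from_array``, and ``toy_asarray`` collapses ``like=`` dispatch into simply constructing ``type(like)``. It is not how Dask or NumPy implement these functions.

```python
# Hypothetical sketch of a chunked array that carries a ``_meta`` reference.


class GPUList(list):
    """Stand-in for a non-NumPy chunk type (e.g. a GPU array)."""


class ToyDaskArray:
    def __init__(self, chunks):
        self.chunks = list(chunks)
        # Empty instance of the chunk type: a cheap, data-free reference.
        self._meta = type(self.chunks[0])()


def toy_meta_from_array(x):
    """Return the chunk-type reference of a chunked array, or x itself."""
    return getattr(x, "_meta", x)


def toy_asarray(obj, *, like=None):
    """Build a new array with the same type as ``like`` (simplified)."""
    return list(obj) if like is None else type(like)(obj)


def my_pad(arr, padding):
    padding = toy_asarray(padding, like=toy_meta_from_array(arr))
    # Concatenating chunk lists keeps every chunk the same type.
    return ToyDaskArray([padding] + arr.chunks + [padding])


cpu = my_pad(ToyDaskArray([[0, 1, 2], [3, 4]]), [-1, -1])
gpu = my_pad(ToyDaskArray([GPUList([0, 1, 2]), GPUList([3, 4])]), [-1, -1])
print({type(c).__name__ for c in cpu.chunks})  # {'list'}
print({type(c).__name__ for c in gpu.chunks})  # {'GPUList'}
```

Because ``_meta`` holds no data, using it as the reference costs nothing, yet it is enough to pick the chunk type of the padding.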
+
+Backward Compatibility
+----------------------
+
+This proposal does not raise any backward compatibility issues within NumPy,
+given that it only introduces a new keyword argument to existing array creation
+functions with a default ``None`` value, thus not changing current behavior.
Detailed description
--------------------
@@ -28,10 +166,6 @@ did not -- and did not intend to -- address the creation of arrays by downstream
libraries, preventing those libraries from using such important functionality in
that context.
-Other NEPs have been written to address parts of that limitation, such as the
-introduction of the ``__duckarray__`` protocol in NEP-30 [2]_, and the
-introduction of an overriding mechanism called ``uarray`` by NEP-31 [3]_.
-
The purpose of this NEP is to address that shortcoming in a simple and
straightforward way: introduce a new ``like=`` keyword argument, similar to how
the ``empty_like`` family of functions work. When array creation functions
@@ -39,25 +173,25 @@ receive such an argument, they will trigger the ``__array_function__`` protocol,
and call the downstream library's own array creation function implementation.
The ``like=`` argument, as its own name suggests, shall be used solely for the
purpose of identifying where to dispatch. In contrast to the way
-``__array_function__`` has been used so far (the first argument identifies where
-to dispatch), and to avoid breaking NumPy's API with regards to array creation,
-the new ``like=`` keyword shall be used for the purpose of dispatching.
-
-Usage Guidance
-~~~~~~~~~~~~~~
-
-The new ``like=`` keyword is solely intended to identify the downstream library
-where to dispatch and the object is used only as reference, meaning that no
-modifications, copies or processing will be performed on that object.
-
-We expect that this functionality will be mostly useful to library developers,
-allowing them to create new arrays for internal usage based on arrays passed
-by the user, preventing unnecessary creation of NumPy arrays that will
-ultimately lead to an additional conversion into a downstream array type.
-
-Support for Python 2.7 has been dropped since NumPy 1.17, therefore we should
-make use of the keyword-only argument standard described in PEP-3102 [4]_ to
-implement the ``like=``, thus preventing it from being passed by position.
+``__array_function__`` has been used so far (the first argument identifies the
+target downstream library), and to avoid breaking NumPy's API with regard to
+array creation, the new ``like=`` keyword shall be used for the purpose of
+dispatching.
+
+Downstream libraries will benefit from the ``like=`` argument without any
+changes to their API, given the argument only needs to be implemented in
+NumPy. It will still be required that downstream libraries implement the
+``__array_function__`` protocol, as described by NEP 18 [1]_, and appropriately
+introduce the argument to their calls to NumPy array creation functions, as
+exemplified in :ref:`neps.like-kwarg.usage-and-impact`.
+
+Related work
+------------
+
+Other NEPs have been written to address parts of the ``__array_function__``
+protocol's limitations, such as the introduction of the ``__duckarray__``
+protocol in NEP 30 [3]_, and the introduction of an overriding mechanism called
+``uarray`` by NEP 31 [4]_.
Implementation
--------------
@@ -66,10 +200,10 @@ The implementation requires introducing a new ``like=`` keyword to all existing
array creation functions of NumPy. As examples of functions that would add this
new argument (but not limited to) we can cite those taking array-like objects
such as ``array`` and ``asarray``, functions that create arrays based on
-numerical ranges such as ``range`` and ``linspace``, as well as the ``empty``
-family of functions, even though that may be redundant, since there exists
-already specializations for those with the naming format ``empty_like``. As of
-the writing of this NEP, a complete list of array creation functions can be
+numerical inputs such as ``arange`` and ``identity``, as well as the ``empty``
+family of functions, even though that may be redundant, since specializations
+for those already exist with the naming format ``empty_like``. As of the
+writing of this NEP, a complete list of array creation functions can be
found in [5]_.
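Because the same keyword has to be added to many functions, one way to picture the change is as a wrapper applied uniformly to each creation function. The sketch below uses a hypothetical plain-Python decorator (``with_like``) and is not NumPy's implementation; following the NEP's statement that the keyword is removed by the ``__array_function__`` machinery, it consumes ``like=`` rather than forwarding it. ``toy_arange``, ``toy_identity`` and ``DuckArray`` are also hypothetical.

```python
import functools


def with_like(func):
    """Hypothetical decorator adding a keyword-only ``like=`` argument."""

    @functools.wraps(func)
    def wrapper(*args, like=None, **kwargs):
        if like is None:
            # Default value: behave exactly as before (backward compatible).
            return func(*args, **kwargs)
        handler = getattr(type(like), "__array_function__", None)
        if handler is None:
            raise TypeError(f"{type(like).__name__} lacks __array_function__")
        # ``like`` only selects the target; it is not passed along in kwargs.
        result = handler(like, wrapper, (type(like),), args, kwargs)
        if result is NotImplemented:
            raise TypeError(f"no implementation of {func.__name__} found")
        return result

    return wrapper


@with_like
def toy_arange(stop):
    return list(range(stop))


@with_like
def toy_identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


class DuckArray:
    """Hypothetical downstream array type."""

    def __init__(self, data=None):
        self.data = data

    def __array_function__(self, func, types, args, kwargs):
        # Reuse the undecorated reference code, then wrap it in our own type.
        return DuckArray(func.__wrapped__(*args, **kwargs))


ref = DuckArray()
print(toy_arange(3))                   # [0, 1, 2]
print(toy_identity(2, like=ref).data)  # [[1, 0], [0, 1]]
```

Applying one decorator keeps the keyword's behaviour identical across functions, and the ``None`` default leaves existing calls untouched.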
This newly proposed keyword shall be removed by the ``__array_function__``
@@ -135,60 +269,43 @@ There are two downsides to the implementation above for C functions:
2. To follow current implementation standards, documentation should be attached
directly to the Python source code.
-Alternatively for C functions, the implementation of ``like=`` could be moved
-into the C implementation itself. This is not the primary suggestion here due
-to its inherent complexity which would be difficult too long to describe in its
-entirety here, and too tedious for the reader. However, we leave that as an
-option open for discussion.
+The first version of this proposal suggested the implementation above for C
+functions as one viable solution. However, due to the downsides pointed out
+above, we have decided to implement the argument entirely in C. Please refer
+to [implementation]_ for details.
-Usage
------
+Alternatives
+------------
-The purpose of this NEP is to keep things simple. Similarly, we can exemplify
-the usage of ``like=`` in a simple way. Imagine you have an array of ones
-created by a downstream library, such as CuPy. What you need now is a new array
-that can be created using the NumPy API, but that will in fact be created by
-the downstream library, a simple way to achieve that is shown below.
+Recently, a new protocol to replace ``__array_function__`` entirely was proposed
+by NEP 37 [6]_, which would require considerable rework by downstream libraries
+that already adopt ``__array_function__``. Because of that, we still believe the
+``like=`` argument is beneficial for NumPy and downstream libraries. However,
+that proposal wouldn't necessarily be considered a direct alternative to the
+present NEP, as it would replace NEP 18 entirely, upon which this NEP builds.
+Discussion on details about this new proposal and why that would require rework
+by downstream libraries is beyond the scope of the present proposal.
-.. code:: python
-
- x = cupy.ones(2)
- np.array([1, 3, 5], like=x) # Returns cupy.ndarray
-
-As a second example, we could also create an array of evenly spaced numbers
-using a Dask identity matrix as reference:
-
-.. code:: python
-
- x = dask.array.eye(3)
- np.linspace(0, 2, like=x) # Returns dask.array
+Discussion
+----------
+.. [implementation] `Implementation's pull request on GitHub <https://github.com/numpy/numpy/pull/16935>`_
+.. [discussion] `Further discussion on implementation and the NEP's content <https://mail.python.org/pipermail/numpy-discussion/2020-August/080919.html>`_
-Compatibility
--------------
+References
+----------
-This proposal does not raise any backward compatibility issues within NumPy,
-given that it only introduces a new keyword argument to existing array creation
-functions.
-
-Downstream libraries will benefit from the ``like=`` argument automatically,
-that is, without any explicit changes in their codebase. The only requirement
-is that they already implement the ``__array_function__`` protocol, as
-described by NEP-18 [2]_.
+.. [1] `NEP 18 - A dispatch mechanism for NumPy's high level array functions <https://numpy.org/neps/nep-0018-array-function-protocol.html>`_.
-References and Footnotes
-------------------------
+.. [2] `PEP 3102 — Keyword-Only Arguments <https://www.python.org/dev/peps/pep-3102/>`_.
-.. [1] `NEP-18 - A dispatch mechanism for NumPy's high level array functions <https://numpy.org/neps/nep-0018-array-function-protocol.html>`_.
+.. [3] `NEP 30 — Duck Typing for NumPy Arrays - Implementation <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_.
-.. [2] `NEP 30 — Duck Typing for NumPy Arrays - Implementation <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_.
-
-.. [3] `NEP 31 — Context-local and global overrides of the NumPy API <https://github.com/numpy/numpy/pull/14389>`_.
-
-.. [4] `PEP 3102 — Keyword-Only Arguments <https://www.python.org/dev/peps/pep-3102/>`_.
+.. [4] `NEP 31 — Context-local and global overrides of the NumPy API <https://github.com/numpy/numpy/pull/14389>`_.
.. [5] `Array creation routines <https://docs.scipy.org/doc/numpy-1.17.0/reference/routines.array-creation.html>`_.
+.. [6] `NEP 37 — A dispatch protocol for NumPy-like modules <https://numpy.org/neps/nep-0037-array-module.html>`_.
+
Copyright
---------