author | melissawm <melissawm.github@gmail.com> | 2021-11-10 09:32:15 -0300 |
---|---|---|
committer | melissawm <melissawm.github@gmail.com> | 2021-11-23 12:27:29 -0300 |
commit | c55b5072c09788ece4d67feeed3a7b9dec14d589 (patch) | |
tree | ec12256c679747bb82aa83adfcae383c2e107970 | |
parent | 7ce32d6188fcb76ad4790dd9679abdb3b7a6dacf (diff) | |
download | numpy-c55b5072c09788ece4d67feeed3a7b9dec14d589.tar.gz |
Adding description of __array_finalize__ and __array_wrap__
-rw-r--r-- | doc/source/user/basics.interoperability.rst | 105 | ||||
-rw-r--r-- | doc/source/user/c-info.beyond-basics.rst | 1 |
2 files changed, 78 insertions, 28 deletions
diff --git a/doc/source/user/basics.interoperability.rst b/doc/source/user/basics.interoperability.rst
index eeb7492ef..a59f89431 100644
--- a/doc/source/user/basics.interoperability.rst
+++ b/doc/source/user/basics.interoperability.rst
@@ -5,29 +5,41 @@ Interoperability with NumPy
 
 NumPy's ndarray objects provide both a high-level API for operations on
 array-structured data and a concrete implementation of the API based on
-:ref:`strided in-RAM storage <arrays>`.
-While this API is powerful and fairly general, its concrete implementation has
-limitations. As datasets grow and NumPy becomes used in a variety of new
-environments and architectures, there are cases where the strided in-RAM storage
-strategy is inappropriate, which has caused different libraries to reimplement
-this API for their own uses. This includes GPU arrays (CuPy_), Sparse arrays
-(`scipy.sparse`, `PyData/Sparse <Sparse_>`_) and parallel arrays (Dask_ arrays)
-as well as various NumPy-like implementations in deep learning frameworks, like
-TensorFlow_ and PyTorch_. Similarly, there are many projects that build on top
-of the NumPy API for labeled and indexed arrays (XArray_), automatic
-differentiation (JAX_), masked arrays (`numpy.ma`), physical units
-(astropy.units_, pint_, unyt_), among others that add additional functionality
-on top of the NumPy API.
+:ref:`strided in-RAM storage <arrays>`. While this API is powerful and fairly
+general, its concrete implementation has limitations. As datasets grow and NumPy
+becomes used in a variety of new environments and architectures, there are cases
+where the strided in-RAM storage strategy is inappropriate, which has caused
+different libraries to reimplement this API for their own uses. This includes
+GPU arrays (CuPy_), Sparse arrays (`scipy.sparse`, `PyData/Sparse <Sparse_>`_)
+and parallel arrays (Dask_ arrays) as well as various NumPy-like implementations
+in deep learning frameworks, like TensorFlow_ and PyTorch_. Similarly, there are
+many projects that build on top of the NumPy API for labeled and indexed arrays
+(XArray_), automatic differentiation (JAX_), masked arrays (`numpy.ma`),
+physical units (astropy.units_, pint_, unyt_), among others that add additional
+functionality on top of the NumPy API.
 
 Yet, users still want to work with these arrays using the familiar NumPy API and
 re-use existing code with minimal (ideally zero) porting overhead. With this
 goal in mind, various protocols are defined for implementations of
-multi-dimensional arrays with high-level APIs matching NumPy.
+multi-dimensional arrays with high-level APIs matching NumPy.
 
-Using arbitrary objects in NumPy
---------------------------------
+Broadly speaking, there are three groups of features used for interoperability
+with NumPy:
 
-When NumPy functions encounter a foreign object, they will try (in order):
+1. Methods of turning a foreign object into an ndarray;
+2. Methods of deferring execution from a NumPy function to another array
+   library;
+3. Methods that use NumPy functions and return an instance of a foreign object.
+
+We describe these features below.
+
+
+1. Using arbitrary objects in NumPy
+-----------------------------------
+
+The first set of interoperability features from the NumPy API allows foreign
+objects to be treated as NumPy arrays whenever possible. When NumPy functions
+encounter a foreign object, they will try (in order):
 
 1. The buffer protocol, described :py:doc:`in the Python C-API documentation
    <c-api/buffer>`.
@@ -106,8 +118,12 @@ as the original object and any attributes/behavior it may have had, is lost.
 To see an example of a custom array implementation including the use of
 ``__array__()``, see :ref:`basics.dispatch`.
 
-Operating on foreign objects without converting
------------------------------------------------
+
+2. Operating on foreign objects without converting
+--------------------------------------------------
+
+A second set of methods defined by the NumPy API allows us to defer the
+execution from a NumPy function to another array library.
 
 Consider the following function.
 
@@ -115,9 +131,9 @@ Consider the following function.
  >>> def f(x):
  ...     return np.mean(np.exp(x))
 
-Note that `np.exp` is a :ref:`ufunc <ufuncs-basics>`, which means that it
-operates on ndarrays in an element-by-element fashion. On the other hand,
-`np.mean` operates along one of the array's axes.
+Note that `np.exp <numpy.exp>` is a :ref:`ufunc <ufuncs-basics>`, which means
+that it operates on ndarrays in an element-by-element fashion. On the other
+hand, `np.mean <numpy.mean>` operates along one of the array's axes.
 
 We can apply ``f`` to a NumPy ndarray object directly:
 
@@ -126,8 +142,7 @@ We can apply ``f`` to a NumPy ndarray object directly:
  21.1977562209304
 
 We would like this function to work equally well with any NumPy-like array
-object. Some of this is possible today with various protocol mechanisms within
-NumPy.
+object.
 
 NumPy allows a class to indicate that it would like to handle computations in a
 custom-defined way through the following interfaces:
@@ -139,7 +154,7 @@ custom-defined way through the following interfaces:
 
 As long as foreign objects implement the ``__array_ufunc__`` or
 ``__array_function__`` protocols, it is possible to operate on them without the
-need for explicit conversion.
+need for explicit conversion.
 
 The ``__array_ufunc__`` protocol
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -147,7 +162,7 @@ The ``__array_ufunc__`` protocol
 A :ref:`universal function (or ufunc for short) <ufuncs-basics>` is a
 “vectorized” wrapper for a function that takes a fixed number of specific inputs
 and produces a fixed number of specific outputs. The output of the ufunc (and
-its methods) is not necessarily an ndarray, if not all input arguments are
+its methods) is not necessarily a ndarray, if not all input arguments are
 ndarrays. Indeed, if any input defines an ``__array_ufunc__`` method, control
 will be passed completely to that function, i.e., the ufunc is overridden. The
 ``__array_ufunc__`` method defined on that (non-ndarray) object has access to
@@ -173,6 +188,36 @@ The semantics of ``__array_function__`` are very similar to ``__array_ufunc__``,
 except the operation is specified by an arbitrary callable object rather than a
 ufunc instance and method. For more details, see :ref:`NEP18`.
 
+
+3. Returning foreign objects
+----------------------------
+
+A third type of feature set is meant to use the NumPy function implementation
+and then convert the return value back into an instance of the foreign object.
+The ``__array_finalize__`` and ``__array_wrap__`` methods act behind the scenes
+to ensure that the return type of a NumPy function can be specified as needed.
+
+The ``__array_finalize__`` method is the mechanism that NumPy provides to allow
+subclasses to handle the various ways that new instances get created. This
+method is called whenever the system internally allocates a new array from an
+object which is a subclass (subtype) of the ndarray. It can be used to change
+attributes after construction, or to update meta-information from the “parent.”
+
+The ``__array_wrap__`` method “wraps up the action” in the sense of allowing a
+subclass to set the type of the return value and update attributes and metadata.
+This can be seen as the opposite of the ``__array__`` method. At the end of
+every ufunc, this method is called on the input object with the
+highest *array priority*, or the output object if one was specified. The
+``__array_priority__`` attribute is used to determine what type of object to
+return in situations where there is more than one possibility for the Python
+type of the returned object. Subclasses may opt to use this method to transform
+the output array into an instance of the subclass and update metadata before
+returning the array to the user.
+
+For more information on these methods, see :ref:`basics.subclassing` and
+:ref:`specific-array-subtyping`.
+
+
 Interoperability examples
 -------------------------
 
@@ -218,6 +263,7 @@ We can even do operations with other ndarrays:
  >>> type(result)
  numpy.ndarray
 
+
 Example: PyTorch tensors
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -343,8 +389,11 @@ Further reading
 - :ref:`basics.dispatch`
 - :ref:`special-attributes-and-methods` (details on the ``__array_ufunc__`` and
   ``__array_function__`` protocols)
-- `NumPy roadmap: interoperability
-  <https://numpy.org/neps/roadmap.html#interoperability>`__
+- :ref:`basics.subclassing` (details on the ``__array_wrap__`` and
+  ``__array_finalize__`` methods)
+- :ref:`specific-array-subtyping` (more details on the implementation of
+  ``__array_finalize__``, ``__array_wrap__`` and ``__array_priority__``)
+- :doc:`NumPy roadmap: interoperability <neps:roadmap>`
 - `PyTorch documentation on the Bridge with NumPy
   <https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#bridge-to-np-label>`__
 
diff --git a/doc/source/user/c-info.beyond-basics.rst b/doc/source/user/c-info.beyond-basics.rst
index 7dd22afbf..04ca83489 100644
--- a/doc/source/user/c-info.beyond-basics.rst
+++ b/doc/source/user/c-info.beyond-basics.rst
@@ -450,6 +450,7 @@ type(s). In particular, to create a sub-type in C follow these steps:
 More information on creating sub-types in C can be learned by reading PEP 253
 (available at https://www.python.org/dev/peps/pep-0253).
 
+.. _specific-array-subtyping:
 
 Specific features of ndarray sub-typing
 ---------------------------------------