author     Hameer Abbasi <hameerabbasi@yahoo.com>  2019-09-04 21:04:23 +0200
committer  Hameer Abbasi <hameerabbasi@yahoo.com>  2019-09-05 13:06:47 +0200
commit     d2c57616d369fdb5b4ea22b77d314785b1a0508e
tree       6eaa55f3167512cc044304512be5ac87f215d9ff
parent     d464d192ca996a107fc6dd5099f72227cf64a8ea
NEP: Add NEP 31 — Context-local and global overrides of the NumPy API
Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Peter Bell <peterbell10@live.co.uk>
-rw-r--r--  doc/neps/nep-0031-uarray.rst  465
1 files changed, 465 insertions, 0 deletions
diff --git a/doc/neps/nep-0031-uarray.rst b/doc/neps/nep-0031-uarray.rst
new file mode 100644
index 000000000..d1c3d806e
--- /dev/null
+++ b/doc/neps/nep-0031-uarray.rst
@@ -0,0 +1,465 @@

============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================

:Author: Hameer Abbasi <habbasi@quansight.com>
:Author: Ralf Gommers <rgommers@quansight.com>
:Author: Peter Bell <pbell@quansight.com>
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22


Abstract
--------

This NEP proposes to make all of NumPy's public API overridable via an
extensible backend mechanism.

Acceptance of this NEP means NumPy would provide global and context-local
overrides, as well as a dispatch mechanism similar to NEP-18 [2]_. First
experiences with ``__array_function__`` show that it is necessary to be able
to override NumPy functions that *do not take an array-like argument*, and
hence aren't overridable via ``__array_function__``. The most pressing needs
are array creation and coercion functions, such as ``numpy.zeros`` or
``numpy.asarray``; see e.g. NEP-30 [9]_.

This NEP proposes to allow, in an opt-in fashion, overriding any part of the
NumPy API. It is intended as a comprehensive resolution to NEP-22 [3]_, and
obviates the need to add an ever-growing list of new protocols for each new
type of function or object that needs to become overridable.

Motivation and Scope
--------------------

The motivation behind ``uarray`` is manifold: First, there have been several
attempts to allow dispatch of parts of the NumPy API, including (most
prominently) the ``__array_ufunc__`` protocol in NEP-13 [4]_ and the
``__array_function__`` protocol in NEP-18 [2]_, but these have shown the need
for further protocols to be developed, including a protocol for coercion (see
[5]_, [9]_). The reasons these overrides are needed have been extensively
discussed in the references, and this NEP will not attempt to go into the
details; in short, it is necessary for library authors to be able to coerce
arbitrary objects into arrays of their own types. For example, CuPy needs to
coerce to a CuPy array rather than a NumPy array.

These kinds of overrides are useful for both end-users and library
authors. End-users may have written, or may wish to write, code that they
later want to speed up or move to a different implementation, say
PyData/Sparse. They can do this simply by setting a backend. Library authors
may also wish to write code that is portable across array implementations; for
example, ``sklearn`` may wish to implement a machine learning algorithm that
is portable across array implementations while also using array creation
functions.

This NEP takes a holistic approach: It assumes that there are parts of
the API that need to be overridable, and that these will grow over time. It
provides a general framework and a mechanism to avoid designing a new
protocol each time this is required. This was the goal of ``uarray``: to
allow for overrides in an API without needing the design of a new protocol.

This NEP proposes the following: That ``unumpy`` [8]_ becomes the
recommended override mechanism for the parts of the NumPy API not yet covered
by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is
vendored into a new namespace within NumPy to give users and downstream
dependencies access to these overrides. This vendoring mechanism is similar
to what SciPy decided to do for making ``scipy.fft`` overridable (see [10]_).


Detailed description
--------------------

Using overrides
~~~~~~~~~~~~~~~

We propose that end users use the overrides as follows::

    # On the library side
    import numpy.overridable as unp

    def library_function(array):
        array = unp.asarray(array)
        # Code using unumpy as usual
        return array

    # On the user side:
    import numpy.overridable as unp
    import uarray as ua
    import dask.array as da

    ua.register_backend(da)

    dask_array = da.ones(4)  # an example Dask array
    library_function(dask_array)  # works and returns dask_array

    with unp.set_backend(da):
        library_function([1, 2, 3, 4])  # actually returns a Dask array.


Here, the backend passed to ``ua.register_backend`` or ``unp.set_backend``
can be any compatible object defined either by NumPy or an external library,
such as Dask or CuPy. Ideally, it should be the module ``dask.array`` or
``cupy`` itself.

Composing backends
~~~~~~~~~~~~~~~~~~

Some backends may depend on other backends: for example, xarray depending on
``numpy.fft`` to transform a time axis into a frequency axis, or Dask/xarray
holding an array other than a NumPy array inside it. This would be handled in
the following manner inside code::

    import cupy
    import dask.array
    import uarray as ua

    with ua.set_backend(cupy), ua.set_backend(dask.array):
        # Code that has distributed GPU arrays here
        ...

Proposals
~~~~~~~~~

The only change this NEP proposes at its acceptance is to make ``unumpy`` the
officially recommended way to override NumPy. ``unumpy`` will remain a separate
repository/package, which we propose to vendor to avoid a hard dependency for
the time being, using the separate ``unumpy`` package only if it is installed.
In concrete terms, ``numpy.overridable`` becomes an alias for ``unumpy`` if it
is available, with a fallback to a vendored version if not. ``uarray`` and
``unumpy`` will be developed primarily with the input of duck-array authors
and secondarily, custom dtype authors, via the usual GitHub workflow. There
are a few reasons for this:

* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process,
  rather than breakages happening when it is least expected.
  In simple terms, bugs in ``unumpy`` leave ``numpy`` unaffected.

Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``unumpy`` offers a number of advantages over the approach of defining a new
protocol for every problem encountered: Whenever something requires an
override, ``unumpy`` will be able to offer a unified API with very minor
changes. For example:

* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and
  other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.overridable.asarray`` with a
  backend set.
* The same holds for array creation functions such as ``np.zeros``,
  ``np.empty`` and so on.

This also holds for the future: Making something overridable would require only
minor changes to ``unumpy``.

Another promise ``unumpy`` holds is one of default implementations. Default
implementations can be provided for any multimethod, in terms of others. This
allows one to override a large part of the NumPy API by defining only a small
part of it. This eases the creation of new duck-arrays: ``unumpy`` provides
default implementations of many functions that can easily be expressed in
terms of others, as well as a repository of utility functions that most
duck-array implementations would require.
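
Default implementations can be illustrated with a toy dispatcher written in
plain Python. This sketch does not use ``uarray``; ``multimethod``,
``_backend_impls`` and ``list_full`` are hypothetical names chosen for
illustration, and the "backend" simply builds nested lists. A backend that
implements only ``full`` gets ``zeros`` and ``ones`` through their default
implementations::

    # Toy model of default implementations; not uarray's API.
    # Maps a multimethod to the active backend's implementation of it.
    _backend_impls = {}


    def multimethod(default=None):
        """Turn a stub into a toy multimethod: use the backend's
        implementation if one is registered, else fall back to ``default``."""
        def decorator(stub):
            def dispatch(*args, **kwargs):
                impl = _backend_impls.get(dispatch)
                if impl is not None:
                    return impl(*args, **kwargs)
                if default is not None:
                    return default(*args, **kwargs)
                raise NotImplementedError(stub.__name__)
            dispatch.__name__ = stub.__name__
            return dispatch
        return decorator


    @multimethod()
    def full(shape, fill_value):
        """No default: a backend must implement this."""


    @multimethod(default=lambda shape: full(shape, 0))
    def zeros(shape):
        """Default implementation in terms of ``full``."""


    @multimethod(default=lambda shape: full(shape, 1))
    def ones(shape):
        """Default implementation in terms of ``full``."""


    def list_full(shape, fill_value):
        """A hypothetical backend's ``full``, building nested lists."""
        if not shape:
            return fill_value
        return [list_full(shape[1:], fill_value) for _ in range(shape[0])]


    # The backend registers only ``full``...
    _backend_impls[full] = list_full

    # ...yet ``zeros`` and ``ones`` now work through their defaults.
    print(zeros((2, 3)))  # [[0, 0, 0], [0, 0, 0]]
    print(ones((2,)))     # [1, 1]

Because the defaults call the ``full`` multimethod rather than a concrete
function, whichever backend supplies ``full`` also determines the type of the
results of ``zeros`` and ``ones``.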

It also allows one to override functions in a manner which
``__array_function__`` simply cannot, such as overriding ``np.einsum`` with the
version from the ``opt_einsum`` package, or Intel MKL overriding FFT, BLAS
or ``ufunc`` objects. They would define a backend with the appropriate
multimethods, and the user would select them via a ``with`` statement, or by
registering them as a backend.

The last benefit is a clear way to coerce to a given backend (via the
``coerce`` keyword in ``ua.set_backend``), and a protocol
for coercing not only arrays, but also ``dtype`` objects and ``ufunc`` objects
with similar ones from other libraries. This is motivated by the existence of
actual third-party dtype packages, and their desire to blend into the NumPy
ecosystem (see [6]_). This is a separate issue from the C-level dtype redesign
proposed in [7]_; it is about allowing third-party dtype implementations to
work with NumPy, much like third-party array implementations. These can provide
features such as units, jagged arrays or others that are outside the scope of
NumPy.

Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Normally, one would import only one of ``unumpy`` or ``numpy``, as ``np`` for
familiarity. However, there may be situations where one wishes to mix NumPy
and the overrides, and there are a few ways to do this, depending on the
user's style::

    from numpy import overridable as unp
    import numpy as np

or::

    import numpy as np

    # Use unumpy via np.overridable

Duck-array coercion
~~~~~~~~~~~~~~~~~~~

There are inherent problems with returning objects that are not NumPy arrays
from ``numpy.array`` or ``numpy.asarray``, particularly in the context of C/C++
or Cython code that may get an object with a different memory layout than the
one it expects. However, we believe this problem may apply not only to these
two functions but to all functions that return NumPy arrays. For this reason,
overrides are opt-in for the user, by using the submodule ``numpy.overridable``
rather than ``numpy``. NumPy will continue to work unaffected by anything in
``numpy.overridable``.

If the user wishes to obtain a NumPy array, there are two ways of doing it:

1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion
   enabled.

Related Work
------------

Other override mechanisms
~~~~~~~~~~~~~~~~~~~~~~~~~

* NEP-18, the ``__array_function__`` protocol. [2]_
* NEP-13, the ``__array_ufunc__`` protocol. [4]_
* NEP-30, the ``__duck_array__`` protocol. [9]_

Existing NumPy-like array implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/

Existing and potential consumers of alternative arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/

Existing alternate dtype implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/

Implementation
--------------

The implementation of this NEP will require the following steps:

* Implementation of ``uarray`` multimethods corresponding to the
  NumPy API, including classes for overriding ``dtype``, ``ufunc``
  and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.
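
As a rough illustration of what a class for overriding ``ufunc`` objects
might look like, here is a toy sketch in plain Python. It does not use
``uarray``, and ``ToyUfunc`` and ``_overrides`` are hypothetical names. Each
method is routed through its own dispatch point, so ``reduce`` can be
overridden without touching ``__call__``::

    import functools

    # Hypothetical override table: (ufunc name, method name) -> replacement.
    _overrides = {}


    class ToyUfunc:
        """A ufunc-like object whose methods are overridable one at a time."""

        def __init__(self, name, scalar_op):
            self.name = name
            self.scalar_op = scalar_op

        def _dispatch(self, method, fallback, *args):
            # Use a registered override for this method, if any.
            impl = _overrides.get((self.name, method), fallback)
            return impl(*args)

        def __call__(self, x, y):
            return self._dispatch("__call__", self._call_default, x, y)

        def reduce(self, x):
            return self._dispatch("reduce", self._reduce_default, x)

        def _call_default(self, x, y):
            return [self.scalar_op(a, b) for a, b in zip(x, y)]

        def _reduce_default(self, x):
            return functools.reduce(self.scalar_op, x)


    add = ToyUfunc("add", lambda a, b: a + b)
    print(add([1, 2], [3, 4]))    # [4, 6]
    print(add.reduce([1, 2, 3]))  # 6

    # Override only ``reduce`` (standing in for, e.g., an accelerated
    # version); ``__call__`` keeps its default behaviour.
    _overrides[("add", "reduce")] = lambda x: float(sum(x))
    print(add.reduce([1, 2, 3]))  # 6.0
    print(add([1, 2], [3, 4]))    # [4, 6]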

``uarray`` Primer
~~~~~~~~~~~~~~~~~

**Note:** *This section will not attempt to go into too much detail about
uarray; that is the purpose of the uarray documentation.* [1]_
*However, the NumPy community will have input into the design of
uarray, via the issue tracker.*

``unumpy`` is the interface that defines a set of overridable functions
(multimethods) compatible with the NumPy API. To do this, it uses the
``uarray`` library. ``uarray`` is a general purpose tool for creating
multimethods that dispatch to one of multiple different possible backend
implementations. In this sense, it is similar to the ``__array_function__``
protocol but with the key difference that the backend is explicitly installed
by the end-user and not coupled into the array type.

Decoupling the backend from the array type gives much more flexibility to
end-users and backend authors. For example, it is possible to:

* override functions not taking arrays as arguments
* create backends outside the source code of the array type
* install multiple backends for the same array type

This decoupling also means that ``uarray`` is not constrained to dispatching
over array-like types. The backend is free to inspect the entire set of
function arguments to determine if it can implement the function, e.g. to
dispatch on the ``dtype`` parameter.

Defining backends
^^^^^^^^^^^^^^^^^

``uarray`` consists of two main protocols, ``__ua_convert__`` and
``__ua_function__`` (called in that order), along with ``__ua_domain__``.
``__ua_convert__`` is for conversion and coercion. It has the signature
``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of
``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether or
not to force the conversion. ``ua.Dispatchable`` is a simple class consisting
of three values: ``type``, ``value``, and ``coercible``.
``__ua_convert__`` returns an iterable of the converted values, or
``NotImplemented`` in the case of failure.

``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines
the actual implementation of the function. It receives the function and its
arguments. Returning ``NotImplemented`` will cause a move to the default
implementation of the function if one exists, and failing that, the next
backend.

Here is what happens when a ``uarray`` multimethod is called:

1. We canonicalise the arguments so any arguments without a default
   are placed in ``*args`` and those with one are placed in ``**kwargs``.
2. We check the list of backends.

   a. If it is empty, we try the default implementation.

3. We check if the backend's ``__ua_convert__`` method exists. If it exists:

   a. We pass it the output of the dispatcher,
      which is an iterable of ``ua.Dispatchable`` objects.
   b. If it returns ``NotImplemented``, we move to 3 with the next backend.
      Otherwise, we feed its output, along with the arguments,
      to the argument replacer.
   c. We store the replaced arguments as the new arguments.

4. We feed the arguments into ``__ua_function__`` and return its output
   if it isn't ``NotImplemented``.
5. If the default implementation exists, we try it with the current backend.
6. On failure, we move to 3 with the next backend. If there are no more
   backends, we move to 7.
7. We raise a ``ua.BackendNotImplementedError``.

Defining overridable multimethods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To define an overridable function (a multimethod), one needs a few things:

1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher that replaces dispatchable values with the supplied
   ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of
   other multimethods.
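
The dispatch steps and the four ingredients above can be modelled in plain
Python. The sketch below is a simplified, hypothetical model written for this
explanation, not ``uarray``'s implementation: it skips canonicalisation
(step 1), and ``make_multimethod``, ``set_backend``, ``ListBackend`` and the
stand-in ``Dispatchable`` and ``BackendNotImplementedError`` classes are
illustrative names only::

    import contextlib
    import contextvars
    from collections import namedtuple

    # Hypothetical stand-ins mirroring ua.Dispatchable and
    # ua.BackendNotImplementedError; not uarray's implementation.
    Dispatchable = namedtuple("Dispatchable", ["type", "value", "coercible"])

    class BackendNotImplementedError(NotImplementedError):
        pass

    # Context-local stack of (backend, coerce) pairs, innermost first.
    _backends = contextvars.ContextVar("backends", default=())

    @contextlib.contextmanager
    def set_backend(backend, coerce=False):
        token = _backends.set(((backend, coerce),) + _backends.get())
        try:
            yield
        finally:
            _backends.reset(token)

    def _run_default(default, backend, coerce, args, kwargs):
        # Step 5: run the default with only the current backend active.
        token = _backends.set(((backend, coerce),))
        try:
            return default(*args, **kwargs)
        except BackendNotImplementedError:
            return NotImplemented
        finally:
            _backends.reset(token)

    def make_multimethod(dispatcher, replacer, domain, default=None):
        def multimethod(*args, **kwargs):  # step 1 (canonicalisation) omitted
            backends = _backends.get()
            if not backends and default is not None:  # step 2a
                return default(*args, **kwargs)
            for backend, coerce in backends:
                if getattr(backend, "__ua_domain__", None) != domain:
                    continue
                a, kw = args, kwargs
                convert = getattr(backend, "__ua_convert__", None)
                if convert is not None:  # step 3
                    converted = convert(dispatcher(*args, **kwargs), coerce)
                    if converted is NotImplemented:
                        continue  # step 3b: try the next backend
                    a, kw = replacer(args, kwargs, tuple(converted))  # 3c
                result = backend.__ua_function__(multimethod, a, kw)  # step 4
                if result is NotImplemented and default is not None:
                    result = _run_default(default, backend, coerce, a, kw)
                if result is not NotImplemented:
                    return result
            raise BackendNotImplementedError(domain)  # steps 6 and 7
        return multimethod

    # A hypothetical ``asarray`` multimethod in a "toy" domain.
    def _asarray_dispatcher(obj):
        return (Dispatchable("array", obj, True),)

    def _asarray_replacer(args, kwargs, converted):
        return (converted[0],) + args[1:], kwargs

    asarray = make_multimethod(_asarray_dispatcher, _asarray_replacer, "toy")

    class ListBackend:
        """Hypothetical backend whose "arrays" are Python lists."""
        __ua_domain__ = "toy"

        @staticmethod
        def __ua_convert__(dispatchables, coerce):
            out = []
            for d in dispatchables:
                if isinstance(d.value, list):
                    out.append(d.value)
                elif coerce and d.coercible:
                    out.append(list(d.value))
                else:
                    return NotImplemented
            return out

        @staticmethod
        def __ua_function__(func, args, kwargs):
            return args[0] if func is asarray else NotImplemented

    with set_backend(ListBackend):
        print(asarray([1, 2]))  # [1, 2]
    with set_backend(ListBackend, coerce=True):
        print(asarray((1, 2)))  # [1, 2]; the tuple is coerced to a list

Without ``coerce=True``, ``asarray((1, 2))`` fails here, because
``ListBackend.__ua_convert__`` refuses non-list values and no other backend is
active. Keeping the backend stack in a ``contextvars.ContextVar`` is one way to
make ``set_backend`` context-local; it is a design choice of this sketch.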

As an example, consider the following::

    import numpy as np
    import uarray as ua

    # Argument replacer: rebuilds (args, kwargs) with the dispatchable
    # ``dtype`` replaced by the value supplied by the backend.
    def full_argreplacer(args, kwargs, dispatchables):
        def full(shape, fill_value, dtype=None, order='C'):
            return (shape, fill_value), dict(
                dtype=dispatchables[0],
                order=order
            )

        return full(*args, **kwargs)

    # Dispatcher: marks ``dtype`` as the dispatchable argument.
    @ua.create_multimethod(full_argreplacer, domain="numpy")
    def full(shape, fill_value, dtype=None, order='C'):
        return (ua.Dispatchable(dtype, np.dtype),)

A large set of examples can be found in the ``unumpy`` repository [8]_.
This simple act of overriding callables allows us to override:

* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``.

Examples for NumPy
^^^^^^^^^^^^^^^^^^

A library that implements a NumPy-like API will use it in the following
manner (as an example)::

    import numpy.overridable as unp

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        # Look up this backend's implementation of the multimethod;
        # NotImplemented moves on to the default or the next backend.
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        # Decorator registering ``func`` as the implementation of ``ua_func``.
        def inner(func):
            _ua_implementations[ua_func] = func
            return func

        return inner

    @implements(unp.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.
        ...

    # Implementing ``full`` also provides ``ones`` and ``zeros``
    # via their default implementations.
    @implements(unp.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
        ...

Backward compatibility
----------------------

There are no backward incompatible changes proposed in this NEP.

Alternatives
------------

The current alternative to this problem is a combination of NEP-18 [2]_,
NEP-13 [4]_ and NEP-30 [9]_, plus adding more protocols (not yet specified).
Even then, some parts of the NumPy API will remain non-overridable, so it is
only a partial alternative.

The main alternative to vendoring ``unumpy`` is to simply move it into NumPy
completely and not distribute it as a separate package. This would also achieve
the proposed goals; however, we prefer to keep it a separate package for now,
for the reasons stated above.

The third alternative is to move ``unumpy`` into the NumPy organisation and
develop it as a NumPy project. This would also achieve the said goals, and is
also a possibility that can be considered by this NEP. However, the extra
``pip install`` or ``conda install`` step may discourage some users from
adopting this method.

Discussion
----------

* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-and-comparison-to-__array_function__/
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4
* Discussion PR 3: https://github.com/numpy/numpy/pull/14389


References and Footnotes
------------------------

.. [1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io

.. [2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html

.. [3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html

.. [4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html

.. [5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-implementation-of-NumPy-methods-tp46816p46874.html

.. [6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td43262.html

.. [7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899

.. [8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io

.. [9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html

.. [10] ``scipy.fft`` backend control: http://scipy.github.io/devdocs/fft.html#backend-control


Copyright
---------

This document has been placed in the public domain.