-rw-r--r-- | doc/neps/nep-0031-uarray.rst | 124 |
1 files changed, 122 insertions, 2 deletions
diff --git a/doc/neps/nep-0031-uarray.rst b/doc/neps/nep-0031-uarray.rst
index d1c3d806e..c7b950f14 100644
--- a/doc/neps/nep-0031-uarray.rst
+++ b/doc/neps/nep-0031-uarray.rst
@@ -114,7 +114,8 @@ Proposals
 ~~~~~~~~~
 
 The only change this NEP proposes at its acceptance, is to make ``unumpy`` the
-officially recommended way to override NumPy. ``unumpy`` will remain a separate
+officially recommended way to override NumPy, along with making some submodules
+overridable by default via ``uarray``. ``unumpy`` will remain a separate
 repository/package (which we propose to vendor to avoid a hard dependency, and
 use the separate ``unumpy`` package only if it is installed, rather than depend
 on for the time being). In concrete terms, ``numpy.overridable`` becomes an
@@ -130,6 +131,10 @@ GitHub workflow. There are a few reasons for this:
   rather than breakages happening when it is least expected.
   In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains
   unaffected.
+* For ``numpy.fft``, ``numpy.linalg`` and ``numpy.random``, the functions in
+  the main namespace will mirror those in the ``numpy.overridable`` namespace.
+  The reason for this is that there may exist functions in the in these
+  submodules that need backends, even for ``numpy.ndarray`` inputs.
 
 Advantanges of ``unumpy`` over other solutions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -156,7 +161,13 @@ allows one to override a large part of the NumPy API by defining only a small
 part of it. This is to ease the creation of new duck-arrays, by providing
 default implementations of many functions that can be easily expressed in
 terms of others, as well as a repository of utility functions that help in the
-implementation of duck-arrays that most duck-arrays would require.
+implementation of duck-arrays that most duck-arrays would require. This would
+allow us to avoid designing entire protocols, e.g., a protocol for stacking
+and concatenating would be replaced by simply implementing ``stack`` and/or
+``concatenate`` and then providing default implementations for everything else
+in that class. The same applies for transposing, and many other functions for
+which protocols haven't been proposed, such as ``isin`` in terms of ``in1d``,
+``setdiff1d`` in terms of ``unique``, and so on.
 
 It also allows one to override functions in a manner which
 ``__array_function__`` simply cannot, such as overriding ``np.einsum`` with the
@@ -211,6 +222,101 @@ If the user wishes to obtain a NumPy array, there are two ways of doing it:
 2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion
    enabled
 
+Aliases outside of the ``numpy.overridable`` namespace
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All functionality in ``numpy.random``, ``numpy.linalg`` and ``numpy.fft``
+will be aliased to their respective overridable versions inside
+``numpy.overridable``. The reason for this is that there are alternative
+implementations of RNGs (``mkl-random``), linear algebra routines (``eigen``,
+``blis``) and FFT routines (``mkl-fft``, ``pyFFTW``) that need to operate on
+``numpy.ndarray`` inputs, but still need the ability to switch behaviour.
+
+This is different from monkeypatching in a few different ways:
+
+* The caller-facing signature of the function is always the same,
+  so there is at least the loose sense of an API contract. Monkeypatching
+  does not provide this ability.
+* There is the ability of locally switching the backend.
+* It has been `suggested <http://numpy-discussion.10968.n7.nabble.com/NEP-31-Context-local-and-global-overrides-of-the-NumPy-API-tp47452p47472.html>`_
+  that the reason that 1.17 hasn't landed in the Anaconda defaults channel is
+  due to the incompatibility between monkeypatching and ``__array_function__``,
+  as monkeypatching would bypass the protocol completely.
+* Statements of the form ``from numpy import x; x`` and ``np.x`` would have
+  different results depending on whether the import was made before or
+  after monkeypatching happened.
+
+All this isn't possible at all with ``__array_function__`` or
+``__array_ufunc__``.
+
+It has been formally realised (at least in part) that a backend system is
+needed for this, in the `NumPy roadmap <https://numpy.org/neps/roadmap.html#other-functionality>`_.
+
+For ``numpy.random``, it's still necessary to make the C-API fit the one
+proposed in `NEP-19 <https://numpy.org/neps/nep-0019-rng-policy.html>`_.
+This is impossible for `mkl-random`, because then it would need to be
+rewritten to fit that framework. The guarantees on stream
+compatibility will be the same as before, but if there's a backend that affects
+``numpy.random`` set, we make no guarantees about stream compatibility, and it
+is up to the backend author to provide their own guarantees.
+
+Providing a way for implicit dispatch
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It has been suggested that the ability to dispatch methods which do not take
+a dispatchable is needed, while guessing that backend from another dispatchable.
+
+As a concrete example, consider the following:
+
+.. code:: python
+
+    with unumpy.determine_backend(array_like, np.ndarray):
+        unumpy.arange(len(array_like))
+
+While this does not exist yet in ``uarray``, it is trivial to add it. The need for
+this kind of code exists because one might want to have an alternative for the
+proposed ``*_like`` functions, or the ``like=`` keyword argument. The need for these
+exists because there are functions in the NumPy API that do not take a dispatchable
+argument, but there is still the need to select a backend based on a different
+dispatchable.
+
+The need for an opt-in module
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The need for an opt-in module is realised because of a few reasons:
+
+* There are parts of the API (like `numpy.asarray`) that simply cannot be
+  overridden due to incompatibility concerns with C/Cython extensions, however,
+  one may want to coerce to a duck-array using ``asarray`` with a backend set.
+* There are possible issues around an implicit option and monkeypatching, such
+  as those mentioned above.
+
+NEP 18 notes that this may require maintenance of two separate APIs. However,
+this burden may be lessened by, for example, parametrizing all tests over
+``numpy.overridable`` separately via a fixture. This also has the side-effect
+of thoroughly testing it, unlike ``__array_function__``. We also feel that it
+provides an oppurtunity to separate the NumPy API contract properly from the
+implementation.
+
+Benefits to end-users and mixing backends
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Mixing backends is easy in ``uarray``, one only has to do:
+
+.. code:: python
+
+    # Explicitly say which backends you want to mix
+    ua.register_backend(backend1)
+    ua.register_backend(backend2)
+    ua.register_backend(backend3)
+
+    # Freely use code that mixes backends here.
+
+The benefits to end-users extend beyond just writing new code. Old code
+(usually in the form of scripts) can be easily ported to different backends
+by a simple import switch and a line adding the preferred backend. This way,
+users may find it easier to port existing code to GPU or distributed computing.
+
 Related Work
 ------------
 
@@ -245,6 +351,14 @@ Existing alternate dtype implementations
 * Datashape: https://datashape.readthedocs.io
 * Plum: https://plum-py.readthedocs.io/
 
+Alternate implementations of parts of the NumPy API
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* ``mkl_random``: https://github.com/IntelPython/mkl_random
+* ``mkl_fft``: https://github.com/IntelPython/mkl_fft
+* ``bottleneck``: https://github.com/pydata/bottleneck
+* ``opt_einsum``: https://github.com/dgasmith/opt_einsum
+
 Implementation
 --------------
 
@@ -420,6 +534,12 @@ also a possibility that can be considered by this NEP. However, the act of
 doing an extra ``pip install`` or ``conda install`` may discourage some users
 from adopting this method.
 
+An alternative to requiring opt-in is mainly to *not* override ``np.asarray``
+and ``np.array``, and making the rest of the NumPy API surface overridable,
+instead providing ``np.duckarray`` and ``np.asduckarray``
+as duck-array friendly alternatives that used the respective overrides. However,
+this has the downside of adding a minor overhead to NumPy calls.
+
 Discussion
 ----------
 
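
Reviewer's illustration (not part of the patch): the added sections contrast backend dispatch with monkeypatching by noting that the caller-facing function stays the same, that a backend can be switched locally, and that several registered backends can be mixed. The sketch below is a toy model of that idea only. It does not use or reproduce the ``uarray`` API; the names ``overridable``, ``ReversingBackend`` and ``DecliningBackend`` are hypothetical, and a plain list stands in for real context-local state.

```python
import contextlib

# Hypothetical toy registry, NOT the uarray implementation.
_registered = []    # globally registered backends, tried in registration order
_local_stack = []   # locally set backends, innermost is tried first


def register_backend(backend):
    """Globally register a backend (toy stand-in for the idea in the patch)."""
    _registered.append(backend)


@contextlib.contextmanager
def set_backend(backend):
    """Prefer `backend` inside the with-block, then restore the old state."""
    _local_stack.append(backend)
    try:
        yield
    finally:
        _local_stack.pop()


def overridable(name, default):
    """Build a function with one fixed signature that dispatches to backends."""
    def dispatcher(*args, **kwargs):
        for backend in list(reversed(_local_stack)) + _registered:
            impl = getattr(backend, name, None)
            if impl is None:
                continue
            result = impl(*args, **kwargs)
            if result is not NotImplemented:
                return result
        # No backend handled the call: fall back to the default.
        return default(*args, **kwargs)
    return dispatcher


class ReversingBackend:
    """Hypothetical backend that implements `arange` differently."""
    @staticmethod
    def arange(n):
        return list(range(n))[::-1]


class DecliningBackend:
    """Hypothetical backend that declines every call."""
    @staticmethod
    def arange(n):
        return NotImplemented


# The object callers import never changes, unlike with monkeypatching.
arange = overridable("arange", lambda n: list(range(n)))
```

Because callers always hold the same ``arange`` object, ``from module import arange`` and ``module.arange`` behave identically whenever the backend is chosen, which is the property the patch says monkeypatching cannot offer. A declining backend simply passes the call on to the next candidate, which is one plausible reading of how mixed backends could coexist.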