field | value | date |
---|---|---|
author | Marten van Kerkwijk <mhvk@astro.utoronto.ca> | 2017-03-31 12:28:16 -0400 |
committer | Charles Harris <charlesr.harris@gmail.com> | 2017-04-27 13:25:50 -0600 |
commit | 5fe6fc640d752fe9e4a9a51635bf070b503aa85e | |
tree | ac0a5244c98a24e4f3ec88954e50e2f4f675c43b | |
parent | 30417109170d1f5f1256172e6506ea32751b0587 | |
DOC Update NEP to reflect actual implementation.
-rw-r--r-- | doc/neps/ufunc-overrides.rst | 262 |
1 files changed, 151 insertions, 111 deletions
diff --git a/doc/neps/ufunc-overrides.rst b/doc/neps/ufunc-overrides.rst index 480e229c2..f69da0090 100644 --- a/doc/neps/ufunc-overrides.rst +++ b/doc/neps/ufunc-overrides.rst @@ -2,6 +2,8 @@ A Mechanism for Overriding Ufuncs ================================= +.. currentmodule:: numpy + :Author: Blake Griffith :Contact: blake.g@utexas.edu :Date: 2013-07-10 @@ -10,25 +12,32 @@ A Mechanism for Overriding Ufuncs :Author: Nathaniel Smith +:Author: Marten van Kerkwijk +:Date: 2017-03-31 Executive summary ================= NumPy's universal functions (ufuncs) currently have some limited -functionality for operating on user defined subclasses of ndarray using -``__array_prepare__`` and ``__array_wrap__`` [1]_, and there is little -to no support for arbitrary objects. e.g. SciPy's sparse matrices [2]_ -[3]_. +functionality for operating on user-defined subclasses of +:class:`ndarray` using ``__array_prepare__`` and ``__array_wrap__`` +[1]_, and there is little to no support for arbitrary +objects, e.g. SciPy's sparse matrices [2]_ [3]_. Here we propose adding a mechanism to override ufuncs based on the ufunc -checking each of it's arguments for a ``__numpy_ufunc__`` method. -On discovery of ``__numpy_ufunc__`` the ufunc will hand off the +checking each of its arguments for an ``__array_ufunc__`` method. +On discovery of ``__array_ufunc__``, the ufunc will hand off the operation to the method. This covers some of the same ground as Travis Oliphant's proposal to retro-fit NumPy with multi-methods [4]_, which would solve the same problem. The mechanism here follows more closely the way Python enables -classes to override ``__mul__`` and other binary operations. +classes to override ``__mul__`` and other binary operations. It also +specifically addresses how binary operators and ufuncs should interact. + +.. note:: In earlier iterations, the override was called + ``__numpy_ufunc__``. An implementation was made, but it did not + have quite the right behaviour, hence the change in name.
.. [1] http://docs.python.org/doc/numpy/user/basics.subclassing.html .. [2] https://github.com/scipy/scipy/issues/2123 @@ -41,13 +50,14 @@ Motivation The current machinery for dispatching Ufuncs is generally agreed to be insufficient. There have been lengthy discussions and other proposed -solutions [5]_. +solutions [5]_, [6]_. -Using ufuncs with subclasses of ndarray is limited to ``__array_prepare__`` and -``__array_wrap__`` to prepare the arguments, but these don't allow you to for -example change the shape or the data of the arguments. Trying to ufunc things -that don't subclass ndarray is even more difficult, as the input arguments tend -to be cast to object arrays, which ends up producing surprising results. +Using ufuncs with subclasses of :class:`ndarray` is limited to +``__array_prepare__`` and ``__array_wrap__`` to prepare the arguments, +but these don't allow you, for example, to change the shape or the data +of the arguments. Trying to apply ufuncs to objects that don't subclass +:class:`ndarray` is even more difficult, as the input arguments tend to +be cast to object arrays, which ends up producing surprising results. Take this example of ufuncs interoperability with sparse matrices.:: @@ -81,7 +91,7 @@ Take this example of ufuncs interoperability with sparse matrices.:: In [5]: np.multiply(a, bsp) # Returns NotImplemented to user, bad! Out[5]: NotImplemted -Returning ``NotImplemented`` to user should not happen. Moreover:: +Returning :obj:`NotImplemented` to the user should not happen. Moreover:: In [6]: np.multiply(asp, b) Out[6]: array([[ <3x3 sparse matrix of type '<class 'numpy.int64'>' @@ -106,21 +116,24 @@ Returning ``NotImplemented`` to user should not happen. Moreover:: Here, it appears that the sparse matrix was converted to an object array scalar, which was then multiplied with all elements of the ``b`` array. However, this behavior is more confusing than useful, and having a -``TypeError`` would be preferable. +:exc:`TypeError` would be preferable.
-Adding the ``__numpy_ufunc__`` functionality fixes this and would +Adding the ``__array_ufunc__`` functionality fixes this and would deprecate the other ufunc modifying functions. .. [5] http://mail.python.org/pipermail/numpy-discussion/2011-June/056945.html +.. [6] https://github.com/numpy/numpy/issues/5844 Proposed interface ================== -Objects that want to override Ufuncs can define a ``__numpy_ufunc__`` method. -The method signature is:: +The standard array class :class:`ndarray` gains an ``__array_ufunc__`` +method and objects can override Ufuncs by overriding this method (if +they are :class:`ndarray` subclasses) or defining their own. The method +signature is:: - def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs) + def __array_ufunc__(self, ufunc, method, *inputs, **kwargs) Here: @@ -128,141 +141,168 @@ Here: - *method* is a string indicating which Ufunc method was called (one of ``"__call__"``, ``"reduce"``, ``"reduceat"``, ``"accumulate"``, ``"outer"``, ``"inner"``). -- *i* is the index of *self* in *inputs*. - *inputs* is a tuple of the input arguments to the ``ufunc`` - *kwargs* are the keyword arguments passed to the function. The ``out`` - arguments are always contained in *kwargs*, how positional variables - are passed is discussed below. - -The ufunc's arguments are first normalized into a tuple of input data -(``inputs``), and dict of keyword arguments. If there are output -arguments they are handled as follows: - -- One positional output variable x is passed in the kwargs dict as ``out : - x``. -- Multiple positional output variables ``x0, x1, ...`` are passed as a tuple - in the kwargs dict as ``out : (x0, x1, ...)``. -- Keyword output variables like ``out = x`` and ``out = (x0, x1, ...)`` are - passed unchanged to the kwargs dict like ``out : x`` and ``out : (x0, x1, - ...)`` respectively. -- Combinations of positional and keyword output variables are not - supported. + arguments are always contained as a tuple in *kwargs*. 
+ +Hence, the arguments are normalized: only the input data (``inputs``) +are passed on as positional arguments; all the others are passed on as a +dict of keyword arguments (``kwargs``). In particular, if there are +output arguments, positional or otherwise, they are passed on as a +tuple in the ``out`` keyword argument. The function dispatch proceeds as follows: -- If one of the input arguments implements ``__numpy_ufunc__`` it is +- If one of the input arguments implements ``__array_ufunc__``, it is executed instead of the Ufunc. -- If more than one of the input arguments implements ``__numpy_ufunc__``, +- If more than one of the input arguments implements ``__array_ufunc__``, they are tried in the following order: subclasses before superclasses, - otherwise left to right. The first ``__numpy_ufunc__`` method returning - something else than ``NotImplemented`` determines the return value of + otherwise left to right. The first ``__array_ufunc__`` method returning + something other than :obj:`NotImplemented` determines the return value of the Ufunc. -- If all ``__numpy_ufunc__`` methods of the input arguments return - ``NotImplemented``, a ``TypeError`` is raised. +- If all ``__array_ufunc__`` methods of the input arguments return + :obj:`NotImplemented`, a :exc:`TypeError` is raised. -- If a ``__numpy_ufunc__`` method raises an error, the error is propagated +- If an ``__array_ufunc__`` method raises an error, the error is propagated immediately. -If none of the input arguments has a ``__numpy_ufunc__`` method, the +If none of the input arguments has an ``__array_ufunc__`` method, the execution falls back on the default ufunc behaviour. +Subclass hierarchies +-------------------- + +Hierarchies of such containers (say, a masked quantity) are most easily +constructed if methods consistently use :func:`super` to pass through +the class hierarchy [7]_.
To support this, :class:`ndarray` has its own +``__array_ufunc__`` method (which is equivalent to ``getattr(ufunc, +method)(*inputs, **kwargs)``, i.e., if any of the (adjusted) inputs +still defines ``__array_ufunc__``, that will be called in turn). This +should be particularly useful for container-like subclasses of +:class:`ndarray`, which add an attribute like a unit or mask to a +regular :class:`ndarray`. Such classes can adjust those arguments that +are relevant to their own class, pass on to another class in the +hierarchy using :func:`super` until the Ufunc is actually done, and then +adjust the outputs where needed. + +Turning Ufuncs off +------------------ + +For some classes, Ufuncs make no sense, and, as for other special +methods [8]_, one can indicate that Ufuncs are not available by setting +``__array_ufunc__`` to :obj:`None`. Inside a Ufunc, this is +equivalent to unconditionally returning :obj:`NotImplemented`, and thus +will lead to a :exc:`TypeError` (unless another operand implements +``__array_ufunc__`` and knows how to deal with the class). + +.. [7] https://rhettinger.wordpress.com/2011/05/26/super-considered-super/ + +.. [8] https://docs.python.org/3/reference/datamodel.html#specialnames In combination with Python's binary operations ---------------------------------------------- -The ``__numpy_ufunc__`` mechanism is fully independent of Python's +The ``__array_ufunc__`` mechanism is fully independent of Python's standard operator override mechanism, and the two do not interact directly. -They however have indirect interactions, because NumPy's ``ndarray`` -type implements its binary operations via Ufuncs. Effectively, we have:: - - class ndarray(object): +They have indirect interactions, however, because NumPy's +:class:`ndarray` type implements its binary operations via Ufuncs. For +most numerical classes, the easiest way to override binary operations is +thus to define ``__array_ufunc__`` and override the corresponding +Ufunc.
The class can then, like :class:`ndarray` itself, define the +binary operators in terms of Ufuncs. Here, one has to take some care. +E.g., the simplest implementation would be:: + + class ArrayLike(object): + def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): + ... + return result ... def __mul__(self, other): - return np.multiply(self, other) + return self.__array_ufunc__(np.multiply, '__call__', self, other) -Suppose now we have a second class:: +Suppose now, however, that ``other`` is a class that does not know how to +deal with arrays and ufuncs, but does know how to do multiplication:: class MyObject(object): - def __numpy_ufunc__(self, *a, **kw): - return "ufunc" + __array_ufunc__ = None def __mul__(self, other): return 1234 def __rmul__(self, other): return 4321 In this case, standard Python override rules combined with the above -discussion imply:: +discussion would imply:: - a = MyObject() - b = np.array([0]) + mine = MyObject() + arr = ArrayLike([0]) - a * b # == 1234 OK - b * a # == "ufunc" surprising + mine * arr # == 1234 OK + arr * mine # TypeError surprising -This is not what would be naively expected, and is therefore somewhat -undesirable behavior. +This would occur because ``MyObject`` is not an ``ArrayLike`` +subclass, so Python resolves the expression ``arr * mine`` by +first calling ``arr.__mul__``. In the above implementation, this would +just call the Ufunc, which would see that ``mine.__array_ufunc__`` is +:obj:`None` and raise a :exc:`TypeError`. (Note that if ``MyObject`` +were a subclass of ``ArrayLike``, Python would call ``mine.__rmul__`` first.) -The reason why this occurs is: because ``MyObject`` is not an ndarray -subclass, Python resolves the expression ``b * a`` by calling first -``b.__mul__``. Since NumPy implements this via an Ufunc, the call is -forwarded to ``__numpy_ufunc__`` and not to ``__rmul__``. Note that if -``MyObject`` is a subclass of ``ndarray``, Python calls ``a.__rmul__`` -first.
The issue is therefore that ``__numpy_ufunc__`` implements -"virtual subclassing" of ndarray behavior, without actual subclassing. +So, a better implementation of the binary operators would check whether +the other class can be dealt with in ``__array_ufunc__`` and, if not, +return :obj:`NotImplemented`:: -This issue can be resolved by a modification of the binary operation -methods in NumPy:: - - class ndarray(object): + class ArrayLike(object): ... def __mul__(self, other): - if (not isinstance(other, self.__class__) - and hasattr(other, '__numpy_ufunc__') - and hasattr(other, '__rmul__')): - return NotImplemented - return np.multiply(self, other) - - def __imul__(self, other): - if (other.__class__ is not self.__class__ - and hasattr(other, '__numpy_ufunc__') - and hasattr(other, '__rmul__')): + if getattr(other, '__array_ufunc__', False) is None: return NotImplemented - return np.multiply(self, other, out=self) - - b * a # == 4321 OK - -The rationale here is the following: since the user class explicitly -defines both ``__numpy_ufunc__`` and ``__rmul__``, the implementor has -very likely made sure that the ``__rmul__`` method can process ndarrays. -If not, the special case is simple to deal with (just call -``np.multiply``). - -The exclusion of subclasses of self can be made because Python itself -calls the right-hand method first in this case. Moreover, it is -desirable that ndarray subclasses are able to inherit the right-hand -binary operation methods from ndarray. - -The same priority shuffling needs to be done also for the in-place -operations, so that ``MyObject.__rmul__`` is prioritized over -``ndarray.__imul__``. 
- + return self.__array_ufunc__(np.multiply, '__call__', self, other) + + arr = ArrayLike([0]) + + arr * mine # == 4321 OK + +Indeed, after a long discussion about whether it might make more sense to +ask classes like ``ArrayLike`` to implement a full ``__array_ufunc__`` +[6]_, the same design as the above was agreed on for :class:`ndarray` +itself. + +.. note:: The above holds for regular operators. For in-place + operators, :class:`ndarray` never returns + :obj:`NotImplemented`, i.e., ``ndarr *= mine`` would always + lead to a :exc:`TypeError`. This is because for arrays + in-place operations cannot generically be replaced by a simple + reverse operation. For instance, sticking to the above + example, what would ``ndarr[:] *= mine`` imply? Assuming it + means ``ndarr[:] = ndarr[:] * mine``, as Python does by + default, is likely to be wrong. + +Extension to other NumPy functions +---------------------------------- + +The ``__array_ufunc__`` method is used to override :func:`~numpy.dot` +and :func:`~numpy.matmul` as well, since, while these functions are not +(yet) implemented as (generalized) Ufuncs, they are very similar. For +other functions, such as :func:`~numpy.median`, :func:`~numpy.min`, +etc., implementations as (generalized) Ufuncs may well be possible and +logical, in which case it will become possible to override these +as well. Demo ==== -A pull request[6]_ has been made including the changes proposed in this NEP. -Here is a demo highlighting the functionality.:: +A pull request [8]_ has been made including the changes and revisions +proposed in this NEP.
Here is a demo highlighting the functionality.:: In [1]: import numpy as np; In [2]: a = np.array([1]) In [3]: class B(): - ...: def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs): + ...: def __array_ufunc__(self, func, method, pos, inputs, **kwargs): ...: return "B" ...: @@ -274,24 +314,24 @@ Here is a demo highlighting the functionality.:: In [6]: np.multiply(a, b) Out[6]: 'B' -A simple ``__numpy_ufunc__`` has been added to SciPy's sparse matrices -Currently this only handles ``np.dot`` and ``np.multiply`` because it was the -two most common cases where users would attempt to use sparse matrices with ufuncs. -The method is defined below:: +As a simple example, one could add the following ``__array_ufunc__`` to +SciPy's sparse matrices (just for ``np.dot`` and ``np.multiply``, as +these are the two most common cases where users would attempt to use +sparse matrices with ufuncs):: - def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs): + def __array_ufunc__(self, func, method, pos, inputs, **kwargs): """Method for compatibility with NumPy's ufuncs and dot functions. """ without_self = list(inputs) - del without_self[pos] + without_self.pop(pos) without_self = tuple(without_self) - if func == np.multiply: + if func is np.multiply: return self.multiply(*without_self) - elif func == np.dot: + elif func is np.dot: if pos == 0: return self.__mul__(inputs[1]) if pos == 1:
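The argument normalization described in the interface section of the diff can be observed directly. The sketch below assumes NumPy 1.13 or later (the first release shipping ``__array_ufunc__``) and uses a hypothetical ``Recorder`` class that computes nothing and simply returns the arguments it receives. Only the inputs arrive positionally, and an output array, whether passed positionally or by keyword, arrives as a tuple under ``kwargs["out"]``.

```python
import numpy as np


class Recorder:
    """Hypothetical class that reports the arguments of any ufunc call."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Compute nothing: just hand back what the ufunc machinery passed in.
        return ufunc, method, inputs, kwargs


rec = Recorder()
buf = np.empty(1)

# Output given positionally: it is removed from ``inputs`` and moved to
# kwargs["out"] as a one-element tuple.
call_info = np.add(rec, 1, buf)

# Output given by keyword: normalized to the same tuple form.
kw_info = np.add(rec, 1, out=buf)

# Ufunc methods other than __call__ are reported by name.
reduce_info = np.add.reduce(rec)

print(call_info[1], len(call_info[2]), call_info[3]["out"][0] is buf)
print(kw_info[3]["out"][0] is buf, reduce_info[1])
```

Note that the override receives the original Python objects (here the ``Recorder`` instance and the integer ``1``); nothing has been converted to an array yet.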
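The dispatch order listed in the diff (subclasses before superclasses, otherwise left to right; a ``NotImplemented`` return moves on to the next candidate; ``TypeError`` once every candidate has declined) can be exercised with a few throwaway classes. This is a sketch assuming NumPy 1.13 or later; ``Declines``, ``Base``, ``Sub`` and ``try_ufunc`` are hypothetical names used only for illustration.

```python
import numpy as np


class Declines:
    """Hypothetical: always declines, so NumPy tries the next operand."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return NotImplemented


class Base:
    """Hypothetical superclass that always handles the call."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return "Base"


class Sub(Base):
    """Hypothetical subclass with its own override."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return "Sub"


def try_ufunc(left, right):
    """Call np.add and return its result, or "TypeError" if it raised one."""
    try:
        return np.add(left, right)
    except TypeError:
        return "TypeError"


print(try_ufunc(Base(), Sub()))           # the subclass is tried first
print(try_ufunc(Declines(), Base()))      # left declines, right answers
print(try_ufunc(Declines(), Declines()))  # every override declined
```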
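The container-subclass pattern from "Subclass hierarchies" (adjust the arguments, hand off to the next class with ``super()``, then adjust the outputs) might look like this for a hypothetical ``Tagged`` subclass that carries a string label, standing in for a unit or mask. It is a sketch assuming NumPy 1.13 or later, not code taken from any existing library. To stay short it declines whenever ``out`` is given, so in-place operators on ``Tagged`` arrays will raise ``TypeError``.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical container-like subclass that carries a string ``tag``."""

    def __new__(cls, data, tag=""):
        obj = np.asarray(data).view(cls)
        obj.tag = tag
        return obj

    def __array_finalize__(self, obj):
        # Views and slices inherit the tag of the array they came from.
        self.tag = getattr(obj, "tag", "")

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if "out" in kwargs:
            return NotImplemented  # output arguments not handled in this sketch
        # 1. Adjust the arguments relevant to this class: note tags, strip them.
        tags = [x.tag for x in inputs if isinstance(x, Tagged)]
        plain = [x.view(np.ndarray) if isinstance(x, Tagged) else x
                 for x in inputs]
        # 2. Pass on up the hierarchy; ndarray.__array_ufunc__ does the work.
        result = super().__array_ufunc__(ufunc, method, *plain, **kwargs)
        if result is NotImplemented or method == "at":
            return result
        # 3. Adjust the outputs: wrap them with the combined tag.
        tag = "+".join(tags)
        if ufunc.nout > 1:
            return tuple(Tagged(r, tag) for r in result)
        return Tagged(result, tag)


a = Tagged([1, 2], "m")
b = Tagged([3, 4], "s")
total = a + b  # ndarray.__add__ -> np.add -> Tagged.__array_ufunc__
print(total.tag, total.view(np.ndarray).tolist())
```

Because step 2 goes through ``super()``, a further subclass (say, one adding a mask) could strip its own attribute and call ``super()`` in turn, ending up here before reaching :class:`ndarray`.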
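Since, per the diff, :class:`ndarray` itself follows the design agreed on for binary operators, the opt-out described under "Turning Ufuncs off" can be checked against plain arrays. A sketch assuming NumPy 1.13 or later; ``MyObject`` is the class from the diff, and ``raises_type_error`` and ``inplace_multiply`` are hypothetical helpers. Regular multiplication defers to ``MyObject``'s own methods, while calling the ufunc directly or using the in-place operator raises ``TypeError``.

```python
import numpy as np


class MyObject:
    # Opt out of ufuncs entirely; plain binary operators still work.
    __array_ufunc__ = None

    def __mul__(self, other):
        return 1234

    def __rmul__(self, other):
        return 4321


def raises_type_error(func):
    """Hypothetical helper: True if calling func() raises TypeError."""
    try:
        func()
    except TypeError:
        return True
    return False


def inplace_multiply():
    """Hypothetical helper: ndarray never defers in-place operators."""
    target = np.array([0])
    target *= mine


mine = MyObject()
arr = np.array([0])

print(mine * arr)  # MyObject.__mul__
print(arr * mine)  # ndarray.__mul__ returns NotImplemented, so __rmul__ runs
print(raises_type_error(lambda: np.multiply(arr, mine)))
print(raises_type_error(inplace_multiply))
```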
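One way to fill in the ``ArrayLike`` pattern from the diff, including the ``NotImplemented`` check in the binary operators, is sketched below. It assumes NumPy 1.13 or later. The ``value`` attribute, the unwrapping logic and the ``__rmul__`` method are illustrative assumptions rather than anything the NEP prescribes, and output arguments are not supported.

```python
import numpy as np


class ArrayLike:
    """Hypothetical minimal wrapper that defines operators via ufuncs."""

    def __init__(self, value):
        self.value = np.asarray(value)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if "out" in kwargs:
            return NotImplemented  # output arguments not supported here
        # Unwrap our own instances; leave every other operand to the ufunc.
        args = [x.value if isinstance(x, ArrayLike) else x for x in inputs]
        result = getattr(ufunc, method)(*args, **kwargs)
        if method == "at":
            return None  # ufunc.at works in place and returns nothing
        if ufunc.nout > 1:
            return tuple(ArrayLike(r) for r in result)
        return ArrayLike(result)

    def __mul__(self, other):
        # Defer to operands that opted out of ufuncs.
        if getattr(other, "__array_ufunc__", False) is None:
            return NotImplemented
        return self.__array_ufunc__(np.multiply, "__call__", self, other)

    def __rmul__(self, other):
        if getattr(other, "__array_ufunc__", False) is None:
            return NotImplemented
        return self.__array_ufunc__(np.multiply, "__call__", other, self)


class MyObject:
    __array_ufunc__ = None

    def __mul__(self, other):
        return 1234

    def __rmul__(self, other):
        return 4321


mine = MyObject()
arr = ArrayLike([0])
print(mine * arr, arr * mine)
print((ArrayLike([2]) * 3).value, (3 * ArrayLike([2])).value)
```

Because ``__mul__`` returns ``NotImplemented`` for operands with ``__array_ufunc__ = None``, ``arr * mine`` reaches ``MyObject.__rmul__`` instead of failing inside the ufunc, while ordinary operands such as ints and arrays still go through ``np.multiply``.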
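With the ``*inputs`` signature, the position of ``self`` is no longer passed in, so an override in the spirit of the sparse-matrix example has to find itself among the inputs. The sketch below does this by identity, because ``list.remove`` or ``list.index`` could trigger ``==`` comparisons, which are element-wise for arrays. It assumes NumPy 1.13 or later; ``SparseLike`` is a hypothetical dense stand-in, not SciPy's sparse-matrix API, and only ``np.multiply`` is handled.

```python
import numpy as np


class SparseLike:
    """Hypothetical stand-in for a sparse matrix; stores a dense array."""

    def __init__(self, dense):
        self.dense = np.asarray(dense)

    def multiply(self, other):
        # Element-wise product (a real sparse class would skip the zeros).
        if isinstance(other, SparseLike):
            other = other.dense
        return SparseLike(self.dense * np.asarray(other))

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        """Handle plain np.multiply calls; decline everything else."""
        if ufunc is not np.multiply or method != "__call__" or kwargs:
            return NotImplemented
        # No ``out`` was given, so self must be one of the inputs.
        # Locate it by identity rather than equality.
        pos = next(i for i, x in enumerate(inputs) if x is self)
        others = inputs[:pos] + inputs[pos + 1:]
        return self.multiply(*others)


s = SparseLike([[1, 0], [0, 2]])
product = np.multiply(np.full((2, 2), 3), s)
print(type(product).__name__, product.dense.tolist())
```

Any other ufunc, such as ``np.add(s, 1)``, is declined, so NumPy raises ``TypeError`` instead of silently building an object array.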