author | Blake Griffith <blake.a.griffith@gmail.com> | 2013-08-25 09:35:06 -0500 |
---|---|---|
committer | Blake Griffith <blake.a.griffith@gmail.com> | 2013-08-31 16:53:00 -0500 |
commit | 6fe8eb607127b554195ed25f8636f5caefd477c3 (patch) | |
tree | 5bc514879285baaa2e928dd8ea16188e007aef88 /doc | |
parent | 74b6b2cf151c4e869c35e2d226f0d6b69ea9d330 (diff) | |
download | numpy-6fe8eb607127b554195ed25f8636f5caefd477c3.tar.gz |
DOC: Add NEP and documentation for ufunc overrides.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/neps/ufunc-overrides.rst | 242 |
-rw-r--r-- | doc/source/reference/arrays.classes.rst | 34 |
2 files changed, 276 insertions, 0 deletions
diff --git a/doc/neps/ufunc-overrides.rst b/doc/neps/ufunc-overrides.rst
new file mode 100644
index 000000000..1c0ab1c78
--- /dev/null
+++ b/doc/neps/ufunc-overrides.rst
@@ -0,0 +1,242 @@
+=================================
+A Mechanism for Overriding Ufuncs
+=================================
+
+:Author: Blake Griffith
+:Contact: blake.g@utexa.edu
+:Date: 2013-07-10
+
+:Author: Pauli Virtanen
+
+:Author: Nathaniel Smith
+
+
+Executive summary
+=================
+
+NumPy's universal functions (ufuncs) currently have some limited
+functionality for operating on user defined subclasses of ndarray using
+``__array_prepare__`` and ``__array_wrap__`` [1]_, and there is little
+to no support for arbitrary objects, e.g. SciPy's sparse matrices [2]_
+[3]_.
+
+Here we propose adding a mechanism to override ufuncs based on the ufunc
+checking each of its arguments for a ``__numpy_ufunc__`` method.
+On discovery of ``__numpy_ufunc__`` the ufunc will hand off the
+operation to the method.
+
+This covers some of the same ground as Travis Oliphant's proposal to
+retro-fit NumPy with multi-methods [4]_, which would solve the same
+problem. The mechanism here follows more closely the way Python enables
+classes to override ``__mul__`` and other binary operations.
+
+.. [1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html
+.. [2] https://github.com/scipy/scipy/issues/2123
+.. [3] https://github.com/scipy/scipy/issues/1569
+.. [4] http://technicaldiscovery.blogspot.com/2013/07/thoughts-after-scipy-2013-and-specific.html
+
+
+Motivation
+==========
+
+The current machinery for dispatching Ufuncs is generally agreed to be
+insufficient. There have been lengthy discussions and other proposed
+solutions [5]_.
+
+Using ufuncs with subclasses of ndarray is limited to ``__array_prepare__`` and
+``__array_wrap__`` to prepare the arguments, but these don't allow you to, for
+example, change the shape or the data of the arguments. Trying to ufunc things
+that don't subclass ndarray is even more difficult, as the input arguments tend
+to be cast to object arrays, which ends up producing surprising results.
+
+Take this example of ufunc interoperability with sparse matrices.::
+
+    In [1]: import numpy as np
+            import scipy.sparse as sp
+
+            a = np.random.randint(5, size=(3,3))
+            b = np.random.randint(5, size=(3,3))
+
+            asp = sp.csr_matrix(a)
+            bsp = sp.csr_matrix(b)
+
+    In [2]: a, b
+    Out[2]:(array([[0, 4, 4],
+                   [1, 3, 2],
+                   [1, 3, 1]]),
+            array([[0, 1, 0],
+                   [0, 0, 1],
+                   [4, 0, 1]]))
+
+    In [3]: np.multiply(a, b) # The right answer
+    Out[3]: array([[0, 4, 0],
+                   [0, 0, 2],
+                   [4, 0, 1]])
+
+    In [4]: np.multiply(asp, bsp).todense() # calls __mul__ which does matrix multi
+    Out[4]: matrix([[16,  0,  8],
+                    [ 8,  1,  5],
+                    [ 4,  1,  4]], dtype=int64)
+
+    In [5]: np.multiply(a, bsp) # Returns NotImplemented to user, bad!
+    Out[5]: NotImplemented
+
+Returning ``NotImplemented`` to the user should not happen. Moreover::
+
+    In [6]: np.multiply(asp, b)
+    Out[6]: array([[ <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>],
+                   [ <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>],
+                   [ <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>]], dtype=object)
+
+Here, it appears that the sparse matrix was converted to an object array
+scalar, which was then multiplied with all elements of the ``b`` array.
+However, this behavior is more confusing than useful, and having a
+``TypeError`` would be preferable.
+
+Adding the ``__numpy_ufunc__`` functionality fixes this and would
+deprecate the other ufunc modifying functions.
+
+.. [5] http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html
+
+
+Proposed interface
+==================
+
+Objects that want to override Ufuncs can define a ``__numpy_ufunc__`` method.
+The method signature is::
+
+    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs)
+
+Here:
+
+- *ufunc* is the ufunc object that was called.
+- *method* is a string indicating which Ufunc method was called
+  (one of ``"__call__"``, ``"reduce"``, ``"reduceat"``,
+  ``"accumulate"``, ``"outer"``, ``"inner"``).
+- *i* is the index of *self* in *inputs*.
+- *inputs* is a tuple of the input arguments to the ``ufunc``.
+- *kwargs* are the keyword arguments passed to the function. The ``out``
+  argument is always contained in *kwargs*, if given.
+
+The ufunc's arguments are first normalized into a tuple of input data
+(``inputs``), and a dict of keyword arguments. If the output argument is
+passed as a positional argument it is moved to the keyword arguments.
+
+The function dispatch proceeds as follows:
+
+- If one of the input arguments implements ``__numpy_ufunc__`` it is
+  executed instead of the Ufunc.
+
+- If more than one of the input arguments implements ``__numpy_ufunc__``,
+  they are tried in the following order: subclasses before superclasses,
+  otherwise left to right. The first ``__numpy_ufunc__`` method returning
+  something other than ``NotImplemented`` determines the return value of
+  the Ufunc.
+
+- If all ``__numpy_ufunc__`` methods of the input arguments return
+  ``NotImplemented``, a ``TypeError`` is raised.
+
+- If a ``__numpy_ufunc__`` method raises an error, the error is propagated
+  immediately.
+
+If none of the input arguments has a ``__numpy_ufunc__`` method, the
+execution falls back on the default ufunc behaviour.
+
+
+Demo
+====
+
+A pull request[6]_ has been made including the changes proposed in this NEP.
+Here is a demo highlighting the functionality.::
+
+    In [1]: import numpy as np;
+
+    In [2]: a = np.array([1])
+
+    In [3]: class B():
+       ...:     def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs):
+       ...:         return "B"
+       ...:
+
+    In [4]: b = B()
+
+    In [5]: np.dot(a, b)
+    Out[5]: 'B'
+
+    In [6]: np.multiply(a, b)
+    Out[6]: 'B'
+
+A simple ``__numpy_ufunc__`` has been added to SciPy's sparse matrices.
+Currently this only handles ``np.dot`` and ``np.multiply`` because these were the
+two most common cases where users would attempt to use sparse matrices with ufuncs.
+The method is defined below::
+
+    def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs):
+        """Method for compatibility with NumPy's ufuncs and dot
+        functions.
+        """
+
+        without_self = list(inputs)
+        del without_self[pos]
+        without_self = tuple(without_self)
+
+        if func == np.multiply:
+            return self.multiply(*without_self)
+
+        elif func == np.dot:
+            if pos == 0:
+                return self.__mul__(inputs[1])
+            if pos == 1:
+                return self.__rmul__(inputs[0])
+        else:
+            return NotImplemented
+
+So we now get the expected behavior when using ufuncs with sparse matrices.::
+
+    In [1]: import numpy as np; import scipy.sparse as sp
+
+    In [2]: a = np.random.randint(3, size=(3,3))
+
+    In [3]: b = np.random.randint(3, size=(3,3))
+
+    In [4]: asp = sp.csr_matrix(a); bsp = sp.csr_matrix(b)
+
+    In [5]: np.dot(a,b)
+    Out[5]:
+    array([[2, 4, 8],
+           [2, 4, 8],
+           [2, 2, 3]])
+
+    In [6]: np.dot(asp,b)
+    Out[6]:
+    array([[2, 4, 8],
+           [2, 4, 8],
+           [2, 2, 3]], dtype=int64)
+
+    In [7]: np.dot(asp, bsp).A
+    Out[7]:
+    array([[2, 4, 8],
+           [2, 4, 8],
+           [2, 2, 3]], dtype=int64)
+
+.. Local Variables:
+.. mode: rst
+.. coding: utf-8
+.. fill-column: 72
+.. End:
+
diff --git a/doc/source/reference/arrays.classes.rst b/doc/source/reference/arrays.classes.rst
index 5cdadd40e..82f95083e 100644
--- a/doc/source/reference/arrays.classes.rst
+++ b/doc/source/reference/arrays.classes.rst
@@ -38,6 +38,40 @@ Special attributes and methods
 Numpy provides several hooks that subclasses of :class:`ndarray` can
 customize:
 
+.. function:: __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs)
+
+   Any class (ndarray subclass or not) can define this method to
+   override behavior of Numpy's ufuncs. This works quite similarly to
+   Python's ``__mul__`` and other binary operation routines.
+
+   - *ufunc* is the ufunc object that was called.
+   - *method* is a string indicating which Ufunc method was called
+     (one of ``"__call__"``, ``"reduce"``, ``"reduceat"``,
+     ``"accumulate"``, ``"outer"``, ``"inner"``).
+   - *i* is the index of *self* in *inputs*.
+   - *inputs* is a tuple of the input arguments to the ``ufunc``.
+   - *kwargs* is a dictionary containing the optional input arguments
+     of the ufunc. The ``out`` argument is always contained in
+     *kwargs*, if given.
+
+   The method should return either the result of the operation, or
+   :obj:`NotImplemented` if the operation requested is not
+   implemented.
+
+   If one of the arguments has a :func:`__numpy_ufunc__` method, it is
+   executed *instead* of the ufunc. If more than one of the input
+   arguments implements :func:`__numpy_ufunc__`, they are tried in the
+   order: subclasses before superclasses, otherwise left to right. The
+   first routine returning something other than :obj:`NotImplemented`
+   determines the result. If all of the :func:`__numpy_ufunc__`
+   operations return :obj:`NotImplemented`, a :exc:`TypeError` is
+   raised.
+
+   If an :class:`ndarray` subclass defines the :func:`__numpy_ufunc__`
+   method, this disables the :func:`__array_wrap__`,
+   :func:`__array_prepare__`, :data:`__array_priority__` mechanism
+   described below.
+
 .. function:: __array_finalize__(self)
 
    This method is called whenever the system internally allocates a
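
Note on the dispatch order documented in this patch: the rule "subclasses before superclasses, otherwise left to right" can be sketched in pure Python. This is only an illustration under stated assumptions, not the C implementation from the pull request. The names `dispatch_override`, `default`, `Base` and `Derived` are hypothetical, and the inputs are assumed to be already normalized into a tuple as the NEP describes.

```python
# Pure-Python sketch of the lookup order described in the NEP text.
# `dispatch_override` and `default` are hypothetical names, not NumPy API.

def dispatch_override(ufunc, method, inputs, default, **kwargs):
    """Emulate the proposed __numpy_ufunc__ dispatch for normalized inputs."""
    # Collect (index, argument) pairs whose type defines the hook.
    candidates = [(i, arg) for i, arg in enumerate(inputs)
                  if hasattr(type(arg), "__numpy_ufunc__")]
    if not candidates:
        # No override present: fall back on the default behaviour.
        return default(*inputs, **kwargs)

    # Left to right, except that a subclass instance is moved in front of
    # the first earlier argument that is an instance of a superclass.
    ordered = []
    for i, arg in candidates:
        pos = len(ordered)
        for k, (_, other) in enumerate(ordered):
            if type(arg) is not type(other) and isinstance(arg, type(other)):
                pos = k
                break
        ordered.insert(pos, (i, arg))

    # The first result that is not NotImplemented wins.
    for i, arg in ordered:
        result = arg.__numpy_ufunc__(ufunc, method, i, inputs, **kwargs)
        if result is not NotImplemented:
            return result
    raise TypeError("no __numpy_ufunc__ override handled %r" % (ufunc,))


class Base:
    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        return NotImplemented


class Derived(Base):
    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        return ("Derived", i)


def fallback(*inputs, **kwargs):
    # Stand-in for the ordinary ufunc code path.
    return "default"


# Derived is tried first even though it is the second input.
print(dispatch_override("multiply", "__call__", (Base(), Derived()), fallback))
```

A plain string stands in for the ufunc object here so that the sketch runs without NumPy; in the proposal the first argument would be the ufunc itself, for example `np.multiply`.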