DOC: Add NEP and documentation for ufunc overrides.

author: Blake Griffith <blake.a.griffith@gmail.com> 2013-08-25 09:35:06 -0500
committer: Blake Griffith <blake.a.griffith@gmail.com> 2013-08-31 16:53:00 -0500
commit: 6fe8eb607127b554195ed25f8636f5caefd477c3 (patch)
tree: 5bc514879285baaa2e928dd8ea16188e007aef88 /doc
parent: 74b6b2cf151c4e869c35e2d226f0d6b69ea9d330 (diff)
download: numpy-6fe8eb607127b554195ed25f8636f5caefd477c3.tar.gz
2 files changed, 276 insertions, 0 deletions
diff --git a/doc/neps/ufunc-overrides.rst b/doc/neps/ufunc-overrides.rst
new file mode 100644
index 000000000..1c0ab1c78
--- /dev/null
+++ b/doc/neps/ufunc-overrides.rst
@@ -0,0 +1,242 @@
+=================================
+A Mechanism for Overriding Ufuncs
+=================================
+
+:Author: Blake Griffith
+:Contact: blake.g@utexa.edu 
+:Date: 2013-07-10
+
+:Author: Pauli Virtanen
+
+:Author: Nathaniel Smith
+
+
+Executive summary
+=================
+
+NumPy's universal functions (ufuncs) currently have only limited
+support for operating on user-defined subclasses of ndarray, through
+``__array_prepare__`` and ``__array_wrap__`` [1]_, and little to no
+support for arbitrary objects, e.g. SciPy's sparse matrices [2]_
+[3]_.
+
+Here we propose adding a mechanism to override ufuncs based on the ufunc
+checking each of its arguments for a ``__numpy_ufunc__`` method.
+On discovery of ``__numpy_ufunc__`` the ufunc will hand off the
+operation to the method.
+
+This covers some of the same ground as Travis Oliphant's proposal to
+retro-fit NumPy with multi-methods [4]_, which would solve the same
+problem. The mechanism here follows more closely the way Python enables
+classes to override ``__mul__`` and other binary operations.
+
+.. [1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html
+.. [2] https://github.com/scipy/scipy/issues/2123
+.. [3] https://github.com/scipy/scipy/issues/1569
+.. [4] http://technicaldiscovery.blogspot.com/2013/07/thoughts-after-scipy-2013-and-specific.html
+
+
+Motivation
+==========
+
+The current machinery for dispatching ufuncs is generally agreed to be
+insufficient. There have been lengthy discussions and other proposed
+solutions [5]_.
+
+Using ufuncs with subclasses of ndarray is limited to ``__array_prepare__`` and
+``__array_wrap__`` to prepare the arguments, but these do not allow you to
+change, for example, the shape or the data of the arguments. Applying ufuncs
+to objects that do not subclass ndarray is even more difficult, as the input
+arguments tend to be cast to object arrays, which produces surprising results.
+
+Take this example of ufunc interoperability with sparse matrices::
+
+    In [1]: import numpy as np
+    import scipy.sparse as sp
+
+    a = np.random.randint(5, size=(3,3))
+    b = np.random.randint(5, size=(3,3))
+
+    asp = sp.csr_matrix(a)
+    bsp = sp.csr_matrix(b)
+
+    In [2]: a, b
+    Out[2]:(array([[0, 4, 4],
+                   [1, 3, 2],
+                   [1, 3, 1]]),
+            array([[0, 1, 0],
+                   [0, 0, 1],
+                   [4, 0, 1]]))
+
+    In [3]: np.multiply(a, b) # The right answer
+    Out[3]: array([[0, 4, 0],
+                   [0, 0, 2],
+                   [4, 0, 1]])
+
+    In [4]: np.multiply(asp, bsp).todense() # calls __mul__ which does matrix multiplication
+    Out[4]: matrix([[16,  0,  8],
+                    [ 8,  1,  5],
+                    [ 4,  1,  4]], dtype=int64)
+
+    In [5]: np.multiply(a, bsp) # Returns NotImplemented to the user, bad!
+    Out[5]: NotImplemented
+
+Returning ``NotImplemented`` to the user should not happen. Moreover::
+
+    In [6]: np.multiply(asp, b)
+    Out[6]: array([[ <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                        <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                        <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>],
+                       [ <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                        <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                        <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>],
+                       [ <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                        <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>,
+                        <3x3 sparse matrix of type '<class 'numpy.int64'>'
+                    with 8 stored elements in Compressed Sparse Row format>]], dtype=object)
+
+Here, it appears that the sparse matrix was converted to an object array
+scalar, which was then multiplied with all elements of the ``b`` array.
+However, this behavior is more confusing than useful, and having a
+``TypeError`` would be preferable.
+
+Adding the ``__numpy_ufunc__`` functionality would fix this and would
+deprecate the other ufunc-modifying functions.
+
+.. [5] http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html
+
+
+Proposed interface
+==================
+
+Objects that want to override ufuncs can define a ``__numpy_ufunc__`` method.
+The method signature is::
+
+    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs)
+
+Here:
+
+- *ufunc* is the ufunc object that was called.
+- *method* is a string indicating which ufunc method was called
+  (one of ``"__call__"``, ``"reduce"``, ``"reduceat"``,
+  ``"accumulate"``, ``"outer"``, ``"inner"``).
+- *i* is the index of *self* in *inputs*.
+- *inputs* is a tuple of the input arguments to the ``ufunc``.
+- *kwargs* are the keyword arguments passed to the function. The ``out``
+  argument is always contained in *kwargs*, if given.
+
+The ufunc's arguments are first normalized into a tuple of input data
+(``inputs``) and a dict of keyword arguments. If the output argument is
+passed as a positional argument, it is moved to the keyword arguments.
+
+The function dispatch proceeds as follows:
+
+- If one of the input arguments implements ``__numpy_ufunc__``, it is
+  executed instead of the ufunc.
+
+- If more than one of the input arguments implements ``__numpy_ufunc__``,
+  they are tried in the following order: subclasses before superclasses,
+  otherwise left to right.  The first ``__numpy_ufunc__`` method returning
+  something other than ``NotImplemented`` determines the return value of
+  the ufunc.
+
+- If all ``__numpy_ufunc__`` methods of the input arguments return
+  ``NotImplemented``, a ``TypeError`` is raised.
+
+- If a ``__numpy_ufunc__`` method raises an error, the error is propagated
+  immediately.
+
+If none of the input arguments has a ``__numpy_ufunc__`` method, the
+execution falls back on the default ufunc behaviour.
+
+
+Demo
+====
+
+A pull request [6]_ has been made including the changes proposed in this NEP.
+Here is a demo highlighting the functionality::
+
+    In [1]: import numpy as np
+
+    In [2]: a = np.array([1])
+
+    In [3]: class B():
+       ...:     def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs):
+       ...:         return "B"
+       ...:     
+
+    In [4]: b = B()
+
+    In [5]: np.dot(a, b)
+    Out[5]: 'B'
+
+    In [6]: np.multiply(a, b)
+    Out[6]: 'B'
+
+A simple ``__numpy_ufunc__`` has been added to SciPy's sparse matrices.
+Currently this only handles ``np.dot`` and ``np.multiply`` because these were the
+two most common cases where users would attempt to use sparse matrices with ufuncs.
+The method is defined below::
+
+    def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs):
+        """Method for compatibility with NumPy's ufuncs and dot
+        functions.
+        """
+
+        without_self = list(inputs)
+        del without_self[pos]
+        without_self = tuple(without_self)
+
+        if func == np.multiply:
+            return self.multiply(*without_self)
+
+        elif func == np.dot:
+            if pos == 0:
+                return self.__mul__(inputs[1])
+            if pos == 1:
+                return self.__rmul__(inputs[0])
+        else:
+            return NotImplemented
+
+So we now get the expected behavior when using ufuncs with sparse matrices::
+
+        In [1]: import numpy as np; import scipy.sparse as sp
+
+        In [2]: a = np.random.randint(3, size=(3,3))
+
+        In [3]: b = np.random.randint(3, size=(3,3))
+
+        In [4]: asp = sp.csr_matrix(a); bsp = sp.csr_matrix(b)
+
+        In [5]: np.dot(a,b)
+        Out[5]: 
+        array([[2, 4, 8],
+               [2, 4, 8],
+               [2, 2, 3]])
+
+        In [6]: np.dot(asp,b)
+        Out[6]: 
+        array([[2, 4, 8],
+               [2, 4, 8],
+               [2, 2, 3]], dtype=int64)
+
+        In [7]: np.dot(asp, bsp).A
+        Out[7]: 
+        array([[2, 4, 8],
+               [2, 4, 8],
+               [2, 2, 3]], dtype=int64)
+                            
+.. Local Variables:
+.. mode: rst
+.. coding: utf-8
+.. fill-column: 72
+.. End:
+
diff --git a/doc/source/reference/arrays.classes.rst b/doc/source/reference/arrays.classes.rst
index 5cdadd40e..82f95083e 100644
--- a/doc/source/reference/arrays.classes.rst
+++ b/doc/source/reference/arrays.classes.rst
@@ -38,6 +38,40 @@ Special attributes and methods
 Numpy provides several hooks that subclasses of :class:`ndarray` can
 customize:
 
+.. function:: __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs)
+
+   Any class (ndarray subclass or not) can define this method to
+   override behavior of Numpy's ufuncs. This works quite similarly to
+   Python's ``__mul__`` and other binary operation routines.
+
+   - *ufunc* is the ufunc object that was called.
+   - *method* is a string indicating which ufunc method was called
+     (one of ``"__call__"``, ``"reduce"``, ``"reduceat"``,
+     ``"accumulate"``, ``"outer"``, ``"inner"``).
+   - *i* is the index of *self* in *inputs*.
+   - *inputs* is a tuple of the input arguments to the ``ufunc``.
+   - *kwargs* is a dictionary containing the optional input arguments
+     of the ufunc. The ``out`` argument is always contained in
+     *kwargs*, if given.
+
+   The method should return either the result of the operation, or
+   :obj:`NotImplemented` if the operation requested is not
+   implemented.
+
+   If one of the arguments has a :func:`__numpy_ufunc__` method, it is
+   executed *instead* of the ufunc.  If more than one of the input
+   arguments implements :func:`__numpy_ufunc__`, they are tried in the
+   order: subclasses before superclasses, otherwise left to right. The
+   first routine returning something other than :obj:`NotImplemented`
+   determines the result. If all of the :func:`__numpy_ufunc__`
+   operations return :obj:`NotImplemented`, a :exc:`TypeError` is
+   raised.
+
+   If an :class:`ndarray` subclass defines the :func:`__numpy_ufunc__`
+   method, this disables the :func:`__array_wrap__`,
+   :func:`__array_prepare__`, :data:`__array_priority__` mechanism
+   described below.
+
 .. function:: __array_finalize__(self)
 
    This method is called whenever the system internally allocates a
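
The dispatch order described in the NEP (subclasses before superclasses, otherwise left to right, ``TypeError`` if every override declines) can be illustrated with a small pure-Python sketch. This is not the code from the patch: ``dispatch_ufunc_override``, ``NO_OVERRIDE`` and the three toy classes are hypothetical names chosen for illustration, the ufunc is stood in for by a plain string, and NumPy itself is not imported.

```python
# Hypothetical sentinel: "no argument defines __numpy_ufunc__", so the
# caller would fall back to the default ufunc behaviour.
NO_OVERRIDE = object()


def dispatch_ufunc_override(ufunc, method, inputs, **kwargs):
    """Illustrative model of the dispatch rules in the NEP (an assumption,
    not NumPy's actual implementation)."""
    # Keep (index, argument) pairs whose type defines the hook.
    candidates = [(i, arg) for i, arg in enumerate(inputs)
                  if hasattr(type(arg), "__numpy_ufunc__")]
    if not candidates:
        return NO_OVERRIDE

    # Order: subclasses before superclasses, otherwise left to right.
    ordered = []
    for i, arg in candidates:
        pos = len(ordered)
        for j, (_, other) in enumerate(ordered):
            # Move ahead of the first entry that is a strict superclass.
            if type(arg) is not type(other) and isinstance(arg, type(other)):
                pos = j
                break
        ordered.insert(pos, (i, arg))

    # The first result other than NotImplemented wins; errors propagate.
    for i, arg in ordered:
        result = arg.__numpy_ufunc__(ufunc, method, i, inputs, **kwargs)
        if result is not NotImplemented:
            return result
    raise TypeError("all __numpy_ufunc__ methods returned NotImplemented")


class Base:
    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        return ("Base", i)


class Derived(Base):
    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        return ("Derived", i)


class Declines:
    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        return NotImplemented
```

With these toy classes, ``dispatch_ufunc_override("multiply", "__call__", (Base(), Derived()))`` consults ``Derived`` first even though it is the right-hand argument, while ``(Declines(), Base())`` falls through to ``Base``. The sketch ignores the normalization of a positional ``out`` argument into *kwargs* that the NEP also describes.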