authorMatthew Brett <matthew.brett@gmail.com>2008-08-27 10:05:41 +0000
committerMatthew Brett <matthew.brett@gmail.com>2008-08-27 10:05:41 +0000
commit4f7b154e57713884b240fd916b536c08033d0be3 (patch)
tree00bee24c66ba7b4ff2947705f0de846243ed9798 /numpy/doc/subclassing.py
parentdfab4520b63b75d7e310a33405716518e2a0b8f8 (diff)
Document on subclassing
Diffstat (limited to 'numpy/doc/subclassing.py')
-rw-r--r--numpy/doc/subclassing.py291
1 files changed, 291 insertions, 0 deletions
diff --git a/numpy/doc/subclassing.py b/numpy/doc/subclassing.py
new file mode 100644
index 000000000..fa5f22253
--- /dev/null
+++ b/numpy/doc/subclassing.py
@@ -0,0 +1,291 @@
+"""
+=============================
+Subclassing ndarray in Python
+=============================
+
+Credits
+-------
+
+This page is based, with thanks, on the wiki page on subclassing by
+Pierre Gerard-Marchant: http://www.scipy.org/Subclasses.
+
+Introduction
+------------
+Subclassing ndarray is relatively simple, but it differs in a few
+ways from subclassing an ordinary Python class, and you will need to
+understand how ndarrays are created to see why. There are examples
+at the bottom of the page, but you will probably want to read the
+background first to understand why subclassing works as it does.
+
+ndarrays and object creation
+============================
+The creation of ndarrays is complicated by the need to return views of
+ndarrays that are themselves ndarrays. For example::
+
+ >>> import numpy as np
+ >>> arr = np.zeros((3,))
+ >>> type(arr)
+ <class 'numpy.ndarray'>
+ >>> v = arr[1:]
+ >>> type(v)
+ <class 'numpy.ndarray'>
+ >>> v is arr
+ False
+
+So, when we take a view (here a slice) from the ndarray, we get back a
+new ndarray that points to the data in the original. When we
+subclass ndarray, taking a view (such as a slice) needs to return an
+object of our own class. There is machinery to do this, and it is
+this machinery that makes subclassing slightly non-standard.
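+
+As a small illustration (a sketch, assuming Python 3 and a reasonably
+recent NumPy; the variable names are only for illustration), writing
+through a slice changes the original array, because both objects
+refer to the same memory:
+
```python
import numpy as np

arr = np.zeros((3,))
v = arr[1:]              # a view: a new ndarray object over the same data

v[0] = 5.0               # write through the view ...
print(arr)               # ... and the original array sees the change

# The view is a distinct Python object that shares memory with arr
print(v is arr)
print(np.shares_memory(arr, v))
```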
+
+To allow subclassing, and views of subclasses, ndarray does the main
+work of object initialization in its ``__new__`` method, rather than
+in the more usual ``__init__`` method.
+
+``__new__`` and ``__init__``
+============================
+
+``__new__`` is a standard Python method, and, if present, is called
+before ``__init__`` when we create a class instance. Consider the
+following::
+
+ class C(object):
+     def __new__(cls, *args):
+         print('cls is:', cls)
+         print('Args in __new__:', args)
+         # object.__new__ takes only the class to instantiate
+         return object.__new__(cls)
+     def __init__(self, *args):
+         print('self is :', self)
+         print('Args in __init__:', args)
+
+ C('hello')
+
+The code gives the following output::
+
+ cls is: <class '__main__.C'>
+ Args in __new__: ('hello',)
+ self is : <__main__.C object at 0xb7dc720c>
+ Args in __init__: ('hello',)
+
+When we call ``C('hello')``, the ``__new__`` method gets its own class
+as first argument, followed by the passed argument, the string
+``'hello'``. After Python calls ``__new__``, it usually (see below)
+calls our ``__init__`` method, with the output of ``__new__`` as the
+first argument (now a class instance), and the passed arguments
+following.
+
+As you can see, the object can be initialized in the ``__new__``
+method, in the ``__init__`` method, or in both. In fact ndarray does
+not have an ``__init__`` method, because all the initialization is
+done in the ``__new__`` method.
+
+Why use ``__new__`` rather than just the usual ``__init__``? Because
+in some cases, as for ndarray, we want to be able to return an object
+of some other class. Consider the following::
+
+ class C(object):
+     def __new__(cls, *args):
+         print('cls is:', cls)
+         print('Args in __new__:', args)
+         # object.__new__ takes only the class to instantiate
+         return object.__new__(cls)
+     def __init__(self, *args):
+         print('self is :', self)
+         print('Args in __init__:', args)
+
+ class D(C):
+     def __new__(cls, *args):
+         print('D cls is:', cls)
+         print('D args in __new__:', args)
+         return C.__new__(C, *args)
+     def __init__(self, *args):
+         print('D self is :', self)
+         print('D args in __init__:', args)
+
+ D('hello')
+
+which gives::
+
+ D cls is: <class '__main__.D'>
+ D args in __new__: ('hello',)
+ cls is: <class '__main__.C'>
+ Args in __new__: ('hello',)
+
+The definition of ``C`` is the same as before, but for ``D``, the
+``__new__`` method returns an instance of class ``C`` rather than
+``D``. Note that the ``__init__`` method of ``D`` does not get
+called. In general, Python calls ``__init__`` only when ``__new__``
+returns an instance of the class being constructed (or of one of its
+subclasses); when ``__new__`` returns an object of any other class,
+no ``__init__`` method is called on it.
+
+This is how subclasses of the ndarray class are able to return views
+that preserve the class type. When taking a view, the standard
+ndarray machinery creates the new ndarray object with something
+like::
+
+ obj = ndarray.__new__(subtype, shape, ...)
+
+where ``subtype`` is the subclass. Thus the returned view is of the
+same class as the subclass, rather than being of class ``ndarray``.
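+
+A minimal sketch of this mechanism, assuming Python 3 (``MyArray`` is
+a hypothetical subclass used only for illustration): calling
+``ndarray.__new__`` with the subclass as first argument yields an
+instance of the subclass, and slicing that instance keeps the type:
+
```python
import numpy as np

class MyArray(np.ndarray):
    # Hypothetical subclass with no extra behavior
    pass

# Passing the subclass as the first argument of ndarray.__new__
# returns an instance of the subclass, not a plain ndarray.
obj = np.ndarray.__new__(MyArray, (3,))
print(type(obj))

# Slicing goes through the ndarray machinery, so the type is preserved.
v = obj[1:]
print(type(v))
```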
+
+That solves the problem of returning views of the same type, but now
+we have a new problem. The machinery of ndarray can set the class
+this way, in its standard methods for taking views, but the ndarray
+``__new__`` method knows nothing of what we have done in our own
+``__new__`` method in order to set attributes, and so on. (Aside:
+why not call ``obj = subtype.__new__(...)`` then? Because our
+subclass may not have a ``__new__`` method with the same call
+signature.)
+
+So, when creating a new view object of our subclass, we need to be
+able to set any extra attributes from the original object of our
+class. This is the role of the ``__array_finalize__`` method of
+ndarray. ``__array_finalize__(self, obj)`` is called from within the
+ndarray machinery each time an ndarray of our own class is created.
+It receives the new object, created as above, as ``self``, and the
+old object from which the view has been taken as ``obj``. In it we
+can take any attributes from the old object and put them into the new
+view object, or do any other related processing. Now we are ready
+for a simple example.
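+
+As a quick check of the description above, the following sketch
+(assuming Python 3; ``TracedArray`` and its ``calls`` list are
+hypothetical names) records what ``__array_finalize__`` receives for
+three ways of creating an instance. Note that for explicit
+construction there is no old object, so ``obj`` is ``None``:
+
```python
import numpy as np

class TracedArray(np.ndarray):
    # Hypothetical subclass that logs every __array_finalize__ call
    calls = []  # (type name of new object, type name of source object)

    def __array_finalize__(self, obj):
        TracedArray.calls.append((type(self).__name__, type(obj).__name__))

# 1. Explicit construction: there is no source object, so obj is None
a = TracedArray((4,))
# 2. View casting from a plain ndarray: obj is that ndarray
b = np.arange(4).view(TracedArray)
# 3. Slicing an existing instance: obj is the instance being sliced
c = b[1:]

for new_type, source_type in TracedArray.calls:
    print(new_type, '<-', source_type)
```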
+
+Simple example - adding an extra attribute to ndarray
+-----------------------------------------------------
+
+::
+
+ import numpy as np
+
+ class InfoArray(np.ndarray):
+
+     def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
+                 strides=None, order=None, info=None):
+         # Create the ndarray instance of our type, given the usual
+         # input arguments. This will call the standard ndarray
+         # constructor, but return an object of our type
+         obj = np.ndarray.__new__(subtype, shape, dtype, buffer, offset,
+                                  strides, order)
+         # add the new attribute to the created instance
+         obj.info = info
+         # Finally, we must return the newly created object:
+         return obj
+
+     def __array_finalize__(self, obj):
+         # reset the attribute from passed original object
+         self.info = getattr(obj, 'info', None)
+         # We do not need to return anything
+
+ obj = InfoArray(shape=(3,), info='information')
+ print(type(obj))
+ print(obj.info)
+ v = obj[1:]
+ print(type(v))
+ print(v.info)
+
+which gives::
+
+ <class '__main__.InfoArray'>
+ information
+ <class '__main__.InfoArray'>
+ information
+
+This class isn't very useful, because it has the same constructor as
+the bare ndarray object, including passing in buffers and shapes and
+so on. We would probably prefer to take an already formed ndarray,
+from the usual numpy calls such as ``np.array``, and return an object
+of our own class.
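+
+To make the inconvenience concrete, here is a sketch (assuming
+Python 3) of what wrapping an existing array looks like with the
+``InfoArray`` constructor. The class is repeated from the example
+above so that the block runs on its own; the shape, dtype and buffer
+all have to be spelled out by hand:
+
```python
import numpy as np

class InfoArray(np.ndarray):
    # Same constructor as in the simple example above
    def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
                strides=None, order=None, info=None):
        obj = np.ndarray.__new__(subtype, shape, dtype, buffer, offset,
                                 strides, order)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        self.info = getattr(obj, 'info', None)

arr = np.arange(5)
# Reusing the data of arr means passing its shape, dtype and buffer
obj = InfoArray(shape=arr.shape, dtype=arr.dtype, buffer=arr,
                info='information')
print(obj, obj.info)
```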
+
+Slightly more realistic example - attribute added to existing array
+-------------------------------------------------------------------
+Here is a class (with thanks to PierreGM for the original example)
+that takes an array that already exists, casts it to our type, and
+adds an extra attribute::
+
+ import numpy as np
+
+ class RealisticInfoArray(np.ndarray):
+
+     def __new__(cls, input_array, info=None):
+         # Input array is an already formed ndarray instance
+         # We first cast to be our class type
+         obj = np.asarray(input_array).view(cls)
+         # add the new attribute to the created instance
+         obj.info = info
+         # Finally, we must return the newly created object:
+         return obj
+
+     def __array_finalize__(self, obj):
+         # reset the attribute from passed original object
+         self.info = getattr(obj, 'info', None)
+         # We do not need to return anything
+
+ arr = np.arange(5)
+ obj = RealisticInfoArray(arr, info='information')
+ print(type(obj))
+ print(obj.info)
+ v = obj[1:]
+ print(type(v))
+ print(v.info)
+
+which gives::
+
+ <class '__main__.RealisticInfoArray'>
+ information
+ <class '__main__.RealisticInfoArray'>
+ information
+
+``__array_wrap__`` for ufuncs
+-----------------------------
+
+Let's say you have an instance ``obj`` of your new subclass,
+``RealisticInfoArray``, and you pass it into a ufunc with another
+array::
+
+ arr = np.arange(5)
+ ret = np.multiply.outer(arr, obj)
+
+When a numpy ufunc is called on a subclass of ndarray, the
+``__array_wrap__`` method is called to transform the result into a new
+instance of the subclass. By default, ``__array_wrap__`` will call
+``__array_finalize__``, and the attributes will be inherited.
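+
+A short sketch of the default behavior, assuming Python 3. The class
+is repeated from the example above so that the block runs on its own,
+and a plain element-wise ``np.multiply`` call is used instead of
+``outer`` to keep the sketch small. The result should come back as an
+instance of the subclass, with ``info`` filled in by
+``__array_finalize__``:
+
```python
import numpy as np

class RealisticInfoArray(np.ndarray):
    # Repeated from the example above
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        self.info = getattr(obj, 'info', None)

obj = RealisticInfoArray(np.arange(5), info='information')
arr = np.arange(5)

ret = np.multiply(arr, obj)
print(type(ret))   # the result is wrapped as an instance of the subclass
print(ret.info)    # whatever __array_finalize__ copied over
```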
+
+By defining a specific ``__array_wrap__`` method for our subclass, we
+can tweak the output. The ``__array_wrap__`` method requires one
+argument, the array holding the result of the ufunc, and takes an
+optional parameter ``context``. This parameter is passed by some
+ufuncs as a 3-element tuple: (name of the ufunc, argument of the
+ufunc, domain of the ufunc).
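+
+One possible way to write such a method is sketched below, assuming
+Python 3. ``WrappedInfoArray`` and the string it stores are
+hypothetical. The extra ``return_scalar`` parameter is not described
+in the text above; it is included on the assumption that newer NumPy
+releases pass a third argument, and it is ignored here:
+
```python
import numpy as np

class WrappedInfoArray(np.ndarray):
    # Hypothetical subclass that marks results produced by ufuncs
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        self.info = getattr(obj, 'info', None)

    def __array_wrap__(self, out_arr, context=None, return_scalar=False):
        # out_arr holds the ufunc result; cast it to our class ...
        result = out_arr.view(type(self))
        # ... and override the attribute instead of inheriting it
        result.info = 'made by a ufunc'
        return result

obj = WrappedInfoArray(np.arange(5), info='information')
ret = np.add(np.arange(5), obj)
print(type(ret), ret.info)
```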
+
+Extra gotchas - custom __del__ methods and ndarray.base
+-------------------------------------------------------
+One of the problems that ndarray solves is that of memory ownership of
+ndarrays and their views. Consider the case where we have created an
+ndarray, ``arr``, and then taken a view with ``v = arr[1:]``. If we
+then do ``del v``, we need to make sure that the ``del`` does not
+delete the memory pointed to by the view, because we still need it for
+the original ``arr`` object. NumPy therefore keeps track of where the
+data came from for a particular array or view, with the ``base`` attribute::
+
+ import numpy as np
+
+ # A normal ndarray, that owns its own data
+ arr = np.zeros((4,))
+ # In this case, base is None
+ assert arr.base is None
+ # We take a view
+ v1 = arr[1:]
+ # base now points to the array that it derived from
+ assert v1.base is arr
+ # Take a view of a view
+ v2 = v1[1:]
+ # In current NumPy, base still points to the array that owns the data
+ assert v2.base is arr
+
+The assertions all succeed in this case. In general, if the array
+owns its own memory, as for ``arr`` in this case, then ``arr.base``
+will be None - there are some exceptions to this - see the numpy book
+for more details.
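+
+As a complement to ``base``, the sketch below (assuming Python 3)
+reads the ``OWNDATA`` flag, which reports directly whether an array
+owns its memory. ``is_view`` is a hypothetical helper, not part of
+numpy:
+
```python
import numpy as np

arr = np.zeros((4,))
v1 = arr[1:]

# The OWNDATA flag reports whether an array owns its memory
print(arr.flags.owndata)   # arr allocated its own buffer
print(v1.flags.owndata)    # v1 borrows the buffer of arr

def is_view(a):
    # Hypothetical helper: True if a borrows its data from another object
    return a.base is not None

print(is_view(arr), is_view(v1))
```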
+
+The ``base`` attribute is useful in being able to tell whether we have
+a view or the original array. This in turn can be useful if we need
+to know whether or not to do some specific cleanup when the subclassed
+array is deleted. For example, we may only want to do the cleanup if
+the original array is deleted, but not the views. For an example of
+how this can work, have a look at the ``memmap`` class in
+``numpy.core``.
+
+"""