Document on subclassing

author: Matthew Brett <matthew.brett@gmail.com> 2008-08-27 10:05:41 +0000
committer: Matthew Brett <matthew.brett@gmail.com> 2008-08-27 10:05:41 +0000
commit: 4f7b154e57713884b240fd916b536c08033d0be3 (patch)
tree: 00bee24c66ba7b4ff2947705f0de846243ed9798 /numpy/doc/subclassing.py
parent: dfab4520b63b75d7e310a33405716518e2a0b8f8 (diff)
download: numpy-4f7b154e57713884b240fd916b536c08033d0be3.tar.gz
1 files changed, 291 insertions, 0 deletions
diff --git a/numpy/doc/subclassing.py b/numpy/doc/subclassing.py
new file mode 100644
index 000000000..fa5f22253
--- /dev/null
+++ b/numpy/doc/subclassing.py
@@ -0,0 +1,291 @@
+"""
+=============================
+Subclassing ndarray in python
+=============================
+
+Credits
+-------
+
+This page is based, with thanks, on the wiki page on subclassing by
+Pierre Gerard-Marchant: http://www.scipy.org/Subclasses
+
+Introduction
+------------
+Subclassing ndarray is relatively simple, but it has a few minor
+complications that only make sense once you understand how ndarrays
+are created.  There are examples at the bottom of the page, but you
+will probably want to read the background first to see why
+subclassing works as it does.
+
+ndarrays and object creation
+============================
+The creation of ndarrays is complicated by the need to return views of
+ndarrays that are themselves ndarrays.  For example::
+
+  >>> import numpy as np
+  >>> arr = np.zeros((3,))
+  >>> type(arr)
+  <class 'numpy.ndarray'>
+  >>> v = arr[1:]
+  >>> type(v)
+  <class 'numpy.ndarray'>
+  >>> v is arr
+  False
+
+So, when we take a view (here a slice) from the ndarray, we get back a
+new ndarray that points to the data in the original.  When we
+subclass ndarray, taking a view (such as a slice) needs to return an
+object of our own class.  There is machinery to do this, and it is
+this machinery that makes subclassing slightly non-standard.
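+
+A minimal sketch of what "points to the data in the original" means in
+practice.  The use of ``np.shares_memory`` is an addition for
+illustration and is not part of the original text:

```python
import numpy as np

arr = np.zeros((3,))
v = arr[1:]           # a view: a new ndarray object over the same memory

v[0] = 5.0            # writing through the view...
print(arr)            # ...is visible through the original array
print(v is arr)       # the two names refer to different Python objects
print(np.shares_memory(arr, v))  # but to overlapping memory
```

+
+The view and the original are separate objects, so a subclass has to
+arrange for the view object itself to have the right class.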
+
+To allow subclassing, and views of subclasses, ndarray uses its
+``__new__`` method for the main work of object initialization,
+rather than the more usual ``__init__`` method.
+
+``__new__`` and ``__init__``
+============================
+
+``__new__`` is a standard Python method and, if present, is called
+before ``__init__`` when we create a class instance.  Consider the
+following::
+
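+  class C(object):
+      def __new__(cls, *args):
+          print('cls is:', cls)
+          print('Args in __new__:', args)
+          # object.__new__ accepts only the class to instantiate
+          return object.__new__(cls)
+      def __init__(self, *args):
+          print('self is :', self)
+          print('Args in __init__:', args)
+
+  C('hello')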
+
+The code gives the following output::
+
+  cls is: <class '__main__.C'>
+  Args in __new__: ('hello',)
+  self is : <__main__.C object at 0xb7dc720c>
+  Args in __init__: ('hello',)
+
+When we call ``C('hello')``, the ``__new__`` method gets its own class
+as first argument, and the passed argument, which is the string
+``'hello'``.  After python calls ``__new__``, it usually (see below)
+calls our ``__init__`` method, with the output of ``__new__`` as the
+first argument (now a class instance), and the passed arguments
+following.
+
+As you can see, the object can be initialized in the ``__new__``
+method or the ``__init__`` method, or both, and in fact ndarray does
+not have an ``__init__`` method, because all the initialization is
+done in the ``__new__`` method. 
+
+Why use ``__new__`` rather than just the usual ``__init__``?  Because
+in some cases, as for ndarray, we want to be able to return an object
+of some other class.  Consider the following::
+
+  class C(object):
+      def __new__(cls, *args):
+          print('cls is:', cls)
+          print('Args in __new__:', args)
+          # object.__new__ accepts only the class to instantiate
+          return object.__new__(cls)
+      def __init__(self, *args):
+          print('self is :', self)
+          print('Args in __init__:', args)
+
+  class D(C):
+      def __new__(cls, *args):
+          print('D cls is:', cls)
+          print('D args in __new__:', args)
+          # deliberately create an instance of C, not of D
+          return C.__new__(C, *args)
+      def __init__(self, *args):
+          print('D self is :', self)
+          print('D args in __init__:', args)
+
+  D('hello')
+
+which gives::
+
+  D cls is: <class '__main__.D'>
+  D args in __new__: ('hello',)
+  cls is: <class '__main__.C'>
+  Args in __new__: ('hello',)
+
+The definition of ``C`` is the same as before, but for ``D``, the
+``__new__`` method returns an instance of class ``C`` rather than
+``D``.  Note that the ``__init__`` method of ``D`` does not get
+called.  In general, when the ``__new__`` method returns an object
+that is not an instance of the class being created, the ``__init__``
+method of that class is not called.
+
+This is how subclasses of the ndarray class are able to return views
+that preserve the class type.  When taking a view, the standard
+ndarray machinery creates the new ndarray object with something
+like::
+
+  obj = ndarray.__new__(subtype, shape, ...)
+
+where ``subtype`` is the subclass.  Thus the returned view is of the
+same class as the subclass, rather than being of class ``ndarray``.
+
+That solves the problem of returning views of the same type, but now
+we have a new problem.  The machinery of ndarray can set the class
+this way, in its standard methods for taking views, but the ndarray
+``__new__`` method knows nothing of what we have done in our own
+``__new__`` method in order to set attributes, and so on.  (Aside:
+why not call ``obj = subtype.__new__(...)`` then?  Because we may not
+have a ``__new__`` method with the same call signature.)
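+
+A minimal sketch of the point made in the aside, assuming a
+hypothetical subclass ``Picky`` whose ``__new__`` takes arguments that
+the ndarray machinery could not supply.  Taking a slice still yields a
+``Picky`` instance, and our own ``__new__`` is not called to make it:

```python
import numpy as np

new_calls = []  # records each call to our own __new__


class Picky(np.ndarray):  # hypothetical name, for illustration only

    def __new__(cls, input_array, label):
        # A call signature quite unlike that of ndarray.__new__
        new_calls.append(label)
        return np.asarray(input_array).view(cls)


p = Picky([1, 2, 3], label='first')
s = p[1:]  # the view is created by the ndarray machinery

print(type(s).__name__)
print(new_calls)
```

+
+Because the slice bypasses ``Picky.__new__``, anything that method
+would have set up on the instance is missing from the view.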
+
+So, when creating a new view object of our subclass, we need to be
+able to set any extra attributes from the original object of our
+class.  This is the role of the ``__array_finalize__`` method of
+ndarray.  ``__array_finalize__`` is called from within the ndarray
+machinery each time we create an ndarray of our own class.  It is
+called on the new view object, created as above, and is passed the
+old object from which the view has been taken.  In it we can take any
+attributes from the old object and put them into the new view object,
+or do any other related processing.  Now we are ready for a simple
+example.
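+
+Before that, here is a minimal sketch of when ``__array_finalize__``
+is called and what it receives.  The class ``Traced`` and the list
+``calls`` are hypothetical names used only for this illustration:

```python
import numpy as np

calls = []  # (type of self, type of obj) for each __array_finalize__ call


class Traced(np.ndarray):  # hypothetical name, for illustration only

    def __array_finalize__(self, obj):
        calls.append((type(self).__name__, type(obj).__name__))


# Explicit construction: there is no old object, so obj is None
a = Traced((3,))
# Casting a plain ndarray to our class: obj is that plain ndarray
b = np.arange(3).view(Traced)
# Slicing an instance of our class: obj is the instance we sliced
c = b[1:]

for entry in calls:
    print(entry)
```

+
+This is why the examples below read attributes from ``obj`` with
+``getattr`` and a default: ``obj`` is not always an instance of our
+own class.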
+
+Simple example - adding an extra attribute to ndarray
+-----------------------------------------------------
+
+::
+
+  import numpy as np
+
+  class InfoArray(np.ndarray):
+
+      def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
+                  strides=None, order=None, info=None):
+          # Create the ndarray instance of our type, given the usual
+          # input arguments.  This will call the standard ndarray
+          # constructor, but return an object of our type
+          obj = np.ndarray.__new__(subtype, shape, dtype, buffer, offset,
+                                   strides, order)
+          # add the new attribute to the created instance
+          obj.info = info
+          # Finally, we must return the newly created object:
+          return obj
+
+      def __array_finalize__(self, obj):
+          # reset the attribute from passed original object
+          self.info = getattr(obj, 'info', None)
+          # We do not need to return anything
+
+  obj = InfoArray(shape=(3,), info='information')
+  print(type(obj))
+  print(obj.info)
+  v = obj[1:]
+  print(type(v))
+  print(v.info)
+
+which gives::
+
+  <class '__main__.InfoArray'>
+  information
+  <class '__main__.InfoArray'>
+  information
+
+This class isn't very useful, because it has the same constructor as
+the bare ndarray object, including passing in buffers and shapes and
+so on.  We would probably prefer to take an already formed ndarray,
+such as one from the usual numpy call to ``np.array``, and return an
+object of our own class.
+
+Slightly more realistic example - attribute added to existing array
+-------------------------------------------------------------------
+Here is a class (with thanks to PierreGM for the original example)
+that takes an array that already exists, casts it to our type, and
+adds an extra attribute::
+
+  import numpy as np
+
+  class RealisticInfoArray(np.ndarray):
+
+      def __new__(cls, input_array, info=None):
+          # Input array is an already formed ndarray instance
+          # We first cast to be our class type
+          obj = np.asarray(input_array).view(cls)
+          # add the new attribute to the created instance
+          obj.info = info
+          # Finally, we must return the newly created object:
+          return obj
+
+      def __array_finalize__(self, obj):
+          # reset the attribute from passed original object
+          self.info = getattr(obj, 'info', None)
+          # We do not need to return anything
+
+  arr = np.arange(5)
+  obj = RealisticInfoArray(arr, info='information')
+  print(type(obj))
+  print(obj.info)
+  v = obj[1:]
+  print(type(v))
+  print(v.info)
+
+which gives::
+
+  <class '__main__.RealisticInfoArray'>
+  information
+  <class '__main__.RealisticInfoArray'>
+  information
+
+``__array_wrap__`` for ufuncs
+-----------------------------
+
+Let's say you have an instance ``obj`` of your new subclass,
+``RealisticInfoArray``, and you pass it into a ufunc with another
+array::
+
+  arr = np.arange(5)
+  ret = np.multiply.outer(arr, obj)
+
+When a numpy ufunc is called on a subclass of ndarray, the
+``__array_wrap__`` method is called to transform the result into a new
+instance of the subclass.  By default, ``__array_wrap__`` will call
+``__array_finalize__``, and the attributes will be inherited.
+
+By defining a specific ``__array_wrap__`` method for our subclass, we
+can tweak the output.  The ``__array_wrap__`` method requires one
+argument, the array holding the result of the ufunc, and accepts an
+optional parameter ``context``.  Some ufuncs pass ``context`` as a
+3-element tuple: (the ufunc, the arguments of the ufunc, the domain of
+the ufunc).
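+
+A minimal sketch of a custom ``__array_wrap__``, under some stated
+assumptions: the class name ``WrappedInfoArray`` is hypothetical,
+``np.add`` is chosen only as a simple ufunc to demonstrate with, and
+the extra ``return_scalar`` parameter is there on the assumption that
+newer NumPy releases pass a third argument.  It is one way to tweak
+the output, not the only one:

```python
import numpy as np


class WrappedInfoArray(np.ndarray):  # hypothetical name, for illustration

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        self.info = getattr(obj, 'info', None)

    # return_scalar is accepted only so that the signature also suits
    # newer NumPy releases (an assumption); it is not used here.
    def __array_wrap__(self, out_arr, context=None, return_scalar=False):
        # Cast the ufunc result to our own class
        result = np.asarray(out_arr).view(type(self))
        # When context is given, its first element is the ufunc applied
        if context is not None:
            ufunc_name = context[0].__name__
        else:
            ufunc_name = 'unknown'
        # Tweak the output: note in info how the result was produced
        result.info = '%s (via %s)' % (self.info, ufunc_name)
        return result


arr = np.arange(5)
obj = WrappedInfoArray(arr, info='information')
ret = np.add(arr, obj)
print(type(ret).__name__)
print(ret.info)
```

+
+Here the result keeps the ``info`` of the instance whose
+``__array_wrap__`` was called and records which ufunc produced it.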
+
+Extra gotchas - custom __del__ methods and ndarray.base
+-------------------------------------------------------
+One of the problems that ndarray solves is that of memory ownership of
+ndarrays and their views.  Consider the case where we have created an
+ndarray, ``arr`` and then taken a view with ``v = arr[1:]``.  If we
+then do ``del v``, we need to make sure that the ``del`` does not
+delete the memory pointed to by the view, because we still need it for
+the original ``arr`` object.  Numpy therefore keeps track of where the
+data came from for a particular array or view, with the ``base`` attribute::
+
+  import numpy as np
+
+  # A normal ndarray, that owns its own data
+  arr = np.zeros((4,))
+  # In this case, base is None
+  assert arr.base is None
+  # We take a view
+  v1 = arr[1:]
+  # base now points to the array that it derived from
+  assert v1.base is arr
+  # Take a view of a view
+  v2 = v1[1:]
+  # in current NumPy releases, base points to the original array that
+  # owns the data, not to the intermediate view
+  assert v2.base is arr
+
+The assertions all succeed in this case.  In general, if the array
+owns its own memory, as for ``arr`` in this case, then ``arr.base``
+will be None.  There are some exceptions to this; see the numpy book
+for more details.
+
+The ``base`` attribute is useful in being able to tell whether we have
+a view or the original array.  This in turn can be useful if we need
+to know whether or not to do some specific cleanup when the subclassed
+array is deleted.  For example, we may only want to do the cleanup if
+the original array is deleted, but not the views.  For an example of
+how this can work, have a look at the ``memmap`` class in
+``numpy.core``.
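+
+A minimal sketch of using ``base`` to tell an original array from its
+views.  The class ``CleanupArray`` and its ``is_original`` method are
+hypothetical names for illustration, and the sketch ignores the
+exceptions mentioned above in which an array that owns its memory has
+a ``base`` that is not None:

```python
import numpy as np


class CleanupArray(np.ndarray):  # hypothetical name, for illustration

    def is_original(self):
        # True when this array was not derived from another array
        return self.base is None


# Explicit construction allocates new (uninitialized) memory
orig = CleanupArray((4,))
view = orig[1:]

print(orig.is_original())
print(view.is_original())
print(view.base is orig)
```

+
+A subclass could consult such a check to decide whether cleanup is
+its responsibility or that of the array it was derived from.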
+
+"""