author     Matti Picus <matti.picus@gmail.com>      2020-09-02 21:01:35 +0300
committer  GitHub <noreply@github.com>              2020-09-02 13:01:35 -0500
commit     1f8ce6341159ebb0731c2c262f4576609210d2c8 (patch)
tree       aa10443358243366d776ad669c0cbcd383ed3634 /doc/source/user
parent     e3c84a44b68966ab887a3623a0ff57169e508deb (diff)
download   numpy-1f8ce6341159ebb0731c2c262f4576609210d2c8.tar.gz
MAINT, DOC: move informational files from numpy.doc.*.py to their *.rst counterparts (#17222)
* DOC: redistribute docstring-only content from numpy/doc
* DOC: post-transition clean-up
* DOC, MAINT: reskip doctests, fix a few easy ones
Diffstat (limited to 'doc/source/user')
-rw-r--r--  doc/source/user/basics.broadcasting.rst   176
-rw-r--r--  doc/source/user/basics.byteswapping.rst   150
-rw-r--r--  doc/source/user/basics.creation.rst       139
-rw-r--r--  doc/source/user/basics.dispatch.rst       266
-rw-r--r--  doc/source/user/basics.indexing.rst       452
-rw-r--r--  doc/source/user/basics.io.genfromtxt.rst   26
-rw-r--r--  doc/source/user/basics.rec.rst            643
-rw-r--r--  doc/source/user/basics.subclassing.rst    749
-rw-r--r--  doc/source/user/basics.types.rst          337
-rw-r--r--  doc/source/user/misc.rst                  222
-rw-r--r--  doc/source/user/whatisnumpy.rst             2
11 files changed, 3138 insertions, 24 deletions
diff --git a/doc/source/user/basics.broadcasting.rst b/doc/source/user/basics.broadcasting.rst index 00bf17a41..5eae3eb32 100644 --- a/doc/source/user/basics.broadcasting.rst +++ b/doc/source/user/basics.broadcasting.rst @@ -10,4 +10,178 @@ Broadcasting :ref:`array-broadcasting-in-numpy` An introduction to the concepts discussed here -.. automodule:: numpy.doc.broadcasting +.. note:: + See `this article + <https://numpy.org/devdocs/user/theory.broadcasting.html>`_ + for illustrations of broadcasting concepts. + + +The term broadcasting describes how numpy treats arrays with different +shapes during arithmetic operations. Subject to certain constraints, +the smaller array is "broadcast" across the larger array so that they +have compatible shapes. Broadcasting provides a means of vectorizing +array operations so that looping occurs in C instead of Python. It does +this without making needless copies of data and usually leads to +efficient algorithm implementations. There are, however, cases where +broadcasting is a bad idea because it leads to inefficient use of memory +that slows computation. + +NumPy operations are usually done on pairs of arrays on an +element-by-element basis. In the simplest case, the two arrays must +have exactly the same shape, as in the following example: + + >>> a = np.array([1.0, 2.0, 3.0]) + >>> b = np.array([2.0, 2.0, 2.0]) + >>> a * b + array([ 2., 4., 6.]) + +NumPy's broadcasting rule relaxes this constraint when the arrays' +shapes meet certain constraints. The simplest broadcasting example occurs +when an array and a scalar value are combined in an operation: + +>>> a = np.array([1.0, 2.0, 3.0]) +>>> b = 2.0 +>>> a * b +array([ 2., 4., 6.]) + +The result is equivalent to the previous example where ``b`` was an array. +We can think of the scalar ``b`` being *stretched* during the arithmetic +operation into an array with the same shape as ``a``. The new elements in +``b`` are simply copies of the original scalar. 
The stretching analogy is +only conceptual. NumPy is smart enough to use the original scalar value +without actually making copies so that broadcasting operations are as +memory and computationally efficient as possible. + +The code in the second example is more efficient than that in the first +because broadcasting moves less memory around during the multiplication +(``b`` is a scalar rather than an array). + +General Broadcasting Rules +========================== +When operating on two arrays, NumPy compares their shapes element-wise. +It starts with the trailing (i.e. rightmost) dimensions and works its +way left. Two dimensions are compatible when + +1) they are equal, or +2) one of them is 1 + +If these conditions are not met, a +``ValueError: operands could not be broadcast together`` exception is +thrown, indicating that the arrays have incompatible shapes. The size of +the resulting array is the size that is not 1 along each axis of the inputs. + +Arrays do not need to have the same *number* of dimensions. For example, +if you have a ``256x256x3`` array of RGB values, and you want to scale +each color in the image by a different value, you can multiply the image +by a one-dimensional array with 3 values. Lining up the sizes of the +trailing axes of these arrays according to the broadcast rules, shows that +they are compatible:: + + Image (3d array): 256 x 256 x 3 + Scale (1d array): 3 + Result (3d array): 256 x 256 x 3 + +When either of the dimensions compared is one, the other is +used. In other words, dimensions with size 1 are stretched or "copied" +to match the other. 
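The dimension-pairing rule above can be checked programmatically, without allocating a result array, using :class:`numpy.broadcast`:

```python
import numpy as np

# np.broadcast applies the broadcasting rules without computing anything
print(np.broadcast(np.empty((256, 256, 3)), np.empty((3,))).shape)
# (256, 256, 3)

# incompatible trailing dimensions raise ValueError
try:
    np.broadcast(np.empty((3,)), np.empty((4,)))
except ValueError:
    print("operands could not be broadcast together")
```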
+ +In the following example, both the ``A`` and ``B`` arrays have axes with +length one that are expanded to a larger size during the broadcast +operation:: + + A (4d array): 8 x 1 x 6 x 1 + B (3d array): 7 x 1 x 5 + Result (4d array): 8 x 7 x 6 x 5 + +Here are some more examples:: + + A (2d array): 5 x 4 + B (1d array): 1 + Result (2d array): 5 x 4 + + A (2d array): 5 x 4 + B (1d array): 4 + Result (2d array): 5 x 4 + + A (3d array): 15 x 3 x 5 + B (3d array): 15 x 1 x 5 + Result (3d array): 15 x 3 x 5 + + A (3d array): 15 x 3 x 5 + B (2d array): 3 x 5 + Result (3d array): 15 x 3 x 5 + + A (3d array): 15 x 3 x 5 + B (2d array): 3 x 1 + Result (3d array): 15 x 3 x 5 + +Here are examples of shapes that do not broadcast:: + + A (1d array): 3 + B (1d array): 4 # trailing dimensions do not match + + A (2d array): 2 x 1 + B (3d array): 8 x 4 x 3 # second from last dimensions mismatched + +An example of broadcasting in practice:: + + >>> x = np.arange(4) + >>> xx = x.reshape(4,1) + >>> y = np.ones(5) + >>> z = np.ones((3,4)) + + >>> x.shape + (4,) + + >>> y.shape + (5,) + + >>> x + y + ValueError: operands could not be broadcast together with shapes (4,) (5,) + + >>> xx.shape + (4, 1) + + >>> y.shape + (5,) + + >>> (xx + y).shape + (4, 5) + + >>> xx + y + array([[ 1., 1., 1., 1., 1.], + [ 2., 2., 2., 2., 2.], + [ 3., 3., 3., 3., 3.], + [ 4., 4., 4., 4., 4.]]) + + >>> x.shape + (4,) + + >>> z.shape + (3, 4) + + >>> (x + z).shape + (3, 4) + + >>> x + z + array([[ 1., 2., 3., 4.], + [ 1., 2., 3., 4.], + [ 1., 2., 3., 4.]]) + +Broadcasting provides a convenient way of taking the outer product (or +any other outer operation) of two arrays. 
The following example shows an +outer addition operation of two 1-d arrays:: + + >>> a = np.array([0.0, 10.0, 20.0, 30.0]) + >>> b = np.array([1.0, 2.0, 3.0]) + >>> a[:, np.newaxis] + b + array([[ 1., 2., 3.], + [ 11., 12., 13.], + [ 21., 22., 23.], + [ 31., 32., 33.]]) + +Here the ``newaxis`` index operator inserts a new axis into ``a``, +making it a two-dimensional ``4x1`` array. Combining the ``4x1`` array +with ``b``, which has shape ``(3,)``, yields a ``4x3`` array. + + diff --git a/doc/source/user/basics.byteswapping.rst b/doc/source/user/basics.byteswapping.rst index 4b1008df3..fecdb9ee8 100644 --- a/doc/source/user/basics.byteswapping.rst +++ b/doc/source/user/basics.byteswapping.rst @@ -2,4 +2,152 @@ Byte-swapping ************* -.. automodule:: numpy.doc.byteswapping +Introduction to byte ordering and ndarrays +========================================== + +The ``ndarray`` is an object that provide a python array interface to data +in memory. + +It often happens that the memory that you want to view with an array is +not of the same byte ordering as the computer on which you are running +Python. + +For example, I might be working on a computer with a little-endian CPU - +such as an Intel Pentium, but I have loaded some data from a file +written by a computer that is big-endian. Let's say I have loaded 4 +bytes from a file written by a Sun (big-endian) computer. I know that +these 4 bytes represent two 16-bit integers. On a big-endian machine, a +two-byte integer is stored with the Most Significant Byte (MSB) first, +and then the Least Significant Byte (LSB). Thus the bytes are, in memory order: + +#. MSB integer 1 +#. LSB integer 1 +#. MSB integer 2 +#. LSB integer 2 + +Let's say the two integers were in fact 1 and 770. Because 770 = 256 * +3 + 2, the 4 bytes in memory would contain respectively: 0, 1, 3, 2. 
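That byte layout can be verified with Python's ``struct`` module, where ``>`` means big-endian and ``h`` a signed 16-bit integer:

```python
import struct

# pack the integers 1 and 770 as two big-endian 16-bit integers
packed = struct.pack('>hh', 1, 770)
print(list(packed))   # [0, 1, 3, 2]
```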
+The bytes I have loaded from the file would have these contents: + +>>> big_end_buffer = bytearray([0,1,3,2]) +>>> big_end_buffer +bytearray(b'\\x00\\x01\\x03\\x02') + +We might want to use an ``ndarray`` to access these integers. In that +case, we can create an array around this memory, and tell numpy that +there are two integers, and that they are 16 bit and big-endian: + +>>> import numpy as np +>>> big_end_arr = np.ndarray(shape=(2,),dtype='>i2', buffer=big_end_buffer) +>>> big_end_arr[0] +1 +>>> big_end_arr[1] +770 + +Note the array ``dtype`` above of ``>i2``. The ``>`` means 'big-endian' +(``<`` is little-endian) and ``i2`` means 'signed 2-byte integer'. For +example, if our data represented a single unsigned 4-byte little-endian +integer, the dtype string would be ``<u4``. + +In fact, why don't we try that? + +>>> little_end_u4 = np.ndarray(shape=(1,),dtype='<u4', buffer=big_end_buffer) +>>> little_end_u4[0] == 1 * 256**1 + 3 * 256**2 + 2 * 256**3 +True + +Returning to our ``big_end_arr`` - in this case our underlying data is +big-endian (data endianness) and we've set the dtype to match (the dtype +is also big-endian). However, sometimes you need to flip these around. + +.. warning:: + + Scalars currently do not include byte order information, so extracting + a scalar from an array will return an integer in native byte order. + Hence: + + >>> big_end_arr[0].dtype.byteorder == little_end_u4[0].dtype.byteorder + True + +Changing byte ordering +====================== + +As you can imagine from the introduction, there are two ways you can +affect the relationship between the byte ordering of the array and the +underlying memory it is looking at: + +* Change the byte-ordering information in the array dtype so that it + interprets the underlying data as being in a different byte order. + This is the role of ``arr.newbyteorder()`` +* Change the byte-ordering of the underlying data, leaving the dtype + interpretation as it was. 
This is what ``arr.byteswap()`` does. + +The common situations in which you need to change byte ordering are: + +#. Your data and dtype endianness don't match, and you want to change + the dtype so that it matches the data. +#. Your data and dtype endianness don't match, and you want to swap the + data so that they match the dtype +#. Your data and dtype endianness match, but you want the data swapped + and the dtype to reflect this + +Data and dtype endianness don't match, change dtype to match data +----------------------------------------------------------------- + +We make something where they don't match: + +>>> wrong_end_dtype_arr = np.ndarray(shape=(2,),dtype='<i2', buffer=big_end_buffer) +>>> wrong_end_dtype_arr[0] +256 + +The obvious fix for this situation is to change the dtype so it gives +the correct endianness: + +>>> fixed_end_dtype_arr = wrong_end_dtype_arr.newbyteorder() +>>> fixed_end_dtype_arr[0] +1 + +Note the array has not changed in memory: + +>>> fixed_end_dtype_arr.tobytes() == big_end_buffer +True + +Data and type endianness don't match, change data to match dtype +---------------------------------------------------------------- + +You might want to do this if you need the data in memory to be a certain +ordering. For example you might be writing the memory out to a file +that needs a certain byte ordering. + +>>> fixed_end_mem_arr = wrong_end_dtype_arr.byteswap() +>>> fixed_end_mem_arr[0] +1 + +Now the array *has* changed in memory: + +>>> fixed_end_mem_arr.tobytes() == big_end_buffer +False + +Data and dtype endianness match, swap data and dtype +---------------------------------------------------- + +You may have a correctly specified array dtype, but you need the array +to have the opposite byte order in memory, and you want the dtype to +match so the array values make sense. 
In this case you just do both of +the previous operations: + +>>> swapped_end_arr = big_end_arr.byteswap().newbyteorder() +>>> swapped_end_arr[0] +1 +>>> swapped_end_arr.tobytes() == big_end_buffer +False + +An easier way of casting the data to a specific dtype and byte ordering +can be achieved with the ndarray astype method: + +>>> swapped_end_arr = big_end_arr.astype('<i2') +>>> swapped_end_arr[0] +1 +>>> swapped_end_arr.tobytes() == big_end_buffer +False + + diff --git a/doc/source/user/basics.creation.rst b/doc/source/user/basics.creation.rst index b3fa81017..671a8ec59 100644 --- a/doc/source/user/basics.creation.rst +++ b/doc/source/user/basics.creation.rst @@ -6,4 +6,141 @@ Array creation .. seealso:: :ref:`Array creation routines <routines.array-creation>` -.. automodule:: numpy.doc.creation +Introduction +============ + +There are 5 general mechanisms for creating arrays: + +1) Conversion from other Python structures (e.g., lists, tuples) +2) Intrinsic numpy array creation objects (e.g., arange, ones, zeros, + etc.) +3) Reading arrays from disk, either from standard or custom formats +4) Creating arrays from raw bytes through the use of strings or buffers +5) Use of special library functions (e.g., random) + +This section will not cover means of replicating, joining, or otherwise +expanding or mutating existing arrays. Nor will it cover creating object +arrays or structured arrays. Both of those are covered in their own sections. + +Converting Python array_like Objects to NumPy Arrays +==================================================== + +In general, numerical data arranged in an array-like structure in Python can +be converted to arrays through the use of the array() function. The most +obvious examples are lists and tuples. See the documentation for array() for +details for its use. Some objects may support the array-protocol and allow +conversion to arrays this way. 
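As a minimal sketch of that array protocol (the class here is hypothetical, purely for illustration): any object whose ``__array__`` method returns an ndarray can be converted by ``array()``:

```python
import numpy as np

class MySequence:
    # hypothetical container; np.array()/np.asarray() will call __array__
    def __array__(self):
        return np.array([2, 3, 1, 0])

x = np.array(MySequence())
print(x)   # [2 3 1 0]
```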
A simple way to find out if the object can be +converted to a numpy array using array() is simply to try it interactively and +see if it works! (The Python Way). + +Examples: :: + + >>> x = np.array([2,3,1,0]) + >>> x = np.array([2, 3, 1, 0]) + >>> x = np.array([[1,2.0],[0,0],(1+1j,3.)]) # note mix of tuple and lists, + and types + >>> x = np.array([[ 1.+0.j, 2.+0.j], [ 0.+0.j, 0.+0.j], [ 1.+1.j, 3.+0.j]]) + +Intrinsic NumPy Array Creation +============================== + +NumPy has built-in functions for creating arrays from scratch: + +zeros(shape) will create an array filled with 0 values with the specified +shape. The default dtype is float64. :: + + >>> np.zeros((2, 3)) + array([[ 0., 0., 0.], [ 0., 0., 0.]]) + +ones(shape) will create an array filled with 1 values. It is identical to +zeros in all other respects. + +arange() will create arrays with regularly incrementing values. Check the +docstring for complete information on the various ways it can be used. A few +examples will be given here: :: + + >>> np.arange(10) + array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) + >>> np.arange(2, 10, dtype=float) + array([ 2., 3., 4., 5., 6., 7., 8., 9.]) + >>> np.arange(2, 3, 0.1) + array([ 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9]) + +Note that there are some subtleties regarding the last usage that the user +should be aware of that are described in the arange docstring. + +linspace() will create arrays with a specified number of elements, and +spaced equally between the specified beginning and end values. For +example: :: + + >>> np.linspace(1., 4., 6) + array([ 1. , 1.6, 2.2, 2.8, 3.4, 4. ]) + +The advantage of this creation function is that one can guarantee the +number of elements and the starting and end point, which arange() +generally will not do for arbitrary start, stop, and step values. + +indices() will create a set of arrays (stacked as a one-higher dimensioned +array), one per dimension with each representing variation in that dimension. 
+An example illustrates much better than a verbal description: :: + + >>> np.indices((3,3)) + array([[[0, 0, 0], [1, 1, 1], [2, 2, 2]], [[0, 1, 2], [0, 1, 2], [0, 1, 2]]]) + +This is particularly useful for evaluating functions of multiple dimensions on +a regular grid. + +Reading Arrays From Disk +======================== + +This is presumably the most common case of large array creation. The details, +of course, depend greatly on the format of data on disk and so this section +can only give general pointers on how to handle various formats. + +Standard Binary Formats +----------------------- + +Various fields have standard formats for array data. The following lists the +ones with known python libraries to read them and return numpy arrays (there +may be others for which it is possible to read and convert to numpy arrays so +check the last section as well) +:: + + HDF5: h5py + FITS: Astropy + +Examples of formats that cannot be read directly but for which it is not hard to +convert are those formats supported by libraries like PIL (able to read and +write many image formats such as jpg, png, etc). + +Common ASCII Formats +------------------------ + +Comma Separated Value files (CSV) are widely used (and an export and import +option for programs like Excel). There are a number of ways of reading these +files in Python. There are CSV functions in Python and functions in pylab +(part of matplotlib). + +More generic ascii files can be read using the io package in scipy. + +Custom Binary Formats +--------------------- + +There are a variety of approaches one can use. If the file has a relatively +simple format then one can write a simple I/O library and use the numpy +fromfile() function and .tofile() method to read and write numpy arrays +directly (mind your byteorder though!) 
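A minimal sketch of such a round trip with ``tofile``/``fromfile`` (the file name is illustrative). Note that the raw file carries no dtype or shape metadata, so the reader must supply the dtype, including its byte order:

```python
import os
import tempfile
import numpy as np

a = np.array([1, 770], dtype='>i2')            # big-endian 16-bit integers
fname = os.path.join(tempfile.gettempdir(), 'example.bin')
a.tofile(fname)                                # raw bytes only, no header

# the reader must know the dtype (and byte order) to recover the values
b = np.fromfile(fname, dtype='>i2')
print(b)   # [  1 770]
os.remove(fname)
```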
If a good C or C++ library exists that +read the data, one can wrap that library with a variety of techniques though +that certainly is much more work and requires significantly more advanced +knowledge to interface with C or C++. + +Use of Special Libraries +------------------------ + +There are libraries that can be used to generate arrays for special purposes +and it isn't possible to enumerate all of them. The most common uses are use +of the many array generation functions in random that can generate arrays of +random values, and some utility functions to generate special matrices (e.g. +diagonal). + + diff --git a/doc/source/user/basics.dispatch.rst b/doc/source/user/basics.dispatch.rst index f7b8da262..c0e1cf9ba 100644 --- a/doc/source/user/basics.dispatch.rst +++ b/doc/source/user/basics.dispatch.rst @@ -4,5 +4,269 @@ Writing custom array containers ******************************* -.. automodule:: numpy.doc.dispatch +Numpy's dispatch mechanism, introduced in numpy version v1.16 is the +recommended approach for writing custom N-dimensional array containers that are +compatible with the numpy API and provide custom implementations of numpy +functionality. Applications include `dask <http://dask.pydata.org>`_ arrays, an +N-dimensional array distributed across multiple nodes, and `cupy +<https://docs-cupy.chainer.org/en/stable/>`_ arrays, an N-dimensional array on +a GPU. + +To get a feel for writing custom array containers, we'll begin with a simple +example that has rather narrow utility but illustrates the concepts involved. + +>>> import numpy as np +>>> class DiagonalArray: +... def __init__(self, N, value): +... self._N = N +... self._i = value +... def __repr__(self): +... return f"{self.__class__.__name__}(N={self._N}, value={self._i})" +... def __array__(self): +... 
return self._i * np.eye(self._N) + +Our custom array can be instantiated like: + +>>> arr = DiagonalArray(5, 1) +>>> arr +DiagonalArray(N=5, value=1) + +We can convert to a numpy array using :func:`numpy.array` or +:func:`numpy.asarray`, which will call its ``__array__`` method to obtain a +standard ``numpy.ndarray``. + +>>> np.asarray(arr) +array([[1., 0., 0., 0., 0.], + [0., 1., 0., 0., 0.], + [0., 0., 1., 0., 0.], + [0., 0., 0., 1., 0.], + [0., 0., 0., 0., 1.]]) + +If we operate on ``arr`` with a numpy function, numpy will again use the +``__array__`` interface to convert it to an array and then apply the function +in the usual way. + +>>> np.multiply(arr, 2) +array([[2., 0., 0., 0., 0.], + [0., 2., 0., 0., 0.], + [0., 0., 2., 0., 0.], + [0., 0., 0., 2., 0.], + [0., 0., 0., 0., 2.]]) + + +Notice that the return type is a standard ``numpy.ndarray``. + +>>> type(arr) +numpy.ndarray + +How can we pass our custom array type through this function? Numpy allows a +class to indicate that it would like to handle computations in a custom-defined +way through the interfaces ``__array_ufunc__`` and ``__array_function__``. Let's +take one at a time, starting with ``_array_ufunc__``. This method covers +:ref:`ufuncs`, a class of functions that includes, for example, +:func:`numpy.multiply` and :func:`numpy.sin`. + +The ``__array_ufunc__`` receives: + +- ``ufunc``, a function like ``numpy.multiply`` +- ``method``, a string, differentiating between ``numpy.multiply(...)`` and + variants like ``numpy.multiply.outer``, ``numpy.multiply.accumulate``, and so + on. For the common case, ``numpy.multiply(...)``, ``method == '__call__'``. +- ``inputs``, which could be a mixture of different types +- ``kwargs``, keyword arguments passed to the function + +For this example we will only handle the method ``__call__`` + +>>> from numbers import Number +>>> class DiagonalArray: +... def __init__(self, N, value): +... self._N = N +... self._i = value +... def __repr__(self): +... 
return f"{self.__class__.__name__}(N={self._N}, value={self._i})" +... def __array__(self): +... return self._i * np.eye(self._N) +... def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): +... if method == '__call__': +... N = None +... scalars = [] +... for input in inputs: +... if isinstance(input, Number): +... scalars.append(input) +... elif isinstance(input, self.__class__): +... scalars.append(input._i) +... if N is not None: +... if N != self._N: +... raise TypeError("inconsistent sizes") +... else: +... N = self._N +... else: +... return NotImplemented +... return self.__class__(N, ufunc(*scalars, **kwargs)) +... else: +... return NotImplemented + +Now our custom array type passes through numpy functions. + +>>> arr = DiagonalArray(5, 1) +>>> np.multiply(arr, 3) +DiagonalArray(N=5, value=3) +>>> np.add(arr, 3) +DiagonalArray(N=5, value=4) +>>> np.sin(arr) +DiagonalArray(N=5, value=0.8414709848078965) + +At this point ``arr + 3`` does not work. + +>>> arr + 3 +TypeError: unsupported operand type(s) for *: 'DiagonalArray' and 'int' + +To support it, we need to define the Python interfaces ``__add__``, ``__lt__``, +and so on to dispatch to the corresponding ufunc. We can achieve this +conveniently by inheriting from the mixin +:class:`~numpy.lib.mixins.NDArrayOperatorsMixin`. + +>>> import numpy.lib.mixins +>>> class DiagonalArray(numpy.lib.mixins.NDArrayOperatorsMixin): +... def __init__(self, N, value): +... self._N = N +... self._i = value +... def __repr__(self): +... return f"{self.__class__.__name__}(N={self._N}, value={self._i})" +... def __array__(self): +... return self._i * np.eye(self._N) +... def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): +... if method == '__call__': +... N = None +... scalars = [] +... for input in inputs: +... if isinstance(input, Number): +... scalars.append(input) +... elif isinstance(input, self.__class__): +... scalars.append(input._i) +... if N is not None: +... if N != self._N: +... 
raise TypeError("inconsistent sizes") +... else: +... N = self._N +... else: +... return NotImplemented +... return self.__class__(N, ufunc(*scalars, **kwargs)) +... else: +... return NotImplemented + +>>> arr = DiagonalArray(5, 1) +>>> arr + 3 +DiagonalArray(N=5, value=4) +>>> arr > 0 +DiagonalArray(N=5, value=True) + +Now let's tackle ``__array_function__``. We'll create dict that maps numpy +functions to our custom variants. + +>>> HANDLED_FUNCTIONS = {} +>>> class DiagonalArray(numpy.lib.mixins.NDArrayOperatorsMixin): +... def __init__(self, N, value): +... self._N = N +... self._i = value +... def __repr__(self): +... return f"{self.__class__.__name__}(N={self._N}, value={self._i})" +... def __array__(self): +... return self._i * np.eye(self._N) +... def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): +... if method == '__call__': +... N = None +... scalars = [] +... for input in inputs: +... # In this case we accept only scalar numbers or DiagonalArrays. +... if isinstance(input, Number): +... scalars.append(input) +... elif isinstance(input, self.__class__): +... scalars.append(input._i) +... if N is not None: +... if N != self._N: +... raise TypeError("inconsistent sizes") +... else: +... N = self._N +... else: +... return NotImplemented +... return self.__class__(N, ufunc(*scalars, **kwargs)) +... else: +... return NotImplemented +... def __array_function__(self, func, types, args, kwargs): +... if func not in HANDLED_FUNCTIONS: +... return NotImplemented +... # Note: this allows subclasses that don't override +... # __array_function__ to handle DiagonalArray objects. +... if not all(issubclass(t, self.__class__) for t in types): +... return NotImplemented +... return HANDLED_FUNCTIONS[func](*args, **kwargs) +... + +A convenient pattern is to define a decorator ``implements`` that can be used +to add functions to ``HANDLED_FUNCTIONS``. + +>>> def implements(np_function): +... 
"Register an __array_function__ implementation for DiagonalArray objects." +... def decorator(func): +... HANDLED_FUNCTIONS[np_function] = func +... return func +... return decorator +... + +Now we write implementations of numpy functions for ``DiagonalArray``. +For completeness, to support the usage ``arr.sum()`` add a method ``sum`` that +calls ``numpy.sum(self)``, and the same for ``mean``. + +>>> @implements(np.sum) +... def sum(arr): +... "Implementation of np.sum for DiagonalArray objects" +... return arr._i * arr._N +... +>>> @implements(np.mean) +... def mean(arr): +... "Implementation of np.mean for DiagonalArray objects" +... return arr._i / arr._N +... +>>> arr = DiagonalArray(5, 1) +>>> np.sum(arr) +5 +>>> np.mean(arr) +0.2 + +If the user tries to use any numpy functions not included in +``HANDLED_FUNCTIONS``, a ``TypeError`` will be raised by numpy, indicating that +this operation is not supported. For example, concatenating two +``DiagonalArrays`` does not produce another diagonal array, so it is not +supported. + +>>> np.concatenate([arr, arr]) +TypeError: no implementation found for 'numpy.concatenate' on types that implement __array_function__: [<class '__main__.DiagonalArray'>] + +Additionally, our implementations of ``sum`` and ``mean`` do not accept the +optional arguments that numpy's implementation does. + +>>> np.sum(arr, axis=0) +TypeError: sum() got an unexpected keyword argument 'axis' + +The user always has the option of converting to a normal ``numpy.ndarray`` with +:func:`numpy.asarray` and using standard numpy from there. 
+ +>>> np.concatenate([np.asarray(arr), np.asarray(arr)]) +array([[1., 0., 0., 0., 0.], + [0., 1., 0., 0., 0.], + [0., 0., 1., 0., 0.], + [0., 0., 0., 1., 0.], + [0., 0., 0., 0., 1.], + [1., 0., 0., 0., 0.], + [0., 1., 0., 0., 0.], + [0., 0., 1., 0., 0.], + [0., 0., 0., 1., 0.], + [0., 0., 0., 0., 1.]]) + +Refer to the `dask source code <https://github.com/dask/dask>`_ and +`cupy source code <https://github.com/cupy/cupy>`_ for more fully-worked +examples of custom array containers. + +See also :doc:`NEP 18<neps:nep-0018-array-function-protocol>`. diff --git a/doc/source/user/basics.indexing.rst b/doc/source/user/basics.indexing.rst index 0dca4b884..9545bb78c 100644 --- a/doc/source/user/basics.indexing.rst +++ b/doc/source/user/basics.indexing.rst @@ -10,4 +10,454 @@ Indexing :ref:`Indexing routines <routines.indexing>` -.. automodule:: numpy.doc.indexing +Array indexing refers to any use of the square brackets ([]) to index +array values. There are many options to indexing, which give numpy +indexing great power, but with power comes some complexity and the +potential for confusion. This section is just an overview of the +various options and issues related to indexing. Aside from single +element indexing, the details on most of these options are to be +found in related sections. + +Assignment vs referencing +========================= + +Most of the following examples show the use of indexing when +referencing data in an array. The examples work just as well +when assigning to an array. See the section at the end for +specific examples and explanations on how assignments work. + +Single element indexing +======================= + +Single element indexing for a 1-D array is what one expects. It work +exactly like that for other standard Python sequences. It is 0-based, +and accepts negative indices for indexing from the end of the array. 
:: + + >>> x = np.arange(10) + >>> x[2] + 2 + >>> x[-2] + 8 + +Unlike lists and tuples, numpy arrays support multidimensional indexing +for multidimensional arrays. That means that it is not necessary to +separate each dimension's index into its own set of square brackets. :: + + >>> x.shape = (2,5) # now x is 2-dimensional + >>> x[1,3] + 8 + >>> x[1,-1] + 9 + +Note that if one indexes a multidimensional array with fewer indices +than dimensions, one gets a subdimensional array. For example: :: + + >>> x[0] + array([0, 1, 2, 3, 4]) + +That is, each index specified selects the array corresponding to the +rest of the dimensions selected. In the above example, choosing 0 +means that the remaining dimension of length 5 is being left unspecified, +and that what is returned is an array of that dimensionality and size. +It must be noted that the returned array is not a copy of the original, +but points to the same values in memory as does the original array. +In this case, the 1-D array at the first position (0) is returned. +So using a single index on the returned array, results in a single +element being returned. That is: :: + + >>> x[0][2] + 2 + +So note that ``x[0,2] = x[0][2]`` though the second case is more +inefficient as a new temporary array is created after the first index +that is subsequently indexed by 2. + +Note to those used to IDL or Fortran memory order as it relates to +indexing. NumPy uses C-order indexing. That means that the last +index usually represents the most rapidly changing memory location, +unlike Fortran or IDL, where the first index represents the most +rapidly changing location in memory. This difference represents a +great potential for confusion. + +Other indexing options +====================== + +It is possible to slice and stride arrays to extract arrays of the +same number of dimensions, but of different sizes than the original. 
+The slicing and striding works exactly the same way it does for lists +and tuples except that they can be applied to multiple dimensions as +well. A few examples illustrates best: :: + + >>> x = np.arange(10) + >>> x[2:5] + array([2, 3, 4]) + >>> x[:-7] + array([0, 1, 2]) + >>> x[1:7:2] + array([1, 3, 5]) + >>> y = np.arange(35).reshape(5,7) + >>> y[1:5:2,::3] + array([[ 7, 10, 13], + [21, 24, 27]]) + +Note that slices of arrays do not copy the internal array data but +only produce new views of the original data. This is different from +list or tuple slicing and an explicit ``copy()`` is recommended if +the original data is not required anymore. + +It is possible to index arrays with other arrays for the purposes of +selecting lists of values out of arrays into new arrays. There are +two different ways of accomplishing this. One uses one or more arrays +of index values. The other involves giving a boolean array of the proper +shape to indicate the values to be selected. Index arrays are a very +powerful tool that allow one to avoid looping over individual elements in +arrays and thus greatly improve performance. + +It is possible to use special features to effectively increase the +number of dimensions in an array through indexing so the resulting +array acquires the shape needed for use in an expression or with a +specific function. + +Index arrays +============ + +NumPy arrays may be indexed with other arrays (or any other sequence- +like object that can be converted to an array, such as lists, with the +exception of tuples; see the end of this document for why this is). The +use of index arrays ranges from simple, straightforward cases to +complex, hard-to-understand cases. For all cases of index arrays, what +is returned is a copy of the original data, not a view as one gets for +slices. + +Index arrays must be of integer type. Each value in the array indicates +which value in the array to use in place of the index. 
To illustrate: ::
+
+ >>> x = np.arange(10,1,-1)
+ >>> x
+ array([10, 9, 8, 7, 6, 5, 4, 3, 2])
+ >>> x[np.array([3, 3, 1, 8])]
+ array([7, 7, 9, 2])
+
+
+The index array consisting of the values 3, 3, 1 and 8 produces an
+array of length 4 (the same as the index array) in which each index
+has been replaced by the value of the indexed array at that position.
+
+Negative values are permitted and work as they do with single indices
+or slices: ::
+
+ >>> x[np.array([3,3,-3,8])]
+ array([7, 7, 4, 2])
+
+It is an error to have index values out of bounds: ::
+
+ >>> x[np.array([3, 3, 20, 8])]
+ Traceback (most recent call last):
+     ...
+ IndexError: index 20 is out of bounds for axis 0 with size 9
+
+Generally speaking, what is returned when index arrays are used is
+an array with the same shape as the index array, but with the type
+and values of the array being indexed. As an example, we can use a
+multidimensional index array instead: ::
+
+ >>> x[np.array([[1,1],[2,3]])]
+ array([[9, 9],
+        [8, 7]])
+
+Indexing Multi-dimensional arrays
+=================================
+
+Things become more complex when multidimensional arrays are indexed,
+particularly with multidimensional index arrays. These tend to be
+more unusual uses, but they are permitted, and they are useful for some
+problems. We'll start with the simplest multidimensional case (using
+the array y from the previous examples): ::
+
+ >>> y[np.array([0,2,4]), np.array([0,1,2])]
+ array([ 0, 15, 30])
+
+In this case, if the index arrays have a matching shape, and there is
+an index array for each dimension of the array being indexed, the
+resultant array has the same shape as the index arrays, and the values
+correspond to the index set for each position in the index arrays. In
+this example, the first index value is 0 for both index arrays, and
+thus the first value of the resultant array is y[0,0]. The next value
+is y[2,1], and the last is y[4,2].
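The rule just described can be checked directly: indexing with one index array per dimension gathers ``y[rows[k], cols[k]]`` for each position ``k``, so the result has the shape of the index arrays. A minimal sketch (as a plain script rather than a doctest):

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

rows = np.array([0, 2, 4])
cols = np.array([0, 1, 2])

# One index array per dimension: the result has the shape of the
# index arrays, with values gathered pairwise from y.
result = y[rows, cols]

# The same values, gathered one (row, col) pair at a time:
expected = np.array([y[i, j] for i, j in zip(rows, cols)])

print(result)                            # [ 0 15 30]
print(np.array_equal(result, expected))  # True
```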
+
+If the index arrays do not have the same shape, there is an attempt to
+broadcast them to the same shape. If they cannot be broadcast to the
+same shape, an exception is raised: ::
+
+ >>> y[np.array([0,2,4]), np.array([0,1])]
+ Traceback (most recent call last):
+     ...
+ IndexError: shape mismatch: indexing arrays could not be broadcast
+ together with shapes (3,) (2,)
+
+The broadcasting mechanism permits index arrays to be combined with
+scalars for other indices. The effect is that the scalar value is used
+for all the corresponding values of the index arrays: ::
+
+ >>> y[np.array([0,2,4]), 1]
+ array([ 1, 15, 29])
+
+Jumping to the next level of complexity, it is possible to only
+partially index an array with index arrays. It takes a bit of thought
+to understand what happens in such cases. For example, if we just use
+one index array with y: ::
+
+ >>> y[np.array([0,2,4])]
+ array([[ 0,  1,  2,  3,  4,  5,  6],
+        [14, 15, 16, 17, 18, 19, 20],
+        [28, 29, 30, 31, 32, 33, 34]])
+
+What results is the construction of a new array where each value of
+the index array selects one row from the array being indexed and the
+resultant array has the resulting shape (number of index elements,
+size of row).
+
+An example of where this may be useful is for a color lookup table
+where we want to map the values of an image into RGB triples for
+display. The lookup table could have a shape (nlookup, 3). Indexing
+such an array with an image with shape (ny, nx) with dtype=np.uint8
+(or any integer type so long as values are within the bounds of the
+lookup table) will result in an array of shape (ny, nx, 3) where a
+triple of RGB values is associated with each pixel location.
+
+In general, the shape of the resultant array will be the concatenation
+of the shape of the index array (or the shape that all the index arrays
+were broadcast to) with the shape of any unused dimensions (those not
+indexed) in the array being indexed.
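The color-lookup-table use just mentioned can be made concrete. A small sketch, in which the table entries and the tiny "image" are invented purely for illustration:

```python
import numpy as np

# Hypothetical 4-entry RGB lookup table: shape (nlookup, 3).
lut = np.array([[  0,   0,   0],    # index 0 -> black
                [255,   0,   0],    # index 1 -> red
                [  0, 255,   0],    # index 2 -> green
                [  0,   0, 255]],   # index 3 -> blue
               dtype=np.uint8)

# A tiny 2x2 "image" of lookup indices: shape (ny, nx).
image = np.array([[0, 1],
                  [2, 3]], dtype=np.uint8)

# Shape of the result = shape of the index array (2, 2)
# concatenated with the unused dimension of lut (3,).
rgb = lut[image]

print(rgb.shape)   # (2, 2, 3)
print(rgb[1, 1])   # [  0   0 255]
```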
+ +Boolean or "mask" index arrays +============================== + +Boolean arrays used as indices are treated in a different manner +entirely than index arrays. Boolean arrays must be of the same shape +as the initial dimensions of the array being indexed. In the +most straightforward case, the boolean array has the same shape: :: + + >>> b = y>20 + >>> y[b] + array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]) + +Unlike in the case of integer index arrays, in the boolean case, the +result is a 1-D array containing all the elements in the indexed array +corresponding to all the true elements in the boolean array. The +elements in the indexed array are always iterated and returned in +:term:`row-major` (C-style) order. The result is also identical to +``y[np.nonzero(b)]``. As with index arrays, what is returned is a copy +of the data, not a view as one gets with slices. + +The result will be multidimensional if y has more dimensions than b. +For example: :: + + >>> b[:,5] # use a 1-D boolean whose first dim agrees with the first dim of y + array([False, False, False, True, True]) + >>> y[b[:,5]] + array([[21, 22, 23, 24, 25, 26, 27], + [28, 29, 30, 31, 32, 33, 34]]) + +Here the 4th and 5th rows are selected from the indexed array and +combined to make a 2-D array. + +In general, when the boolean array has fewer dimensions than the array +being indexed, this is equivalent to y[b, ...], which means +y is indexed by b followed by as many : as are needed to fill +out the rank of y. +Thus the shape of the result is one dimension containing the number +of True elements of the boolean array, followed by the remaining +dimensions of the array being indexed. 
+
+For example, using a 2-D boolean array of shape (2,3)
+with four True elements to select rows from a 3-D array of shape
+(2,3,5) results in a 2-D result of shape (4,5): ::
+
+ >>> x = np.arange(30).reshape(2,3,5)
+ >>> x
+ array([[[ 0,  1,  2,  3,  4],
+         [ 5,  6,  7,  8,  9],
+         [10, 11, 12, 13, 14]],
+        [[15, 16, 17, 18, 19],
+         [20, 21, 22, 23, 24],
+         [25, 26, 27, 28, 29]]])
+ >>> b = np.array([[True, True, False], [False, True, True]])
+ >>> x[b]
+ array([[ 0,  1,  2,  3,  4],
+        [ 5,  6,  7,  8,  9],
+        [20, 21, 22, 23, 24],
+        [25, 26, 27, 28, 29]])
+
+For further details, consult the numpy reference documentation on array indexing.
+
+Combining index arrays with slices
+==================================
+
+Index arrays may be combined with slices. For example: ::
+
+ >>> y[np.array([0, 2, 4]), 1:3]
+ array([[ 1,  2],
+        [15, 16],
+        [29, 30]])
+
+In effect, the slice and index array operations are independent.
+The slice operation extracts columns with index 1 and 2
+(i.e. the 2nd and 3rd columns),
+followed by the index array operation which extracts rows with
+index 0, 2 and 4 (i.e. the first, third and fifth rows).
+
+This is equivalent to::
+
+ >>> y[:, 1:3][np.array([0, 2, 4]), :]
+ array([[ 1,  2],
+        [15, 16],
+        [29, 30]])
+
+Likewise, slicing can be combined with broadcast boolean indices: ::
+
+ >>> b = y > 20
+ >>> b
+ array([[False, False, False, False, False, False, False],
+        [False, False, False, False, False, False, False],
+        [False, False, False, False, False, False, False],
+        [ True,  True,  True,  True,  True,  True,  True],
+        [ True,  True,  True,  True,  True,  True,  True]])
+ >>> y[b[:,5],1:3]
+ array([[22, 23],
+        [29, 30]])
+
+Structural indexing tools
+=========================
+
+To facilitate easy matching of array shapes with expressions and in
+assignments, the np.newaxis object can be used within array indices
+to add new dimensions with a size of 1.
For example: ::
+
+ >>> y.shape
+ (5, 7)
+ >>> y[:,np.newaxis,:].shape
+ (5, 1, 7)
+
+Note that there are no new elements in the array, just that the
+dimensionality is increased. This can be handy to combine two
+arrays in a way that otherwise would require explicit reshaping
+operations. For example: ::
+
+ >>> x = np.arange(5)
+ >>> x[:,np.newaxis] + x[np.newaxis,:]
+ array([[0, 1, 2, 3, 4],
+        [1, 2, 3, 4, 5],
+        [2, 3, 4, 5, 6],
+        [3, 4, 5, 6, 7],
+        [4, 5, 6, 7, 8]])
+
+The ellipsis syntax may be used to indicate selecting in full any
+remaining unspecified dimensions. For example: ::
+
+ >>> z = np.arange(81).reshape(3,3,3,3)
+ >>> z[1,...,2]
+ array([[29, 32, 35],
+        [38, 41, 44],
+        [47, 50, 53]])
+
+This is equivalent to: ::
+
+ >>> z[1,:,:,2]
+ array([[29, 32, 35],
+        [38, 41, 44],
+        [47, 50, 53]])
+
+Assigning values to indexed arrays
+==================================
+
+As mentioned, one can select a subset of an array to assign to using
+a single index, slices, and index and mask arrays. The value being
+assigned to the indexed array must be shape consistent (the same shape
+or broadcastable to the shape the index produces). For example, it is
+permitted to assign a constant to a slice: ::
+
+ >>> x = np.arange(10)
+ >>> x[2:7] = 1
+
+or an array of the right size: ::
+
+ >>> x[2:7] = np.arange(5)
+
+Note that assignments may result in changes if assigning
+higher types to lower types (like floats to ints) or even
+exceptions (assigning complex to floats or ints): ::
+
+ >>> x[1] = 1.2
+ >>> x[1]
+ 1
+ >>> x[1] = 1.2j
+ TypeError: can't convert complex to int
+
+
+Unlike some of the references (such as array and mask indices)
+assignments are always made to the original data in the array
+(indeed, nothing else would make sense!). Note though, that some
+actions may not work as one may naively expect.
This particular
+example is often surprising to people: ::
+
+ >>> x = np.arange(0, 50, 10)
+ >>> x
+ array([ 0, 10, 20, 30, 40])
+ >>> x[np.array([1, 1, 3, 1])] += 1
+ >>> x
+ array([ 0, 11, 20, 31, 40])
+
+One might expect the element at index 1 to be incremented by 3.
+In fact, it is only incremented by 1. The reason is that
+a new array is extracted from the original (as a temporary) containing
+the values at 1, 1, 3, 1, then the value 1 is added to the temporary,
+and then the temporary is assigned back to the original array. Thus
+the value ``x[1] + 1`` is assigned to ``x[1]`` three times,
+rather than ``x[1]`` being incremented 3 times.
+
+Dealing with variable numbers of indices within programs
+========================================================
+
+The index syntax is very powerful but limiting when dealing with
+a variable number of indices. For example, if you want to write
+a function that can handle arguments with various numbers of
+dimensions without having to write special case code for each
+number of possible dimensions, how can that be done? If one
+supplies a tuple to the index, the tuple will be interpreted
+as a list of indices. For example (using the previous definition
+for the array z): ::
+
+ >>> indices = (1,1,1,1)
+ >>> z[indices]
+ 40
+
+So one can use code to construct tuples of any number of indices
+and then use these within an index.
+
+Slices can be specified within programs by using the slice() function
+in Python. For example: ::
+
+ >>> indices = (1,1,1,slice(0,2)) # same as [1,1,1,0:2]
+ >>> z[indices]
+ array([39, 40])
+
+Likewise, ellipsis can be specified by code by using the Ellipsis
+object: ::
+
+ >>> indices = (1, Ellipsis, 1) # same as [1,...,1]
+ >>> z[indices]
+ array([[28, 31, 34],
+        [37, 40, 43],
+        [46, 49, 52]])
+
+For this reason it is possible to use the output from the np.nonzero()
+function directly as an index since it always returns a tuple of index
+arrays.
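The pieces above can be combined in one short sketch: index tuples built programmatically with ``slice()`` and ``Ellipsis``, and the tuple returned by ``np.nonzero`` used directly as an index (the array ``a`` here is invented for illustration):

```python
import numpy as np

z = np.arange(81).reshape(3, 3, 3, 3)

# Index tuples can be constructed in code and mixed with slice()
# and Ellipsis, exactly as in the doctests above.
idx = (1, 1, 1, slice(0, 2))          # same as z[1, 1, 1, 0:2]
print(z[idx])                         # [39 40]
print(np.array_equal(z[(1, Ellipsis, 1)], z[1, ..., 1]))  # True

# np.nonzero returns a tuple of index arrays (one per dimension),
# so its output is itself a valid index:
a = np.array([[0, 5, 0],
              [3, 0, 7]])
print(a[np.nonzero(a)])               # [5 3 7] -- nonzero elements, row-major
```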
+
+Because of the special treatment of tuples, they are not automatically
+converted to an array as a list would be. As an example: ::
+
+ >>> z[[1,1,1,1]] # produces a large array
+ array([[[[27, 28, 29],
+          [30, 31, 32], ...
+ >>> z[(1,1,1,1)] # returns a single value
+ 40
+
+
diff --git a/doc/source/user/basics.io.genfromtxt.rst b/doc/source/user/basics.io.genfromtxt.rst
index 3fce6a8aa..5364acbe9 100644
--- a/doc/source/user/basics.io.genfromtxt.rst
+++ b/doc/source/user/basics.io.genfromtxt.rst
@@ -28,7 +28,7 @@ Defining the input
 The only mandatory argument of :func:`~numpy.genfromtxt` is the source of
 the data. It can be a string, a list of strings, a generator or an open
-file-like object with a :meth:`read` method, for example, a file or
+file-like object with a ``read`` method, for example, a file or
 :class:`io.StringIO` object. If a single string is provided, it is assumed
 to be the name of a local or remote file. If a list of strings or a generator
 returning strings is provided, each string is treated as one line in a file.
@@ -36,10 +36,10 @@ When the URL of a remote file is passed, the file is automatically downloaded
 to the current directory and opened.

 Recognized file types are text files and archives. Currently, the function
-recognizes :class:`gzip` and :class:`bz2` (`bzip2`) archives. The type of
+recognizes ``gzip`` and ``bz2`` (``bzip2``) archives. The type of
 the archive is determined from the extension of the file: if the filename
-ends with ``'.gz'``, a :class:`gzip` archive is expected; if it ends with
-``'bz2'``, a :class:`bzip2` archive is assumed.
+ends with ``'.gz'``, a ``gzip`` archive is expected; if it ends with
+``'bz2'``, a ``bzip2`` archive is assumed.
@@ -360,9 +360,9 @@ The ``converters`` argument
 Usually, defining a dtype is sufficient to define how the sequence of
 strings must be converted. However, some additional control may sometimes
 be required.
For example, we may want to make sure that a date in a format -``YYYY/MM/DD`` is converted to a :class:`datetime` object, or that a string -like ``xx%`` is properly converted to a float between 0 and 1. In such -cases, we should define conversion functions with the ``converters`` +``YYYY/MM/DD`` is converted to a :class:`~datetime.datetime` object, or that +a string like ``xx%`` is properly converted to a float between 0 and 1. In +such cases, we should define conversion functions with the ``converters`` arguments. The value of this argument is typically a dictionary with column indices or @@ -427,7 +427,7 @@ previous example, we used a converter to transform an empty string into a float. However, user-defined converters may rapidly become cumbersome to manage. -The :func:`~nummpy.genfromtxt` function provides two other complementary +The :func:`~numpy.genfromtxt` function provides two other complementary mechanisms: the ``missing_values`` argument is used to recognize missing data and a second argument, ``filling_values``, is used to process these missing data. @@ -514,15 +514,15 @@ output array will then be a :class:`~numpy.ma.MaskedArray`. Shortcut functions ================== -In addition to :func:`~numpy.genfromtxt`, the :mod:`numpy.lib.io` module +In addition to :func:`~numpy.genfromtxt`, the :mod:`numpy.lib.npyio` module provides several convenience functions derived from :func:`~numpy.genfromtxt`. These functions work the same way as the original, but they have different default values. -:func:`~numpy.recfromtxt` +:func:`~numpy.npyio.recfromtxt` Returns a standard :class:`numpy.recarray` (if ``usemask=False``) or a - :class:`~numpy.ma.MaskedRecords` array (if ``usemaske=True``). The + :class:`~numpy.ma.mrecords.MaskedRecords` array (if ``usemaske=True``). The default dtype is ``dtype=None``, meaning that the types of each column will be automatically determined. 
-:func:`~numpy.recfromcsv` - Like :func:`~numpy.recfromtxt`, but with a default ``delimiter=","``. +:func:`~numpy.npyio.recfromcsv` + Like :func:`~numpy.npyio.recfromtxt`, but with a default ``delimiter=","``. diff --git a/doc/source/user/basics.rec.rst b/doc/source/user/basics.rec.rst index b885c9e77..f579b0d85 100644 --- a/doc/source/user/basics.rec.rst +++ b/doc/source/user/basics.rec.rst @@ -4,10 +4,649 @@ Structured arrays ***************** -.. automodule:: numpy.doc.structured_arrays +Introduction +============ + +Structured arrays are ndarrays whose datatype is a composition of simpler +datatypes organized as a sequence of named :term:`fields <field>`. For example, +:: + + >>> x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)], + ... dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]) + >>> x + array([('Rex', 9, 81.), ('Fido', 3, 27.)], + dtype=[('name', 'U10'), ('age', '<i4'), ('weight', '<f4')]) + +Here ``x`` is a one-dimensional array of length two whose datatype is a +structure with three fields: 1. A string of length 10 or less named 'name', 2. +a 32-bit integer named 'age', and 3. a 32-bit float named 'weight'. + +If you index ``x`` at position 1 you get a structure:: + + >>> x[1] + ('Fido', 3, 27.0) + +You can access and modify individual fields of a structured array by indexing +with the field name:: + + >>> x['age'] + array([9, 3], dtype=int32) + >>> x['age'] = 5 + >>> x + array([('Rex', 5, 81.), ('Fido', 5, 27.)], + dtype=[('name', 'U10'), ('age', '<i4'), ('weight', '<f4')]) + +Structured datatypes are designed to be able to mimic 'structs' in the C +language, and share a similar memory layout. They are meant for interfacing with +C code and for low-level manipulation of structured buffers, for example for +interpreting binary blobs. For these purposes they support specialized features +such as subarrays, nested datatypes, and unions, and allow control over the +memory layout of the structure. 
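The "interpreting binary blobs" use mentioned above can be sketched with Python's ``struct`` module: a structured dtype is matched against bytes laid out like a C struct. The field names and packing format here are invented for illustration:

```python
import struct
import numpy as np

# A hypothetical binary blob: two C-struct-like records of
# (int32 id, float32 score), packed little-endian with no padding.
blob = struct.pack('<if', 1, 0.5) + struct.pack('<if', 2, 1.5)

# A structured dtype with the same layout lets numpy reinterpret
# the raw bytes directly, without copying field by field.
dt = np.dtype([('id', '<i4'), ('score', '<f4')])
records = np.frombuffer(blob, dtype=dt)

print(records['id'])     # [1 2]
print(records['score'])  # [0.5 1.5]
```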
+ +Users looking to manipulate tabular data, such as stored in csv files, may find +other pydata projects more suitable, such as xarray, pandas, or DataArray. +These provide a high-level interface for tabular data analysis and are better +optimized for that use. For instance, the C-struct-like memory layout of +structured arrays in numpy can lead to poor cache behavior in comparison. + +.. _defining-structured-types: + +Structured Datatypes +==================== + +A structured datatype can be thought of as a sequence of bytes of a certain +length (the structure's :term:`itemsize`) which is interpreted as a collection +of fields. Each field has a name, a datatype, and a byte offset within the +structure. The datatype of a field may be any numpy datatype including other +structured datatypes, and it may also be a :term:`subarray data type` which +behaves like an ndarray of a specified shape. The offsets of the fields are +arbitrary, and fields may even overlap. These offsets are usually determined +automatically by numpy, but can also be specified. + +Structured Datatype Creation +---------------------------- + +Structured datatypes may be created using the function :func:`numpy.dtype`. +There are 4 alternative forms of specification which vary in flexibility and +conciseness. These are further documented in the +:ref:`Data Type Objects <arrays.dtypes.constructing>` reference page, and in +summary they are: + +1. A list of tuples, one tuple per field + + Each tuple has the form ``(fieldname, datatype, shape)`` where shape is + optional. ``fieldname`` is a string (or tuple if titles are used, see + :ref:`Field Titles <titles>` below), ``datatype`` may be any object + convertible to a datatype, and ``shape`` is a tuple of integers specifying + subarray shape. 
+ + >>> np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2, 2))]) + dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4', (2, 2))]) + + If ``fieldname`` is the empty string ``''``, the field will be given a + default name of the form ``f#``, where ``#`` is the integer index of the + field, counting from 0 from the left:: + + >>> np.dtype([('x', 'f4'), ('', 'i4'), ('z', 'i8')]) + dtype([('x', '<f4'), ('f1', '<i4'), ('z', '<i8')]) + + The byte offsets of the fields within the structure and the total + structure itemsize are determined automatically. + +2. A string of comma-separated dtype specifications + + In this shorthand notation any of the :ref:`string dtype specifications + <arrays.dtypes.constructing>` may be used in a string and separated by + commas. The itemsize and byte offsets of the fields are determined + automatically, and the field names are given the default names ``f0``, + ``f1``, etc. :: + + >>> np.dtype('i8, f4, S3') + dtype([('f0', '<i8'), ('f1', '<f4'), ('f2', 'S3')]) + >>> np.dtype('3int8, float32, (2, 3)float64') + dtype([('f0', 'i1', (3,)), ('f1', '<f4'), ('f2', '<f8', (2, 3))]) + +3. A dictionary of field parameter arrays + + This is the most flexible form of specification since it allows control + over the byte-offsets of the fields and the itemsize of the structure. + + The dictionary has two required keys, 'names' and 'formats', and four + optional keys, 'offsets', 'itemsize', 'aligned' and 'titles'. The values + for 'names' and 'formats' should respectively be a list of field names and + a list of dtype specifications, of the same length. The optional 'offsets' + value should be a list of integer byte-offsets, one for each field within + the structure. If 'offsets' is not given the offsets are determined + automatically. The optional 'itemsize' value should be an integer + describing the total size in bytes of the dtype, which must be large + enough to contain all the fields. 
+ :: + + >>> np.dtype({'names': ['col1', 'col2'], 'formats': ['i4', 'f4']}) + dtype([('col1', '<i4'), ('col2', '<f4')]) + >>> np.dtype({'names': ['col1', 'col2'], + ... 'formats': ['i4', 'f4'], + ... 'offsets': [0, 4], + ... 'itemsize': 12}) + dtype({'names':['col1','col2'], 'formats':['<i4','<f4'], 'offsets':[0,4], 'itemsize':12}) + + Offsets may be chosen such that the fields overlap, though this will mean + that assigning to one field may clobber any overlapping field's data. As + an exception, fields of :class:`numpy.object` type cannot overlap with + other fields, because of the risk of clobbering the internal object + pointer and then dereferencing it. + + The optional 'aligned' value can be set to ``True`` to make the automatic + offset computation use aligned offsets (see :ref:`offsets-and-alignment`), + as if the 'align' keyword argument of :func:`numpy.dtype` had been set to + True. + + The optional 'titles' value should be a list of titles of the same length + as 'names', see :ref:`Field Titles <titles>` below. + +4. A dictionary of field names + + The use of this form of specification is discouraged, but documented here + because older numpy code may use it. The keys of the dictionary are the + field names and the values are tuples specifying type and offset:: + + >>> np.dtype({'col1': ('i1', 0), 'col2': ('f4', 1)}) + dtype([('col1', 'i1'), ('col2', '<f4')]) + + This form is discouraged because Python dictionaries do not preserve order + in Python versions before Python 3.6, and the order of the fields in a + structured dtype has meaning. :ref:`Field Titles <titles>` may be + specified by using a 3-tuple, see below. 
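The forms of specification above can be compared side by side. A minimal sketch showing that the first three forms produce equivalent dtypes (with the default field names ``f0``, ``f1``), and that the dictionary form additionally allows explicit offsets:

```python
import numpy as np

# The same two-field structure expressed three ways:
dt1 = np.dtype([('f0', 'i4'), ('f1', 'f4')])        # 1. list of tuples
dt2 = np.dtype('i4, f4')                             # 2. comma-separated string
dt3 = np.dtype({'names': ['f0', 'f1'],               # 3. dictionary form
                'formats': ['i4', 'f4']})

print(dt1 == dt2 == dt3)   # True

# Only the dictionary form can control byte offsets and total itemsize:
dt4 = np.dtype({'names': ['f0', 'f1'],
                'formats': ['i4', 'f4'],
                'offsets': [0, 8],
                'itemsize': 16})
print(dt4.itemsize)        # 16
```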
+ +Manipulating and Displaying Structured Datatypes +------------------------------------------------ + +The list of field names of a structured datatype can be found in the ``names`` +attribute of the dtype object:: + + >>> d = np.dtype([('x', 'i8'), ('y', 'f4')]) + >>> d.names + ('x', 'y') + +The field names may be modified by assigning to the ``names`` attribute using a +sequence of strings of the same length. + +The dtype object also has a dictionary-like attribute, ``fields``, whose keys +are the field names (and :ref:`Field Titles <titles>`, see below) and whose +values are tuples containing the dtype and byte offset of each field. :: + + >>> d.fields + mappingproxy({'x': (dtype('int64'), 0), 'y': (dtype('float32'), 8)}) + +Both the ``names`` and ``fields`` attributes will equal ``None`` for +unstructured arrays. The recommended way to test if a dtype is structured is +with `if dt.names is not None` rather than `if dt.names`, to account for dtypes +with 0 fields. + +The string representation of a structured datatype is shown in the "list of +tuples" form if possible, otherwise numpy falls back to using the more general +dictionary form. + +.. _offsets-and-alignment: + +Automatic Byte Offsets and Alignment +------------------------------------ + +Numpy uses one of two methods to automatically determine the field byte offsets +and the overall itemsize of a structured datatype, depending on whether +``align=True`` was specified as a keyword argument to :func:`numpy.dtype`. + +By default (``align=False``), numpy will pack the fields together such that +each field starts at the byte offset the previous field ended, and the fields +are contiguous in memory. :: + + >>> def print_offsets(d): + ... print("offsets:", [d.fields[name][1] for name in d.names]) + ... 
print("itemsize:", d.itemsize) + >>> print_offsets(np.dtype('u1, u1, i4, u1, i8, u2')) + offsets: [0, 1, 2, 6, 7, 15] + itemsize: 17 + +If ``align=True`` is set, numpy will pad the structure in the same way many C +compilers would pad a C-struct. Aligned structures can give a performance +improvement in some cases, at the cost of increased datatype size. Padding +bytes are inserted between fields such that each field's byte offset will be a +multiple of that field's alignment, which is usually equal to the field's size +in bytes for simple datatypes, see :c:member:`PyArray_Descr.alignment`. The +structure will also have trailing padding added so that its itemsize is a +multiple of the largest field's alignment. :: + + >>> print_offsets(np.dtype('u1, u1, i4, u1, i8, u2', align=True)) + offsets: [0, 1, 4, 8, 16, 24] + itemsize: 32 + +Note that although almost all modern C compilers pad in this way by default, +padding in C structs is C-implementation-dependent so this memory layout is not +guaranteed to exactly match that of a corresponding struct in a C program. Some +work may be needed, either on the numpy side or the C side, to obtain exact +correspondence. + +If offsets were specified using the optional ``offsets`` key in the +dictionary-based dtype specification, setting ``align=True`` will check that +each field's offset is a multiple of its size and that the itemsize is a +multiple of the largest field size, and raise an exception if not. + +If the offsets of the fields and itemsize of a structured array satisfy the +alignment conditions, the array will have the ``ALIGNED`` :attr:`flag +<numpy.ndarray.flags>` set. + +A convenience function :func:`numpy.lib.recfunctions.repack_fields` converts an +aligned dtype or array to a packed one and vice versa. It takes either a dtype +or structured ndarray as an argument, and returns a copy with fields re-packed, +with or without padding bytes. + +.. 
_titles: + +Field Titles +------------ + +In addition to field names, fields may also have an associated :term:`title`, +an alternate name, which is sometimes used as an additional description or +alias for the field. The title may be used to index an array, just like a +field name. + +To add titles when using the list-of-tuples form of dtype specification, the +field name may be specified as a tuple of two strings instead of a single +string, which will be the field's title and field name respectively. For +example:: + + >>> np.dtype([(('my title', 'name'), 'f4')]) + dtype([(('my title', 'name'), '<f4')]) + +When using the first form of dictionary-based specification, the titles may be +supplied as an extra ``'titles'`` key as described above. When using the second +(discouraged) dictionary-based specification, the title can be supplied by +providing a 3-element tuple ``(datatype, offset, title)`` instead of the usual +2-element tuple:: + + >>> np.dtype({'name': ('i4', 0, 'my title')}) + dtype([(('my title', 'name'), '<i4')]) + +The ``dtype.fields`` dictionary will contain titles as keys, if any +titles are used. This means effectively that a field with a title will be +represented twice in the fields dictionary. The tuple values for these fields +will also have a third element, the field title. Because of this, and because +the ``names`` attribute preserves the field order while the ``fields`` +attribute may not, it is recommended to iterate through the fields of a dtype +using the ``names`` attribute of the dtype, which will not list titles, as +in:: + + >>> for name in d.names: + ... 
print(d.fields[name][:2]) + (dtype('int64'), 0) + (dtype('float32'), 8) + +Union types +----------- + +Structured datatypes are implemented in numpy to have base type +:class:`numpy.void` by default, but it is possible to interpret other numpy +types as structured types using the ``(base_dtype, dtype)`` form of dtype +specification described in +:ref:`Data Type Objects <arrays.dtypes.constructing>`. Here, ``base_dtype`` is +the desired underlying dtype, and fields and flags will be copied from +``dtype``. This dtype is similar to a 'union' in C. + +Indexing and Assignment to Structured arrays +============================================ + +Assigning data to a Structured Array +------------------------------------ + +There are a number of ways to assign values to a structured array: Using python +tuples, using scalar values, or using other structured arrays. + +Assignment from Python Native Types (Tuples) +```````````````````````````````````````````` + +The simplest way to assign values to a structured array is using python tuples. +Each assigned value should be a tuple of length equal to the number of fields +in the array, and not a list or array as these will trigger numpy's +broadcasting rules. The tuple's elements are assigned to the successive fields +of the array, from left to right:: + + >>> x = np.array([(1, 2, 3), (4, 5, 6)], dtype='i8, f4, f8') + >>> x[1] = (7, 8, 9) + >>> x + array([(1, 2., 3.), (7, 8., 9.)], + dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '<f8')]) + +Assignment from Scalars +``````````````````````` + +A scalar assigned to a structured element will be assigned to all fields. 
This +happens when a scalar is assigned to a structured array, or when an +unstructured array is assigned to a structured array:: + + >>> x = np.zeros(2, dtype='i8, f4, ?, S1') + >>> x[:] = 3 + >>> x + array([(3, 3., True, b'3'), (3, 3., True, b'3')], + dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')]) + >>> x[:] = np.arange(2) + >>> x + array([(0, 0., False, b'0'), (1, 1., True, b'1')], + dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')]) + +Structured arrays can also be assigned to unstructured arrays, but only if the +structured datatype has just a single field:: + + >>> twofield = np.zeros(2, dtype=[('A', 'i4'), ('B', 'i4')]) + >>> onefield = np.zeros(2, dtype=[('A', 'i4')]) + >>> nostruct = np.zeros(2, dtype='i4') + >>> nostruct[:] = twofield + Traceback (most recent call last): + ... + TypeError: Cannot cast array data from dtype([('A', '<i4'), ('B', '<i4')]) to dtype('int32') according to the rule 'unsafe' + +Assignment from other Structured Arrays +``````````````````````````````````````` + +Assignment between two structured arrays occurs as if the source elements had +been converted to tuples and then assigned to the destination elements. That +is, the first field of the source array is assigned to the first field of the +destination array, and the second field likewise, and so on, regardless of +field names. Structured arrays with a different number of fields cannot be +assigned to each other. Bytes of the destination structure which are not +included in any of the fields are unaffected. 
:: + + >>> a = np.zeros(3, dtype=[('a', 'i8'), ('b', 'f4'), ('c', 'S3')]) + >>> b = np.ones(3, dtype=[('x', 'f4'), ('y', 'S3'), ('z', 'O')]) + >>> b[:] = a + >>> b + array([(0., b'0.0', b''), (0., b'0.0', b''), (0., b'0.0', b'')], + dtype=[('x', '<f4'), ('y', 'S3'), ('z', 'O')]) + + +Assignment involving subarrays +`````````````````````````````` + +When assigning to fields which are subarrays, the assigned value will first be +broadcast to the shape of the subarray. + +Indexing Structured Arrays +-------------------------- + +Accessing Individual Fields +``````````````````````````` + +Individual fields of a structured array may be accessed and modified by indexing +the array with the field name. :: + + >>> x = np.array([(1, 2), (3, 4)], dtype=[('foo', 'i8'), ('bar', 'f4')]) + >>> x['foo'] + array([1, 3]) + >>> x['foo'] = 10 + >>> x + array([(10, 2.), (10, 4.)], + dtype=[('foo', '<i8'), ('bar', '<f4')]) + +The resulting array is a view into the original array. It shares the same +memory locations and writing to the view will modify the original array. :: + + >>> y = x['bar'] + >>> y[:] = 11 + >>> x + array([(10, 11.), (10, 11.)], + dtype=[('foo', '<i8'), ('bar', '<f4')]) + +This view has the same dtype and itemsize as the indexed field, so it is +typically a non-structured array, except in the case of nested structures. + + >>> y.dtype, y.shape, y.strides + (dtype('float32'), (2,), (12,)) + +If the accessed field is a subarray, the dimensions of the subarray +are appended to the shape of the result:: + + >>> x = np.zeros((2, 2), dtype=[('a', np.int32), ('b', np.float64, (3, 3))]) + >>> x['a'].shape + (2, 2) + >>> x['b'].shape + (2, 2, 3, 3) + +Accessing Multiple Fields +``````````````````````````` + +One can index and assign to a structured array with a multi-field index, where +the index is a list of field names. + +.. warning:: + The behavior of multi-field indexes changed from Numpy 1.15 to Numpy 1.16. 
+
+The result of indexing with a multi-field index is a view into the original
+array, as follows::
+
+ >>> a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])
+ >>> a[['a', 'c']]
+ array([(0, 0.), (0, 0.), (0, 0.)],
+      dtype={'names':['a','c'], 'formats':['<i4','<f4'], 'offsets':[0,8], 'itemsize':12})
+
+Assignment to the view modifies the original array. The view's fields will be
+in the order they were indexed. Note that unlike for single-field indexing, the
+dtype of the view has the same itemsize as the original array, and has fields
+at the same offsets as in the original array, and unindexed fields are merely
+missing.
+
+.. warning::
+   The behavior of multi-field indexes changed from Numpy 1.15 to Numpy 1.16.
+
+   In Numpy 1.15, indexing an array with a multi-field index returned a copy of
+   the result above, but with fields packed together in memory as if
+   passed through :func:`numpy.lib.recfunctions.repack_fields`.
+
+   The new behavior as of Numpy 1.16 leads to extra "padding" bytes at the
+   location of unindexed fields compared to 1.15. You will need to update any
+   code which depends on the data having a "packed" layout. For instance code
+   such as::
+
+    >>> a[['a', 'c']].view('i8')  # Fails in Numpy 1.16
+    Traceback (most recent call last):
+       File "<stdin>", line 1, in <module>
+    ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original dtype
+
+   will need to be changed. This code has raised a ``FutureWarning`` since
+   Numpy 1.12, and similar code has raised ``FutureWarning`` since 1.7.
+
+   In 1.16 a number of functions have been introduced in the
+   :mod:`numpy.lib.recfunctions` module to help users account for this
+   change. These are
+   :func:`numpy.lib.recfunctions.repack_fields`,
+   :func:`numpy.lib.recfunctions.structured_to_unstructured`,
+   :func:`numpy.lib.recfunctions.unstructured_to_structured`,
+   :func:`numpy.lib.recfunctions.apply_along_fields`,
+   :func:`numpy.lib.recfunctions.assign_fields_by_name`, and
+   :func:`numpy.lib.recfunctions.require_fields`.
+
+   The function :func:`numpy.lib.recfunctions.repack_fields` can always be
+   used to reproduce the old behavior, as it will return a packed copy of the
+   structured array. The code above, for example, can be replaced with:
+
+   >>> from numpy.lib.recfunctions import repack_fields
+   >>> repack_fields(a[['a', 'c']]).view('i8')  # supported in 1.16
+   array([0, 0, 0])
+
+   Furthermore, numpy now provides a new function
+   :func:`numpy.lib.recfunctions.structured_to_unstructured` which is a safer
+   and more efficient alternative for users who wish to convert structured
+   arrays to unstructured arrays, as the view above is often intended to do.
+   This function allows safe conversion to an unstructured type taking into
+   account padding, often avoids a copy, and also casts the datatypes
+   as needed, unlike the view. Code such as:
+
+   >>> b = np.zeros(3, dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')])
+   >>> b[['x', 'z']].view('f4')
+   array([0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
+
+   can be made safer by replacing with:
+
+   >>> from numpy.lib.recfunctions import structured_to_unstructured
+   >>> structured_to_unstructured(b[['x', 'z']])
+   array([[0., 0.],
+          [0., 0.],
+          [0., 0.]], dtype=float32)
+
+
+Assignment to an array with a multi-field index modifies the original array::
+
+ >>> a[['a', 'c']] = (2, 3)
+ >>> a
+ array([(2, 0, 3.), (2, 0, 3.), (2, 0, 3.)],
+      dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<f4')])
+
+This obeys the structured array assignment rules described above. For example,
+this means that one can swap the values of two fields using appropriate
+multi-field indexes::
+
+ >>> a[['a', 'c']] = a[['c', 'a']]
+
+Indexing with an Integer to get a Structured Scalar
+```````````````````````````````````````````````````
+
+Indexing a single element of a structured array (with an integer index) returns
+a structured scalar::
+
+ >>> x = np.array([(1, 2., 3.)], dtype='i, f, f')
+ >>> scalar = x[0]
+ >>> scalar
+ (1, 2., 3.)
+ >>> type(scalar) + <class 'numpy.void'> + +Unlike other numpy scalars, structured scalars are mutable and act like views +into the original array, such that modifying the scalar will modify the +original array. Structured scalars also support access and assignment by field +name:: + + >>> x = np.array([(1, 2), (3, 4)], dtype=[('foo', 'i8'), ('bar', 'f4')]) + >>> s = x[0] + >>> s['bar'] = 100 + >>> x + array([(1, 100.), (3, 4.)], + dtype=[('foo', '<i8'), ('bar', '<f4')]) + +Similarly to tuples, structured scalars can also be indexed with an integer:: + + >>> scalar = np.array([(1, 2., 3.)], dtype='i, f, f')[0] + >>> scalar[0] + 1 + >>> scalar[1] = 4 + +Thus, tuples might be thought of as the native Python equivalent to numpy's +structured types, much like native python integers are the equivalent to +numpy's integer types. Structured scalars may be converted to a tuple by +calling `numpy.ndarray.item`:: + + >>> scalar.item(), type(scalar.item()) + ((1, 4.0, 3.0), <class 'tuple'>) + +Viewing Structured Arrays Containing Objects +-------------------------------------------- + +In order to prevent clobbering object pointers in fields of +:class:`numpy.object` type, numpy currently does not allow views of structured +arrays containing objects. + +Structure Comparison +-------------------- + +If the dtypes of two void structured arrays are equal, testing the equality of +the arrays will result in a boolean array with the dimensions of the original +arrays, with elements set to ``True`` where all fields of the corresponding +structures are equal. Structured dtypes are equal if the field names, +dtypes and titles are the same, ignoring endianness, and the fields are in +the same order:: + + >>> a = np.zeros(2, dtype=[('a', 'i4'), ('b', 'i4')]) + >>> b = np.ones(2, dtype=[('a', 'i4'), ('b', 'i4')]) + >>> a == b + array([False, False]) + +Currently, if the dtypes of two void structured arrays are not equivalent the +comparison fails, returning the scalar value ``False``. 
This behavior is
+deprecated as of numpy 1.10 and will raise an error or perform elementwise
+comparison in the future.
+
+The ``<`` and ``>`` operators always return ``False`` when comparing void
+structured arrays, and arithmetic and bitwise operations are not supported.
+
+Record Arrays
+=============
+
+As an optional convenience numpy provides an ndarray subclass,
+:class:`numpy.recarray`, and associated helper functions in the
+:mod:`numpy.rec` submodule, that allow access to fields of structured arrays
+by attribute instead of only by index.
+Record arrays also use a special datatype, :class:`numpy.record`, that allows
+field access by attribute on the structured scalars obtained from the array.
+
+The simplest way to create a record array is with ``numpy.rec.array``::
+
+ >>> recordarr = np.rec.array([(1, 2., 'Hello'), (2, 3., "World")],
+ ...                    dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')])
+ >>> recordarr.bar
+ array([ 2., 3.], dtype=float32)
+ >>> recordarr[1:2]
+ rec.array([(2, 3., b'World')],
+       dtype=[('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')])
+ >>> recordarr[1:2].foo
+ array([2], dtype=int32)
+ >>> recordarr.foo[1:2]
+ array([2], dtype=int32)
+ >>> recordarr[1].baz
+ b'World'
+
+:func:`numpy.rec.array` can convert a wide variety of arguments into record
+arrays, including structured arrays::
+
+ >>> arr = np.array([(1, 2., 'Hello'), (2, 3., "World")],
+ ...             dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])
+ >>> recordarr = np.rec.array(arr)
+
+The :mod:`numpy.rec` module provides a number of other convenience functions for
+creating record arrays, see :ref:`record array creation routines
+<routines.array-creation.rec>`.
+
+A record array representation of a structured array can be obtained using the
+appropriate `view <numpy-ndarray-view>`_::
+
+ >>> arr = np.array([(1, 2., 'Hello'), (2, 3., "World")],
+ ...                dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'a10')])
+ >>> recordarr = arr.view(dtype=np.dtype((np.record, arr.dtype)),
+ ...                      type=np.recarray)
+
+For convenience, viewing an ndarray as type :class:`np.recarray` will
+automatically convert to :class:`np.record` datatype, so the dtype can be left
+out of the view::
+
+ >>> recordarr = arr.view(np.recarray)
+ >>> recordarr.dtype
+ dtype((numpy.record, [('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')]))
+
+To get back to a plain ndarray both the dtype and type must be reset. The
+following view does so, taking into account the unusual case that the
+recordarr was not a structured type::
+
+ >>> arr2 = recordarr.view(recordarr.dtype.fields or recordarr.dtype, np.ndarray)
+
+Record array fields accessed by index or by attribute are returned as a record
+array if the field has a structured type but as a plain ndarray otherwise. ::
+
+ >>> recordarr = np.rec.array([('Hello', (1, 2)), ("World", (3, 4))],
+ ...                 dtype=[('foo', 'S6'),('bar', [('A', int), ('B', int)])])
+ >>> type(recordarr.foo)
+ <class 'numpy.ndarray'>
+ >>> type(recordarr.bar)
+ <class 'numpy.recarray'>
+
+Note that if a field has the same name as an ndarray attribute, the ndarray
+attribute takes precedence. Such fields will be inaccessible by attribute but
+will still be accessible by index.
+
 Recarray Helper Functions
-*************************
+-------------------------
 
 .. automodule:: numpy.lib.recfunctions
    :members:
diff --git a/doc/source/user/basics.subclassing.rst b/doc/source/user/basics.subclassing.rst
index 43315521c..d8d104220 100644
--- a/doc/source/user/basics.subclassing.rst
+++ b/doc/source/user/basics.subclassing.rst
@@ -4,4 +4,751 @@ Subclassing ndarray
 *******************
 
-.. automodule:: numpy.doc.subclassing
+Introduction
+------------
+
+Subclassing ndarray is relatively simple, but it has some complications
+compared to other Python objects. 
On this page we explain the machinery +that allows you to subclass ndarray, and the implications for +implementing a subclass. + +ndarrays and object creation +============================ + +Subclassing ndarray is complicated by the fact that new instances of +ndarray classes can come about in three different ways. These are: + +#. Explicit constructor call - as in ``MySubClass(params)``. This is + the usual route to Python instance creation. +#. View casting - casting an existing ndarray as a given subclass +#. New from template - creating a new instance from a template + instance. Examples include returning slices from a subclassed array, + creating return types from ufuncs, and copying arrays. See + :ref:`new-from-template` for more details + +The last two are characteristics of ndarrays - in order to support +things like array slicing. The complications of subclassing ndarray are +due to the mechanisms numpy has to support these latter two routes of +instance creation. + +.. _view-casting: + +View casting +------------ + +*View casting* is the standard ndarray mechanism by which you take an +ndarray of any subclass, and return a view of the array as another +(specified) subclass: + +>>> import numpy as np +>>> # create a completely useless ndarray subclass +>>> class C(np.ndarray): pass +>>> # create a standard ndarray +>>> arr = np.zeros((3,)) +>>> # take a view of it, as our useless subclass +>>> c_arr = arr.view(C) +>>> type(c_arr) +<class 'C'> + +.. _new-from-template: + +Creating new from template +-------------------------- + +New instances of an ndarray subclass can also come about by a very +similar mechanism to :ref:`view-casting`, when numpy finds it needs to +create a new instance from a template instance. The most obvious place +this has to happen is when you are taking slices of subclassed arrays. 
+
+For example:
+
+>>> v = c_arr[1:]
+>>> type(v) # the view is of type 'C'
+<class 'C'>
+>>> v is c_arr # but it's a new instance
+False
+
+The slice is a *view* onto the original ``c_arr`` data. So, when we
+take a view from the ndarray, we return a new ndarray, of the same
+class, that points to the data in the original.
+
+There are other points in the use of ndarrays where we need such views,
+such as copying arrays (``c_arr.copy()``), creating ufunc output arrays
+(see also :ref:`array-wrap`), and reducing methods (like
+``c_arr.mean()``).
+
+Relationship of view casting and new-from-template
+--------------------------------------------------
+
+These paths both use the same machinery. We make the distinction here,
+because they result in different input to your methods. Specifically,
+:ref:`view-casting` means you have created a new instance of your array
+type from any potential subclass of ndarray. :ref:`new-from-template`
+means you have created a new instance of your class from a pre-existing
+instance, allowing you - for example - to copy across attributes that
+are particular to your subclass.
+
+Implications for subclassing
+----------------------------
+
+If we subclass ndarray, we need to deal not only with explicit
+construction of our array type, but also :ref:`view-casting` or
+:ref:`new-from-template`. NumPy has the machinery to do this, and it is
+this machinery that makes subclassing slightly non-standard.
+
+There are two aspects to the machinery that ndarray uses to support
+views and new-from-template in subclasses.
+
+The first is the use of the ``ndarray.__new__`` method for the main work
+of object initialization, rather than the more usual ``__init__``
+method. The second is the use of the ``__array_finalize__`` method to
+allow subclasses to clean up after the creation of views and new
+instances from templates.
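The practical difference between the two routes shows up in ``__array_finalize__``: under view casting it receives a plain ndarray and has to fall back to a default, while under new-from-template it receives an instance of our own class whose attributes can be copied across. A minimal sketch of this idea (the ``Quantity`` class and its ``unit`` attribute are illustrative only, not part of NumPy):

```python
import numpy as np

class Quantity(np.ndarray):
    """Sketch of a subclass carrying a hypothetical ``unit`` attribute."""

    def __array_finalize__(self, obj):
        # View casting: obj is a plain ndarray with no ``unit``, so we
        # fall back to a default.  New-from-template: obj is a Quantity
        # and supplies its own ``unit``.
        self.unit = getattr(obj, 'unit', 'dimensionless')

q = np.arange(3.0).view(Quantity)  # view casting from a plain ndarray
print(q.unit)                      # prints "dimensionless": nothing to copy
q.unit = 'm'
print(q[1:].unit)                  # prints "m": new-from-template copied it
```

The ``InfoArray`` example later on this page develops the same pattern in full.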
+ +A brief Python primer on ``__new__`` and ``__init__`` +===================================================== + +``__new__`` is a standard Python method, and, if present, is called +before ``__init__`` when we create a class instance. See the `python +__new__ documentation +<https://docs.python.org/reference/datamodel.html#object.__new__>`_ for more detail. + +For example, consider the following Python code: + +.. testcode:: + + class C: + def __new__(cls, *args): + print('Cls in __new__:', cls) + print('Args in __new__:', args) + # The `object` type __new__ method takes a single argument. + return object.__new__(cls) + + def __init__(self, *args): + print('type(self) in __init__:', type(self)) + print('Args in __init__:', args) + +meaning that we get: + +>>> c = C('hello') +Cls in __new__: <class 'C'> +Args in __new__: ('hello',) +type(self) in __init__: <class 'C'> +Args in __init__: ('hello',) + +When we call ``C('hello')``, the ``__new__`` method gets its own class +as first argument, and the passed argument, which is the string +``'hello'``. After python calls ``__new__``, it usually (see below) +calls our ``__init__`` method, with the output of ``__new__`` as the +first argument (now a class instance), and the passed arguments +following. + +As you can see, the object can be initialized in the ``__new__`` +method or the ``__init__`` method, or both, and in fact ndarray does +not have an ``__init__`` method, because all the initialization is +done in the ``__new__`` method. + +Why use ``__new__`` rather than just the usual ``__init__``? Because +in some cases, as for ndarray, we want to be able to return an object +of some other class. Consider the following: + +.. 
testcode::
+
+    class D(C):
+        def __new__(cls, *args):
+            print('D cls is:', cls)
+            print('D args in __new__:', args)
+            return C.__new__(C, *args)
+
+        def __init__(self, *args):
+            # we never get here
+            print('In D __init__')
+
+meaning that:
+
+>>> obj = D('hello')
+D cls is: <class 'D'>
+D args in __new__: ('hello',)
+Cls in __new__: <class 'C'>
+Args in __new__: ('hello',)
+>>> type(obj)
+<class 'C'>
+
+The definition of ``C`` is the same as before, but for ``D``, the
+``__new__`` method returns an instance of class ``C`` rather than
+``D``. Note that the ``__init__`` method of ``D`` does not get
+called. In general, when the ``__new__`` method returns an object of
+class other than the class in which it is defined, the ``__init__``
+method of that class is not called.
+
+This is how subclasses of the ndarray class are able to return views
+that preserve the class type. When taking a view, the standard
+ndarray machinery creates the new ndarray object with something
+like::
+
+  obj = ndarray.__new__(subtype, shape, ...
+
+where ``subtype`` is the subclass. Thus the returned view is of the
+same class as the subclass, rather than being of class ``ndarray``.
+
+That solves the problem of returning views of the same type, but now
+we have a new problem. The machinery of ndarray can set the class
+this way, in its standard methods for taking views, but the ndarray
+``__new__`` method knows nothing of what we have done in our own
+``__new__`` method in order to set attributes, and so on. (Aside -
+why not call ``obj = subtype.__new__(...`` then? Because we may not
+have a ``__new__`` method with the same call signature).
+
+The role of ``__array_finalize__``
+==================================
+
+``__array_finalize__`` is the mechanism that numpy provides to allow
+subclasses to handle the various ways that new instances get created.
+
+Remember that subclass instances can come about in these three ways:
+
+#. explicit constructor call (``obj = MySubClass(params)``). 
This will
+   call the usual sequence of ``MySubClass.__new__`` then (if it exists)
+   ``MySubClass.__init__``.
+#. :ref:`view-casting`
+#. :ref:`new-from-template`
+
+Our ``MySubClass.__new__`` method only gets called in the case of the
+explicit constructor call, so we can't rely on ``MySubClass.__new__`` or
+``MySubClass.__init__`` to deal with the view casting and
+new-from-template. It turns out that ``MySubClass.__array_finalize__``
+*does* get called for all three methods of object creation, so this is
+where our object creation housekeeping usually goes.
+
+* For the explicit constructor call, our subclass will need to create a
+  new ndarray instance of its own class. In practice this means that
+  we, the authors of the code, will need to make a call to
+  ``ndarray.__new__(MySubClass,...)``, a class-hierarchy prepared call to
+  ``super(MySubClass, cls).__new__(cls, ...)``, or do view casting of an
+  existing array (see below)
+* For view casting and new-from-template, the equivalent of
+  ``ndarray.__new__(MySubClass,...`` is called, at the C level.
+
+The arguments that ``__array_finalize__`` receives differ for the three
+methods of instance creation above.
+
+The following code allows us to look at the call sequences and arguments:
+
+.. testcode::
+
+    import numpy as np
+
+    class C(np.ndarray):
+        def __new__(cls, *args, **kwargs):
+            print('In __new__ with class %s' % cls)
+            return super(C, cls).__new__(cls, *args, **kwargs)
+
+        def __init__(self, *args, **kwargs):
+            # in practice you probably will not need or want an __init__
+            # method for your subclass
+            print('In __init__ with class %s' % self.__class__)
+
+        def __array_finalize__(self, obj):
+            print('In array_finalize:')
+            print('   self type is %s' % type(self))
+            print('   obj type is %s' % type(obj))
+
+
+Now:
+
+>>> # Explicit constructor
+>>> c = C((10,))
+In __new__ with class <class 'C'>
+In array_finalize:
+   self type is <class 'C'>
+   obj type is <class 'NoneType'>
+In __init__ with class <class 'C'>
+>>> # View casting
+>>> a = np.arange(10)
+>>> cast_a = a.view(C)
+In array_finalize:
+   self type is <class 'C'>
+   obj type is <class 'numpy.ndarray'>
+>>> # Slicing (example of new-from-template)
+>>> cv = c[:1]
+In array_finalize:
+   self type is <class 'C'>
+   obj type is <class 'C'>
+
+The signature of ``__array_finalize__`` is::
+
+    def __array_finalize__(self, obj):
+
+One sees that the ``super`` call, which goes to
+``ndarray.__new__``, passes ``__array_finalize__`` the new object, of our
+own class (``self``) as well as the object from which the view has been
+taken (``obj``). As you can see from the output above, the ``self`` is
+always a newly created instance of our subclass, and the type of ``obj``
+differs for the three instance creation methods:
+
+* When called from the explicit constructor, ``obj`` is ``None``
+* When called from view casting, ``obj`` can be an instance of any
+  subclass of ndarray, including our own.
+* When called in new-from-template, ``obj`` is another instance of our
+  own subclass, that we might use to update the new ``self`` instance.
+ +Because ``__array_finalize__`` is the only method that always sees new +instances being created, it is the sensible place to fill in instance +defaults for new object attributes, among other tasks. + +This may be clearer with an example. + +Simple example - adding an extra attribute to ndarray +----------------------------------------------------- + +.. testcode:: + + import numpy as np + + class InfoArray(np.ndarray): + + def __new__(subtype, shape, dtype=float, buffer=None, offset=0, + strides=None, order=None, info=None): + # Create the ndarray instance of our type, given the usual + # ndarray input arguments. This will call the standard + # ndarray constructor, but return an object of our type. + # It also triggers a call to InfoArray.__array_finalize__ + obj = super(InfoArray, subtype).__new__(subtype, shape, dtype, + buffer, offset, strides, + order) + # set the new 'info' attribute to the value passed + obj.info = info + # Finally, we must return the newly created object: + return obj + + def __array_finalize__(self, obj): + # ``self`` is a new object resulting from + # ndarray.__new__(InfoArray, ...), therefore it only has + # attributes that the ndarray.__new__ constructor gave it - + # i.e. those of a standard ndarray. + # + # We could have got to the ndarray.__new__ call in 3 ways: + # From an explicit constructor - e.g. InfoArray(): + # obj is None + # (we're in the middle of the InfoArray.__new__ + # constructor, and self.info will be set when we return to + # InfoArray.__new__) + if obj is None: return + # From view casting - e.g arr.view(InfoArray): + # obj is arr + # (type(obj) can be InfoArray) + # From new-from-template - e.g infoarr[:3] + # type(obj) is InfoArray + # + # Note that it is here, rather than in the __new__ method, + # that we set the default value for 'info', because this + # method sees all creation of default objects - with the + # InfoArray.__new__ constructor, but also with + # arr.view(InfoArray). 
+ self.info = getattr(obj, 'info', None) + # We do not need to return anything + + +Using the object looks like this: + + >>> obj = InfoArray(shape=(3,)) # explicit constructor + >>> type(obj) + <class 'InfoArray'> + >>> obj.info is None + True + >>> obj = InfoArray(shape=(3,), info='information') + >>> obj.info + 'information' + >>> v = obj[1:] # new-from-template - here - slicing + >>> type(v) + <class 'InfoArray'> + >>> v.info + 'information' + >>> arr = np.arange(10) + >>> cast_arr = arr.view(InfoArray) # view casting + >>> type(cast_arr) + <class 'InfoArray'> + >>> cast_arr.info is None + True + +This class isn't very useful, because it has the same constructor as the +bare ndarray object, including passing in buffers and shapes and so on. +We would probably prefer the constructor to be able to take an already +formed ndarray from the usual numpy calls to ``np.array`` and return an +object. + +Slightly more realistic example - attribute added to existing array +------------------------------------------------------------------- + +Here is a class that takes a standard ndarray that already exists, casts +as our type, and adds an extra attribute. + +.. testcode:: + + import numpy as np + + class RealisticInfoArray(np.ndarray): + + def __new__(cls, input_array, info=None): + # Input array is an already formed ndarray instance + # We first cast to be our class type + obj = np.asarray(input_array).view(cls) + # add the new attribute to the created instance + obj.info = info + # Finally, we must return the newly created object: + return obj + + def __array_finalize__(self, obj): + # see InfoArray.__array_finalize__ for comments + if obj is None: return + self.info = getattr(obj, 'info', None) + + +So: + + >>> arr = np.arange(5) + >>> obj = RealisticInfoArray(arr, info='information') + >>> type(obj) + <class 'RealisticInfoArray'> + >>> obj.info + 'information' + >>> v = obj[1:] + >>> type(v) + <class 'RealisticInfoArray'> + >>> v.info + 'information' + +.. 
_array-ufunc:
+
+``__array_ufunc__`` for ufuncs
+------------------------------
+
+ .. versionadded:: 1.13
+
+A subclass can override what happens when executing numpy ufuncs on it by
+overriding the default ``ndarray.__array_ufunc__`` method. This method is
+executed *instead* of the ufunc and should return either the result of the
+operation, or :obj:`NotImplemented` if the operation requested is not
+implemented.
+
+The signature of ``__array_ufunc__`` is::
+
+    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
+
+  - *ufunc* is the ufunc object that was called.
+  - *method* is a string indicating how the Ufunc was called, either
+    ``"__call__"`` to indicate it was called directly, or one of its
+    :ref:`methods<ufuncs.methods>`: ``"reduce"``, ``"accumulate"``,
+    ``"reduceat"``, ``"outer"``, or ``"at"``.
+  - *inputs* is a tuple of the input arguments to the ``ufunc``
+  - *kwargs* contains any optional or keyword arguments passed to the
+    function. This includes any ``out`` arguments, which are always
+    contained in a tuple.
+
+A typical implementation would convert any inputs or outputs that are
+instances of one's own class, pass everything on to a superclass using
+``super()``, and finally return the results after possible
+back-conversion. An example, taken from the test case
+``test_ufunc_override_with_super`` in ``core/tests/test_umath.py``, is the
+following.
+
+.. testcode::
+
+    import numpy as np
+
+    class A(np.ndarray):
+        def __array_ufunc__(self, ufunc, method, *inputs, out=None, **kwargs):
+            args = []
+            in_no = []
+            for i, input_ in enumerate(inputs):
+                if isinstance(input_, A):
+                    in_no.append(i)
+                    args.append(input_.view(np.ndarray))
+                else:
+                    args.append(input_)
+
+            outputs = out
+            out_no = []
+            if outputs:
+                out_args = []
+                for j, output in enumerate(outputs):
+                    if isinstance(output, A):
+                        out_no.append(j)
+                        out_args.append(output.view(np.ndarray))
+                    else:
+                        out_args.append(output)
+                kwargs['out'] = tuple(out_args)
+            else:
+                outputs = (None,) * ufunc.nout
+
+            info = {}
+            if in_no:
+                info['inputs'] = in_no
+            if out_no:
+                info['outputs'] = out_no
+
+            results = super(A, self).__array_ufunc__(ufunc, method,
+                                                     *args, **kwargs)
+            if results is NotImplemented:
+                return NotImplemented
+
+            if method == 'at':
+                if isinstance(inputs[0], A):
+                    inputs[0].info = info
+                return
+
+            if ufunc.nout == 1:
+                results = (results,)
+
+            results = tuple((np.asarray(result).view(A)
+                             if output is None else output)
+                            for result, output in zip(results, outputs))
+            if results and isinstance(results[0], A):
+                results[0].info = info
+
+            return results[0] if len(results) == 1 else results
+
+So, this class does not actually do anything interesting: it just
+converts any instances of its own to regular ndarray (otherwise, we'd
+get infinite recursion!), and adds an ``info`` dictionary that tells
+which inputs and outputs it converted. Hence, e.g.,
+
+>>> a = np.arange(5.).view(A)
+>>> b = np.sin(a)
+>>> b.info
+{'inputs': [0]}
+>>> b = np.sin(np.arange(5.), out=(a,))
+>>> b.info
+{'outputs': [0]}
+>>> a = np.arange(5.).view(A)
+>>> b = np.ones(1).view(A)
+>>> c = a + b
+>>> c.info
+{'inputs': [0, 1]}
+>>> a += b
+>>> a.info
+{'inputs': [0, 1], 'outputs': [0]}
+
+Note that another approach would be to use ``getattr(ufunc,
+method)(*inputs, **kwargs)`` instead of the ``super`` call. 
For this example,
+the result would be identical, but there is a difference if another operand
+also defines ``__array_ufunc__``. E.g., let's assume that we evaluate
+``np.add(a, b)``, where ``b`` is an instance of another class ``B`` that has
+an override. If you use ``super`` as in the example,
+``ndarray.__array_ufunc__`` will notice that ``b`` has an override, which
+means it cannot evaluate the result itself. Thus, it will return
+`NotImplemented` and so will our class ``A``. Then, control will be passed
+over to ``b``, which either knows how to deal with us and produces a result,
+or does not and returns `NotImplemented`, raising a ``TypeError``.
+
+If instead, we replace our ``super`` call with ``getattr(ufunc, method)``, we
+effectively do ``np.add(a.view(np.ndarray), b)``. Again, ``B.__array_ufunc__``
+will be called, but now it sees an ``ndarray`` as the other argument. Likely,
+it will know how to handle this, and return a new instance of the ``B`` class
+to us. Our example class is not set up to handle this, but it might well be
+the best approach if, e.g., one were to re-implement ``MaskedArray`` using
+``__array_ufunc__``.
+
+As a final note: if the ``super`` route is suited to a given class, an
+advantage of using it is that it helps in constructing class hierarchies.
+E.g., suppose that our other class ``B`` also used the ``super`` in its
+``__array_ufunc__`` implementation, and we created a class ``C`` that depended
+on both, i.e., ``class C(A, B)`` (with, for simplicity, not another
+``__array_ufunc__`` override). Then any ufunc on an instance of ``C`` would
+pass on to ``A.__array_ufunc__``, the ``super`` call in ``A`` would go to
+``B.__array_ufunc__``, and the ``super`` call in ``B`` would go to
+``ndarray.__array_ufunc__``, thus allowing ``A`` and ``B`` to collaborate.
+
+.. _array-wrap:
+
+``__array_wrap__`` for ufuncs and other functions
+-------------------------------------------------
+
+Prior to numpy 1.13, the behaviour of ufuncs could only be tuned using
+``__array_wrap__`` and ``__array_prepare__``. These two allowed one to
+change the output type of a ufunc, but, in contrast to
+``__array_ufunc__``, did not allow one to make any changes to the inputs.
+It is hoped to eventually deprecate these, but ``__array_wrap__`` is also
+used by other numpy functions and methods, such as ``squeeze``, so at the
+present time is still needed for full functionality.
+
+Conceptually, ``__array_wrap__`` "wraps up the action" in the sense of
+allowing a subclass to set the type of the return value and update
+attributes and metadata. Let's show how this works with an example. First
+we return to the simpler example subclass, but with a different name and
+some print statements:
+
+.. testcode::
+
+    import numpy as np
+
+    class MySubClass(np.ndarray):
+
+        def __new__(cls, input_array, info=None):
+            obj = np.asarray(input_array).view(cls)
+            obj.info = info
+            return obj
+
+        def __array_finalize__(self, obj):
+            print('In __array_finalize__:')
+            print('   self is %s' % repr(self))
+            print('   obj is %s' % repr(obj))
+            if obj is None: return
+            self.info = getattr(obj, 'info', None)
+
+        def __array_wrap__(self, out_arr, context=None):
+            print('In __array_wrap__:')
+            print('   self is %s' % repr(self))
+            print('   arr is %s' % repr(out_arr))
+            # then just call the parent
+            return super(MySubClass, self).__array_wrap__(out_arr, context)
+
+We run a ufunc on an instance of our new array:
+
+>>> obj = MySubClass(np.arange(5), info='spam')
+In __array_finalize__:
+   self is MySubClass([0, 1, 2, 3, 4])
+   obj is array([0, 1, 2, 3, 4])
+>>> arr2 = np.arange(5)+1
+>>> ret = np.add(arr2, obj)
+In __array_wrap__:
+   self is MySubClass([0, 1, 2, 3, 4])
+   arr is array([1, 3, 5, 7, 9])
+In __array_finalize__:
+   self is MySubClass([1, 3, 5, 7, 9])
+   obj is MySubClass([0, 1, 2, 3, 4])
+>>> ret
+MySubClass([1, 3, 5, 7, 9])
+>>> ret.info
+'spam'
+
+Note that the ufunc (``np.add``) has called the ``__array_wrap__`` method
+with arguments ``self`` as ``obj``, and ``out_arr`` as the (ndarray) result
+of the addition. In turn, the default ``__array_wrap__``
+(``ndarray.__array_wrap__``) has cast the result to class ``MySubClass``,
+and called ``__array_finalize__`` - hence the copying of the ``info``
+attribute. This has all happened at the C level.
+
+But, we could do anything we wanted:
+
+.. testcode::
+
+    class SillySubClass(np.ndarray):
+
+        def __array_wrap__(self, arr, context=None):
+            return 'I lost your data'
+
+>>> arr1 = np.arange(5)
+>>> obj = arr1.view(SillySubClass)
+>>> arr2 = np.arange(5)
+>>> ret = np.multiply(obj, arr2)
+>>> ret
+'I lost your data'
+
+So, by defining a specific ``__array_wrap__`` method for our subclass,
+we can tweak the output from ufuncs. The ``__array_wrap__`` method
+requires ``self``, then an argument - which is the result of the ufunc -
+and an optional parameter *context*. This parameter is returned by
+ufuncs as a 3-element tuple: (name of the ufunc, arguments of the ufunc,
+domain of the ufunc), but is not set by other numpy functions. Though it
+is possible to do otherwise, as seen above, ``__array_wrap__`` should
+return an instance of its containing class. See the masked array
+subclass for an implementation.
+
+In addition to ``__array_wrap__``, which is called on the way out of the
+ufunc, there is also an ``__array_prepare__`` method which is called on
+the way into the ufunc, after the output arrays are created but before any
+computation has been performed. The default implementation does nothing
+but pass through the array. 
``__array_prepare__`` should not attempt to +access the array data or resize the array, it is intended for setting the +output array type, updating attributes and metadata, and performing any +checks based on the input that may be desired before computation begins. +Like ``__array_wrap__``, ``__array_prepare__`` must return an ndarray or +subclass thereof or raise an error. + +Extra gotchas - custom ``__del__`` methods and ndarray.base +----------------------------------------------------------- + +One of the problems that ndarray solves is keeping track of memory +ownership of ndarrays and their views. Consider the case where we have +created an ndarray, ``arr`` and have taken a slice with ``v = arr[1:]``. +The two objects are looking at the same memory. NumPy keeps track of +where the data came from for a particular array or view, with the +``base`` attribute: + +>>> # A normal ndarray, that owns its own data +>>> arr = np.zeros((4,)) +>>> # In this case, base is None +>>> arr.base is None +True +>>> # We take a view +>>> v1 = arr[1:] +>>> # base now points to the array that it derived from +>>> v1.base is arr +True +>>> # Take a view of a view +>>> v2 = v1[1:] +>>> # base points to the original array that it was derived from +>>> v2.base is arr +True + +In general, if the array owns its own memory, as for ``arr`` in this +case, then ``arr.base`` will be None - there are some exceptions to this +- see the numpy book for more details. + +The ``base`` attribute is useful in being able to tell whether we have +a view or the original array. This in turn can be useful if we need +to know whether or not to do some specific cleanup when the subclassed +array is deleted. For example, we may only want to do the cleanup if +the original array is deleted, but not the views. For an example of +how this can work, have a look at the ``memmap`` class in +``numpy.core``. 
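The view-versus-owner check described above can be condensed into a small helper; ``owns_data`` is an illustrative name, not a NumPy API:

```python
import numpy as np

def owns_data(a):
    # An array that owns its memory has base None; a view's base
    # points back at the array the memory was taken from.
    return a.base is None

arr = np.zeros((4,))
v1 = arr[1:]            # a view of arr
v2 = v1[1:]             # a view of a view; base still points at the owner

assert owns_data(arr)
assert not owns_data(v1) and v1.base is arr
assert not owns_data(v2) and v2.base is arr
```

This mirrors the cleanup decision discussed above: run teardown code only when the array that actually owns the memory is being deleted, not when one of its views is.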
+
+Subclassing and Downstream Compatibility
+----------------------------------------
+
+When sub-classing ``ndarray`` or creating duck-types that mimic the ``ndarray``
+interface, it is your responsibility to decide how aligned your APIs will be
+with those of numpy. For convenience, many numpy functions that have a corresponding
+``ndarray`` method (e.g., ``sum``, ``mean``, ``take``, ``reshape``) work by checking
+if the first argument to a function has a method of the same name. If it exists, the
+method is called instead of coercing the arguments to a numpy array.
+
+For example, if you want your sub-class or duck-type to be compatible with
+numpy's ``sum`` function, the method signature for this object's ``sum`` method
+should be the following:
+
+.. testcode::
+
+   def sum(self, axis=None, dtype=None, out=None, keepdims=False):
+       ...
+
+This is the same method signature as ``np.sum``, so if a user calls
+``np.sum`` on this object, numpy will call the object's own ``sum`` method and
+pass in the arguments enumerated above, and no errors will
+be raised because the two signatures are completely compatible.
+
+If, however, you decide to deviate from this signature and do something like this:
+
+.. testcode::
+
+   def sum(self, axis=None, dtype=None):
+       ...
+
+This object is no longer compatible with ``np.sum`` because if you call ``np.sum``,
+it will pass in the unexpected arguments ``out`` and ``keepdims``, causing a
+``TypeError`` to be raised.
+
+If you wish to maintain compatibility with numpy and its subsequent versions (which
+might add new keyword arguments) but do not want to surface all of numpy's arguments,
+your function's signature should accept ``**kwargs``. For example:
+
+.. testcode::
+
+   def sum(self, axis=None, dtype=None, **unused_kwargs):
+       ...
+
+This object is now compatible with ``np.sum`` again because any extraneous arguments
+(i.e.
keywords that are not ``axis`` or ``dtype``) will be hidden away in the +``**unused_kwargs`` parameter. + + diff --git a/doc/source/user/basics.types.rst b/doc/source/user/basics.types.rst index 5ce5af15a..3c39b35d0 100644 --- a/doc/source/user/basics.types.rst +++ b/doc/source/user/basics.types.rst @@ -4,4 +4,339 @@ Data types .. seealso:: :ref:`Data type objects <arrays.dtypes>` -.. automodule:: numpy.doc.basics +Array types and conversions between types +========================================= + +NumPy supports a much greater variety of numerical types than Python does. +This section shows which are available, and how to modify an array's data-type. + +The primitive types supported are tied closely to those in C: + +.. list-table:: + :header-rows: 1 + + * - Numpy type + - C type + - Description + + * - `np.bool_` + - ``bool`` + - Boolean (True or False) stored as a byte + + * - `np.byte` + - ``signed char`` + - Platform-defined + + * - `np.ubyte` + - ``unsigned char`` + - Platform-defined + + * - `np.short` + - ``short`` + - Platform-defined + + * - `np.ushort` + - ``unsigned short`` + - Platform-defined + + * - `np.intc` + - ``int`` + - Platform-defined + + * - `np.uintc` + - ``unsigned int`` + - Platform-defined + + * - `np.int_` + - ``long`` + - Platform-defined + + * - `np.uint` + - ``unsigned long`` + - Platform-defined + + * - `np.longlong` + - ``long long`` + - Platform-defined + + * - `np.ulonglong` + - ``unsigned long long`` + - Platform-defined + + * - `np.half` / `np.float16` + - + - Half precision float: + sign bit, 5 bits exponent, 10 bits mantissa + + * - `np.single` + - ``float`` + - Platform-defined single precision float: + typically sign bit, 8 bits exponent, 23 bits mantissa + + * - `np.double` + - ``double`` + - Platform-defined double precision float: + typically sign bit, 11 bits exponent, 52 bits mantissa. 
+ + * - `np.longdouble` + - ``long double`` + - Platform-defined extended-precision float + + * - `np.csingle` + - ``float complex`` + - Complex number, represented by two single-precision floats (real and imaginary components) + + * - `np.cdouble` + - ``double complex`` + - Complex number, represented by two double-precision floats (real and imaginary components). + + * - `np.clongdouble` + - ``long double complex`` + - Complex number, represented by two extended-precision floats (real and imaginary components). + + +Since many of these have platform-dependent definitions, a set of fixed-size +aliases are provided: + +.. list-table:: + :header-rows: 1 + + * - Numpy type + - C type + - Description + + * - `np.int8` + - ``int8_t`` + - Byte (-128 to 127) + + * - `np.int16` + - ``int16_t`` + - Integer (-32768 to 32767) + + * - `np.int32` + - ``int32_t`` + - Integer (-2147483648 to 2147483647) + + * - `np.int64` + - ``int64_t`` + - Integer (-9223372036854775808 to 9223372036854775807) + + * - `np.uint8` + - ``uint8_t`` + - Unsigned integer (0 to 255) + + * - `np.uint16` + - ``uint16_t`` + - Unsigned integer (0 to 65535) + + * - `np.uint32` + - ``uint32_t`` + - Unsigned integer (0 to 4294967295) + + * - `np.uint64` + - ``uint64_t`` + - Unsigned integer (0 to 18446744073709551615) + + * - `np.intp` + - ``intptr_t`` + - Integer used for indexing, typically the same as ``ssize_t`` + + * - `np.uintp` + - ``uintptr_t`` + - Integer large enough to hold a pointer + + * - `np.float32` + - ``float`` + - + + * - `np.float64` / `np.float_` + - ``double`` + - Note that this matches the precision of the builtin python `float`. + + * - `np.complex64` + - ``float complex`` + - Complex number, represented by two 32-bit floats (real and imaginary components) + + * - `np.complex128` / `np.complex_` + - ``double complex`` + - Note that this matches the precision of the builtin python `complex`. 
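The widths in the tables above can be verified on any platform through ``itemsize``; this quick sanity check is an illustrative addition, not part of the tables:

```python
import numpy as np

# Fixed-size aliases guarantee their width in bytes on every platform.
assert np.dtype(np.int8).itemsize == 1
assert np.dtype(np.int32).itemsize == 4
assert np.dtype(np.uint64).itemsize == 8
assert np.dtype(np.complex64).itemsize == 8    # two 32-bit floats
assert np.dtype(np.complex128).itemsize == 16  # two 64-bit floats

# The C-named types resolve to whichever fixed-size type the platform uses:
print(np.dtype(np.intc))   # usually int32
print(np.dtype(np.int_))   # int64 on most 64-bit Unix, int32 on 64-bit Windows
```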
+
+
+NumPy numerical types are instances of ``dtype`` (data-type) objects, each
+having unique characteristics.  Once you have imported NumPy using
+
+  ::
+
+    >>> import numpy as np
+
+the dtypes are available as ``np.bool_``, ``np.float32``, etc.
+
+Advanced types, not listed in the table above, are explored in
+section :ref:`structured_arrays`.
+
+There are 5 basic numerical types representing booleans (bool), integers
+(int), unsigned integers (uint), floating point (float) and complex. Those
+with numbers in their name indicate the bitsize of the type (i.e. how many
+bits are needed to represent a single value in memory).  Some types, such
+as ``int`` and ``intp``, have differing bitsizes, depending on the platform
+(e.g. 32-bit vs. 64-bit machines).  This should be taken into account when
+interfacing with low-level code (such as C or Fortran) where the raw memory
+is addressed.
+
+Data-types can be used as functions to convert python numbers to array scalars
+(see the array scalar section for an explanation), python sequences of numbers
+to arrays of that type, or as arguments to the dtype keyword that many numpy
+functions or methods accept. Some examples::
+
+    >>> import numpy as np
+    >>> x = np.float32(1.0)
+    >>> x
+    1.0
+    >>> y = np.int_([1,2,4])
+    >>> y
+    array([1, 2, 4])
+    >>> z = np.arange(3, dtype=np.uint8)
+    >>> z
+    array([0, 1, 2], dtype=uint8)
+
+Array types can also be referred to by character codes, mostly to retain
+backward compatibility with older packages such as Numeric.  Some
+documentation may still refer to these, for example::
+
+    >>> np.array([1, 2, 3], dtype='f')
+    array([ 1.,  2.,  3.], dtype=float32)
+
+We recommend using dtype objects instead.
+
+To convert the type of an array, use the ``.astype()`` method (preferred) or
+the type itself as a function.
For example: ::
+
+    >>> z.astype(float)                 #doctest: +NORMALIZE_WHITESPACE
+    array([  0.,  1.,  2.])
+    >>> np.int8(z)
+    array([0, 1, 2], dtype=int8)
+
+Note that, above, we use the *Python* float object as a dtype.  NumPy knows
+that ``int`` refers to ``np.int_``, ``bool`` means ``np.bool_``,
+that ``float`` is ``np.float_`` and ``complex`` is ``np.complex_``.
+The other data-types do not have Python equivalents.
+
+To determine the type of an array, look at the dtype attribute::
+
+    >>> z.dtype
+    dtype('uint8')
+
+dtype objects also contain information about the type, such as its bit-width
+and its byte-order. The data type can also be used indirectly to query
+properties of the type, such as whether it is an integer::
+
+    >>> d = np.dtype(int)
+    >>> d
+    dtype('int32')
+
+    >>> np.issubdtype(d, np.integer)
+    True
+
+    >>> np.issubdtype(d, np.floating)
+    False
+
+
+Array Scalars
+=============
+
+NumPy generally returns elements of arrays as array scalars (a scalar
+with an associated dtype).  Array scalars differ from Python scalars, but
+for the most part they can be used interchangeably.  There are some
+exceptions, such as when code requires very specific attributes of a scalar
+or when it checks specifically whether a value is a Python scalar. Generally,
+problems are easily fixed by explicitly converting array scalars
+to Python scalars, using the corresponding Python type function
+(e.g., ``int``, ``float``, ``complex``, ``str``).
+
+The primary advantage of using array scalars is that
+they preserve the array type (Python may not have a matching scalar type
+available, e.g. ``int16``).  Therefore, the use of array scalars ensures
+identical behaviour between arrays and scalars, irrespective of whether the
+value is inside an array or not.  NumPy scalars also have many of the same
+methods arrays do.
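The interchangeability described above, and the explicit-conversion escape hatch, can be seen directly; a small illustrative sketch:

```python
import numpy as np

x = np.arange(5, dtype=np.int16)
s = x[0]                                # an array scalar, not a plain Python int

assert isinstance(s, np.int16)          # it preserves the array's type...
assert s.dtype == np.dtype('int16')     # ...and carries a dtype, like an array
assert s == 0 and s + 1 == 1            # but behaves like a number in arithmetic

# When a true Python scalar is required, convert explicitly:
assert isinstance(int(s), int)
assert isinstance(s.item(), int)        # .item() also returns a Python scalar
```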
+
+Overflow Errors
+===============
+
+The fixed size of NumPy numeric types may cause overflow errors when a value
+requires more memory than available in the data type. For example,
+`numpy.power` evaluates ``100 ** 8`` correctly for 64-bit integers,
+but gives 1874919424 (incorrect) for a 32-bit integer.
+
+    >>> np.power(100, 8, dtype=np.int64)
+    10000000000000000
+    >>> np.power(100, 8, dtype=np.int32)
+    1874919424
+
+The behaviour of NumPy and Python integer types differs significantly for
+integer overflows and may confuse users expecting NumPy integers to behave
+similarly to Python's ``int``. Unlike NumPy, the size of Python's ``int`` is
+flexible. This means Python integers may expand to accommodate any integer and
+will not overflow.
+
+NumPy provides `numpy.iinfo` and `numpy.finfo` to verify the
+minimum or maximum values of NumPy integer and floating point values
+respectively ::
+
+    >>> np.iinfo(int) # Bounds of the default integer on this system.
+    iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
+    >>> np.iinfo(np.int32) # Bounds of a 32-bit integer
+    iinfo(min=-2147483648, max=2147483647, dtype=int32)
+    >>> np.iinfo(np.int64) # Bounds of a 64-bit integer
+    iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
+
+If 64-bit integers are still too small the result may be cast to a
+floating point number. Floating point numbers offer a larger, but inexact,
+range of possible values.
+
+    >>> np.power(100, 100, dtype=np.int64) # Incorrect even with 64-bit int
+    0
+    >>> np.power(100, 100, dtype=np.float64)
+    1e+200
+
+Extended Precision
+==================
+
+Python's floating-point numbers are usually 64-bit floating-point numbers,
+nearly equivalent to ``np.float64``. In some unusual situations it may be
+useful to use floating-point numbers with more precision.
Whether this
+is possible in numpy depends on the hardware and on the development
+environment: specifically, x86 machines provide hardware floating-point
+with 80-bit precision, and while most C compilers provide this as their
+``long double`` type, MSVC (standard for Windows builds) makes
+``long double`` identical to ``double`` (64 bits).  NumPy makes the
+compiler's ``long double`` available as ``np.longdouble`` (and
+``np.clongdouble`` for the complex numbers).  You can find out what your
+numpy provides with ``np.finfo(np.longdouble)``.
+
+NumPy does not provide a dtype with more precision than C's
+``long double``; in particular, the 128-bit IEEE quad precision
+data type (FORTRAN's ``REAL*16``) is not available.
+
+For efficient memory alignment, ``np.longdouble`` is usually stored
+padded with zero bits, either to 96 or 128 bits.  Which is more efficient
+depends on hardware and development environment; typically on 32-bit
+systems they are padded to 96 bits, while on 64-bit systems they are
+typically padded to 128 bits.  ``np.longdouble`` is padded to the system
+default; ``np.float96`` and ``np.float128`` are provided for users who
+want specific padding.  In spite of the names, ``np.float96`` and
+``np.float128`` provide only as much precision as ``np.longdouble``,
+that is, 80 bits on most x86 machines and 64 bits in standard
+Windows builds.
+
+Be warned that even if ``np.longdouble`` offers more precision than
+python ``float``, it is easy to lose that extra precision, since
+python often forces values to pass through ``float``.  For example,
+the ``%`` formatting operator requires its arguments to be converted
+to standard python types, and it is therefore impossible to preserve
+extended precision even if many decimal places are requested. It can
+be useful to test your code with the value
+``1 + np.finfo(np.longdouble).eps``.
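The precision loss described above can be demonstrated directly; a small check, guarded so that it also passes on platforms where ``long double`` is just ``double``:

```python
import numpy as np

ld = np.finfo(np.longdouble)
one = np.longdouble(1)

# eps is the smallest value for which one + eps != one, so the extra
# precision is visible as long as we stay in longdouble arithmetic:
assert one + ld.eps != one

# Forcing the value through a Python float (as '%' formatting does)
# rounds away anything beyond the 53-bit double mantissa:
if ld.eps < np.finfo(np.float64).eps:   # true wherever longdouble is wider
    assert float(one + ld.eps) == 1.0
```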
+
+
diff --git a/doc/source/user/misc.rst b/doc/source/user/misc.rst
index c10aea486..031ce4efa 100644
--- a/doc/source/user/misc.rst
+++ b/doc/source/user/misc.rst
@@ -2,4 +2,224 @@ Miscellaneous
 *************
 
-.. automodule:: numpy.doc.misc
+IEEE 754 Floating Point Special Values
+--------------------------------------
+
+Special values defined in numpy: nan, inf.
+
+NaNs can be used as a poor-man's mask (if you don't care what the
+original value was).
+
+Note: cannot use equality to test NaNs.  E.g.: ::
+
+ >>> myarr = np.array([1., 0., np.nan, 3.])
+ >>> np.nonzero(myarr == np.nan)
+ (array([], dtype=int64),)
+ >>> np.nan == np.nan  # is always False! Use special numpy functions instead.
+ False
+ >>> myarr[myarr == np.nan] = 0. # doesn't work
+ >>> myarr
+ array([  1.,   0.,  NaN,   3.])
+ >>> myarr[np.isnan(myarr)] = 0. # use this instead
+ >>> myarr
+ array([ 1.,  0.,  0.,  3.])
+
+Other related special value functions: ::
+
+ isinf():    True if value is inf
+ isfinite(): True if not nan or inf
+ nan_to_num(): Map nan to 0, inf to max float, -inf to min float
+
+The following corresponds to the usual functions except that nans are excluded
+from the results: ::
+
+ nansum()
+ nanmax()
+ nanmin()
+ nanargmax()
+ nanargmin()
+
+ >>> x = np.arange(10.)
+ >>> x[3] = np.nan
+ >>> x.sum()
+ nan
+ >>> np.nansum(x)
+ 42.0
+
+How numpy handles numerical exceptions
+--------------------------------------
+
+The default is to ``'warn'`` for ``invalid``, ``divide``, and ``overflow``
+and ``'ignore'`` for ``underflow``.  But this can be changed, and it can be
+set individually for different kinds of exceptions. The different behaviors
+are:
+
+ - 'ignore' : Take no action when the exception occurs.
+ - 'warn' : Print a `RuntimeWarning` (via the Python `warnings` module).
+ - 'raise' : Raise a `FloatingPointError`.
+ - 'call' : Call a function specified using the `seterrcall` function.
+ - 'print' : Print a warning directly to ``stdout``.
+ - 'log' : Record error in a Log object specified by `seterrcall`.
+
+These behaviors can be set for all kinds of errors or specific ones:
+
+ - all : apply to all numeric exceptions
+ - invalid : when NaNs are generated
+ - divide : divide by zero (for integers as well!)
+ - overflow : floating point overflows
+ - underflow : floating point underflows
+
+Note that integer divide-by-zero is handled by the same machinery.
+These behaviors are set on a per-thread basis.
+
+Examples
+--------
+
+::
+
+ >>> oldsettings = np.seterr(all='warn')
+ >>> np.zeros(5,dtype=np.float32)/0.
+ invalid value encountered in divide
+ >>> j = np.seterr(under='ignore')
+ >>> np.array([1.e-100])**10
+ >>> j = np.seterr(invalid='raise')
+ >>> np.sqrt(np.array([-1.]))
+ FloatingPointError: invalid value encountered in sqrt
+ >>> def errorhandler(errstr, errflag):
+ ...      print("saw stupid error!")
+ >>> np.seterrcall(errorhandler)
+ <function errorhandler at 0x...>
+ >>> j = np.seterr(all='call')
+ >>> np.zeros(5, dtype=np.int32)/0
+ FloatingPointError: invalid value encountered in divide
+ saw stupid error!
+ >>> j = np.seterr(**oldsettings) # restore previous
+ ...                              # error-handling settings
+
+Interfacing to C
+----------------
+Only a survey of the choices. Little detail on how each works.
+
+1) Bare metal, wrap your own C-code manually.
+
+ - Plusses:
+
+   - Efficient
+   - No dependencies on other tools
+
+ - Minuses:
+
+   - Lots of learning overhead:
+
+     - need to learn basics of Python C API
+     - need to learn basics of numpy C API
+     - need to learn how to handle reference counting and love it.
+
+   - Reference counting often difficult to get right.
+
+     - getting it wrong leads to memory leaks, and worse, segfaults
+
+   - the C API changed between Python 2 and 3, so old wrappers may need porting.
+ +2) Cython + + - Plusses: + + - avoid learning C API's + - no dealing with reference counting + - can code in pseudo python and generate C code + - can also interface to existing C code + - should shield you from changes to Python C api + - has become the de-facto standard within the scientific Python community + - fast indexing support for arrays + + - Minuses: + + - Can write code in non-standard form which may become obsolete + - Not as flexible as manual wrapping + +3) ctypes + + - Plusses: + + - part of Python standard library + - good for interfacing to existing sharable libraries, particularly + Windows DLLs + - avoids API/reference counting issues + - good numpy support: arrays have all these in their ctypes + attribute: :: + + a.ctypes.data a.ctypes.get_strides + a.ctypes.data_as a.ctypes.shape + a.ctypes.get_as_parameter a.ctypes.shape_as + a.ctypes.get_data a.ctypes.strides + a.ctypes.get_shape a.ctypes.strides_as + + - Minuses: + + - can't use for writing code to be turned into C extensions, only a wrapper + tool. + +4) SWIG (automatic wrapper generator) + + - Plusses: + + - around a long time + - multiple scripting language support + - C++ support + - Good for wrapping large (many functions) existing C libraries + + - Minuses: + + - generates lots of code between Python and the C code + - can cause performance problems that are nearly impossible to optimize + out + - interface files can be hard to write + - doesn't necessarily avoid reference counting issues or needing to know + API's + +5) scipy.weave + + - Plusses: + + - can turn many numpy expressions into C code + - dynamic compiling and loading of generated C code + - can embed pure C code in Python module and have weave extract, generate + interfaces and compile, etc. + + - Minuses: + + - Future very uncertain: it's the only part of Scipy not ported to Python 3 + and is effectively deprecated in favor of Cython. 
+ +6) Psyco + + - Plusses: + + - Turns pure python into efficient machine code through jit-like + optimizations + - very fast when it optimizes well + + - Minuses: + + - Only on intel (windows?) + - Doesn't do much for numpy? + +Interfacing to Fortran: +----------------------- +The clear choice to wrap Fortran code is +`f2py <https://docs.scipy.org/doc/numpy/f2py/>`_. + +Pyfort is an older alternative, but not supported any longer. +Fwrap is a newer project that looked promising but isn't being developed any +longer. + +Interfacing to C++: +------------------- + 1) Cython + 2) CXX + 3) Boost.python + 4) SWIG + 5) SIP (used mainly in PyQT) + + diff --git a/doc/source/user/whatisnumpy.rst b/doc/source/user/whatisnumpy.rst index 8478a77c4..154f91c84 100644 --- a/doc/source/user/whatisnumpy.rst +++ b/doc/source/user/whatisnumpy.rst @@ -125,7 +125,7 @@ same shape, or a scalar and an array, or even two arrays of with different shapes, provided that the smaller array is "expandable" to the shape of the larger in such a way that the resulting broadcast is unambiguous. For detailed "rules" of broadcasting see -`numpy.doc.broadcasting`. +`basics.broadcasting`. Who Else Uses NumPy? -------------------- |