diff options
author | Charles Harris <charlesr.harris@gmail.com> | 2020-12-13 14:14:49 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2020-12-13 14:14:49 -0700 |
commit | 3fe2d9d2627fc0f84aeed293ff8afa7c1f08d899 (patch) | |
tree | 2ea27fe06a19c39e8d7a5fe2f87cb7e05363247d /doc/source/user | |
parent | 7d7e446fcbeeff70d905bde2eb0264a797488280 (diff) | |
parent | eff302e5e8678fa17fb3d8156d49eb585b0876d9 (diff) | |
download | numpy-3fe2d9d2627fc0f84aeed293ff8afa7c1f08d899.tar.gz |
Merge branch 'master' into fix-issue-10244
Diffstat (limited to 'doc/source/user')
31 files changed, 4159 insertions, 461 deletions
diff --git a/doc/source/user/absolute_beginners.rst b/doc/source/user/absolute_beginners.rst index 5873eb108..126f5f2a3 100644 --- a/doc/source/user/absolute_beginners.rst +++ b/doc/source/user/absolute_beginners.rst @@ -1090,7 +1090,7 @@ To learn more about finding the unique elements in an array, see `unique`. Transposing and reshaping a matrix ---------------------------------- -*This section covers* ``arr.reshape()``, ``arr.transpose()``, ``arr.T()`` +*This section covers* ``arr.reshape()``, ``arr.transpose()``, ``arr.T`` ----- @@ -1114,7 +1114,7 @@ You simply need to pass in the new dimensions that you want for the matrix. :: .. image:: images/np_reshape.png -You can also use ``.transpose`` to reverse or change the axes of an array +You can also use ``.transpose()`` to reverse or change the axes of an array according to the values you specify. If you start with this array:: @@ -1131,6 +1131,13 @@ You can transpose your array with ``arr.transpose()``. :: [1, 4], [2, 5]]) +You can also use ``arr.T``:: + + >>> arr.T + array([[0, 3], + [1, 4], + [2, 5]]) + To learn more about transposing and reshaping arrays, see `transpose` and `reshape`. @@ -1138,12 +1145,12 @@ To learn more about transposing and reshaping arrays, see `transpose` and How to reverse an array ----------------------- -*This section covers* ``np.flip`` +*This section covers* ``np.flip()`` ----- NumPy's ``np.flip()`` function allows you to flip, or reverse, the contents of -an array along an axis. When using ``np.flip``, specify the array you would like +an array along an axis. When using ``np.flip()``, specify the array you would like to reverse and the axis. If you don't specify the axis, NumPy will reverse the contents along all of the axes of your input array. diff --git a/doc/source/user/basics.broadcasting.rst b/doc/source/user/basics.broadcasting.rst index 00bf17a41..5eae3eb32 100644 --- a/doc/source/user/basics.broadcasting.rst +++ b/doc/source/user/basics.broadcasting.rst @@ -10,4 +10,178 @@ Broadcasting :ref:`array-broadcasting-in-numpy` An introduction to the concepts discussed here -.. automodule:: numpy.doc.broadcasting +.. note:: + See `this article + <https://numpy.org/devdocs/user/theory.broadcasting.html>`_ + for illustrations of broadcasting concepts. + + +The term broadcasting describes how numpy treats arrays with different +shapes during arithmetic operations. Subject to certain constraints, +the smaller array is "broadcast" across the larger array so that they +have compatible shapes. Broadcasting provides a means of vectorizing +array operations so that looping occurs in C instead of Python. It does +this without making needless copies of data and usually leads to +efficient algorithm implementations. There are, however, cases where +broadcasting is a bad idea because it leads to inefficient use of memory +that slows computation. + +NumPy operations are usually done on pairs of arrays on an +element-by-element basis. In the simplest case, the two arrays must +have exactly the same shape, as in the following example: + + >>> a = np.array([1.0, 2.0, 3.0]) + >>> b = np.array([2.0, 2.0, 2.0]) + >>> a * b + array([ 2., 4., 6.]) + +NumPy's broadcasting rule relaxes this constraint when the arrays' +shapes meet certain constraints. The simplest broadcasting example occurs +when an array and a scalar value are combined in an operation: + +>>> a = np.array([1.0, 2.0, 3.0]) +>>> b = 2.0 +>>> a * b +array([ 2., 4., 6.]) + +The result is equivalent to the previous example where ``b`` was an array. +We can think of the scalar ``b`` being *stretched* during the arithmetic +operation into an array with the same shape as ``a``. The new elements in +``b`` are simply copies of the original scalar. The stretching analogy is +only conceptual. NumPy is smart enough to use the original scalar value +without actually making copies so that broadcasting operations are as +memory and computationally efficient as possible. + +The code in the second example is more efficient than that in the first +because broadcasting moves less memory around during the multiplication +(``b`` is a scalar rather than an array). + +General Broadcasting Rules +========================== +When operating on two arrays, NumPy compares their shapes element-wise. +It starts with the trailing (i.e. rightmost) dimensions and works its +way left. Two dimensions are compatible when + +1) they are equal, or +2) one of them is 1 + +If these conditions are not met, a +``ValueError: operands could not be broadcast together`` exception is +thrown, indicating that the arrays have incompatible shapes. The size of +the resulting array is the size that is not 1 along each axis of the inputs. + +Arrays do not need to have the same *number* of dimensions. For example, +if you have a ``256x256x3`` array of RGB values, and you want to scale +each color in the image by a different value, you can multiply the image +by a one-dimensional array with 3 values. Lining up the sizes of the +trailing axes of these arrays according to the broadcast rules, shows that +they are compatible:: + + Image (3d array): 256 x 256 x 3 + Scale (1d array): 3 + Result (3d array): 256 x 256 x 3 + +When either of the dimensions compared is one, the other is +used. In other words, dimensions with size 1 are stretched or "copied" +to match the other. + +In the following example, both the ``A`` and ``B`` arrays have axes with +length one that are expanded to a larger size during the broadcast +operation:: + + A (4d array): 8 x 1 x 6 x 1 + B (3d array): 7 x 1 x 5 + Result (4d array): 8 x 7 x 6 x 5 + +Here are some more examples:: + + A (2d array): 5 x 4 + B (1d array): 1 + Result (2d array): 5 x 4 + + A (2d array): 5 x 4 + B (1d array): 4 + Result (2d array): 5 x 4 + + A (3d array): 15 x 3 x 5 + B (3d array): 15 x 1 x 5 + Result (3d array): 15 x 3 x 5 + + A (3d array): 15 x 3 x 5 + B (2d array): 3 x 5 + Result (3d array): 15 x 3 x 5 + + A (3d array): 15 x 3 x 5 + B (2d array): 3 x 1 + Result (3d array): 15 x 3 x 5 + +Here are examples of shapes that do not broadcast:: + + A (1d array): 3 + B (1d array): 4 # trailing dimensions do not match + + A (2d array): 2 x 1 + B (3d array): 8 x 4 x 3 # second from last dimensions mismatched + +An example of broadcasting in practice:: + + >>> x = np.arange(4) + >>> xx = x.reshape(4,1) + >>> y = np.ones(5) + >>> z = np.ones((3,4)) + + >>> x.shape + (4,) + + >>> y.shape + (5,) + + >>> x + y + ValueError: operands could not be broadcast together with shapes (4,) (5,) + + >>> xx.shape + (4, 1) + + >>> y.shape + (5,) + + >>> (xx + y).shape + (4, 5) + + >>> xx + y + array([[ 1., 1., 1., 1., 1.], + [ 2., 2., 2., 2., 2.], + [ 3., 3., 3., 3., 3.], + [ 4., 4., 4., 4., 4.]]) + + >>> x.shape + (4,) + + >>> z.shape + (3, 4) + + >>> (x + z).shape + (3, 4) + + >>> x + z + array([[ 1., 2., 3., 4.], + [ 1., 2., 3., 4.], + [ 1., 2., 3., 4.]]) + +Broadcasting provides a convenient way of taking the outer product (or +any other outer operation) of two arrays. The following example shows an +outer addition operation of two 1-d arrays:: + + >>> a = np.array([0.0, 10.0, 20.0, 30.0]) + >>> b = np.array([1.0, 2.0, 3.0]) + >>> a[:, np.newaxis] + b + array([[ 1., 2., 3.], + [ 11., 12., 13.], + [ 21., 22., 23.], + [ 31., 32., 33.]]) + +Here the ``newaxis`` index operator inserts a new axis into ``a``, +making it a two-dimensional ``4x1`` array. Combining the ``4x1`` array +with ``b``, which has shape ``(3,)``, yields a ``4x3`` array. + + diff --git a/doc/source/user/basics.byteswapping.rst b/doc/source/user/basics.byteswapping.rst index 4b1008df3..fecdb9ee8 100644 --- a/doc/source/user/basics.byteswapping.rst +++ b/doc/source/user/basics.byteswapping.rst @@ -2,4 +2,152 @@ Byte-swapping ************* -.. automodule:: numpy.doc.byteswapping +Introduction to byte ordering and ndarrays +========================================== + +The ``ndarray`` is an object that provide a python array interface to data +in memory. + +It often happens that the memory that you want to view with an array is +not of the same byte ordering as the computer on which you are running +Python. + +For example, I might be working on a computer with a little-endian CPU - +such as an Intel Pentium, but I have loaded some data from a file +written by a computer that is big-endian. Let's say I have loaded 4 +bytes from a file written by a Sun (big-endian) computer. I know that +these 4 bytes represent two 16-bit integers. On a big-endian machine, a +two-byte integer is stored with the Most Significant Byte (MSB) first, +and then the Least Significant Byte (LSB). Thus the bytes are, in memory order: + +#. MSB integer 1 +#. LSB integer 1 +#. MSB integer 2 +#. LSB integer 2 + +Let's say the two integers were in fact 1 and 770. Because 770 = 256 * +3 + 2, the 4 bytes in memory would contain respectively: 0, 1, 3, 2. +The bytes I have loaded from the file would have these contents: + +>>> big_end_buffer = bytearray([0,1,3,2]) +>>> big_end_buffer +bytearray(b'\\x00\\x01\\x03\\x02') + +We might want to use an ``ndarray`` to access these integers. In that +case, we can create an array around this memory, and tell numpy that +there are two integers, and that they are 16 bit and big-endian: + +>>> import numpy as np +>>> big_end_arr = np.ndarray(shape=(2,),dtype='>i2', buffer=big_end_buffer) +>>> big_end_arr[0] +1 +>>> big_end_arr[1] +770 + +Note the array ``dtype`` above of ``>i2``. The ``>`` means 'big-endian' +(``<`` is little-endian) and ``i2`` means 'signed 2-byte integer'. For +example, if our data represented a single unsigned 4-byte little-endian +integer, the dtype string would be ``<u4``. + +In fact, why don't we try that? + +>>> little_end_u4 = np.ndarray(shape=(1,),dtype='<u4', buffer=big_end_buffer) +>>> little_end_u4[0] == 1 * 256**1 + 3 * 256**2 + 2 * 256**3 +True + +Returning to our ``big_end_arr`` - in this case our underlying data is +big-endian (data endianness) and we've set the dtype to match (the dtype +is also big-endian). However, sometimes you need to flip these around. + +.. warning:: + + Scalars currently do not include byte order information, so extracting + a scalar from an array will return an integer in native byte order. + Hence: + + >>> big_end_arr[0].dtype.byteorder == little_end_u4[0].dtype.byteorder + True + +Changing byte ordering +====================== + +As you can imagine from the introduction, there are two ways you can +affect the relationship between the byte ordering of the array and the +underlying memory it is looking at: + +* Change the byte-ordering information in the array dtype so that it + interprets the underlying data as being in a different byte order. + This is the role of ``arr.newbyteorder()`` +* Change the byte-ordering of the underlying data, leaving the dtype + interpretation as it was. This is what ``arr.byteswap()`` does. + +The common situations in which you need to change byte ordering are: + +#. Your data and dtype endianness don't match, and you want to change + the dtype so that it matches the data. +#. Your data and dtype endianness don't match, and you want to swap the + data so that they match the dtype +#. Your data and dtype endianness match, but you want the data swapped + and the dtype to reflect this + +Data and dtype endianness don't match, change dtype to match data +----------------------------------------------------------------- + +We make something where they don't match: + +>>> wrong_end_dtype_arr = np.ndarray(shape=(2,),dtype='<i2', buffer=big_end_buffer) +>>> wrong_end_dtype_arr[0] +256 + +The obvious fix for this situation is to change the dtype so it gives +the correct endianness: + +>>> fixed_end_dtype_arr = wrong_end_dtype_arr.newbyteorder() +>>> fixed_end_dtype_arr[0] +1 + +Note the array has not changed in memory: + +>>> fixed_end_dtype_arr.tobytes() == big_end_buffer +True + +Data and type endianness don't match, change data to match dtype +---------------------------------------------------------------- + +You might want to do this if you need the data in memory to be a certain +ordering. For example you might be writing the memory out to a file +that needs a certain byte ordering. + +>>> fixed_end_mem_arr = wrong_end_dtype_arr.byteswap() +>>> fixed_end_mem_arr[0] +1 + +Now the array *has* changed in memory: + +>>> fixed_end_mem_arr.tobytes() == big_end_buffer +False + +Data and dtype endianness match, swap data and dtype +---------------------------------------------------- + +You may have a correctly specified array dtype, but you need the array +to have the opposite byte order in memory, and you want the dtype to +match so the array values make sense. In this case you just do both of +the previous operations: + +>>> swapped_end_arr = big_end_arr.byteswap().newbyteorder() +>>> swapped_end_arr[0] +1 +>>> swapped_end_arr.tobytes() == big_end_buffer +False + +An easier way of casting the data to a specific dtype and byte ordering +can be achieved with the ndarray astype method: + +>>> swapped_end_arr = big_end_arr.astype('<i2') +>>> swapped_end_arr[0] +1 +>>> swapped_end_arr.tobytes() == big_end_buffer +False + + diff --git a/doc/source/user/basics.creation.rst b/doc/source/user/basics.creation.rst index b3fa81017..671a8ec59 100644 --- a/doc/source/user/basics.creation.rst +++ b/doc/source/user/basics.creation.rst @@ -6,4 +6,141 @@ Array creation .. seealso:: :ref:`Array creation routines <routines.array-creation>` -.. automodule:: numpy.doc.creation +Introduction +============ + +There are 5 general mechanisms for creating arrays: + +1) Conversion from other Python structures (e.g., lists, tuples) +2) Intrinsic numpy array creation objects (e.g., arange, ones, zeros, + etc.) +3) Reading arrays from disk, either from standard or custom formats +4) Creating arrays from raw bytes through the use of strings or buffers +5) Use of special library functions (e.g., random) + +This section will not cover means of replicating, joining, or otherwise +expanding or mutating existing arrays. Nor will it cover creating object +arrays or structured arrays. Both of those are covered in their own sections. + +Converting Python array_like Objects to NumPy Arrays +==================================================== + +In general, numerical data arranged in an array-like structure in Python can +be converted to arrays through the use of the array() function. The most +obvious examples are lists and tuples. See the documentation for array() for +details for its use. Some objects may support the array-protocol and allow +conversion to arrays this way. A simple way to find out if the object can be +converted to a numpy array using array() is simply to try it interactively and +see if it works! (The Python Way). + +Examples: :: + + >>> x = np.array([2,3,1,0]) + >>> x = np.array([2, 3, 1, 0]) + >>> x = np.array([[1,2.0],[0,0],(1+1j,3.)]) # note mix of tuple and lists, + and types + >>> x = np.array([[ 1.+0.j, 2.+0.j], [ 0.+0.j, 0.+0.j], [ 1.+1.j, 3.+0.j]]) + +Intrinsic NumPy Array Creation +============================== + +NumPy has built-in functions for creating arrays from scratch: + +zeros(shape) will create an array filled with 0 values with the specified +shape. The default dtype is float64. :: + + >>> np.zeros((2, 3)) + array([[ 0., 0., 0.], [ 0., 0., 0.]]) + +ones(shape) will create an array filled with 1 values. It is identical to +zeros in all other respects. + +arange() will create arrays with regularly incrementing values. Check the +docstring for complete information on the various ways it can be used. A few +examples will be given here: :: + + >>> np.arange(10) + array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) + >>> np.arange(2, 10, dtype=float) + array([ 2., 3., 4., 5., 6., 7., 8., 9.]) + >>> np.arange(2, 3, 0.1) + array([ 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9]) + +Note that there are some subtleties regarding the last usage that the user +should be aware of that are described in the arange docstring. + +linspace() will create arrays with a specified number of elements, and +spaced equally between the specified beginning and end values. For +example: :: + + >>> np.linspace(1., 4., 6) + array([ 1. , 1.6, 2.2, 2.8, 3.4, 4. ]) + +The advantage of this creation function is that one can guarantee the +number of elements and the starting and end point, which arange() +generally will not do for arbitrary start, stop, and step values. + +indices() will create a set of arrays (stacked as a one-higher dimensioned +array), one per dimension with each representing variation in that dimension. +An example illustrates much better than a verbal description: :: + + >>> np.indices((3,3)) + array([[[0, 0, 0], [1, 1, 1], [2, 2, 2]], [[0, 1, 2], [0, 1, 2], [0, 1, 2]]]) + +This is particularly useful for evaluating functions of multiple dimensions on +a regular grid. + +Reading Arrays From Disk +======================== + +This is presumably the most common case of large array creation. The details, +of course, depend greatly on the format of data on disk and so this section +can only give general pointers on how to handle various formats. + +Standard Binary Formats +----------------------- + +Various fields have standard formats for array data. The following lists the +ones with known python libraries to read them and return numpy arrays (there +may be others for which it is possible to read and convert to numpy arrays so +check the last section as well) +:: + + HDF5: h5py + FITS: Astropy + +Examples of formats that cannot be read directly but for which it is not hard to +convert are those formats supported by libraries like PIL (able to read and +write many image formats such as jpg, png, etc). + +Common ASCII Formats +------------------------ + +Comma Separated Value files (CSV) are widely used (and an export and import +option for programs like Excel). There are a number of ways of reading these +files in Python. There are CSV functions in Python and functions in pylab +(part of matplotlib). + +More generic ascii files can be read using the io package in scipy. + +Custom Binary Formats +--------------------- + +There are a variety of approaches one can use. If the file has a relatively +simple format then one can write a simple I/O library and use the numpy +fromfile() function and .tofile() method to read and write numpy arrays +directly (mind your byteorder though!) If a good C or C++ library exists that +read the data, one can wrap that library with a variety of techniques though +that certainly is much more work and requires significantly more advanced +knowledge to interface with C or C++. + +Use of Special Libraries +------------------------ + +There are libraries that can be used to generate arrays for special purposes +and it isn't possible to enumerate all of them. The most common uses are use +of the many array generation functions in random that can generate arrays of +random values, and some utility functions to generate special matrices (e.g. +diagonal). + + diff --git a/doc/source/user/basics.dispatch.rst b/doc/source/user/basics.dispatch.rst index f7b8da262..c0e1cf9ba 100644 --- a/doc/source/user/basics.dispatch.rst +++ b/doc/source/user/basics.dispatch.rst @@ -4,5 +4,269 @@ Writing custom array containers ******************************* -.. automodule:: numpy.doc.dispatch +Numpy's dispatch mechanism, introduced in numpy version v1.16 is the +recommended approach for writing custom N-dimensional array containers that are +compatible with the numpy API and provide custom implementations of numpy +functionality. Applications include `dask <http://dask.pydata.org>`_ arrays, an +N-dimensional array distributed across multiple nodes, and `cupy +<https://docs-cupy.chainer.org/en/stable/>`_ arrays, an N-dimensional array on +a GPU. + +To get a feel for writing custom array containers, we'll begin with a simple +example that has rather narrow utility but illustrates the concepts involved. + +>>> import numpy as np +>>> class DiagonalArray: +... def __init__(self, N, value): +... self._N = N +... self._i = value +... def __repr__(self): +... return f"{self.__class__.__name__}(N={self._N}, value={self._i})" +... def __array__(self): +... return self._i * np.eye(self._N) + +Our custom array can be instantiated like: + +>>> arr = DiagonalArray(5, 1) +>>> arr +DiagonalArray(N=5, value=1) + +We can convert to a numpy array using :func:`numpy.array` or +:func:`numpy.asarray`, which will call its ``__array__`` method to obtain a +standard ``numpy.ndarray``. + +>>> np.asarray(arr) +array([[1., 0., 0., 0., 0.], + [0., 1., 0., 0., 0.], + [0., 0., 1., 0., 0.], + [0., 0., 0., 1., 0.], + [0., 0., 0., 0., 1.]]) + +If we operate on ``arr`` with a numpy function, numpy will again use the +``__array__`` interface to convert it to an array and then apply the function +in the usual way. + +>>> np.multiply(arr, 2) +array([[2., 0., 0., 0., 0.], + [0., 2., 0., 0., 0.], + [0., 0., 2., 0., 0.], + [0., 0., 0., 2., 0.], + [0., 0., 0., 0., 2.]]) + + +Notice that the return type is a standard ``numpy.ndarray``. + +>>> type(arr) +numpy.ndarray + +How can we pass our custom array type through this function? Numpy allows a +class to indicate that it would like to handle computations in a custom-defined +way through the interfaces ``__array_ufunc__`` and ``__array_function__``. Let's +take one at a time, starting with ``_array_ufunc__``. This method covers +:ref:`ufuncs`, a class of functions that includes, for example, +:func:`numpy.multiply` and :func:`numpy.sin`. + +The ``__array_ufunc__`` receives: + +- ``ufunc``, a function like ``numpy.multiply`` +- ``method``, a string, differentiating between ``numpy.multiply(...)`` and + variants like ``numpy.multiply.outer``, ``numpy.multiply.accumulate``, and so + on. For the common case, ``numpy.multiply(...)``, ``method == '__call__'``. +- ``inputs``, which could be a mixture of different types +- ``kwargs``, keyword arguments passed to the function + +For this example we will only handle the method ``__call__`` + +>>> from numbers import Number +>>> class DiagonalArray: +... def __init__(self, N, value): +... self._N = N +... self._i = value +... def __repr__(self): +... return f"{self.__class__.__name__}(N={self._N}, value={self._i})" +... def __array__(self): +... return self._i * np.eye(self._N) +... def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): +... if method == '__call__': +... N = None +... scalars = [] +... for input in inputs: +... if isinstance(input, Number): +... scalars.append(input) +... elif isinstance(input, self.__class__): +... scalars.append(input._i) +... if N is not None: +... if N != self._N: +... raise TypeError("inconsistent sizes") +... else: +... N = self._N +... else: +... return NotImplemented +... return self.__class__(N, ufunc(*scalars, **kwargs)) +... else: +... return NotImplemented + +Now our custom array type passes through numpy functions. + +>>> arr = DiagonalArray(5, 1) +>>> np.multiply(arr, 3) +DiagonalArray(N=5, value=3) +>>> np.add(arr, 3) +DiagonalArray(N=5, value=4) +>>> np.sin(arr) +DiagonalArray(N=5, value=0.8414709848078965) + +At this point ``arr + 3`` does not work. + +>>> arr + 3 +TypeError: unsupported operand type(s) for *: 'DiagonalArray' and 'int' + +To support it, we need to define the Python interfaces ``__add__``, ``__lt__``, +and so on to dispatch to the corresponding ufunc. We can achieve this +conveniently by inheriting from the mixin +:class:`~numpy.lib.mixins.NDArrayOperatorsMixin`. + +>>> import numpy.lib.mixins +>>> class DiagonalArray(numpy.lib.mixins.NDArrayOperatorsMixin): +... def __init__(self, N, value): +... self._N = N +... self._i = value +... def __repr__(self): +... return f"{self.__class__.__name__}(N={self._N}, value={self._i})" +... def __array__(self): +... return self._i * np.eye(self._N) +... def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): +... if method == '__call__': +... N = None +... scalars = [] +... for input in inputs: +... if isinstance(input, Number): +... scalars.append(input) +... elif isinstance(input, self.__class__): +... scalars.append(input._i) +... if N is not None: +... if N != self._N: +... raise TypeError("inconsistent sizes") +... else: +... N = self._N +... else: +... return NotImplemented +... return self.__class__(N, ufunc(*scalars, **kwargs)) +... else: +... return NotImplemented + +>>> arr = DiagonalArray(5, 1) +>>> arr + 3 +DiagonalArray(N=5, value=4) +>>> arr > 0 +DiagonalArray(N=5, value=True) + +Now let's tackle ``__array_function__``. We'll create dict that maps numpy +functions to our custom variants. + +>>> HANDLED_FUNCTIONS = {} +>>> class DiagonalArray(numpy.lib.mixins.NDArrayOperatorsMixin): +... def __init__(self, N, value): +... self._N = N +... self._i = value +... def __repr__(self): +... return f"{self.__class__.__name__}(N={self._N}, value={self._i})" +... def __array__(self): +... return self._i * np.eye(self._N) +... def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): +... if method == '__call__': +... N = None +... scalars = [] +... for input in inputs: +... # In this case we accept only scalar numbers or DiagonalArrays. +... if isinstance(input, Number): +... scalars.append(input) +... elif isinstance(input, self.__class__): +... scalars.append(input._i) +... if N is not None: +... if N != self._N: +... raise TypeError("inconsistent sizes") +... else: +... N = self._N +... else: +... return NotImplemented +... return self.__class__(N, ufunc(*scalars, **kwargs)) +... else: +... return NotImplemented +... def __array_function__(self, func, types, args, kwargs): +... if func not in HANDLED_FUNCTIONS: +... return NotImplemented +... # Note: this allows subclasses that don't override +... # __array_function__ to handle DiagonalArray objects. +... if not all(issubclass(t, self.__class__) for t in types): +... return NotImplemented +... return HANDLED_FUNCTIONS[func](*args, **kwargs) +... + +A convenient pattern is to define a decorator ``implements`` that can be used +to add functions to ``HANDLED_FUNCTIONS``. + +>>> def implements(np_function): +... "Register an __array_function__ implementation for DiagonalArray objects." +... def decorator(func): +... HANDLED_FUNCTIONS[np_function] = func +... return func +... return decorator +... + +Now we write implementations of numpy functions for ``DiagonalArray``. +For completeness, to support the usage ``arr.sum()`` add a method ``sum`` that +calls ``numpy.sum(self)``, and the same for ``mean``. + +>>> @implements(np.sum) +... def sum(arr): +... "Implementation of np.sum for DiagonalArray objects" +... return arr._i * arr._N +... +>>> @implements(np.mean) +... def mean(arr): +... "Implementation of np.mean for DiagonalArray objects" +... return arr._i / arr._N +... +>>> arr = DiagonalArray(5, 1) +>>> np.sum(arr) +5 +>>> np.mean(arr) +0.2 + +If the user tries to use any numpy functions not included in +``HANDLED_FUNCTIONS``, a ``TypeError`` will be raised by numpy, indicating that +this operation is not supported. For example, concatenating two +``DiagonalArrays`` does not produce another diagonal array, so it is not +supported. + +>>> np.concatenate([arr, arr]) +TypeError: no implementation found for 'numpy.concatenate' on types that implement __array_function__: [<class '__main__.DiagonalArray'>] + +Additionally, our implementations of ``sum`` and ``mean`` do not accept the +optional arguments that numpy's implementation does. + +>>> np.sum(arr, axis=0) +TypeError: sum() got an unexpected keyword argument 'axis' + +The user always has the option of converting to a normal ``numpy.ndarray`` with +:func:`numpy.asarray` and using standard numpy from there. + +>>> np.concatenate([np.asarray(arr), np.asarray(arr)]) +array([[1., 0., 0., 0., 0.], + [0., 1., 0., 0., 0.], + [0., 0., 1., 0., 0.], + [0., 0., 0., 1., 0.], + [0., 0., 0., 0., 1.], + [1., 0., 0., 0., 0.], + [0., 1., 0., 0., 0.], + [0., 0., 1., 0., 0.], + [0., 0., 0., 1., 0.], + [0., 0., 0., 0., 1.]]) + +Refer to the `dask source code <https://github.com/dask/dask>`_ and +`cupy source code <https://github.com/cupy/cupy>`_ for more fully-worked +examples of custom array containers. + +See also :doc:`NEP 18<neps:nep-0018-array-function-protocol>`. diff --git a/doc/source/user/basics.indexing.rst b/doc/source/user/basics.indexing.rst index 0dca4b884..9545bb78c 100644 --- a/doc/source/user/basics.indexing.rst +++ b/doc/source/user/basics.indexing.rst @@ -10,4 +10,454 @@ Indexing :ref:`Indexing routines <routines.indexing>` -.. automodule:: numpy.doc.indexing +Array indexing refers to any use of the square brackets ([]) to index +array values. There are many options to indexing, which give numpy +indexing great power, but with power comes some complexity and the +potential for confusion. This section is just an overview of the +various options and issues related to indexing. Aside from single +element indexing, the details on most of these options are to be +found in related sections. + +Assignment vs referencing +========================= + +Most of the following examples show the use of indexing when +referencing data in an array. The examples work just as well +when assigning to an array. See the section at the end for +specific examples and explanations on how assignments work. + +Single element indexing +======================= + +Single element indexing for a 1-D array is what one expects. It work +exactly like that for other standard Python sequences. It is 0-based, +and accepts negative indices for indexing from the end of the array. :: + + >>> x = np.arange(10) + >>> x[2] + 2 + >>> x[-2] + 8 + +Unlike lists and tuples, numpy arrays support multidimensional indexing +for multidimensional arrays. That means that it is not necessary to +separate each dimension's index into its own set of square brackets. :: + + >>> x.shape = (2,5) # now x is 2-dimensional + >>> x[1,3] + 8 + >>> x[1,-1] + 9 + +Note that if one indexes a multidimensional array with fewer indices +than dimensions, one gets a subdimensional array. For example: :: + + >>> x[0] + array([0, 1, 2, 3, 4]) + +That is, each index specified selects the array corresponding to the +rest of the dimensions selected. In the above example, choosing 0 +means that the remaining dimension of length 5 is being left unspecified, +and that what is returned is an array of that dimensionality and size. +It must be noted that the returned array is not a copy of the original, +but points to the same values in memory as does the original array. +In this case, the 1-D array at the first position (0) is returned. +So using a single index on the returned array, results in a single +element being returned. That is: :: + + >>> x[0][2] + 2 + +So note that ``x[0,2] = x[0][2]`` though the second case is more +inefficient as a new temporary array is created after the first index +that is subsequently indexed by 2. + +Note to those used to IDL or Fortran memory order as it relates to +indexing. NumPy uses C-order indexing. That means that the last +index usually represents the most rapidly changing memory location, +unlike Fortran or IDL, where the first index represents the most +rapidly changing location in memory. This difference represents a +great potential for confusion. + +Other indexing options +====================== + +It is possible to slice and stride arrays to extract arrays of the +same number of dimensions, but of different sizes than the original. +The slicing and striding works exactly the same way it does for lists +and tuples except that they can be applied to multiple dimensions as +well. A few examples illustrates best: :: + + >>> x = np.arange(10) + >>> x[2:5] + array([2, 3, 4]) + >>> x[:-7] + array([0, 1, 2]) + >>> x[1:7:2] + array([1, 3, 5]) + >>> y = np.arange(35).reshape(5,7) + >>> y[1:5:2,::3] + array([[ 7, 10, 13], + [21, 24, 27]]) + +Note that slices of arrays do not copy the internal array data but +only produce new views of the original data. This is different from +list or tuple slicing and an explicit ``copy()`` is recommended if +the original data is not required anymore. + +It is possible to index arrays with other arrays for the purposes of +selecting lists of values out of arrays into new arrays. There are +two different ways of accomplishing this. One uses one or more arrays +of index values. The other involves giving a boolean array of the proper +shape to indicate the values to be selected. Index arrays are a very +powerful tool that allow one to avoid looping over individual elements in +arrays and thus greatly improve performance. + +It is possible to use special features to effectively increase the +number of dimensions in an array through indexing so the resulting +array acquires the shape needed for use in an expression or with a +specific function. + +Index arrays +============ + +NumPy arrays may be indexed with other arrays (or any other sequence- +like object that can be converted to an array, such as lists, with the +exception of tuples; see the end of this document for why this is). The +use of index arrays ranges from simple, straightforward cases to +complex, hard-to-understand cases. For all cases of index arrays, what +is returned is a copy of the original data, not a view as one gets for +slices. + +Index arrays must be of integer type. Each value in the array indicates +which value in the array to use in place of the index. To illustrate: :: + + >>> x = np.arange(10,1,-1) + >>> x + array([10, 9, 8, 7, 6, 5, 4, 3, 2]) + >>> x[np.array([3, 3, 1, 8])] + array([7, 7, 9, 2]) + + +The index array consisting of the values 3, 3, 1 and 8 correspondingly +create an array of length 4 (same as the index array) where each index +is replaced by the value the index array has in the array being indexed. + +Negative values are permitted and work as they do with single indices +or slices: :: + + >>> x[np.array([3,3,-3,8])] + array([7, 7, 4, 2]) + +It is an error to have index values out of bounds: :: + + >>> x[np.array([3, 3, 20, 8])] + <type 'exceptions.IndexError'>: index 20 out of bounds 0<=index<9 + +Generally speaking, what is returned when index arrays are used is +an array with the same shape as the index array, but with the type +and values of the array being indexed. As an example, we can use a +multidimensional index array instead: :: + + >>> x[np.array([[1,1],[2,3]])] + array([[9, 9], + [8, 7]]) + +Indexing Multi-dimensional arrays +================================= + +Things become more complex when multidimensional arrays are indexed, +particularly with multidimensional index arrays. These tend to be +more unusual uses, but they are permitted, and they are useful for some +problems. We'll start with the simplest multidimensional case (using +the array y from the previous examples): :: + + >>> y[np.array([0,2,4]), np.array([0,1,2])] + array([ 0, 15, 30]) + +In this case, if the index arrays have a matching shape, and there is +an index array for each dimension of the array being indexed, the +resultant array has the same shape as the index arrays, and the values +correspond to the index set for each position in the index arrays. In +this example, the first index value is 0 for both index arrays, and +thus the first value of the resultant array is y[0,0]. The next value +is y[2,1], and the last is y[4,2]. + +If the index arrays do not have the same shape, there is an attempt to +broadcast them to the same shape. If they cannot be broadcast to the +same shape, an exception is raised: :: + + >>> y[np.array([0,2,4]), np.array([0,1])] + <type 'exceptions.ValueError'>: shape mismatch: objects cannot be + broadcast to a single shape + +The broadcasting mechanism permits index arrays to be combined with +scalars for other indices. The effect is that the scalar value is used +for all the corresponding values of the index arrays: :: + + >>> y[np.array([0,2,4]), 1] + array([ 1, 15, 29]) + +Jumping to the next level of complexity, it is possible to only +partially index an array with index arrays. It takes a bit of thought +to understand what happens in such cases. For example if we just use +one index array with y: :: + + >>> y[np.array([0,2,4])] + array([[ 0, 1, 2, 3, 4, 5, 6], + [14, 15, 16, 17, 18, 19, 20], + [28, 29, 30, 31, 32, 33, 34]]) + +What results is the construction of a new array where each value of +the index array selects one row from the array being indexed and the +resultant array has the resulting shape (number of index elements, +size of row). + +An example of where this may be useful is for a color lookup table +where we want to map the values of an image into RGB triples for +display. The lookup table could have a shape (nlookup, 3). Indexing +such an array with an image with shape (ny, nx) with dtype=np.uint8 +(or any integer type so long as values are with the bounds of the +lookup table) will result in an array of shape (ny, nx, 3) where a +triple of RGB values is associated with each pixel location. + +In general, the shape of the resultant array will be the concatenation +of the shape of the index array (or the shape that all the index arrays +were broadcast to) with the shape of any unused dimensions (those not +indexed) in the array being indexed. + +Boolean or "mask" index arrays +============================== + +Boolean arrays used as indices are treated in a different manner +entirely than index arrays. Boolean arrays must be of the same shape +as the initial dimensions of the array being indexed. In the +most straightforward case, the boolean array has the same shape: :: + + >>> b = y>20 + >>> y[b] + array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]) + +Unlike in the case of integer index arrays, in the boolean case, the +result is a 1-D array containing all the elements in the indexed array +corresponding to all the true elements in the boolean array. The +elements in the indexed array are always iterated and returned in +:term:`row-major` (C-style) order. The result is also identical to +``y[np.nonzero(b)]``. As with index arrays, what is returned is a copy +of the data, not a view as one gets with slices. + +The result will be multidimensional if y has more dimensions than b. +For example: :: + + >>> b[:,5] # use a 1-D boolean whose first dim agrees with the first dim of y + array([False, False, False, True, True]) + >>> y[b[:,5]] + array([[21, 22, 23, 24, 25, 26, 27], + [28, 29, 30, 31, 32, 33, 34]]) + +Here the 4th and 5th rows are selected from the indexed array and +combined to make a 2-D array. + +In general, when the boolean array has fewer dimensions than the array +being indexed, this is equivalent to y[b, ...], which means +y is indexed by b followed by as many : as are needed to fill +out the rank of y. +Thus the shape of the result is one dimension containing the number +of True elements of the boolean array, followed by the remaining +dimensions of the array being indexed. + +For example, using a 2-D boolean array of shape (2,3) +with four True elements to select rows from a 3-D array of shape +(2,3,5) results in a 2-D result of shape (4,5): :: + + >>> x = np.arange(30).reshape(2,3,5) + >>> x + array([[[ 0, 1, 2, 3, 4], + [ 5, 6, 7, 8, 9], + [10, 11, 12, 13, 14]], + [[15, 16, 17, 18, 19], + [20, 21, 22, 23, 24], + [25, 26, 27, 28, 29]]]) + >>> b = np.array([[True, True, False], [False, True, True]]) + >>> x[b] + array([[ 0, 1, 2, 3, 4], + [ 5, 6, 7, 8, 9], + [20, 21, 22, 23, 24], + [25, 26, 27, 28, 29]]) + +For further details, consult the numpy reference documentation on array indexing. + +Combining index arrays with slices +================================== + +Index arrays may be combined with slices. For example: :: + + >>> y[np.array([0, 2, 4]), 1:3] + array([[ 1, 2], + [15, 16], + [29, 30]]) + +In effect, the slice and index array operation are independent. +The slice operation extracts columns with index 1 and 2, +(i.e. the 2nd and 3rd columns), +followed by the index array operation which extracts rows with +index 0, 2 and 4 (i.e the first, third and fifth rows). + +This is equivalent to:: + + >>> y[:, 1:3][np.array([0, 2, 4]), :] + array([[ 1, 2], + [15, 16], + [29, 30]]) + +Likewise, slicing can be combined with broadcasted boolean indices: :: + + >>> b = y > 20 + >>> b + array([[False, False, False, False, False, False, False], + [False, False, False, False, False, False, False], + [False, False, False, False, False, False, False], + [ True, True, True, True, True, True, True], + [ True, True, True, True, True, True, True]]) + >>> y[b[:,5],1:3] + array([[22, 23], + [29, 30]]) + +Structural indexing tools +========================= + +To facilitate easy matching of array shapes with expressions and in +assignments, the np.newaxis object can be used within array indices +to add new dimensions with a size of 1. For example: :: + + >>> y.shape + (5, 7) + >>> y[:,np.newaxis,:].shape + (5, 1, 7) + +Note that there are no new elements in the array, just that the +dimensionality is increased. This can be handy to combine two +arrays in a way that otherwise would require explicitly reshaping +operations. For example: :: + + >>> x = np.arange(5) + >>> x[:,np.newaxis] + x[np.newaxis,:] + array([[0, 1, 2, 3, 4], + [1, 2, 3, 4, 5], + [2, 3, 4, 5, 6], + [3, 4, 5, 6, 7], + [4, 5, 6, 7, 8]]) + +The ellipsis syntax maybe used to indicate selecting in full any +remaining unspecified dimensions. For example: :: + + >>> z = np.arange(81).reshape(3,3,3,3) + >>> z[1,...,2] + array([[29, 32, 35], + [38, 41, 44], + [47, 50, 53]]) + +This is equivalent to: :: + + >>> z[1,:,:,2] + array([[29, 32, 35], + [38, 41, 44], + [47, 50, 53]]) + +Assigning values to indexed arrays +================================== + +As mentioned, one can select a subset of an array to assign to using +a single index, slices, and index and mask arrays. The value being +assigned to the indexed array must be shape consistent (the same shape +or broadcastable to the shape the index produces). For example, it is +permitted to assign a constant to a slice: :: + + >>> x = np.arange(10) + >>> x[2:7] = 1 + +or an array of the right size: :: + + >>> x[2:7] = np.arange(5) + +Note that assignments may result in changes if assigning +higher types to lower types (like floats to ints) or even +exceptions (assigning complex to floats or ints): :: + + >>> x[1] = 1.2 + >>> x[1] + 1 + >>> x[1] = 1.2j + TypeError: can't convert complex to int + + +Unlike some of the references (such as array and mask indices) +assignments are always made to the original data in the array +(indeed, nothing else would make sense!). Note though, that some +actions may not work as one may naively expect. This particular +example is often surprising to people: :: + + >>> x = np.arange(0, 50, 10) + >>> x + array([ 0, 10, 20, 30, 40]) + >>> x[np.array([1, 1, 3, 1])] += 1 + >>> x + array([ 0, 11, 20, 31, 40]) + +Where people expect that the 1st location will be incremented by 3. +In fact, it will only be incremented by 1. The reason is because +a new array is extracted from the original (as a temporary) containing +the values at 1, 1, 3, 1, then the value 1 is added to the temporary, +and then the temporary is assigned back to the original array. Thus +the value of the array at x[1]+1 is assigned to x[1] three times, +rather than being incremented 3 times. + +Dealing with variable numbers of indices within programs +======================================================== + +The index syntax is very powerful but limiting when dealing with +a variable number of indices. For example, if you want to write +a function that can handle arguments with various numbers of +dimensions without having to write special case code for each +number of possible dimensions, how can that be done? If one +supplies to the index a tuple, the tuple will be interpreted +as a list of indices. For example (using the previous definition +for the array z): :: + + >>> indices = (1,1,1,1) + >>> z[indices] + 40 + +So one can use code to construct tuples of any number of indices +and then use these within an index. + +Slices can be specified within programs by using the slice() function +in Python. For example: :: + + >>> indices = (1,1,1,slice(0,2)) # same as [1,1,1,0:2] + >>> z[indices] + array([39, 40]) + +Likewise, ellipsis can be specified by code by using the Ellipsis +object: :: + + >>> indices = (1, Ellipsis, 1) # same as [1,...,1] + >>> z[indices] + array([[28, 31, 34], + [37, 40, 43], + [46, 49, 52]]) + +For this reason it is possible to use the output from the np.nonzero() +function directly as an index since it always returns a tuple of index +arrays. + +Because the special treatment of tuples, they are not automatically +converted to an array as a list would be. As an example: :: + + >>> z[[1,1,1,1]] # produces a large array + array([[[[27, 28, 29], + [30, 31, 32], ... + >>> z[(1,1,1,1)] # returns a single value + 40 + + diff --git a/doc/source/user/basics.io.genfromtxt.rst b/doc/source/user/basics.io.genfromtxt.rst index 3fce6a8aa..5364acbe9 100644 --- a/doc/source/user/basics.io.genfromtxt.rst +++ b/doc/source/user/basics.io.genfromtxt.rst @@ -28,7 +28,7 @@ Defining the input The only mandatory argument of :func:`~numpy.genfromtxt` is the source of the data. It can be a string, a list of strings, a generator or an open -file-like object with a :meth:`read` method, for example, a file or +file-like object with a ``read`` method, for example, a file or :class:`io.StringIO` object. If a single string is provided, it is assumed to be the name of a local or remote file. If a list of strings or a generator returning strings is provided, each string is treated as one line in a file. @@ -36,10 +36,10 @@ When the URL of a remote file is passed, the file is automatically downloaded to the current directory and opened. Recognized file types are text files and archives. Currently, the function -recognizes :class:`gzip` and :class:`bz2` (`bzip2`) archives. The type of +recognizes ``gzip`` and ``bz2`` (``bzip2``) archives. The type of the archive is determined from the extension of the file: if the filename -ends with ``'.gz'``, a :class:`gzip` archive is expected; if it ends with -``'bz2'``, a :class:`bzip2` archive is assumed. +ends with ``'.gz'``, a ``gzip`` archive is expected; if it ends with +``'bz2'``, a ``bzip2`` archive is assumed. @@ -360,9 +360,9 @@ The ``converters`` argument Usually, defining a dtype is sufficient to define how the sequence of strings must be converted. However, some additional control may sometimes be required. For example, we may want to make sure that a date in a format -``YYYY/MM/DD`` is converted to a :class:`datetime` object, or that a string -like ``xx%`` is properly converted to a float between 0 and 1. In such -cases, we should define conversion functions with the ``converters`` +``YYYY/MM/DD`` is converted to a :class:`~datetime.datetime` object, or that +a string like ``xx%`` is properly converted to a float between 0 and 1. In +such cases, we should define conversion functions with the ``converters`` arguments. The value of this argument is typically a dictionary with column indices or @@ -427,7 +427,7 @@ previous example, we used a converter to transform an empty string into a float. However, user-defined converters may rapidly become cumbersome to manage. -The :func:`~nummpy.genfromtxt` function provides two other complementary +The :func:`~numpy.genfromtxt` function provides two other complementary mechanisms: the ``missing_values`` argument is used to recognize missing data and a second argument, ``filling_values``, is used to process these missing data. @@ -514,15 +514,15 @@ output array will then be a :class:`~numpy.ma.MaskedArray`. Shortcut functions ================== -In addition to :func:`~numpy.genfromtxt`, the :mod:`numpy.lib.io` module +In addition to :func:`~numpy.genfromtxt`, the :mod:`numpy.lib.npyio` module provides several convenience functions derived from :func:`~numpy.genfromtxt`. These functions work the same way as the original, but they have different default values. -:func:`~numpy.recfromtxt` +:func:`~numpy.npyio.recfromtxt` Returns a standard :class:`numpy.recarray` (if ``usemask=False``) or a - :class:`~numpy.ma.MaskedRecords` array (if ``usemaske=True``). The + :class:`~numpy.ma.mrecords.MaskedRecords` array (if ``usemaske=True``). The default dtype is ``dtype=None``, meaning that the types of each column will be automatically determined. -:func:`~numpy.recfromcsv` - Like :func:`~numpy.recfromtxt`, but with a default ``delimiter=","``. +:func:`~numpy.npyio.recfromcsv` + Like :func:`~numpy.npyio.recfromtxt`, but with a default ``delimiter=","``. diff --git a/doc/source/user/basics.rec.rst b/doc/source/user/basics.rec.rst index b885c9e77..0524fde8e 100644 --- a/doc/source/user/basics.rec.rst +++ b/doc/source/user/basics.rec.rst @@ -4,10 +4,652 @@ Structured arrays ***************** -.. automodule:: numpy.doc.structured_arrays +Introduction +============ + +Structured arrays are ndarrays whose datatype is a composition of simpler +datatypes organized as a sequence of named :term:`fields <field>`. For example, +:: + + >>> x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)], + ... dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]) + >>> x + array([('Rex', 9, 81.), ('Fido', 3, 27.)], + dtype=[('name', 'U10'), ('age', '<i4'), ('weight', '<f4')]) + +Here ``x`` is a one-dimensional array of length two whose datatype is a +structure with three fields: 1. A string of length 10 or less named 'name', 2. +a 32-bit integer named 'age', and 3. a 32-bit float named 'weight'. + +If you index ``x`` at position 1 you get a structure:: + + >>> x[1] + ('Fido', 3, 27.0) + +You can access and modify individual fields of a structured array by indexing +with the field name:: + + >>> x['age'] + array([9, 3], dtype=int32) + >>> x['age'] = 5 + >>> x + array([('Rex', 5, 81.), ('Fido', 5, 27.)], + dtype=[('name', 'U10'), ('age', '<i4'), ('weight', '<f4')]) + +Structured datatypes are designed to be able to mimic 'structs' in the C +language, and share a similar memory layout. They are meant for interfacing with +C code and for low-level manipulation of structured buffers, for example for +interpreting binary blobs. For these purposes they support specialized features +such as subarrays, nested datatypes, and unions, and allow control over the +memory layout of the structure. + +Users looking to manipulate tabular data, such as stored in csv files, may find +other pydata projects more suitable, such as xarray, pandas, or DataArray. +These provide a high-level interface for tabular data analysis and are better +optimized for that use. For instance, the C-struct-like memory layout of +structured arrays in numpy can lead to poor cache behavior in comparison. + +.. _defining-structured-types: + +Structured Datatypes +==================== + +A structured datatype can be thought of as a sequence of bytes of a certain +length (the structure's :term:`itemsize`) which is interpreted as a collection +of fields. Each field has a name, a datatype, and a byte offset within the +structure. The datatype of a field may be any numpy datatype including other +structured datatypes, and it may also be a :term:`subarray data type` which +behaves like an ndarray of a specified shape. The offsets of the fields are +arbitrary, and fields may even overlap. These offsets are usually determined +automatically by numpy, but can also be specified. + +Structured Datatype Creation +---------------------------- + +Structured datatypes may be created using the function :func:`numpy.dtype`. +There are 4 alternative forms of specification which vary in flexibility and +conciseness. These are further documented in the +:ref:`Data Type Objects <arrays.dtypes.constructing>` reference page, and in +summary they are: + +1. A list of tuples, one tuple per field + + Each tuple has the form ``(fieldname, datatype, shape)`` where shape is + optional. ``fieldname`` is a string (or tuple if titles are used, see + :ref:`Field Titles <titles>` below), ``datatype`` may be any object + convertible to a datatype, and ``shape`` is a tuple of integers specifying + subarray shape. + + >>> np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2, 2))]) + dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4', (2, 2))]) + + If ``fieldname`` is the empty string ``''``, the field will be given a + default name of the form ``f#``, where ``#`` is the integer index of the + field, counting from 0 from the left:: + + >>> np.dtype([('x', 'f4'), ('', 'i4'), ('z', 'i8')]) + dtype([('x', '<f4'), ('f1', '<i4'), ('z', '<i8')]) + + The byte offsets of the fields within the structure and the total + structure itemsize are determined automatically. + +2. A string of comma-separated dtype specifications + + In this shorthand notation any of the :ref:`string dtype specifications + <arrays.dtypes.constructing>` may be used in a string and separated by + commas. The itemsize and byte offsets of the fields are determined + automatically, and the field names are given the default names ``f0``, + ``f1``, etc. :: + + >>> np.dtype('i8, f4, S3') + dtype([('f0', '<i8'), ('f1', '<f4'), ('f2', 'S3')]) + >>> np.dtype('3int8, float32, (2, 3)float64') + dtype([('f0', 'i1', (3,)), ('f1', '<f4'), ('f2', '<f8', (2, 3))]) + +3. A dictionary of field parameter arrays + + This is the most flexible form of specification since it allows control + over the byte-offsets of the fields and the itemsize of the structure. + + The dictionary has two required keys, 'names' and 'formats', and four + optional keys, 'offsets', 'itemsize', 'aligned' and 'titles'. The values + for 'names' and 'formats' should respectively be a list of field names and + a list of dtype specifications, of the same length. The optional 'offsets' + value should be a list of integer byte-offsets, one for each field within + the structure. If 'offsets' is not given the offsets are determined + automatically. The optional 'itemsize' value should be an integer + describing the total size in bytes of the dtype, which must be large + enough to contain all the fields. + :: + + >>> np.dtype({'names': ['col1', 'col2'], 'formats': ['i4', 'f4']}) + dtype([('col1', '<i4'), ('col2', '<f4')]) + >>> np.dtype({'names': ['col1', 'col2'], + ... 'formats': ['i4', 'f4'], + ... 'offsets': [0, 4], + ... 'itemsize': 12}) + dtype({'names':['col1','col2'], 'formats':['<i4','<f4'], 'offsets':[0,4], 'itemsize':12}) + + Offsets may be chosen such that the fields overlap, though this will mean + that assigning to one field may clobber any overlapping field's data. As + an exception, fields of :class:`numpy.object_` type cannot overlap with + other fields, because of the risk of clobbering the internal object + pointer and then dereferencing it. + + The optional 'aligned' value can be set to ``True`` to make the automatic + offset computation use aligned offsets (see :ref:`offsets-and-alignment`), + as if the 'align' keyword argument of :func:`numpy.dtype` had been set to + True. + + The optional 'titles' value should be a list of titles of the same length + as 'names', see :ref:`Field Titles <titles>` below. + +4. A dictionary of field names + + The use of this form of specification is discouraged, but documented here + because older numpy code may use it. The keys of the dictionary are the + field names and the values are tuples specifying type and offset:: + + >>> np.dtype({'col1': ('i1', 0), 'col2': ('f4', 1)}) + dtype([('col1', 'i1'), ('col2', '<f4')]) + + This form is discouraged because Python dictionaries do not preserve order + in Python versions before Python 3.6, and the order of the fields in a + structured dtype has meaning. :ref:`Field Titles <titles>` may be + specified by using a 3-tuple, see below. + +Manipulating and Displaying Structured Datatypes +------------------------------------------------ + +The list of field names of a structured datatype can be found in the ``names`` +attribute of the dtype object:: + + >>> d = np.dtype([('x', 'i8'), ('y', 'f4')]) + >>> d.names + ('x', 'y') + +The field names may be modified by assigning to the ``names`` attribute using a +sequence of strings of the same length. + +The dtype object also has a dictionary-like attribute, ``fields``, whose keys +are the field names (and :ref:`Field Titles <titles>`, see below) and whose +values are tuples containing the dtype and byte offset of each field. :: + + >>> d.fields + mappingproxy({'x': (dtype('int64'), 0), 'y': (dtype('float32'), 8)}) + +Both the ``names`` and ``fields`` attributes will equal ``None`` for +unstructured arrays. The recommended way to test if a dtype is structured is +with `if dt.names is not None` rather than `if dt.names`, to account for dtypes +with 0 fields. + +The string representation of a structured datatype is shown in the "list of +tuples" form if possible, otherwise numpy falls back to using the more general +dictionary form. + +.. _offsets-and-alignment: + +Automatic Byte Offsets and Alignment +------------------------------------ + +Numpy uses one of two methods to automatically determine the field byte offsets +and the overall itemsize of a structured datatype, depending on whether +``align=True`` was specified as a keyword argument to :func:`numpy.dtype`. + +By default (``align=False``), numpy will pack the fields together such that +each field starts at the byte offset the previous field ended, and the fields +are contiguous in memory. :: + + >>> def print_offsets(d): + ... print("offsets:", [d.fields[name][1] for name in d.names]) + ... print("itemsize:", d.itemsize) + >>> print_offsets(np.dtype('u1, u1, i4, u1, i8, u2')) + offsets: [0, 1, 2, 6, 7, 15] + itemsize: 17 + +If ``align=True`` is set, numpy will pad the structure in the same way many C +compilers would pad a C-struct. Aligned structures can give a performance +improvement in some cases, at the cost of increased datatype size. Padding +bytes are inserted between fields such that each field's byte offset will be a +multiple of that field's alignment, which is usually equal to the field's size +in bytes for simple datatypes, see :c:member:`PyArray_Descr.alignment`. The +structure will also have trailing padding added so that its itemsize is a +multiple of the largest field's alignment. :: + + >>> print_offsets(np.dtype('u1, u1, i4, u1, i8, u2', align=True)) + offsets: [0, 1, 4, 8, 16, 24] + itemsize: 32 + +Note that although almost all modern C compilers pad in this way by default, +padding in C structs is C-implementation-dependent so this memory layout is not +guaranteed to exactly match that of a corresponding struct in a C program. Some +work may be needed, either on the numpy side or the C side, to obtain exact +correspondence. + +If offsets were specified using the optional ``offsets`` key in the +dictionary-based dtype specification, setting ``align=True`` will check that +each field's offset is a multiple of its size and that the itemsize is a +multiple of the largest field size, and raise an exception if not. + +If the offsets of the fields and itemsize of a structured array satisfy the +alignment conditions, the array will have the ``ALIGNED`` :attr:`flag +<numpy.ndarray.flags>` set. + +A convenience function :func:`numpy.lib.recfunctions.repack_fields` converts an +aligned dtype or array to a packed one and vice versa. It takes either a dtype +or structured ndarray as an argument, and returns a copy with fields re-packed, +with or without padding bytes. + +.. _titles: + +Field Titles +------------ + +In addition to field names, fields may also have an associated :term:`title`, +an alternate name, which is sometimes used as an additional description or +alias for the field. The title may be used to index an array, just like a +field name. + +To add titles when using the list-of-tuples form of dtype specification, the +field name may be specified as a tuple of two strings instead of a single +string, which will be the field's title and field name respectively. For +example:: + + >>> np.dtype([(('my title', 'name'), 'f4')]) + dtype([(('my title', 'name'), '<f4')]) + +When using the first form of dictionary-based specification, the titles may be +supplied as an extra ``'titles'`` key as described above. When using the second +(discouraged) dictionary-based specification, the title can be supplied by +providing a 3-element tuple ``(datatype, offset, title)`` instead of the usual +2-element tuple:: + + >>> np.dtype({'name': ('i4', 0, 'my title')}) + dtype([(('my title', 'name'), '<i4')]) + +The ``dtype.fields`` dictionary will contain titles as keys, if any +titles are used. This means effectively that a field with a title will be +represented twice in the fields dictionary. The tuple values for these fields +will also have a third element, the field title. Because of this, and because +the ``names`` attribute preserves the field order while the ``fields`` +attribute may not, it is recommended to iterate through the fields of a dtype +using the ``names`` attribute of the dtype, which will not list titles, as +in:: + + >>> for name in d.names: + ... print(d.fields[name][:2]) + (dtype('int64'), 0) + (dtype('float32'), 8) + +Union types +----------- + +Structured datatypes are implemented in numpy to have base type +:class:`numpy.void` by default, but it is possible to interpret other numpy +types as structured types using the ``(base_dtype, dtype)`` form of dtype +specification described in +:ref:`Data Type Objects <arrays.dtypes.constructing>`. Here, ``base_dtype`` is +the desired underlying dtype, and fields and flags will be copied from +``dtype``. This dtype is similar to a 'union' in C. + +Indexing and Assignment to Structured arrays +============================================ + +Assigning data to a Structured Array +------------------------------------ + +There are a number of ways to assign values to a structured array: Using python +tuples, using scalar values, or using other structured arrays. + +Assignment from Python Native Types (Tuples) +```````````````````````````````````````````` + +The simplest way to assign values to a structured array is using python tuples. +Each assigned value should be a tuple of length equal to the number of fields +in the array, and not a list or array as these will trigger numpy's +broadcasting rules. The tuple's elements are assigned to the successive fields +of the array, from left to right:: + + >>> x = np.array([(1, 2, 3), (4, 5, 6)], dtype='i8, f4, f8') + >>> x[1] = (7, 8, 9) + >>> x + array([(1, 2., 3.), (7, 8., 9.)], + dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '<f8')]) + +Assignment from Scalars +``````````````````````` + +A scalar assigned to a structured element will be assigned to all fields. This +happens when a scalar is assigned to a structured array, or when an +unstructured array is assigned to a structured array:: + + >>> x = np.zeros(2, dtype='i8, f4, ?, S1') + >>> x[:] = 3 + >>> x + array([(3, 3., True, b'3'), (3, 3., True, b'3')], + dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')]) + >>> x[:] = np.arange(2) + >>> x + array([(0, 0., False, b'0'), (1, 1., True, b'1')], + dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')]) + +Structured arrays can also be assigned to unstructured arrays, but only if the +structured datatype has just a single field:: + + >>> twofield = np.zeros(2, dtype=[('A', 'i4'), ('B', 'i4')]) + >>> onefield = np.zeros(2, dtype=[('A', 'i4')]) + >>> nostruct = np.zeros(2, dtype='i4') + >>> nostruct[:] = twofield + Traceback (most recent call last): + ... + TypeError: Cannot cast array data from dtype([('A', '<i4'), ('B', '<i4')]) to dtype('int32') according to the rule 'unsafe' + +Assignment from other Structured Arrays +``````````````````````````````````````` + +Assignment between two structured arrays occurs as if the source elements had +been converted to tuples and then assigned to the destination elements. That +is, the first field of the source array is assigned to the first field of the +destination array, and the second field likewise, and so on, regardless of +field names. Structured arrays with a different number of fields cannot be +assigned to each other. Bytes of the destination structure which are not +included in any of the fields are unaffected. :: + + >>> a = np.zeros(3, dtype=[('a', 'i8'), ('b', 'f4'), ('c', 'S3')]) + >>> b = np.ones(3, dtype=[('x', 'f4'), ('y', 'S3'), ('z', 'O')]) + >>> b[:] = a + >>> b + array([(0., b'0.0', b''), (0., b'0.0', b''), (0., b'0.0', b'')], + dtype=[('x', '<f4'), ('y', 'S3'), ('z', 'O')]) + + +Assignment involving subarrays +`````````````````````````````` + +When assigning to fields which are subarrays, the assigned value will first be +broadcast to the shape of the subarray. + +Indexing Structured Arrays +-------------------------- + +Accessing Individual Fields +``````````````````````````` + +Individual fields of a structured array may be accessed and modified by indexing +the array with the field name. :: + + >>> x = np.array([(1, 2), (3, 4)], dtype=[('foo', 'i8'), ('bar', 'f4')]) + >>> x['foo'] + array([1, 3]) + >>> x['foo'] = 10 + >>> x + array([(10, 2.), (10, 4.)], + dtype=[('foo', '<i8'), ('bar', '<f4')]) + +The resulting array is a view into the original array. It shares the same +memory locations and writing to the view will modify the original array. :: + + >>> y = x['bar'] + >>> y[:] = 11 + >>> x + array([(10, 11.), (10, 11.)], + dtype=[('foo', '<i8'), ('bar', '<f4')]) + +This view has the same dtype and itemsize as the indexed field, so it is +typically a non-structured array, except in the case of nested structures. + + >>> y.dtype, y.shape, y.strides + (dtype('float32'), (2,), (12,)) + +If the accessed field is a subarray, the dimensions of the subarray +are appended to the shape of the result:: + + >>> x = np.zeros((2, 2), dtype=[('a', np.int32), ('b', np.float64, (3, 3))]) + >>> x['a'].shape + (2, 2) + >>> x['b'].shape + (2, 2, 3, 3) + +Accessing Multiple Fields +``````````````````````````` + +One can index and assign to a structured array with a multi-field index, where +the index is a list of field names. + +.. warning:: + The behavior of multi-field indexes changed from Numpy 1.15 to Numpy 1.16. + +The result of indexing with a multi-field index is a view into the original +array, as follows:: + + >>> a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')]) + >>> a[['a', 'c']] + array([(0, 0.), (0, 0.), (0, 0.)], + dtype={'names':['a','c'], 'formats':['<i4','<f4'], 'offsets':[0,8], 'itemsize':12}) + +Assignment to the view modifies the original array. The view's fields will be +in the order they were indexed. Note that unlike for single-field indexing, the +dtype of the view has the same itemsize as the original array, and has fields +at the same offsets as in the original array, and unindexed fields are merely +missing. + +.. warning:: + In Numpy 1.15, indexing an array with a multi-field index returned a copy of + the result above, but with fields packed together in memory as if + passed through :func:`numpy.lib.recfunctions.repack_fields`. + + The new behavior as of Numpy 1.16 leads to extra "padding" bytes at the + location of unindexed fields compared to 1.15. You will need to update any + code which depends on the data having a "packed" layout. For instance code + such as:: + + >>> a[['a', 'c']].view('i8') # Fails in Numpy 1.16 + Traceback (most recent call last): + File "<stdin>", line 1, in <module> + ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original dtype + + will need to be changed. This code has raised a ``FutureWarning`` since + Numpy 1.12, and similar code has raised ``FutureWarning`` since 1.7. + + In 1.16 a number of functions have been introduced in the + :mod:`numpy.lib.recfunctions` module to help users account for this + change. These are + :func:`numpy.lib.recfunctions.repack_fields`. + :func:`numpy.lib.recfunctions.structured_to_unstructured`, + :func:`numpy.lib.recfunctions.unstructured_to_structured`, + :func:`numpy.lib.recfunctions.apply_along_fields`, + :func:`numpy.lib.recfunctions.assign_fields_by_name`, and + :func:`numpy.lib.recfunctions.require_fields`. + + The function :func:`numpy.lib.recfunctions.repack_fields` can always be + used to reproduce the old behavior, as it will return a packed copy of the + structured array. The code above, for example, can be replaced with: + + >>> from numpy.lib.recfunctions import repack_fields + >>> repack_fields(a[['a', 'c']]).view('i8') # supported in 1.16 + array([0, 0, 0]) + + Furthermore, numpy now provides a new function + :func:`numpy.lib.recfunctions.structured_to_unstructured` which is a safer + and more efficient alternative for users who wish to convert structured + arrays to unstructured arrays, as the view above is often indeded to do. + This function allows safe conversion to an unstructured type taking into + account padding, often avoids a copy, and also casts the datatypes + as needed, unlike the view. Code such as: + + >>> b = np.zeros(3, dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')]) + >>> b[['x', 'z']].view('f4') + array([0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32) + + can be made safer by replacing with: + + >>> from numpy.lib.recfunctions import structured_to_unstructured + >>> structured_to_unstructured(b[['x', 'z']]) + array([0, 0, 0]) + + +Assignment to an array with a multi-field index modifies the original array:: + + >>> a[['a', 'c']] = (2, 3) + >>> a + array([(2, 0, 3.), (2, 0, 3.), (2, 0, 3.)], + dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<f4')]) + +This obeys the structured array assignment rules described above. For example, +this means that one can swap the values of two fields using appropriate +multi-field indexes:: + + >>> a[['a', 'c']] = a[['c', 'a']] + +Indexing with an Integer to get a Structured Scalar +``````````````````````````````````````````````````` + +Indexing a single element of a structured array (with an integer index) returns +a structured scalar:: + + >>> x = np.array([(1, 2., 3.)], dtype='i, f, f') + >>> scalar = x[0] + >>> scalar + (1, 2., 3.) + >>> type(scalar) + <class 'numpy.void'> + +Unlike other numpy scalars, structured scalars are mutable and act like views +into the original array, such that modifying the scalar will modify the +original array. Structured scalars also support access and assignment by field +name:: + + >>> x = np.array([(1, 2), (3, 4)], dtype=[('foo', 'i8'), ('bar', 'f4')]) + >>> s = x[0] + >>> s['bar'] = 100 + >>> x + array([(1, 100.), (3, 4.)], + dtype=[('foo', '<i8'), ('bar', '<f4')]) + +Similarly to tuples, structured scalars can also be indexed with an integer:: + + >>> scalar = np.array([(1, 2., 3.)], dtype='i, f, f')[0] + >>> scalar[0] + 1 + >>> scalar[1] = 4 + +Thus, tuples might be thought of as the native Python equivalent to numpy's +structured types, much like native python integers are the equivalent to +numpy's integer types. Structured scalars may be converted to a tuple by +calling `numpy.ndarray.item`:: + + >>> scalar.item(), type(scalar.item()) + ((1, 4.0, 3.0), <class 'tuple'>) + +Viewing Structured Arrays Containing Objects +-------------------------------------------- + +In order to prevent clobbering object pointers in fields of +:class:`object` type, numpy currently does not allow views of structured +arrays containing objects. + +Structure Comparison +-------------------- + +If the dtypes of two void structured arrays are equal, testing the equality of +the arrays will result in a boolean array with the dimensions of the original +arrays, with elements set to ``True`` where all fields of the corresponding +structures are equal. Structured dtypes are equal if the field names, +dtypes and titles are the same, ignoring endianness, and the fields are in +the same order:: + + >>> a = np.zeros(2, dtype=[('a', 'i4'), ('b', 'i4')]) + >>> b = np.ones(2, dtype=[('a', 'i4'), ('b', 'i4')]) + >>> a == b + array([False, False]) + +Currently, if the dtypes of two void structured arrays are not equivalent the +comparison fails, returning the scalar value ``False``. This behavior is +deprecated as of numpy 1.10 and will raise an error or perform elementwise +comparison in the future. + +The ``<`` and ``>`` operators always return ``False`` when comparing void +structured arrays, and arithmetic and bitwise operations are not supported. + +Record Arrays +============= + +As an optional convenience numpy provides an ndarray subclass, +:class:`numpy.recarray` that allows access to fields of structured arrays by +attribute instead of only by index. +Record arrays use a special datatype, :class:`numpy.record`, that allows +field access by attribute on the structured scalars obtained from the array. +The :mod:`numpy.rec` module provides functions for creating recarrays from +various objects. +Additional helper functions for creating and manipulating structured arrays +can be found in :mod:`numpy.lib.recfunctions`. + +The simplest way to create a record array is with ``numpy.rec.array``:: + + >>> recordarr = np.rec.array([(1, 2., 'Hello'), (2, 3., "World")], + ... dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')]) + >>> recordarr.bar + array([ 2., 3.], dtype=float32) + >>> recordarr[1:2] + rec.array([(2, 3., b'World')], + dtype=[('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')]) + >>> recordarr[1:2].foo + array([2], dtype=int32) + >>> recordarr.foo[1:2] + array([2], dtype=int32) + >>> recordarr[1].baz + b'World' + +:func:`numpy.rec.array` can convert a wide variety of arguments into record +arrays, including structured arrays:: + + >>> arr = np.array([(1, 2., 'Hello'), (2, 3., "World")], + ... dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')]) + >>> recordarr = np.rec.array(arr) + +The :mod:`numpy.rec` module provides a number of other convenience functions for +creating record arrays, see :ref:`record array creation routines +<routines.array-creation.rec>`. + +A record array representation of a structured array can be obtained using the +appropriate `view <numpy-ndarray-view>`_:: + + >>> arr = np.array([(1, 2., 'Hello'), (2, 3., "World")], + ... dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'a10')]) + >>> recordarr = arr.view(dtype=np.dtype((np.record, arr.dtype)), + ... type=np.recarray) + +For convenience, viewing an ndarray as type :class:`numpy.recarray` will +automatically convert to :class:`numpy.record` datatype, so the dtype can be left +out of the view:: + + >>> recordarr = arr.view(np.recarray) + >>> recordarr.dtype + dtype((numpy.record, [('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')])) + +To get back to a plain ndarray both the dtype and type must be reset. The +following view does so, taking into account the unusual case that the +recordarr was not a structured type:: + + >>> arr2 = recordarr.view(recordarr.dtype.fields or recordarr.dtype, np.ndarray) + +Record array fields accessed by index or by attribute are returned as a record +array if the field has a structured type but as a plain ndarray otherwise. :: + + >>> recordarr = np.rec.array([('Hello', (1, 2)), ("World", (3, 4))], + ... dtype=[('foo', 'S6'),('bar', [('A', int), ('B', int)])]) + >>> type(recordarr.foo) + <class 'numpy.ndarray'> + >>> type(recordarr.bar) + <class 'numpy.recarray'> + +Note that if a field has the same name as an ndarray attribute, the ndarray +attribute takes precedence. Such fields will be inaccessible by attribute but +will still be accessible by index. + Recarray Helper Functions -************************* +------------------------- .. automodule:: numpy.lib.recfunctions :members: diff --git a/doc/source/user/basics.rst b/doc/source/user/basics.rst index e0fc0ece3..66f3f9ee9 100644 --- a/doc/source/user/basics.rst +++ b/doc/source/user/basics.rst @@ -1,14 +1,18 @@ -************ -NumPy basics -************ +****************** +NumPy fundamentals +****************** + +These documents clarify concepts, design decisions, and technical +constraints in NumPy. This is a great place to understand the +fundamental NumPy ideas and philosophy. .. toctree:: :maxdepth: 1 - basics.types basics.creation - basics.io basics.indexing + basics.io + basics.types basics.broadcasting basics.byteswapping basics.rec diff --git a/doc/source/user/basics.subclassing.rst b/doc/source/user/basics.subclassing.rst index 43315521c..8ffa31688 100644 --- a/doc/source/user/basics.subclassing.rst +++ b/doc/source/user/basics.subclassing.rst @@ -4,4 +4,751 @@ Subclassing ndarray ******************* -.. automodule:: numpy.doc.subclassing +Introduction +------------ + +Subclassing ndarray is relatively simple, but it has some complications +compared to other Python objects. On this page we explain the machinery +that allows you to subclass ndarray, and the implications for +implementing a subclass. + +ndarrays and object creation +============================ + +Subclassing ndarray is complicated by the fact that new instances of +ndarray classes can come about in three different ways. These are: + +#. Explicit constructor call - as in ``MySubClass(params)``. This is + the usual route to Python instance creation. +#. View casting - casting an existing ndarray as a given subclass +#. New from template - creating a new instance from a template + instance. Examples include returning slices from a subclassed array, + creating return types from ufuncs, and copying arrays. See + :ref:`new-from-template` for more details + +The last two are characteristics of ndarrays - in order to support +things like array slicing. The complications of subclassing ndarray are +due to the mechanisms numpy has to support these latter two routes of +instance creation. + +.. _view-casting: + +View casting +------------ + +*View casting* is the standard ndarray mechanism by which you take an +ndarray of any subclass, and return a view of the array as another +(specified) subclass: + +>>> import numpy as np +>>> # create a completely useless ndarray subclass +>>> class C(np.ndarray): pass +>>> # create a standard ndarray +>>> arr = np.zeros((3,)) +>>> # take a view of it, as our useless subclass +>>> c_arr = arr.view(C) +>>> type(c_arr) +<class 'C'> + +.. _new-from-template: + +Creating new from template +-------------------------- + +New instances of an ndarray subclass can also come about by a very +similar mechanism to :ref:`view-casting`, when numpy finds it needs to +create a new instance from a template instance. The most obvious place +this has to happen is when you are taking slices of subclassed arrays. +For example: + +>>> v = c_arr[1:] +>>> type(v) # the view is of type 'C' +<class 'C'> +>>> v is c_arr # but it's a new instance +False + +The slice is a *view* onto the original ``c_arr`` data. So, when we +take a view from the ndarray, we return a new ndarray, of the same +class, that points to the data in the original. + +There are other points in the use of ndarrays where we need such views, +such as copying arrays (``c_arr.copy()``), creating ufunc output arrays +(see also :ref:`array-wrap`), and reducing methods (like +``c_arr.mean()``). + +Relationship of view casting and new-from-template +-------------------------------------------------- + +These paths both use the same machinery. We make the distinction here, +because they result in different input to your methods. Specifically, +:ref:`view-casting` means you have created a new instance of your array +type from any potential subclass of ndarray. :ref:`new-from-template` +means you have created a new instance of your class from a pre-existing +instance, allowing you - for example - to copy across attributes that +are particular to your subclass. + +Implications for subclassing +---------------------------- + +If we subclass ndarray, we need to deal not only with explicit +construction of our array type, but also :ref:`view-casting` or +:ref:`new-from-template`. NumPy has the machinery to do this, and it is +this machinery that makes subclassing slightly non-standard. + +There are two aspects to the machinery that ndarray uses to support +views and new-from-template in subclasses. + +The first is the use of the ``ndarray.__new__`` method for the main work +of object initialization, rather then the more usual ``__init__`` +method. The second is the use of the ``__array_finalize__`` method to +allow subclasses to clean up after the creation of views and new +instances from templates. + +A brief Python primer on ``__new__`` and ``__init__`` +===================================================== + +``__new__`` is a standard Python method, and, if present, is called +before ``__init__`` when we create a class instance. See the `python +__new__ documentation +<https://docs.python.org/reference/datamodel.html#object.__new__>`_ for more detail. + +For example, consider the following Python code: + +.. testcode:: + + class C: + def __new__(cls, *args): + print('Cls in __new__:', cls) + print('Args in __new__:', args) + # The `object` type __new__ method takes a single argument. + return object.__new__(cls) + + def __init__(self, *args): + print('type(self) in __init__:', type(self)) + print('Args in __init__:', args) + +meaning that we get: + +>>> c = C('hello') +Cls in __new__: <class 'C'> +Args in __new__: ('hello',) +type(self) in __init__: <class 'C'> +Args in __init__: ('hello',) + +When we call ``C('hello')``, the ``__new__`` method gets its own class +as first argument, and the passed argument, which is the string +``'hello'``. After python calls ``__new__``, it usually (see below) +calls our ``__init__`` method, with the output of ``__new__`` as the +first argument (now a class instance), and the passed arguments +following. + +As you can see, the object can be initialized in the ``__new__`` +method or the ``__init__`` method, or both, and in fact ndarray does +not have an ``__init__`` method, because all the initialization is +done in the ``__new__`` method. + +Why use ``__new__`` rather than just the usual ``__init__``? Because +in some cases, as for ndarray, we want to be able to return an object +of some other class. Consider the following: + +.. testcode:: + + class D(C): + def __new__(cls, *args): + print('D cls is:', cls) + print('D args in __new__:', args) + return C.__new__(C, *args) + + def __init__(self, *args): + # we never get here + print('In D __init__') + +meaning that: + +>>> obj = D('hello') +D cls is: <class 'D'> +D args in __new__: ('hello',) +Cls in __new__: <class 'C'> +Args in __new__: ('hello',) +>>> type(obj) +<class 'C'> + +The definition of ``C`` is the same as before, but for ``D``, the +``__new__`` method returns an instance of class ``C`` rather than +``D``. Note that the ``__init__`` method of ``D`` does not get +called. In general, when the ``__new__`` method returns an object of +class other than the class in which it is defined, the ``__init__`` +method of that class is not called. + +This is how subclasses of the ndarray class are able to return views +that preserve the class type. When taking a view, the standard +ndarray machinery creates the new ndarray object with something +like:: + + obj = ndarray.__new__(subtype, shape, ... + +where ``subdtype`` is the subclass. Thus the returned view is of the +same class as the subclass, rather than being of class ``ndarray``. + +That solves the problem of returning views of the same type, but now +we have a new problem. The machinery of ndarray can set the class +this way, in its standard methods for taking views, but the ndarray +``__new__`` method knows nothing of what we have done in our own +``__new__`` method in order to set attributes, and so on. (Aside - +why not call ``obj = subdtype.__new__(...`` then? Because we may not +have a ``__new__`` method with the same call signature). + +The role of ``__array_finalize__`` +================================== + +``__array_finalize__`` is the mechanism that numpy provides to allow +subclasses to handle the various ways that new instances get created. + +Remember that subclass instances can come about in these three ways: + +#. explicit constructor call (``obj = MySubClass(params)``). This will + call the usual sequence of ``MySubClass.__new__`` then (if it exists) + ``MySubClass.__init__``. +#. :ref:`view-casting` +#. :ref:`new-from-template` + +Our ``MySubClass.__new__`` method only gets called in the case of the +explicit constructor call, so we can't rely on ``MySubClass.__new__`` or +``MySubClass.__init__`` to deal with the view casting and +new-from-template. It turns out that ``MySubClass.__array_finalize__`` +*does* get called for all three methods of object creation, so this is +where our object creation housekeeping usually goes. + +* For the explicit constructor call, our subclass will need to create a + new ndarray instance of its own class. In practice this means that + we, the authors of the code, will need to make a call to + ``ndarray.__new__(MySubClass,...)``, a class-hierarchy prepared call to + ``super(MySubClass, cls).__new__(cls, ...)``, or do view casting of an + existing array (see below) +* For view casting and new-from-template, the equivalent of + ``ndarray.__new__(MySubClass,...`` is called, at the C level. + +The arguments that ``__array_finalize__`` receives differ for the three +methods of instance creation above. + +The following code allows us to look at the call sequences and arguments: + +.. testcode:: + + import numpy as np + + class C(np.ndarray): + def __new__(cls, *args, **kwargs): + print('In __new__ with class %s' % cls) + return super(C, cls).__new__(cls, *args, **kwargs) + + def __init__(self, *args, **kwargs): + # in practice you probably will not need or want an __init__ + # method for your subclass + print('In __init__ with class %s' % self.__class__) + + def __array_finalize__(self, obj): + print('In array_finalize:') + print(' self type is %s' % type(self)) + print(' obj type is %s' % type(obj)) + + +Now: + +>>> # Explicit constructor +>>> c = C((10,)) +In __new__ with class <class 'C'> +In array_finalize: + self type is <class 'C'> + obj type is <type 'NoneType'> +In __init__ with class <class 'C'> +>>> # View casting +>>> a = np.arange(10) +>>> cast_a = a.view(C) +In array_finalize: + self type is <class 'C'> + obj type is <type 'numpy.ndarray'> +>>> # Slicing (example of new-from-template) +>>> cv = c[:1] +In array_finalize: + self type is <class 'C'> + obj type is <class 'C'> + +The signature of ``__array_finalize__`` is:: + + def __array_finalize__(self, obj): + +One sees that the ``super`` call, which goes to +``ndarray.__new__``, passes ``__array_finalize__`` the new object, of our +own class (``self``) as well as the object from which the view has been +taken (``obj``). As you can see from the output above, the ``self`` is +always a newly created instance of our subclass, and the type of ``obj`` +differs for the three instance creation methods: + +* When called from the explicit constructor, ``obj`` is ``None`` +* When called from view casting, ``obj`` can be an instance of any + subclass of ndarray, including our own. +* When called in new-from-template, ``obj`` is another instance of our + own subclass, that we might use to update the new ``self`` instance. + +Because ``__array_finalize__`` is the only method that always sees new +instances being created, it is the sensible place to fill in instance +defaults for new object attributes, among other tasks. + +This may be clearer with an example. + +Simple example - adding an extra attribute to ndarray +----------------------------------------------------- + +.. testcode:: + + import numpy as np + + class InfoArray(np.ndarray): + + def __new__(subtype, shape, dtype=float, buffer=None, offset=0, + strides=None, order=None, info=None): + # Create the ndarray instance of our type, given the usual + # ndarray input arguments. This will call the standard + # ndarray constructor, but return an object of our type. + # It also triggers a call to InfoArray.__array_finalize__ + obj = super(InfoArray, subtype).__new__(subtype, shape, dtype, + buffer, offset, strides, + order) + # set the new 'info' attribute to the value passed + obj.info = info + # Finally, we must return the newly created object: + return obj + + def __array_finalize__(self, obj): + # ``self`` is a new object resulting from + # ndarray.__new__(InfoArray, ...), therefore it only has + # attributes that the ndarray.__new__ constructor gave it - + # i.e. those of a standard ndarray. + # + # We could have got to the ndarray.__new__ call in 3 ways: + # From an explicit constructor - e.g. InfoArray(): + # obj is None + # (we're in the middle of the InfoArray.__new__ + # constructor, and self.info will be set when we return to + # InfoArray.__new__) + if obj is None: return + # From view casting - e.g arr.view(InfoArray): + # obj is arr + # (type(obj) can be InfoArray) + # From new-from-template - e.g infoarr[:3] + # type(obj) is InfoArray + # + # Note that it is here, rather than in the __new__ method, + # that we set the default value for 'info', because this + # method sees all creation of default objects - with the + # InfoArray.__new__ constructor, but also with + # arr.view(InfoArray). + self.info = getattr(obj, 'info', None) + # We do not need to return anything + + +Using the object looks like this: + + >>> obj = InfoArray(shape=(3,)) # explicit constructor + >>> type(obj) + <class 'InfoArray'> + >>> obj.info is None + True + >>> obj = InfoArray(shape=(3,), info='information') + >>> obj.info + 'information' + >>> v = obj[1:] # new-from-template - here - slicing + >>> type(v) + <class 'InfoArray'> + >>> v.info + 'information' + >>> arr = np.arange(10) + >>> cast_arr = arr.view(InfoArray) # view casting + >>> type(cast_arr) + <class 'InfoArray'> + >>> cast_arr.info is None + True + +This class isn't very useful, because it has the same constructor as the +bare ndarray object, including passing in buffers and shapes and so on. +We would probably prefer the constructor to be able to take an already +formed ndarray from the usual numpy calls to ``np.array`` and return an +object. + +Slightly more realistic example - attribute added to existing array +------------------------------------------------------------------- + +Here is a class that takes a standard ndarray that already exists, casts +as our type, and adds an extra attribute. + +.. testcode:: + + import numpy as np + + class RealisticInfoArray(np.ndarray): + + def __new__(cls, input_array, info=None): + # Input array is an already formed ndarray instance + # We first cast to be our class type + obj = np.asarray(input_array).view(cls) + # add the new attribute to the created instance + obj.info = info + # Finally, we must return the newly created object: + return obj + + def __array_finalize__(self, obj): + # see InfoArray.__array_finalize__ for comments + if obj is None: return + self.info = getattr(obj, 'info', None) + + +So: + + >>> arr = np.arange(5) + >>> obj = RealisticInfoArray(arr, info='information') + >>> type(obj) + <class 'RealisticInfoArray'> + >>> obj.info + 'information' + >>> v = obj[1:] + >>> type(v) + <class 'RealisticInfoArray'> + >>> v.info + 'information' + +.. _array-ufunc: + +``__array_ufunc__`` for ufuncs +------------------------------ + + .. versionadded:: 1.13 + +A subclass can override what happens when executing numpy ufuncs on it by +overriding the default ``ndarray.__array_ufunc__`` method. This method is +executed *instead* of the ufunc and should return either the result of the +operation, or :obj:`NotImplemented` if the operation requested is not +implemented. + +The signature of ``__array_ufunc__`` is:: + + def __array_ufunc__(ufunc, method, *inputs, **kwargs): + + - *ufunc* is the ufunc object that was called. + - *method* is a string indicating how the Ufunc was called, either + ``"__call__"`` to indicate it was called directly, or one of its + :ref:`methods<ufuncs.methods>`: ``"reduce"``, ``"accumulate"``, + ``"reduceat"``, ``"outer"``, or ``"at"``. + - *inputs* is a tuple of the input arguments to the ``ufunc`` + - *kwargs* contains any optional or keyword arguments passed to the + function. This includes any ``out`` arguments, which are always + contained in a tuple. + +A typical implementation would convert any inputs or outputs that are +instances of one's own class, pass everything on to a superclass using +``super()``, and finally return the results after possible +back-conversion. An example, taken from the test case +``test_ufunc_override_with_super`` in ``core/tests/test_umath.py``, is the +following. + +.. testcode:: + + input numpy as np + + class A(np.ndarray): + def __array_ufunc__(self, ufunc, method, *inputs, out=None, **kwargs): + args = [] + in_no = [] + for i, input_ in enumerate(inputs): + if isinstance(input_, A): + in_no.append(i) + args.append(input_.view(np.ndarray)) + else: + args.append(input_) + + outputs = out + out_no = [] + if outputs: + out_args = [] + for j, output in enumerate(outputs): + if isinstance(output, A): + out_no.append(j) + out_args.append(output.view(np.ndarray)) + else: + out_args.append(output) + kwargs['out'] = tuple(out_args) + else: + outputs = (None,) * ufunc.nout + + info = {} + if in_no: + info['inputs'] = in_no + if out_no: + info['outputs'] = out_no + + results = super(A, self).__array_ufunc__(ufunc, method, + *args, **kwargs) + if results is NotImplemented: + return NotImplemented + + if method == 'at': + if isinstance(inputs[0], A): + inputs[0].info = info + return + + if ufunc.nout == 1: + results = (results,) + + results = tuple((np.asarray(result).view(A) + if output is None else output) + for result, output in zip(results, outputs)) + if results and isinstance(results[0], A): + results[0].info = info + + return results[0] if len(results) == 1 else results + +So, this class does not actually do anything interesting: it just +converts any instances of its own to regular ndarray (otherwise, we'd +get infinite recursion!), and adds an ``info`` dictionary that tells +which inputs and outputs it converted. Hence, e.g., + +>>> a = np.arange(5.).view(A) +>>> b = np.sin(a) +>>> b.info +{'inputs': [0]} +>>> b = np.sin(np.arange(5.), out=(a,)) +>>> b.info +{'outputs': [0]} +>>> a = np.arange(5.).view(A) +>>> b = np.ones(1).view(A) +>>> c = a + b +>>> c.info +{'inputs': [0, 1]} +>>> a += b +>>> a.info +{'inputs': [0, 1], 'outputs': [0]} + +Note that another approach would be to to use ``getattr(ufunc, +methods)(*inputs, **kwargs)`` instead of the ``super`` call. For this example, +the result would be identical, but there is a difference if another operand +also defines ``__array_ufunc__``. E.g., lets assume that we evalulate +``np.add(a, b)``, where ``b`` is an instance of another class ``B`` that has +an override. If you use ``super`` as in the example, +``ndarray.__array_ufunc__`` will notice that ``b`` has an override, which +means it cannot evaluate the result itself. Thus, it will return +`NotImplemented` and so will our class ``A``. Then, control will be passed +over to ``b``, which either knows how to deal with us and produces a result, +or does not and returns `NotImplemented`, raising a ``TypeError``. + +If instead, we replace our ``super`` call with ``getattr(ufunc, method)``, we +effectively do ``np.add(a.view(np.ndarray), b)``. Again, ``B.__array_ufunc__`` +will be called, but now it sees an ``ndarray`` as the other argument. Likely, +it will know how to handle this, and return a new instance of the ``B`` class +to us. Our example class is not set up to handle this, but it might well be +the best approach if, e.g., one were to re-implement ``MaskedArray`` using +``__array_ufunc__``. + +As a final note: if the ``super`` route is suited to a given class, an +advantage of using it is that it helps in constructing class hierarchies. +E.g., suppose that our other class ``B`` also used the ``super`` in its +``__array_ufunc__`` implementation, and we created a class ``C`` that depended +on both, i.e., ``class C(A, B)`` (with, for simplicity, not another +``__array_ufunc__`` override). Then any ufunc on an instance of ``C`` would +pass on to ``A.__array_ufunc__``, the ``super`` call in ``A`` would go to +``B.__array_ufunc__``, and the ``super`` call in ``B`` would go to +``ndarray.__array_ufunc__``, thus allowing ``A`` and ``B`` to collaborate. + +.. _array-wrap: + +``__array_wrap__`` for ufuncs and other functions +------------------------------------------------- + +Prior to numpy 1.13, the behaviour of ufuncs could only be tuned using +``__array_wrap__`` and ``__array_prepare__``. These two allowed one to +change the output type of a ufunc, but, in contrast to +``__array_ufunc__``, did not allow one to make any changes to the inputs. +It is hoped to eventually deprecate these, but ``__array_wrap__`` is also +used by other numpy functions and methods, such as ``squeeze``, so at the +present time is still needed for full functionality. + +Conceptually, ``__array_wrap__`` "wraps up the action" in the sense of +allowing a subclass to set the type of the return value and update +attributes and metadata. Let's show how this works with an example. First +we return to the simpler example subclass, but with a different name and +some print statements: + +.. testcode:: + + import numpy as np + + class MySubClass(np.ndarray): + + def __new__(cls, input_array, info=None): + obj = np.asarray(input_array).view(cls) + obj.info = info + return obj + + def __array_finalize__(self, obj): + print('In __array_finalize__:') + print(' self is %s' % repr(self)) + print(' obj is %s' % repr(obj)) + if obj is None: return + self.info = getattr(obj, 'info', None) + + def __array_wrap__(self, out_arr, context=None): + print('In __array_wrap__:') + print(' self is %s' % repr(self)) + print(' arr is %s' % repr(out_arr)) + # then just call the parent + return super(MySubClass, self).__array_wrap__(self, out_arr, context) + +We run a ufunc on an instance of our new array: + +>>> obj = MySubClass(np.arange(5), info='spam') +In __array_finalize__: + self is MySubClass([0, 1, 2, 3, 4]) + obj is array([0, 1, 2, 3, 4]) +>>> arr2 = np.arange(5)+1 +>>> ret = np.add(arr2, obj) +In __array_wrap__: + self is MySubClass([0, 1, 2, 3, 4]) + arr is array([1, 3, 5, 7, 9]) +In __array_finalize__: + self is MySubClass([1, 3, 5, 7, 9]) + obj is MySubClass([0, 1, 2, 3, 4]) +>>> ret +MySubClass([1, 3, 5, 7, 9]) +>>> ret.info +'spam' + +Note that the ufunc (``np.add``) has called the ``__array_wrap__`` method +with arguments ``self`` as ``obj``, and ``out_arr`` as the (ndarray) result +of the addition. In turn, the default ``__array_wrap__`` +(``ndarray.__array_wrap__``) has cast the result to class ``MySubClass``, +and called ``__array_finalize__`` - hence the copying of the ``info`` +attribute. This has all happened at the C level. + +But, we could do anything we wanted: + +.. testcode:: + + class SillySubClass(np.ndarray): + + def __array_wrap__(self, arr, context=None): + return 'I lost your data' + +>>> arr1 = np.arange(5) +>>> obj = arr1.view(SillySubClass) +>>> arr2 = np.arange(5) +>>> ret = np.multiply(obj, arr2) +>>> ret +'I lost your data' + +So, by defining a specific ``__array_wrap__`` method for our subclass, +we can tweak the output from ufuncs. The ``__array_wrap__`` method +requires ``self``, then an argument - which is the result of the ufunc - +and an optional parameter *context*. This parameter is returned by +ufuncs as a 3-element tuple: (name of the ufunc, arguments of the ufunc, +domain of the ufunc), but is not set by other numpy functions. Though, +as seen above, it is possible to do otherwise, ``__array_wrap__`` should +return an instance of its containing class. See the masked array +subclass for an implementation. + +In addition to ``__array_wrap__``, which is called on the way out of the +ufunc, there is also an ``__array_prepare__`` method which is called on +the way into the ufunc, after the output arrays are created but before any +computation has been performed. The default implementation does nothing +but pass through the array. ``__array_prepare__`` should not attempt to +access the array data or resize the array, it is intended for setting the +output array type, updating attributes and metadata, and performing any +checks based on the input that may be desired before computation begins. +Like ``__array_wrap__``, ``__array_prepare__`` must return an ndarray or +subclass thereof or raise an error. + +Extra gotchas - custom ``__del__`` methods and ndarray.base +----------------------------------------------------------- + +One of the problems that ndarray solves is keeping track of memory +ownership of ndarrays and their views. Consider the case where we have +created an ndarray, ``arr`` and have taken a slice with ``v = arr[1:]``. +The two objects are looking at the same memory. NumPy keeps track of +where the data came from for a particular array or view, with the +``base`` attribute: + +>>> # A normal ndarray, that owns its own data +>>> arr = np.zeros((4,)) +>>> # In this case, base is None +>>> arr.base is None +True +>>> # We take a view +>>> v1 = arr[1:] +>>> # base now points to the array that it derived from +>>> v1.base is arr +True +>>> # Take a view of a view +>>> v2 = v1[1:] +>>> # base points to the original array that it was derived from +>>> v2.base is arr +True + +In general, if the array owns its own memory, as for ``arr`` in this +case, then ``arr.base`` will be None - there are some exceptions to this +- see the numpy book for more details. + +The ``base`` attribute is useful in being able to tell whether we have +a view or the original array. This in turn can be useful if we need +to know whether or not to do some specific cleanup when the subclassed +array is deleted. For example, we may only want to do the cleanup if +the original array is deleted, but not the views. For an example of +how this can work, have a look at the ``memmap`` class in +``numpy.core``. + +Subclassing and Downstream Compatibility +---------------------------------------- + +When sub-classing ``ndarray`` or creating duck-types that mimic the ``ndarray`` +interface, it is your responsibility to decide how aligned your APIs will be +with those of numpy. For convenience, many numpy functions that have a corresponding +``ndarray`` method (e.g., ``sum``, ``mean``, ``take``, ``reshape``) work by checking +if the first argument to a function has a method of the same name. If it exists, the +method is called instead of coercing the arguments to a numpy array. + +For example, if you want your sub-class or duck-type to be compatible with +numpy's ``sum`` function, the method signature for this object's ``sum`` method +should be the following: + +.. testcode:: + + def sum(self, axis=None, dtype=None, out=None, keepdims=False): + ... + +This is the exact same method signature for ``np.sum``, so now if a user calls +``np.sum`` on this object, numpy will call the object's own ``sum`` method and +pass in these arguments enumerated above in the signature, and no errors will +be raised because the signatures are completely compatible with each other. + +If, however, you decide to deviate from this signature and do something like this: + +.. testcode:: + + def sum(self, axis=None, dtype=None): + ... + +This object is no longer compatible with ``np.sum`` because if you call ``np.sum``, +it will pass in unexpected arguments ``out`` and ``keepdims``, causing a TypeError +to be raised. + +If you wish to maintain compatibility with numpy and its subsequent versions (which +might add new keyword arguments) but do not want to surface all of numpy's arguments, +your function's signature should accept ``**kwargs``. For example: + +.. testcode:: + + def sum(self, axis=None, dtype=None, **unused_kwargs): + ... + +This object is now compatible with ``np.sum`` again because any extraneous arguments +(i.e. keywords that are not ``axis`` or ``dtype``) will be hidden away in the +``**unused_kwargs`` parameter. + + diff --git a/doc/source/user/basics.types.rst b/doc/source/user/basics.types.rst index 5ce5af15a..ec2af409a 100644 --- a/doc/source/user/basics.types.rst +++ b/doc/source/user/basics.types.rst @@ -4,4 +4,339 @@ Data types .. seealso:: :ref:`Data type objects <arrays.dtypes>` -.. automodule:: numpy.doc.basics +Array types and conversions between types +========================================= + +NumPy supports a much greater variety of numerical types than Python does. +This section shows which are available, and how to modify an array's data-type. + +The primitive types supported are tied closely to those in C: + +.. list-table:: + :header-rows: 1 + + * - Numpy type + - C type + - Description + + * - `numpy.bool_` + - ``bool`` + - Boolean (True or False) stored as a byte + + * - `numpy.byte` + - ``signed char`` + - Platform-defined + + * - `numpy.ubyte` + - ``unsigned char`` + - Platform-defined + + * - `numpy.short` + - ``short`` + - Platform-defined + + * - `numpy.ushort` + - ``unsigned short`` + - Platform-defined + + * - `numpy.intc` + - ``int`` + - Platform-defined + + * - `numpy.uintc` + - ``unsigned int`` + - Platform-defined + + * - `numpy.int_` + - ``long`` + - Platform-defined + + * - `numpy.uint` + - ``unsigned long`` + - Platform-defined + + * - `numpy.longlong` + - ``long long`` + - Platform-defined + + * - `numpy.ulonglong` + - ``unsigned long long`` + - Platform-defined + + * - `numpy.half` / `numpy.float16` + - + - Half precision float: + sign bit, 5 bits exponent, 10 bits mantissa + + * - `numpy.single` + - ``float`` + - Platform-defined single precision float: + typically sign bit, 8 bits exponent, 23 bits mantissa + + * - `numpy.double` + - ``double`` + - Platform-defined double precision float: + typically sign bit, 11 bits exponent, 52 bits mantissa. + + * - `numpy.longdouble` + - ``long double`` + - Platform-defined extended-precision float + + * - `numpy.csingle` + - ``float complex`` + - Complex number, represented by two single-precision floats (real and imaginary components) + + * - `numpy.cdouble` + - ``double complex`` + - Complex number, represented by two double-precision floats (real and imaginary components). + + * - `numpy.clongdouble` + - ``long double complex`` + - Complex number, represented by two extended-precision floats (real and imaginary components). + + +Since many of these have platform-dependent definitions, a set of fixed-size +aliases are provided: + +.. list-table:: + :header-rows: 1 + + * - Numpy type + - C type + - Description + + * - `numpy.int8` + - ``int8_t`` + - Byte (-128 to 127) + + * - `numpy.int16` + - ``int16_t`` + - Integer (-32768 to 32767) + + * - `numpy.int32` + - ``int32_t`` + - Integer (-2147483648 to 2147483647) + + * - `numpy.int64` + - ``int64_t`` + - Integer (-9223372036854775808 to 9223372036854775807) + + * - `numpy.uint8` + - ``uint8_t`` + - Unsigned integer (0 to 255) + + * - `numpy.uint16` + - ``uint16_t`` + - Unsigned integer (0 to 65535) + + * - `numpy.uint32` + - ``uint32_t`` + - Unsigned integer (0 to 4294967295) + + * - `numpy.uint64` + - ``uint64_t`` + - Unsigned integer (0 to 18446744073709551615) + + * - `numpy.intp` + - ``intptr_t`` + - Integer used for indexing, typically the same as ``ssize_t`` + + * - `numpy.uintp` + - ``uintptr_t`` + - Integer large enough to hold a pointer + + * - `numpy.float32` + - ``float`` + - + + * - `numpy.float64` / `numpy.float_` + - ``double`` + - Note that this matches the precision of the builtin python `float`. + + * - `numpy.complex64` + - ``float complex`` + - Complex number, represented by two 32-bit floats (real and imaginary components) + + * - `numpy.complex128` / `numpy.complex_` + - ``double complex`` + - Note that this matches the precision of the builtin python `complex`. + + +NumPy numerical types are instances of ``dtype`` (data-type) objects, each +having unique characteristics. Once you have imported NumPy using + + :: + + >>> import numpy as np + +the dtypes are available as ``np.bool_``, ``np.float32``, etc. + +Advanced types, not listed in the table above, are explored in +section :ref:`structured_arrays`. + +There are 5 basic numerical types representing booleans (bool), integers (int), +unsigned integers (uint) floating point (float) and complex. Those with numbers +in their name indicate the bitsize of the type (i.e. how many bits are needed +to represent a single value in memory). Some types, such as ``int`` and +``intp``, have differing bitsizes, dependent on the platforms (e.g. 32-bit +vs. 64-bit machines). This should be taken into account when interfacing +with low-level code (such as C or Fortran) where the raw memory is addressed. + +Data-types can be used as functions to convert python numbers to array scalars +(see the array scalar section for an explanation), python sequences of numbers +to arrays of that type, or as arguments to the dtype keyword that many numpy +functions or methods accept. Some examples:: + + >>> import numpy as np + >>> x = np.float32(1.0) + >>> x + 1.0 + >>> y = np.int_([1,2,4]) + >>> y + array([1, 2, 4]) + >>> z = np.arange(3, dtype=np.uint8) + >>> z + array([0, 1, 2], dtype=uint8) + +Array types can also be referred to by character codes, mostly to retain +backward compatibility with older packages such as Numeric. Some +documentation may still refer to these, for example:: + + >>> np.array([1, 2, 3], dtype='f') + array([ 1., 2., 3.], dtype=float32) + +We recommend using dtype objects instead. + +To convert the type of an array, use the .astype() method (preferred) or +the type itself as a function. For example: :: + + >>> z.astype(float) #doctest: +NORMALIZE_WHITESPACE + array([ 0., 1., 2.]) + >>> np.int8(z) + array([0, 1, 2], dtype=int8) + +Note that, above, we use the *Python* float object as a dtype. NumPy knows +that ``int`` refers to ``np.int_``, ``bool`` means ``np.bool_``, +that ``float`` is ``np.float_`` and ``complex`` is ``np.complex_``. +The other data-types do not have Python equivalents. + +To determine the type of an array, look at the dtype attribute:: + + >>> z.dtype + dtype('uint8') + +dtype objects also contain information about the type, such as its bit-width +and its byte-order. The data type can also be used indirectly to query +properties of the type, such as whether it is an integer:: + + >>> d = np.dtype(int) + >>> d + dtype('int32') + + >>> np.issubdtype(d, np.integer) + True + + >>> np.issubdtype(d, np.floating) + False + + +Array Scalars +============= + +NumPy generally returns elements of arrays as array scalars (a scalar +with an associated dtype). Array scalars differ from Python scalars, but +for the most part they can be used interchangeably (the primary +exception is for versions of Python older than v2.x, where integer array +scalars cannot act as indices for lists and tuples). There are some +exceptions, such as when code requires very specific attributes of a scalar +or when it checks specifically whether a value is a Python scalar. Generally, +problems are easily fixed by explicitly converting array scalars +to Python scalars, using the corresponding Python type function +(e.g., ``int``, ``float``, ``complex``, ``str``, ``unicode``). + +The primary advantage of using array scalars is that +they preserve the array type (Python may not have a matching scalar type +available, e.g. ``int16``). Therefore, the use of array scalars ensures +identical behaviour between arrays and scalars, irrespective of whether the +value is inside an array or not. NumPy scalars also have many of the same +methods arrays do. + +Overflow Errors +=============== + +The fixed size of NumPy numeric types may cause overflow errors when a value +requires more memory than available in the data type. For example, +`numpy.power` evaluates ``100 * 10 ** 8`` correctly for 64-bit integers, +but gives 1874919424 (incorrect) for a 32-bit integer. + + >>> np.power(100, 8, dtype=np.int64) + 10000000000000000 + >>> np.power(100, 8, dtype=np.int32) + 1874919424 + +The behaviour of NumPy and Python integer types differs significantly for +integer overflows and may confuse users expecting NumPy integers to behave +similar to Python's ``int``. Unlike NumPy, the size of Python's ``int`` is +flexible. This means Python integers may expand to accommodate any integer and +will not overflow. + +NumPy provides `numpy.iinfo` and `numpy.finfo` to verify the +minimum or maximum values of NumPy integer and floating point values +respectively :: + + >>> np.iinfo(int) # Bounds of the default integer on this system. + iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64) + >>> np.iinfo(np.int32) # Bounds of a 32-bit integer + iinfo(min=-2147483648, max=2147483647, dtype=int32) + >>> np.iinfo(np.int64) # Bounds of a 64-bit integer + iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64) + +If 64-bit integers are still too small the result may be cast to a +floating point number. Floating point numbers offer a larger, but inexact, +range of possible values. + + >>> np.power(100, 100, dtype=np.int64) # Incorrect even with 64-bit int + 0 + >>> np.power(100, 100, dtype=np.float64) + 1e+200 + +Extended Precision +================== + +Python's floating-point numbers are usually 64-bit floating-point numbers, +nearly equivalent to ``np.float64``. In some unusual situations it may be +useful to use floating-point numbers with more precision. Whether this +is possible in numpy depends on the hardware and on the development +environment: specifically, x86 machines provide hardware floating-point +with 80-bit precision, and while most C compilers provide this as their +``long double`` type, MSVC (standard for Windows builds) makes +``long double`` identical to ``double`` (64 bits). NumPy makes the +compiler's ``long double`` available as ``np.longdouble`` (and +``np.clongdouble`` for the complex numbers). You can find out what your +numpy provides with ``np.finfo(np.longdouble)``. + +NumPy does not provide a dtype with more precision than C's +``long double``\\; in particular, the 128-bit IEEE quad precision +data type (FORTRAN's ``REAL*16``\\) is not available. + +For efficient memory alignment, ``np.longdouble`` is usually stored +padded with zero bits, either to 96 or 128 bits. Which is more efficient +depends on hardware and development environment; typically on 32-bit +systems they are padded to 96 bits, while on 64-bit systems they are +typically padded to 128 bits. ``np.longdouble`` is padded to the system +default; ``np.float96`` and ``np.float128`` are provided for users who +want specific padding. In spite of the names, ``np.float96`` and +``np.float128`` provide only as much precision as ``np.longdouble``, +that is, 80 bits on most x86 machines and 64 bits in standard +Windows builds. + +Be warned that even if ``np.longdouble`` offers more precision than +python ``float``, it is easy to lose that extra precision, since +python often forces values to pass through ``float``. For example, +the ``%`` formatting operator requires its arguments to be converted +to standard python types, and it is therefore impossible to preserve +extended precision even if many decimal places are requested. It can +be useful to test your code with the value +``1 + np.finfo(np.longdouble).eps``. + + diff --git a/doc/source/user/building.rst b/doc/source/user/building.rst index 54ece3da3..47399139e 100644 --- a/doc/source/user/building.rst +++ b/doc/source/user/building.rst @@ -142,6 +142,16 @@ will prefer to use ATLAS, then BLIS, then OpenBLAS and as a last resort MKL. If neither of these exists the build will fail (names are compared lower case). +Alternatively one may use ``!`` or ``^`` to negate all items:: + + NPY_BLAS_ORDER='^blas,atlas' python setup.py build + +will allow using anything **but** NetLIB BLAS and ATLAS libraries, the order of the above +list is retained. + +One cannot mix negation and positives, nor have multiple negations, such cases will +raise an error. + LAPACK ~~~~~~ @@ -165,6 +175,17 @@ will prefer to use ATLAS, then OpenBLAS and as a last resort MKL. If neither of these exists the build will fail (names are compared lower case). +Alternatively one may use ``!`` or ``^`` to negate all items:: + + NPY_LAPACK_ORDER='^lapack' python setup.py build + +will allow using anything **but** the NetLIB LAPACK library, the order of the above +list is retained. + +One cannot mix negation and positives, nor have multiple negations, such cases will +raise an error. + + .. deprecated:: 1.20 The native libraries on macOS, provided by Accelerate, are not fit for use in NumPy since they have bugs that cause wrong output under easily reproducible diff --git a/doc/source/user/c-info.beyond-basics.rst b/doc/source/user/c-info.beyond-basics.rst index 9e9cd3067..124162d6c 100644 --- a/doc/source/user/c-info.beyond-basics.rst +++ b/doc/source/user/c-info.beyond-basics.rst @@ -115,7 +115,7 @@ processors that use pipelining to enhance fundamental operations. The :c:func:`PyArray_IterAllButAxis` ( ``array``, ``&dim`` ) constructs an iterator object that is modified so that it will not iterate over the dimension indicated by dim. The only restriction on this iterator -object, is that the :c:func:`PyArray_Iter_GOTO1D` ( ``it``, ``ind`` ) macro +object, is that the :c:func:`PyArray_ITER_GOTO1D` ( ``it``, ``ind`` ) macro cannot be used (thus flat indexing won't work either if you pass this object back to Python --- so you shouldn't do this). Note that the returned object from this routine is still usually cast to diff --git a/doc/source/user/c-info.how-to-extend.rst b/doc/source/user/c-info.how-to-extend.rst index d75242092..845ce0a74 100644 --- a/doc/source/user/c-info.how-to-extend.rst +++ b/doc/source/user/c-info.how-to-extend.rst @@ -363,7 +363,6 @@ particular set of requirements ( *e.g.* contiguous, aligned, and writeable). The syntax is :c:func:`PyArray_FROM_OTF` - Return an ndarray from any Python object, *obj*, that can be converted to an array. The number of dimensions in the returned array is determined by the object. The desired data-type of the @@ -375,7 +374,6 @@ writeable). The syntax is exception is set. *obj* - The object can be any Python object convertible to an ndarray. If the object is already (a subclass of) the ndarray that satisfies the requirements then a new reference is returned. @@ -394,7 +392,6 @@ writeable). The syntax is to the requirements flag. *typenum* - One of the enumerated types or :c:data:`NPY_NOTYPE` if the data-type should be determined from the object itself. The C-based names can be used: @@ -422,7 +419,6 @@ writeable). The syntax is requirements flag to override this behavior. *requirements* - The memory model for an ndarray admits arbitrary strides in each dimension to advance to the next element of the array. Often, however, you need to interface with code that expects a @@ -446,13 +442,11 @@ writeable). The syntax is :c:data:`NPY_OUT_ARRAY`, and :c:data:`NPY_ARRAY_INOUT_ARRAY`: :c:data:`NPY_ARRAY_IN_ARRAY` - This flag is useful for arrays that must be in C-contiguous order and aligned. These kinds of arrays are usually input arrays for some algorithm. :c:data:`NPY_ARRAY_OUT_ARRAY` - This flag is useful to specify an array that is in C-contiguous order, is aligned, and can be written to as well. Such an array is usually returned as output @@ -460,7 +454,6 @@ writeable). The syntax is scratch). :c:data:`NPY_ARRAY_INOUT_ARRAY` - This flag is useful to specify an array that will be used for both input and output. :c:func:`PyArray_ResolveWritebackIfCopy` must be called before :c:func:`Py_DECREF` at @@ -479,16 +472,13 @@ writeable). The syntax is Other useful flags that can be OR'd as additional requirements are: :c:data:`NPY_ARRAY_FORCECAST` - Cast to the desired type, even if it can't be done without losing information. :c:data:`NPY_ARRAY_ENSURECOPY` - Make sure the resulting array is a copy of the original. :c:data:`NPY_ARRAY_ENSUREARRAY` - Make sure the resulting object is an actual ndarray and not a sub- class. diff --git a/doc/source/user/how-to-how-to.rst b/doc/source/user/how-to-how-to.rst new file mode 100644 index 000000000..de8afc28a --- /dev/null +++ b/doc/source/user/how-to-how-to.rst @@ -0,0 +1,118 @@ +.. _how-to-how-to:
+
+##############################################################################
+How to write a NumPy how-to
+##############################################################################
+
+How-tos get straight to the point -- they
+
+ - answer a focused question, or
+ - narrow a broad question into focused questions that the user can
+ choose among.
+
+******************************************************************************
+A stranger has asked for directions...
+******************************************************************************
+
+**"I need to refuel my car."**
+
+******************************************************************************
+Give a brief but explicit answer
+******************************************************************************
+
+ - `"Three kilometers/miles, take a right at Hayseed Road, it's on your left."`
+
+Add helpful details for newcomers ("Hayseed Road", even though it's the only
+turnoff at three km/mi). But not irrelevant ones:
+
+ - Don't also give directions from Route 7.
+ - Don't explain why the town has only one filling station.
+
+If there's related background (tutorial, explanation, reference, alternative
+approach), bring it to the user's attention with a link ("Directions from Route 7,"
+"Why so few filling stations?").
+
+
+******************************************************************************
+Delegate
+******************************************************************************
+
+ - `"Three km/mi, take a right at Hayseed Road, follow the signs."`
+
+If the information is already documented and succinct enough for a how-to,
+just link to it, possibly after an introduction ("Three km/mi, take a right").
+
+******************************************************************************
+If the question is broad, narrow and redirect it
+******************************************************************************
+
+ **"I want to see the sights."**
+
+The `See the sights` how-to should link to a set of narrower how-tos:
+
+- Find historic buildings
+- Find scenic lookouts
+- Find the town center
+
+and these might in turn link to still narrower how-tos -- so the town center
+page might link to
+
+ - Find the court house
+ - Find city hall
+
+By organizing how-tos this way, you not only display the options for people
+who need to narrow their question, you also have provided answers for users
+who start with narrower questions ("I want to see historic buildings," "Which
+way to city hall?").
+
+******************************************************************************
+If there are many steps, break them up
+******************************************************************************
+
+If a how-to has many steps:
+
+ - Consider breaking a step out into an individual how-to and linking to it.
+ - Include subheadings. They help readers grasp what's coming and return
+ where they left off.
+
+******************************************************************************
+Why write how-tos when there's Stack Overflow, Reddit, Gitter...?
+******************************************************************************
+
+ - We have authoritative answers.
+ - How-tos make the site less forbidding to non-experts.
+ - How-tos bring people into the site and help them discover other information
+ that's here .
+ - Creating how-tos helps us see NumPy usability through new eyes.
+
+******************************************************************************
+Aren't how-tos and tutorials the same thing?
+******************************************************************************
+
+People use the terms "how-to" and "tutorial" interchangeably, but we draw a
+distinction, following Daniele Procida's `taxonomy of documentation`_.
+
+ .. _`taxonomy of documentation`: https://documentation.divio.com/
+
+Documentation needs to meet users where they are. `How-tos` offer get-it-done
+information; the user wants steps to copy and doesn't necessarily want to
+understand NumPy. `Tutorials` are warm-fuzzy information; the user wants a
+feel for some aspect of NumPy (and again, may or may not care about deeper
+knowledge).
+
+We distinguish both tutorials and how-tos from `Explanations`, which are
+deep dives intended to give understanding rather than immediate assistance,
+and `References`, which give complete, autoritative data on some concrete
+part of NumPy (like its API) but aren't obligated to paint a broader picture.
+
+For more on tutorials, see the `tutorial how-to`_.
+
+.. _`tutorial how-to`: https://github.com/numpy/numpy-tutorials/blob/master/tutorial_style.ipynb
+
+
+******************************************************************************
+Is this page an example of a how-to?
+******************************************************************************
+
+Yes -- until the sections with question-mark headings; they explain rather
+than giving directions. In a how-to, those would be links.
\ No newline at end of file diff --git a/doc/source/user/how-to-io.rst b/doc/source/user/how-to-io.rst new file mode 100644 index 000000000..ca9fc41f0 --- /dev/null +++ b/doc/source/user/how-to-io.rst @@ -0,0 +1,328 @@ +.. _how-to-io:
+
+##############################################################################
+Reading and writing files
+##############################################################################
+
+This page tackles common applications; for the full collection of I/O
+routines, see :ref:`routines.io`.
+
+
+******************************************************************************
+Reading text and CSV_ files
+******************************************************************************
+
+.. _CSV: https://en.wikipedia.org/wiki/Comma-separated_values
+
+With no missing values
+==============================================================================
+
+Use :func:`numpy.loadtxt`.
+
+With missing values
+==============================================================================
+
+Use :func:`numpy.genfromtxt`.
+
+:func:`numpy.genfromtxt` will either
+
+ - return a :ref:`masked array<maskedarray.generic>`
+ **masking out missing values** (if ``usemask=True``), or
+
+ - **fill in the missing value** with the value specified in
+ ``filling_values`` (default is ``np.nan`` for float, -1 for int).
+
+With non-whitespace delimiters
+------------------------------------------------------------------------------
+::
+
+ >>> print(open("csv.txt").read()) # doctest: +SKIP
+ 1, 2, 3
+ 4,, 6
+ 7, 8, 9
+
+
+Masked-array output
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::
+
+ >>> np.genfromtxt("csv.txt", delimiter=",", usemask=True) # doctest: +SKIP
+ masked_array(
+ data=[[1.0, 2.0, 3.0],
+ [4.0, --, 6.0],
+ [7.0, 8.0, 9.0]],
+ mask=[[False, False, False],
+ [False, True, False],
+ [False, False, False]],
+ fill_value=1e+20)
+
+Array output
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::
+
+ >>> np.genfromtxt("csv.txt", delimiter=",") # doctest: +SKIP
+ array([[ 1., 2., 3.],
+ [ 4., nan, 6.],
+ [ 7., 8., 9.]])
+
+Array output, specified fill-in value
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::
+
+ >>> np.genfromtxt("csv.txt", delimiter=",", dtype=np.int8, filling_values=99) # doctest: +SKIP
+ array([[ 1, 2, 3],
+ [ 4, 99, 6],
+ [ 7, 8, 9]], dtype=int8)
+
+Whitespace-delimited
+-------------------------------------------------------------------------------
+
+:func:`numpy.genfromtxt` can also parse whitespace-delimited data files
+that have missing values if
+
+* **Each field has a fixed width**: Use the width as the `delimiter` argument.
+ ::
+
+ # File with width=4. The data does not have to be justified (for example,
+ # the 2 in row 1), the last column can be less than width (for example, the 6
+ # in row 2), and no delimiting character is required (for instance 8888 and 9
+ # in row 3)
+
+ >>> f = open("fixedwidth.txt").read() # doctest: +SKIP
+ >>> print(f) # doctest: +SKIP
+ 1 2 3
+ 44 6
+ 7 88889
+
+ # Showing spaces as ^
+ >>> print(f.replace(" ","^")) # doctest: +SKIP
+ 1^^^2^^^^^^3
+ 44^^^^^^6
+ 7^^^88889
+
+ >>> np.genfromtxt("fixedwidth.txt", delimiter=4) # doctest: +SKIP
+ array([[1.000e+00, 2.000e+00, 3.000e+00],
+ [4.400e+01, nan, 6.000e+00],
+ [7.000e+00, 8.888e+03, 9.000e+00]])
+
+* **A special value (e.g. "x") indicates a missing field**: Use it as the
+ `missing_values` argument.
+ ::
+
+ >>> print(open("nan.txt").read()) # doctest: +SKIP
+ 1 2 3
+ 44 x 6
+ 7 8888 9
+
+ >>> np.genfromtxt("nan.txt", missing_values="x") # doctest: +SKIP
+ array([[1.000e+00, 2.000e+00, 3.000e+00],
+ [4.400e+01, nan, 6.000e+00],
+ [7.000e+00, 8.888e+03, 9.000e+00]])
+
+* **You want to skip the rows with missing values**: Set
+ `invalid_raise=False`.
+ ::
+
+ >>> print(open("skip.txt").read()) # doctest: +SKIP
+ 1 2 3
+ 44 6
+ 7 888 9
+
+ >>> np.genfromtxt("skip.txt", invalid_raise=False) # doctest: +SKIP
+ __main__:1: ConversionWarning: Some errors were detected !
+ Line #2 (got 2 columns instead of 3)
+ array([[ 1., 2., 3.],
+ [ 7., 888., 9.]])
+
+
+* **The delimiter whitespace character is different from the whitespace that
+ indicates missing data**. For instance, if columns are delimited by ``\t``,
+ then missing data will be recognized if it consists of one
+ or more spaces.
+ ::
+
+ >>> f = open("tabs.txt").read() # doctest: +SKIP
+ >>> print(f) # doctest: +SKIP
+ 1 2 3
+ 44 6
+ 7 888 9
+
+ # Tabs vs. spaces
+ >>> print(f.replace("\t","^")) # doctest: +SKIP
+ 1^2^3
+ 44^ ^6
+ 7^888^9
+
+ >>> np.genfromtxt("tabs.txt", delimiter="\t", missing_values=" +") # doctest: +SKIP
+ array([[ 1., 2., 3.],
+ [ 44., nan, 6.],
+ [ 7., 888., 9.]])
+
+******************************************************************************
+Read a file in .npy or .npz format
+******************************************************************************
+
+Choices:
+
+ - Use :func:`numpy.load`. It can read files generated by any of
+ :func:`numpy.save`, :func:`numpy.savez`, or :func:`numpy.savez_compressed`.
+
+ - Use memory mapping. See `numpy.lib.format.open_memmap`.
+
+******************************************************************************
+Write to a file to be read back by NumPy
+******************************************************************************
+
+Binary
+===============================================================================
+
+Use
+:func:`numpy.save`, or to store multiple arrays :func:`numpy.savez`
+or :func:`numpy.savez_compressed`.
+
+For :ref:`security and portability <how-to-io-pickle-file>`, set
+``allow_pickle=False`` unless the dtype contains Python objects, which
+requires pickling.
+
+Masked arrays :any:`can't currently be saved <MaskedArray.tofile>`,
+nor can other arbitrary array subclasses.
+
+Human-readable
+==============================================================================
+
+:func:`numpy.save` and :func:`numpy.savez` create binary files. To **write a
+human-readable file**, use :func:`numpy.savetxt`. The array can only be 1- or
+2-dimensional, and there's no ` savetxtz` for multiple files.
+
+Large arrays
+==============================================================================
+
+See :ref:`how-to-io-large-arrays`.
+
+******************************************************************************
+Read an arbitrarily formatted binary file ("binary blob")
+******************************************************************************
+
+Use a :doc:`structured array <basics.rec>`.
+
+**Example:**
+
+The ``.wav`` file header is a 44-byte block preceding ``data_size`` bytes of the
+actual sound data::
+
+ chunk_id "RIFF"
+ chunk_size 4-byte unsigned little-endian integer
+ format "WAVE"
+ fmt_id "fmt "
+ fmt_size 4-byte unsigned little-endian integer
+ audio_fmt 2-byte unsigned little-endian integer
+ num_channels 2-byte unsigned little-endian integer
+ sample_rate 4-byte unsigned little-endian integer
+ byte_rate 4-byte unsigned little-endian integer
+ block_align 2-byte unsigned little-endian integer
+ bits_per_sample 2-byte unsigned little-endian integer
+ data_id "data"
+ data_size 4-byte unsigned little-endian integer
+
+The ``.wav`` file header as a NumPy structured dtype::
+
+ wav_header_dtype = np.dtype([
+ ("chunk_id", (bytes, 4)), # flexible-sized scalar type, item size 4
+ ("chunk_size", "<u4"), # little-endian unsigned 32-bit integer
+ ("format", "S4"), # 4-byte string, alternate spelling of (bytes, 4)
+ ("fmt_id", "S4"),
+ ("fmt_size", "<u4"),
+ ("audio_fmt", "<u2"), #
+ ("num_channels", "<u2"), # .. more of the same ...
+ ("sample_rate", "<u4"), #
+ ("byte_rate", "<u4"),
+ ("block_align", "<u2"),
+ ("bits_per_sample", "<u2"),
+ ("data_id", "S4"),
+ ("data_size", "<u4"),
+ #
+ # the sound data itself cannot be represented here:
+ # it does not have a fixed size
+ ])
+
+ header = np.fromfile(f, dtype=wave_header_dtype, count=1)[0]
+
+This ``.wav`` example is for illustration; to read a ``.wav`` file in real
+life, use Python's built-in module :mod:`wave`.
+
+(Adapted from Pauli Virtanen, :ref:`advanced_numpy`, licensed
+under `CC BY 4.0 <https://creativecommons.org/licenses/by/4.0/>`_.)
+
+.. _how-to-io-large-arrays:
+
+******************************************************************************
+Write or read large arrays
+******************************************************************************
+
+**Arrays too large to fit in memory** can be treated like ordinary in-memory
+arrays using memory mapping.
+
+- Raw array data written with :func:`numpy.ndarray.tofile` or
+ :func:`numpy.ndarray.tobytes` can be read with :func:`numpy.memmap`::
+
+ array = numpy.memmap("mydata/myarray.arr", mode="r", dtype=np.int16, shape=(1024, 1024))
+
+- Files output by :func:`numpy.save` (that is, using the numpy format) can be read
+ using :func:`numpy.load` with the ``mmap_mode`` keyword argument::
+
+ large_array[some_slice] = np.load("path/to/small_array", mmap_mode="r")
+
+Memory mapping lacks features like data chunking and compression; more
+full-featured formats and libraries usable with NumPy include:
+
+* **HDF5**: `h5py <https://www.h5py.org/>`_ or `PyTables <https://www.pytables.org/>`_.
+* **Zarr**: `here <https://zarr.readthedocs.io/en/stable/tutorial.html#reading-and-writing-data>`_.
+* **NetCDF**: :class:`scipy.io.netcdf_file`.
+
+For tradeoffs among memmap, Zarr, and HDF5, see
+`pythonspeed.com <https://pythonspeed.com/articles/mmap-vs-zarr-hdf5/>`_.
+
+******************************************************************************
+Write files for reading by other (non-NumPy) tools
+******************************************************************************
+
+Formats for **exchanging data** with other tools include HDF5, Zarr, and
+NetCDF (see :ref:`how-to-io-large-arrays`).
+
+******************************************************************************
+Write or read a JSON file
+******************************************************************************
+
+NumPy arrays are **not** directly
+`JSON serializable <https://github.com/numpy/numpy/issues/12481>`_.
+
+
+.. _how-to-io-pickle-file:
+
+******************************************************************************
+Save/restore using a pickle file
+******************************************************************************
+
+Avoid when possible; :doc:`pickles <python:library/pickle>` are not secure
+against erroneous or maliciously constructed data.
+
+Use :func:`numpy.save` and :func:`numpy.load`. Set ``allow_pickle=False``,
+unless the array dtype includes Python objects, in which case pickling is
+required.
+
+******************************************************************************
+Convert from a pandas DataFrame to a NumPy array
+******************************************************************************
+
+See :meth:`pandas.DataFrame.to_numpy`.
+
+******************************************************************************
+ Save/restore using `~numpy.ndarray.tofile` and `~numpy.fromfile`
+******************************************************************************
+
+In general, prefer :func:`numpy.save` and :func:`numpy.load`.
+
+:func:`numpy.ndarray.tofile` and :func:`numpy.fromfile` lose information on
+endianness and precision and so are unsuitable for anything but scratch
+storage.
+
diff --git a/doc/source/user/howtos_index.rst b/doc/source/user/howtos_index.rst index c052286b9..89a6f54e7 100644 --- a/doc/source/user/howtos_index.rst +++ b/doc/source/user/howtos_index.rst @@ -11,4 +11,5 @@ the package, see the :ref:`API reference <reference>`. .. toctree:: :maxdepth: 1 - ionumpy + how-to-how-to + how-to-io diff --git a/doc/source/user/images/np_indexing.png b/doc/source/user/images/np_indexing.png Binary files differindex 4303ec35b..863b2d46f 100644 --- a/doc/source/user/images/np_indexing.png +++ b/doc/source/user/images/np_indexing.png diff --git a/doc/source/user/index.rst b/doc/source/user/index.rst index 4e6a29d9f..28297d9ea 100644 --- a/doc/source/user/index.rst +++ b/doc/source/user/index.rst @@ -3,18 +3,17 @@ .. _user: ################ -NumPy User Guide +NumPy user guide ################ -This guide is intended as an introductory overview of NumPy and -explains how to install and make use of the most important features of -NumPy. For detailed reference documentation of the functions and -classes contained in the package, see the :ref:`reference`. +This guide is an overview and explains the important features; +details are found in :ref:`reference`. .. toctree:: :maxdepth: 1 - setting-up + whatisnumpy + Installation <https://numpy.org/install/> quickstart absolute_beginners basics @@ -24,3 +23,23 @@ classes contained in the package, see the :ref:`reference`. c-info tutorials_index howtos_index + + +.. Links to these files are placed directly in the top-level html + (doc/source/_templates/indexcontent.html, which appears for the URLs + numpy.org/devdocs and numpy.org/doc/XX) and are not in any toctree, so + we include them here to avoid a "WARNING: document isn't included in any + toctree" message + +.. toctree:: + :hidden: + + explanations_index + ../f2py/index + ../glossary + ../dev/underthehood + ../docs/index + ../bugs + ../release + ../doc_conventions + ../license diff --git a/doc/source/user/install.rst b/doc/source/user/install.rst index 52586f3d7..e05cee2f1 100644 --- a/doc/source/user/install.rst +++ b/doc/source/user/install.rst @@ -1,10 +1,7 @@ -**************** -Installing NumPy -**************** - -In most use cases the best way to install NumPy on your system is by using a -pre-built package for your operating system. Please see -https://scipy.org/install.html for links to available options. - -For instructions on building for source package, see -:doc:`building`. This information is useful mainly for advanced users. +:orphan:
+
+****************
+Installing NumPy
+****************
+
+See `Installing NumPy <https://numpy.org/install/>`_.
\ No newline at end of file diff --git a/doc/source/user/ionumpy.rst b/doc/source/user/ionumpy.rst deleted file mode 100644 index a31720322..000000000 --- a/doc/source/user/ionumpy.rst +++ /dev/null @@ -1,20 +0,0 @@ -================================================ -How to read and write data using NumPy -================================================ - -.. currentmodule:: numpy - -.. testsetup:: - - import numpy as np - np.random.seed(1) - -**Objectives** - -- Writing NumPy arrays to files -- Reading NumPy arrays from files -- Dealing with encoding and dtype issues - -**Content** - -To be completed. diff --git a/doc/source/user/misc.rst b/doc/source/user/misc.rst index c10aea486..031ce4efa 100644 --- a/doc/source/user/misc.rst +++ b/doc/source/user/misc.rst @@ -2,4 +2,224 @@ Miscellaneous ************* -.. automodule:: numpy.doc.misc +IEEE 754 Floating Point Special Values +-------------------------------------- + +Special values defined in numpy: nan, inf, + +NaNs can be used as a poor-man's mask (if you don't care what the +original value was) + +Note: cannot use equality to test NaNs. E.g.: :: + + >>> myarr = np.array([1., 0., np.nan, 3.]) + >>> np.nonzero(myarr == np.nan) + (array([], dtype=int64),) + >>> np.nan == np.nan # is always False! Use special numpy functions instead. + False + >>> myarr[myarr == np.nan] = 0. # doesn't work + >>> myarr + array([ 1., 0., NaN, 3.]) + >>> myarr[np.isnan(myarr)] = 0. # use this instead find + >>> myarr + array([ 1., 0., 0., 3.]) + +Other related special value functions: :: + + isinf(): True if value is inf + isfinite(): True if not nan or inf + nan_to_num(): Map nan to 0, inf to max float, -inf to min float + +The following corresponds to the usual functions except that nans are excluded +from the results: :: + + nansum() + nanmax() + nanmin() + nanargmax() + nanargmin() + + >>> x = np.arange(10.) + >>> x[3] = np.nan + >>> x.sum() + nan + >>> np.nansum(x) + 42.0 + +How numpy handles numerical exceptions +-------------------------------------- + +The default is to ``'warn'`` for ``invalid``, ``divide``, and ``overflow`` +and ``'ignore'`` for ``underflow``. But this can be changed, and it can be +set individually for different kinds of exceptions. The different behaviors +are: + + - 'ignore' : Take no action when the exception occurs. + - 'warn' : Print a `RuntimeWarning` (via the Python `warnings` module). + - 'raise' : Raise a `FloatingPointError`. + - 'call' : Call a function specified using the `seterrcall` function. + - 'print' : Print a warning directly to ``stdout``. + - 'log' : Record error in a Log object specified by `seterrcall`. + +These behaviors can be set for all kinds of errors or specific ones: + + - all : apply to all numeric exceptions + - invalid : when NaNs are generated + - divide : divide by zero (for integers as well!) + - overflow : floating point overflows + - underflow : floating point underflows + +Note that integer divide-by-zero is handled by the same machinery. +These behaviors are set on a per-thread basis. + +Examples +-------- + +:: + + >>> oldsettings = np.seterr(all='warn') + >>> np.zeros(5,dtype=np.float32)/0. + invalid value encountered in divide + >>> j = np.seterr(under='ignore') + >>> np.array([1.e-100])**10 + >>> j = np.seterr(invalid='raise') + >>> np.sqrt(np.array([-1.])) + FloatingPointError: invalid value encountered in sqrt + >>> def errorhandler(errstr, errflag): + ... print("saw stupid error!") + >>> np.seterrcall(errorhandler) + <function err_handler at 0x...> + >>> j = np.seterr(all='call') + >>> np.zeros(5, dtype=np.int32)/0 + FloatingPointError: invalid value encountered in divide + saw stupid error! + >>> j = np.seterr(**oldsettings) # restore previous + ... # error-handling settings + +Interfacing to C +---------------- +Only a survey of the choices. Little detail on how each works. + +1) Bare metal, wrap your own C-code manually. + + - Plusses: + + - Efficient + - No dependencies on other tools + + - Minuses: + + - Lots of learning overhead: + + - need to learn basics of Python C API + - need to learn basics of numpy C API + - need to learn how to handle reference counting and love it. + + - Reference counting often difficult to get right. + + - getting it wrong leads to memory leaks, and worse, segfaults + + - API will change for Python 3.0! + +2) Cython + + - Plusses: + + - avoid learning C API's + - no dealing with reference counting + - can code in pseudo python and generate C code + - can also interface to existing C code + - should shield you from changes to Python C api + - has become the de-facto standard within the scientific Python community + - fast indexing support for arrays + + - Minuses: + + - Can write code in non-standard form which may become obsolete + - Not as flexible as manual wrapping + +3) ctypes + + - Plusses: + + - part of Python standard library + - good for interfacing to existing sharable libraries, particularly + Windows DLLs + - avoids API/reference counting issues + - good numpy support: arrays have all these in their ctypes + attribute: :: + + a.ctypes.data a.ctypes.get_strides + a.ctypes.data_as a.ctypes.shape + a.ctypes.get_as_parameter a.ctypes.shape_as + a.ctypes.get_data a.ctypes.strides + a.ctypes.get_shape a.ctypes.strides_as + + - Minuses: + + - can't use for writing code to be turned into C extensions, only a wrapper + tool. + +4) SWIG (automatic wrapper generator) + + - Plusses: + + - around a long time + - multiple scripting language support + - C++ support + - Good for wrapping large (many functions) existing C libraries + + - Minuses: + + - generates lots of code between Python and the C code + - can cause performance problems that are nearly impossible to optimize + out + - interface files can be hard to write + - doesn't necessarily avoid reference counting issues or needing to know + API's + +5) scipy.weave + + - Plusses: + + - can turn many numpy expressions into C code + - dynamic compiling and loading of generated C code + - can embed pure C code in Python module and have weave extract, generate + interfaces and compile, etc. + + - Minuses: + + - Future very uncertain: it's the only part of Scipy not ported to Python 3 + and is effectively deprecated in favor of Cython. + +6) Psyco + + - Plusses: + + - Turns pure python into efficient machine code through jit-like + optimizations + - very fast when it optimizes well + + - Minuses: + + - Only on intel (windows?) + - Doesn't do much for numpy? + +Interfacing to Fortran: +----------------------- +The clear choice to wrap Fortran code is +`f2py <https://docs.scipy.org/doc/numpy/f2py/>`_. + +Pyfort is an older alternative, but not supported any longer. +Fwrap is a newer project that looked promising but isn't being developed any +longer. + +Interfacing to C++: +------------------- + 1) Cython + 2) CXX + 3) Boost.python + 4) SWIG + 5) SIP (used mainly in PyQT) + + diff --git a/doc/source/user/numpy-for-matlab-users.rst b/doc/source/user/numpy-for-matlab-users.rst index 602192ecd..ed0be82a0 100644 --- a/doc/source/user/numpy-for-matlab-users.rst +++ b/doc/source/user/numpy-for-matlab-users.rst @@ -1,18 +1,15 @@ .. _numpy-for-matlab-users: ====================== -NumPy for Matlab users +NumPy for MATLAB users ====================== Introduction ============ -MATLAB® and NumPy/SciPy have a lot in common. But there are many -differences. NumPy and SciPy were created to do numerical and scientific -computing in the most natural way with Python, not to be MATLAB® clones. -This page is intended to be a place to collect wisdom about the -differences, mostly for the purpose of helping proficient MATLAB® users -become proficient NumPy and SciPy users. +MATLAB® and NumPy have a lot in common, but NumPy was created to work with +Python, not to be a MATLAB clone. This guide will help MATLAB users get started +with NumPy. .. raw:: html @@ -20,234 +17,184 @@ become proficient NumPy and SciPy users. table.docutils td { border: solid 1px #ccc; } </style> -Some Key Differences +Some key differences ==================== .. list-table:: - - * - In MATLAB®, the basic data type is a multidimensional array of - double precision floating point numbers. Most expressions take such - arrays and return such arrays. Operations on the 2-D instances of - these arrays are designed to act more or less like matrix operations - in linear algebra. - - In NumPy the basic type is a multidimensional ``array``. Operations - on these arrays in all dimensionalities including 2D are element-wise - operations. One needs to use specific functions for linear algebra - (though for matrix multiplication, one can use the ``@`` operator - in python 3.5 and above). - - * - MATLAB® uses 1 (one) based indexing. The initial element of a - sequence is found using a(1). + :class: docutils + + * - In MATLAB, the basic type, even for scalars, is a + multidimensional array. Array assignments in MATLAB are stored as + 2D arrays of double precision floating point numbers, unless you + specify the number of dimensions and type. Operations on the 2D + instances of these arrays are modeled on matrix operations in + linear algebra. + + - In NumPy, the basic type is a multidimensional ``array``. Array + assignments in NumPy are usually stored as :ref:`n-dimensional arrays<arrays>` with the + minimum type required to hold the objects in sequence, unless you + specify the number of dimensions and type. NumPy performs + operations element-by-element, so multiplying 2D arrays with + ``*`` is not a matrix multiplication -- it's an + element-by-element multiplication. (The ``@`` operator, available + since Python 3.5, can be used for conventional matrix + multiplication.) + + * - MATLAB numbers indices from 1; ``a(1)`` is the first element. :ref:`See note INDEXING <numpy-for-matlab-users.notes>` - - Python uses 0 (zero) based indexing. The initial element of a - sequence is found using a[0]. - - * - MATLAB®'s scripting language was created for doing linear algebra. - The syntax for basic matrix operations is nice and clean, but the API - for adding GUIs and making full-fledged applications is more or less - an afterthought. - - NumPy is based on Python, which was designed from the outset to be - an excellent general-purpose programming language. While Matlab's - syntax for some array manipulations is more compact than - NumPy's, NumPy (by virtue of being an add-on to Python) can do many - things that Matlab just cannot, for instance dealing properly with - stacks of matrices. - - * - In MATLAB®, arrays have pass-by-value semantics, with a lazy - copy-on-write scheme to prevent actually creating copies until they - are actually needed. Slice operations copy parts of the array. - - In NumPy arrays have pass-by-reference semantics. Slice operations - are views into an array. - - -'array' or 'matrix'? Which should I use? -======================================== - -Historically, NumPy has provided a special matrix type, `np.matrix`, which -is a subclass of ndarray which makes binary operations linear algebra -operations. You may see it used in some existing code instead of `np.array`. -So, which one to use? - -Short answer ------------- - -**Use arrays**. - -- They are the standard vector/matrix/tensor type of numpy. Many numpy - functions return arrays, not matrices. -- There is a clear distinction between element-wise operations and - linear algebra operations. -- You can have standard vectors or row/column vectors if you like. - -Until Python 3.5 the only disadvantage of using the array type was that you -had to use ``dot`` instead of ``*`` to multiply (reduce) two tensors -(scalar product, matrix vector multiplication etc.). Since Python 3.5 you -can use the matrix multiplication ``@`` operator. - -Given the above, we intend to deprecate ``matrix`` eventually. - -Long answer ------------ - -NumPy contains both an ``array`` class and a ``matrix`` class. The -``array`` class is intended to be a general-purpose n-dimensional array -for many kinds of numerical computing, while ``matrix`` is intended to -facilitate linear algebra computations specifically. In practice there -are only a handful of key differences between the two. - -- Operators ``*`` and ``@``, functions ``dot()``, and ``multiply()``: - - - For ``array``, **``*`` means element-wise multiplication**, while - **``@`` means matrix multiplication**; they have associated functions - ``multiply()`` and ``dot()``. (Before python 3.5, ``@`` did not exist - and one had to use ``dot()`` for matrix multiplication). - - For ``matrix``, **``*`` means matrix multiplication**, and for - element-wise multiplication one has to use the ``multiply()`` function. - -- Handling of vectors (one-dimensional arrays) - - - For ``array``, the **vector shapes 1xN, Nx1, and N are all different - things**. Operations like ``A[:,1]`` return a one-dimensional array of - shape N, not a two-dimensional array of shape Nx1. Transpose on a - one-dimensional ``array`` does nothing. - - For ``matrix``, **one-dimensional arrays are always upconverted to 1xN - or Nx1 matrices** (row or column vectors). ``A[:,1]`` returns a - two-dimensional matrix of shape Nx1. - -- Handling of higher-dimensional arrays (ndim > 2) - - - ``array`` objects **can have number of dimensions > 2**; - - ``matrix`` objects **always have exactly two dimensions**. - -- Convenience attributes - - - ``array`` **has a .T attribute**, which returns the transpose of - the data. - - ``matrix`` **also has .H, .I, and .A attributes**, which return - the conjugate transpose, inverse, and ``asarray()`` of the matrix, - respectively. - -- Convenience constructor - - - The ``array`` constructor **takes (nested) Python sequences as - initializers**. As in, ``array([[1,2,3],[4,5,6]])``. - - The ``matrix`` constructor additionally **takes a convenient - string initializer**. As in ``matrix("[1 2 3; 4 5 6]")``. - -There are pros and cons to using both: - -- ``array`` - - - ``:)`` Element-wise multiplication is easy: ``A*B``. - - ``:(`` You have to remember that matrix multiplication has its own - operator, ``@``. - - ``:)`` You can treat one-dimensional arrays as *either* row or column - vectors. ``A @ v`` treats ``v`` as a column vector, while - ``v @ A`` treats ``v`` as a row vector. This can save you having to - type a lot of transposes. - - ``:)`` ``array`` is the "default" NumPy type, so it gets the most - testing, and is the type most likely to be returned by 3rd party - code that uses NumPy. - - ``:)`` Is quite at home handling data of any number of dimensions. - - ``:)`` Closer in semantics to tensor algebra, if you are familiar - with that. - - ``:)`` *All* operations (``*``, ``/``, ``+``, ``-`` etc.) are - element-wise. - - ``:(`` Sparse matrices from ``scipy.sparse`` do not interact as well - with arrays. - -- ``matrix`` - - - ``:\\`` Behavior is more like that of MATLAB® matrices. - - ``<:(`` Maximum of two-dimensional. To hold three-dimensional data you - need ``array`` or perhaps a Python list of ``matrix``. - - ``<:(`` Minimum of two-dimensional. You cannot have vectors. They must be - cast as single-column or single-row matrices. - - ``<:(`` Since ``array`` is the default in NumPy, some functions may - return an ``array`` even if you give them a ``matrix`` as an - argument. This shouldn't happen with NumPy functions (if it does - it's a bug), but 3rd party code based on NumPy may not honor type - preservation like NumPy does. - - ``:)`` ``A*B`` is matrix multiplication, so it looks just like you write - it in linear algebra (For Python >= 3.5 plain arrays have the same - convenience with the ``@`` operator). - - ``<:(`` Element-wise multiplication requires calling a function, - ``multiply(A,B)``. - - ``<:(`` The use of operator overloading is a bit illogical: ``*`` - does not work element-wise but ``/`` does. - - Interaction with ``scipy.sparse`` is a bit cleaner. + - NumPy, like Python, numbers indices from 0; ``a[0]`` is the first + element. -The ``array`` is thus much more advisable to use. Indeed, we intend to -deprecate ``matrix`` eventually. - -Table of Rough MATLAB-NumPy Equivalents + * - MATLAB's scripting language was created for linear algebra so the + syntax for some array manipulations is more compact than + NumPy's. On the other hand, the API for adding GUIs and creating + full-fledged applications is more or less an afterthought. + - NumPy is based on Python, a + general-purpose language. The advantage to NumPy + is access to Python libraries including: `SciPy + <https://www.scipy.org/>`_, `Matplotlib <https://matplotlib.org/>`_, + `Pandas <https://pandas.pydata.org/>`_, `OpenCV <https://opencv.org/>`_, + and more. In addition, Python is often `embedded as a scripting language + <https://en.wikipedia.org/wiki/List_of_Python_software#Embedded_as_a_scripting_language>`_ + in other software, allowing NumPy to be used there too. + + * - MATLAB array slicing uses pass-by-value semantics, with a lazy + copy-on-write scheme to prevent creating copies until they are + needed. Slicing operations copy parts of the array. + - NumPy array slicing uses pass-by-reference, that does not copy + the arguments. Slicing operations are views into an array. + + +Rough equivalents ======================================= -The table below gives rough equivalents for some common MATLAB® -expressions. **These are not exact equivalents**, but rather should be -taken as hints to get you going in the right direction. For more detail -read the built-in documentation on the NumPy functions. +The table below gives rough equivalents for some common MATLAB +expressions. These are similar expressions, not equivalents. For +details, see the :ref:`documentation<reference>`. In the table below, it is assumed that you have executed the following commands in Python: :: - from numpy import * - import scipy.linalg + import numpy as np + from scipy import io, integrate, linalg, signal + from scipy.sparse.linalg import eigs Also assume below that if the Notes talk about "matrix" that the arguments are two-dimensional entities. -General Purpose Equivalents +General purpose equivalents --------------------------- .. list-table:: :header-rows: 1 - * - **MATLAB** - - **numpy** - - **Notes** + * - MATLAB + - NumPy + - Notes * - ``help func`` - - ``info(func)`` or ``help(func)`` or ``func?`` (in Ipython) + - ``info(func)`` or ``help(func)`` or ``func?`` (in IPython) - get help on the function *func* * - ``which func`` - - `see note HELP <numpy-for-matlab-users.notes>`__ + - :ref:`see note HELP <numpy-for-matlab-users.notes>` - find out where *func* is defined * - ``type func`` - - ``source(func)`` or ``func??`` (in Ipython) + - ``np.source(func)`` or ``func??`` (in IPython) - print source for *func* (if not a native function) + * - ``% comment`` + - ``# comment`` + - comment a line of code with the text ``comment`` + + * - :: + + for i=1:3 + fprintf('%i\n',i) + end + + - :: + + for i in range(1, 4): + print(i) + + - use a for-loop to print the numbers 1, 2, and 3 using :py:class:`range <range>` + * - ``a && b`` - ``a and b`` - - short-circuiting logical AND operator (Python native operator); + - short-circuiting logical AND operator (:ref:`Python native operator <python:boolean>`); scalar arguments only * - ``a || b`` - ``a or b`` - - short-circuiting logical OR operator (Python native operator); + - short-circuiting logical OR operator (:ref:`Python native operator <python:boolean>`); scalar arguments only + * - .. code:: matlab + + >> 4 == 4 + ans = 1 + >> 4 == 5 + ans = 0 + + - :: + + >>> 4 == 4 + True + >>> 4 == 5 + False + + - The :ref:`boolean objects <python:bltin-boolean-values>` + in Python are ``True`` and ``False``, as opposed to MATLAB + logical types of ``1`` and ``0``. + + * - .. code:: matlab + + a=4 + if a==4 + fprintf('a = 4\n') + elseif a==5 + fprintf('a = 5\n') + end + + - :: + + a = 4 + if a == 4: + print('a = 4') + elif a == 5: + print('a = 5') + + - create an if-else statement to check if ``a`` is 4 or 5 and print result + * - ``1*i``, ``1*j``, ``1i``, ``1j`` - ``1j`` - complex numbers * - ``eps`` - - ``np.spacing(1)`` - - Distance between 1 and the nearest floating point number. + - ``np.finfo(float).eps`` or ``np.spacing(1)`` + - Upper bound to relative error due to rounding in 64-bit floating point + arithmetic. + + * - ``load data.mat`` + - ``io.loadmat('data.mat')`` + - Load MATLAB variables saved to the file ``data.mat``. (Note: When saving arrays to + ``data.mat`` in MATLAB/Octave, use a recent binary format. :func:`scipy.io.loadmat` + will create a dictionary with the saved arrays and further information.) * - ``ode45`` - - ``scipy.integrate.solve_ivp(f)`` + - ``integrate.solve_ivp(f)`` - integrate an ODE with Runge-Kutta 4,5 * - ``ode15s`` - - ``scipy.integrate.solve_ivp(f, method='BDF')`` + - ``integrate.solve_ivp(f, method='BDF')`` - integrate an ODE with BDF method -Linear Algebra Equivalents + +Linear algebra equivalents -------------------------- .. list-table:: @@ -258,63 +205,63 @@ Linear Algebra Equivalents - Notes * - ``ndims(a)`` - - ``ndim(a)`` or ``a.ndim`` - - get the number of dimensions of an array + - ``np.ndim(a)`` or ``a.ndim`` + - number of dimensions of array ``a`` * - ``numel(a)`` - - ``size(a)`` or ``a.size`` - - get the number of elements of an array + - ``np.size(a)`` or ``a.size`` + - number of elements of array ``a`` * - ``size(a)`` - - ``shape(a)`` or ``a.shape`` - - get the "size" of the matrix + - ``np.shape(a)`` or ``a.shape`` + - "size" of array ``a`` * - ``size(a,n)`` - ``a.shape[n-1]`` - get the number of elements of the n-th dimension of array ``a``. (Note - that MATLAB® uses 1 based indexing while Python uses 0 based indexing, + that MATLAB uses 1 based indexing while Python uses 0 based indexing, See note :ref:`INDEXING <numpy-for-matlab-users.notes>`) * - ``[ 1 2 3; 4 5 6 ]`` - - ``array([[1.,2.,3.], [4.,5.,6.]])`` - - 2x3 matrix literal + - ``np.array([[1. ,2. ,3.], [4. ,5. ,6.]])`` + - define a 2x3 2D array * - ``[ a b; c d ]`` - - ``block([[a,b], [c,d]])`` + - ``np.block([[a, b], [c, d]])`` - construct a matrix from blocks ``a``, ``b``, ``c``, and ``d`` * - ``a(end)`` - ``a[-1]`` - - access last element in the 1xn matrix ``a`` + - access last element in MATLAB vector (1xn or nx1) or 1D NumPy array + ``a`` (length n) * - ``a(2,5)`` - - ``a[1,4]`` - - access element in second row, fifth column + - ``a[1, 4]`` + - access element in second row, fifth column in 2D array ``a`` * - ``a(2,:)`` - - ``a[1]`` or ``a[1,:]`` - - entire second row of ``a`` + - ``a[1]`` or ``a[1, :]`` + - entire second row of 2D array ``a`` * - ``a(1:5,:)`` - - ``a[0:5]`` or ``a[:5]`` or ``a[0:5,:]`` - - the first five rows of ``a`` + - ``a[0:5]`` or ``a[:5]`` or ``a[0:5, :]`` + - first 5 rows of 2D array ``a`` * - ``a(end-4:end,:)`` - ``a[-5:]`` - - the last five rows of ``a`` + - last 5 rows of 2D array ``a`` * - ``a(1:3,5:9)`` - - ``a[0:3][:,4:9]`` - - rows one to three and columns five to nine of ``a``. This gives - read-only access. + - ``a[0:3, 4:9]`` + - The first through third rows and fifth through ninth columns of a 2D array, ``a``. * - ``a([2,4,5],[1,3])`` - - ``a[ix_([1,3,4],[0,2])]`` + - ``a[np.ix_([1, 3, 4], [0, 2])]`` - rows 2,4 and 5 and columns 1 and 3. This allows the matrix to be modified, and doesn't require a regular slice. * - ``a(3:2:21,:)`` - - ``a[ 2:21:2,:]`` + - ``a[2:21:2,:]`` - every other row of ``a``, starting with the third and going to the twenty-first @@ -323,11 +270,11 @@ Linear Algebra Equivalents - every other row of ``a``, starting with the first * - ``a(end:-1:1,:)`` or ``flipud(a)`` - - ``a[ ::-1,:]`` + - ``a[::-1,:]`` - ``a`` with rows in reverse order * - ``a([1:end 1],:)`` - - ``a[r_[:len(a),0]]`` + - ``a[np.r_[:len(a),0]]`` - ``a`` with copy of the first row appended to the end * - ``a.'`` @@ -354,30 +301,30 @@ Linear Algebra Equivalents - ``a**3`` - element-wise exponentiation - * - ``(a>0.5)`` - - ``(a>0.5)`` - - matrix whose i,jth element is (a_ij > 0.5). The Matlab result is an - array of 0s and 1s. The NumPy result is an array of the boolean + * - ``(a > 0.5)`` + - ``(a > 0.5)`` + - matrix whose i,jth element is (a_ij > 0.5). The MATLAB result is an + array of logical values 0 and 1. The NumPy result is an array of the boolean values ``False`` and ``True``. - * - ``find(a>0.5)`` - - ``nonzero(a>0.5)`` + * - ``find(a > 0.5)`` + - ``np.nonzero(a > 0.5)`` - find the indices where (``a`` > 0.5) - * - ``a(:,find(v>0.5))`` - - ``a[:,nonzero(v>0.5)[0]]`` + * - ``a(:,find(v > 0.5))`` + - ``a[:,np.nonzero(v > 0.5)[0]]`` - extract the columms of ``a`` where vector v > 0.5 * - ``a(:,find(v>0.5))`` - - ``a[:,v.T>0.5]`` + - ``a[:, v.T > 0.5]`` - extract the columms of ``a`` where column vector v > 0.5 * - ``a(a<0.5)=0`` - - ``a[a<0.5]=0`` + - ``a[a < 0.5]=0`` - ``a`` with elements less than 0.5 zeroed out * - ``a .* (a>0.5)`` - - ``a * (a>0.5)`` + - ``a * (a > 0.5)`` - ``a`` with elements less than 0.5 zeroed out * - ``a(:) = 3`` @@ -386,74 +333,86 @@ Linear Algebra Equivalents * - ``y=x`` - ``y = x.copy()`` - - numpy assigns by reference + - NumPy assigns by reference * - ``y=x(2,:)`` - - ``y = x[1,:].copy()`` - - numpy slices are by reference + - ``y = x[1, :].copy()`` + - NumPy slices are by reference * - ``y=x(:)`` - ``y = x.flatten()`` - turn array into vector (note that this forces a copy). To obtain the - same data ordering as in Matlab, use ``x.flatten('F')``. + same data ordering as in MATLAB, use ``x.flatten('F')``. * - ``1:10`` - - ``arange(1.,11.)`` or ``r_[1.:11.]`` or ``r_[1:10:10j]`` + - ``np.arange(1., 11.)`` or ``np.r_[1.:11.]`` or ``np.r_[1:10:10j]`` - create an increasing vector (see note :ref:`RANGES <numpy-for-matlab-users.notes>`) * - ``0:9`` - - ``arange(10.)`` or ``r_[:10.]`` or ``r_[:9:10j]`` + - ``np.arange(10.)`` or ``np.r_[:10.]`` or ``np.r_[:9:10j]`` - create an increasing vector (see note :ref:`RANGES <numpy-for-matlab-users.notes>`) * - ``[1:10]'`` - - ``arange(1.,11.)[:, newaxis]`` + - ``np.arange(1.,11.)[:, np.newaxis]`` - create a column vector * - ``zeros(3,4)`` - - ``zeros((3,4))`` + - ``np.zeros((3, 4))`` - 3x4 two-dimensional array full of 64-bit floating point zeros * - ``zeros(3,4,5)`` - - ``zeros((3,4,5))`` + - ``np.zeros((3, 4, 5))`` - 3x4x5 three-dimensional array full of 64-bit floating point zeros * - ``ones(3,4)`` - - ``ones((3,4))`` + - ``np.ones((3, 4))`` - 3x4 two-dimensional array full of 64-bit floating point ones * - ``eye(3)`` - - ``eye(3)`` + - ``np.eye(3)`` - 3x3 identity matrix * - ``diag(a)`` - - ``diag(a)`` - - vector of diagonal elements of ``a`` + - ``np.diag(a)`` + - returns a vector of the diagonal elements of 2D array, ``a`` + + * - ``diag(v,0)`` + - ``np.diag(v, 0)`` + - returns a square diagonal matrix whose nonzero values are the elements of + vector, ``v`` - * - ``diag(a,0)`` - - ``diag(a,0)`` - - square diagonal matrix whose nonzero values are the elements of - ``a`` + * - .. code:: matlab + + rng(42,'twister') + rand(3,4) - * - ``rand(3,4)`` - - ``random.rand(3,4)`` or ``random.random_sample((3, 4))`` - - random 3x4 matrix + - :: + + from numpy.random import default_rng + rng = default_rng(42) + rng.random(3, 4) + + or older version: ``random.rand((3, 4))`` + + - generate a random 3x4 array with default random number generator and + seed = 42 * - ``linspace(1,3,4)`` - - ``linspace(1,3,4)`` + - ``np.linspace(1,3,4)`` - 4 equally spaced samples between 1 and 3, inclusive * - ``[x,y]=meshgrid(0:8,0:5)`` - - ``mgrid[0:9.,0:6.]`` or ``meshgrid(r_[0:9.],r_[0:6.]`` + - ``np.mgrid[0:9.,0:6.]`` or ``np.meshgrid(r_[0:9.],r_[0:6.]`` - two 2D arrays: one of x values, the other of y values * - - - ``ogrid[0:9.,0:6.]`` or ``ix_(r_[0:9.],r_[0:6.]`` + - ``ogrid[0:9.,0:6.]`` or ``np.ix_(np.r_[0:9.],np.r_[0:6.]`` - the best way to eval functions on a grid * - ``[x,y]=meshgrid([1,2,4],[2,4,5])`` - - ``meshgrid([1,2,4],[2,4,5])`` + - ``np.meshgrid([1,2,4],[2,4,5])`` - * - @@ -461,37 +420,38 @@ Linear Algebra Equivalents - the best way to eval functions on a grid * - ``repmat(a, m, n)`` - - ``tile(a, (m, n))`` + - ``np.tile(a, (m, n))`` - create m by n copies of ``a`` * - ``[a b]`` - - ``concatenate((a,b),1)`` or ``hstack((a,b))`` or - ``column_stack((a,b))`` or ``c_[a,b]`` + - ``np.concatenate((a,b),1)`` or ``np.hstack((a,b))`` or + ``np.column_stack((a,b))`` or ``np.c_[a,b]`` - concatenate columns of ``a`` and ``b`` * - ``[a; b]`` - - ``concatenate((a,b))`` or ``vstack((a,b))`` or ``r_[a,b]`` + - ``np.concatenate((a,b))`` or ``np.vstack((a,b))`` or ``np.r_[a,b]`` - concatenate rows of ``a`` and ``b`` * - ``max(max(a))`` - - ``a.max()`` - - maximum element of ``a`` (with ndims(a)<=2 for matlab) + - ``a.max()`` or ``np.nanmax(a)`` + - maximum element of ``a`` (with ndims(a)<=2 for MATLAB, if there are + NaN's, ``nanmax`` will ignore these and return largest value) * - ``max(a)`` - ``a.max(0)`` - - maximum element of each column of matrix ``a`` + - maximum element of each column of array ``a`` * - ``max(a,[],2)`` - ``a.max(1)`` - - maximum element of each row of matrix ``a`` + - maximum element of each row of array ``a`` * - ``max(a,b)`` - - ``maximum(a, b)`` + - ``np.maximum(a, b)`` - compares ``a`` and ``b`` element-wise, and returns the maximum value from each pair * - ``norm(v)`` - - ``sqrt(v @ v)`` or ``np.linalg.norm(v)`` + - ``np.sqrt(v @ v)`` or ``np.linalg.norm(v)`` - L2 norm of vector ``v`` * - ``a & b`` @@ -500,7 +460,7 @@ Linear Algebra Equivalents LOGICOPS <numpy-for-matlab-users.notes>` * - ``a | b`` - - ``logical_or(a,b)`` + - ``np.logical_or(a,b)`` - element-by-element OR operator (NumPy ufunc) :ref:`See note LOGICOPS <numpy-for-matlab-users.notes>` @@ -514,90 +474,99 @@ Linear Algebra Equivalents * - ``inv(a)`` - ``linalg.inv(a)`` - - inverse of square matrix ``a`` + - inverse of square 2D array ``a`` * - ``pinv(a)`` - ``linalg.pinv(a)`` - - pseudo-inverse of matrix ``a`` + - pseudo-inverse of 2D array ``a`` * - ``rank(a)`` - ``linalg.matrix_rank(a)`` - - matrix rank of a 2D array / matrix ``a`` + - matrix rank of a 2D array ``a`` * - ``a\b`` - - ``linalg.solve(a,b)`` if ``a`` is square; ``linalg.lstsq(a,b)`` + - ``linalg.solve(a, b)`` if ``a`` is square; ``linalg.lstsq(a, b)`` otherwise - solution of a x = b for x * - ``b/a`` - - Solve a.T x.T = b.T instead + - Solve ``a.T x.T = b.T`` instead - solution of x a = b for x * - ``[U,S,V]=svd(a)`` - ``U, S, Vh = linalg.svd(a), V = Vh.T`` - singular value decomposition of ``a`` - * - ``chol(a)`` - - ``linalg.cholesky(a).T`` - - cholesky factorization of a matrix (``chol(a)`` in matlab returns an - upper triangular matrix, but ``linalg.cholesky(a)`` returns a lower - triangular matrix) + * - ``c=chol(a)`` where ``a==c'*c`` + - ``c = linalg.cholesky(a)`` where ``a == c@c.T`` + - Cholesky factorization of a 2D array (``chol(a)`` in MATLAB returns an + upper triangular 2D array, but :func:`~scipy.linalg.cholesky` returns a lower + triangular 2D array) * - ``[V,D]=eig(a)`` - ``D,V = linalg.eig(a)`` - - eigenvalues and eigenvectors of ``a`` + - eigenvalues :math:`\lambda` and eigenvectors :math:`\bar{v}` of ``a``, + where :math:`\lambda\bar{v}=\mathbf{a}\bar{v}` * - ``[V,D]=eig(a,b)`` - - ``D,V = scipy.linalg.eig(a,b)`` - - eigenvalues and eigenvectors of ``a``, ``b`` + - ``D,V = linalg.eig(a, b)`` + - eigenvalues :math:`\lambda` and eigenvectors :math:`\bar{v}` of + ``a``, ``b`` + where :math:`\lambda\mathbf{b}\bar{v}=\mathbf{a}\bar{v}` - * - ``[V,D]=eigs(a,k)`` - - - - find the ``k`` largest eigenvalues and eigenvectors of ``a`` + * - ``[V,D]=eigs(a,3)`` + - ``D,V = eigs(a, k = 3)`` + - find the ``k=3`` largest eigenvalues and eigenvectors of 2D array, ``a`` * - ``[Q,R,P]=qr(a,0)`` - - ``Q,R = scipy.linalg.qr(a)`` + - ``Q,R = linalg.qr(a)`` - QR decomposition - * - ``[L,U,P]=lu(a)`` - - ``L,U = scipy.linalg.lu(a)`` or ``LU,P=scipy.linalg.lu_factor(a)`` - - LU decomposition (note: P(Matlab) == transpose(P(numpy)) ) + * - ``[L,U,P]=lu(a)`` where ``a==P'*L*U`` + - ``P,L,U = linalg.lu(a)`` where ``a == P@L@U`` + - LU decomposition (note: P(MATLAB) == transpose(P(NumPy))) * - ``conjgrad`` - - ``scipy.sparse.linalg.cg`` + - ``cg`` - Conjugate gradients solver * - ``fft(a)`` - - ``fft(a)`` + - ``np.fft(a)`` - Fourier transform of ``a`` * - ``ifft(a)`` - - ``ifft(a)`` + - ``np.ifft(a)`` - inverse Fourier transform of ``a`` * - ``sort(a)`` - - ``sort(a)`` or ``a.sort()`` - - sort the matrix + - ``np.sort(a)`` or ``a.sort(axis=0)`` + - sort each column of a 2D array, ``a`` - * - ``[b,I] = sortrows(a,i)`` - - ``I = argsort(a[:,i]), b=a[I,:]`` - - sort the rows of the matrix + * - ``sort(a, 2)`` + - ``np.sort(a, axis = 1)`` or ``a.sort(axis = 1)`` + - sort the each row of 2D array, ``a`` - * - ``regress(y,X)`` - - ``linalg.lstsq(X,y)`` - - multilinear regression + * - ``[b,I]=sortrows(a,1)`` + - ``I = np.argsort(a[:, 0]); b = a[I,:]`` + - save the array ``a`` as array ``b`` with rows sorted by the first column + + * - ``x = Z\y`` + - ``x = linalg.lstsq(Z, y)`` + - perform a linear regression of the form :math:`\mathbf{Zx}=\mathbf{y}` * - ``decimate(x, q)`` - - ``scipy.signal.resample(x, len(x)/q)`` + - ``signal.resample(x, np.ceil(len(x)/q))`` - downsample with low-pass filtering * - ``unique(a)`` - - ``unique(a)`` - - + - ``np.unique(a)`` + - a vector of unique values in array ``a`` * - ``squeeze(a)`` - ``a.squeeze()`` - - + - remove singleton dimensions of array ``a``. Note that MATLAB will always + return arrays of 2D or higher while NumPy will return arrays of 0D or + higher .. _numpy-for-matlab-users.notes: @@ -605,15 +574,15 @@ Notes ===== \ **Submatrix**: Assignment to a submatrix can be done with lists of -indexes using the ``ix_`` command. E.g., for 2d array ``a``, one might -do: ``ind=[1,3]; a[np.ix_(ind,ind)]+=100``. +indices using the ``ix_`` command. E.g., for 2D array ``a``, one might +do: ``ind=[1, 3]; a[np.ix_(ind, ind)] += 100``. \ **HELP**: There is no direct equivalent of MATLAB's ``which`` command, -but the commands ``help`` and ``source`` will usually list the filename +but the commands :func:`help` and :func:`numpy.source` will usually list the filename where the function is located. Python also has an ``inspect`` module (do ``import inspect``) which provides a ``getfile`` that often works. -\ **INDEXING**: MATLAB® uses one based indexing, so the initial element +\ **INDEXING**: MATLAB uses one based indexing, so the initial element of a sequence has index 1. Python uses zero based indexing, so the initial element of a sequence has index 0. Confusion and flamewars arise because each has advantages and disadvantages. One based indexing is @@ -623,55 +592,176 @@ indexing <https://groups.google.com/group/comp.lang.python/msg/1bf4d925dfbf368?q See also `a text by prof.dr. Edsger W. Dijkstra <https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html>`__. -\ **RANGES**: In MATLAB®, ``0:5`` can be used as both a range literal +\ **RANGES**: In MATLAB, ``0:5`` can be used as both a range literal and a 'slice' index (inside parentheses); however, in Python, constructs like ``0:5`` can *only* be used as a slice index (inside square brackets). Thus the somewhat quirky ``r_`` object was created to allow -numpy to have a similarly terse range construction mechanism. Note that +NumPy to have a similarly terse range construction mechanism. Note that ``r_`` is not called like a function or a constructor, but rather *indexed* using square brackets, which allows the use of Python's slice syntax in the arguments. -\ **LOGICOPS**: & or \| in NumPy is bitwise AND/OR, while in Matlab & -and \| are logical AND/OR. The difference should be clear to anyone with -significant programming experience. The two can appear to work the same, -but there are important differences. If you would have used Matlab's & -or \| operators, you should use the NumPy ufuncs -logical\_and/logical\_or. The notable differences between Matlab's and -NumPy's & and \| operators are: +\ **LOGICOPS**: ``&`` or ``|`` in NumPy is bitwise AND/OR, while in MATLAB & +and ``|`` are logical AND/OR. The two can appear to work the same, +but there are important differences. If you would have used MATLAB's ``&`` +or ``|`` operators, you should use the NumPy ufuncs +``logical_and``/``logical_or``. The notable differences between MATLAB's and +NumPy's ``&`` and ``|`` operators are: - Non-logical {0,1} inputs: NumPy's output is the bitwise AND of the - inputs. Matlab treats any non-zero value as 1 and returns the logical - AND. For example (3 & 4) in NumPy is 0, while in Matlab both 3 and 4 - are considered logical true and (3 & 4) returns 1. + inputs. MATLAB treats any non-zero value as 1 and returns the logical + AND. For example ``(3 & 4)`` in NumPy is ``0``, while in MATLAB both ``3`` + and ``4`` + are considered logical true and ``(3 & 4)`` returns ``1``. - Precedence: NumPy's & operator is higher precedence than logical - operators like < and >; Matlab's is the reverse. + operators like ``<`` and ``>``; MATLAB's is the reverse. If you know you have boolean arguments, you can get away with using -NumPy's bitwise operators, but be careful with parentheses, like this: z -= (x > 1) & (x < 2). The absence of NumPy operator forms of logical\_and -and logical\_or is an unfortunate consequence of Python's design. +NumPy's bitwise operators, but be careful with parentheses, like this: ``z += (x > 1) & (x < 2)``. The absence of NumPy operator forms of ``logical_and`` +and ``logical_or`` is an unfortunate consequence of Python's design. -**RESHAPE and LINEAR INDEXING**: Matlab always allows multi-dimensional +**RESHAPE and LINEAR INDEXING**: MATLAB always allows multi-dimensional arrays to be accessed using scalar or linear indices, NumPy does not. -Linear indices are common in Matlab programs, e.g. find() on a matrix +Linear indices are common in MATLAB programs, e.g. ``find()`` on a matrix returns them, whereas NumPy's find behaves differently. When converting -Matlab code it might be necessary to first reshape a matrix to a linear +MATLAB code it might be necessary to first reshape a matrix to a linear sequence, perform some indexing operations and then reshape back. As reshape (usually) produces views onto the same storage, it should be possible to do this fairly efficiently. Note that the scan order used by -reshape in NumPy defaults to the 'C' order, whereas Matlab uses the +reshape in NumPy defaults to the 'C' order, whereas MATLAB uses the Fortran order. If you are simply converting to a linear sequence and -back this doesn't matter. But if you are converting reshapes from Matlab -code which relies on the scan order, then this Matlab code: z = -reshape(x,3,4); should become z = x.reshape(3,4,order='F').copy() in +back this doesn't matter. But if you are converting reshapes from MATLAB +code which relies on the scan order, then this MATLAB code: ``z = +reshape(x,3,4);`` should become ``z = x.reshape(3,4,order='F').copy()`` in NumPy. -Customizing Your Environment +'array' or 'matrix'? Which should I use? +======================================== + +Historically, NumPy has provided a special matrix type, `np.matrix`, which +is a subclass of ndarray which makes binary operations linear algebra +operations. You may see it used in some existing code instead of `np.array`. +So, which one to use? + +Short answer +------------ + +**Use arrays**. + +- They support multidimensional array algebra that is supported in MATLAB +- They are the standard vector/matrix/tensor type of NumPy. Many NumPy + functions return arrays, not matrices. +- There is a clear distinction between element-wise operations and + linear algebra operations. +- You can have standard vectors or row/column vectors if you like. + +Until Python 3.5 the only disadvantage of using the array type was that you +had to use ``dot`` instead of ``*`` to multiply (reduce) two tensors +(scalar product, matrix vector multiplication etc.). Since Python 3.5 you +can use the matrix multiplication ``@`` operator. + +Given the above, we intend to deprecate ``matrix`` eventually. + +Long answer +----------- + +NumPy contains both an ``array`` class and a ``matrix`` class. The +``array`` class is intended to be a general-purpose n-dimensional array +for many kinds of numerical computing, while ``matrix`` is intended to +facilitate linear algebra computations specifically. In practice there +are only a handful of key differences between the two. + +- Operators ``*`` and ``@``, functions ``dot()``, and ``multiply()``: + + - For ``array``, **``*`` means element-wise multiplication**, while + **``@`` means matrix multiplication**; they have associated functions + ``multiply()`` and ``dot()``. (Before Python 3.5, ``@`` did not exist + and one had to use ``dot()`` for matrix multiplication). + - For ``matrix``, **``*`` means matrix multiplication**, and for + element-wise multiplication one has to use the ``multiply()`` function. + +- Handling of vectors (one-dimensional arrays) + + - For ``array``, the **vector shapes 1xN, Nx1, and N are all different + things**. Operations like ``A[:,1]`` return a one-dimensional array of + shape N, not a two-dimensional array of shape Nx1. Transpose on a + one-dimensional ``array`` does nothing. + - For ``matrix``, **one-dimensional arrays are always upconverted to 1xN + or Nx1 matrices** (row or column vectors). ``A[:,1]`` returns a + two-dimensional matrix of shape Nx1. + +- Handling of higher-dimensional arrays (ndim > 2) + + - ``array`` objects **can have number of dimensions > 2**; + - ``matrix`` objects **always have exactly two dimensions**. + +- Convenience attributes + + - ``array`` **has a .T attribute**, which returns the transpose of + the data. + - ``matrix`` **also has .H, .I, and .A attributes**, which return + the conjugate transpose, inverse, and ``asarray()`` of the matrix, + respectively. + +- Convenience constructor + + - The ``array`` constructor **takes (nested) Python sequences as + initializers**. As in, ``array([[1,2,3],[4,5,6]])``. + - The ``matrix`` constructor additionally **takes a convenient + string initializer**. As in ``matrix("[1 2 3; 4 5 6]")``. + +There are pros and cons to using both: + +- ``array`` + + - ``:)`` Element-wise multiplication is easy: ``A*B``. + - ``:(`` You have to remember that matrix multiplication has its own + operator, ``@``. + - ``:)`` You can treat one-dimensional arrays as *either* row or column + vectors. ``A @ v`` treats ``v`` as a column vector, while + ``v @ A`` treats ``v`` as a row vector. This can save you having to + type a lot of transposes. + - ``:)`` ``array`` is the "default" NumPy type, so it gets the most + testing, and is the type most likely to be returned by 3rd party + code that uses NumPy. + - ``:)`` Is quite at home handling data of any number of dimensions. + - ``:)`` Closer in semantics to tensor algebra, if you are familiar + with that. + - ``:)`` *All* operations (``*``, ``/``, ``+``, ``-`` etc.) are + element-wise. + - ``:(`` Sparse matrices from ``scipy.sparse`` do not interact as well + with arrays. + +- ``matrix`` + + - ``:\\`` Behavior is more like that of MATLAB matrices. + - ``<:(`` Maximum of two-dimensional. To hold three-dimensional data you + need ``array`` or perhaps a Python list of ``matrix``. + - ``<:(`` Minimum of two-dimensional. You cannot have vectors. They must be + cast as single-column or single-row matrices. + - ``<:(`` Since ``array`` is the default in NumPy, some functions may + return an ``array`` even if you give them a ``matrix`` as an + argument. This shouldn't happen with NumPy functions (if it does + it's a bug), but 3rd party code based on NumPy may not honor type + preservation like NumPy does. + - ``:)`` ``A*B`` is matrix multiplication, so it looks just like you write + it in linear algebra (For Python >= 3.5 plain arrays have the same + convenience with the ``@`` operator). + - ``<:(`` Element-wise multiplication requires calling a function, + ``multiply(A,B)``. + - ``<:(`` The use of operator overloading is a bit illogical: ``*`` + does not work element-wise but ``/`` does. + - Interaction with ``scipy.sparse`` is a bit cleaner. + +The ``array`` is thus much more advisable to use. Indeed, we intend to +deprecate ``matrix`` eventually. + +Customizing your environment ============================ -In MATLAB® the main tool available to you for customizing the +In MATLAB the main tool available to you for customizing the environment is to modify the search path with the locations of your favorite functions. You can put such customizations into a startup script that MATLAB will run on startup. @@ -685,7 +775,7 @@ NumPy, or rather Python, has similar facilities. interpreter is started, define the ``PYTHONSTARTUP`` environment variable to contain the name of your startup script. -Unlike MATLAB®, where anything on your path can be called immediately, +Unlike MATLAB, where anything on your path can be called immediately, with Python you need to first do an 'import' statement to make functions in a particular file accessible. @@ -696,26 +786,39 @@ this is just an example, not a statement of "best practices"): # Make all numpy available via shorter 'np' prefix import numpy as np - # Make all matlib functions accessible at the top level via M.func() - import numpy.matlib as M - # Make some matlib functions accessible directly at the top level via, e.g. rand(3,3) - from numpy.matlib import rand,zeros,ones,empty,eye + # + # Make the SciPy linear algebra functions available as linalg.func() + # e.g. linalg.lu, linalg.eig (for general l*B@u==A@u solution) + from scipy import linalg + # # Define a Hermitian function def hermitian(A, **kwargs): - return np.transpose(A,**kwargs).conj() - # Make some shortcuts for transpose,hermitian: - # np.transpose(A) --> T(A) + return np.conj(A,**kwargs).T + # Make a shortcut for hermitian: # hermitian(A) --> H(A) - T = np.transpose H = hermitian +To use the deprecated `matrix` and other `matlib` functions: + +:: + + # Make all matlib functions accessible at the top level via M.func() + import numpy.matlib as M + # Make some matlib functions accessible directly at the top level via, e.g. rand(3,3) + from numpy.matlib import matrix,rand,zeros,ones,empty,eye + Links ===== -See http://mathesaurus.sf.net/ for another MATLAB®/NumPy -cross-reference. +Another somewhat outdated MATLAB/NumPy cross-reference can be found at +http://mathesaurus.sf.net/ -An extensive list of tools for scientific work with python can be +An extensive list of tools for scientific work with Python can be found in the `topical software page <https://scipy.org/topical-software.html>`__. -MATLAB® and SimuLink® are registered trademarks of The MathWorks. +See +`List of Python software: scripting +<https://en.wikipedia.org/wiki/List_of_Python_software#Embedded_as_a_scripting_language>`_ +for a list of softwares that use Python as a scripting language + +MATLAB® and SimuLink® are registered trademarks of The MathWorks, Inc. diff --git a/doc/source/user/quickstart.rst b/doc/source/user/quickstart.rst index b1af81886..8fdc6ec36 100644 --- a/doc/source/user/quickstart.rst +++ b/doc/source/user/quickstart.rst @@ -1,5 +1,5 @@ =================== -Quickstart tutorial +NumPy quickstart =================== .. currentmodule:: numpy @@ -12,26 +12,24 @@ Quickstart tutorial Prerequisites ============= -Before reading this tutorial you should know a bit of Python. If you -would like to refresh your memory, take a look at the `Python +You'll need to know a bit of Python. For a refresher, see the `Python tutorial <https://docs.python.org/tutorial/>`__. -If you wish to work the examples in this tutorial, you must also have -some software installed on your computer. Please see -https://scipy.org/install.html for instructions. +To work the examples, you'll need ``matplotlib`` installed +in addition to NumPy. **Learner profile** -This tutorial is intended as a quick overview of -algebra and arrays in NumPy and want to understand how n-dimensional +This is a quick overview of +algebra and arrays in NumPy. It demonstrates how n-dimensional (:math:`n>=2`) arrays are represented and can be manipulated. In particular, if you don't know how to apply common functions to n-dimensional arrays (without using for-loops), or if you want to understand axis and shape properties for -n-dimensional arrays, this tutorial might be of help. +n-dimensional arrays, this article might be of help. **Learning Objectives** -After this tutorial, you should be able to: +After reading, you should be able to: - Understand the difference between one-, two- and n-dimensional arrays in NumPy; @@ -361,7 +359,7 @@ existing array rather than create a new one. >>> a += b # b is not automatically converted to integer type Traceback (most recent call last): ... - numpy.core._exceptions.UFuncTypeError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int64') with casting rule 'same_kind' + numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int64') with casting rule 'same_kind' When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one (a behavior known diff --git a/doc/source/user/setting-up.rst b/doc/source/user/setting-up.rst deleted file mode 100644 index 7ca3a365c..000000000 --- a/doc/source/user/setting-up.rst +++ /dev/null @@ -1,10 +0,0 @@ -********** -Setting up -********** - -.. toctree:: - :maxdepth: 1 - - whatisnumpy - install - troubleshooting-importerror diff --git a/doc/source/user/theory.broadcasting.rst b/doc/source/user/theory.broadcasting.rst index b37edeacc..a82d78e6c 100644 --- a/doc/source/user/theory.broadcasting.rst +++ b/doc/source/user/theory.broadcasting.rst @@ -69,7 +69,7 @@ numpy on Windows 2000 with one million element arrays. *Figure 1* *In the simplest example of broadcasting, the scalar ``b`` is - stretched to become an array of with the same shape as ``a`` so the shapes + stretched to become an array of same shape as ``a`` so the shapes are compatible for element-by-element multiplication.* diff --git a/doc/source/user/troubleshooting-importerror.rst b/doc/source/user/troubleshooting-importerror.rst index 7d4846f77..1f99491a1 100644 --- a/doc/source/user/troubleshooting-importerror.rst +++ b/doc/source/user/troubleshooting-importerror.rst @@ -1,3 +1,11 @@ +:orphan: + +.. Reason for orphan: This page is referenced by the installation + instructions, which have moved from Sphinx to https://numpy.org/install. + All install links in Sphinx now point there, leaving no Sphinx references + to this page. + + *************************** Troubleshooting ImportError *************************** @@ -69,7 +77,7 @@ or conda. Using Eclipse/PyDev with Anaconda/conda Python (or environments) ---------------------------------------------------------------- -Please see the +Please see the `Anaconda Documentation <https://docs.anaconda.com/anaconda/user-guide/tasks/integration/eclipse-pydev/>`_ on how to properly configure Eclipse/PyDev to use Anaconda Python with specific conda environments. diff --git a/doc/source/user/tutorial-ma.rst b/doc/source/user/tutorial-ma.rst index c28353371..88bad3cbe 100644 --- a/doc/source/user/tutorial-ma.rst +++ b/doc/source/user/tutorial-ma.rst @@ -9,7 +9,8 @@ Tutorial: Masked Arrays import numpy as np np.random.seed(1) -**Prerequisites** +Prerequisites +------------- Before reading this tutorial, you should know a bit of Python. If you would like to refresh your memory, take a look at the @@ -18,13 +19,15 @@ would like to refresh your memory, take a look at the If you want to be able to run the examples in this tutorial, you should also have `matplotlib <https://matplotlib.org/>`_ installed on your computer. -**Learner profile** +Learner profile +--------------- This tutorial is for people who have a basic understanding of NumPy and want to understand how masked arrays and the :mod:`numpy.ma` module can be used in practice. -**Learning Objectives** +Learning Objectives +------------------- After this tutorial, you should be able to: @@ -33,7 +36,8 @@ After this tutorial, you should be able to: - Decide when the use of masked arrays is appropriate in some of your applications -**What are masked arrays?** +What are masked arrays? +----------------------- Consider the following problem. You have a dataset with missing or invalid entries. If you're doing any kind of processing on this data, and want to @@ -63,7 +67,8 @@ combination of: - A ``fill_value``, a value that may be used to replace the invalid entries in order to return a standard :class:`numpy.ndarray`. -**When can they be useful?** +When can they be useful? +------------------------ There are a few situations where masked arrays can be more useful than just eliminating the invalid entries of an array: @@ -84,7 +89,8 @@ comes with a specific implementation of most :term:`NumPy universal functions functions and operations on masked data. The output is then a masked array. We'll see some examples of how this works in practice below. -**Using masked arrays to see COVID-19 data** +Using masked arrays to see COVID-19 data +---------------------------------------- From `Kaggle <https://www.kaggle.com/atilamadai/covid19>`_ it is possible to download a dataset with initial data about the COVID-19 outbreak in the @@ -149,7 +155,8 @@ can read more about the :func:`numpy.genfromtxt` function from the :func:`Reference Documentation <numpy.genfromtxt>` or from the :doc:`Basic IO tutorial <basics.io.genfromtxt>`. -**Exploring the data** +Exploring the data +------------------ First of all, we can plot the whole set of data we have and see what it looks like. In order to get a readable plot, we select only a few of the dates to @@ -194,7 +201,8 @@ the :func:`numpy.sum` function to sum all the selected rows (``axis=0``): Something's wrong with this data - we are not supposed to have negative values in a cumulative data set. What's going on? -**Missing data** +Missing data +------------ Looking at the data, here's what we find: there is a period with **missing data**: @@ -308,7 +316,8 @@ Mainland China: It's clear that masked arrays are the right solution here. We cannot represent the missing data without mischaracterizing the evolution of the curve. -**Fitting Data** +Fitting Data +------------ One possibility we can think of is to interpolate the missing data to estimate the number of cases in late January. Observe that we can select the masked @@ -367,7 +376,8 @@ after the beginning of the records: plt.title("COVID-19 cumulative cases from Jan 21 to Feb 3 2020 - Mainland China\n" "Cubic estimate for 7 days after start"); -**More reading** +More reading +------------ Topics not covered in this tutorial can be found in the documentation: diff --git a/doc/source/user/tutorial-svd.rst b/doc/source/user/tutorial-svd.rst index 086e0a6de..fd9e366e0 100644 --- a/doc/source/user/tutorial-svd.rst +++ b/doc/source/user/tutorial-svd.rst @@ -9,7 +9,8 @@ Tutorial: Linear algebra on n-dimensional arrays import numpy as np np.random.seed(1) -**Prerequisites** +Prerequisites +------------- Before reading this tutorial, you should know a bit of Python. If you would like to refresh your memory, take a look at the @@ -19,7 +20,8 @@ If you want to be able to run the examples in this tutorial, you should also have `matplotlib <https://matplotlib.org/>`_ and `SciPy <https://scipy.org>`_ installed on your computer. -**Learner profile** +Learner profile +--------------- This tutorial is for people who have a basic understanding of linear algebra and arrays in NumPy and want to understand how n-dimensional @@ -28,7 +30,8 @@ you don't know how to apply common functions to n-dimensional arrays (without using for-loops), or if you want to understand axis and shape properties for n-dimensional arrays, this tutorial might be of help. -**Learning Objectives** +Learning Objectives +------------------- After this tutorial, you should be able to: @@ -38,7 +41,8 @@ After this tutorial, you should be able to: arrays without using for-loops; - Understand axis and shape properties for n-dimensional arrays. -**Content** +Content +------- In this tutorial, we will use a `matrix decomposition <https://en.wikipedia.org/wiki/Matrix_decomposition>`_ from linear algebra, the @@ -78,7 +82,8 @@ We can see the image using the `matplotlib.pyplot.imshow` function:: If you are executing the commands above in the IPython shell, it might be necessary to use the command ``plt.show()`` to show the image window. -**Shape, axis and array properties** +Shape, axis and array properties +-------------------------------- Note that, in linear algebra, the dimension of a vector refers to the number of entries in an array. In NumPy, it instead defines the number of axes. For @@ -162,7 +167,8 @@ syntax:: >>> green_array = img_array[:, :, 1] >>> blue_array = img_array[:, :, 2] -**Operations on an axis** +Operations on an axis +--------------------- It is possible to use methods from linear algebra to approximate an existing set of data. Here, we will use the `SVD (Singular Value Decomposition) @@ -290,7 +296,8 @@ diagonal and with the appropriate dimensions for multiplying: in our case, Now, we want to check if the reconstructed ``U @ Sigma @ Vt`` is close to the original ``img_gray`` matrix. -**Approximation** +Approximation +------------- The `linalg` module includes a ``norm`` function, which computes the norm of a vector or matrix represented in a NumPy array. For @@ -360,7 +367,8 @@ Now, you can go ahead and repeat this experiment with other values of `k`, and each of your experiments should give you a slightly better (or worse) image depending on the value you choose. -**Applying to all colors** +Applying to all colors +---------------------- Now we want to do the same kind of operation, but to all three colors. Our first instinct might be to repeat the same operation we did above to each color @@ -411,7 +419,8 @@ matrices into the approximation. Now, note that To build the final approximation matrix, we must understand how multiplication across different axes works. -**Products with n-dimensional arrays** +Products with n-dimensional arrays +---------------------------------- If you have worked before with only one- or two-dimensional arrays in NumPy, you might use `numpy.dot` and `numpy.matmul` (or the ``@`` operator) @@ -495,7 +504,8 @@ Even though the image is not as sharp, using a small number of ``k`` singular values (compared to the original set of 768 values), we can recover many of the distinguishing features from this image. -**Final words** +Final words +----------- Of course, this is not the best method to *approximate* an image. However, there is, in fact, a result in linear algebra that says that the @@ -504,7 +514,8 @@ terms of the norm of the difference. For more information, see *G. H. Golub and C. F. Van Loan, Matrix Computations, Baltimore, MD, Johns Hopkins University Press, 1985*. -**Further reading** +Further reading +--------------- - :doc:`Python tutorial <python:tutorial/index>` - :ref:`reference` diff --git a/doc/source/user/tutorials_index.rst b/doc/source/user/tutorials_index.rst index 5e9419f96..20e2c256c 100644 --- a/doc/source/user/tutorials_index.rst +++ b/doc/source/user/tutorials_index.rst @@ -11,10 +11,6 @@ classes contained in the package, see the :ref:`API reference <reference>`. .. toctree:: :maxdepth: 1 - basics - misc - numpy-for-matlab-users tutorial-svd tutorial-ma - building - c-info + diff --git a/doc/source/user/whatisnumpy.rst b/doc/source/user/whatisnumpy.rst index 8478a77c4..154f91c84 100644 --- a/doc/source/user/whatisnumpy.rst +++ b/doc/source/user/whatisnumpy.rst @@ -125,7 +125,7 @@ same shape, or a scalar and an array, or even two arrays of with different shapes, provided that the smaller array is "expandable" to the shape of the larger in such a way that the resulting broadcast is unambiguous. For detailed "rules" of broadcasting see -`numpy.doc.broadcasting`. +`basics.broadcasting`. Who Else Uses NumPy? -------------------- |