diff options
author | Sebastian Berg <sebastian@sipsolutions.net> | 2013-11-24 22:35:34 +0100 |
---|---|---|
committer | Sebastian Berg <sebastian@sipsolutions.net> | 2014-05-29 14:09:37 +0200 |
commit | f731d17e57fb36523ede1265b2f48c0d33c3254e (patch) | |
tree | 0b24d7ad41b7d30f683c82d2be57a06a29d21d39 | |
parent | f57c77b88a735d5f49a407881777ff2e9f3b1be2 (diff) | |
download | numpy-f731d17e57fb36523ede1265b2f48c0d33c3254e.tar.gz |
DOC: Rework the advanced indexing documentation.
Mostly makes the advanced indexing doc much more example based,
and prominently mentions the np.ix_ function.
Some subtleties (some of which are new) are also mentioned.
-rw-r--r-- | doc/source/reference/arrays.indexing.rst | 386 |
1 files changed, 279 insertions, 107 deletions
diff --git a/doc/source/reference/arrays.indexing.rst b/doc/source/reference/arrays.indexing.rst index e759b6ff8..d7715b0f1 100644 --- a/doc/source/reference/arrays.indexing.rst +++ b/doc/source/reference/arrays.indexing.rst @@ -21,8 +21,8 @@ slicing, advanced indexing. Which one occurs depends on *obj*. for the former. -Basic Slicing -------------- +Basic Slicing and Indexing +-------------------------- Basic slicing extends Python's basic concept of slicing to N dimensions. Basic slicing occurs when *obj* is a :class:`slice` object @@ -108,8 +108,8 @@ concepts to remember include: [6]]]) - :const:`Ellipsis` expand to the number of ``:`` objects needed to - make a selection tuple of the same length as ``x.ndim``. Only the - first ellipsis is expanded, any others are interpreted as ``:``. + make a selection tuple of the same length as ``x.ndim``. There may + only be a single ellipsis present. .. admonition:: Example @@ -148,7 +148,7 @@ concepts to remember include: ``x[ind1,...,ind2,:]`` acts like ``x[ind1][...,ind2,:]`` under basic slicing. - .. warning:: The above is **not** true for advanced slicing. + .. warning:: The above is **not** true for advanced indexing. - You may use slicing to set values in the array, but (unlike lists) you can never grow the array. The size of the value to be set in @@ -175,7 +175,7 @@ concepts to remember include: :const:`newaxis`. -Advanced indexing +Advanced Indexing ----------------- Advanced indexing is triggered when the selection object, *obj*, is a @@ -187,139 +187,311 @@ and Boolean. Advanced indexing always returns a *copy* of the data (contrast with basic slicing that returns a :term:`view`). -Integer -^^^^^^^ +.. warning:: -Integer indexing allows selection of arbitrary items in the array -based on their *N*-dimensional index. This kind of selection occurs -when advanced indexing is triggered and the selection object is not -an array of data type bool. For the discussion below, when the -selection object is not a tuple, it will be referred to as if it had -been promoted to a 1-tuple, which will be called the selection -tuple. The rules of advanced integer-style indexing are: + The definition of advanced indexing means that ``x[(1,2,3),]`` is + fundamentally different than ``x[(1,2,3)]``. The latter is + equivalent to ``x[1,2,3]`` which will trigger basic selection while + the former will trigger advanced indexing. Be sure to understand + why this is occurs. -- If the length of the selection tuple is larger than *N* an error is raised. + Also recognize that ``x[[1,2,3]]`` will trigger advanced indexing, + whereas ``x[[1,2,slice(None)]]`` will trigger basic slicing. -- All sequences and scalars in the selection tuple are converted to - :class:`intp` indexing arrays. +Integer array indexing +^^^^^^^^^^^^^^^^^^^^^^ -- All selection tuple objects must be convertible to :class:`intp` - arrays, :class:`slice` objects, or the :const:`Ellipsis` object. +Integer array indexing allows selection of arbitrary items in the array +based on their *N*-dimensional index. Each integer array represents a number +of indexes into that dimension. -- The first :const:`Ellipsis` object will be expanded, and any other - :const:`Ellipsis` objects will be treated as full slice (``:``) - objects. The expanded :const:`Ellipsis` object is replaced with as - many full slice (``:``) objects as needed to make the length of the - selection tuple :math:`N`. +Purely integer array indexing +""""""""""""""""""""""""""""" -- If the selection tuple is smaller than *N*, then as many ``:`` - objects as needed are added to the end of the selection tuple so - that the modified selection tuple has length *N*. +When the index consists of as many integer arrays as the array being indexed +has dimensions, the indexing is straight forward, but different from slicing. -- All the integer indexing arrays must be :ref:`broadcastable - <arrays.broadcasting.broadcastable>` to the same shape. +Advanced indexes always are :ref:`broadcast<ufuncs.broadcasting>` and +iterated as *one*:: -- The shape of the output (or the needed shape of the object to be used - for setting) is the broadcasted shape. + result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M], + ..., ind_N[i_1, ..., i_M]] -- After expanding any ellipses and filling out any missing ``:`` - objects in the selection tuple, then let :math:`N_t` be the number - of indexing arrays, and let :math:`N_s = N - N_t` be the number of - slice objects. Note that :math:`N_t > 0` (or we wouldn't be doing - advanced integer indexing). +Note that the result shape is identical to the (broadcast) indexing array +shapes ``ind_1, ..., ind_N``. -- If :math:`N_s = 0` then the *M*-dimensional result is constructed by - varying the index tuple ``(i_1, ..., i_M)`` over the range - of the result shape and for each value of the index tuple - ``(ind_1, ..., ind_M)``:: +.. admonition:: Example - result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M], - ..., ind_N[i_1, ..., i_M]] + From each row, a specific element should be selected. The row index is just + ``[0, 1, 2]`` and the column index specifies the element to choose for the + corresponding row, here ``[0, 1, 0]``. Using both together the task + can be solved using advanced indexing: - .. admonition:: Example + >>> x = np.array([[1, 2], [3, 4], [5, 6]]) + >>> x[[0, 1, 2], [0, 1, 0]] + array([1, 4, 5]) - Suppose the shape of the broadcasted indexing arrays is 3-dimensional - and *N* is 2. Then the result is found by letting *i, j, k* run over - the shape found by broadcasting ``ind_1`` and ``ind_2``, and each - *i, j, k* yields:: - - result[i,j,k] = x[ind_1[i,j,k], ind_2[i,j,k]] - -- If :math:`N_s > 0`, then partial indexing is done. This can be - somewhat mind-boggling to understand, but if you think in terms of - the shapes of the arrays involved, it can be easier to grasp what - happens. In simple cases (*i.e.* one indexing array and *N - 1* slice - objects) it does exactly what you would expect (concatenation of - repeated application of basic slicing). The rule for partial - indexing is that the shape of the result (or the interpreted shape - of the object to be used in setting) is the shape of *x* with the - indexed subspace replaced with the broadcasted indexing subspace. If - the index subspaces are right next to each other, then the - broadcasted indexing space directly replaces all of the indexed - subspaces in *x*. If the indexing subspaces are separated (by slice - objects), then the broadcasted indexing space is first, followed by - the sliced subspace of *x*. +To achieve a behaviour similar to the basic slicing above, broadcasting can be +used. The function :func:`ix_` can help with this broadcasting. This is best +understood with an example. - .. admonition:: Example +.. admonition:: Example - Suppose ``x.shape`` is (10,20,30) and ``ind`` is a (2,3,4)-shaped - indexing :class:`intp` array, then ``result = x[...,ind,:]`` has - shape (10,2,3,4,30) because the (20,)-shaped subspace has been - replaced with a (2,3,4)-shaped broadcasted indexing subspace. If - we let *i, j, k* loop over the (2,3,4)-shaped subspace then - ``result[...,i,j,k,:] = x[...,ind[i,j,k],:]``. This example - produces the same result as :meth:`x.take(ind, axis=-2) <ndarray.take>`. + From a 4x3 array the corner elements should be selected using advanced + indexing. Thus all elements for which the column is one of ``[0, 2]`` and + the row is one of ``[0, 3]`` need to be selected. To use advanced indexing + one needs to select all elements *explicitly*. Using the method explained + previously one could write: + + >>> x = array([[ 0, 1, 2], + ... [ 3, 4, 5], + ... [ 6, 7, 8], + ... [ 9, 10, 11]]) + >>> rows = np.array([[0, 0], + ... [3, 3]], dtype=np.intp) + >>> columns = np.array([[0, 2], + ... [0, 2]], dtype=np.intp) + >>> x[rows, columns] + array([[ 0, 2], + [ 9, 11]]) + + However, since the indexing arrays above just repeat themselves, + broadcasting can be used (compare operations such as + ``rows[:, np.newaxis] + columns``) to simplify this: + + >>> rows = np.array([0, 3], dtype=np.intp) + >>> columns = np.array([0, 2], dtype=np.intp) + >>> rows[:, np.newaxis] + array([[0], + [3]]) + >>> x[rows[:, np.newaxis], columns] + array([[ 0, 2], + [ 9, 11]]) + + This broadcasting can also be achieved using the function :func:`ix_`: + + >>> x[np.ix_(rows, columns)] + array([[ 0, 2], + [ 9, 11]]) + + Note that without the ``np.ix_`` call, only the diagonal elements would + be selected, as was used in the previous example. This difference is the + most important thing to remember about indexing with multiple advanced + indexes. + +Combining advanced and basic indexing +""""""""""""""""""""""""""""""""""""" + +When there is at least one slice (``:``), ellipsis (``...``) or ``np.newaxis`` +in the index (or the array has more dimensions than there are advanced indexes), +then the behaviour can be more complicated. It is like concatenating the +indexing result for each advanced index element + +In the simplest case, there is only a *single* advanced index. A single +advanced index can for example replace a slice and the result array will be +the same, however, it is a copy and may have a different memory layout. +A slice is preferable when it is possible. - .. admonition:: Example +.. admonition:: Example + + >>> x[1:2, 1:3] + array([[4, 5]]) + >>> x[1:2, [1, 2]] + array([[4, 5]]) + +The easiest way to understand the situation may be to think in +terms of the result shape. There are two parts to the indexing operation, +the subspace defined by the basic indexing (excluding integers) and the +subspace from the advanced indexing part. Two cases of index combination +need to be distinguished: + +* The advanced indexes are separated by a slice, ellipsis or newaxis. + For example ``x[arr1, :, arr2]``. +* The advanced indexes are all next to each other. + For example ``x[..., arr1, arr2, :]`` but *not* ``x[arr1, :, 1]`` + since ``1`` is an advanced index in this regard. + +In the first case, the dimensions resulting from the advanced indexing +operation come first in the result array, and the subspace dimensions after +that. +In the second case, the dimensions from the advanced indexing operations +are inserted into the result array at the same spot as they were in the +initial array (the latter logic is what makes simple advanced indexing +behave just like slicing). + +.. admonition:: Example + + Suppose ``x.shape`` is (10,20,30) and ``ind`` is a (2,3,4)-shaped + indexing :class:`intp` array, then ``result = x[...,ind,:]`` has + shape (10,2,3,4,30) because the (20,)-shaped subspace has been + replaced with a (2,3,4)-shaped broadcasted indexing subspace. If + we let *i, j, k* loop over the (2,3,4)-shaped subspace then + ``result[...,i,j,k,:] = x[...,ind[i,j,k],:]``. This example + produces the same result as :meth:`x.take(ind, axis=-2) <ndarray.take>`. + +.. admonition:: Example - Now let ``x.shape`` be (10,20,30,40,50) and suppose ``ind_1`` - and ``ind_2`` are broadcastable to the shape (2,3,4). Then - ``x[:,ind_1,ind_2]`` has shape (10,2,3,4,40,50) because the - (20,30)-shaped subspace from X has been replaced with the - (2,3,4) subspace from the indices. However, - ``x[:,ind_1,:,ind_2]`` has shape (2,3,4,10,30,50) because there - is no unambiguous place to drop in the indexing subspace, thus - it is tacked-on to the beginning. It is always possible to use - :meth:`.transpose() <ndarray.transpose>` to move the subspace - anywhere desired. (Note that this example cannot be replicated - using :func:`take`.) + Let ``x.shape`` be (10,20,30,40,50) and suppose ``ind_1`` + and ``ind_2`` can be broadcast to the shape (2,3,4). Then + ``x[:,ind_1,ind_2]`` has shape (10,2,3,4,40,50) because the + (20,30)-shaped subspace from X has been replaced with the + (2,3,4) subspace from the indices. However, + ``x[:,ind_1,:,ind_2]`` has shape (2,3,4,10,30,50) because there + is no unambiguous place to drop in the indexing subspace, thus + it is tacked-on to the beginning. It is always possible to use + :meth:`.transpose() <ndarray.transpose>` to move the subspace + anywhere desired. Note that this example cannot be replicated + using :func:`take`. -Boolean -^^^^^^^ +Boolean array indexing +^^^^^^^^^^^^^^^^^^^^^^ This advanced indexing occurs when obj is an array object of Boolean -type (such as may be returned from comparison operators). It is always -equivalent to (but faster than) ``x[obj.nonzero()]`` where, as -described above, :meth:`obj.nonzero() <ndarray.nonzero>` returns a +type, such as may be returned from comparison operators. A single +boolean index array is practically identical to ``x[obj.nonzero()]`` where, +as described above, :meth:`obj.nonzero() <ndarray.nonzero>` returns a tuple (of length :attr:`obj.ndim <ndarray.ndim>`) of integer index -arrays showing the :const:`True` elements of *obj*. +arrays showing the :const:`True` elements of *obj*. However, it is +faster when ``obj.shape == x.shape``. -The special case when ``obj.ndim == x.ndim`` is worth mentioning. In -this case ``x[obj]`` returns a 1-dimensional array filled with the -elements of *x* corresponding to the :const:`True` values of *obj*. +If ``obj.ndim == x.ndim``, ``x[obj]`` returns a 1-dimensional array +filled with the elements of *x* corresponding to the :const:`True` +values of *obj*. The search order will be C-style (last index varies the fastest). If *obj* has :const:`True` values at entries that are outside of the -bounds of *x*, then an index error will be raised. +bounds of *x*, then an index error will be raised. If *obj* is smaller +than *x* it is identical to filling it with :const:`False`. -You can also use Boolean arrays as element of the selection tuple. In -such instances, they will always be interpreted as :meth:`nonzero(obj) -<ndarray.nonzero>` and the equivalent integer indexing will be -done. +.. admonition:: Example -.. warning:: + A common use case for this is filtering for desired element values. + For example one may wish to select all entries from an array which + are not NaN: - The definition of advanced indexing means that ``x[(1,2,3),]`` is - fundamentally different than ``x[(1,2,3)]``. The latter is - equivalent to ``x[1,2,3]`` which will trigger basic selection while - the former will trigger advanced indexing. Be sure to understand - why this is occurs. + >>> x = np.array([[1., 2.], [np.nan, 3.], [np.nan, np.nan]]) + >>> x[~np.isnan(x)] + array([ 1., 2., 3.]) - Also recognize that ``x[[1,2,3]]`` will trigger advanced indexing, - whereas ``x[[1,2,slice(None)]]`` will trigger basic slicing. + Or wish to add a constant to all negative elements: + + >>> x = np.array([1., -1., -2., 3]) + >>> x[x < 0] += 20 + >>> x + array([ 1., 19., 18., 3.]) + +In general if an index includes a Boolean array, the result will be +identical to inserting ``obj.nonzero()`` into the same position +and using the integer array indexing mechanism described above. +``x[ind_1, boolean_array, ind_2]`` is equivalent to +``x[(ind_1,) + boolean_array.nonzero() + (ind_2,)]``. + +If there is only one Boolean array and no integer indexing array present, +this is straight forward. Care must only be taken to make sure that the +boolean index has *exactly* as many dimensions as it is supposed to work +with. + +.. admonition:: Example + + From an array, select all rows which sum up to less or equal two: + + >>> x = np.array([[0, 1], [1, 1], [2, 2]]) + >>> rowsum = x.sum(-1) + >>> x[rowsum <= 2, :] + array([[0, 1], + [1, 1]]) + + But if ``rowsum`` would have two dimensions as well: + + >>> rowsum = x.sum(-1, keepdims=True) + >>> rowsum.shape + (3, 1) + >>> x[rowsum <= 2, :] # fails + IndexError: too many indices + >>> x[rowsum <= 2] + array([0, 1]) + + The last one giving only the first elements because of the extra dimension. + Compare ``rowsum.nonzero()`` to understand this example. + +Combining multiple Boolean indexing arrays or a Boolean with an integer +indexing array can best be understood with the +:meth:`obj.nonzero() <ndarray.nonzero>` analogy. The function :func:`ix_` +also supports boolean arrays and will work without any surprises. + +.. admonition:: Example + + Use boolean indexing to select all rows adding up to an even + number. At the same time columns 0 and 2 should be selected with an + advanced integer index. Using the :func:`ix_` function this can be done + with: + + >>> x = array([[ 0, 1, 2], + ... [ 3, 4, 5], + ... [ 6, 7, 8], + ... [ 9, 10, 11]]) + >>> rows = (x.sum(-1) % 2) == 0 + >>> rows + array([False, True, False, True], dtype=bool) + >>> columns = [0, 2] + >>> x[np.ix_(rows, columns)] + array([[ 3, 5], + [ 9, 11]]) + + Without the ``np.ix_`` call or only the diagonal elements would be + selected. + + Or without ``np.ix_`` (compare the integer array examples): + + >>> rows = rows.nonzero()[0] + >>> x[rows[:, np.newaxis], columns] + array([[ 3, 5], + [ 9, 11]]) + +Detailed notes +-------------- + +These are some detailed notes, which are not of importance for day to day +indexing (in no particular order): + +* The native NumPy indexing type is ``intp`` and may differ from the + default integer array type. ``intp`` is the smallest data type + sufficient to safely index any array; for advanced indexing it may be + faster than other types. +* For advanced assignments, there is in general no guarantee for the + iteration order. This means that if an element is set more than once, + it is not possible to predict the final result. +* An empty (tuple) index is a full scalar index into a zero dimensional array. + ``x[()]`` returns a *scalar* if ``x`` is zero dimensional and a view + otherwise. On the other hand ``x[...]`` always returns a view. +* If a zero dimensional array is present in the index *and* it is a full + integer index the result will be a *scalar* and not a zero dimensional array. + (Advanced indexing is not triggered.) +* When an ellipsis (``...``) is present but has no size (i.e. replaces zero + ``:``) the result will still always be an array. A view if no advanced index + is present, otherwise a copy. +* the ``nonzero`` equivalence for Boolean arrays does not hold for zero + dimensional boolean arrays. +* When the result of an advanced indexing operation has no elements but an + individual index is out of bounds, whether or not an ``IndexError`` is + raised is undefined (e.g. ``x[[], [123]]`` with ``123`` being out of bounds). +* When a *casting* error occurs during assignment (for example updating a + numerical array using a sequence of strings), the array being assigned + to may end up in an unpredictable partially updated state. + However, if any other error (such as an out of bounds index) occurs, the + array will remain unchanged. +* The memory layout of an advanced indexing result is optimized for each + indexing operation and no particular memory order can be assumed. +* When using a subclass (especially one which manipulates its shape), the + default ``ndarray.__setitem__`` behaviour will call ``__getitem__`` for + *basic* indexing but not for *advanced* indexing. For such a subclass it may + be preferable to call ``ndarray.__setitem__`` with a *base class* ndarray + view on the data. This *must* be done if the subclasses ``__getitem__`` does + not return views. .. _arrays.indexing.rec: + Record Access ------------- |