author    Sebastian Berg <sebastian@sipsolutions.net>    2013-11-24 22:35:34 +0100
committer Sebastian Berg <sebastian@sipsolutions.net>    2014-05-29 14:09:37 +0200
commit    f731d17e57fb36523ede1265b2f48c0d33c3254e
tree      0b24d7ad41b7d30f683c82d2be57a06a29d21d39
parent    f57c77b88a735d5f49a407881777ff2e9f3b1be2
DOC: Rework the advanced indexing documentation.
Mostly makes the advanced indexing doc much more example based, and prominently mentions the np.ix_ function. Some subtleties (some of which are new) are also mentioned.
-rw-r--r-- doc/source/reference/arrays.indexing.rst 386
1 files changed, 279 insertions, 107 deletions
diff --git a/doc/source/reference/arrays.indexing.rst b/doc/source/reference/arrays.indexing.rst
index e759b6ff8..d7715b0f1 100644
--- a/doc/source/reference/arrays.indexing.rst
+++ b/doc/source/reference/arrays.indexing.rst
@@ -21,8 +21,8 @@ slicing, advanced indexing. Which one occurs depends on *obj*.
for the former.
-Basic Slicing
--------------
+Basic Slicing and Indexing
+--------------------------
Basic slicing extends Python's basic concept of slicing to N
dimensions. Basic slicing occurs when *obj* is a :class:`slice` object
@@ -108,8 +108,8 @@ concepts to remember include:
[6]]])
- :const:`Ellipsis` expands to the number of ``:`` objects needed to
- make a selection tuple of the same length as ``x.ndim``. Only the
- first ellipsis is expanded, any others are interpreted as ``:``.
+ make a selection tuple of the same length as ``x.ndim``. There may
+ only be a single ellipsis present.
.. admonition:: Example
@@ -148,7 +148,7 @@ concepts to remember include:
``x[ind1,...,ind2,:]`` acts like ``x[ind1][...,ind2,:]`` under basic
slicing.
- .. warning:: The above is **not** true for advanced slicing.
+ .. warning:: The above is **not** true for advanced indexing.
- You may use slicing to set values in the array, but (unlike lists) you
can never grow the array. The size of the value to be set in
@@ -175,7 +175,7 @@ concepts to remember include:
:const:`newaxis`.
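A minimal sketch of two points from the list above, assuming a recent NumPy release (the array and index values are arbitrary). A second ellipsis is rejected. The ``x[ind1, ..., ind2]`` versus ``x[ind1][..., ind2]`` equivalence also breaks down once ``ind2`` is an integer list: the scalar ``0`` and the list then act together as advanced indexes separated by the ellipsis, so their broadcast dimension moves to the front.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Only a single ellipsis is allowed in an index.
try:
    x[..., 0, ...]
    double_ellipsis_ok = True
except IndexError:
    double_ellipsis_ok = False

# Basic indexing: x[0, ..., 1] behaves like x[0][..., 1].
basic_combined = x[0, ..., 1]
basic_chained = x[0][..., 1]

# Advanced indexing: 0 and [0, 1] are both advanced indexes, separated by
# the ellipsis, so their broadcast shape (2,) comes first in the result.
adv_combined = x[0, ..., [0, 1]]   # shape (2, 3)
adv_chained = x[0][..., [0, 1]]    # shape (3, 2)

print(double_ellipsis_ok)                             # False
print(np.array_equal(basic_combined, basic_chained))  # True
print(adv_combined.shape, adv_chained.shape)          # (2, 3) (3, 2)
```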
-Advanced indexing
+Advanced Indexing
-----------------
Advanced indexing is triggered when the selection object, *obj*, is a
@@ -187,139 +187,311 @@ and Boolean.
Advanced indexing always returns a *copy* of the data (contrast with
basic slicing that returns a :term:`view`).
-Integer
-^^^^^^^
+.. warning::
-Integer indexing allows selection of arbitrary items in the array
-based on their *N*-dimensional index. This kind of selection occurs
-when advanced indexing is triggered and the selection object is not
-an array of data type bool. For the discussion below, when the
-selection object is not a tuple, it will be referred to as if it had
-been promoted to a 1-tuple, which will be called the selection
-tuple. The rules of advanced integer-style indexing are:
+ The definition of advanced indexing means that ``x[(1,2,3),]`` is
+ fundamentally different than ``x[(1,2,3)]``. The latter is
+ equivalent to ``x[1,2,3]`` which will trigger basic selection while
+ the former will trigger advanced indexing. Be sure to understand
+   why this occurs.
-- If the length of the selection tuple is larger than *N* an error is raised.
+ Also recognize that ``x[[1,2,3]]`` will trigger advanced indexing,
+ whereas ``x[[1,2,slice(None)]]`` will trigger basic slicing.
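The difference between a tuple index and a 1-tuple containing a sequence can be checked directly. This is a small sketch on an arbitrary 4x4x4 array; only standard NumPy indexing is used.

```python
import numpy as np

x = np.arange(64).reshape(4, 4, 4)

# The tuple *is* the index: three integers, i.e. basic indexing.
scalar = x[(1, 2, 3)]            # same as x[1, 2, 3]

# A 1-tuple whose single entry is a sequence: advanced indexing on axis 0.
rows_from_tuple = x[(1, 2, 3),]
# A list is likewise treated as an integer index array.
rows_from_list = x[[1, 2, 3]]

print(scalar)                                           # 27
print(rows_from_tuple.shape)                            # (3, 4, 4)
print(np.array_equal(rows_from_tuple, rows_from_list))  # True
```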
-- All sequences and scalars in the selection tuple are converted to
- :class:`intp` indexing arrays.
+Integer array indexing
+^^^^^^^^^^^^^^^^^^^^^^
-- All selection tuple objects must be convertible to :class:`intp`
- arrays, :class:`slice` objects, or the :const:`Ellipsis` object.
+Integer array indexing allows selection of arbitrary items in the array
+based on their *N*-dimensional index. Each integer array represents a number
+of indexes into that dimension.
-- The first :const:`Ellipsis` object will be expanded, and any other
- :const:`Ellipsis` objects will be treated as full slice (``:``)
- objects. The expanded :const:`Ellipsis` object is replaced with as
- many full slice (``:``) objects as needed to make the length of the
- selection tuple :math:`N`.
+Purely integer array indexing
+"""""""""""""""""""""""""""""
-- If the selection tuple is smaller than *N*, then as many ``:``
- objects as needed are added to the end of the selection tuple so
- that the modified selection tuple has length *N*.
+When the index consists of as many integer arrays as the array being indexed
+has dimensions, the indexing is straightforward, but different from slicing.
-- All the integer indexing arrays must be :ref:`broadcastable
- <arrays.broadcasting.broadcastable>` to the same shape.
+Advanced indexes are always :ref:`broadcast<ufuncs.broadcasting>` and
+iterated as *one*::
-- The shape of the output (or the needed shape of the object to be used
- for setting) is the broadcasted shape.
+ result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M],
+ ..., ind_N[i_1, ..., i_M]]
-- After expanding any ellipses and filling out any missing ``:``
- objects in the selection tuple, then let :math:`N_t` be the number
- of indexing arrays, and let :math:`N_s = N - N_t` be the number of
- slice objects. Note that :math:`N_t > 0` (or we wouldn't be doing
- advanced integer indexing).
+Note that the result shape is identical to the (broadcast) indexing array
+shapes ``ind_1, ..., ind_N``.
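The formula above can be checked with an explicit loop. In this sketch the index arrays ``ind_1`` and ``ind_2`` are arbitrary, and ``ind_2`` is deliberately given a different shape so that broadcasting has to happen before the element-wise lookup.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)
ind_1 = np.array([[0, 3], [1, 2]])   # row indexes, shape (2, 2)
ind_2 = np.array([2, 0])             # column indexes, broadcast to (2, 2)

result = x[ind_1, ind_2]

# Rebuild the result element by element over the broadcast index shape.
b1, b2 = np.broadcast_arrays(ind_1, ind_2)
expected = np.empty(b1.shape, dtype=x.dtype)
for idx in np.ndindex(b1.shape):
    expected[idx] = x[b1[idx], b2[idx]]

print(result.shape)                     # (2, 2)
print(result)                           # [[2 9]
                                        #  [5 6]]
print(np.array_equal(result, expected)) # True
```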
-- If :math:`N_s = 0` then the *M*-dimensional result is constructed by
- varying the index tuple ``(i_1, ..., i_M)`` over the range
- of the result shape and for each value of the index tuple
- ``(ind_1, ..., ind_M)``::
+.. admonition:: Example
- result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M],
- ..., ind_N[i_1, ..., i_M]]
+ From each row, a specific element should be selected. The row index is just
+ ``[0, 1, 2]`` and the column index specifies the element to choose for the
+ corresponding row, here ``[0, 1, 0]``. Using both together the task
+ can be solved using advanced indexing:
- .. admonition:: Example
+ >>> x = np.array([[1, 2], [3, 4], [5, 6]])
+ >>> x[[0, 1, 2], [0, 1, 0]]
+ array([1, 4, 5])
- Suppose the shape of the broadcasted indexing arrays is 3-dimensional
- and *N* is 2. Then the result is found by letting *i, j, k* run over
- the shape found by broadcasting ``ind_1`` and ``ind_2``, and each
- *i, j, k* yields::
-
- result[i,j,k] = x[ind_1[i,j,k], ind_2[i,j,k]]
-
-- If :math:`N_s > 0`, then partial indexing is done. This can be
- somewhat mind-boggling to understand, but if you think in terms of
- the shapes of the arrays involved, it can be easier to grasp what
- happens. In simple cases (*i.e.* one indexing array and *N - 1* slice
- objects) it does exactly what you would expect (concatenation of
- repeated application of basic slicing). The rule for partial
- indexing is that the shape of the result (or the interpreted shape
- of the object to be used in setting) is the shape of *x* with the
- indexed subspace replaced with the broadcasted indexing subspace. If
- the index subspaces are right next to each other, then the
- broadcasted indexing space directly replaces all of the indexed
- subspaces in *x*. If the indexing subspaces are separated (by slice
- objects), then the broadcasted indexing space is first, followed by
- the sliced subspace of *x*.
+To achieve a behaviour similar to the basic slicing above, broadcasting can be
+used. The function :func:`ix_` can help with this broadcasting. This is best
+understood with an example.
- .. admonition:: Example
+.. admonition:: Example
- Suppose ``x.shape`` is (10,20,30) and ``ind`` is a (2,3,4)-shaped
- indexing :class:`intp` array, then ``result = x[...,ind,:]`` has
- shape (10,2,3,4,30) because the (20,)-shaped subspace has been
- replaced with a (2,3,4)-shaped broadcasted indexing subspace. If
- we let *i, j, k* loop over the (2,3,4)-shaped subspace then
- ``result[...,i,j,k,:] = x[...,ind[i,j,k],:]``. This example
- produces the same result as :meth:`x.take(ind, axis=-2) <ndarray.take>`.
+ From a 4x3 array the corner elements should be selected using advanced
+ indexing. Thus all elements for which the column is one of ``[0, 2]`` and
+ the row is one of ``[0, 3]`` need to be selected. To use advanced indexing
+ one needs to select all elements *explicitly*. Using the method explained
+ previously one could write:
+
+    >>> x = np.array([[ 0,  1,  2],
+    ...               [ 3,  4,  5],
+    ...               [ 6,  7,  8],
+    ...               [ 9, 10, 11]])
+ >>> rows = np.array([[0, 0],
+ ... [3, 3]], dtype=np.intp)
+ >>> columns = np.array([[0, 2],
+ ... [0, 2]], dtype=np.intp)
+ >>> x[rows, columns]
+ array([[ 0, 2],
+ [ 9, 11]])
+
+ However, since the indexing arrays above just repeat themselves,
+ broadcasting can be used (compare operations such as
+ ``rows[:, np.newaxis] + columns``) to simplify this:
+
+ >>> rows = np.array([0, 3], dtype=np.intp)
+ >>> columns = np.array([0, 2], dtype=np.intp)
+ >>> rows[:, np.newaxis]
+ array([[0],
+ [3]])
+ >>> x[rows[:, np.newaxis], columns]
+ array([[ 0, 2],
+ [ 9, 11]])
+
+ This broadcasting can also be achieved using the function :func:`ix_`:
+
+ >>> x[np.ix_(rows, columns)]
+ array([[ 0, 2],
+ [ 9, 11]])
+
+ Note that without the ``np.ix_`` call, only the diagonal elements would
+ be selected, as was used in the previous example. This difference is the
+ most important thing to remember about indexing with multiple advanced
+ indexes.
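A short sketch of what :func:`ix_` does with the ``rows`` and ``columns`` arrays from the corner example: it returns index arrays shaped so that they broadcast against each other into a grid, which is the same as the manual ``rows[:, np.newaxis]`` trick.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)
rows = np.array([0, 3], dtype=np.intp)
columns = np.array([0, 2], dtype=np.intp)

r, c = np.ix_(rows, columns)
print(r.shape, c.shape)   # (2, 1) (1, 2)

# Broadcasting (2, 1) against (1, 2) selects the (2, 2) grid of corners ...
corners = x[r, c]
# ... whereas the two 1-d arrays pair up element-wise: x[0, 0] and x[3, 2].
diagonal = x[rows, columns]

print(corners)    # [[ 0  2]
                  #  [ 9 11]]
print(diagonal)   # [ 0 11]
```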
+
+Combining advanced and basic indexing
+"""""""""""""""""""""""""""""""""""""
+
+When there is at least one slice (``:``), ellipsis (``...``) or ``np.newaxis``
+in the index (or the array has more dimensions than there are advanced indexes),
+then the behaviour can be more complicated. It is like concatenating the
+indexing result for each advanced index element.
+
+In the simplest case, there is only a *single* advanced index. A single
+advanced index can, for example, replace a slice and the result array will be
+the same. However, it is a copy and may have a different memory layout.
+A slice is preferable when it is possible.
- .. admonition:: Example
+.. admonition:: Example
+
+ >>> x[1:2, 1:3]
+ array([[4, 5]])
+ >>> x[1:2, [1, 2]]
+ array([[4, 5]])
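Whether a result is a view or a copy can be checked with ``np.shares_memory`` (available in NumPy 1.11 and later). This sketch reuses the 4x3 array from the examples above.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

view = x[1:2, 1:3]        # basic slicing
copy = x[1:2, [1, 2]]     # a single advanced index

print(np.array_equal(view, copy))   # True: same values
print(np.shares_memory(view, x))    # True: a view into x
print(np.shares_memory(copy, x))    # False: a new array

# Writing through the view changes x; writing into the copy does not.
view[0, 0] = -1
copy[0, 1] = -2
print(x[1])   # [ 3 -1  5]
```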
+
+The easiest way to understand the situation may be to think in
+terms of the result shape. There are two parts to the indexing operation,
+the subspace defined by the basic indexing (excluding integers) and the
+subspace from the advanced indexing part. Two cases of index combination
+need to be distinguished:
+
+* The advanced indexes are separated by a slice, ellipsis or newaxis.
+ For example ``x[arr1, :, arr2]``.
+* The advanced indexes are all next to each other.
+ For example ``x[..., arr1, arr2, :]`` but *not* ``x[arr1, :, 1]``
+ since ``1`` is an advanced index in this regard.
+
+In the first case, the dimensions resulting from the advanced indexing
+operation come first in the result array, and the subspace dimensions after
+that.
+In the second case, the dimensions from the advanced indexing operations
+are inserted into the result array at the same spot as they were in the
+initial array (the latter logic is what makes simple advanced indexing
+behave just like slicing).
+
+.. admonition:: Example
+
+ Suppose ``x.shape`` is (10,20,30) and ``ind`` is a (2,3,4)-shaped
+ indexing :class:`intp` array, then ``result = x[...,ind,:]`` has
+ shape (10,2,3,4,30) because the (20,)-shaped subspace has been
+ replaced with a (2,3,4)-shaped broadcasted indexing subspace. If
+ we let *i, j, k* loop over the (2,3,4)-shaped subspace then
+ ``result[...,i,j,k,:] = x[...,ind[i,j,k],:]``. This example
+ produces the same result as :meth:`x.take(ind, axis=-2) <ndarray.take>`.
+
+.. admonition:: Example
- Now let ``x.shape`` be (10,20,30,40,50) and suppose ``ind_1``
- and ``ind_2`` are broadcastable to the shape (2,3,4). Then
- ``x[:,ind_1,ind_2]`` has shape (10,2,3,4,40,50) because the
- (20,30)-shaped subspace from X has been replaced with the
- (2,3,4) subspace from the indices. However,
- ``x[:,ind_1,:,ind_2]`` has shape (2,3,4,10,30,50) because there
- is no unambiguous place to drop in the indexing subspace, thus
- it is tacked-on to the beginning. It is always possible to use
- :meth:`.transpose() <ndarray.transpose>` to move the subspace
- anywhere desired. (Note that this example cannot be replicated
- using :func:`take`.)
+ Let ``x.shape`` be (10,20,30,40,50) and suppose ``ind_1``
+ and ``ind_2`` can be broadcast to the shape (2,3,4). Then
+ ``x[:,ind_1,ind_2]`` has shape (10,2,3,4,40,50) because the
+    (20,30)-shaped subspace from *x* has been replaced with the
+ (2,3,4) subspace from the indices. However,
+ ``x[:,ind_1,:,ind_2]`` has shape (2,3,4,10,30,50) because there
+ is no unambiguous place to drop in the indexing subspace, thus
+ it is tacked-on to the beginning. It is always possible to use
+ :meth:`.transpose() <ndarray.transpose>` to move the subspace
+ anywhere desired. Note that this example cannot be replicated
+ using :func:`take`.
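The shapes claimed in the two examples above can be verified directly. This sketch assumes arbitrary in-range index values (only their shapes matter). It uses ``int8`` zeros for the five-dimensional array so that the array stays around 12 MB.

```python
import numpy as np

# (10, 20, 30) case: the (20,) axis is replaced by the (2, 3, 4) index shape.
x = np.arange(10 * 20 * 30).reshape(10, 20, 30)
ind = np.arange(24).reshape(2, 3, 4) % 20      # valid indexes into axis 1
result = x[..., ind, :]
print(result.shape)                                  # (10, 2, 3, 4, 30)
print(np.array_equal(result, x.take(ind, axis=-2)))  # True

# (10, 20, 30, 40, 50) case.
y = np.zeros((10, 20, 30, 40, 50), dtype=np.int8)
ind_1 = np.arange(24).reshape(2, 3, 4) % 20    # shape (2, 3, 4)
ind_2 = np.arange(3).reshape(3, 1)             # broadcasts to (2, 3, 4)

adjacent = y[:, ind_1, ind_2]      # advanced indexes next to each other
separated = y[:, ind_1, :, ind_2]  # advanced indexes split by a slice
print(adjacent.shape)    # (10, 2, 3, 4, 40, 50)
print(separated.shape)   # (2, 3, 4, 10, 30, 50)
```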
-Boolean
-^^^^^^^
+Boolean array indexing
+^^^^^^^^^^^^^^^^^^^^^^
This advanced indexing occurs when obj is an array object of Boolean
-type (such as may be returned from comparison operators). It is always
-equivalent to (but faster than) ``x[obj.nonzero()]`` where, as
-described above, :meth:`obj.nonzero() <ndarray.nonzero>` returns a
+type, such as may be returned from comparison operators. A single
+boolean index array is practically identical to ``x[obj.nonzero()]`` where,
+as described above, :meth:`obj.nonzero() <ndarray.nonzero>` returns a
tuple (of length :attr:`obj.ndim <ndarray.ndim>`) of integer index
-arrays showing the :const:`True` elements of *obj*.
+arrays showing the :const:`True` elements of *obj*. However, it is
+faster when ``obj.shape == x.shape``.
-The special case when ``obj.ndim == x.ndim`` is worth mentioning. In
-this case ``x[obj]`` returns a 1-dimensional array filled with the
-elements of *x* corresponding to the :const:`True` values of *obj*.
+If ``obj.ndim == x.ndim``, ``x[obj]`` returns a 1-dimensional array
+filled with the elements of *x* corresponding to the :const:`True`
+values of *obj*.
The search order will be C-style (last index varies the fastest). If
*obj* has :const:`True` values at entries that are outside of the
-bounds of *x*, then an index error will be raised.
+bounds of *x*, then an index error will be raised. If *obj* is smaller
+than *x* it is identical to filling it with :const:`False`.
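A small sketch of the ``nonzero`` equivalence and the C-style search order, on an arbitrary 2x3 array. It also covers a mask with fewer dimensions than *x*, which selects whole rows.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
mask = np.array([[True, False, True],
                 [False, True, True]])

picked = x[mask]
print(picked)                 # [0 2 4 5]: row by row (C order)
print(mask.nonzero())         # row and column indexes of the True entries
print(np.array_equal(picked, x[mask.nonzero()]))   # True

# A 1-d mask on a 2-d array selects along the first axis only.
row_mask = np.array([False, True])
print(x[row_mask])            # [[3 4 5]]
print(np.array_equal(x[row_mask], x[row_mask.nonzero()]))  # True
```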
-You can also use Boolean arrays as element of the selection tuple. In
-such instances, they will always be interpreted as :meth:`nonzero(obj)
-<ndarray.nonzero>` and the equivalent integer indexing will be
-done.
+.. admonition:: Example
-.. warning::
+ A common use case for this is filtering for desired element values.
+ For example one may wish to select all entries from an array which
+ are not NaN:
- The definition of advanced indexing means that ``x[(1,2,3),]`` is
- fundamentally different than ``x[(1,2,3)]``. The latter is
- equivalent to ``x[1,2,3]`` which will trigger basic selection while
- the former will trigger advanced indexing. Be sure to understand
- why this is occurs.
+ >>> x = np.array([[1., 2.], [np.nan, 3.], [np.nan, np.nan]])
+ >>> x[~np.isnan(x)]
+ array([ 1., 2., 3.])
- Also recognize that ``x[[1,2,3]]`` will trigger advanced indexing,
- whereas ``x[[1,2,slice(None)]]`` will trigger basic slicing.
+ Or wish to add a constant to all negative elements:
+
+ >>> x = np.array([1., -1., -2., 3])
+ >>> x[x < 0] += 20
+ >>> x
+ array([ 1., 19., 18., 3.])
+
+In general, if an index includes a Boolean array, the result will be
+identical to inserting ``obj.nonzero()`` into the same position
+and using the integer array indexing mechanism described above.
+``x[ind_1, boolean_array, ind_2]`` is equivalent to
+``x[(ind_1,) + boolean_array.nonzero() + (ind_2,)]``.
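The equivalence can be checked on a four-dimensional array. In this sketch the Boolean array ``b`` covers axes 1 and 2 and has three ``True`` entries. The integer arrays ``ind_1`` and ``ind_2`` are arbitrary, chosen with length three so that they broadcast against the arrays returned by ``b.nonzero()``.

```python
import numpy as np

x = np.arange(120).reshape(2, 3, 4, 5)

b = np.zeros((3, 4), dtype=bool)    # covers axes 1 and 2
b[0, 1] = b[1, 2] = b[2, 3] = True

ind_1 = np.array([0, 1, 0])         # indexes into axis 0
ind_2 = np.array([4, 3, 2])         # indexes into axis 3

result = x[ind_1, b, ind_2]
expected = x[(ind_1,) + b.nonzero() + (ind_2,)]

print(result)                            # [ 9 93 57]
print(np.array_equal(result, expected))  # True
```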
+
+If there is only one Boolean array and no integer indexing array present,
+this is straightforward. Care must only be taken to make sure that the
+boolean index has *exactly* as many dimensions as it is supposed to work
+with.
+
+.. admonition:: Example
+
+    From an array, select all rows which sum up to two or less:
+
+ >>> x = np.array([[0, 1], [1, 1], [2, 2]])
+ >>> rowsum = x.sum(-1)
+ >>> x[rowsum <= 2, :]
+ array([[0, 1],
+ [1, 1]])
+
+    But if ``rowsum`` had two dimensions as well:
+
+ >>> rowsum = x.sum(-1, keepdims=True)
+ >>> rowsum.shape
+ (3, 1)
+ >>> x[rowsum <= 2, :] # fails
+ IndexError: too many indices
+ >>> x[rowsum <= 2]
+ array([0, 1])
+
+    The last one gives only the first elements because of the extra dimension.
+ Compare ``rowsum.nonzero()`` to understand this example.
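A sketch of the ``rowsum.nonzero()`` comparison suggested above. The direct ``x[rowsum <= 2]`` call is left out, since newer NumPy releases may reject the mismatched (3, 1) mask with an ``IndexError`` rather than return ``array([0, 1])``. The ``nonzero`` form reproduces the result described in the example.

```python
import numpy as np

x = np.array([[0, 1], [1, 1], [2, 2]])
mask = x.sum(-1, keepdims=True) <= 2     # shape (3, 1)

rows, cols = mask.nonzero()
print(rows, cols)          # [0 1] [0 0]
# The (3, 1) mask only "covers" column 0, so the nonzero form picks
# x[0, 0] and x[1, 0]: the first element of each selected row.
print(x[rows, cols])       # [0 1]

# Dropping the extra dimension gives the intended row selection.
print(x[mask[:, 0]])       # [[0 1]
                           #  [1 1]]
```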
+
+Combining multiple Boolean indexing arrays or a Boolean with an integer
+indexing array can best be understood with the
+:meth:`obj.nonzero() <ndarray.nonzero>` analogy. The function :func:`ix_`
+also supports boolean arrays and will work without any surprises.
+
+.. admonition:: Example
+
+ Use boolean indexing to select all rows adding up to an even
+ number. At the same time columns 0 and 2 should be selected with an
+ advanced integer index. Using the :func:`ix_` function this can be done
+ with:
+
+    >>> x = np.array([[ 0,  1,  2],
+    ...               [ 3,  4,  5],
+    ...               [ 6,  7,  8],
+    ...               [ 9, 10, 11]])
+ >>> rows = (x.sum(-1) % 2) == 0
+ >>> rows
+ array([False, True, False, True], dtype=bool)
+ >>> columns = [0, 2]
+ >>> x[np.ix_(rows, columns)]
+ array([[ 3, 5],
+ [ 9, 11]])
+
+    Without the ``np.ix_`` call, only the diagonal elements would be
+ selected.
+
+ Or without ``np.ix_`` (compare the integer array examples):
+
+ >>> rows = rows.nonzero()[0]
+ >>> x[rows[:, np.newaxis], columns]
+ array([[ 3, 5],
+ [ 9, 11]])
+
+Detailed notes
+--------------
+
+These are some detailed notes, which are not of importance for day-to-day
+indexing (in no particular order):
+
+* The native NumPy indexing type is ``intp`` and may differ from the
+ default integer array type. ``intp`` is the smallest data type
+ sufficient to safely index any array; for advanced indexing it may be
+ faster than other types.
+* For advanced assignments, there is in general no guarantee for the
+ iteration order. This means that if an element is set more than once,
+ it is not possible to predict the final result.
+* An empty (tuple) index is a full scalar index into a zero dimensional array.
+ ``x[()]`` returns a *scalar* if ``x`` is zero dimensional and a view
+ otherwise. On the other hand ``x[...]`` always returns a view.
+* If a zero dimensional array is present in the index *and* it is a full
+ integer index the result will be a *scalar* and not a zero dimensional array.
+ (Advanced indexing is not triggered.)
+* When an ellipsis (``...``) is present but has no size (i.e. replaces zero
+ ``:``) the result will still always be an array. A view if no advanced index
+ is present, otherwise a copy.
+* The ``nonzero`` equivalence for Boolean arrays does not hold for zero
+ dimensional boolean arrays.
+* When the result of an advanced indexing operation has no elements but an
+ individual index is out of bounds, whether or not an ``IndexError`` is
+ raised is undefined (e.g. ``x[[], [123]]`` with ``123`` being out of bounds).
+* When a *casting* error occurs during assignment (for example updating a
+ numerical array using a sequence of strings), the array being assigned
+ to may end up in an unpredictable partially updated state.
+ However, if any other error (such as an out of bounds index) occurs, the
+ array will remain unchanged.
+* The memory layout of an advanced indexing result is optimized for each
+ indexing operation and no particular memory order can be assumed.
+* When using a subclass (especially one which manipulates its shape), the
+ default ``ndarray.__setitem__`` behaviour will call ``__getitem__`` for
+ *basic* indexing but not for *advanced* indexing. For such a subclass it may
+ be preferable to call ``ndarray.__setitem__`` with a *base class* ndarray
+  view on the data. This *must* be done if the subclass's ``__getitem__`` does
+ not return views.
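Several of the notes above concern whether the result is a scalar, a view or a new array. This sketch checks them with ``isinstance`` and ``np.shares_memory``, assuming a NumPy version recent enough to provide the latter (1.11 or later). The arrays ``z`` and ``a`` are arbitrary.

```python
import numpy as np

z = np.array(5)                  # zero-dimensional array
a = np.arange(6).reshape(2, 3)

# Empty tuple index: a scalar for a 0-d array, a view otherwise.
print(isinstance(z[()], np.generic))                              # True
print(isinstance(a[()], np.ndarray), np.shares_memory(a[()], a))  # True True

# x[...] always returns an array (a view), even for 0-d input.
print(isinstance(z[...], np.ndarray), z[...].ndim)   # True 0

# A full integer index built from 0-d integer arrays still gives a scalar.
print(isinstance(a[np.array(1), np.array(2)], np.generic))  # True

# An ellipsis that replaces no ':' still forces an array (a view here).
e = a[1, 2, ...]
print(isinstance(e, np.ndarray), e.ndim, np.shares_memory(e, a))  # True 0 True

# The native indexing type; its size is platform dependent.
print(np.dtype(np.intp))
```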
.. _arrays.indexing.rec:
+
Record Access
-------------