-rw-r--r-- | doc/source/reference/c-api.iterator.rst | 1101 |
-rw-r--r-- | doc/source/reference/c-api.rst | 1 |
2 files changed, 1102 insertions, 0 deletions
diff --git a/doc/source/reference/c-api.iterator.rst b/doc/source/reference/c-api.iterator.rst
new file mode 100644
index 000000000..adb6f6081
--- /dev/null
+++ b/doc/source/reference/c-api.iterator.rst
@@ -0,0 +1,1101 @@
+Array Iterator API
+==================
+
+.. sectionauthor:: Mark Wiebe
+
+.. versionadded:: 1.6
+
+Array Iterator
+--------------
+
+The array iterator encapsulates many of the key features in ufuncs,
+allowing user code to support features like output parameters,
+preservation of memory layouts, and buffering of data with the wrong
+alignment or type, without requiring difficult coding.
+
+This page documents the API for the iterator.
+The C-API naming convention chosen is based on the one in the numpy-refactor
+branch, so will integrate naturally into the refactored code base.
+The iterator is named ``NpyIter`` and functions are
+named ``NpyIter_*``.
+
+Converting from Previous NumPy Iterators
+----------------------------------------
+
+The existing iterator API includes functions like PyArrayIter_Check,
+PyArray_Iter* and PyArray_ITER_*. The multi-iterator array includes
+PyArray_MultiIter*, PyArray_Broadcast, and PyArray_RemoveSmallest. The
+new iterator design replaces all of this functionality with a single object
+and associated API. One goal of the new API is that all uses of the
+existing iterator should be replaceable with the new iterator without
+significant effort. In 1.6, the major exception to this is the neighborhood
+iterator, which does not have corresponding features in this iterator.
+
+Here is a conversion table for the regular iterator:
+
+===============================  =============================================
+``PyArray_IterNew``              ``NpyIter_New``
+``PyArray_IterAllButAxis``       ``NpyIter_New`` + ``axes`` parameter **or**
+                                 Iterator flag ``NPY_ITER_NO_INNER_ITERATION``
+``PyArray_BroadcastToShape``     **NOT SUPPORTED** (Use the support for
+                                 multiple operands instead.)
+``PyArrayIter_Check``            Will need to add this in Python exposure
+``PyArray_ITER_RESET``           ``NpyIter_Reset``
+``PyArray_ITER_NEXT``            Function pointer from ``NpyIter_GetIterNext``
+``PyArray_ITER_DATA``            ``NpyIter_GetDataPtrArray``
+``PyArray_ITER_GOTO``            ``NpyIter_GotoCoords``
+``PyArray_ITER_GOTO1D``          ``NpyIter_GotoIndex`` or
+                                 ``NpyIter_GotoIterIndex``
+``PyArray_ITER_NOTDONE``         Return value of ``iternext`` function pointer
+===============================  =============================================
+
+For the multi-iterator:
+
+===============================  =============================================
+``PyArray_MultiIterNew``         ``NpyIter_MultiNew``
+``PyArray_MultiIter_RESET``      ``NpyIter_Reset``
+``PyArray_MultiIter_NEXT``       Function pointer from ``NpyIter_GetIterNext``
+``PyArray_MultiIter_DATA``       ``NpyIter_GetDataPtrArray``
+``PyArray_MultiIter_NEXTi``      **NOT SUPPORTED** (always lock-step iteration)
+``PyArray_MultiIter_GOTO``       ``NpyIter_GotoCoords``
+``PyArray_MultiIter_GOTO1D``     ``NpyIter_GotoIndex`` or
+                                 ``NpyIter_GotoIterIndex``
+``PyArray_MultiIter_NOTDONE``    Return value of ``iternext`` function pointer
+``PyArray_Broadcast``            Handled by ``NpyIter_MultiNew``
+``PyArray_RemoveSmallest``       Iterator flag ``NPY_ITER_NO_INNER_ITERATION``
+===============================  =============================================
+
+For other API calls:
+
+===============================  =============================================
+``PyArray_ConvertToCommonType``  Iterator flag ``NPY_ITER_COMMON_DTYPE``
+===============================  =============================================
+
+Simple Iteration Example
+------------------------
+
+The best way to become familiar with the iterator is to look at its
+usage within the NumPy codebase itself. For example, here is a slightly
+tweaked version of the code for ``PyArray_CountNonzero``, which counts the
+number of non-zero elements in an array.
+
+..
code-block:: c
+
+    npy_intp PyArray_CountNonzero(PyArrayObject* self)
+    {
+        /* Nonzero boolean function */
+        PyArray_NonzeroFunc* nonzero = PyArray_DESCR(self)->f->nonzero;
+
+        NpyIter* iter;
+        NpyIter_IterNext_Fn iternext;
+        char** dataptr;
+        npy_intp* strideptr,* innersizeptr;
+        /* Running total of the nonzero elements seen so far */
+        npy_intp nonzero_count = 0;
+
+        /* Handle zero-sized arrays specially */
+        if (PyArray_SIZE(self) == 0) {
+            return 0;
+        }
+
+        /*
+         * Create and use an iterator to count the nonzeros.
+         *   flag NPY_ITER_READONLY
+         *     - The array is never written to.
+         *   flag NPY_ITER_NO_INNER_ITERATION
+         *     - Inner loop is done outside the iterator for efficiency.
+         *   flag NPY_ITER_REFS_OK
+         *     - Reference types are acceptable.
+         *   order NPY_KEEPORDER
+         *     - Visit elements in memory order, regardless of strides.
+         *       This is good for performance when the specific order
+         *       elements are visited is unimportant.
+         *   casting NPY_NO_CASTING
+         *     - No casting is required for this operation.
+         */
+        iter = NpyIter_New(self, NPY_ITER_READONLY|
+                                 NPY_ITER_NO_INNER_ITERATION|
+                                 NPY_ITER_REFS_OK,
+                            NPY_KEEPORDER, NPY_NO_CASTING,
+                            NULL, 0, NULL, 0);
+        if (iter == NULL) {
+            return -1;
+        }
+
+        /*
+         * The iternext function gets stored in a local variable
+         * so it can be called repeatedly in an efficient manner.
+         */
+        iternext = NpyIter_GetIterNext(iter, NULL);
+        if (iternext == NULL) {
+            NpyIter_Deallocate(iter);
+            return -1;
+        }
+        /* The location of the data pointer which the iterator may update */
+        dataptr = NpyIter_GetDataPtrArray(iter);
+        /* The location of the stride which the iterator may update */
+        strideptr = NpyIter_GetInnerStrideArray(iter);
+        /* The location of the inner loop size which the iterator may update */
+        innersizeptr = NpyIter_GetInnerLoopSizePtr(iter);
+
+        /* The iteration loop */
+        do {
+            /* Get the inner loop data/stride/count values */
+            char* data = *dataptr;
+            npy_intp stride = *strideptr;
+            npy_intp count = *innersizeptr;
+
+            /* This is a typical inner loop for NPY_ITER_NO_INNER_ITERATION */
+            while (count--) {
+                if (nonzero(data, self)) {
+                    ++nonzero_count;
+                }
+                data += stride;
+            }
+
+            /* Increment the iterator to the next inner loop */
+        } while(iternext(iter));
+
+        NpyIter_Deallocate(iter);
+
+        return nonzero_count;
+    }
+
+Simple Multi-Iteration Example
+------------------------------
+
+Here is a simple copy function using the iterator. The ``order`` parameter
+is used to control the memory layout of the allocated result; typically
+``NPY_KEEPORDER`` is desired.
+
+.. code-block:: c
+
+    PyObject *CopyArray(PyObject *arr, NPY_ORDER order)
+    {
+        NpyIter *iter;
+        NpyIter_IterNext_Fn iternext;
+        PyObject *op[2], *ret;
+        npy_uint32 flags;
+        npy_uint32 op_flags[2];
+        npy_intp itemsize, *innersizeptr, innerstride;
+        char **dataptrarray;
+
+        /*
+         * No inner iteration - inner loop is handled by CopyArray code
+         */
+        flags = NPY_ITER_NO_INNER_ITERATION;
+        /*
+         * Tell the constructor to automatically allocate the output.
+         * The data type of the output will match that of the input.
+         */
+        op[0] = arr;
+        op[1] = NULL;
+        op_flags[0] = NPY_ITER_READONLY;
+        op_flags[1] = NPY_ITER_WRITEONLY | NPY_ITER_ALLOCATE;
+
+        /* Construct the iterator; a buffersize of 0 selects the default */
+        iter = NpyIter_MultiNew(2, op, flags, order, NPY_NO_CASTING,
+                                op_flags, NULL, 0, NULL, 0);
+        if (iter == NULL) {
+            return NULL;
+        }
+
+        /*
+         * Make a copy of the iternext function pointer and
+         * a few other variables the inner loop needs.
+         */
+        iternext = NpyIter_GetIterNext(iter, NULL);
+        if (iternext == NULL) {
+            NpyIter_Deallocate(iter);
+            return NULL;
+        }
+        innerstride = NpyIter_GetInnerStrideArray(iter)[0];
+        itemsize = NpyIter_GetDescrArray(iter)[0]->elsize;
+        /*
+         * The inner loop size and data pointers may change during the
+         * loop, so just cache the addresses.
+         */
+        innersizeptr = NpyIter_GetInnerLoopSizePtr(iter);
+        dataptrarray = NpyIter_GetDataPtrArray(iter);
+
+        /*
+         * Note that because the iterator allocated the output,
+         * it matches the iteration order and is packed tightly,
+         * so we don't need to check it like the input.
+         */
+        if (innerstride == itemsize) {
+            do {
+                memcpy(dataptrarray[1], dataptrarray[0],
+                                        itemsize * (*innersizeptr));
+            } while (iternext(iter));
+        } else {
+            /* For efficiency, should specialize this based on item size... */
+            npy_intp i;
+            do {
+                npy_intp size = *innersizeptr;
+                char *src = dataptrarray[0], *dst = dataptrarray[1];
+                for(i = 0; i < size; i++, src += innerstride, dst += itemsize) {
+                    memcpy(dst, src, itemsize);
+                }
+            } while (iternext(iter));
+        }
+
+        /* Get the result from the iterator object array */
+        ret = NpyIter_GetOperandArray(iter)[1];
+        Py_INCREF(ret);
+
+        if (NpyIter_Deallocate(iter) != NPY_SUCCEED) {
+            Py_DECREF(ret);
+            return NULL;
+        }
+
+        return ret;
+    }
+
+
+Iterator Pointer Type
+---------------------
+
+The iterator layout is an internal detail, and user code only sees
+an incomplete struct.
+
+.. code-block:: c
+
+    typedef struct NpyIter_InternalOnly NpyIter;
+
+
+Construction and Destruction
+----------------------------
+
+..
cfunction:: NpyIter* NpyIter_New(PyArrayObject* op, npy_uint32 flags, NPY_ORDER order, NPY_CASTING casting, PyArray_Descr* dtype, npy_intp a_ndim, npy_intp* axes, npy_intp buffersize)
+
+    Creates an iterator for the given numpy array object ``op``.
+
+    Flags that may be passed in ``flags`` are any combination
+    of the global and per-operand flags documented in
+    ``NpyIter_MultiNew``, except for ``NPY_ITER_ALLOCATE``.
+
+    Any of the ``NPY_ORDER`` enum values may be passed to ``order``. For
+    efficient iteration, ``NPY_KEEPORDER`` is the best option, and the other
+    orders enforce the particular iteration pattern.
+
+    Any of the ``NPY_CASTING`` enum values may be passed to ``casting``.
+    The values include ``NPY_NO_CASTING``, ``NPY_EQUIV_CASTING``,
+    ``NPY_SAFE_CASTING``, ``NPY_SAME_KIND_CASTING``, and
+    ``NPY_UNSAFE_CASTING``. To allow the casts to occur, copying or
+    buffering must also be enabled.
+
+    If ``dtype`` isn't ``NULL``, then it requires that data type.
+    If copying is allowed, it will make a temporary copy if the data
+    is castable. If ``UPDATEIFCOPY`` is enabled, it will also copy
+    the data back with another cast upon iterator destruction.
+
+    If ``a_ndim`` is greater than zero, ``axes`` must also be provided.
+    In this case, ``axes`` is an ``a_ndim``-sized array of ``op``'s axes.
+    A value of -1 in ``axes`` means ``newaxis``. Within the ``axes``
+    array, axes may not be repeated.
+
+    If ``buffersize`` is zero, a default buffer size is used,
+    otherwise it specifies how big of a buffer to use. Buffers
+    which are powers of 2 such as 512 or 1024 are recommended.
+
+    Returns NULL if there is an error, otherwise returns the allocated
+    iterator.
+
+    To make an iterator similar to the old iterator, this should work.
+
+    .. code-block:: c
+
+        iter = NpyIter_New(op, NPY_ITER_READWRITE,
+                            NPY_CORDER, NPY_NO_CASTING, NULL, 0, NULL, 0);
+
+    If you want to edit an array with aligned ``double`` code,
+    but the order doesn't matter, you would use this.
+
+    ..
code-block:: c
+
+        dtype = PyArray_DescrFromType(NPY_DOUBLE);
+        iter = NpyIter_New(op, NPY_ITER_READWRITE |
+                            NPY_ITER_BUFFERED |
+                            NPY_ITER_NBO|
+                            NPY_ITER_ALIGNED,
+                            NPY_KEEPORDER,
+                            NPY_SAME_KIND_CASTING,
+                            dtype, 0, NULL, 0);
+        Py_DECREF(dtype);
+
+.. cfunction:: NpyIter* NpyIter_MultiNew(npy_intp niter, PyArrayObject** op, npy_uint32 flags, NPY_ORDER order, NPY_CASTING casting, npy_uint32* op_flags, PyArray_Descr** op_dtypes, npy_intp oa_ndim, npy_intp** op_axes, npy_intp buffersize)
+
+    Creates an iterator for broadcasting the ``niter`` array objects provided
+    in ``op``.
+
+    For normal usage, use 0 for ``oa_ndim`` and NULL for ``op_axes``.
+    See below for a description of these parameters, which allow for
+    custom manual broadcasting as well as reordering and leaving out axes.
+
+    Any of the ``NPY_ORDER`` enum values may be passed to ``order``. For
+    efficient iteration, ``NPY_KEEPORDER`` is the best option, and the other
+    orders enforce the particular iteration pattern. When using
+    ``NPY_KEEPORDER``, if you also want to ensure that the iteration is
+    not reversed along an axis, you should pass the flag
+    ``NPY_ITER_DONT_NEGATE_STRIDES``.
+
+    Any of the ``NPY_CASTING`` enum values may be passed to ``casting``.
+    The values include ``NPY_NO_CASTING``, ``NPY_EQUIV_CASTING``,
+    ``NPY_SAFE_CASTING``, ``NPY_SAME_KIND_CASTING``, and
+    ``NPY_UNSAFE_CASTING``. To allow the casts to occur, copying or
+    buffering must also be enabled.
+
+    If ``op_dtypes`` isn't ``NULL``, it specifies a data type or ``NULL``
+    for each ``op[i]``.
+
+    The parameter ``oa_ndim``, when non-zero, specifies the number of
+    dimensions that will be iterated with customized broadcasting.
+    If it is provided, ``op_axes`` must also be provided.
+    These two parameters let you control in detail how the
+    axes of the operand arrays get matched together and iterated.
+    In ``op_axes``, you must provide an array of ``niter`` pointers
+    to ``oa_ndim``-sized arrays of type ``npy_intp``.
+    If an entry in ``op_axes`` is NULL, normal broadcasting rules will apply.
+    In ``op_axes[j][i]`` is stored either a valid axis of ``op[j]``, or
+    -1 which means ``newaxis``. Within each ``op_axes[j]`` array, axes
+    may not be repeated. The following example is how normal broadcasting
+    applies to a 3-D array, a 2-D array, a 1-D array and a scalar.
+
+    .. code-block:: c
+
+        npy_intp oa_ndim = 3;               /* # iteration axes */
+        npy_intp op0_axes[] = {0, 1, 2};    /* 3-D operand */
+        npy_intp op1_axes[] = {-1, 0, 1};   /* 2-D operand */
+        npy_intp op2_axes[] = {-1, -1, 0};  /* 1-D operand */
+        npy_intp op3_axes[] = {-1, -1, -1}; /* 0-D (scalar) operand */
+        npy_intp* op_axes[] = {op0_axes, op1_axes, op2_axes, op3_axes};
+
+    If ``buffersize`` is zero, a default buffer size is used,
+    otherwise it specifies how big of a buffer to use. Buffers
+    which are powers of 2 such as 512 or 1024 are recommended.
+
+    Returns NULL if there is an error, otherwise returns the allocated
+    iterator.
+
+    Flags that may be passed in ``flags``, applying to the whole
+    iterator, are:
+
+        ``NPY_ITER_C_INDEX``, ``NPY_ITER_F_INDEX``
+
+            Causes the iterator to track an index matching C or
+            Fortran order. These options are mutually exclusive.
+
+        ``NPY_ITER_COORDS``
+
+            Causes the iterator to track array coordinates.
+            This prevents the iterator from coalescing axes to
+            produce bigger inner loops.
+
+        ``NPY_ITER_NO_INNER_ITERATION``
+
+            Causes the iterator to skip iteration of the innermost
+            loop, allowing the user of the iterator to handle it.
+
+            This flag is incompatible with ``NPY_ITER_C_INDEX``,
+            ``NPY_ITER_F_INDEX``, and ``NPY_ITER_COORDS``.
+
+        ``NPY_ITER_DONT_NEGATE_STRIDES``
+
+            This only affects the iterator when NPY_KEEPORDER is specified
+            for the order parameter. By default with NPY_KEEPORDER, the
+            iterator reverses axes which have negative strides, so that
+            memory is traversed in a forward direction. This disables
+            this step.
+            Use this flag if you want to use the underlying
+            memory-ordering of the axes, but don't want an axis reversed.
+            This is the behavior of ``numpy.ravel(a, order='K')``, for
+            instance.
+
+        ``NPY_ITER_COMMON_DTYPE``
+
+            Causes the iterator to convert all the operands to a common
+            data type, calculated based on the ufunc type promotion rules.
+            Copying or buffering must be enabled.
+
+            If the common data type is known ahead of time, don't use this
+            flag. Instead, set the requested dtype for all the operands.
+
+        ``NPY_ITER_REFS_OK``
+
+            Indicates that arrays with reference types (object
+            arrays or structured arrays containing an object type)
+            may be accepted and used in the iterator. If this flag
+            is enabled, the caller must be sure to check whether
+            ``NpyIter_IterationNeedsAPI(iter)`` is true, in which case
+            it may not release the GIL during iteration.
+
+        ``NPY_ITER_ZEROSIZE_OK``
+
+            Indicates that arrays with a size of zero should be permitted.
+            Since the typical iteration loop does not naturally work with
+            zero-sized arrays, you must check that the IterSize is non-zero
+            before entering the iteration loop.
+
+        ``NPY_ITER_REDUCE_OK``
+
+            Permits writeable operands with a dimension with zero
+            stride and size greater than one. Note that such operands
+            must be read/write.
+
+            When buffering is enabled, this also switches to a special
+            buffering mode which reduces the loop length as necessary to
+            not trample on values being reduced.
+
+            Note that if you want to do a reduction on an automatically
+            allocated output, you must use ``NpyIter_GetOperandArray``
+            to get its reference, then set every value to the reduction
+            unit before doing the iteration loop. In the case of a
+            buffered reduction, this means you must also specify the
+            flag ``NPY_ITER_DELAY_BUFALLOC``, then reset the iterator
+            after initializing the allocated operand to prepare the
+            buffers.
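The zero-stride idea behind ``NPY_ITER_REDUCE_OK`` can be illustrated without the iterator at all. The following is only a sketch in plain C, not iterator API: ``sum_last_axis`` and its ``out_strides`` table are hypothetical names invented for this illustration. It shows why the output must be read/write and why it has to be set to the reduction unit (0 for addition) before the loop runs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum a C-contiguous rows x cols array of doubles along its last axis.
 * The output is addressed with strides (1, 0), measured in elements:
 * the zero stride on the reduced axis makes every column of a row
 * land on the same out[row] slot.
 */
void sum_last_axis(const double *in, double *out, size_t rows, size_t cols)
{
    const ptrdiff_t out_strides[2] = {1, 0};
    size_t i, j;

    assert(in != NULL && out != NULL);

    /* Step 1: set every output value to the reduction unit. */
    for (i = 0; i < rows; i++) {
        out[i] = 0.0;
    }

    /* Step 2: lock-step traversal; the output is both read and written. */
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            double *dst = out + (ptrdiff_t)i * out_strides[0]
                              + (ptrdiff_t)j * out_strides[1];
            *dst += in[i * cols + j];
        }
    }
}
```

Skipping step 1 would leave whatever values were already in ``out`` mixed into the sums, which is the situation the note about initializing an automatically allocated output is guarding against.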
+
+        ``NPY_ITER_RANGED``
+
+            Enables support for iteration of sub-ranges of the full
+            ``iterindex`` range ``[0, NpyIter_GetIterSize(iter))``. Use
+            the function ``NpyIter_ResetToIterIndexRange`` to specify
+            a range for iteration.
+
+            This flag can only be used with ``NPY_ITER_NO_INNER_ITERATION``
+            when ``NPY_ITER_BUFFERED`` is enabled. This is because
+            without buffering, the inner loop is always the size of the
+            innermost iteration dimension, and allowing it to get cut up
+            would require special handling, effectively making it more
+            like the buffered version.
+
+        ``NPY_ITER_BUFFERED``
+
+            Causes the iterator to store buffering data, and use buffering
+            to satisfy data type, alignment, and byte-order requirements.
+            To buffer an operand, do not specify the ``NPY_ITER_COPY``
+            or ``NPY_ITER_UPDATEIFCOPY`` flags, because they will
+            override buffering. Buffering is especially useful for Python
+            code using the iterator, allowing for larger chunks
+            of data at once to amortize the Python interpreter overhead.
+
+            If used with ``NPY_ITER_NO_INNER_ITERATION``, the inner loop
+            for the caller may get larger chunks than would be possible
+            without buffering, because of how the strides are laid out.
+
+            Note that if an operand is given the flag ``NPY_ITER_COPY``
+            or ``NPY_ITER_UPDATEIFCOPY``, a copy will be made in preference
+            to buffering. Buffering will still occur when the array was
+            broadcast so elements need to be duplicated to get a constant
+            stride.
+
+            In normal buffering, the size of each inner loop is equal
+            to the buffer size, or possibly larger if ``NPY_ITER_GROWINNER``
+            is specified. If ``NPY_ITER_REDUCE_OK`` is enabled and
+            a reduction occurs, the inner loops may become smaller depending
+            on the structure of the reduction.
+
+        ``NPY_ITER_GROWINNER``
+
+            When buffering is enabled, this allows the size of the inner
+            loop to grow when buffering isn't necessary.
+            This option is best used if you're doing a straight pass
+            through all the data, rather than anything with small
+            cache-friendly arrays of temporary values for each inner loop.
+
+        ``NPY_ITER_DELAY_BUFALLOC``
+
+            When buffering is enabled, this delays allocation of the
+            buffers until one of the ``NpyIter_Reset*`` functions is
+            called. This flag exists to avoid wasteful copying of
+            buffer data when making multiple copies of a buffered
+            iterator for multi-threaded iteration.
+
+            Another use of this flag is for setting up reduction operations.
+            After the iterator is created, and a reduction output
+            is allocated automatically by the iterator (be sure to use
+            READWRITE access), its value may be initialized to the reduction
+            unit. Use ``NpyIter_GetOperandArray`` to get the object.
+            Then, call ``NpyIter_Reset`` to allocate and fill the buffers
+            with their initial values.
+
+    Flags that may be passed in ``op_flags[i]``, where ``0 <= i < niter``:
+
+        ``NPY_ITER_READWRITE``, ``NPY_ITER_READONLY``, ``NPY_ITER_WRITEONLY``
+
+            Indicate how the user of the iterator will read or write
+            to ``op[i]``. Exactly one of these flags must be specified
+            per operand.
+
+        ``NPY_ITER_COPY``
+
+            Allow a copy of ``op[i]`` to be made if it does not
+            meet the data type or alignment requirements as specified
+            by the constructor flags and parameters.
+
+        ``NPY_ITER_UPDATEIFCOPY``
+
+            Triggers ``NPY_ITER_COPY``, and when an array operand
+            is flagged for writing and is copied, causes the data
+            in a copy to be copied back to ``op[i]`` when the iterator
+            is destroyed.
+
+            If the operand is flagged as write-only and a copy is needed,
+            an uninitialized temporary array will be created and then copied
+            back to ``op[i]`` on destruction, instead of doing
+            the unnecessary copy operation.
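The write-only case of ``NPY_ITER_UPDATEIFCOPY`` can be modelled in a few lines of plain C. This is a conceptual sketch only, under the assumption that a packed temporary stands in for the copy; the ``WriteBack`` struct and the ``writeback_open`` / ``writeback_close`` functions are hypothetical names and are not part of the iterator API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one write-only operand that needed a copy. */
typedef struct {
    double *target;     /* the caller's array, standing in for op[i] */
    ptrdiff_t stride;   /* distance between target elements, in doubles */
    size_t count;       /* number of elements */
    double *temp;       /* packed temporary the inner loop writes into */
} WriteBack;

/*
 * Create the temporary.  It is deliberately left uninitialized:
 * a write-only operand never has its old contents read, so copying
 * the target into the temporary first would be wasted work.
 */
int writeback_open(WriteBack *wb, double *target, ptrdiff_t stride,
                   size_t count)
{
    assert(wb != NULL && target != NULL);
    wb->target = target;
    wb->stride = stride;
    wb->count = count;
    wb->temp = malloc(count * sizeof(double));
    return wb->temp != NULL ? 0 : -1;
}

/* "Destruction": copy the temporary back to the target, then free it. */
void writeback_close(WriteBack *wb)
{
    size_t i;
    for (i = 0; i < wb->count; i++) {
        wb->target[(ptrdiff_t)i * wb->stride] = wb->temp[i];
    }
    free(wb->temp);
    wb->temp = NULL;
}
```

Until ``writeback_close`` runs, the target is untouched, which mirrors the statement that the data is copied back only when the iterator is destroyed.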
+
+        ``NPY_ITER_NBO``, ``NPY_ITER_ALIGNED``, ``NPY_ITER_CONTIG``
+
+            Causes the iterator to provide data for ``op[i]``
+            that is in native byte order, aligned according to
+            the dtype requirements, contiguous, or any combination.
+
+            By default, the iterator produces pointers into the
+            arrays provided, which may be aligned or unaligned, and
+            with any byte order. If copying or buffering is not
+            enabled and the operand data doesn't satisfy the constraints,
+            an error will be raised.
+
+            The contiguous constraint applies only to the inner loop;
+            successive inner loops may have arbitrary pointer changes.
+
+            If the requested data type is in non-native byte order,
+            the NBO flag overrides it and the requested data type is
+            converted to be in native byte order.
+
+        ``NPY_ITER_ALLOCATE``
+
+            This is for output arrays, and requires that the flag
+            ``NPY_ITER_WRITEONLY`` be set. If ``op[i]`` is NULL,
+            creates a new array with the final broadcast dimensions,
+            and a layout matching the iteration order of the iterator.
+
+            When ``op[i]`` is NULL, the requested data type
+            ``op_dtypes[i]`` may be NULL as well, in which case it is
+            automatically generated from the dtypes of the arrays which
+            are flagged as readable. The rules for generating the dtype
+            are the same as for UFuncs. Of special note is handling
+            of byte order in the selected dtype. If there is exactly
+            one input, the input's dtype is used as is. Otherwise,
+            if more than one input dtype is combined together, the
+            output will be in native byte order.
+
+            After being allocated with this flag, the caller may retrieve
+            the new array by calling ``NpyIter_GetOperandArray`` and
+            getting the i-th object in the returned C array. The caller
+            must call Py_INCREF on it to claim a reference to the array.
+
+        ``NPY_ITER_NO_SUBTYPE``
+
+            For use with ``NPY_ITER_ALLOCATE``, this flag disables
+            allocating an array subtype for the output, forcing
+            it to be a straight ndarray.
+
+            TODO: Maybe it would be better to introduce a function
+            ``NpyIter_GetWrappedOutput`` and remove this flag?
+
+        ``NPY_ITER_NO_BROADCAST``
+
+            Ensures that the input or output matches the iteration
+            dimensions exactly.
+
+.. cfunction:: NpyIter* NpyIter_Copy(NpyIter* iter)
+
+    Makes a copy of the given iterator. This function is provided
+    primarily to enable multi-threaded iteration of the data.
+
+    *TODO*: Move this to a section about multithreaded iteration.
+
+    The recommended approach to multithreaded iteration is to
+    first create an iterator with the flags
+    ``NPY_ITER_NO_INNER_ITERATION``, ``NPY_ITER_RANGED``,
+    ``NPY_ITER_BUFFERED``, ``NPY_ITER_DELAY_BUFALLOC``, and
+    possibly ``NPY_ITER_GROWINNER``. Create a copy of this iterator
+    for each thread (minus one for the first iterator). Then, take
+    the iteration index range ``[0, NpyIter_GetIterSize(iter))`` and
+    split it up into tasks, for example using a TBB parallel_for loop.
+    When a thread gets a task to execute, it then uses its copy of
+    the iterator by calling ``NpyIter_ResetToIterIndexRange`` and
+    iterating over the full range.
+
+    When using the iterator in multi-threaded code or in code not
+    holding the Python GIL, care must be taken to only call functions
+    which are safe in that context. ``NpyIter_Copy`` cannot be safely
+    called without the Python GIL, because it increments Python
+    references. The ``Reset*`` and some other functions may be safely
+    called by passing in the ``errmsg`` parameter as non-NULL, so that
+    the functions will pass back errors through it instead of setting
+    a Python exception.
+
+.. cfunction:: int NpyIter_RemoveAxis(NpyIter* iter, npy_intp axis)
+
+    Removes an axis from iteration. This requires that
+    ``NPY_ITER_COORDS`` was set for iterator creation, and does not work
+    if buffering is enabled or an index is being tracked. This function
+    also resets the iterator to its initial state.
+
+    This is useful for setting up an accumulation loop, for example.
+    The iterator can first be created with all the dimensions, including
+    the accumulation axis, so that the output gets created correctly.
+    Then, the accumulation axis can be removed, and the calculation
+    done in a nested fashion.
+
+    **WARNING**: This function may change the internal memory layout of
+    the iterator. Any cached functions or pointers from the iterator
+    must be retrieved again!
+
+    Returns ``NPY_SUCCEED`` or ``NPY_FAIL``.
+
+
+.. cfunction:: int NpyIter_RemoveCoords(NpyIter* iter)
+
+    If the iterator has coordinates, this strips support for them, and
+    does further iterator optimizations that are possible if coordinates
+    are not needed. This function also resets the iterator to its initial
+    state.
+
+    **WARNING**: This function may change the internal memory layout of
+    the iterator. Any cached functions or pointers from the iterator
+    must be retrieved again!
+
+    After calling this function, ``NpyIter_HasCoords(iter)`` will
+    return false.
+
+    Returns ``NPY_SUCCEED`` or ``NPY_FAIL``.
+
+.. cfunction:: int NpyIter_RemoveInnerLoop(NpyIter* iter)
+
+    If RemoveCoords was used, you may want to specify the
+    flag ``NPY_ITER_NO_INNER_ITERATION``. This flag is not permitted
+    together with ``NPY_ITER_COORDS``, so this function is provided
+    to enable the feature after ``NpyIter_RemoveCoords`` is called.
+    This function also resets the iterator to its initial state.
+
+    **WARNING**: This function changes the internal logic of the iterator.
+    Any cached functions or pointers from the iterator must be retrieved
+    again!
+
+    Returns ``NPY_SUCCEED`` or ``NPY_FAIL``.
+
+.. cfunction:: int NpyIter_Deallocate(NpyIter* iter)
+
+    Deallocates the iterator object. This additionally frees any
+    copies made, triggering UPDATEIFCOPY behavior where necessary.
+
+    Returns ``NPY_SUCCEED`` or ``NPY_FAIL``.
+
+.. cfunction:: int NpyIter_Reset(NpyIter* iter, char** errmsg)
+
+    Resets the iterator back to its initial state, at the beginning
+    of the iteration range.
+
+    Returns ``NPY_SUCCEED`` or ``NPY_FAIL``. If errmsg is non-NULL,
+    no Python exception is set when ``NPY_FAIL`` is returned.
+    Instead, \*errmsg is set to an error message. When errmsg is
+    non-NULL, the function may be safely called without holding
+    the Python GIL.
+
+.. cfunction:: int NpyIter_ResetToIterIndexRange(NpyIter* iter, npy_intp istart, npy_intp iend, char** errmsg)
+
+    Resets the iterator and restricts it to the ``iterindex`` range
+    ``[istart, iend)``. See ``NpyIter_Copy`` for an explanation of
+    how to use this for multi-threaded iteration. This requires that
+    the flag ``NPY_ITER_RANGED`` was passed to the iterator constructor.
+
+    If you want to reset both the ``iterindex`` range and the base
+    pointers at the same time, you can do the following to avoid
+    extra buffer copying (be sure to add the return code error checks
+    when you copy this code).
+
+    .. code-block:: c
+
+        /* Set to a trivial empty range */
+        NpyIter_ResetToIterIndexRange(iter, 0, 0, NULL);
+        /* Set the base pointers */
+        NpyIter_ResetBasePointers(iter, baseptrs, NULL);
+        /* Set to the desired range */
+        NpyIter_ResetToIterIndexRange(iter, istart, iend, NULL);
+
+    Returns ``NPY_SUCCEED`` or ``NPY_FAIL``. If errmsg is non-NULL,
+    no Python exception is set when ``NPY_FAIL`` is returned.
+    Instead, \*errmsg is set to an error message. When errmsg is
+    non-NULL, the function may be safely called without holding
+    the Python GIL.
+
+.. cfunction:: int NpyIter_ResetBasePointers(NpyIter *iter, char** baseptrs, char** errmsg)
+
+    Resets the iterator back to its initial state, but using the values
+    in ``baseptrs`` for the data instead of the pointers from the arrays
+    being iterated. This function is intended to be used, together with
+    the ``op_axes`` parameter, by nested iteration code with two or more
+    iterators.
+
+    Returns ``NPY_SUCCEED`` or ``NPY_FAIL``. If errmsg is non-NULL,
+    no Python exception is set when ``NPY_FAIL`` is returned.
+    Instead, \*errmsg is set to an error message.
+    When errmsg is non-NULL, the function may be safely called
+    without holding the Python GIL.
+
+    *TODO*: Move the following into a special section on nested iterators.
+
+    Creating iterators for nested iteration requires some care. All
+    the iterator operands must match exactly, or the calls to
+    ``NpyIter_ResetBasePointers`` will be invalid. This means that
+    automatic copies and output allocation should not be used haphazardly.
+    It is possible to still use the automatic data conversion and casting
+    features of the iterator by creating one of the iterators with
+    all the conversion parameters enabled, then grabbing the allocated
+    operands with the ``NpyIter_GetOperandArray`` function and passing
+    them into the constructors for the rest of the iterators.
+
+    **WARNING**: When creating iterators for nested iteration,
+    the code must not use a dimension more than once in the different
+    iterators. If this is done, nested iteration will produce
+    out-of-bounds pointers during iteration.
+
+    **WARNING**: When creating iterators for nested iteration, buffering
+    can only be applied to the innermost iterator. If a buffered iterator
+    is used as the source for ``baseptrs``, it will point into a small buffer
+    instead of the array and the inner iteration will be invalid.
+
+    The pattern for using nested iterators is as follows.
+
+    .. code-block:: c
+
+        NpyIter *iter1, *iter2;
+        NpyIter_IterNext_Fn iternext1, iternext2;
+        char **dataptrs1;
+
+        /*
+         * With the exact same operands, no copies allowed, and
+         * no axis in op_axes used both in iter1 and iter2.
+         * Buffering may be enabled for iter2, but not for iter1.
+         */
+        iter1 = ...; iter2 = ...;
+
+        iternext1 = NpyIter_GetIterNext(iter1, NULL);
+        iternext2 = NpyIter_GetIterNext(iter2, NULL);
+        dataptrs1 = NpyIter_GetDataPtrArray(iter1);
+
+        do {
+            NpyIter_ResetBasePointers(iter2, dataptrs1, NULL);
+            do {
+                /* Use the iter2 values */
+            } while (iternext2(iter2));
+        } while (iternext1(iter1));
+
+..
cfunction:: int NpyIter_GotoCoords(NpyIter* iter, npy_intp* coords)
+
+    Adjusts the iterator to point to the ``ndim`` coordinates
+    pointed to by ``coords``. Returns an error if coordinates
+    are not being tracked, the coordinates are out of bounds,
+    or inner loop iteration is disabled.
+
+    Returns ``NPY_SUCCEED`` or ``NPY_FAIL``.
+
+.. cfunction:: int NpyIter_GotoIndex(NpyIter* iter, npy_intp index)
+
+    Adjusts the iterator to point to the ``index`` specified.
+    If the iterator was constructed with the flag
+    ``NPY_ITER_C_INDEX``, ``index`` is the C-order index,
+    and if the iterator was constructed with the flag
+    ``NPY_ITER_F_INDEX``, ``index`` is the Fortran-order
+    index. Returns an error if there is no index being tracked,
+    the index is out of bounds, or inner loop iteration is disabled.
+
+    Returns ``NPY_SUCCEED`` or ``NPY_FAIL``.
+
+.. cfunction:: npy_intp NpyIter_GetIterSize(NpyIter* iter)
+
+    Returns the number of elements being iterated. This is the product
+    of all the dimensions in the shape.
+
+.. cfunction:: npy_intp NpyIter_GetIterIndex(NpyIter* iter)
+
+    Gets the ``iterindex`` of the iterator, which is an index matching
+    the iteration order of the iterator.
+
+.. cfunction:: void NpyIter_GetIterIndexRange(NpyIter* iter, npy_intp* istart, npy_intp* iend)
+
+    Gets the ``iterindex`` sub-range that is being iterated. If
+    ``NPY_ITER_RANGED`` was not specified, this always returns the
+    range ``[0, NpyIter_GetIterSize(iter))``.
+
+.. cfunction:: int NpyIter_GotoIterIndex(NpyIter* iter, npy_intp iterindex)
+
+    Adjusts the iterator to point to the ``iterindex`` specified.
+    The IterIndex is an index matching the iteration order of the iterator.
+    Returns an error if the ``iterindex`` is out of bounds,
+    buffering is enabled, or inner loop iteration is disabled.
+
+    Returns ``NPY_SUCCEED`` or ``NPY_FAIL``.
+
+.. cfunction:: int NpyIter_HasInnerLoop(NpyIter* iter)
+
+    Returns 1 if the iterator handles the inner loop,
+    or 0 if the caller needs to handle it.
This is controlled + by the constructor flag ``NPY_ITER_NO_INNER_ITERATION``. + +.. cfunction:: int NpyIter_HasCoords(NpyIter* iter) + + Returns 1 if the iterator was created with the + ``NPY_ITER_COORDS`` flag, 0 otherwise. + +.. cfunction:: int NpyIter_HasIndex(NpyIter* iter) + + Returns 1 if the iterator was created with the + ``NPY_ITER_C_INDEX`` or ``NPY_ITER_F_INDEX`` + flag, 0 otherwise. + +.. cfunction:: int NpyIter_IsBuffered(NpyIter* iter) + + Returns 1 if the iterator was created with the + ``NPY_ITER_BUFFERED`` flag, 0 otherwise. + +.. cfunction:: int NpyIter_IsGrowInner(NpyIter* iter) + + Returns 1 if the iterator was created with the + ``NPY_ITER_GROWINNER`` flag, 0 otherwise. + +.. cfunction:: npy_intp NpyIter_GetBufferSize(NpyIter* iter) + + If the iterator is buffered, returns the size of the buffer + being used, otherwise returns 0. + +.. cfunction:: npy_intp NpyIter_GetNDim(NpyIter* iter) + + Returns the number of dimensions being iterated. If coordinates + were not requested in the iterator constructor, this value + may be smaller than the number of dimensions in the original + objects. + +.. cfunction:: npy_intp NpyIter_GetNIter(NpyIter* iter) + + Returns the number of objects being iterated. + +.. cfunction:: npy_intp* NpyIter_GetAxisStrideArray(NpyIter* iter, npy_intp axis) + + Gets the array of strides for the specified axis. Requires that + the iterator be tracking coordinates, and that buffering not + be enabled. + + This may be used when you want to match up operand axes in + some fashion, then remove them with ``NpyIter_RemoveAxis`` to + handle their processing manually. By calling this function + before removing the axes, you can get the strides for the + manual processing. + + Returns ``NULL`` on error. + +.. cfunction:: int NpyIter_GetShape(NpyIter* iter, npy_intp* outshape) + + Returns the broadcast shape of the iterator in ``outshape``. + This can only be called on an iterator which supports coordinates. 
+ + Returns ``NPY_SUCCEED`` or ``NPY_FAIL``. + +.. cfunction:: PyArray_Descr** NpyIter_GetDescrArray(NpyIter* iter) + + This gives back a pointer to the ``niter`` data type Descrs for + the objects being iterated. The result points into ``iter``, + so the caller does not gain any references to the Descrs. + + This pointer may be cached before the iteration loop, calling + ``iternext`` will not change it. + +.. cfunction:: PyObject** NpyIter_GetOperandArray(NpyIter* iter) + + This gives back a pointer to the ``niter`` operand PyObjects + that are being iterated. The result points into ``iter``, + so the caller does not gain any references to the PyObjects. + +.. cfunction:: PyObject* NpyIter_GetIterView(NpyIter* iter, npy_intp i) + + This gives back a reference to a new ndarray view, which is a view + into the i-th object in the array ``NpyIter_GetOperandArray()``, + whose dimensions and strides match the internal optimized + iteration pattern. A C-order iteration of this view is equivalent + to the iterator's iteration order. + + For example, if an iterator was created with a single array as its + input, and it was possible to rearrange all its axes and then + collapse it into a single strided iteration, this would return + a view that is a one-dimensional array. + +.. cfunction:: void NpyIter_GetReadFlags(NpyIter* iter, char* outreadflags) + + Fills ``niter`` flags. Sets ``outreadflags[i]`` to 1 if + ``op[i]`` can be read from, and to 0 if not. + +.. cfunction:: void NpyIter_GetWriteFlags(NpyIter* iter, char* outwriteflags) + + Fills ``niter`` flags. Sets ``outwriteflags[i]`` to 1 if + ``op[i]`` can be written to, and to 0 if not. + +Functions For Iteration +----------------------- + +.. cfunction:: NpyIter_IterNext_Fn NpyIter_GetIterNext(NpyIter* iter, char** errmsg) + + Returns a function pointer for iteration. A specialized version + of the function pointer may be calculated by this function + instead of being stored in the iterator structure. 
Thus, to + get good performance, it is required that the function pointer + be saved in a variable rather than retrieved for each loop iteration. + + Returns NULL if there is an error. If errmsg is non-NULL, + no Python exception is set when NULL is returned. + Instead, \*errmsg is set to an error message. When errmsg is + non-NULL, the function may be safely called without holding + the Python GIL. + + The typical looping construct is as follows. + + .. code-block:: c + + NpyIter_IterNext_Fn iternext = NpyIter_GetIterNext(iter, NULL); + char** dataptr = NpyIter_GetDataPtrArray(iter); + + do { + /* use the addresses dataptr[0], ... dataptr[niter-1] */ + } while(iternext(iter)); + + When ``NPY_ITER_NO_INNER_ITERATION`` is specified, the typical + inner loop construct is as follows. + + .. code-block:: c + + NpyIter_IterNext_Fn iternext = NpyIter_GetIterNext(iter, NULL); + char** dataptr = NpyIter_GetDataPtrArray(iter); + npy_intp* stride = NpyIter_GetInnerStrideArray(iter); + npy_intp* size_ptr = NpyIter_GetInnerLoopSizePtr(iter), size; + npy_intp iiter, niter = NpyIter_GetNIter(iter); + + do { + size = *size_ptr; + while (size--) { + /* use the addresses dataptr[0], ... dataptr[niter-1] */ + for (iiter = 0; iiter < niter; ++iiter) { + dataptr[iiter] += stride[iiter]; + } + } + } while (iternext(iter)); + + Observe that we are using the dataptr array inside the iterator, not + copying the values to a local temporary. This is possible because + when ``iternext()`` is called, these pointers will be overwritten + with fresh values, not incrementally updated. + + If a compile-time fixed buffer is being used (both flags + ``NPY_ITER_BUFFERED`` and ``NPY_ITER_NO_INNER_ITERATION``), the + inner size may be used as a signal as well. The size is guaranteed + to become zero when ``iternext()`` returns false, enabling the + following loop construct.
Note that if you use this construct, + you should not pass ``NPY_ITER_GROWINNER`` as a flag, because it + will cause larger sizes under some circumstances. + + .. code-block:: c + + /* The constructor should have buffersize passed as this value */ + #define FIXED_BUFFER_SIZE 1024 + + NpyIter_IterNext_Fn iternext = NpyIter_GetIterNext(iter, NULL); + char **dataptr = NpyIter_GetDataPtrArray(iter); + npy_intp *stride = NpyIter_GetInnerStrideArray(iter); + npy_intp *size_ptr = NpyIter_GetInnerLoopSizePtr(iter), size; + npy_intp i, iiter, niter = NpyIter_GetNIter(iter); + + /* One loop with a fixed inner size */ + size = *size_ptr; + while (size == FIXED_BUFFER_SIZE) { + /* + * This loop could be manually unrolled by a factor + * which divides into FIXED_BUFFER_SIZE + */ + for (i = 0; i < FIXED_BUFFER_SIZE; ++i) { + /* use the addresses dataptr[0], ... dataptr[niter-1] */ + for (iiter = 0; iiter < niter; ++iiter) { + dataptr[iiter] += stride[iiter]; + } + } + iternext(iter); + size = *size_ptr; + } + + /* Finish-up loop with variable inner size */ + if (size > 0) do { + size = *size_ptr; + while (size--) { + /* use the addresses dataptr[0], ... dataptr[niter-1] */ + for (iiter = 0; iiter < niter; ++iiter) { + dataptr[iiter] += stride[iiter]; + } + } + } while (iternext(iter)); + +.. cfunction:: NpyIter_GetCoords_Fn NpyIter_GetGetCoords(NpyIter* iter, char** errmsg) + + Returns a function pointer for getting the coordinates + of the iterator. Returns NULL if the iterator does not + support coordinates. It is recommended that this function + pointer be cached in a local variable before the iteration + loop. + + Returns NULL if there is an error. If errmsg is non-NULL, + no Python exception is set when NULL is returned. + Instead, \*errmsg is set to an error message. When errmsg is + non-NULL, the function may be safely called without holding + the Python GIL. + +..
cfunction:: char** NpyIter_GetDataPtrArray(NpyIter* iter) + + This gives back a pointer to the ``niter`` data pointers. If + ``NPY_ITER_NO_INNER_ITERATION`` was not specified, each data + pointer points to the current data item of the iterator. If + ``NPY_ITER_NO_INNER_ITERATION`` was specified, each points to the first data + item of the inner loop. + + This pointer may be cached before the iteration loop; calling + ``iternext`` will not change it. This function may be safely + called without holding the Python GIL. + +.. cfunction:: npy_intp* NpyIter_GetIndexPtr(NpyIter* iter) + + This gives back a pointer to the index being tracked, or NULL + if no index is being tracked. It is only usable if one of + the flags ``NPY_ITER_C_INDEX`` or ``NPY_ITER_F_INDEX`` + was specified during construction. + +When the flag ``NPY_ITER_NO_INNER_ITERATION`` is used, the code +needs to know the parameters for doing the inner loop. These +functions provide that information. + +.. cfunction:: npy_intp* NpyIter_GetInnerStrideArray(NpyIter* iter) + + Returns a pointer to an array of the ``niter`` strides, + one for each iterated object, to be used by the inner loop. + + This pointer may be cached before the iteration loop; calling + ``iternext`` will not change it. This function may be safely + called without holding the Python GIL. + +.. cfunction:: npy_intp* NpyIter_GetInnerLoopSizePtr(NpyIter* iter) + + Returns a pointer to the number of iterations the + inner loop should execute. + + This address may be cached before the iteration loop; calling + ``iternext`` will not change it. The value itself may change during + iteration, in particular if buffering is enabled. This function + may be safely called without holding the Python GIL. + +.. cfunction:: void NpyIter_GetInnerFixedStrideArray(NpyIter* iter, npy_intp* out_strides) + + Gets an array of strides which are fixed, or will not change during + the entire iteration.
For strides that may change, the value + ``NPY_MAX_INTP`` is placed in the stride. + + Once the iterator is prepared for iteration (after a reset if + ``NPY_ITER_DELAY_BUFALLOC`` was used), call this to get the strides + which may be used to select a fast inner loop function. For example, + if the stride is 0, that means the inner loop can always load its + value into a variable once, then use the variable throughout the loop, + or if the stride equals the itemsize, a contiguous version for that + operand may be used. + + This function may be safely called without holding the Python GIL. diff --git a/doc/source/reference/c-api.rst b/doc/source/reference/c-api.rst index 9bcc68b49..7c7775889 100644 --- a/doc/source/reference/c-api.rst +++ b/doc/source/reference/c-api.rst @@ -44,6 +44,7 @@ code. c-api.config c-api.dtype c-api.array + c-api.iterator c-api.ufunc c-api.generalized-ufuncs c-api.coremath