diff options
author | Gagandeep Singh <gdp.1807@gmail.com> | 2021-11-02 11:28:17 +0530 |
---|---|---|
committer | Gagandeep Singh <gdp.1807@gmail.com> | 2021-11-02 11:28:17 +0530 |
commit | c04509e86e97a69a0b5fcbeebdbec66faad3dbe0 (patch) | |
tree | b5940db3ad46e55b88d566ec058007dc3c6b7e72 /doc/source/dev/internals.code-explanations.rst | |
parent | 56647dd47345a7fd24b4ee8d9d52025fcdc3b9ae (diff) | |
parent | fae6fa47a3cf9b9c64af2f5bd11a3b644b1763d2 (diff) | |
download | numpy-c04509e86e97a69a0b5fcbeebdbec66faad3dbe0.tar.gz |
resolved conflicts
Diffstat (limited to 'doc/source/dev/internals.code-explanations.rst')
-rw-r--r-- | doc/source/dev/internals.code-explanations.rst | 646 |
1 files changed, 646 insertions, 0 deletions
diff --git a/doc/source/dev/internals.code-explanations.rst b/doc/source/dev/internals.code-explanations.rst new file mode 100644 index 000000000..b6edd61b1 --- /dev/null +++ b/doc/source/dev/internals.code-explanations.rst @@ -0,0 +1,646 @@ +.. currentmodule:: numpy + +.. _c-code-explanations: + +************************* +NumPy C code explanations +************************* + + Fanaticism consists of redoubling your efforts when you have forgotten + your aim. + --- *George Santayana* + + An authority is a person who can tell you more about something than + you really care to know. + --- *Unknown* + +This page attempts to explain the logic behind some of the new +pieces of code. The purpose behind these explanations is to enable +somebody to be able to understand the ideas behind the implementation +somewhat more easily than just staring at the code. Perhaps in this +way, the algorithms can be improved on, borrowed from, and/or +optimized by more people. + + +Memory model +============ + +.. index:: + pair: ndarray; memory model + +One fundamental aspect of the :class:`ndarray` is that an array is seen as a +"chunk" of memory starting at some location. The interpretation of +this memory depends on the :term:`stride` information. For each dimension in +an :math:`N`-dimensional array, an integer (:term:`stride`) dictates how many +bytes must be skipped to get to the next element in that dimension. +Unless you have a single-segment array, this :term:`stride` information must +be consulted when traversing through an array. It is not difficult to +write code that accepts strides, you just have to use ``char*`` +pointers because strides are in units of bytes. Keep in mind also that +strides do not have to be unit-multiples of the element size. Also, +remember that if the number of dimensions of the array is 0 (sometimes +called a ``rank-0`` array), then the :term:`strides <stride>` and +:term:`dimensions <dimension>` variables are ``NULL``. + +Besides the structural information contained in the strides and +dimensions members of the :c:type:`PyArrayObject`, the flags contain +important information about how the data may be accessed. In particular, +the :c:data:`NPY_ARRAY_ALIGNED` flag is set when the memory is on a +suitable boundary according to the datatype array. Even if you have +a :term:`contiguous` chunk of memory, you cannot just assume it is safe to +dereference a datatype-specific pointer to an element. Only if the +:c:data:`NPY_ARRAY_ALIGNED` flag is set, this is a safe operation. On +some platforms it will work but on others, like Solaris, it will cause +a bus error. The :c:data:`NPY_ARRAY_WRITEABLE` should also be ensured +if you plan on writing to the memory area of the array. It is also +possible to obtain a pointer to an unwritable memory area. Sometimes, +writing to the memory area when the :c:data:`NPY_ARRAY_WRITEABLE` flag is not +set will just be rude. Other times it can cause program crashes (*e.g.* +a data-area that is a read-only memory-mapped file). + + +Data-type encapsulation +======================= + +.. seealso:: :ref:`arrays.dtypes` + +.. index:: + single: dtype + +The :ref:`datatype <arrays.dtypes>` is an important abstraction of the +:class:`ndarray`. Operations +will look to the datatype to provide the key functionality that is +needed to operate on the array. This functionality is provided in the +list of function pointers pointed to by the ``f`` member of the +:c:type:`PyArray_Descr` structure. In this way, the number of datatypes can be +extended simply by providing a :c:type:`PyArray_Descr` structure with suitable +function pointers in the ``f`` member. For built-in types, there are some +optimizations that bypass this mechanism, but the point of the datatype +abstraction is to allow new datatypes to be added. + +One of the built-in datatypes, the :class:`void` datatype allows for +arbitrary :term:`structured types <structured data type>` containing 1 or more +fields as elements of the array. A :term:`field` is simply another datatype +object along with an offset into the current structured type. In order to +support arbitrarily nested fields, several recursive implementations of +datatype access are implemented for the void type. A common idiom is to cycle +through the elements of the dictionary and perform a specific operation based on +the datatype object stored at the given offset. These offsets can be +arbitrary numbers. Therefore, the possibility of encountering misaligned +data must be recognized and taken into account if necessary. + + +N-D Iterators +============= + +.. seealso:: :ref:`arrays.nditer` + +.. index:: + single: array iterator + +A very common operation in much of NumPy code is the need to iterate +over all the elements of a general, strided, N-dimensional array. This +operation of a general-purpose N-dimensional loop is abstracted in the +notion of an iterator object. To write an N-dimensional loop, you only +have to create an iterator object from an ndarray, work with the +:c:member:`dataptr <PyArrayIterObject.dataptr>` member of the iterator object +structure and call the macro :c:func:`PyArray_ITER_NEXT` on the iterator +object to move to the next element. The ``next`` element is always in +C-contiguous order. The macro works by first special-casing the C-contiguous, +1-D, and 2-D cases which work very simply. + +For the general case, the iteration works by keeping track of a list +of coordinate counters in the iterator object. At each iteration, the +last coordinate counter is increased (starting from 0). If this +counter is smaller than one less than the size of the array in that +dimension (a pre-computed and stored value), then the counter is +increased and the :c:member:`dataptr <PyArrayIterObject.dataptr>` member is +increased by the strides in that +dimension and the macro ends. If the end of a dimension is reached, +the counter for the last dimension is reset to zero and the +:c:member:`dataptr <PyArrayIterObject.dataptr>` is +moved back to the beginning of that dimension by subtracting the +strides value times one less than the number of elements in that +dimension (this is also pre-computed and stored in the +:c:member:`backstrides <PyArrayIterObject.backstrides>` +member of the iterator object). In this case, the macro does not end, +but a local dimension counter is decremented so that the next-to-last +dimension replaces the role that the last dimension played and the +previously-described tests are executed again on the next-to-last +dimension. In this way, the :c:member:`dataptr <PyArrayIterObject.dataptr>` +is adjusted appropriately for arbitrary striding. + +The :c:member:`coordinates <PyArrayIterObject.coordinates>` member of the +:c:type:`PyArrayIterObject` structure maintains +the current N-d counter unless the underlying array is C-contiguous in +which case the coordinate counting is bypassed. The +:c:member:`index <PyArrayIterObject.index>` member of +the :c:type:`PyArrayIterObject` keeps track of the current flat index of the +iterator. It is updated by the :c:func:`PyArray_ITER_NEXT` macro. + + +Broadcasting +============ + +.. seealso:: :ref:`basics.broadcasting` + +.. index:: + single: broadcasting + +In Numeric, the ancestor of NumPy, broadcasting was implemented in several +lines of code buried deep in ``ufuncobject.c``. In NumPy, the notion of +broadcasting has been abstracted so that it can be performed in multiple places. +Broadcasting is handled by the function :c:func:`PyArray_Broadcast`. This +function requires a :c:type:`PyArrayMultiIterObject` (or something that is a +binary equivalent) to be passed in. The :c:type:`PyArrayMultiIterObject` keeps +track of the broadcast number of dimensions and size in each +dimension along with the total size of the broadcast result. It also +keeps track of the number of arrays being broadcast and a pointer to +an iterator for each of the arrays being broadcast. + +The :c:func:`PyArray_Broadcast` function takes the iterators that have already +been defined and uses them to determine the broadcast shape in each +dimension (to create the iterators at the same time that broadcasting +occurs then use the :c:func:`PyArray_MultiIterNew` function). +Then, the iterators are +adjusted so that each iterator thinks it is iterating over an array +with the broadcast size. This is done by adjusting the iterators +number of dimensions, and the :term:`shape` in each dimension. This works +because the iterator strides are also adjusted. Broadcasting only +adjusts (or adds) length-1 dimensions. For these dimensions, the +strides variable is simply set to 0 so that the data-pointer for the +iterator over that array doesn't move as the broadcasting operation +operates over the extended dimension. + +Broadcasting was always implemented in Numeric using 0-valued strides +for the extended dimensions. It is done in exactly the same way in +NumPy. The big difference is that now the array of strides is kept +track of in a :c:type:`PyArrayIterObject`, the iterators involved in a +broadcast result are kept track of in a :c:type:`PyArrayMultiIterObject`, +and the :c:func:`PyArray_Broadcast` call implements the +:ref:`general-broadcasting-rules`. + + +Array Scalars +============= + +.. seealso:: :ref:`arrays.scalars` + +.. index:: + single: array scalars + +The array scalars offer a hierarchy of Python types that allow a one-to-one +correspondence between the datatype stored in an array and the +Python-type that is returned when an element is extracted from the +array. An exception to this rule was made with object arrays. Object +arrays are heterogeneous collections of arbitrary Python objects. When +you select an item from an object array, you get back the original +Python object (and not an object array scalar which does exist but is +rarely used for practical purposes). + +The array scalars also offer the same methods and attributes as arrays +with the intent that the same code can be used to support arbitrary +dimensions (including 0-dimensions). The array scalars are read-only +(immutable) with the exception of the void scalar which can also be +written to so that structured array field setting works more naturally +(``a[0]['f1'] = value``). + + +Indexing +======== + +.. seealso:: :ref:`basics.indexing`, :ref:`arrays.indexing` + +.. index:: + single: indexing + +All Python indexing operations ``arr[index]`` are organized by first preparing +the index and finding the index type. The supported index types are: + +* integer +* :const:`newaxis` +* :term:`python:slice` +* :py:data:`Ellipsis` +* integer arrays/array-likes (advanced) +* boolean (single boolean array); if there is more than one boolean array as + the index or the shape does not match exactly, the boolean array will be + converted to an integer array instead. +* 0-d boolean (and also integer); 0-d boolean arrays are a special + case that has to be handled in the advanced indexing code. They signal + that a 0-d boolean array had to be interpreted as an integer array. + +As well as the scalar array special case signaling that an integer array +was interpreted as an integer index, which is important because an integer +array index forces a copy but is ignored if a scalar is returned (full integer +index). The prepared index is guaranteed to be valid with the exception of +out of bound values and broadcasting errors for advanced indexing. This +includes that an :py:data:`Ellipsis` is added for incomplete indices for +example when a two-dimensional array is indexed with a single integer. + +The next step depends on the type of index which was found. If all +dimensions are indexed with an integer a scalar is returned or set. A +single boolean indexing array will call specialized boolean functions. +Indices containing an :py:data:`Ellipsis` or :term:`python:slice` but no +advanced indexing will always create a view into the old array by calculating +the new strides and memory offset. This view can then either be returned or, +for assignments, filled using ``PyArray_CopyObject``. Note that +``PyArray_CopyObject`` may also be called on temporary arrays in other branches +to support complicated assignments when the array is of object :class:`dtype`. + +Advanced indexing +----------------- + +By far the most complex case is advanced indexing, which may or may not be +combined with typical view-based indexing. Here integer indices are +interpreted as view-based. Before trying to understand this, you may want +to make yourself familiar with its subtleties. The advanced indexing code +has three different branches and one special case: + +* There is one indexing array and it, as well as the assignment array, can + be iterated trivially. For example, they may be contiguous. Also, the + indexing array must be of :class:`intp` type and the value array in + assignments should be of the correct type. This is purely a fast path. +* There are only integer array indices so that no subarray exists. +* View-based and advanced indexing is mixed. In this case, the view-based + indexing defines a collection of subarrays that are combined by the + advanced indexing. For example, ``arr[[1, 2, 3], :]`` is created by + vertically stacking the subarrays ``arr[1, :]``, ``arr[2, :]``, and + ``arr[3, :]``. +* There is a subarray but it has exactly one element. This case can be handled + as if there is no subarray but needs some care during setup. + +Deciding what case applies, checking broadcasting, and determining the kind +of transposition needed are all done in :c:func:`PyArray_MapIterNew`. After +setting up, there are two cases. If there is no subarray or it only has one +element, no subarray iteration is necessary and an iterator is prepared +which iterates all indexing arrays *as well as* the result or value array. +If there is a subarray, there are three iterators prepared. One for the +indexing arrays, one for the result or value array (minus its subarray), +and one for the subarrays of the original and the result/assignment array. +The first two iterators give (or allow calculation) of the pointers into +the start of the subarray, which then allows restarting the subarray +iteration. + +When advanced indices are next to each other transposing may be necessary. +All necessary transposing is handled by :c:func:`PyArray_MapIterSwapAxes` and +has to be handled by the caller unless :c:func:`PyArray_MapIterNew` is asked to +allocate the result. + +After preparation, getting and setting are relatively straightforward, +although the different modes of iteration need to be considered. Unless +there is only a single indexing array during item getting, the validity of +the indices is checked beforehand. Otherwise, it is handled in the inner +loop itself for optimization. + +.. _ufuncs-internals: + +Universal functions +=================== + +.. seealso:: :ref:`ufuncs`, :ref:`ufuncs-basics` + +.. index:: + single: ufunc + +Universal functions are callable objects that take :math:`N` inputs +and produce :math:`M` outputs by wrapping basic 1-D loops that work +element-by-element into full easy-to-use functions that seamlessly +implement :ref:`broadcasting <basics.broadcasting>`, +:ref:`type-checking <ufuncs.casting>`, +:ref:`buffered coercion <use-of-internal-buffers>`, and +:ref:`output-argument handling <ufuncs-output-type>`. New universal functions +are normally created in C, although there is a mechanism for creating ufuncs +from Python functions (:func:`frompyfunc`). The user must supply a 1-D loop that +implements the basic function taking the input scalar values and +placing the resulting scalars into the appropriate output slots as +explained in implementation. + + +Setup +----- + +Every :class:`ufunc` calculation involves some overhead related to setting up +the calculation. The practical significance of this overhead is that +even though the actual calculation of the ufunc is very fast, you will +be able to write array and type-specific code that will work faster +for small arrays than the ufunc. In particular, using ufuncs to +perform many calculations on 0-D arrays will be slower than other +Python-based solutions (the silently-imported ``scalarmath`` module exists +precisely to give array scalars the look-and-feel of ufunc based +calculations with significantly reduced overhead). + +When a :class:`ufunc` is called, many things must be done. The information +collected from these setup operations is stored in a loop object. This +loop object is a C-structure (that could become a Python object but is +not initialized as such because it is only used internally). This loop +object has the layout needed to be used with :c:func:`PyArray_Broadcast` +so that the broadcasting can be handled in the same way as it is handled in +other sections of code. + +The first thing done is to look up in the thread-specific global +dictionary the current values for the buffer-size, the error mask, and +the associated error object. The state of the error mask controls what +happens when an error condition is found. It should be noted that +checking of the hardware error flags is only performed after each 1-D +loop is executed. This means that if the input and output arrays are +contiguous and of the correct type so that a single 1-D loop is +performed, then the flags may not be checked until all elements of the +array have been calculated. Looking up these values in a thread-specific +dictionary takes time which is easily ignored for all but +very small arrays. + +After checking, the thread-specific global variables, the inputs are +evaluated to determine how the ufunc should proceed and the input and +output arrays are constructed if necessary. Any inputs which are not +arrays are converted to arrays (using context if necessary). Which of +the inputs are scalars (and therefore converted to 0-D arrays) is +noted. + +Next, an appropriate 1-D loop is selected from the 1-D loops available +to the :class:`ufunc` based on the input array types. This 1-D loop is selected +by trying to match the signature of the datatypes of the inputs +against the available signatures. The signatures corresponding to +built-in types are stored in the :attr:`ufunc.types` member of the ufunc +structure. The signatures corresponding to user-defined types are stored in a +linked list of function information with the head element stored as a +``CObject`` in the ``userloops`` dictionary keyed by the datatype number +(the first user-defined type in the argument list is used as the key). +The signatures are searched until a signature is found to which the +input arrays can all be cast safely (ignoring any scalar arguments +which are not allowed to determine the type of the result). The +implication of this search procedure is that "lesser types" should be +placed below "larger types" when the signatures are stored. If no 1-D +loop is found, then an error is reported. Otherwise, the ``argument_list`` +is updated with the stored signature --- in case casting is necessary +and to fix the output types assumed by the 1-D loop. + +If the ufunc has 2 inputs and 1 output and the second input is an +``Object`` array then a special-case check is performed so that +``NotImplemented`` is returned if the second input is not an ndarray, has +the :obj:`~numpy.class.__array_priority__` attribute, and has an ``__r{op}__`` +special method. In this way, Python is signaled to give the other object a +chance to complete the operation instead of using generic object-array +calculations. This allows (for example) sparse matrices to override +the multiplication operator 1-D loop. + +For input arrays that are smaller than the specified buffer size, +copies are made of all non-contiguous, misaligned, or out-of-byteorder +arrays to ensure that for small arrays, a single loop is +used. Then, array iterators are created for all the input arrays and +the resulting collection of iterators is broadcast to a single shape. + +The output arguments (if any) are then processed and any missing +return arrays are constructed. If any provided output array doesn't +have the correct type (or is misaligned) and is smaller than the +buffer size, then a new output array is constructed with the special +:c:data:`NPY_ARRAY_WRITEBACKIFCOPY` flag set. At the end of the function, +:c:func:`PyArray_ResolveWritebackIfCopy` is called so that +its contents will be copied back into the output array. +Iterators for the output arguments are then processed. + +Finally, the decision is made about how to execute the looping +mechanism to ensure that all elements of the input arrays are combined +to produce the output arrays of the correct type. The options for loop +execution are one-loop (for :term`contiguous`, aligned, and correct data +type), strided-loop (for non-contiguous but still aligned and correct +data type), and a buffered loop (for misaligned or incorrect data +type situations). Depending on which execution method is called for, +the loop is then set up and computed. + + +Function call +------------- + +This section describes how the basic universal function computation loop is +set up and executed for each of the three different kinds of execution. If +:c:data:`NPY_ALLOW_THREADS` is defined during compilation, then as long as +no object arrays are involved, the Python Global Interpreter Lock (GIL) is +released prior to calling the loops. It is re-acquired if necessary to +handle error conditions. The hardware error flags are checked only after +the 1-D loop is completed. + + +One loop +^^^^^^^^ + +This is the simplest case of all. The ufunc is executed by calling the +underlying 1-D loop exactly once. This is possible only when we have +aligned data of the correct type (including byteorder) for both input +and output and all arrays have uniform strides (either :term:`contiguous`, +0-D, or 1-D). In this case, the 1-D computational loop is called once +to compute the calculation for the entire array. Note that the +hardware error flags are only checked after the entire calculation is +complete. + + +Strided loop +^^^^^^^^^^^^ + +When the input and output arrays are aligned and of the correct type, +but the striding is not uniform (non-contiguous and 2-D or larger), +then a second looping structure is employed for the calculation. This +approach converts all of the iterators for the input and output +arguments to iterate over all but the largest dimension. The inner +loop is then handled by the underlying 1-D computational loop. The +outer loop is a standard iterator loop on the converted iterators. The +hardware error flags are checked after each 1-D loop is completed. + + +Buffered loop +^^^^^^^^^^^^^ + +This is the code that handles the situation whenever the input and/or +output arrays are either misaligned or of the wrong datatype +(including being byteswapped) from what the underlying 1-D loop +expects. The arrays are also assumed to be non-contiguous. The code +works very much like the strided-loop except for the inner 1-D loop is +modified so that pre-processing is performed on the inputs and post-processing +is performed on the outputs in ``bufsize`` chunks (where +``bufsize`` is a user-settable parameter). The underlying 1-D +computational loop is called on data that is copied over (if it needs +to be). The setup code and the loop code is considerably more +complicated in this case because it has to handle: + +- memory allocation of the temporary buffers + +- deciding whether or not to use buffers on the input and output data + (misaligned and/or wrong datatype) + +- copying and possibly casting data for any inputs or outputs for which + buffers are necessary. + +- special-casing ``Object`` arrays so that reference counts are properly + handled when copies and/or casts are necessary. + +- breaking up the inner 1-D loop into ``bufsize`` chunks (with a possible + remainder). + +Again, the hardware error flags are checked at the end of each 1-D +loop. + + +Final output manipulation +------------------------- + +Ufuncs allow other array-like classes to be passed seamlessly through +the interface in that inputs of a particular class will induce the +outputs to be of that same class. The mechanism by which this works is +the following. If any of the inputs are not ndarrays and define the +:obj:`~numpy.class.__array_wrap__` method, then the class with the largest +:obj:`~numpy.class.__array_priority__` attribute determines the type of all the +outputs (with the exception of any output arrays passed in). The +:obj:`~numpy.class.__array_wrap__` method of the input array will be called +with the ndarray being returned from the ufunc as its input. There are two +calling styles of the :obj:`~numpy.class.__array_wrap__` function supported. +The first takes the ndarray as the first argument and a tuple of "context" as +the second argument. The context is (ufunc, arguments, output argument +number). This is the first call tried. If a ``TypeError`` occurs, then the +function is called with just the ndarray as the first argument. + + +Methods +------- + +There are three methods of ufuncs that require calculation similar to +the general-purpose ufuncs. These are :meth:`ufunc.reduce`, +:meth:`ufunc.accumulate`, and :meth:`ufunc.reduceat`. Each of these +methods requires a setup command followed by a +loop. There are four loop styles possible for the methods +corresponding to no-elements, one-element, strided-loop, and buffered-loop. +These are the same basic loop styles as implemented for the +general-purpose function call except for the no-element and one-element +cases which are special-cases occurring when the input array +objects have 0 and 1 elements respectively. + + +Setup +^^^^^ + +The setup function for all three methods is ``construct_reduce``. +This function creates a reducing loop object and fills it with the +parameters needed to complete the loop. All of the methods only work +on ufuncs that take 2-inputs and return 1 output. Therefore, the +underlying 1-D loop is selected assuming a signature of ``[otype, +otype, otype]`` where ``otype`` is the requested reduction +datatype. The buffer size and error handling are then retrieved from +(per-thread) global storage. For small arrays that are misaligned or +have incorrect datatype, a copy is made so that the un-buffered +section of code is used. Then, the looping strategy is selected. If +there is 1 element or 0 elements in the array, then a simple looping +method is selected. If the array is not misaligned and has the +correct datatype, then strided looping is selected. Otherwise, +buffered looping must be performed. Looping parameters are then +established, and the return array is constructed. The output array is +of a different :term:`shape` depending on whether the method is +:meth:`reduce <ufunc.reduce>`, :meth:`accumulate <ufunc.accumulate>`, or +:meth:`reduceat <ufunc.reduceat>`. If an output array is already provided, then +its shape is checked. If the output array is not C-contiguous, +aligned, and of the correct data type, then a temporary copy is made +with the :c:data:`NPY_ARRAY_WRITEBACKIFCOPY` flag set. In this way, the methods +will be able to work with a well-behaved output array but the result will be +copied back into the true output array when +:c:func:`PyArray_ResolveWritebackIfCopy` is called at function completion. +Finally, iterators are set up to loop over the correct :term:`axis` +(depending on the value of axis provided to the method) and the setup +routine returns to the actual computation routine. + + +:meth:`Reduce <ufunc.reduce>` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. index:: + triple: ufunc; methods; reduce + +All of the ufunc methods use the same underlying 1-D computational +loops with input and output arguments adjusted so that the appropriate +reduction takes place. For example, the key to the functioning of +:meth:`reduce <ufunc.reduce>` is that the 1-D loop is called with the output +and the second input pointing to the same position in memory and both having +a step-size of 0. The first input is pointing to the input array with a +step-size given by the appropriate stride for the selected axis. In this +way, the operation performed is + +.. math:: + :nowrap: + + \begin{align*} + o & = & i[0] \\ + o & = & i[k]\textrm{<op>}o\quad k=1\ldots N + \end{align*} + +where :math:`N+1` is the number of elements in the input, :math:`i`, +:math:`o` is the output, and :math:`i[k]` is the +:math:`k^{\textrm{th}}` element of :math:`i` along the selected axis. +This basic operation is repeated for arrays with greater than 1 +dimension so that the reduction takes place for every 1-D sub-array +along the selected axis. An iterator with the selected dimension +removed handles this looping. + +For buffered loops, care must be taken to copy and cast data before +the loop function is called because the underlying loop expects +aligned data of the correct datatype (including byteorder). The +buffered loop must handle this copying and casting prior to calling +the loop function on chunks no greater than the user-specified +``bufsize``. + + +:meth:`Accumulate <ufunc.accumulate>` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. index:: + triple: ufunc; methods; accumulate + +The :meth:`accumulate <ufunc.accumulate>` method is very similar to +the :meth:`reduce <ufunc.reduce>` method in that +the output and the second input both point to the output. The +difference is that the second input points to memory one stride behind +the current output pointer. Thus, the operation performed is + +.. math:: + :nowrap: + + \begin{align*} + o[0] & = & i[0] \\ + o[k] & = & i[k]\textrm{<op>}o[k-1]\quad k=1\ldots N. + \end{align*} + +The output has the same shape as the input and each 1-D loop operates +over :math:`N` elements when the shape in the selected axis is :math:`N+1`. +Again, buffered loops take care to copy and cast the data before +calling the underlying 1-D computational loop. + + +:meth:`Reduceat <ufunc.reduceat>` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. index:: + triple: ufunc; methods; reduceat + single: ufunc + +The :meth:`reduceat <ufunc.reduceat>` function is a generalization of both the +:meth:`reduce <ufunc.reduce>` and :meth:`accumulate <ufunc.accumulate>` +functions. It implements a :meth:`reduce <ufunc.reduce>` over ranges of +the input array specified by indices. The extra indices argument is checked to +be sure that every input is not too large for the input array along +the selected dimension before the loop calculations take place. The +loop implementation is handled using code that is very similar to the +:meth:`reduce <ufunc.reduce>` code repeated as many times as there are elements +in the indices input. In particular: the first input pointer passed to the +underlying 1-D computational loop points to the input array at the +correct location indicated by the index array. In addition, the output +pointer and the second input pointer passed to the underlying 1-D loop +point to the same position in memory. The size of the 1-D +computational loop is fixed to be the difference between the current +index and the next index (when the current index is the last index, +then the next index is assumed to be the length of the array along the +selected dimension). In this way, the 1-D loop will implement a +:meth:`reduce <ufunc.reduce>` over the specified indices. + +Misaligned or a loop datatype that does not match the input and/or +output datatype is handled using buffered code wherein data is +copied to a temporary buffer and cast to the correct datatype if +necessary prior to calling the underlying 1-D function. The temporary +buffers are created in (element) sizes no bigger than the user +settable buffer-size value. Thus, the loop must be flexible enough to +call the underlying 1-D computational loop enough times to complete +the total calculation in chunks no bigger than the buffer-size. |