 doc/neps/new-iterator-ufunc.rst | 64 ++++++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 42 insertions(+), 22 deletions(-)
diff --git a/doc/neps/new-iterator-ufunc.rst b/doc/neps/new-iterator-ufunc.rst
index 0ea96cb42..c1b805069 100644
--- a/doc/neps/new-iterator-ufunc.rst
+++ b/doc/neps/new-iterator-ufunc.rst
@@ -409,24 +409,23 @@ everywhere appropriate.
 UFuncs additionally should gain an ``order=`` parameter
 to control the layout of their output(s).
 
 The iterator can do automatic casting, and I have created a sequence
-of more permissive casting rules.  Perhaps for 2.0, NumPy could adopt
-this enum as its prefered way of dealing with casting.::
+of progressively more permissive casting rules.  Perhaps for 2.0, NumPy
+could adopt this enum as its preferred way of dealing with casting.::
 
     /* For specifying allowed casting in operations which support it */
     typedef enum {
-        /* Only allow exactly equivalent types */
+        /* Only allow identical types */
         NPY_NO_CASTING=0,
-        /* Allow casts between equivalent types of different byte orders */
-        NPY_EQUIV_CASTING=0,
+        /* Allow identical and byte swapped types */
+        NPY_EQUIV_CASTING=1,
         /* Only allow safe casts */
-        NPY_SAFE_CASTING=1,
-        /* Allow safe casts or casts within the same kind */
-        NPY_SAME_KIND_CASTING=2,
+        NPY_SAFE_CASTING=2,
+        /* Allow safe casts and casts within the same kind */
+        NPY_SAME_KIND_CASTING=3,
         /* Allow any casts */
-        NPY_UNSAFE_CASTING=3
+        NPY_UNSAFE_CASTING=4
     } NPY_CASTING;
 
-
 Iterator Rewrite
 ================
@@ -652,7 +651,7 @@ For the multi-iterator:
 
 For other API calls:
 
 =============================== =============================================
-``PyArray_ConvertToCommonType`` Iterator flag ``NPY_ITER_COMMON_DATA_TYPE``
+``PyArray_ConvertToCommonType`` Iterator flag ``NPY_ITER_COMMON_DTYPE``
 =============================== =============================================
 
@@ -743,8 +742,10 @@ Construction and Destruction
     If ``op_dtypes`` isn't ``NULL``, it specifies a data type or
     ``NULL`` for each ``op[i]``.
 
-    If the parameter ``oa_ndim`` is not 0, you must also specify
-    ``op_axes``.  These parameters let you control in detail how the
+    The parameter ``oa_ndim``, when non-zero, specifies the number of
+    dimensions that will be iterated with customized broadcasting.
+    If it is provided, ``op_axes`` must also be provided.
+    These two parameters let you control in detail how the
     axes of the operand arrays get matched together and iterated.
     In ``op_axes``, you must provide an array of ``niter`` pointers
     to ``oa_ndim``-sized arrays of type ``npy_intp``.  If an entry
@@ -754,11 +755,12 @@ Construction and Destruction
     may not be repeated.  The following example is how normal
     broadcasting applies to a 3-D array, a 2-D array, a 1-D array
     and a scalar.::
 
-        npy_intp oa_ndim = 3; npy_intp op0_axes[] = {0, 1, 2}; /* 3-D
-        operand */ npy_intp op1_axes[] = {-1, 0, 1}; /* 2-D operand */
-        npy_intp op2_axes[] = {-1, -1, 0}; /* 1-D operand */ npy_intp
-        op3_axes[] = {-1, -1, -1} /* 0-D (scalar) operand */ npy_intp
-        *op_axes[] = {op0_axes, op1_axes, op2_axes, op3_axes};
+        npy_intp oa_ndim = 3;               /* # iteration axes */
+        npy_intp op0_axes[] = {0, 1, 2};    /* 3-D operand */
+        npy_intp op1_axes[] = {-1, 0, 1};   /* 2-D operand */
+        npy_intp op2_axes[] = {-1, -1, 0};  /* 1-D operand */
+        npy_intp op3_axes[] = {-1, -1, -1}; /* 0-D (scalar) operand */
+        npy_intp *op_axes[] = {op0_axes, op1_axes, op2_axes, op3_axes};
 
     If ``buffersize`` is zero, a default buffer size is used,
     otherwise it specifies how big of a buffer to use.  Buffers
@@ -789,7 +791,7 @@ Construction and Destruction
         This flag is incompatible with ``NPY_ITER_C_ORDER_INDEX``,
         ``NPY_ITER_F_ORDER_INDEX``, and ``NPY_ITER_COORDS``.
 
-    ``NPY_ITER_COMMON_DATA_TYPE``
+    ``NPY_ITER_COMMON_DTYPE``
 
         Causes the iterator to convert all the operands to a common
         data type, calculated based on the ufunc type promotion rules.
@@ -813,6 +815,23 @@ Construction and Destruction
         be made with it.  The sum of the inner loop pointers and the
         outer loop offsets produces the innermost element addresses.
 
+        To help understand how the offsets work, here is a simple
+        nested iteration example.  Let's say our array ``a`` has shape
+        (2, 3, 4), and strides (48, 16, 4).  The data pointer for element
+        (i, j, k) is at address
+        ``PyArray_BYTES(a) + 48*i + 16*j + 4*k``.  Now consider two
+        iterators with custom op_axes (0,1) and (2,).  The first
+        one will produce addresses like
+        ``PyArray_BYTES(a) + 48*i + 16*j``, and the second one will
+        produce addresses like ``PyArray_BYTES(a) + 4*k``.  Simply
+        adding together these values would produce invalid pointers.
+        Instead, we can make the outer iterator produce offsets,
+        in which case it will produce the values ``48*i + 16*j``,
+        and its sum with the other iterator's pointer gives the
+        correct data address.  It's important to note that this
+        will not work if any of the iterators share an axis.  The
+        iterator cannot check this, so your code must handle it.
+
         This flag is incompatible with copying or buffering inputs.
 
     ``NPY_ITER_BUFFERED`` **PARTIALLY IMPLEMENTED**
@@ -1054,7 +1073,8 @@ Construction and Destruction
 ``npy_intp NpyIter_GetIterSize(NpyIter *iter)``
 
     Returns the number of times the iterator will iterate
-    starting from a new or reset state.
+    starting from a new or reset state.  If buffering is enabled,
+    it returns 0.
 
 ``int NpyIter_GetShape(NpyIter *iter, npy_intp *outshape)``
@@ -1125,7 +1145,7 @@ Functions For Iteration
 
     This gives back a pointer to the ``niter`` data pointers.  If
     ``NPY_ITER_NO_INNER_ITERATION`` was not specified, each data
     pointer points to the current data item of the iterator.  If
-    inner iteration was specified, it points to the first data
+    no inner iteration was specified, it points to the first data
     item of the inner loop.
 
     This pointer may be cached before the iteration loop, calling
@@ -1397,7 +1417,7 @@ as dramatically.  Let's use ``c`` instead of ``b`` to see how this works.::
 
 It's still a lot better than seven seconds, but still over ten times
 worse than the built-in function.  Here, the inner loop has 100 elements,
-and its iterating 10000 times.  If we were coding in C, our performance
+and it's iterating 10000 times.  If we were coding in C, our performance
 would already be as good as the built-in performance, but in Python
 there is too much overhead.
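A note on the enum renumbering in the first hunk: the old values aliased
``NPY_NO_CASTING`` and ``NPY_EQUIV_CASTING`` both to 0, while the new values
are distinct and strictly increase with permissiveness, so checking whether a
cast is allowed can reduce to a single integer comparison.  The stand-alone
sketch below illustrates that ordering; the ``cast_allowed`` helper and the
double-to-float example are invented for the demonstration and are not part
of NumPy.::

    #include <stdio.h>

    /* The enum as revised in the patch above. */
    typedef enum {
        NPY_NO_CASTING=0,        /* Only allow identical types */
        NPY_EQUIV_CASTING=1,     /* Allow identical and byte swapped types */
        NPY_SAFE_CASTING=2,      /* Only allow safe casts */
        NPY_SAME_KIND_CASTING=3, /* Safe casts and casts within a kind */
        NPY_UNSAFE_CASTING=4     /* Allow any casts */
    } NPY_CASTING;

    /* Hypothetical helper: because the values now increase monotonically
       with permissiveness, "is this cast allowed at the requested level"
       is a comparison against the minimum level permitting the cast. */
    static int
    cast_allowed(NPY_CASTING minimum_required, NPY_CASTING requested)
    {
        return requested >= minimum_required;
    }

    int
    main(void)
    {
        /* Example: a double -> float cast loses precision, so it is
           allowed under 'same_kind' but not under 'safe'. */
        NPY_CASTING required = NPY_SAME_KIND_CASTING;

        printf("safe:      %d\n", cast_allowed(required, NPY_SAFE_CASTING));
        printf("same_kind: %d\n", cast_allowed(required, NPY_SAME_KIND_CASTING));
        return 0;
    }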
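The ``op_axes`` example in the ``@@ -754`` hunk only declares the arrays.  To
show where they plug in, here is a sketch of constructing a multi-iterator
from them with the ``NpyIter_MultiNew`` signature this NEP proposes
(``niter``, ``op``, flags, order, casting, per-operand flags, ``op_dtypes``,
``oa_ndim``, ``op_axes``, ``buffersize``).  The operands ``a3d``, ``a2d``,
``a1d`` and ``a0d`` are assumed to exist, and released NumPy eventually moved
the ``op_axes`` parameters to ``NpyIter_AdvancedNew``, so treat the call as
illustrative rather than exact.::

    /* Requires Python.h and numpy/arrayobject.h.  Sketch only. */
    static NpyIter *
    make_broadcast_iter(PyArrayObject *a3d, PyArrayObject *a2d,
                        PyArrayObject *a1d, PyArrayObject *a0d)
    {
        PyArrayObject *op[] = {a3d, a2d, a1d, a0d};
        npy_uint32 op_flags[] = {NPY_ITER_READONLY, NPY_ITER_READONLY,
                                 NPY_ITER_READONLY, NPY_ITER_READONLY};

        npy_intp oa_ndim = 3;               /* # iteration axes */
        npy_intp op0_axes[] = {0, 1, 2};    /* 3-D operand */
        npy_intp op1_axes[] = {-1, 0, 1};   /* 2-D operand */
        npy_intp op2_axes[] = {-1, -1, 0};  /* 1-D operand */
        npy_intp op3_axes[] = {-1, -1, -1}; /* 0-D (scalar) operand */
        npy_intp *op_axes[] = {op0_axes, op1_axes, op2_axes, op3_axes};

        /* Returns NULL with a Python exception set on failure. */
        return NpyIter_MultiNew(4, op,
                                0,              /* no global flags */
                                NPY_KEEPORDER,  /* order */
                                NPY_NO_CASTING, /* casting */
                                op_flags,
                                NULL,           /* op_dtypes: no conversion */
                                oa_ndim, op_axes,
                                0);             /* default buffer size */
    }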
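The nested-iteration paragraph added in the ``@@ -813`` hunk is pure pointer
arithmetic and can be verified with plain C.  This stand-alone sketch mirrors
its numbers, with an outer loop playing the offset-producing iterator over
axes (0, 1) and an inner loop playing the pointer-producing iterator over
axis (2,); ``base`` stands in for ``PyArray_BYTES(a)``.::

    #include <stddef.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* A (2, 3, 4) array of 4-byte elements: strides (48, 16, 4). */
        char buffer[2*3*4*4];
        char *base = buffer;                 /* plays PyArray_BYTES(a) */
        const ptrdiff_t s0 = 48, s1 = 16, s2 = 4;

        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 3; j++) {
                /* Outer iterator over axes (0, 1) in offset mode:
                   yields the offset 48*i + 16*j, not a pointer. */
                ptrdiff_t outer_offset = s0*i + s1*j;

                for (int k = 0; k < 4; k++) {
                    /* Inner iterator over axis (2,): yields a real
                       pointer, base + 4*k. */
                    char *inner_ptr = base + s2*k;

                    /* Their sum addresses element (i, j, k); adding
                       two full pointers instead would be invalid. */
                    if (inner_ptr + outer_offset !=
                            base + s0*i + s1*j + s2*k) {
                        printf("mismatch at (%d, %d, %d)\n", i, j, k);
                        return 1;
                    }
                }
            }
        }
        printf("all 24 element addresses check out\n");
        return 0;
    }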
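Finally, the ``@@ -1125`` hunk describes caching the data pointer array
before the iteration loop.  A minimal sketch of that pattern follows, written
with the spellings released NumPy settled on (``NpyIter_IterNextFunc`` and
the two-argument ``NpyIter_GetIterNext``); the NEP's draft names differ
slightly, and ``iter`` is assumed to be a valid, non-empty iterator over one
double operand, built without ``NPY_ITER_NO_INNER_ITERATION``.::

    /* Sum every element of operand 0, caching the data pointer
       array once before the loop as the hunk describes. */
    static double
    sum_first_operand(NpyIter *iter)
    {
        NpyIter_IterNextFunc *iternext = NpyIter_GetIterNext(iter, NULL);
        char **dataptr = NpyIter_GetDataPtrArray(iter);  /* cache once */
        double sum = 0.0;

        if (iternext == NULL) {
            return -1.0;  /* error already set by NumPy */
        }
        do {
            /* dataptr[i] always points at operand i's current item;
               the array contents are updated by each iternext call. */
            sum += *(double *)dataptr[0];
        } while (iternext(iter));

        return sum;
    }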