author | Charles Harris <charlesr.harris@gmail.com> | 2011-08-27 21:46:08 -0600 |
---|---|---|
committer | Charles Harris <charlesr.harris@gmail.com> | 2011-08-27 21:46:08 -0600 |
commit | 9ecd91b7bf8c77d696ec9856ba10896d8f60309a (patch) | |
tree | 9884131ece5eada06212538c591965bf5928afa2 /doc/source/reference | |
parent | aa55ba7437fbe6b8772a360a641b5aa7d3e669e0 (diff) | |
parent | 10fac981763e87f949bed15c66127fc380fa9b27 (diff) | |
download | numpy-9ecd91b7bf8c77d696ec9856ba10896d8f60309a.tar.gz |
Merge branch 'pull-141'
* pull-141: (167 commits)
ENH: missingdata: Make PyArray_Converter and PyArray_OutputConverter safer for legacy code
DOC: missingdata: Add a mention of the design NEP, and masks vs bitpatterns
DOC: missingdata: Updates from pull request feedback
DOC: missingdata: Updates based on pull request feedback
ENH: nditer: Change the Python nditer exposure to automatically add NPY_ITER_USE_MASKNA
ENH: missingdata: Make comparisons with NA return NA(dtype='bool')
BLD: core: onefile build fix and Python3 compatibility change
DOC: Mention the update to np.all and np.any in the release notes
TST: dtype: Adjust void dtype test to pass without raising a zero-size exception
STY: Remove trailing whitespace
TST: missingdata: Write some tests for the np.any and np.all NA behavior
ENH: missingdata: Make numpy.all follow the NA && False == False rule
ENH: missingdata: Make numpy.all follow the NA || True == True rule
DOC: missingdata: Also show what assigning a non-NA value does in each case
DOC: missingdata: Add introductory documentation for NA-masked arrays
ENH: core: Rename PyArrayObject_fieldaccess to PyArrayObject_fields
DOC: missingdata: Some tweaks to the NA mask documentation
DOC: missingdata: Add example of a C-API function supporting NA masks
DOC: missingdata: Documenting C API for NA-masked arrays
ENH: missingdata: Finish adding C-API access to the NpyNA object
...
Diffstat (limited to 'doc/source/reference')
-rw-r--r-- | doc/source/reference/arrays.maskna.rst | 301 |
-rw-r--r-- | doc/source/reference/arrays.rst | 1 |
-rw-r--r-- | doc/source/reference/c-api.array.rst | 332 |
-rw-r--r-- | doc/source/reference/c-api.iterator.rst | 87 |
-rw-r--r-- | doc/source/reference/c-api.maskna.rst | 592 |
-rw-r--r-- | doc/source/reference/c-api.rst | 1 |
-rw-r--r-- | doc/source/reference/routines.maskna.rst | 11 |
-rw-r--r-- | doc/source/reference/routines.rst | 1 |
-rw-r--r-- | doc/source/reference/routines.sort.rst | 1 |
9 files changed, 1219 insertions, 108 deletions
diff --git a/doc/source/reference/arrays.maskna.rst b/doc/source/reference/arrays.maskna.rst
new file mode 100644
index 000000000..2faabde83
--- /dev/null
+++ b/doc/source/reference/arrays.maskna.rst
@@ -0,0 +1,301 @@
+.. currentmodule:: numpy
+
+.. _arrays.maskna:
+
+****************
+NA-Masked Arrays
+****************
+
+.. versionadded:: 1.7.0
+
+NumPy 1.7 adds preliminary support for missing values using an interface
+based on an NA (Not Available) placeholder, implemented as masks in the
+core ndarray. This system is highly flexible, allowing NAs to be used
+with any underlying dtype, and supports creating multiple views of the same
+data with different choices of NAs.
+
+Other Missing Data Approaches
+=============================
+
+The previous recommended approach for working with missing values was the
+:mod:`numpy.ma` module, a subclass of ndarray written purely in Python.
+By placing NA-masks directly in the NumPy core, it's possible to avoid
+the need for calling "ma.<func>(arr)" instead of "np.<func>(arr)".
+
+Another approach many people have taken is to use NaN as the
+placeholder for missing values. There are a few functions
+like :func:`numpy.nansum` which behave similarly to usage of the
+ufunc.reduce *skipna* parameter.
+
+As experienced in the R language, a programming interface based on an
+NA placeholder is generally more intuitive to work with than direct
+mask manipulation.
+
+Missing Data Model
+==================
+
+The model adopted by NumPy for missing values is that NA is a
+placeholder for a value which is there, but is unknown to computations.
+The value may be temporarily hidden by the mask, or may be unknown
+for any reason, but could be any value the dtype of the array is able
+to hold.
+
+This model affects computations in specific, well-defined ways. Any time
+we have a computation, like *c = NA + 1*, we must reason about whether
+*c* will be an NA or not. The NA is not available now, but maybe a
+measurement will be made later to determine what its value is, so anything
+we calculate must be consistent with it eventually being revealed. One way
+to do this is with thought experiments imagining we have discovered
+the value of this NA. If the NA is 0, then *c* is 1. If the NA is
+100, then *c* is 101. Because the value of *c* is ambiguous, it
+isn't available either, so must be NA as well.
+
+A consequence of separating the NA model from the dtype is that, unlike
+in R, NaNs are not considered to be NA. An NA is a value that is completely
+unknown, whereas a NaN is usually the result of an invalid computation
+as defined in the IEEE 754 floating point arithmetic specification.
+
+Most computations whose input is NA will output NA as well, a property
+known as propagation. Some operations, however, always produce the
+same result no matter what the value of the NA is. The clearest
+example of this is with the logical operations *and* and *or*. Since both
+np.logical_or(True, True) and np.logical_or(False, True) are True,
+all possible boolean values on the left hand side produce the
+same answer. This means that np.logical_or(np.NA, True) can produce
+True instead of the more conservative np.NA. There is a similar case
+for np.logical_and.
+
+A similar, but slightly deceptive, example is wanting to treat (NA * 0.0)
+as 0.0 instead of as NA. This is invalid because the NA might be Inf
+or NaN, in which case the result is NaN instead of 0.0. This idea is
+valid for integer dtypes, but NumPy still chooses to return NA because
+checking this special case would adversely affect performance.
+
+The NA Object
+=============
+
+In the root numpy namespace, there is a new object NA. This is not
+the only possible instance of an NA as is the case for None, since an NA
+may have a dtype associated with it and has been designed for future
+expansion to carry a multi-NA payload. It can be used in computations
+like any value::
+
+    >>> np.NA
+    NA
+    >>> np.NA * 3
+    NA(dtype='int64')
+    >>> np.sin(np.NA)
+    NA(dtype='float64')
+
+To check whether a value is NA, use the :func:`numpy.isna` function::
+
+    >>> np.isna(np.NA)
+    True
+    >>> np.isna(1.5)
+    False
+    >>> np.isna(np.nan)
+    False
+    >>> np.isna(np.NA * 3)
+    True
+    >>> (np.NA * 3) is np.NA
+    False
+
+
+Creating NA-Masked Arrays
+=========================
+
+Because having NA support adds some overhead to NumPy arrays, one
+must explicitly request it when creating arrays. There are several ways
+to get an NA-masked array. The easiest way is to include an NA
+value in the list used to construct the array.::
+
+    >>> a = np.array([1,3,5])
+    >>> a
+    array([1, 3, 5])
+    >>> a.flags.maskna
+    False
+
+    >>> b = np.array([1,3,np.NA])
+    >>> b
+    array([1, 3, NA])
+    >>> b.flags.maskna
+    True
+
+If one already has an array without an NA-mask, it can be added
+by directly setting the *maskna* flag to True. Assigning an NA
+to an array without NA support will raise an error rather than
+automatically creating an NA-mask, with the idea that supporting
+NA should be an explicit user choice.::
+
+    >>> a = np.array([1,3,5])
+    >>> a[1] = np.NA
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    ValueError: Cannot assign NA to an array which does not support NAs
+    >>> a.flags.maskna = True
+    >>> a[1] = np.NA
+    >>> a
+    array([1, NA, 5])
+
+Most array construction functions have a new parameter *maskna*, which
+can be set to True to produce an array with an NA-mask.::
+
+    >>> np.arange(5., maskna=True)
+    array([ 0., 1., 2., 3., 4.], maskna=True)
+    >>> np.eye(3, maskna=True)
+    array([[ 1., 0., 0.],
+           [ 0., 1., 0.],
+           [ 0., 0., 1.]], maskna=True)
+    >>> np.array([1,3,5], maskna=True)
+    array([1, 3, 5], maskna=True)
+
+Creating NA-Masked Views
+========================
+
+It will sometimes be desirable to view an array with an NA-mask, without
+adding an NA-mask to that array. This is possible by taking an NA-masked
+view of the array. There are two ways to do this, one which simply
+guarantees that the view has an NA-mask, and another which guarantees that the
+view has its own NA-mask, even if the array already had an NA-mask.
+
+Starting with a non-masked array, we can use the :func:`ndarray.view` method
+to get an NA-masked view.::
+
+    >>> a = np.array([1,3,5])
+    >>> b = a.view(maskna=True)
+
+    >>> b[2] = np.NA
+    >>> a
+    array([1, 3, 5])
+    >>> b
+    array([1, 3, NA])
+
+    >>> b[0] = 2
+    >>> a
+    array([2, 3, 5])
+    >>> b
+    array([2, 3, NA])
+
+
+It is important to be cautious here, though, since if the array already
+has a mask, this will also take a view of that mask. This means the original
+array's mask will be affected by assigning NA to the view.::
+
+    >>> a = np.array([1,np.NA,5])
+    >>> b = a.view(maskna=True)
+
+    >>> b[2] = np.NA
+    >>> a
+    array([1, NA, NA])
+    >>> b
+    array([1, NA, NA])
+
+    >>> b[1] = 4
+    >>> a
+    array([1, 4, NA])
+    >>> b
+    array([1, 4, NA])
+
+
+To guarantee that the view created has its own NA-mask, there is another
+flag *ownmaskna*. Using this flag will cause a copy of the array's mask
+to be created for the view when the array already has a mask.::
+
+    >>> a = np.array([1,np.NA,5])
+    >>> b = a.view(ownmaskna=True)
+
+    >>> b[2] = np.NA
+    >>> a
+    array([1, NA, 5])
+    >>> b
+    array([1, NA, NA])
+
+    >>> b[1] = 4
+    >>> a
+    array([1, NA, 5])
+    >>> b
+    array([1, 4, NA])
+
+
+In general, when an NA-masked view of an array has been taken, any time
+an NA is assigned to an element of the array the data for that element
+will remain untouched. This mechanism allows for multiple temporary
+views with NAs of the same original array.
+
+NA-Masked Reductions
+====================
+
+Many of NumPy's reductions like :func:`numpy.sum` and :func:`numpy.std`
+have been extended to work with NA-masked arrays. A consequence of the
+missing value model is that any NA value in an array will cause the
+output including that value to become NA.::
+
+    >>> a = np.array([[1,2,np.NA,3], [0,np.NA,1,1]])
+    >>> a.sum(axis=0)
+    array([1, NA, NA, 4])
+    >>> a.sum(axis=1)
+    array([NA, NA], dtype=int64)
+
+This is not always the desired result, so NumPy includes a parameter
+*skipna* which causes the NA values to be skipped during computation.::
+
+    >>> a = np.array([[1,2,np.NA,3], [0,np.NA,1,1]])
+    >>> a.sum(axis=0, skipna=True)
+    array([1, 2, 1, 4])
+    >>> a.sum(axis=1, skipna=True)
+    array([6, 2])
+
+Iterating Over NA-Masked Arrays
+===============================
+
+The :class:`nditer` object can be used to iterate over arrays with
+NA values just like over normal arrays.::
+
+    >>> a = np.array([1,3,np.NA])
+    >>> for x in np.nditer(a):
+    ...     print x,
+    ...
+    1 3 NA
+    >>> b = np.zeros(3, maskna=True)
+    >>> for x, y in np.nditer([a,b], op_flags=[['readonly'],
+    ...                                        ['writeonly']]):
+    ...     y[...] = -x
+    ...
+    >>> b
+    array([-1., -3., NA])
+
+When using the C-API version of the nditer, one must explicitly
+add the NPY_ITER_USE_MASKNA flag and take care to deal with the NA
+mask appropriately. In the Python exposure, this flag is added
+automatically.
+
+Planned Future Additions
+========================
+
+The NA support in 1.7 is fairly preliminary, and is focused on getting
+the basics solid. This particularly meant getting the API in C refined
+to a level where adding NA support to all of NumPy and to third party
+software using NumPy would be a reasonable task.
+
+The biggest missing feature within the core is supporting NA values with
+structured arrays. The design for this involves a mask slot for each
+field in the structured array, motivated by the fact that many important
+uses of structured arrays involve treating the structured fields like
+another dimension.
+
+Another feature that was discussed during the design process is the ability
+to support more than one NA value. The design created supports this multi-NA
+idea with the addition of a payload to the NA value and to the NA-mask.
+The API has been designed in such a way that adding this feature in a future
+release should be possible without changing existing API functions in any way.
+
+To see a more complete list of what is supported and unsupported in the
+1.7 release of NumPy, please refer to the release notes.
+
+During the design phase of this feature, two implementation approaches
+for NA values were discussed, called "mask" and "bitpattern". What
+has been implemented is the "mask" approach, but the design document,
+or "NEP", describes a way both approaches could co-operatively exist
+in NumPy, since each has both pros and cons. This design document is
+available in the file "doc/neps/missing-data.rst" of the NumPy source
+code.
diff --git a/doc/source/reference/arrays.rst b/doc/source/reference/arrays.rst
index 40c9f755d..91b43132a 100644
--- a/doc/source/reference/arrays.rst
+++ b/doc/source/reference/arrays.rst
@@ -44,6 +44,7 @@ of also more complicated arrangements of data.
    arrays.indexing
    arrays.nditer
    arrays.classes
+   arrays.maskna
    maskedarray
    arrays.interface
    arrays.datetime
diff --git a/doc/source/reference/c-api.array.rst b/doc/source/reference/c-api.array.rst
index 4b57945e9..46a215a12 100644
--- a/doc/source/reference/c-api.array.rst
+++ b/doc/source/reference/c-api.array.rst
@@ -21,13 +21,30 @@ Array structure and data access
 -------------------------------
 
 These macros all access the :ctype:`PyArrayObject` structure members. The input
-argument, obj, can be any :ctype:`PyObject *` that is directly interpretable
+argument, arr, can be any :ctype:`PyObject *` that is directly interpretable
 as a :ctype:`PyArrayObject *` (any instance of the :cdata:`PyArray_Type` and
 its sub-types).
 
-.. cfunction:: void *PyArray_DATA(PyObject *obj)
+.. cfunction:: int PyArray_NDIM(PyArrayObject *arr)
 
-.. cfunction:: char *PyArray_BYTES(PyObject *obj)
+    The number of dimensions in the array.
+
+.. cfunction:: npy_intp *PyArray_DIMS(PyArrayObject *arr)
+
+    Returns a pointer to the dimensions/shape of the array. The
+    number of elements matches the number of dimensions
+    of the array.
+
+.. cfunction:: npy_intp *PyArray_SHAPE(PyArrayObject *arr)
+
+    .. versionadded:: 1.7
+
+    A synonym for PyArray_DIMS, named to be consistent with the
+    'shape' usage within Python.
+
+.. cfunction:: void *PyArray_DATA(PyArrayObject *arr)
+
+.. cfunction:: char *PyArray_BYTES(PyArrayObject *arr)
 
     These two macros are similar and obtain the pointer to the
     data-buffer for the array. The first macro can (and should be)
@@ -36,19 +53,21 @@ sub-types).
     array then be sure you understand how to access the data in the
     array to avoid memory and/or alignment problems.
 
-.. cfunction:: npy_intp *PyArray_DIMS(PyObject *arr)
+.. cfunction:: npy_intp *PyArray_STRIDES(PyArrayObject* arr)
 
-.. cfunction:: npy_intp *PyArray_STRIDES(PyObject* arr)
+    Returns a pointer to the strides of the array. The
+    number of elements matches the number of dimensions
+    of the array.
 
-.. cfunction:: npy_intp PyArray_DIM(PyObject* arr, int n)
+.. cfunction:: npy_intp PyArray_DIM(PyArrayObject* arr, int n)
 
     Return the shape in the *n* :math:`^{\textrm{th}}` dimension.
 
-.. cfunction:: npy_intp PyArray_STRIDE(PyObject* arr, int n)
+.. cfunction:: npy_intp PyArray_STRIDE(PyArrayObject* arr, int n)
 
     Return the stride in the *n* :math:`^{\textrm{th}}` dimension.
 
-.. cfunction:: PyObject *PyArray_BASE(PyObject* arr)
+.. cfunction:: PyObject *PyArray_BASE(PyArrayObject* arr)
 
     This returns the base object of the array. In most cases, this
     means the object which owns the memory the array is pointing at.
@@ -62,40 +81,97 @@ sub-types).
     be copied upon destruction. This overloading of the base property
     for two functions is likely to change in a future version of NumPy.
 
-.. cfunction:: PyArray_Descr *PyArray_DESCR(PyObject* arr)
+.. cfunction:: PyArray_Descr *PyArray_DESCR(PyArrayObject* arr)
+
+    Returns a borrowed reference to the dtype property of the array.
+
+.. cfunction:: PyArray_Descr *PyArray_DTYPE(PyArrayObject* arr)
+
+    .. versionadded:: 1.7
+
+    A synonym for PyArray_DESCR, named to be consistent with the
+    'dtype' usage within Python.
+
+.. cfunction:: npy_bool PyArray_HASMASKNA(PyArrayObject* arr)
+
+    .. versionadded:: 1.7
+
+    Returns true if the array has an NA-mask, false otherwise.
+
+.. cfunction:: PyArray_Descr *PyArray_MASKNA_DTYPE(PyArrayObject* arr)
+
+    .. versionadded:: 1.7
+
+    Returns a borrowed reference to the dtype property for the NA mask
+    of the array, or NULL if the array has no NA mask. This function does
+    not raise an exception when it returns NULL, it is simply returning
+    the appropriate field.
+
+.. cfunction:: char *PyArray_MASKNA_DATA(PyArrayObject* arr)
 
-.. cfunction:: int PyArray_FLAGS(PyObject* arr)
+    .. versionadded:: 1.7
 
-.. cfunction:: int PyArray_ITEMSIZE(PyObject* arr)
+    Returns a pointer to the raw data for the NA mask of the array,
+    or NULL if the array has no NA mask. This function does
+    not raise an exception when it returns NULL, it is simply returning
+    the appropriate field.
+
+.. cfunction:: npy_intp *PyArray_MASKNA_STRIDES(PyArrayObject* arr)
+
+    .. versionadded:: 1.7
+
+    Returns a pointer to strides of the NA mask of the array, If the
+    array has no NA mask, the values contained in the array will be
+    invalid. The shape of the NA mask is identical to the shape of the
+    array itself, so the number of strides is always the same as the
+    number of array dimensions.
+
+.. cfunction:: void PyArray_ENABLEFLAGS(PyArrayObject* arr, int flags)
+
+    .. versionadded:: 1.7
+
+    Enables the specified array flags. This function does no validation,
+    and assumes that you know what you're doing.
+
+.. cfunction:: void PyArray_CLEARFLAGS(PyArrayObject* arr, int flags)
+
+    .. versionadded:: 1.7
+
+    Clears the specified array flags. This function does no validation,
+    and assumes that you know what you're doing.
+
+.. cfunction:: int PyArray_FLAGS(PyArrayObject* arr)
+
+.. cfunction:: int PyArray_ITEMSIZE(PyArrayObject* arr)
 
     Return the itemsize for the elements of this array.
 
-.. cfunction:: int PyArray_TYPE(PyObject* arr)
+.. cfunction:: int PyArray_TYPE(PyArrayObject* arr)
 
     Return the (builtin) typenumber for the elements of this array.
 
-.. cfunction:: PyObject *PyArray_GETITEM(PyObject* arr, void* itemptr)
+.. cfunction:: PyObject *PyArray_GETITEM(PyArrayObject* arr, void* itemptr)
 
     Get a Python object from the ndarray, *arr*, at the location
     pointed to by itemptr. Return ``NULL`` on failure.
 
-.. cfunction:: int PyArray_SETITEM(PyObject* arr, void* itemptr, PyObject* obj)
+.. cfunction:: int PyArray_SETITEM(PyArrayObject* arr, void* itemptr, PyObject* obj)
 
     Convert obj and place it in the ndarray, *arr*, at the place
     pointed to by itemptr. Return -1 if an error occurs or 0 on
     success.
 
-.. cfunction:: npy_intp PyArray_SIZE(PyObject* arr)
+.. cfunction:: npy_intp PyArray_SIZE(PyArrayObject* arr)
 
     Returns the total size (in number of elements) of the array.
 
-.. cfunction:: npy_intp PyArray_Size(PyObject* obj)
+.. cfunction:: npy_intp PyArray_Size(PyArrayObject* obj)
 
     Returns 0 if *obj* is not a sub-class of bigndarray. Otherwise,
     returns the total number of elements in the array. Safer version
     of :cfunc:`PyArray_SIZE` (*obj*).
 
-.. cfunction:: npy_intp PyArray_NBYTES(PyObject* arr)
+.. cfunction:: npy_intp PyArray_NBYTES(PyArrayObject* arr)
 
     Returns the total number of bytes consumed by the array.
 
@@ -149,48 +225,64 @@ From scratch
 .. cfunction:: PyObject* PyArray_NewFromDescr(PyTypeObject* subtype, PyArray_Descr* descr, int nd, npy_intp* dims, npy_intp* strides, void* data, int flags, PyObject* obj)
 
     This is the main array creation function. Most new arrays are
-    created with this flexible function. The returned object is an
-    object of Python-type *subtype*, which must be a subtype of
-    :cdata:`PyArray_Type`. The array has *nd* dimensions, described by
-    *dims*. The data-type descriptor of the new array is *descr*. If
-    *subtype* is not :cdata:`&PyArray_Type` (*e.g.* a Python subclass of
-    the ndarray), then *obj* is the object to pass to the
-    :obj:`__array_finalize__` method of the subclass. If *data* is
-    ``NULL``, then new memory will be allocated and *flags* can be
-    non-zero to indicate a Fortran-style contiguous array. If *data*
-    is not ``NULL``, then it is assumed to point to the memory to be
-    used for the array and the *flags* argument is used as the new
-    flags for the array (except the state of :cdata:`NPY_OWNDATA` and
-    :cdata:`NPY_ARRAY_UPDATEIFCOPY` flags of the new array will be reset). In
-    addition, if *data* is non-NULL, then *strides* can also be
-    provided. If *strides* is ``NULL``, then the array strides are
-    computed as C-style contiguous (default) or Fortran-style
+    created with this flexible function.
+
+    The returned object is an object of Python-type *subtype*, which
+    must be a subtype of :cdata:`PyArray_Type`. The array has *nd*
+    dimensions, described by *dims*. The data-type descriptor of the
+    new array is *descr*.
+
+    If *subtype* is of an array subclass instead of the base
+    :cdata:`&PyArray_Type`, then *obj* is the object to pass to
+    the :obj:`__array_finalize__` method of the subclass.
+
+    If *data* is ``NULL``, then new memory will be allocated and *flags*
+    can be non-zero to indicate a Fortran-style contiguous array. If
+    *data* is not ``NULL``, then it is assumed to point to the memory
+    to be used for the array and the *flags* argument is used as the
+    new flags for the array (except the state of :cdata:`NPY_OWNDATA`
+    and :cdata:`NPY_ARRAY_UPDATEIFCOPY` flags of the new array will
+    be reset).
+
+    In addition, if *data* is non-NULL, then *strides* can
+    also be provided. If *strides* is ``NULL``, then the array strides
+    are computed as C-style contiguous (default) or Fortran-style
     contiguous (*flags* is nonzero for *data* = ``NULL`` or *flags* &
-    :cdata:`NPY_ARRAY_F_CONTIGUOUS` is nonzero non-NULL *data*). Any provided
-    *dims* and *strides* are copied into newly allocated dimension and
-    strides arrays for the new array object.
+    :cdata:`NPY_ARRAY_F_CONTIGUOUS` is nonzero non-NULL *data*). Any
+    provided *dims* and *strides* are copied into newly allocated
+    dimension and strides arrays for the new array object.
+
+    Because the flags are ignored when *data* is NULL, you cannot
+    create a new array from scratch with an NA mask. If one is desired,
+    call the function :cfunc:`PyArray_AllocateMaskNA` after the array
+    is created.
 
 .. cfunction:: PyObject* PyArray_NewLikeArray(PyArrayObject* prototype, NPY_ORDER order, PyArray_Descr* descr, int subok)
 
     .. versionadded:: 1.6
 
-    This function steals a reference to *descr* if it is not NULL.
+    This function steals a reference to *descr* if it is not NULL.
+
+    This array creation routine allows for the convenient creation of
+    a new array matching an existing array's shapes and memory layout,
+    possibly changing the layout and/or data type.
 
-    This array creation routine allows for the convenient creation of
-    a new array matching an existing array's shapes and memory layout,
-    possibly changing the layout and/or data type.
+    When *order* is :cdata:`NPY_ANYORDER`, the result order is
+    :cdata:`NPY_FORTRANORDER` if *prototype* is a fortran array,
+    :cdata:`NPY_CORDER` otherwise. When *order* is
+    :cdata:`NPY_KEEPORDER`, the result order matches that of *prototype*, even
+    when the axes of *prototype* aren't in C or Fortran order.
 
-    When *order* is :cdata:`NPY_ANYORDER`, the result order is
-    :cdata:`NPY_FORTRANORDER` if *prototype* is a fortran array,
-    :cdata:`NPY_CORDER` otherwise. When *order* is
-    :cdata:`NPY_KEEPORDER`, the result order matches that of *prototype*, even
-    when the axes of *prototype* aren't in C or Fortran order.
+    If *descr* is NULL, the data type of *prototype* is used.
 
-    If *descr* is NULL, the data type of *prototype* is used.
+    If *subok* is 1, the newly created array will use the sub-type of
+    *prototype* to create the new array, otherwise it will create a
+    base-class array.
 
-    If *subok* is 1, the newly created array will use the sub-type of
-    *prototype* to create the new array, otherwise it will create a
-    base-class array.
+    The newly allocated array does not have an NA mask even if the
+    *prototype* provided does. If an NA mask is desired in the array,
+    call the function :cfunc:`PyArray_AllocateMaskNA` after the array
+    is created.
 
 .. cfunction:: PyObject* PyArray_New(PyTypeObject* subtype, int nd, npy_intp* dims, int type_num, npy_intp* strides, void* data, int itemsize, int flags, PyObject* obj)
 
@@ -278,7 +370,9 @@ From scratch
     increments of ``step``. Equivalent to arange( ``start``,
     ``stop``, ``step``, ``typenum`` ).
 
-.. cfunction:: int PyArray_SetBaseObject(PyArrayObject *arr, PyObject *obj)
+.. cfunction:: int PyArray_SetBaseObject(PyArrayObject* arr, PyObject* obj)
+
+    .. versionadded:: 1.7
 
     If you construct an array by passing in your own memory buffer as
     a parameter, you need to set the array's `base` property to ensure
@@ -377,6 +471,31 @@ From other objects
         with, then an error is raised. If *op* is not already an array,
         then this flag has no effect.
 
+    .. cvar:: NPY_ARRAY_MASKNA
+
+        .. versionadded:: 1.7
+
+        Make sure the array has an NA mask associated with its data.
+
+    .. cvar:: NPY_ARRAY_OWNMASKNA
+
+        .. versionadded:: 1.7
+
+        Make sure the array has an NA mask which it owns
+        associated with its data.
+
+    .. cvar:: NPY_ARRAY_ALLOWNA
+
+        .. versionadded:: 1.7
+
+        To prevent simple errors from slipping in, arrays with NA
+        masks are not permitted to pass through by default. Instead
+        an exception is raised indicating the operation doesn't support
+        NA masks yet. In order to enable NA mask support, this flag
+        must be passed in to allow the NA mask through, signalling that
+        the later code is written appropriately to handle NA mask
+        semantics.
+
     .. cvar:: NPY_ARRAY_BEHAVED
 
         :cdata:`NPY_ARRAY_ALIGNED` \| :cdata:`NPY_ARRAY_WRITEABLE`
@@ -1292,6 +1411,24 @@ or :cdata:`NPY_ARRAY_F_CONTIGUOUS` can be determined by the ``strides``,
     would have returned an error because :cdata:`NPY_ARRAY_UPDATEIFCOPY`
     would not have been possible.
 
+.. cvar:: NPY_ARRAY_MASKNA
+
+    If this flag is enabled, the array has an NA mask associated with
+    the data. C code which interacts with the NA mask must follow
+    specific semantic rules about when to overwrite data and when not
+    to. The mask can be accessed through the functions
+    :cfunc:`PyArray_MASKNA_DTYPE`, :cfunc:`PyArray_MASKNA_DATA`, and
+    :cfunc:`PyArray_MASKNA_STRIDES`.
+
+.. cvar:: NPY_ARRAY_OWNMASKNA
+
+    If this flag is enabled, the array owns its own NA mask. If it is not
+    enabled, the NA mask is a view into a different array's NA mask.
+
+    In order to ensure that an array owns its own NA mask, you can
+    call :cfunc:`PyArray_AllocateMaskNA` with the parameter *ownmaskna*
+    set to 1.
+
 :cfunc:`PyArray_UpdateFlags` (obj, flags) will update the ``obj->flags``
 for ``flags`` which can be any of :cdata:`NPY_ARRAY_C_CONTIGUOUS`,
 :cdata:`NPY_ARRAY_F_CONTIGUOUS`, :cdata:`NPY_ARRAY_ALIGNED`, or
@@ -1536,18 +1673,20 @@ Conversion
     copied into every location. A -1 is returned if an error occurs,
     otherwise 0 is returned.
 
-.. cfunction:: PyObject* PyArray_View(PyArrayObject* self, PyArray_Descr* dtype)
+.. cfunction:: PyObject* PyArray_View(PyArrayObject* self, PyArray_Descr* dtype, PyTypeObject *ptype)
 
-    Equivalent to :meth:`ndarray.view` (*self*, *dtype*). Return a new view of
-    the array *self* as possibly a different data-type, *dtype*. If
-    *dtype* is ``NULL``, then the returned array will have the same
-    data type as *self*. The new data-type must be consistent with
-    the size of *self*. Either the itemsizes must be identical, or
-    *self* must be single-segment and the total number of bytes must
-    be the same. In the latter case the dimensions of the returned
-    array will be altered in the last (or first for Fortran-style
-    contiguous arrays) dimension. The data area of the returned array
-    and self is exactly the same.
+    Equivalent to :meth:`ndarray.view` (*self*, *dtype*). Return a new
+    view of the array *self* as possibly a different data-type, *dtype*,
+    and different array subclass *ptype*.
+
+    If *dtype* is ``NULL``, then the returned array will have the same
+    data type as *self*. The new data-type must be consistent with the
+    size of *self*. Either the itemsizes must be identical, or *self* must
+    be single-segment and the total number of bytes must be the same.
+    In the latter case the dimensions of the returned array will be
+    altered in the last (or first for Fortran-style contiguous arrays)
+    dimension. The data area of the returned array and self is exactly
+    the same.
 
 
 Shape Manipulation
@@ -2152,50 +2291,6 @@ an element copier function as a primitive.::
     A macro which calls the auxdata's clone function appropriately,
     returning a deep copy of the auxiliary data.
 
-Masks for Selecting Elements to Modify
---------------------------------------
-
-.. versionadded:: 1.7.0
-
-The array iterator, :ctype:`NpyIter`, has some new flags which
-allow control over which elements are intended to be modified,
-providing the ability to do masking even when doing casts to a buffer
-of a different type. Some inline functions have been added
-to facilitate consistent usage of these masks.
-
-A mask dtype can be one of three different possibilities. It can
-be :cdata:`NPY_BOOL`, :cdata:`NPY_MASK`, or a struct dtype whose
-fields are all mask dtypes.
-
-A mask of :cdata:`NPY_BOOL` can just indicate True, with underlying
-value 1, for an element that is exposed, and False, with underlying
-value 0, for an element that is hidden.
-
-A mask of :cdata:`NPY_MASK` can additionally carry a payload which
-is a value from 0 to 127. This allows for missing data implementations
-based on such masks to support multiple reasons for data being missing.
-
-A mask of a struct dtype can only pair up with another struct dtype
-with the same field names. In this way, each field of the mask controls
-the masking for the corresponding field in the associated data array.
-
-Inline functions to work with masks are as follows.
-
-.. cfunction:: npy_bool NpyMask_IsExposed(npy_mask mask)
-
-    Returns true if the data element corresponding to the mask element
-    can be modified, false if not.
-
-.. cfunction:: npy_uint8 NpyMask_GetPayload(npy_mask mask)
-
-    Returns the payload contained in the mask. The return value
-    is between 0 and 127.
-
-.. cfunction:: npy_mask NpyMask_Create(npy_bool exposed, npy_int8 payload)
-
-    Creates a mask from a flag indicating whether the element is exposed
-    or not and a payload value.
-
 Array Iterators
 ---------------
 
@@ -2442,6 +2537,9 @@ Array Scalars
     if so, returns the appropriate array scalar. It should be used
     whenever 0-dimensional arrays could be returned to Python.
 
+    If *arr* is a 0-dimensional NA-masked array with its value hidden,
+    an instance of :ctype:`NpyNA *` is returned.
+
 .. cfunction:: PyObject* PyArray_Scalar(void* data, PyArray_Descr* dtype, PyObject* itemsize)
 
     Return an array scalar object of the given enumerated *typenum*
@@ -2654,6 +2752,19 @@ to.
     . No matter what is returned, you must DECREF the object returned
     by this routine in *address* when you are done with it.
 
+    If the input is an array with NA support, this will either raise
+    an error if it contains any NAs, or will make a copy of the array
+    without NA support if it does not contain any NAs. Use the function
+    :cfunc:`PyArray_AllowNAConverter` to support NA-arrays directly
+    and more efficiently.
+
+.. cfunction:: int PyArray_AllowConverter(PyObject* obj, PyObject** address)
+
+    This is the same as :cfunc:`PyArray_Converter`, but allows arrays
+    with NA support to pass through untouched. This function was created
+    so that the existing converter could raise errors appropriately
+    for functions which have not been updated with NA support
+
 .. cfunction:: int PyArray_OutputConverter(PyObject* obj, PyArrayObject** address)
 
     This is a default converter for output arrays given to
@@ -2662,6 +2773,17 @@ to.
     *obj*) is TRUE then it is returned in *\*address* without
     incrementing its reference count.
 
+    If the output is an array with NA support, this will raise an error.
+    Use the function :cfunc:`PyArray_OutputAllowNAConverter` to support
+    NA-arrays directly.
+
+.. cfunction:: int PyArray_OutputAllowNAConverter(PyObject* obj, PyArrayObject** address)
+
+    This is the same as :cfunc:`PyArray_OutputConverter`, but allows arrays
+    with NA support to pass through. This function was created
+    so that the existing output converter could raise errors appropriately
+    for functions which have not been updated with NA support
+
 .. cfunction:: int PyArray_IntpConverter(PyObject* obj, PyArray_Dims* seq)
 
     Convert any Python sequence, *obj*, smaller than :cdata:`NPY_MAXDIMS`
diff --git a/doc/source/reference/c-api.iterator.rst b/doc/source/reference/c-api.iterator.rst
index 01385acfd..0915f765e 100644
--- a/doc/source/reference/c-api.iterator.rst
+++ b/doc/source/reference/c-api.iterator.rst
@@ -551,9 +551,10 @@ Construction and Destruction
         .. cvar:: NPY_ITER_ALLOCATE
 
             This is for output arrays, and requires that the flag
-            :cdata:`NPY_ITER_WRITEONLY` be set. If ``op[i]`` is NULL,
-            creates a new array with the final broadcast dimensions,
-            and a layout matching the iteration order of the iterator.
+            :cdata:`NPY_ITER_WRITEONLY` or :cdata:`NPY_ITER_READWRITE`
+            be set. If ``op[i]`` is NULL, creates a new array with
+            the final broadcast dimensions, and a layout matching
+            the iteration order of the iterator.
 
             When ``op[i]`` is NULL, the requested data type
             ``op_dtypes[i]`` may be NULL as well, in which case it is
@@ -586,6 +587,8 @@ Construction and Destruction
 
         .. cvar:: NPY_ITER_ARRAYMASK
 
+            .. versionadded:: 1.7
+
             Indicates that this operand is the mask to use for
             selecting elements when writing to operands which have
             the :cdata:`NPY_ITER_WRITEMASKED` flag applied to them.
@@ -609,6 +612,8 @@ Construction and Destruction
 
         .. cvar:: NPY_ITER_WRITEMASKED
 
+            .. versionadded:: 1.7
+
             Indicates that only elements which the operand with
             the ARRAYMASK flag indicates are intended to be modified
             by the iteration. In general, the iterator does not enforce
@@ -624,6 +629,35 @@ Construction and Destruction
             returns true from the corresponding element in the ARRAYMASK
             operand.
 
+        .. cvar:: NPY_ITER_USE_MASKNA
+
+            .. versionadded:: 1.7
+
+            Adds a new operand to the end of the operand list which
+            iterates over the mask of this operand. If this operand has
+            no mask and is read-only, it broadcasts a constant
+            one-valued mask to indicate every value is valid. If this
+            operand has no mask and is writeable, an error is raised.
+
+            Each operand which has this flag applied to it causes
+            an additional operand to be tacked on the end of the operand
+            list, in an order matching that of the operand array.
+            For example, if there are four operands, and operands with index
+            one and three have the flag :cdata:`NPY_ITER_USE_MASKNA`
+            specified, there will be six operands total, and they will
+            look like [op0, op1, op2, op3, op1_mask, op3_mask].
+
+        .. cvar:: NPY_ITER_IGNORE_MASKNA
+
+            .. versionadded:: 1.7
+
+            Under some circumstances, code doing an iteration will
+            have already called :cfunc:`PyArray_ContainsNA` on an
+            operand which has a mask, and seen that its return value
+            was false. When this occurs, it is safe to do the iteration
+            without simultaneously iterating over the mask, and this
+            flag allows that to be done.
+
 .. cfunction:: NpyIter* NpyIter_AdvancedNew(npy_intp nop, PyArrayObject** op, npy_uint32 flags, NPY_ORDER order, NPY_CASTING casting, npy_uint32* op_flags, PyArray_Descr** op_dtypes, int oa_ndim, int** op_axes, npy_intp* itershape, npy_intp buffersize)
 
     Extends :cfunc:`NpyIter_MultiNew` with several advanced options providing
@@ -955,6 +989,19 @@ Construction and Destruction
 
     Returns the number of operands in the iterator.
 
+    When :cdata:`NPY_ITER_USE_MASKNA` is used on an operand, a new
+    operand is added to the end of the operand list in the iterator
+    to track that operand's NA mask. Thus, this equals the number
+    of construction operands plus the number of operands for
+    which the flag :cdata:`NPY_ITER_USE_MASKNA` was specified.
+
+.. cfunction:: int NpyIter_GetFirstMaskNAOp(NpyIter* iter)
+
+    .. versionadded:: 1.7
+
+    Returns the index of the first NA mask operand in the array. This
+    value is equal to the number of operands passed into the constructor.
+
 .. cfunction:: npy_intp* NpyIter_GetAxisStrideArray(NpyIter* iter, int axis)
 
     Gets the array of strides for the specified axis. Requires that
@@ -991,6 +1038,16 @@ Construction and Destruction
     that are being iterated. The result points into ``iter``,
     so the caller does not gain any references to the PyObjects.
 
+.. cfunction:: npy_int8* NpyIter_GetMaskNAIndexArray(NpyIter* iter)
+
+    .. versionadded:: 1.7
+
+    This gives back a pointer to the ``nop`` indices which map
+    construction operands with :cdata:`NPY_ITER_USE_MASKNA` flagged
+    to their corresponding NA mask operands and vice versa. For
+    operands which were not flagged with :cdata:`NPY_ITER_USE_MASKNA`,
+    this array contains negative values.
+
 ..
cfunction:: PyObject* NpyIter_GetIterView(NpyIter* iter, npy_intp i) This gives back a reference to a new ndarray view, which is a view @@ -1041,6 +1098,30 @@ Construction and Destruction Returns ``NPY_SUCCEED`` or ``NPY_FAIL``. +.. cfunction:: npy_bool NpyIter_IsFirstVisit(NpyIter* iter, int iop) + + .. versionadded:: 1.7 + + Checks whether the elements of the specified reduction operand + which the iterator points at are being seen for the first time. + The function returns a reasonable answer + for reduction operands and when buffering is disabled. The answer + may be incorrect for buffered non-reduction operands. + + This function is intended to be used in EXTERNAL_LOOP mode only, + and will produce some wrong answers when that mode is not enabled. + + If this function returns true, the caller should also check the inner + loop stride of the operand, because if that stride is 0, then only + the first element of the innermost external loop is being visited + for the first time. + + *WARNING*: For performance reasons, 'iop' is not bounds-checked, + it is not confirmed that 'iop' is actually a reduction operand, + and it is not confirmed that EXTERNAL_LOOP mode is enabled. These + checks are the responsibility of the caller, and should be done + outside of any inner loops. + Functions For Iteration ----------------------- diff --git a/doc/source/reference/c-api.maskna.rst b/doc/source/reference/c-api.maskna.rst new file mode 100644 index 000000000..6abb624eb --- /dev/null +++ b/doc/source/reference/c-api.maskna.rst @@ -0,0 +1,592 @@ +Array NA Mask API +================== + +.. sectionauthor:: Mark Wiebe + +.. index:: + pair: maskna; C-API + pair: C-API; maskna + +.. versionadded:: 1.7 + +NA Masks in Arrays +------------------ + +NumPy supports the idea of NA (Not Available) missing values in its +arrays. In the design document leading up to the implementation, two +mechanisms for this were proposed, NA masks and NA bitpatterns. 
NA masks +have been implemented as the first representation of these values. This +mechanism supports working with NA values similar to what the R language +provides, and when combined with views, allows one to temporarily mark +elements as NA without affecting the original data. + +The C API has been updated with mechanisms to allow NumPy extensions +to work with these masks, and this document provides some examples and +reference for the NA mask-related functions. + +The NA Object +------------- + +The main *numpy* namespace in Python has a new object called *NA*. +This is an instance of :ctype:`NpyNA`, which is a Python object +representing an NA value. This object is analogous to the NumPy +scalars, and is returned by :cfunc:`PyArray_Return` instead of +a scalar where appropriate. + +The global *numpy.NA* object is accessible from C as :cdata:`Npy_NA`. +This is an NA value with no data type or multi-NA payload. Use it +just as you would Py_None, except use :cfunc:`NpyNA_Check` to +see if an object is an :ctype:`NpyNA`, because :cdata:`Npy_NA` isn't +the only instance of NA possible. + +If you want to see whether a general PyObject* is NA, you should +use the API function :cfunc:`NpyNA_FromObject` with *suppress_error* +set to true. If this returns NULL, the object is not an NA, and if +it returns an NpyNA instance, the object is NA and you can then +access its *dtype* and *payload* fields as needed. + +To make new :ctype:`NpyNA` objects, use +:cfunc:`NpyNA_FromDTypeAndPayload`. The functions +:cfunc:`NpyNA_GetDType`, :cfunc:`NpyNA_IsMultiNA`, and +:cfunc:`NpyNA_GetPayload` provide access to the data members. + +Working With NA-Masked Arrays +----------------------------- + +The starting point for many C-API functions which manipulate NumPy +arrays is the function :cfunc:`PyArray_FromAny`. This function converts +a general PyObject* object into a NumPy ndarray, based on options +specified in the flags. 
To avoid surprises, this function does +not allow NA-masked arrays to pass through by default. + +To allow third-party code to work with NA-masked arrays which contain +no NAs, :cfunc:`PyArray_FromAny` will make a copy of the array into +a new array without an NA-mask, and return that. This allows for +proper interoperability in cases where it's possible until functions +are updated to provide optimal code paths for NA-masked arrays. + +To update a function with NA-mask support, add the flag +:cdata:`NPY_ARRAY_ALLOWNA` when calling :cfunc:`PyArray_FromAny`. +This allows NA-masked arrays to pass through untouched, and will +convert PyObject lists containing NA values into NA-masked arrays +instead of the alternative of switching to object arrays. + +To check whether an array has an NA-mask, use the function +:cfunc:`PyArray_HASMASKNA`, which checks the appropriate flag. +There are a number of things that one will typically want to do +when encountering an NA-masked array. We'll go through a few +of these cases. + +Forbidding Any NA Values +~~~~~~~~~~~~~~~~~~~~~~~~ + +The simplest case is to forbid any NA values. Note that it is better +to still be aware of the NA mask and explicitly test for NA values +than to leave out the :cdata:`NPY_ARRAY_ALLOWNA`, because it is possible +to avoid the extra copy that :cfunc:`PyArray_FromAny` will make. The +check for NAs will go something like this:: + + PyArrayObject *arr = ...; + int containsna; + + /* ContainsNA checks HASMASKNA() for you */ + containsna = PyArray_ContainsNA(arr, NULL, NULL); + /* Error case */ + if (containsna < 0) { + return NULL; + } + /* If it found an NA */ + else if (containsna) { + PyErr_SetString(PyExc_ValueError, + "this operation does not support arrays with NA values"); + return NULL; + } + +After this check, you can be certain that the array doesn't contain any +NA values, and can proceed accordingly. 
For example, if you iterate +over the elements of the array, you may pass the flag +:cdata:`NPY_ITER_IGNORE_MASKNA` to iterate over the data without +touching the NA-mask at all. + +Manipulating NA Values +~~~~~~~~~~~~~~~~~~~~~~ + +The semantics of the NA-mask demand that whenever an array element +is hidden by the NA-mask, no computations are permitted to modify +the data backing that element. The :ctype:`NpyIter` provides +a number of flags to assist with visiting both the array data +and the mask data simultaneously, and preserving the masking semantics +even when buffering is required. + +The main flag for iterating over NA-masked arrays is +:cdata:`NPY_ITER_USE_MASKNA`. For each iterator operand which has this +flag specified, a new operand is added to the end of the iterator operand +list, and is set to iterate over the original operand's NA-mask. Operands +which do not have an NA mask are permitted as well when they are flagged +as read-only. The new operand in this case points to a single exposed +mask value and all its strides are zero. The latter feature is useful +when combining multiple read-only inputs, where some of them have masks. + +Accumulating NA Values +~~~~~~~~~~~~~~~~~~~~~~ + +More complex operations, like the NumPy ufunc reduce functions, need +to take extra care to follow the masking semantics. If we accumulate +the NA mask and the data values together, we could discover half way +through that the output is NA, and that we have violated the contract +to never change the underlying output value when it is being assigned +NA. + +The solution to this problem is to first accumulate the NA-mask as necessary +to produce the output's NA-mask, then accumulate the data values without +touching NA-masked values in the output. The parameter *preservena* in +functions like :cfunc:`PyArray_AssignArray` can assist when initializing +values in such an algorithm. 
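The two-pass approach described above can be sketched in plain C, independent of the iterator machinery. This is a minimal illustration only: ``masked_row_sum`` is a hypothetical helper, not part of the NumPy C API, and it assumes C-contiguous inputs with the NPY_BOOL mask convention used in this document (1 for exposed, 0 for hidden).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a NumPy API function): sums each row of a
 * C-contiguous nrows x ncols array of doubles.
 * Mask convention assumed here: 1 = exposed, 0 = hidden (NA).
 */
static void
masked_row_sum(const double *data, const unsigned char *mask,
               size_t nrows, size_t ncols,
               double *out, unsigned char *out_mask)
{
    size_t i, j;

    assert(data != NULL && mask != NULL);
    assert(out != NULL && out_mask != NULL);

    /*
     * Pass 1: accumulate only the masks. An output element is exposed
     * only if every input element contributing to it is exposed.
     */
    for (i = 0; i < nrows; ++i) {
        unsigned char exposed = 1;
        for (j = 0; j < ncols; ++j) {
            if (mask[i * ncols + j] == 0) {
                exposed = 0;
                break;
            }
        }
        out_mask[i] = exposed;
    }

    /*
     * Pass 2: accumulate the data, skipping every output that pass 1
     * marked as NA, so the data backing a hidden output is never written.
     */
    for (i = 0; i < nrows; ++i) {
        double total = 0.0;
        if (!out_mask[i]) {
            continue;
        }
        for (j = 0; j < ncols; ++j) {
            total += data[i * ncols + j];
        }
        out[i] = total;
    }
}
```

Because the output mask is final before any data is written, the second pass never has to undo a partially accumulated value when an NA turns up late in a row. A real implementation would presumably do the same work through :ctype:`NpyIter` rather than raw pointers.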
+ +Example NA-Masked Operation in C +-------------------------------- + +As an example, let's implement a simple binary NA-masked operation +for the double dtype. We'll make a divide operation which turns +divide by zero into NA instead of Inf or NaN. + +To start, we define the function prototype and some basic +:ctype:`NpyIter` boilerplate setup. We'll make a function which +supports an optional *out* parameter, which may be NULL.:: + + static PyArrayObject* + SpecialDivide(PyArrayObject* a, PyArrayObject* b, PyArrayObject *out) + { + NpyIter *iter = NULL; + PyArrayObject *op[3]; + PyArray_Descr *dtypes[3]; + npy_uint32 flags, op_flags[3]; + + /* Iterator construction parameters */ + op[0] = a; + op[1] = b; + op[2] = out; + + dtypes[0] = PyArray_DescrFromType(NPY_DOUBLE); + if (dtypes[0] == NULL) { + return NULL; + } + dtypes[1] = dtypes[0]; + dtypes[2] = dtypes[0]; + + flags = NPY_ITER_BUFFERED | + NPY_ITER_EXTERNAL_LOOP | + NPY_ITER_GROWINNER | + NPY_ITER_REFS_OK | + NPY_ITER_ZEROSIZE_OK; + + /* Every operand gets the flag NPY_ITER_USE_MASKNA */ + op_flags[0] = NPY_ITER_READONLY | + NPY_ITER_ALIGNED | + NPY_ITER_USE_MASKNA; + op_flags[1] = op_flags[0]; + op_flags[2] = NPY_ITER_WRITEONLY | + NPY_ITER_ALIGNED | + NPY_ITER_USE_MASKNA | + NPY_ITER_NO_BROADCAST | + NPY_ITER_ALLOCATE; + + iter = NpyIter_MultiNew(3, op, flags, NPY_KEEPORDER, + NPY_SAME_KIND_CASTING, op_flags, dtypes); + /* Don't need the dtype reference anymore */ + Py_DECREF(dtypes[0]); + if (iter == NULL) { + return NULL; + } + +At this point, the input operands have been validated according to +the casting rule, the shapes of the arrays have been broadcast together, +and any buffering necessary has been prepared. This means we can +dive into the inner loop of this function.:: + + ... 
+ if (NpyIter_GetIterSize(iter) > 0) { + NpyIter_IterNextFunc *iternext; + char **dataptr; + npy_intp *stridesptr, *countptr; + + /* Variables needed for looping */ + iternext = NpyIter_GetIterNext(iter, NULL); + if (iternext == NULL) { + NpyIter_Deallocate(iter); + return NULL; + } + dataptr = NpyIter_GetDataPtrArray(iter); + stridesptr = NpyIter_GetInnerStrideArray(iter); + countptr = NpyIter_GetInnerLoopSizePtr(iter); + +The loop gets a bit messy when dealing with NA-masks, because it +doubles the number of operands being processed in the iterator. Here +we are naming things clearly so that the content of the innermost loop +is easy to work with.:: + + ... + do { + /* Data pointers and strides needed for innermost loop */ + char *data_a = dataptr[0], *data_b = dataptr[1]; + char *data_out = dataptr[2]; + char *maskna_a = dataptr[3], *maskna_b = dataptr[4]; + char *maskna_out = dataptr[5]; + npy_intp stride_a = stridesptr[0], stride_b = stridesptr[1]; + npy_intp stride_out = stridesptr[2]; + npy_intp maskna_stride_a = stridesptr[3]; + npy_intp maskna_stride_b = stridesptr[4]; + npy_intp maskna_stride_out = stridesptr[5]; + npy_intp i, count = *countptr; + + for (i = 0; i < count; ++i) { + +Here is the code for performing one special division. We use +the functions :cfunc:`NpyMaskValue_IsExposed` and +:cfunc:`NpyMaskValue_Create` to work with the masks, in order to be +as general as possible. These are inline functions, and the compiler +optimizer should be able to produce the same result as if you performed +these operations directly inline here.:: + + ... 
+ /* If neither of the inputs are NA */ + if (NpyMaskValue_IsExposed((npy_mask)*maskna_a) && + NpyMaskValue_IsExposed((npy_mask)*maskna_b)) { + double a_val = *(double *)data_a; + double b_val = *(double *)data_b; + /* Do the divide if 'b' isn't zero */ + if (b_val != 0.0) { + *(double *)data_out = a_val / b_val; + /* Need to also set this element to exposed */ + *maskna_out = NpyMaskValue_Create(1, 0); + } + /* Otherwise output an NA without touching its data */ + else { + *maskna_out = NpyMaskValue_Create(0, 0); + } + } + /* Turn the output into NA without touching its data */ + else { + *maskna_out = NpyMaskValue_Create(0, 0); + } + + data_a += stride_a; + data_b += stride_b; + data_out += stride_out; + maskna_a += maskna_stride_a; + maskna_b += maskna_stride_b; + maskna_out += maskna_stride_out; + } + } while (iternext(iter)); + } + +A little bit more boilerplate for returning the result from the iterator, +and the function is done.:: + + ... + if (out == NULL) { + out = NpyIter_GetOperandArray(iter)[2]; + } + Py_INCREF(out); + NpyIter_Deallocate(iter); + + return out; + } + +To run this example, you can create a simple module with a C-file spdiv_mod.c +consisting of:: + + #include <Python.h> + #include <numpy/arrayobject.h> + + /* INSERT SpecialDivide source code here */ + + static PyObject * + spdiv(PyObject *self, PyObject *args, PyObject *kwds) + { + PyArrayObject *a, *b, *out = NULL; + static char *kwlist[] = {"a", "b", "out", NULL}; + + if (!PyArg_ParseTupleAndKeywords(args, kwds, "O&O&|O&", kwlist, + &PyArray_AllowNAConverter, &a, + &PyArray_AllowNAConverter, &b, + &PyArray_OutputAllowNAConverter, &out)) { + return NULL; + } + + /* + * The usual NumPy way is to only use PyArray_Return when + * the 'out' parameter is not provided. 
+ */ + if (out == NULL) { + return PyArray_Return(SpecialDivide(a, b, out)); + } + else { + return (PyObject *)SpecialDivide(a, b, out); + } + } + + static PyMethodDef SpDivMethods[] = { + {"spdiv", (PyCFunction)spdiv, METH_VARARGS | METH_KEYWORDS, NULL}, + {NULL, NULL, 0, NULL} + }; + + + PyMODINIT_FUNC initspdiv_mod(void) + { + PyObject *m; + + m = Py_InitModule("spdiv_mod", SpDivMethods); + if (m == NULL) { + return; + } + + /* Make sure NumPy is initialized */ + import_array(); + } + +Create a setup.py file like:: + + #!/usr/bin/env python + def configuration(parent_package='',top_path=None): + from numpy.distutils.misc_util import Configuration + config = Configuration('.',parent_package,top_path) + config.add_extension('spdiv_mod',['spdiv_mod.c']) + return config + + if __name__ == "__main__": + from numpy.distutils.core import setup + setup(configuration=configuration) + +With these two files in a directory by themselves, run:: + + $ python setup.py build_ext --inplace + +and the file spdiv_mod.so (or .dll) will be placed in the same directory. +Now you can try out this sample, to see how it behaves.:: + + >>> import numpy as np + >>> from spdiv_mod import spdiv + +Because we used :cfunc:`PyArray_Return` when wrapping SpecialDivide, +it returns scalars like any typical NumPy function does:: + + >>> spdiv(1, 2) + 0.5 + >>> spdiv(2, 0) + NA(dtype='float64') + >>> spdiv(np.NA, 1.5) + NA(dtype='float64') + +Here we can see how NAs propagate, and how 0 in the divisor turns into NA +as desired.:: + + >>> a = np.arange(6) + >>> b = np.array([0,np.NA,0,2,1,0]) + >>> spdiv(a, b) + array([ NA, NA, NA, 1.5, 4. , NA]) + +Finally, we can see the masking behavior by creating a masked +view of an array. The ones in *c_orig* are preserved wherever +NA got assigned.:: + + >>> c_orig = np.ones(6) + >>> c = c_orig.view(maskna=True) + >>> spdiv(a, b, out=c) + array([ NA, NA, NA, 1.5, 4. , NA]) + >>> c_orig + array([ 1. , 1. , 1. , 1.5, 4. , 1. 
]) + +NA Object Data Type +------------------- + +.. ctype:: NpyNA + + This is the C object corresponding to objects of type + numpy.NAType. The fields themselves are hidden from consumers of the + API, you must use the functions provided to create new NA objects + and get their properties. + + This object contains two fields, a :ctype:`PyArray_Descr *` dtype + which is either NULL or indicates the data type the NA represents, + and a payload which is there for the future addition of multi-NA support. + +.. cvar:: Npy_NA + + This is a global singleton, similar to Py_None, which is the + *numpy.NA* object. Note that unlike Py_None, multiple NAs may be + created, for instance with different multi-NA payloads or with + different dtypes. If you want to return an NA with no payload + or dtype, return a new reference to Npy_NA. + +NA Object Functions +------------------- + +.. cfunction:: NpyNA_Check(obj) + + Evaluates to true if *obj* is an instance of :ctype:`NpyNA`. + +.. cfunction:: PyArray_Descr* NpyNA_GetDType(NpyNA* na) + + Returns the *dtype* field of the NA object, which is NULL when + the NA has no dtype. Does not raise an error. + +.. cfunction:: npy_bool NpyNA_IsMultiNA(NpyNA* na) + + Returns true if the NA has a multi-NA payload, false otherwise. + +.. cfunction:: int NpyNA_GetPayload(NpyNA* na) + + Gets the multi-NA payload of the NA, or 0 if *na* doesn't have + a multi-NA payload. + +.. cfunction:: NpyNA* NpyNA_FromObject(PyObject* obj, int suppress_error) + + If *obj* represents an object which is NA, for example if it + is an :ctype:`NpyNA`, or a zero-dimensional NA-masked array with + its value hidden by the mask, returns a new reference to an + :ctype:`NpyNA` object representing *obj*. Otherwise returns + NULL. + + If *suppress_error* is true, this function doesn't raise an exception + when the input isn't NA and it returns NULL, otherwise it does. + +.. 
cfunction:: NpyNA* NpyNA_FromDTypeAndPayload(PyArray_Descr *dtype, int multina, int payload) + + + Constructs a new :ctype:`NpyNA` instance with the specified *dtype* + and *payload*. For an NA with no dtype, provide NULL in *dtype*. + + Until multi-NA is implemented, just pass 0 for both *multina* + and *payload*. + +NA Mask Functions +----------------- + +A mask dtype can be one of three different possibilities. It can +be :cdata:`NPY_BOOL`, :cdata:`NPY_MASK`, or a struct dtype whose +fields are all mask dtypes. + +A mask of :cdata:`NPY_BOOL` can just indicate True, with underlying +value 1, for an element that is exposed, and False, with underlying +value 0, for an element that is hidden. + +A mask of :cdata:`NPY_MASK` can additionally carry a payload which +is a value from 0 to 127. This allows for missing data implementations +based on such masks to support multiple reasons for data being missing. + +A mask of a struct dtype can only pair up with another struct dtype +with the same field names. In this way, each field of the mask controls +the masking for the corresponding field in the associated data array. + +Inline functions to work with masks are as follows. + +.. cfunction:: npy_bool NpyMaskValue_IsExposed(npy_mask mask) + + Returns true if the data element corresponding to the mask element + can be modified, false if not. + +.. cfunction:: npy_uint8 NpyMaskValue_GetPayload(npy_mask mask) + + Returns the payload contained in the mask. The return value + is between 0 and 127. + +.. cfunction:: npy_mask NpyMaskValue_Create(npy_bool exposed, npy_int8 payload) + + Creates a mask from a flag indicating whether the element is exposed + or not and a payload value. + +NA Mask Array Functions +----------------------- + +.. cfunction:: int PyArray_AllocateMaskNA(PyArrayObject *arr, npy_bool ownmaskna, npy_bool multina, npy_mask defaultmask) + + Allocates an NA mask for the array *arr* if necessary. 
If *ownmaskna* + is false, it only allocates an NA mask if none exists, but if + *ownmaskna* is true, it also allocates one if the NA mask is a view + into another array's NA mask. Here are the two most common usage + patterns:: + + /* Use this to make sure 'arr' has an NA mask */ + if (PyArray_AllocateMaskNA(arr, 0, 0, 1) < 0) { + return NULL; + } + + /* Use this to make sure 'arr' owns an NA mask */ + if (PyArray_AllocateMaskNA(arr, 1, 0, 1) < 0) { + return NULL; + } + + The parameter *multina* is provided for future expansion, when + multi-NA support is added to NumPy. This will affect the dtype of + the NA mask, which currently must always be NPY_BOOL, but will be + NPY_MASK for multi-NA arrays when this is implemented. + + When a new NA mask is allocated, and the mask needs to be filled, + it uses the value *defaultmask*. In nearly all cases, this should be set + to 1, indicating that the elements are exposed. If a mask is allocated + just because of *ownmaskna*, the existing mask values are copied + into the newly allocated mask. + + This function returns 0 for success, -1 for failure. + +.. cfunction:: npy_bool PyArray_HasNASupport(PyArrayObject *arr) + + Returns true if *arr* is an array which supports NA. This function + exists because the design for adding NA proposed two mechanisms + for NAs in NumPy, NA masks and NA bitpatterns. Currently, just + NA masks have been implemented, but when NA bitpatterns are implemented + this would return true for arrays with an NA bitpattern dtype as well. + +.. cfunction:: int PyArray_ContainsNA(PyArrayObject *arr, PyArrayObject *wheremask, npy_bool *whichna) + + Checks whether the array *arr* contains any NA values. + + If *wheremask* is non-NULL, it must be an NPY_BOOL mask which can + broadcast onto *arr*. Wherever the where mask is True, *arr* + is checked for NA, and wherever it is False, the *arr* value is + ignored. + + The parameter *whichna* is provided for future expansion to multi-NA + support. 
When implemented, this parameter will be a 128 element + array of npy_bool, with the value True for the NA values that are + being looked for. + + This function returns 1 when the array contains NA values, 0 when + it does not, and -1 when an error has occurred. + +.. cfunction:: int PyArray_AssignNA(PyArrayObject *arr, NpyNA *na, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna) + + Assigns the given *na* value to elements of *arr*. + + If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable + onto *arr*, and only elements of *arr* with a corresponding value + of True in *wheremask* will have *na* assigned. + + The parameters *preservena* and *preservewhichna* are provided for + future expansion to multi-NA support. With a single NA value, one + NA cannot be distinguished from another, so preserving NA values + does not make sense. With multiple NA values, preserving NA values + becomes an important concept because that implies not overwriting the + multi-NA payloads. The parameter *preservewhichna* will be a 128 element + array of npy_bool, indicating which NA payloads to preserve. + + This function returns 0 for success, -1 for failure. + +.. cfunction:: int PyArray_AssignMaskNA(PyArrayObject *arr, npy_mask maskvalue, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna) + + Assigns the given NA mask *maskvalue* to elements of *arr*. + + If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable + onto *arr*, and only elements of *arr* with a corresponding value + of True in *wheremask* will have the NA *maskvalue* assigned. + + The parameters *preservena* and *preservewhichna* are provided for + future expansion to multi-NA support. With a single NA value, one + NA cannot be distinguished from another, so preserving NA values + does not make sense. With multiple NA values, preserving NA values + becomes an important concept because that implies not overwriting the + multi-NA payloads. 
The parameter *preservewhichna* will be a 128 element + array of npy_bool, indicating which NA payloads to preserve. + + This function returns 0 for success, -1 for failure. diff --git a/doc/source/reference/c-api.rst b/doc/source/reference/c-api.rst index 7c7775889..0766fdf14 100644 --- a/doc/source/reference/c-api.rst +++ b/doc/source/reference/c-api.rst @@ -45,6 +45,7 @@ code. c-api.dtype c-api.array c-api.iterator + c-api.maskna c-api.ufunc c-api.generalized-ufuncs c-api.coremath diff --git a/doc/source/reference/routines.maskna.rst b/doc/source/reference/routines.maskna.rst new file mode 100644 index 000000000..2910acbac --- /dev/null +++ b/doc/source/reference/routines.maskna.rst @@ -0,0 +1,11 @@ +NA-Masked Array Routines +======================== + +.. currentmodule:: numpy + +NA Values +--------- +.. autosummary:: + :toctree: generated/ + + isna diff --git a/doc/source/reference/routines.rst b/doc/source/reference/routines.rst index fb53aac3b..14b4f4d04 100644 --- a/doc/source/reference/routines.rst +++ b/doc/source/reference/routines.rst @@ -35,6 +35,7 @@ indentation. routines.set routines.window routines.err + routines.maskna routines.ma routines.help routines.other diff --git a/doc/source/reference/routines.sort.rst b/doc/source/reference/routines.sort.rst index c10252c69..517ea5897 100644 --- a/doc/source/reference/routines.sort.rst +++ b/doc/source/reference/routines.sort.rst @@ -37,3 +37,4 @@ Counting :toctree: generated/ count_nonzero + count_reduce_items |