-rw-r--r-- | doc/source/reference/c-api.array.rst | 3 |
-rw-r--r-- | doc/source/reference/c-api.iterator.rst | 25 |
-rw-r--r-- | doc/source/reference/c-api.maskna.rst | 212 |
3 files changed, 240 insertions, 0 deletions
diff --git a/doc/source/reference/c-api.array.rst b/doc/source/reference/c-api.array.rst index 369360407..35f5c6030 100644 --- a/doc/source/reference/c-api.array.rst +++ b/doc/source/reference/c-api.array.rst @@ -2575,6 +2575,9 @@ Array Scalars if so, returns the appropriate array scalar. It should be used whenever 0-dimensional arrays could be returned to Python. + If *arr* is a 0-dimensional NA-masked array with its value hidden, + an instance of :ctype:`NpyNA *` is returned. + .. cfunction:: PyObject* PyArray_Scalar(void* data, PyArray_Descr* dtype, PyObject* itemsize) Return an array scalar object of the given enumerated *typenum* diff --git a/doc/source/reference/c-api.iterator.rst b/doc/source/reference/c-api.iterator.rst index 7173418ee..4132ccf2b 100644 --- a/doc/source/reference/c-api.iterator.rst +++ b/doc/source/reference/c-api.iterator.rst @@ -988,6 +988,19 @@ Construction and Destruction Returns the number of operands in the iterator. + When :cdata:`NPY_ITER_USE_MASKNA` is used on an operand, a new + operand is added to the end of the operand list in the iterator + to track that operand's NA mask. Thus, this equals the number + of construction operands plus the number of operands for + which the flag :cdata:`NPY_ITER_USE_MASKNA` was specified. + +.. cfunction:: int NpyIter_GetFirstMaskNAOp(NpyIter* iter) + + .. versionadded:: 1.7 + + Returns the index of the first NA mask operand in the array. This + value is equal to the number of operands passed into the constructor. + .. cfunction:: npy_intp* NpyIter_GetAxisStrideArray(NpyIter* iter, int axis) Gets the array of strides for the specified axis. Requires that @@ -1024,6 +1037,16 @@ Construction and Destruction that are being iterated. The result points into ``iter``, so the caller does not gain any references to the PyObjects. +.. cfunction:: npy_int8* NpyIter_GetMaskNAIndexArray(NpyIter* iter) + + .. 
versionadded:: 1.7 + + This gives back a pointer to the ``nop`` indices which map + construction operands with :cdata:`NPY_ITER_USE_MASKNA` flagged + to their corresponding NA mask operands and vice versa. For + operands which were not flagged with :cdata:`NPY_ITER_USE_MASKNA`, + this array contains negative values. + .. cfunction:: PyObject* NpyIter_GetIterView(NpyIter* iter, npy_intp i) This gives back a reference to a new ndarray view, which is a view @@ -1076,6 +1099,8 @@ Construction and Destruction .. cfunction:: npy_bool NpyIter_IsFirstVisit(NpyIter* iter, int iop) + .. versionadded:: 1.7 + Checks to see whether this is the first time the elements of the specified reduction operand which the iterator points at are being seen for the first time. The function returns a reasonable answer diff --git a/doc/source/reference/c-api.maskna.rst b/doc/source/reference/c-api.maskna.rst new file mode 100644 index 000000000..a7613e1b2 --- /dev/null +++ b/doc/source/reference/c-api.maskna.rst @@ -0,0 +1,212 @@ +Array NA Mask API +================== + +.. sectionauthor:: Mark Wiebe + +.. index:: + pair: maskna; C-API + pair: C-API; maskna + +.. versionadded:: 1.7 + +NA Masks in Arrays +------------------ + +NumPy supports the idea of NA (Not Available) missing values in its arrays. +In the design document leading up to the implementation, two mechanisms +for this were proposed, NA masks and NA bitpatterns. NA masks have been +implemented as the first representation of these values. This mechanism +supports working with NA values in a way similar to the approach taken in the R +project and, when combined with views, allows one to temporarily mark +elements as NA, since the mask is independent of the raw array data. + +The C API has been updated with mechanisms to allow NumPy extensions +to work with these masks, and this document provides some examples and +reference for the NA mask-related functions. 
+ +The NA Object +------------- + +The main *numpy* namespace in Python has a new object called *NA*. +This is an instance of :ctype:`NpyNA *`, which is a Python object +representing an NA value. This object is analogous to the NumPy +scalars, and is returned by :cfunc:`PyArray_Return` instead of +a scalar where appropriate. + +The global *numpy.NA* object is accessible from C as :cdata:`Npy_NA`. +This is an NA value with no data type or multi-NA payload. Use it +just as you would Py_None, except use :cfunc:`NpyNA_Check` to +see if an object is an :ctype:`NpyNA *`, because :cdata:`Npy_NA` isn't +the only instance of NA possible. + +If you want to see whether a general PyObject* is NA, you should +use the API function :cfunc:`NpyNA_FromObject` with *suppress_error* +set to true. If this returns NULL, the object is not an NA, and if +it returns an NpyNA instance, the object is NA and you can then +access its *dtype* and *payload* fields as needed. + +To make new :ctype:`NpyNA *` objects, use +:cfunc:`NpyNA_FromDTypeAndPayload`. The functions +:cfunc:`NpyNA_GetDType`, :cfunc:`NpyNA_IsMultiNA`, and +:cfunc:`NpyNA_GetPayload` provide access to the data members. + +Example NA-Masked Operation in C +-------------------------------- + +NA Object Data Type +------------------- + +.. ctype:: NpyNA * + + This is the C object corresponding to objects of type + numpy.NAType. The fields themselves are hidden from consumers of the + API; you must use the functions provided to create new NA objects + and get their properties. + + This object contains two fields, a :ctype:`PyArray_Descr *` dtype + which is either NULL or indicates the data type the NA represents, + and a payload which is there for the future addition of multi-NA support. + +.. cvar:: Npy_NA + + This is a global singleton, similar to Py_None, which is the + *numpy.NA* object. Note that unlike Py_None, multiple NAs may be + created, for instance with different multi-NA payloads or with + different dtypes. 
If you want to return an NA with no payload + or dtype, return a new reference to Npy_NA. + +NA Object Functions +------------------- + +.. cfunction:: NpyNA_Check(obj) + + Evaluates to true if *obj* is an instance of :ctype:`NpyNA *`. + +.. cfunction:: PyArray_Descr* NpyNA_GetDType(NpyNA* na) + + Returns the *dtype* field of the NA object, which is NULL when + the NA has no dtype. Does not raise an error. + +.. cfunction:: npy_bool NpyNA_IsMultiNA(NpyNA* na) + + Returns true if the NA has a multi-NA payload, false otherwise. + +.. cfunction:: int NpyNA_GetPayload(NpyNA* na) + + Gets the multi-NA payload of the NA, or 0 if *na* doesn't have + a multi-NA payload. + +.. cfunction:: NpyNA* NpyNA_FromObject(PyObject* obj, int suppress_error) + + If *obj* represents an object which is NA, for example if it + is an :ctype:`NpyNA *`, or a zero-dimensional NA-masked array with + its value hidden by the mask, returns a new reference to an + :ctype:`NpyNA *` object representing *obj*. Otherwise returns + NULL. + + If *suppress_error* is true, this function doesn't raise an exception + when the input isn't NA and it returns NULL, otherwise it does. + +.. cfunction:: NpyNA* NpyNA_FromDTypeAndPayload(PyArray_Descr *dtype, int multina, int payload) + + + Constructs a new :ctype:`NpyNA *` instance with the specified *dtype* + and *payload*. For an NA with no dtype, provide NULL in *dtype*. + + Until multi-NA is implemented, just pass 0 for both *multina* + and *payload*. + +NA Mask Functions +----------------- + +.. cfunction:: int PyArray_AllocateMaskNA(PyArrayObject *arr, npy_bool ownmaskna, npy_bool multina, npy_mask defaultmask) + + Allocates an NA mask for the array *arr* if necessary. If *ownmaskna* + is false, it only allocates an NA mask if none exists, but if + *ownmaskna* is true, it also allocates one if the NA mask is a view + into another array's NA mask. 
Here are the two most common usage + patterns:: + + /* Use this to make sure 'arr' has an NA mask */ + if (PyArray_AllocateMaskNA(arr, 0, 0, 1) < 0) { + return NULL; + } + + /* Use this to make sure 'arr' owns an NA mask */ + if (PyArray_AllocateMaskNA(arr, 1, 0, 1) < 0) { + return NULL; + } + + The parameter *multina* is provided for future expansion, when + multi-NA support is added to NumPy. This will affect the dtype of + the NA mask, which currently must always be NPY_BOOL, but will be + NPY_MASK for multi-NA arrays when this is implemented. + + When a new NA mask is allocated, and the mask needs to be filled, + it uses the value *defaultmask*. In nearly all cases, this should be set + to 1, indicating that the elements are exposed. If a mask is allocated + just because of *ownmaskna*, the existing mask values are copied + into the newly allocated mask. + + This function returns 0 for success, -1 for failure. + +.. cfunction:: npy_bool PyArray_HasNASupport(PyArrayObject *arr) + + Returns true if *arr* is an array which supports NA. This function + exists because the design for adding NA proposed two mechanisms + for NAs in NumPy, NA masks and NA bitpatterns. Currently, just + NA masks have been implemented, but when NA bitpatterns are implemented + this would return true for arrays with an NA bitpattern dtype as well. + +.. cfunction:: int PyArray_ContainsNA(PyArrayObject *arr, PyArrayObject *wheremask, npy_bool *whichna) + + Checks whether the array *arr* contains any NA values. + + If *wheremask* is non-NULL, it must be an NPY_BOOL mask which can + broadcast onto *arr*. Wherever the where mask is True, *arr* + is checked for NA, and wherever it is False, the *arr* value is + ignored. + + The parameter *whichna* is provided for future expansion to multi-NA + support. When implemented, this parameter will be a 128 element + array of npy_bool, with the value True for the NA values that are + being looked for. 
+ + This function returns 1 when the array contains NA values, 0 when + it does not, and -1 when an error has occurred. + +.. cfunction:: int PyArray_AssignNA(PyArrayObject *arr, NpyNA *na, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna) + + Assigns the given *na* value to elements of *arr*. + + If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable + onto *arr*, and only elements of *arr* with a corresponding value + of True in *wheremask* will have *na* assigned. + + The parameters *preservena* and *preservewhichna* are provided for + future expansion to multi-NA support. With a single NA value, one + NA cannot be distinguished from another, so preserving NA values + does not make sense. With multiple NA values, preserving NA values + becomes an important concept because that implies not overwriting the + multi-NA payloads. The parameter *preservewhichna* will be a 128 element + array of npy_bool, indicating which NA payloads to preserve. + + This function returns 0 for success, -1 for failure. + +.. cfunction:: int PyArray_AssignMaskNA(PyArrayObject *arr, npy_mask maskvalue, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna) + + Assigns the given NA mask *maskvalue* to elements of *arr*. + + If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable + onto *arr*, and only elements of *arr* with a corresponding value + of True in *wheremask* will have the NA *maskvalue* assigned. + + The parameters *preservena* and *preservewhichna* are provided for + future expansion to multi-NA support. With a single NA value, one + NA cannot be distinguished from another, so preserving NA values + does not make sense. With multiple NA values, preserving NA values + becomes an important concept because that implies not overwriting the + multi-NA payloads. The parameter *preservewhichna* will be a 128 element + array of npy_bool, indicating which NA payloads to preserve. 
+ + This function returns 0 for success, -1 for failure.
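The iterator changes in this patch describe how NA mask operands are laid out: each construction operand flagged with NPY_ITER_USE_MASKNA gets an extra operand appended to the end of the list, the first such operand sits at the index equal to the number of construction operands, and an index array maps flagged operands to their mask operands and back, with negative entries for unflagged operands. The sketch below is a standalone model of that bookkeeping, not NumPy code. The names `OperandLayout`, `build_layout`, and `MODEL_MAX_OPS` are hypothetical, and it assumes mask operands are appended in the same order as the flagged operands, which the patch text does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity for this model only; unrelated to any NumPy limit. */
#define MODEL_MAX_OPS 32

/* Hypothetical stand-in for the bookkeeping the patch text describes. */
typedef struct {
    int nop;                 /* construction operands + NA mask operands */
    int first_maskna_op;     /* index of the first NA mask operand */
    signed char maskna_index[MODEL_MAX_OPS]; /* operand <-> mask mapping */
} OperandLayout;

/*
 * use_maskna[i] is nonzero when construction operand i is flagged
 * (modeling NPY_ITER_USE_MASKNA). Returns 0 on success, -1 on bad input
 * or when the model's capacity would be exceeded.
 */
static int
build_layout(const int *use_maskna, int n_construct, OperandLayout *out)
{
    int i, n_masks = 0, next_mask;

    if (use_maskna == NULL || out == NULL ||
            n_construct < 0 || n_construct > MODEL_MAX_OPS) {
        return -1;
    }
    for (i = 0; i < n_construct; ++i) {
        if (use_maskna[i]) {
            ++n_masks;
        }
    }
    if (n_construct + n_masks > MODEL_MAX_OPS) {
        return -1;
    }

    out->nop = n_construct + n_masks;
    out->first_maskna_op = n_construct;

    /* Assumption: mask operands are appended in flagged-operand order. */
    next_mask = n_construct;
    for (i = 0; i < n_construct; ++i) {
        if (use_maskna[i]) {
            out->maskna_index[i] = (signed char)next_mask;
            out->maskna_index[next_mask] = (signed char)i;
            ++next_mask;
        }
        else {
            out->maskna_index[i] = -1;  /* unflagged: negative entry */
        }
    }
    return 0;
}
```

With three construction operands of which the first and third are flagged, this model reports five operands in total, places the first mask operand at index 3, and leaves a negative entry for the unflagged second operand.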