author | Mark Wiebe <mwwiebe@gmail.com> | 2011-08-24 14:03:16 -0700 |
---|---|---|
committer | Charles Harris <charlesr.harris@gmail.com> | 2011-08-27 07:27:01 -0600 |
commit | 847404a650757ba8ab6dae3af937890230b00f84 (patch) | |
tree | bed404cbe052cf7e671af8baf7fa41a26b049090 /doc/source/reference/c-api.maskna.rst | |
parent | 5459e09ca6874ab0cdd7b6d4b69a068bcd0b12ed (diff) | |
download | numpy-847404a650757ba8ab6dae3af937890230b00f84.tar.gz |
DOC: missingdata: Documenting C API for NA-masked arrays
Diffstat (limited to 'doc/source/reference/c-api.maskna.rst')
-rw-r--r-- | doc/source/reference/c-api.maskna.rst | 212 |
1 files changed, 212 insertions, 0 deletions
diff --git a/doc/source/reference/c-api.maskna.rst b/doc/source/reference/c-api.maskna.rst
new file mode 100644
index 000000000..a7613e1b2
--- /dev/null
+++ b/doc/source/reference/c-api.maskna.rst
@@ -0,0 +1,212 @@
+Array NA Mask API
+==================
+
+.. sectionauthor:: Mark Wiebe
+
+.. index::
+   pair: maskna; C-API
+   pair: C-API; maskna
+
+.. versionadded:: 1.7
+
+NA Masks in Arrays
+------------------
+
+NumPy supports the idea of NA (Not Available) missing values in its arrays.
+In the design document leading up to the implementation, two mechanisms
+for this were proposed, NA masks and NA bitpatterns. NA masks have been
+implemented as the first representation of these values. This mechanism
+supports working with NA values similar to the approach taken in the R
+project, while when combined with views, allows one to temporarily mark
+elements as NA since the mask is independent of the raw array data.
+
+The C API has been updated with mechanisms to allow NumPy extensions
+to work with these masks, and this document provides some examples and
+reference for the NA mask-related functions.
+
+The NA Object
+-------------
+
+The main *numpy* namespace in Python has a new object called *NA*.
+This is an instance of :ctype:`NpyNA *`, which is a Python object
+representing an NA value. This object is analogous to the NumPy
+scalars, and is returned by :cfunc:`PyArray_Return` instead of
+a scalar where appropriate.
+
+The global *numpy.NA* object is accessible from C as :cdata:`Npy_NA`.
+This is an NA value with no data type or multi-NA payload. Use it
+just as you would Py_None, except use :cfunc:`NpyNA_Check` to
+see if an object is an :ctype:`NpyNA *`, because :cdata:`Npy_NA` isn't
+the only instance of NA possible.
+
+If you want to see whether a general PyObject* is NA, you should
+use the API function :cfunc:`NpyNA_FromObject` with *suppress_error*
+set to true. If this returns NULL, the object is not an NA, and if
+it returns an NpyNA instance, the object is NA and you can then
+access its *dtype* and *payload* fields as needed.
+
+To make new :ctype:`NpyNA *` objects, use
+:cfunc:`NpyNA_FromDTypeAndPayload`, and the functions
+:cfunc:`NpyNA_GetDType`, :cfunc:`NpyNA_IsMultiNA`, and
+:cfunc:`NpyNA_GetPayload` provide access to the data members.
+
+Example NA-Masked Operation in C
+--------------------------------
+
+NA Object Data Type
+-------------------
+
+.. ctype:: NpyNA *
+
+    This is the C object corresponding to objects of type
+    numpy.NAType. The fields themselves are hidden from consumers of the
+    API, you must use the functions provided to create new NA objects
+    and get their properties.
+
+    This object contains two fields, a :ctype:`PyArray_Descr *` dtype
+    which is either NULL or indicates the data type the NA represents,
+    and a payload which is there for the future addition of multi-NA support.
+
+.. cvar:: Npy_NA
+
+    This is a global singleton, similar to Py_None, which is the
+    *numpy.NA* object. Note that unlike Py_None, multiple NAs may be
+    created, for instance with different multi-NA payloads or with
+    different dtypes. If you want to return an NA with no payload
+    or dtype, return a new reference to Npy_NA.
+
+NA Object Functions
+-------------------
+
+.. cfunction:: NpyNA_Check(obj)
+
+    Evaluates to true if *obj* is an instance of :ctype:`NpyNA *`.
+
+.. cfunction:: PyArray_Descr* NpyNA_GetDType(NpyNA* na)
+
+    Returns the *dtype* field of the NA object, which is NULL when
+    the NA has no dtype. Does not raise an error.
+
+.. cfunction:: npy_bool NpyNA_IsMultiNA(NpyNA* na)
+
+    Returns true if the NA has a multi-NA payload, false otherwise.
+
+.. cfunction:: int NpyNA_GetPayload(NpyNA* na)
+
+    Gets the multi-NA payload of the NA, or 0 if *na* doesn't have
+    a multi-NA payload.
+
+.. cfunction:: NpyNA* NpyNA_FromObject(PyObject* obj, int suppress_error)
+
+    If *obj* represents an object which is NA, for example if it
+    is an :ctype:`NpyNA *`, or a zero-dimensional NA-masked array with
+    its value hidden by the mask, returns a new reference to an
+    :ctype:`NpyNA *` object representing *obj*. Otherwise returns
+    NULL.
+
+    If *suppress_error* is true, this function doesn't raise an exception
+    when the input isn't NA and it returns NULL, otherwise it does.
+
+.. cfunction:: NpyNA* NpyNA_FromDTypeAndPayload(PyArray_Descr *dtype, int multina, int payload)
+
+
+    Constructs a new :ctype:`NpyNA *` instance with the specified *dtype*
+    and *payload*. For an NA with no dtype, provide NULL in *dtype*.
+
+    Until multi-NA is implemented, just pass 0 for both *multina*
+    and *payload*.
+
+NA Mask Functions
+-----------------
+
+.. cfunction:: int PyArray_AllocateMaskNA(PyArrayObject *arr, npy_bool ownmaskna, npy_bool multina, npy_mask defaultmask)
+
+    Allocates an NA mask for the array *arr* if necessary. If *ownmaskna*
+    is false, it only allocates an NA mask if none exists, but if
+    *ownmaskna* is true, it also allocates one if the NA mask is a view
+    into another array's NA mask. Here are the two most common usage
+    patterns::
+
+        /* Use this to make sure 'arr' has an NA mask */
+        if (PyArray_AllocateMaskNA(arr, 0, 0, 1) < 0) {
+            return NULL;
+        }
+
+        /* Use this to make sure 'arr' owns an NA mask */
+        if (PyArray_AllocateMaskNA(arr, 1, 0, 1) < 0) {
+            return NULL;
+        }
+
+    The parameter *multina* is provided for future expansion, when
+    multi-NA support is added to NumPy. This will affect the dtype of
+    the NA mask, which currently must be always NPY_BOOL, but will be
+    NPY_MASK for multi-NA arrays when this is implemented.
+
+    When a new NA mask is allocated, and the mask needs to be filled,
+    it uses the value *defaultmask*. In nearly all cases, this should be set
+    to 1, indicating that the elements are exposed. If a mask is allocated
+    just because of *ownmaskna*, the existing mask values are copied
+    into the newly allocated mask.
+
+    This function returns 0 for success, -1 for failure.
+
+.. cfunction:: npy_bool PyArray_HasNASupport(PyArrayObject *arr)
+
+    Returns true if *arr* is an array which supports NA. This function
+    exists because the design for adding NA proposed two mechanisms
+    for NAs in NumPy, NA masks and NA bitpatterns. Currently, just
+    NA masks have been implemented, but when NA bitpatterns are implemented
+    this would return true for arrays with an NA bitpattern dtype as well.
+
+.. cfunction:: int PyArray_ContainsNA(PyArrayObject *arr, PyArrayObject *wheremask, npy_bool *whichna)
+
+    Checks whether the array *arr* contains any NA values.
+
+    If *wheremask* is non-NULL, it must be an NPY_BOOL mask which can
+    broadcast onto *arr*. Wherever the where mask is True, *arr*
+    is checked for NA, and wherever it is False, the *arr* value is
+    ignored.
+
+    The parameter *whichna* is provided for future expansion to multi-NA
+    support. When implemented, this parameter will be a 128 element
+    array of npy_bool, with the value True for the NA values that are
+    being looked for.
+
+    This function returns 1 when the array contains NA values, 0 when
+    it does not, and -1 when an error has occurred.
+
+.. cfunction:: int PyArray_AssignNA(PyArrayObject *arr, NpyNA *na, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
+
+    Assigns the given *na* value to elements of *arr*.
+
+    If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
+    onto *arr*, and only elements of *arr* with a corresponding value
+    of True in *wheremask* will have *na* assigned.
+
+    The parameters *preservena* and *preservewhichna* are provided for
+    future expansion to multi-NA support. With a single NA value, one
+    NA cannot be distinguished from another, so preserving NA values
+    does not make sense. With multiple NA values, preserving NA values
+    becomes an important concept because that implies not overwriting the
+    multi-NA payloads. The parameter *preservewhichna* will be a 128 element
+    array of npy_bool, indicating which NA payloads to preserve.
+
+    This function returns 0 for success, -1 for failure.
+
+.. cfunction:: int PyArray_AssignMaskNA(PyArrayObject *arr, npy_mask maskvalue, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
+
+    Assigns the given NA mask *maskvalue* to elements of *arr*.
+
+    If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
+    onto *arr*, and only elements of *arr* with a corresponding value
+    of True in *wheremask* will have the NA *maskvalue* assigned.
+
+    The parameters *preservena* and *preservewhichna* are provided for
+    future expansion to multi-NA support. With a single NA value, one
+    NA cannot be distinguished from another, so preserving NA values
+    does not make sense. With multiple NA values, preserving NA values
+    becomes an important concept because that implies not overwriting the
+    multi-NA payloads. The parameter *preservewhichna* will be a 128 element
+    array of npy_bool, indicating which NA payloads to preserve.
+
+    This function returns 0 for success, -1 for failure.
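
The mask semantics documented in this diff (a mask value of 1 means the element is exposed, 0 means it is NA, and a *wheremask* restricts which elements an operation touches) can be illustrated with a small plain-C model. This is only a sketch under stated assumptions: it works on flat byte arrays rather than ``PyArrayObject``, ignores broadcasting and multi-NA, and the helper names ``model_contains_na`` and ``model_assign_mask`` are hypothetical, not part of the NumPy C API.

```c
#include <stddef.h>

/*
 * Plain-C model of a boolean NA mask (assumption: 1 = exposed, 0 = NA,
 * matching the documented meaning of defaultmask == 1).
 * These helpers are hypothetical and do not call into NumPy.
 */

/*
 * Models the documented return values of PyArray_ContainsNA without
 * the error case: 1 if any selected element is NA, 0 otherwise.
 * When 'where' is NULL all n elements are checked; otherwise only
 * elements whose where[i] is nonzero are checked.
 */
static int
model_contains_na(const unsigned char *mask, const unsigned char *where,
                  size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i) {
        if ((where == NULL || where[i]) && mask[i] == 0) {
            return 1;
        }
    }
    return 0;
}

/*
 * Models the single-NA behaviour described for PyArray_AssignMaskNA:
 * writes 'maskvalue' into every selected mask element. The underlying
 * array data is never touched, which is why masking is reversible.
 */
static void
model_assign_mask(unsigned char *mask, unsigned char maskvalue,
                  const unsigned char *where, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i) {
        if (where == NULL || where[i]) {
            mask[i] = maskvalue;
        }
    }
}
```

For example, starting from an all-exposed mask, calling ``model_assign_mask`` with ``maskvalue`` 0 and a *where* array selecting one element hides only that element; ``model_contains_na`` then reports 1 for the whole array and 0 for any *where* selection that excludes the hidden element.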