author: Mark Wiebe <mwwiebe@gmail.com> 2011-08-24 14:03:16 -0700
committer: Charles Harris <charlesr.harris@gmail.com> 2011-08-27 07:27:01 -0600
commit: 847404a650757ba8ab6dae3af937890230b00f84
tree: bed404cbe052cf7e671af8baf7fa41a26b049090 /doc/source/reference/c-api.maskna.rst
parent: 5459e09ca6874ab0cdd7b6d4b69a068bcd0b12ed
DOC: missingdata: Documenting C API for NA-masked arrays
Diffstat (limited to 'doc/source/reference/c-api.maskna.rst')
-rw-r--r-- doc/source/reference/c-api.maskna.rst | 212
1 files changed, 212 insertions, 0 deletions
diff --git a/doc/source/reference/c-api.maskna.rst b/doc/source/reference/c-api.maskna.rst
new file mode 100644
index 000000000..a7613e1b2
--- /dev/null
+++ b/doc/source/reference/c-api.maskna.rst
@@ -0,0 +1,212 @@
+Array NA Mask API
+==================
+
+.. sectionauthor:: Mark Wiebe
+
+.. index::
+ pair: maskna; C-API
+ pair: C-API; maskna
+
+.. versionadded:: 1.7
+
+NA Masks in Arrays
+------------------
+
+NumPy supports the idea of NA (Not Available) missing values in its arrays.
+In the design document leading up to the implementation, two mechanisms
+for this were proposed: NA masks and NA bitpatterns. NA masks have been
+implemented as the first representation of these values. This mechanism
+supports working with NA values in a way similar to the approach taken in
+the R project and, when combined with views, allows one to temporarily mark
+elements as NA, since the mask is independent of the raw array data.
+
+The C API has been updated with mechanisms to allow NumPy extensions
+to work with these masks, and this document provides some examples and
+reference for the NA mask-related functions.
+
+The NA Object
+-------------
+
+The main *numpy* namespace in Python has a new object called *NA*.
+This is an instance of :ctype:`NpyNA *`, which is a Python object
+representing an NA value. This object is analogous to the NumPy
+scalars, and is returned by :cfunc:`PyArray_Return` instead of
+a scalar where appropriate.
+
+The global *numpy.NA* object is accessible from C as :cdata:`Npy_NA`.
+This is an NA value with no data type or multi-NA payload. Use it
+just as you would Py_None, except use :cfunc:`NpyNA_Check` to
+see if an object is an :ctype:`NpyNA *`, because :cdata:`Npy_NA` isn't
+the only instance of NA possible.
+
+If you want to see whether a general PyObject* is NA, you should
+use the API function :cfunc:`NpyNA_FromObject` with *suppress_error*
+set to true. If this returns NULL, the object is not an NA, and if
+it returns an NpyNA instance, the object is NA and you can then
+access its *dtype* and *payload* fields as needed.
+
+To make new :ctype:`NpyNA *` objects, use
+:cfunc:`NpyNA_FromDTypeAndPayload`, and the functions
+:cfunc:`NpyNA_GetDType`, :cfunc:`NpyNA_IsMultiNA`, and
+:cfunc:`NpyNA_GetPayload` provide access to the data members.
+
+Example NA-Masked Operation in C
+--------------------------------
+
+NA Object Data Type
+-------------------
+
+.. ctype:: NpyNA *
+
+ This is the C object corresponding to objects of type
+ numpy.NAType. The fields themselves are hidden from consumers of the
+ API; you must use the functions provided to create new NA objects
+ and get their properties.
+
+ This object contains two fields: a :ctype:`PyArray_Descr *` dtype,
+ which is either NULL or indicates the data type the NA represents,
+ and a payload, which is reserved for the future addition of multi-NA support.
+
+.. cvar:: Npy_NA
+
+ This is a global singleton, similar to Py_None, which is the
+ *numpy.NA* object. Note that unlike Py_None, multiple NAs may be
+ created, for instance with different multi-NA payloads or with
+ different dtypes. If you want to return an NA with no payload
+ or dtype, return a new reference to Npy_NA.
+
+NA Object Functions
+-------------------
+
+.. cfunction:: NpyNA_Check(obj)
+
+ Evaluates to true if *obj* is an instance of :ctype:`NpyNA *`.
+
+.. cfunction:: PyArray_Descr* NpyNA_GetDType(NpyNA* na)
+
+ Returns the *dtype* field of the NA object, which is NULL when
+ the NA has no dtype. Does not raise an error.
+
+.. cfunction:: npy_bool NpyNA_IsMultiNA(NpyNA* na)
+
+ Returns true if the NA has a multi-NA payload, false otherwise.
+
+.. cfunction:: int NpyNA_GetPayload(NpyNA* na)
+
+ Gets the multi-NA payload of the NA, or 0 if *na* doesn't have
+ a multi-NA payload.
+
+.. cfunction:: NpyNA* NpyNA_FromObject(PyObject* obj, int suppress_error)
+
+ If *obj* represents an object which is NA, for example if it
+ is an :ctype:`NpyNA *`, or a zero-dimensional NA-masked array with
+ its value hidden by the mask, returns a new reference to an
+ :ctype:`NpyNA *` object representing *obj*. Otherwise returns
+ NULL.
+
+ If *suppress_error* is true, this function returns NULL without raising
+ an exception when the input isn't NA; otherwise it raises an exception
+ in that case.
+
+.. cfunction:: NpyNA* NpyNA_FromDTypeAndPayload(PyArray_Descr *dtype, int multina, int payload)
+
+ Constructs a new :ctype:`NpyNA *` instance with the specified *dtype*
+ and *payload*. For an NA with no dtype, provide NULL in *dtype*.
+
+ Until multi-NA is implemented, just pass 0 for both *multina*
+ and *payload*.
+
+NA Mask Functions
+-----------------
+
+.. cfunction:: int PyArray_AllocateMaskNA(PyArrayObject *arr, npy_bool ownmaskna, npy_bool multina, npy_mask defaultmask)
+
+ Allocates an NA mask for the array *arr* if necessary. If *ownmaskna*
+ is false, it only allocates an NA mask if none exists, but if
+ *ownmaskna* is true, it also allocates one if the NA mask is a view
+ into another array's NA mask. Here are the two most common usage
+ patterns::
+
+ /* Use this to make sure 'arr' has an NA mask */
+ if (PyArray_AllocateMaskNA(arr, 0, 0, 1) < 0) {
+ return NULL;
+ }
+
+ /* Use this to make sure 'arr' owns an NA mask */
+ if (PyArray_AllocateMaskNA(arr, 1, 0, 1) < 0) {
+ return NULL;
+ }
+
+ The parameter *multina* is provided for future expansion, when
+ multi-NA support is added to NumPy. This will affect the dtype of
+ the NA mask, which currently must always be NPY_BOOL, but will be
+ NPY_MASK for multi-NA arrays once this is implemented.
+
+ When a new NA mask is allocated and needs to be filled, it is filled
+ with the value *defaultmask*. In nearly all cases, this should be set
+ to 1, indicating that the elements are exposed. If a mask is allocated
+ just because of *ownmaskna*, the existing mask values are copied
+ into the newly allocated mask.
+
+ This function returns 0 for success, -1 for failure.
+
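The allocation rules described for :cfunc:`PyArray_AllocateMaskNA` can be illustrated with a small standalone model. This is only a sketch under stated assumptions: ``ToyArray`` and ``toy_allocate_mask`` are hypothetical names invented for illustration, they are not part of the NumPy C API, and the model ignores *multina* entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an array's NA mask state; not a NumPy type. */
typedef struct {
    unsigned char *mask;   /* one byte per element, NULL if there is no mask */
    int owns_mask;         /* nonzero if this array allocated the mask */
    size_t size;           /* number of elements */
} ToyArray;

/*
 * Models the rules described in the text:
 *  - no mask yet: allocate one and fill it with defaultmask;
 *  - borrowed mask and ownmaskna true: allocate a copy of the existing values;
 *  - otherwise: leave the array unchanged.
 * Returns 0 for success, -1 for allocation failure.
 */
int toy_allocate_mask(ToyArray *arr, int ownmaskna, unsigned char defaultmask)
{
    unsigned char *newmask;

    assert(arr != NULL);
    if (arr->mask != NULL && (arr->owns_mask || !ownmaskna)) {
        return 0;   /* a suitable mask already exists */
    }
    newmask = malloc(arr->size ? arr->size : 1);
    if (newmask == NULL) {
        return -1;
    }
    if (arr->mask == NULL) {
        memset(newmask, defaultmask, arr->size);   /* fresh mask */
    }
    else {
        memcpy(newmask, arr->mask, arr->size);     /* copy the borrowed mask */
    }
    arr->mask = newmask;
    arr->owns_mask = 1;
    return 0;
}
```

In this model, calling ``toy_allocate_mask`` with *ownmaskna* set to 0 on an array that borrows another array's mask leaves the borrowed pointer in place, while calling it with 1 replaces it with an owned copy holding the same values.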
+.. cfunction:: npy_bool PyArray_HasNASupport(PyArrayObject *arr)
+
+ Returns true if *arr* is an array which supports NA. This function
+ exists because the design for adding NA proposed two mechanisms
+ for NAs in NumPy, NA masks and NA bitpatterns. Currently, just
+ NA masks have been implemented, but when NA bitpatterns are implemented
+ this would return true for arrays with an NA bitpattern dtype as well.
+
+.. cfunction:: int PyArray_ContainsNA(PyArrayObject *arr, PyArrayObject *wheremask, npy_bool *whichna)
+
+ Checks whether the array *arr* contains any NA values.
+
+ If *wheremask* is non-NULL, it must be an NPY_BOOL mask which can
+ broadcast onto *arr*. Wherever the where mask is True, *arr*
+ is checked for NA, and wherever it is False, the *arr* value is
+ ignored.
+
+ The parameter *whichna* is provided for future expansion to multi-NA
+ support. When implemented, this parameter will be a 128-element
+ array of npy_bool, with the value True for the NA values that are
+ being looked for.
+
+ This function returns 1 when the array contains NA values, 0 when
+ it does not, and -1 when an error has occurred.
+
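The interaction between the NA mask and *wheremask* described for :cfunc:`PyArray_ContainsNA` can be sketched on plain byte buffers. The function ``toy_contains_na`` below is a hypothetical illustration, not the NumPy implementation. It assumes a mask value of 1 means exposed and 0 means NA (consistent with the *defaultmask* convention described earlier), and it assumes *wheremask* already has the same length as the mask, so no broadcasting is performed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: mask[i] == 1 means the element is exposed, 0 means it is NA.
 * wheremask may be NULL; if given, it must have 'size' entries.
 * Returns 1 if any selected element is NA, 0 otherwise.
 */
int toy_contains_na(const unsigned char *mask,
                    const unsigned char *wheremask, size_t size)
{
    size_t i;

    assert(mask != NULL);
    for (i = 0; i < size; ++i) {
        if (wheremask != NULL && !wheremask[i]) {
            continue;   /* element is ignored by the where mask */
        }
        if (mask[i] == 0) {
            return 1;   /* found a selected NA element */
        }
    }
    return 0;
}
```

A where mask that is False at every NA position makes the check report no NA values, even though the underlying mask still hides those elements.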
+.. cfunction:: int PyArray_AssignNA(PyArrayObject *arr, NpyNA *na, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
+
+ Assigns the given *na* value to elements of *arr*.
+
+ If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
+ onto *arr*, and only elements of *arr* with a corresponding value
+ of True in *wheremask* will have *na* assigned.
+
+ The parameters *preservena* and *preservewhichna* are provided for
+ future expansion to multi-NA support. With a single NA value, one
+ NA cannot be distinguished from another, so preserving NA values
+ does not make sense. With multiple NA values, preserving NA values
+ becomes an important concept because that implies not overwriting the
+ multi-NA payloads. The parameter *preservewhichna* will be a 128-element
+ array of npy_bool, indicating which NA payloads to preserve.
+
+ This function returns 0 for success, -1 for failure.
+
+.. cfunction:: int PyArray_AssignMaskNA(PyArrayObject *arr, npy_mask maskvalue, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
+
+ Assigns the given NA mask *maskvalue* to elements of *arr*.
+
+ If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
+ onto *arr*, and only elements of *arr* with a corresponding value
+ of True in *wheremask* will have the NA *maskvalue* assigned.
+
+ The parameters *preservena* and *preservewhichna* are provided for
+ future expansion to multi-NA support. With a single NA value, one
+ NA cannot be distinguished from another, so preserving NA values
+ does not make sense. With multiple NA values, preserving NA values
+ becomes an important concept because that implies not overwriting the
+ multi-NA payloads. The parameter *preservewhichna* will be a 128-element
+ array of npy_bool, indicating which NA payloads to preserve.
+
+ This function returns 0 for success, -1 for failure.
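
The selective assignment described for :cfunc:`PyArray_AssignMaskNA` can be modeled in the same simplified way. ``toy_assign_mask`` is a hypothetical helper written for illustration only. It assumes *wheremask* has the same length as the mask (no broadcasting) and leaves out *preservena* and *preservewhichna*, which the text describes as reserved for future multi-NA support.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: write maskvalue into every mask entry selected by wheremask.
 * A NULL wheremask selects every element.
 */
void toy_assign_mask(unsigned char *mask, unsigned char maskvalue,
                     const unsigned char *wheremask, size_t size)
{
    size_t i;

    assert(mask != NULL);
    for (i = 0; i < size; ++i) {
        if (wheremask == NULL || wheremask[i]) {
            mask[i] = maskvalue;
        }
    }
}
```

Under the assumption that 0 marks an element as NA and 1 marks it as exposed, assigning 0 through a where mask hides only the selected elements, and assigning 1 with a NULL where mask exposes the whole array again.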