summaryrefslogtreecommitdiff
path: root/doc/source/reference/c-api.maskna.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/source/reference/c-api.maskna.rst')
-rw-r--r--doc/source/reference/c-api.maskna.rst592
1 files changed, 0 insertions, 592 deletions
diff --git a/doc/source/reference/c-api.maskna.rst b/doc/source/reference/c-api.maskna.rst
deleted file mode 100644
index 6abb624eb..000000000
--- a/doc/source/reference/c-api.maskna.rst
+++ /dev/null
@@ -1,592 +0,0 @@
-Array NA Mask API
-==================
-
-.. sectionauthor:: Mark Wiebe
-
-.. index::
- pair: maskna; C-API
- pair: C-API; maskna
-
-.. versionadded:: 1.7
-
-NA Masks in Arrays
-------------------
-
-NumPy supports the idea of NA (Not Available) missing values in its
-arrays. In the design document leading up to the implementation, two
-mechanisms for this were proposed, NA masks and NA bitpatterns. NA masks
-have been implemented as the first representation of these values. This
-mechanism supports working with NA values similar to what the R language
-provides, and when combined with views, allows one to temporarily mark
-elements as NA without affecting the original data.
-
-The C API has been updated with mechanisms to allow NumPy extensions
-to work with these masks, and this document provides some examples and
-reference for the NA mask-related functions.
-
-The NA Object
--------------
-
-The main *numpy* namespace in Python has a new object called *NA*.
-This is an instance of :ctype:`NpyNA`, which is a Python object
-representing an NA value. This object is analogous to the NumPy
-scalars, and is returned by :cfunc:`PyArray_Return` instead of
-a scalar where appropriate.
-
-The global *numpy.NA* object is accessible from C as :cdata:`Npy_NA`.
-This is an NA value with no data type or multi-NA payload. Use it
-just as you would Py_None, except use :cfunc:`NpyNA_Check` to
-see if an object is an :ctype:`NpyNA`, because :cdata:`Npy_NA` isn't
-the only instance of NA possible.
-
-If you want to see whether a general PyObject* is NA, you should
-use the API function :cfunc:`NpyNA_FromObject` with *suppress_error*
-set to true. If this returns NULL, the object is not an NA, and if
-it returns an NpyNA instance, the object is NA and you can then
-access its *dtype* and *payload* fields as needed.
-
-To make new :ctype:`NpyNA` objects, use
-:cfunc:`NpyNA_FromDTypeAndPayload`. The functions
-:cfunc:`NpyNA_GetDType`, :cfunc:`NpyNA_IsMultiNA`, and
-:cfunc:`NpyNA_GetPayload` provide access to the data members.
-
-Working With NA-Masked Arrays
------------------------------
-
-The starting point for many C-API functions which manipulate NumPy
-arrays is the function :cfunc:`PyArray_FromAny`. This function converts
-a general PyObject* object into a NumPy ndarray, based on options
-specified in the flags. To avoid surprises, this function does
-not allow NA-masked arrays to pass through by default.
-
-To allow third-party code to work with NA-masked arrays which contain
-no NAs, :cfunc:`PyArray_FromAny` will make a copy of the array into
-a new array without an NA-mask, and return that. This allows for
-proper interoperability in cases where it's possible until functions
-are updated to provide optimal code paths for NA-masked arrays.
-
-To update a function with NA-mask support, add the flag
-:cdata:`NPY_ARRAY_ALLOWNA` when calling :cfunc:`PyArray_FromAny`.
-This allows NA-masked arrays to pass through untouched, and will
-convert PyObject lists containing NA values into NA-masked arrays
-instead of the alternative of switching to object arrays.
-
-To check whether an array has an NA-mask, use the function
-:cfunc:`PyArray_HASMASKNA`, which checks the appropriate flag.
-There are a number of things that one will typically want to do
-when encountering an NA-masked array. We'll go through a few
-of these cases.
-
-Forbidding Any NA Values
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-The simplest case is to forbid any NA values. Note that it is better
-to still be aware of the NA mask and explicitly test for NA values
-than to leave out the :cdata:`NPY_ARRAY_ALLOWNA`, because it is possible
-to avoid the extra copy that :cfunc:`PyArray_FromAny` will make. The
-check for NAs will go something like this::
-
- PyArrayObject *arr = ...;
- int containsna;
-
- /* ContainsNA checks HASMASKNA() for you */
- containsna = PyArray_ContainsNA(arr, NULL, NULL);
- /* Error case */
- if (containsna < 0) {
- return NULL;
- }
- /* If it found an NA */
- else if (containsna) {
- PyErr_SetString(PyExc_ValueError,
- "this operation does not support arrays with NA values");
- return NULL;
- }
-
-After this check, you can be certain that the array doesn't contain any
-NA values, and can proceed accordingly. For example, if you iterate
-over the elements of the array, you may pass the flag
-:cdata:`NPY_ITER_IGNORE_MASKNA` to iterate over the data without
-touching the NA-mask at all.
-
-Manipulating NA Values
-~~~~~~~~~~~~~~~~~~~~~~
-
-The semantics of the NA-mask demand that whenever an array element
-is hidden by the NA-mask, no computations are permitted to modify
-the data backing that element. The :ctype:`NpyIter` provides
-a number of flags to assist with visiting both the array data
-and the mask data simultaneously, and preserving the masking semantics
-even when buffering is required.
-
-The main flag for iterating over NA-masked arrays is
-:cdata:`NPY_ITER_USE_MASKNA`. For each iterator operand which has this
-flag specified, a new operand is added to the end of the iterator operand
-list, and is set to iterate over the original operand's NA-mask. Operands
-which do not have an NA mask are permitted as well when they are flagged
-as read-only. The new operand in this case points to a single exposed
-mask value and all its strides are zero. The latter feature is useful
-when combining multiple read-only inputs, where some of them have masks.
-
-Accumulating NA Values
-~~~~~~~~~~~~~~~~~~~~~~
-
-More complex operations, like the NumPy ufunc reduce functions, need
-to take extra care to follow the masking semantics. If we accumulate
-the NA mask and the data values together, we could discover half way
-through that the output is NA, and that we have violated the contract
-to never change the underlying output value when it is being assigned
-NA.
-
-The solution to this problem is to first accumulate the NA-mask as necessary
-to produce the output's NA-mask, then accumulate the data values without
-touching NA-masked values in the output. The parameter *preservena* in
-functions like :cfunc:`PyArray_AssignArray` can assist when initializing
-values in such an algorithm.
-
-Example NA-Masked Operation in C
---------------------------------
-
-As an example, let's implement a simple binary NA-masked operation
-for the double dtype. We'll make a divide operation which turns
-divide by zero into NA instead of Inf or NaN.
-
-To start, we define the function prototype and some basic
-:ctype:`NpyIter` boilerplate setup. We'll make a function which
-supports an optional *out* parameter, which may be NULL.::
-
- static PyArrayObject*
- SpecialDivide(PyArrayObject* a, PyArrayObject* b, PyArrayObject *out)
- {
- NpyIter *iter = NULL;
- PyArrayObject *op[3];
- PyArray_Descr *dtypes[3];
- npy_uint32 flags, op_flags[3];
-
- /* Iterator construction parameters */
- op[0] = a;
- op[1] = b;
- op[2] = out;
-
- dtypes[0] = PyArray_DescrFromType(NPY_DOUBLE);
- if (dtypes[0] == NULL) {
- return NULL;
- }
- dtypes[1] = dtypes[0];
- dtypes[2] = dtypes[0];
-
- flags = NPY_ITER_BUFFERED |
- NPY_ITER_EXTERNAL_LOOP |
- NPY_ITER_GROWINNER |
- NPY_ITER_REFS_OK |
- NPY_ITER_ZEROSIZE_OK;
-
- /* Every operand gets the flag NPY_ITER_USE_MASKNA */
- op_flags[0] = NPY_ITER_READONLY |
- NPY_ITER_ALIGNED |
- NPY_ITER_USE_MASKNA;
- op_flags[1] = op_flags[0];
- op_flags[2] = NPY_ITER_WRITEONLY |
- NPY_ITER_ALIGNED |
- NPY_ITER_USE_MASKNA |
- NPY_ITER_NO_BROADCAST |
- NPY_ITER_ALLOCATE;
-
- iter = NpyIter_MultiNew(3, op, flags, NPY_KEEPORDER,
- NPY_SAME_KIND_CASTING, op_flags, dtypes);
- /* Don't need the dtype reference anymore */
- Py_DECREF(dtypes[0]);
- if (iter == NULL) {
- return NULL;
- }
-
-At this point, the input operands have been validated according to
-the casting rule, the shapes of the arrays have been broadcast together,
-and any buffering necessary has been prepared. This means we can
-dive into the inner loop of this function.::
-
- ...
- if (NpyIter_GetIterSize(iter) > 0) {
- NpyIter_IterNextFunc *iternext;
- char **dataptr;
- npy_intp *stridesptr, *countptr;
-
- /* Variables needed for looping */
- iternext = NpyIter_GetIterNext(iter, NULL);
- if (iternext == NULL) {
- NpyIter_Deallocate(iter);
- return NULL;
- }
- dataptr = NpyIter_GetDataPtrArray(iter);
- stridesptr = NpyIter_GetInnerStrideArray(iter);
- countptr = NpyIter_GetInnerLoopSizePtr(iter);
-
-The loop gets a bit messy when dealing with NA-masks, because it
-doubles the number of operands being processed in the iterator. Here
-we are naming things clearly so that the content of the innermost loop
-can be easy to work with.::
-
- ...
- do {
- /* Data pointers and strides needed for innermost loop */
- char *data_a = dataptr[0], *data_b = dataptr[1];
- char *data_out = dataptr[2];
- char *maskna_a = dataptr[3], *maskna_b = dataptr[4];
- char *maskna_out = dataptr[5];
- npy_intp stride_a = stridesptr[0], stride_b = stridesptr[1];
- npy_intp stride_out = strides[2];
- npy_intp maskna_stride_a = stridesptr[3];
- npy_intp maskna_stride_b = stridesptr[4];
- npy_intp maskna_stride_out = stridesptr[5];
- npy_intp i, count = *countptr;
-
- for (i = 0; i < count; ++i) {
-
-Here is the code for performing one special division. We use
-the functions :cfunc:`NpyMaskValue_IsExposed` and
-:cfunc:`NpyMaskValue_Create` to work with the masks, in order to be
-as general as possible. These are inline functions, and the compiler
-optimizer should be able to produce the same result as if you performed
-these operations directly inline here.::
-
- ...
- /* If neither of the inputs are NA */
- if (NpyMaskValue_IsExposed((npy_mask)*maskna_a) &&
- NpyMaskValue_IsExposed((npy_mask)*maskna_b)) {
- double a_val = *(double *)data_a;
- double b_val = *(double *)data_b;
- /* Do the divide if 'b' isn't zero */
- if (b_val != 0.0) {
- *(double *)data_out = a_val / b_val;
- /* Need to also set this element to exposed */
- *maskna_out = NpyMaskValue_Create(1, 0);
- }
- /* Otherwise output an NA without touching its data */
- else {
- *maskna_out = NpyMaskValue_Create(0, 0);
- }
- }
- /* Turn the output into NA without touching its data */
- else {
- *maskna_out = NpyMaskValue_Create(0, 0);
- }
-
- data_a += stride_a;
- data_b += stride_b;
- data_out += stride_out;
- maskna_a += maskna_stride_a;
- maskna_b += maskna_stride_b;
- maskna_out += maskna_stride_out;
- }
- } while (iternext(iter));
- }
-
-A little bit more boilerplate for returning the result from the iterator,
-and the function is done.::
-
- ...
- if (out == NULL) {
- out = NpyIter_GetOperandArray(iter)[2];
- }
- Py_INCREF(out);
- NpyIter_Deallocate(iter);
-
- return out;
- }
-
-To run this example, you can create a simple module with a C-file spdiv_mod.c
-consisting of::
-
- #include <Python.h>
- #include <numpy/arrayobject.h>
-
- /* INSERT SpecialDivide source code here */
-
- static PyObject *
- spdiv(PyObject *self, PyObject *args, PyObject *kwds)
- {
- PyArrayObject *a, *b, *out = NULL;
- static char *kwlist[] = {"a", "b", "out", NULL};
-
- if (!PyArg_ParseTupleAndKeywords(args, kwds, "O&O&|O&", kwlist,
- &PyArray_AllowNAConverter, &a,
- &PyArray_AllowNAConverter, &b,
- &PyArray_OutputAllowNAConverter, &out)) {
- return NULL;
- }
-
- /*
- * The usual NumPy way is to only use PyArray_Return when
- * the 'out' parameter is not provided.
- */
- if (out == NULL) {
- return PyArray_Return(SpecialDivide(a, b, out));
- }
- else {
- return (PyObject *)SpecialDivide(a, b, out);
- }
- }
-
- static PyMethodDef SpDivMethods[] = {
- {"spdiv", (PyCFunction)spdiv, METH_VARARGS | METH_KEYWORDS, NULL},
- {NULL, NULL, 0, NULL}
- };
-
-
- PyMODINIT_FUNC initspdiv_mod(void)
- {
- PyObject *m;
-
- m = Py_InitModule("spdiv_mod", SpDivMethods);
- if (m == NULL) {
- return;
- }
-
- /* Make sure NumPy is initialized */
- import_array();
- }
-
-Create a setup.py file like::
-
- #!/usr/bin/env python
- def configuration(parent_package='',top_path=None):
- from numpy.distutils.misc_util import Configuration
- config = Configuration('.',parent_package,top_path)
- config.add_extension('spdiv_mod',['spdiv_mod.c'])
- return config
-
- if __name__ == "__main__":
- from numpy.distutils.core import setup
- setup(configuration=configuration)
-
-With these two files in a directory by itself, run::
-
- $ python setup.py build_ext --inplace
-
-and the file spdiv_mod.so (or .dll) will be placed in the same directory.
-Now you can try out this sample, to see how it behaves.::
-
- >>> import numpy as np
- >>> from spdiv_mod import spdiv
-
-Because we used :cfunc:`PyArray_Return` when wrapping SpecialDivide,
-it returns scalars like any typical NumPy function does::
-
- >>> spdiv(1, 2)
- 0.5
- >>> spdiv(2, 0)
- NA(dtype='float64')
- >>> spdiv(np.NA, 1.5)
- NA(dtype='float64')
-
-Here we can see how NAs propagate, and how 0 in the output turns into NA
-as desired.::
-
- >>> a = np.arange(6)
- >>> b = np.array([0,np.NA,0,2,1,0])
- >>> spdiv(a, b)
- array([ NA, NA, NA, 1.5, 4. , NA])
-
-Finally, we can see the masking behavior by creating a masked
-view of an array. The ones in *c_orig* are preserved whereever
-NA got assigned.::
-
- >>> c_orig = np.ones(6)
- >>> c = c_orig.view(maskna=True)
- >>> spdiv(a, b, out=c)
- array([ NA, NA, NA, 1.5, 4. , NA])
- >>> c_orig
- array([ 1. , 1. , 1. , 1.5, 4. , 1. ])
-
-NA Object Data Type
--------------------
-
-.. ctype:: NpyNA
-
- This is the C object corresponding to objects of type
- numpy.NAType. The fields themselves are hidden from consumers of the
- API, you must use the functions provided to create new NA objects
- and get their properties.
-
- This object contains two fields, a :ctype:`PyArray_Descr *` dtype
- which is either NULL or indicates the data type the NA represents,
- and a payload which is there for the future addition of multi-NA support.
-
-.. cvar:: Npy_NA
-
- This is a global singleton, similar to Py_None, which is the
- *numpy.NA* object. Note that unlike Py_None, multiple NAs may be
- created, for instance with different multi-NA payloads or with
- different dtypes. If you want to return an NA with no payload
- or dtype, return a new reference to Npy_NA.
-
-NA Object Functions
--------------------
-
-.. cfunction:: NpyNA_Check(obj)
-
- Evaluates to true if *obj* is an instance of :ctype:`NpyNA`.
-
-.. cfunction:: PyArray_Descr* NpyNA_GetDType(NpyNA* na)
-
- Returns the *dtype* field of the NA object, which is NULL when
- the NA has no dtype. Does not raise an error.
-
-.. cfunction:: npy_bool NpyNA_IsMultiNA(NpyNA* na)
-
- Returns true if the NA has a multi-NA payload, false otherwise.
-
-.. cfunction:: int NpyNA_GetPayload(NpyNA* na)
-
- Gets the multi-NA payload of the NA, or 0 if *na* doesn't have
- a multi-NA payload.
-
-.. cfunction:: NpyNA* NpyNA_FromObject(PyObject* obj, int suppress_error)
-
- If *obj* represents an object which is NA, for example if it
- is an :ctype:`NpyNA`, or a zero-dimensional NA-masked array with
- its value hidden by the mask, returns a new reference to an
- :ctype:`NpyNA` object representing *obj*. Otherwise returns
- NULL.
-
- If *suppress_error* is true, this function doesn't raise an exception
- when the input isn't NA and it returns NULL, otherwise it does.
-
-.. cfunction:: NpyNA* NpyNA_FromDTypeAndPayload(PyArray_Descr *dtype, int multina, int payload)
-
-
- Constructs a new :ctype:`NpyNA` instance with the specified *dtype*
- and *payload*. For an NA with no dtype, provide NULL in *dtype*.
-
- Until multi-NA is implemented, just pass 0 for both *multina*
- and *payload*.
-
-NA Mask Functions
------------------
-
-A mask dtype can be one of three different possibilities. It can
-be :cdata:`NPY_BOOL`, :cdata:`NPY_MASK`, or a struct dtype whose
-fields are all mask dtypes.
-
-A mask of :cdata:`NPY_BOOL` can just indicate True, with underlying
-value 1, for an element that is exposed, and False, with underlying
-value 0, for an element that is hidden.
-
-A mask of :cdata:`NPY_MASK` can additionally carry a payload which
-is a value from 0 to 127. This allows for missing data implementations
-based on such masks to support multiple reasons for data being missing.
-
-A mask of a struct dtype can only pair up with another struct dtype
-with the same field names. In this way, each field of the mask controls
-the masking for the corresponding field in the associated data array.
-
-Inline functions to work with masks are as follows.
-
-.. cfunction:: npy_bool NpyMaskValue_IsExposed(npy_mask mask)
-
- Returns true if the data element corresponding to the mask element
- can be modified, false if not.
-
-.. cfunction:: npy_uint8 NpyMaskValue_GetPayload(npy_mask mask)
-
- Returns the payload contained in the mask. The return value
- is between 0 and 127.
-
-.. cfunction:: npy_mask NpyMaskValue_Create(npy_bool exposed, npy_int8 payload)
-
- Creates a mask from a flag indicating whether the element is exposed
- or not and a payload value.
-
-NA Mask Array Functions
------------------------
-
-.. cfunction:: int PyArray_AllocateMaskNA(PyArrayObject *arr, npy_bool ownmaskna, npy_bool multina, npy_mask defaultmask)
-
- Allocates an NA mask for the array *arr* if necessary. If *ownmaskna*
- if false, it only allocates an NA mask if none exists, but if
- *ownmaskna* is true, it also allocates one if the NA mask is a view
- into another array's NA mask. Here are the two most common usage
- patterns::
-
- /* Use this to make sure 'arr' has an NA mask */
- if (PyArray_AllocateMaskNA(arr, 0, 0, 1) < 0) {
- return NULL;
- }
-
- /* Use this to make sure 'arr' owns an NA mask */
- if (PyArray_AllocateMaskNA(arr, 1, 0, 1) < 0) {
- return NULL;
- }
-
- The parameter *multina* is provided for future expansion, when
- mult-NA support is added to NumPy. This will affect the dtype of
- the NA mask, which currently must be always NPY_BOOL, but will be
- NPY_MASK for arrays multi-NA when this is implemented.
-
- When a new NA mask is allocated, and the mask needs to be filled,
- it uses the value *defaultmask*. In nearly all cases, this should be set
- to 1, indicating that the elements are exposed. If a mask is allocated
- just because of *ownmaskna*, the existing mask values are copied
- into the newly allocated mask.
-
- This function returns 0 for success, -1 for failure.
-
-.. cfunction:: npy_bool PyArray_HasNASupport(PyArrayObject *arr)
-
- Returns true if *arr* is an array which supports NA. This function
- exists because the design for adding NA proposed two mechanisms
- for NAs in NumPy, NA masks and NA bitpatterns. Currently, just
- NA masks have been implemented, but when NA bitpatterns are implemented
- this would return true for arrays with an NA bitpattern dtype as well.
-
-.. cfunction:: int PyArray_ContainsNA(PyArrayObject *arr, PyArrayObject *wheremask, npy_bool *whichna)
-
- Checks whether the array *arr* contains any NA values.
-
- If *wheremask* is non-NULL, it must be an NPY_BOOL mask which can
- broadcast onto *arr*. Whereever the where mask is True, *arr*
- is checked for NA, and whereever it is False, the *arr* value is
- ignored.
-
- The parameter *whichna* is provided for future expansion to multi-NA
- support. When implemented, this parameter will be a 128 element
- array of npy_bool, with the value True for the NA values that are
- being looked for.
-
- This function returns 1 when the array contains NA values, 0 when
- it does not, and -1 when a error has occurred.
-
-.. cfunction:: int PyArray_AssignNA(PyArrayObject *arr, NpyNA *na, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
-
- Assigns the given *na* value to elements of *arr*.
-
- If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
- onto *arr*, and only elements of *arr* with a corresponding value
- of True in *wheremask* will have *na* assigned.
-
- The parameters *preservena* and *preservewhichna* are provided for
- future expansion to multi-NA support. With a single NA value, one
- NA cannot be distinguished from another, so preserving NA values
- does not make sense. With multiple NA values, preserving NA values
- becomes an important concept because that implies not overwriting the
- multi-NA payloads. The parameter *preservewhichna* will be a 128 element
- array of npy_bool, indicating which NA payloads to preserve.
-
- This function returns 0 for success, -1 for failure.
-
-.. cfunction:: int PyArray_AssignMaskNA(PyArrayObject *arr, npy_mask maskvalue, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
-
- Assigns the given NA mask *maskvalue* to elements of *arr*.
-
- If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
- onto *arr*, and only elements of *arr* with a corresponding value
- of True in *wheremask* will have the NA *maskvalue* assigned.
-
- The parameters *preservena* and *preservewhichna* are provided for
- future expansion to multi-NA support. With a single NA value, one
- NA cannot be distinguished from another, so preserving NA values
- does not make sense. With multiple NA values, preserving NA values
- becomes an important concept because that implies not overwriting the
- multi-NA payloads. The parameter *preservewhichna* will be a 128 element
- array of npy_bool, indicating which NA payloads to preserve.
-
- This function returns 0 for success, -1 for failure.