summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorMark Wiebe <mwwiebe@gmail.com>2011-08-24 16:24:52 -0700
committerCharles Harris <charlesr.harris@gmail.com>2011-08-27 07:27:01 -0600
commit770c94ef5ab2478bc9f5451f931613d7424459e1 (patch)
tree7a885fd8ac4dc5853c815628c2e6089afd6c94ff /doc
parent847404a650757ba8ab6dae3af937890230b00f84 (diff)
downloadnumpy-770c94ef5ab2478bc9f5451f931613d7424459e1.tar.gz
DOC: missingdata: Add example of a C-API function supporting NA masks
Diffstat (limited to 'doc')
-rw-r--r--doc/source/reference/c-api.array.rst50
-rw-r--r--doc/source/reference/c-api.iterator.rst7
-rw-r--r--doc/source/reference/c-api.maskna.rst384
-rw-r--r--doc/source/reference/c-api.rst1
4 files changed, 387 insertions, 55 deletions
diff --git a/doc/source/reference/c-api.array.rst b/doc/source/reference/c-api.array.rst
index 35f5c6030..eab2779a4 100644
--- a/doc/source/reference/c-api.array.rst
+++ b/doc/source/reference/c-api.array.rst
@@ -92,6 +92,12 @@ sub-types).
A synonym for PyArray_DESCR, named to be consistent with the
'dtype' usage within Python.
+.. cfunction:: npy_bool PyArray_HASMASKNA(PyArrayObject* arr)
+
+ .. versionadded:: 1.7
+
+ Returns true if the array has an NA-mask, false otherwise.
+
.. cfunction:: PyArray_Descr *PyArray_MASKNA_DTYPE(PyArrayObject* arr)
.. versionadded:: 1.7
@@ -2285,50 +2291,6 @@ an element copier function as a primitive.::
A macro which calls the auxdata's clone function appropriately,
returning a deep copy of the auxiliary data.
-Masks for Selecting Elements to Modify
---------------------------------------
-
-.. versionadded:: 1.7.0
-
-The array iterator, :ctype:`NpyIter`, has some new flags which
-allow control over which elements are intended to be modified,
-providing the ability to do masking even when doing casts to a buffer
-of a different type. Some inline functions have been added
-to facilitate consistent usage of these masks.
-
-A mask dtype can be one of three different possibilities. It can
-be :cdata:`NPY_BOOL`, :cdata:`NPY_MASK`, or a struct dtype whose
-fields are all mask dtypes.
-
-A mask of :cdata:`NPY_BOOL` can just indicate True, with underlying
-value 1, for an element that is exposed, and False, with underlying
-value 0, for an element that is hidden.
-
-A mask of :cdata:`NPY_MASK` can additionally carry a payload which
-is a value from 0 to 127. This allows for missing data implementations
-based on such masks to support multiple reasons for data being missing.
-
-A mask of a struct dtype can only pair up with another struct dtype
-with the same field names. In this way, each field of the mask controls
-the masking for the corresponding field in the associated data array.
-
-Inline functions to work with masks are as follows.
-
-.. cfunction:: npy_bool NpyMask_IsExposed(npy_mask mask)
-
- Returns true if the data element corresponding to the mask element
- can be modified, false if not.
-
-.. cfunction:: npy_uint8 NpyMask_GetPayload(npy_mask mask)
-
- Returns the payload contained in the mask. The return value
- is between 0 and 127.
-
-.. cfunction:: npy_mask NpyMask_Create(npy_bool exposed, npy_int8 payload)
-
- Creates a mask from a flag indicating whether the element is exposed
- or not and a payload value.
-
Array Iterators
---------------
diff --git a/doc/source/reference/c-api.iterator.rst b/doc/source/reference/c-api.iterator.rst
index 4132ccf2b..0915f765e 100644
--- a/doc/source/reference/c-api.iterator.rst
+++ b/doc/source/reference/c-api.iterator.rst
@@ -551,9 +551,10 @@ Construction and Destruction
.. cvar:: NPY_ITER_ALLOCATE
This is for output arrays, and requires that the flag
- :cdata:`NPY_ITER_WRITEONLY` be set. If ``op[i]`` is NULL,
- creates a new array with the final broadcast dimensions,
- and a layout matching the iteration order of the iterator.
+ :cdata:`NPY_ITER_WRITEONLY` or :cdata:`NPY_ITER_READWRITE`
+ be set. If ``op[i]`` is NULL, creates a new array with
+ the final broadcast dimensions, and a layout matching
+ the iteration order of the iterator.
When ``op[i]`` is NULL, the requested data type
``op_dtypes[i]`` may be NULL as well, in which case it is
diff --git a/doc/source/reference/c-api.maskna.rst b/doc/source/reference/c-api.maskna.rst
index a7613e1b2..09fa44d35 100644
--- a/doc/source/reference/c-api.maskna.rst
+++ b/doc/source/reference/c-api.maskna.rst
@@ -28,7 +28,7 @@ The NA Object
-------------
The main *numpy* namespace in Python has a new object called *NA*.
-This is an instance of :ctype:`NpyNA *`, which is a Python object
+This is an instance of :ctype:`NpyNA`, which is a Python object
representing an NA value. This object is analogous to the NumPy
scalars, and is returned by :cfunc:`PyArray_Return` instead of
a scalar where appropriate.
@@ -36,7 +36,7 @@ a scalar where appropriate.
The global *numpy.NA* object is accessible from C as :cdata:`Npy_NA`.
This is an NA value with no data type or multi-NA payload. Use it
just as you would Py_None, except use :cfunc:`NpyNA_Check` to
-see if an object is an :ctype:`NpyNA *`, because :cdata:`Npy_NA` isn't
+see if an object is an :ctype:`NpyNA`, because :cdata:`Npy_NA` isn't
the only instance of NA possible.
If you want to see whether a general PyObject* is NA, you should
@@ -45,18 +45,350 @@ set to true. If this returns NULL, the object is not an NA, and if
it returns an NpyNA instance, the object is NA and you can then
access its *dtype* and *payload* fields as needed.
-To make new :ctype:`NpyNA *` objects, use
+To make new :ctype:`NpyNA` objects, use
:cfunc:`NpyNA_FromDTypeAndPayload`, and the functions
:cfunc:`NpyNA_GetDType`, :cfunc:`NpyNA_IsMultiNA`, and
:cfunc:`NpyNA_GetPayload` provide access to the data members.
+Working With NA-Masked Arrays
+-----------------------------
+
+The starting point for many C-API functions which manipulate NumPy
+arrays is the function :cfunc:`PyArray_FromAny`. This function converts
+a general PyObject* object into a NumPy ndarray, based on options
+specified in the flags. To avoid surprises, this function does
+not allow NA-masked arrays to pass through by default.
+
+To allow third-party code to work with NA-masked arrays which contain
+no NAs, :cfunc:`PyArray_FromAny` will make a copy of the array into
+a new array without an NA-mask, and return that. This allows for
+proper interoperability in cases where it's possible until functions
+are updated to provide optimal code paths for NA-masked arrays.
+
+To update a function with NA-mask support, add the flag
+:cdata:`NPY_ARRAY_ALLOWNA` when calling :cfunc:`PyArray_FromAny`.
+This allows NA-masked arrays to pass through untouched, and will
+convert PyObject lists containing NA values into NA-masked arrays
+instead of the alternative of switching to object arrays.
+
+To check whether an array has an NA-mask, use the function
+:cfunc:`PyArray_HASMASKNA`, which checks the appropriate flag.
+There are a number of things that one will typically want to do
+when encountering an NA-masked array. We'll go through a few
+of these cases.
+
+Forbidding Any NA Values
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The simplest case is to forbid any NA values. Note that it is better
+to still be aware of the NA mask and explicitly test for NA values
+than to leave out the :cdata:`NPY_ARRAY_ALLOWNA`, because it is possible
+to avoid the extra copy that :cfunc:`PyArray_FromAny` will make. The
+check for NAs will go something like this::
+
+ PyArrayObject *arr = ...;
+ int containsna;
+
+ /* ContainsNA checks HASMASKNA() for you */
+ containsna = PyArray_ContainsNA(arr, NULL, NULL);
+ /* Error case */
+ if (containsna < 0) {
+ return NULL;
+ }
+ /* If it found an NA */
+ else if (containsna) {
+ PyErr_SetString(PyExc_ValueError,
+ "this operation does not support arrays with NA values");
+ return NULL;
+ }
+
+After this check, you can be certain that the array doesn't contain any
+NA values, and can proceed accordingly. For example, if you iterate
+over the elements of the array, you may pass the flag
+:cdata:`NPY_ITER_IGNORE_MASKNA` to iterate over the data without
+touching the NA-mask at all.
+
+Manipulating NA Values
+~~~~~~~~~~~~~~~~~~~~~~
+
+The semantics of the NA-mask demand that whenever an array element
+is hidden by the NA-mask, no computations are permitted to modify
+the data backing that element. The :ctype:`NpyIter` provides
+a number of flags to assist with visiting both the array data
+and the mask data simultaneously, and preserving the masking semantics
+even when buffering is required.
+
+The main flag for iterating over NA-masked arrays is
+:cdata:`NPY_ITER_USE_MASKNA`. For each iterator operand which has this
+flag specified, a new operand is added to the end of the iterator operand
+list, and is set to iterate over the original operand's NA-mask. Operands
+which do not have an NA mask are permitted as well when they are flagged
+as read-only. The new operand in this case points to a single exposed
+mask value and all its strides are zero. The latter feature is useful
+when combining multiple read-only inputs, where some of them have masks.
+
+Accumulating NA Values
+~~~~~~~~~~~~~~~~~~~~~~
+
+More complex operations, like the NumPy ufunc reduce functions, need
+to take extra care to follow the masking semantics. If we accumulate
+the NA mask and the data values together, we could discover half way
+through that the output is NA, and that we have violated the contract
+to never change the underlying output value when it is being assigned
+NA.
+
+The solution to this problem is to first accumulate the NA-mask as necessary
+to produce the output's NA-mask, then accumulate the data values without
+touching NA-masked values in the output. The parameter *preservena* in
+functions like :cfunc:`PyArray_AssignArray` can assist when initializing
+values in such an algorithm.
+
Example NA-Masked Operation in C
--------------------------------
+As an example, let's implement a simple binary NA-masked operation
+for the double dtype. We'll make a divide operation which turns
+divide by zero into NA instead of NaN.
+
+To start, we define the function prototype and some basic
+:ctype:`NpyIter` boilerplate setup. We'll make a function which
+supports an optional *out* parameter, which may be NULL.::
+
+ PyArrayObject*
+ SpecialDivide(PyArrayObject* a, PyArrayObject* b, PyArrayObject *out)
+ {
+ NpyIter *iter = NULL;
+ PyArrayObject *op[3];
+ PyArray_Descr *dtypes[3];
+ npy_uint32 flags, op_flags[3];
+
+ /* Iterator construction parameters */
+ op[0] = a;
+ op[1] = b;
+ op[2] = out;
+
+ dtypes[0] = PyArray_DescrFromType(NPY_DOUBLE);
+ if (dtypes[0] == NULL) {
+ return NULL;
+ }
+ dtypes[1] = dtypes[0];
+ dtypes[2] = dtypes[0];
+
+ flags = NPY_ITER_BUFFERED |
+ NPY_ITER_EXTERNAL_LOOP |
+ NPY_ITER_GROWINNER |
+ NPY_ITER_REFS_OK |
+ NPY_ITER_ZEROSIZE_OK;
+
+ /* Every operand gets the flag NPY_ITER_USE_MASKNA */
+ op_flags[0] = NPY_ITER_READONLY |
+ NPY_ITER_ALIGNED |
+ NPY_ITER_USE_MASKNA;
+ op_flags[1] = op_flags[0];
+ op_flags[2] = NPY_ITER_WRITEONLY |
+ NPY_ITER_ALIGNED |
+ NPY_ITER_USE_MASKNA |
+ NPY_ITER_NO_BROADCAST |
+ NPY_ITER_ALLOCATE;
+
+ iter = NpyIter_MultiNew(3, op, flags, NPY_KEEPORDER,
+ NPY_SAME_KIND_CASTING, op_flags, dtypes);
+ /* Don't need the dtype reference anymore */
+ Py_DECREF(dtypes[0]);
+ if (iter == NULL) {
+ return NULL;
+ }
+
+At this point, the input operands have been validated according to
+the casting rule, the shapes of the arrays have been broadcast together,
+and any buffering necessary has been prepared. This means we can
+dive into the inner loop of this function.::
+
+ ...
+ if (NpyIter_GetIterSize(iter) > 0) {
+ NpyIter_IterNextFunc *iternext;
+ char **dataptr;
+ npy_intp *stridesptr, *countptr;
+
+ /* Variables needed for looping */
+ iternext = NpyIter_GetIterNext(iter, NULL);
+ if (iternext == NULL) {
+ NpyIter_Deallocate(iter);
+ return NULL;
+ }
+ dataptr = NpyIter_GetDataPtrArray(iter);
+ stridesptr = NpyIter_GetInnerStrideArray(iter);
+ countptr = NpyIter_GetInnerLoopSizePtr(iter);
+
+The loop gets a bit messy when dealing with NA-masks, because it
+doubles the number of operands being processed in the iterator. Here
+we are naming things clearly so that the content of the innermost loop
+can be easy to work with.::
+
+ ...
+ do {
+ /* Data pointers and strides needed for innermost loop */
+ char *data_a = dataptr[0], *data_b = dataptr[1];
+ char *data_out = dataptr[2];
+ char *maskna_a = dataptr[3], *maskna_b = dataptr[4];
+ char *maskna_out = dataptr[5];
+ npy_intp stride_a = stridesptr[0], stride_b = stridesptr[1];
+ npy_intp stride_out = strides[2];
+ npy_intp maskna_stride_a = stridesptr[3];
+ npy_intp maskna_stride_b = stridesptr[4];
+ npy_intp maskna_stride_out = stridesptr[5];
+ npy_intp i, count = *countptr;
+
+ for (i = 0; i < count; ++i) {
+
+Here is the code for performing one special division. We use
+the functions :cfunc:`NpyMaskValue_IsExposed` and
+:cfunc:`NpyMaskValue_Create` to work with the masks, in order to be
+as general as possible. These are inline functions, and the compiler
+optimizer should be able to produce the same result as if you performed
+these operations directly inline here.::
+
+ ...
+ /* If neither of the inputs are NA */
+ if (NpyMaskValue_IsExposed((npy_mask)*maskna_a) &&
+ NpyMaskValue_IsExposed((npy_mask)*maskna_b)) {
+ double a_val = *(double *)data_a;
+ double b_val = *(double *)data_b;
+ /* Do the divide if 'b' isn't zero */
+ if (b_val != 0.0) {
+ *(double *)data_out = a_val / b_val;
+ /* Need to also set this element to exposed */
+ *maskna_out = NpyMaskValue_Create(1, 0);
+ }
+ /* Otherwise output an NA without touching its data */
+ else {
+ *maskna_out = NpyMaskValue_Create(0, 0);
+ }
+ }
+ /* Turn the output into NA without touching its data */
+ else {
+ *maskna_out = NpyMaskValue_Create(0, 0);
+ }
+
+ data_a += stride_a;
+ data_b += stride_b;
+ data_out += stride_out;
+ maskna_a += maskna_stride_a;
+ maskna_b += maskna_stride_b;
+ maskna_out += maskna_stride_out;
+ }
+ } while (iternext(iter));
+ }
+
+A little bit more boilerplate for returning the result from the iterator,
+and the function is done.::
+
+ ...
+ if (out == NULL) {
+ out = NpyIter_GetOperandArray(iter)[2];
+ }
+ Py_INCREF(out);
+ NpyIter_Deallocate(iter);
+
+ return out;
+ }
+
+To run this example, you can create a simple module with a C-file spdiv_mod.c
+consisting of::
+
+ #include <Python.h>
+ #include <numpy/arrayobject.h>
+
+ /* INSERT SpecialDivide source code here */
+
+ static PyObject *
+ spdiv(PyObject *self, PyObject *args, PyObject *kwds)
+ {
+ PyArrayObject *a, *b, *out = NULL;
+ static char *kwlist[] = {"a", "b", "out", NULL};
+
+ if (!PyArg_ParseTupleAndKeywords(args, kwds, "O&O&|O&", kwlist,
+ &PyArray_Converter, &a,
+ &PyArray_Converter, &b,
+ &PyArray_OutputConverter, &out)) {
+ return NULL;
+ }
+
+ return (PyObject *)SpecialDivide(a, b, out);
+ }
+
+ static PyMethodDef SpDivMethods[] = {
+ {"spdiv", (PyCFunction)spdiv, METH_VARARGS | METH_KEYWORDS, NULL},
+ {NULL, NULL, 0, NULL}
+ };
+
+
+ PyMODINIT_FUNC initspdiv_mod(void)
+ {
+ PyObject *m;
+
+ m = Py_InitModule("spdiv_mod", SpDivMethods);
+ if (m == NULL) {
+ return;
+ }
+
+ /* Make sure NumPy is initialized */
+ import_array();
+ }
+
+Create a setup.py file like::
+
+ #!/usr/bin/env python
+ def configuration(parent_package='',top_path=None):
+ from numpy.distutils.misc_util import Configuration
+ config = Configuration('.',parent_package,top_path)
+ config.add_extension('spdiv_mod',['spdiv_mod.c'])
+ return config
+
+ if __name__ == "__main__":
+ from numpy.distutils.core import setup
+ setup(configuration=configuration)
+
+With these two files in a directory by itself, run::
+
+ $ python setup.py build_ext --inplace
+
+and the file spdiv_mod.so (or .dll) will be placed in the same directory.
+Now you can try out this sample, to see how it behaves.::
+
+ >>> import numpy as np
+ >>> from spdiv_mod import spdiv
+
+It always produces a masked output array. To get the NumPy scalar returning
+behavior, replace the return in spdiv() with
+"PyArray_Return(SpecialDivide(a, b, out)".::
+
+ >>> spdiv(1,2)
+ array(0.5, maskna=True)
+
+Here we can see how NAs propagate, and how 0 in the output turns into NA
+as desired.::
+
+ >>> a = np.arange(6)
+ >>> b = np.array([0,np.NA,0,2,1,0])
+ >>> spdiv(a,b)
+ array([ NA, NA, NA, 1.5, 4. , NA])
+
+Finally, we can see the masking behavior by creating a masked
+view of an array. The ones in *c_orig* are preserved whereever
+NA got assigned.::
+
+ >>> c_orig = np.ones(6)
+ >>> c = c_orig.view(maskna=True)
+ >>> spdiv(a,b,out=c)
+ array([ NA, NA, NA, 1.5, 4. , NA])
+ >>> c_orig
+ array([ 1. , 1. , 1. , 1.5, 4. , 1. ])
+
NA Object Data Type
-------------------
-.. ctype:: NpyNA *
+.. ctype:: NpyNA
This is the C object corresponding to objects of type
numpy.NAType. The fields themselves are hidden from consumers of the
@@ -80,7 +412,7 @@ NA Object Functions
.. cfunction:: NpyNA_Check(obj)
- Evaluates to true if *obj* is an instance of :ctype:`NpyNA *`.
+ Evaluates to true if *obj* is an instance of :ctype:`NpyNA`.
.. cfunction:: PyArray_Descr* NpyNA_GetDType(NpyNA* na)
@@ -99,9 +431,9 @@ NA Object Functions
.. cfunction:: NpyNA* NpyNA_FromObject(PyObject* obj, int suppress_error)
If *obj* represents an object which is NA, for example if it
- is an :ctype:`NpyNA *`, or a zero-dimensional NA-masked array with
+ is an :ctype:`NpyNA`, or a zero-dimensional NA-masked array with
its value hidden by the mask, returns a new reference to an
- :ctype:`NpyNA *` object representing *obj*. Otherwise returns
+ :ctype:`NpyNA` object representing *obj*. Otherwise returns
NULL.
If *suppress_error* is true, this function doesn't raise an exception
@@ -110,7 +442,7 @@ NA Object Functions
.. cfunction:: NpyNA* NpyNA_FromDTypeAndPayload(PyArray_Descr *dtype, int multina, int payload)
- Constructs a new :ctype:`NpyNA *` instance with the specified *dtype*
+ Constructs a new :ctype:`NpyNA` instance with the specified *dtype*
and *payload*. For an NA with no dtype, provide NULL in *dtype*.
Until multi-NA is implemented, just pass 0 for both *multina*
@@ -119,6 +451,42 @@ NA Object Functions
NA Mask Functions
-----------------
+A mask dtype can be one of three different possibilities. It can
+be :cdata:`NPY_BOOL`, :cdata:`NPY_MASK`, or a struct dtype whose
+fields are all mask dtypes.
+
+A mask of :cdata:`NPY_BOOL` can just indicate True, with underlying
+value 1, for an element that is exposed, and False, with underlying
+value 0, for an element that is hidden.
+
+A mask of :cdata:`NPY_MASK` can additionally carry a payload which
+is a value from 0 to 127. This allows for missing data implementations
+based on such masks to support multiple reasons for data being missing.
+
+A mask of a struct dtype can only pair up with another struct dtype
+with the same field names. In this way, each field of the mask controls
+the masking for the corresponding field in the associated data array.
+
+Inline functions to work with masks are as follows.
+
+.. cfunction:: npy_bool NpyMaskValue_IsExposed(npy_mask mask)
+
+ Returns true if the data element corresponding to the mask element
+ can be modified, false if not.
+
+.. cfunction:: npy_uint8 NpyMaskValue_GetPayload(npy_mask mask)
+
+ Returns the payload contained in the mask. The return value
+ is between 0 and 127.
+
+.. cfunction:: npy_mask NpyMaskValue_Create(npy_bool exposed, npy_int8 payload)
+
+ Creates a mask from a flag indicating whether the element is exposed
+ or not and a payload value.
+
+NA Mask Array Functions
+-----------------------
+
.. cfunction:: int PyArray_AllocateMaskNA(PyArrayObject *arr, npy_bool ownmaskna, npy_bool multina, npy_mask defaultmask)
Allocates an NA mask for the array *arr* if necessary. If *ownmaskna*
diff --git a/doc/source/reference/c-api.rst b/doc/source/reference/c-api.rst
index 7c7775889..0766fdf14 100644
--- a/doc/source/reference/c-api.rst
+++ b/doc/source/reference/c-api.rst
@@ -45,6 +45,7 @@ code.
c-api.dtype
c-api.array
c-api.iterator
+ c-api.maskna
c-api.ufunc
c-api.generalized-ufuncs
c-api.coremath