author Charles Harris <charlesr.harris@gmail.com> 2011-08-27 21:46:08 -0600
committer Charles Harris <charlesr.harris@gmail.com> 2011-08-27 21:46:08 -0600
commit 9ecd91b7bf8c77d696ec9856ba10896d8f60309a
tree 9884131ece5eada06212538c591965bf5928afa2 /doc/source/reference
parent aa55ba7437fbe6b8772a360a641b5aa7d3e669e0
parent 10fac981763e87f949bed15c66127fc380fa9b27
Merge branch 'pull-141'
* pull-141: (167 commits)
  ENH: missingdata: Make PyArray_Converter and PyArray_OutputConverter safer for legacy code
  DOC: missingdata: Add a mention of the design NEP, and masks vs bitpatterns
  DOC: missingdata: Updates from pull request feedback
  DOC: missingdata: Updates based on pull request feedback
  ENH: nditer: Change the Python nditer exposure to automatically add NPY_ITER_USE_MASKNA
  ENH: missingdata: Make comparisons with NA return NA(dtype='bool')
  BLD: core: onefile build fix and Python3 compatibility change
  DOC: Mention the update to np.all and np.any in the release notes
  TST: dtype: Adjust void dtype test to pass without raising a zero-size exception
  STY: Remove trailing whitespace
  TST: missingdata: Write some tests for the np.any and np.all NA behavior
  ENH: missingdata: Make numpy.all follow the NA && False == False rule
  ENH: missingdata: Make numpy.all follow the NA || True == True rule
  DOC: missingdata: Also show what assigning a non-NA value does in each case
  DOC: missingdata: Add introductory documentation for NA-masked arrays
  ENH: core: Rename PyArrayObject_fieldaccess to PyArrayObject_fields
  DOC: missingdata: Some tweaks to the NA mask documentation
  DOC: missingdata: Add example of a C-API function supporting NA masks
  DOC: missingdata: Documenting C API for NA-masked arrays
  ENH: missingdata: Finish adding C-API access to the NpyNA object
  ...
Diffstat (limited to 'doc/source/reference')
-rw-r--r-- doc/source/reference/arrays.maskna.rst 301
-rw-r--r-- doc/source/reference/arrays.rst 1
-rw-r--r-- doc/source/reference/c-api.array.rst 332
-rw-r--r-- doc/source/reference/c-api.iterator.rst 87
-rw-r--r-- doc/source/reference/c-api.maskna.rst 592
-rw-r--r-- doc/source/reference/c-api.rst 1
-rw-r--r-- doc/source/reference/routines.maskna.rst 11
-rw-r--r-- doc/source/reference/routines.rst 1
-rw-r--r-- doc/source/reference/routines.sort.rst 1
9 files changed, 1219 insertions, 108 deletions
diff --git a/doc/source/reference/arrays.maskna.rst b/doc/source/reference/arrays.maskna.rst
new file mode 100644
index 000000000..2faabde83
--- /dev/null
+++ b/doc/source/reference/arrays.maskna.rst
@@ -0,0 +1,301 @@
+.. currentmodule:: numpy
+
+.. _arrays.maskna:
+
+****************
+NA-Masked Arrays
+****************
+
+.. versionadded:: 1.7.0
+
+NumPy 1.7 adds preliminary support for missing values using an interface
+based on an NA (Not Available) placeholder, implemented as masks in the
+core ndarray. This system is highly flexible, allowing NAs to be used
+with any underlying dtype, and supports creating multiple views of the same
+data with different choices of NAs.
+
+Other Missing Data Approaches
+=============================
+
+The previously recommended approach for working with missing values was the
+:mod:`numpy.ma` module, which provides a subclass of ndarray written purely in Python.
+By placing NA-masks directly in the NumPy core, it's possible to avoid
+the need for calling "ma.<func>(arr)" instead of "np.<func>(arr)".
+
+Another approach many people have taken is to use NaN as the
+placeholder for missing values. A few functions, such as
+:func:`numpy.nansum`, behave similarly to using the *skipna*
+parameter of ufunc.reduce.
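For comparison, here is a minimal sketch of the NaN-as-placeholder approach. It uses only :func:`numpy.sum` and :func:`numpy.nansum` and does not depend on the NA interface added in this diff; the variable names are illustrative.

```python
import numpy as np

# NaN stands in for the missing value in a float array.
data = np.array([1.0, np.nan, 3.0])

# An ordinary reduction propagates the NaN to the result.
total = np.sum(data)

# nansum treats NaN as zero, which is similar in spirit to skipping it.
total_skipping_nan = np.nansum(data)

print(np.isnan(total), total_skipping_nan)
```

Note that this approach only works for floating point dtypes, which is one motivation for a dtype-independent NA.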
+
+As experienced in the R language, a programming interface based on an
+NA placeholder is generally more intuitive to work with than direct
+mask manipulation.
+
+Missing Data Model
+==================
+
+The model adopted by NumPy for missing values is that NA is a
+placeholder for a value which is there, but is unknown to computations.
+The value may be temporarily hidden by the mask, or may be unknown
+for any reason, but could be any value the dtype of the array is able
+to hold.
+
+This model affects computations in specific, well-defined ways. Any time
+we have a computation, like *c = NA + 1*, we must reason about whether
+*c* will be an NA or not. The NA is not available now, but maybe a
+measurement will be made later to determine what its value is, so anything
+we calculate must be consistent with it eventually being revealed. One way
+to do this is with thought experiments imagining we have discovered
+the value of this NA. If the NA is 0, then *c* is 1. If the NA is
+100, then *c* is 101. Because the value of *c* is ambiguous, it
+isn't available either, so it must be NA as well.
+
+A consequence of separating the NA model from the dtype is that, unlike
+in R, NaNs are not considered to be NA. An NA is a value that is completely
+unknown, whereas a NaN is usually the result of an invalid computation
+as defined in the IEEE 754 floating point arithmetic specification.
+
+Most computations whose input is NA will output NA as well, a property
+known as propagation. Some operations, however, always produce the
+same result no matter what the value of the NA is. The clearest
+example of this is with the logical operations *and* and *or*. Since both
+np.logical_or(True, True) and np.logical_or(False, True) are True,
+all possible boolean values on the left hand side produce the
+same answer. This means that np.logical_or(np.NA, True) can produce
+True instead of the more conservative np.NA. There is a similar case
+for np.logical_and.
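The reasoning above can be sketched in plain Python as three-valued logic. This is an illustration under stated assumptions, not the implementation in this commit: ``UNKNOWN``, ``kleene_or`` and ``kleene_and`` are hypothetical names, and ``UNKNOWN`` is only a stand-in for an NA boolean.

```python
# Hypothetical sentinel standing in for an unknown boolean; it is not np.NA.
UNKNOWN = object()


def kleene_or(left, right):
    """Logical OR where either operand may be UNKNOWN."""
    if left is True or right is True:
        return True      # True no matter what the unknown operand is
    if left is UNKNOWN or right is UNKNOWN:
        return UNKNOWN   # the result depends on the unknown value
    return False


def kleene_and(left, right):
    """Logical AND where either operand may be UNKNOWN."""
    if left is False or right is False:
        return False     # False no matter what the unknown operand is
    if left is UNKNOWN or right is UNKNOWN:
        return UNKNOWN
    return True


print(kleene_or(UNKNOWN, True) is True)
print(kleene_and(UNKNOWN, False) is False)
```

Only the combinations whose result is the same for every possible value of the unknown operand escape propagation; all other combinations stay unknown.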
+
+A similar, but slightly deceptive, example is wanting to treat (NA * 0.0)
+as 0.0 instead of as NA. This is invalid because the NA might be Inf
+or NaN, in which case the result is NaN instead of 0.0. This idea is
+valid for integer dtypes, but NumPy still chooses to return NA because
+checking this special case would adversely affect performance.
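The floating point case can be checked with ordinary NumPy scalars. This is a small sketch that does not use the NA interface; it only shows why an unknown float multiplied by 0.0 cannot be assumed to be 0.0.

```python
import numpy as np

# Suppress the "invalid value" warning that inf * 0.0 would otherwise emit.
with np.errstate(invalid="ignore"):
    inf_times_zero = np.float64(np.inf) * 0.0
    nan_times_zero = np.float64(np.nan) * 0.0

# A finite value behaves as one would naively expect.
finite_times_zero = np.float64(100.0) * 0.0

print(inf_times_zero, nan_times_zero, finite_times_zero)
```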
+
+The NA Object
+=============
+
+In the root numpy namespace, there is a new object NA. Unlike None,
+this is not the only possible instance of an NA, since an NA
+may have a dtype associated with it and has been designed for future
+expansion to carry a multi-NA payload. It can be used in computations
+like any value::
+
+ >>> np.NA
+ NA
+ >>> np.NA * 3
+ NA(dtype='int64')
+ >>> np.sin(np.NA)
+ NA(dtype='float64')
+
+To check whether a value is NA, use the :func:`numpy.isna` function::
+
+ >>> np.isna(np.NA)
+ True
+ >>> np.isna(1.5)
+ False
+ >>> np.isna(np.nan)
+ False
+ >>> np.isna(np.NA * 3)
+ True
+ >>> (np.NA * 3) is np.NA
+ False
+
+
+Creating NA-Masked Arrays
+=========================
+
+Because having NA support adds some overhead to NumPy arrays, one
+must explicitly request it when creating arrays. There are several ways
+to get an NA-masked array. The easiest way is to include an NA
+value in the list used to construct the array::
+
+ >>> a = np.array([1,3,5])
+ >>> a
+ array([1, 3, 5])
+ >>> a.flags.maskna
+ False
+
+ >>> b = np.array([1,3,np.NA])
+ >>> b
+ array([1, 3, NA])
+ >>> b.flags.maskna
+ True
+
+If one already has an array without an NA-mask, it can be added
+by directly setting the *maskna* flag to True. Assigning an NA
+to an array without NA support will raise an error rather than
+automatically creating an NA-mask, with the idea that supporting
+NA should be an explicit user choice::
+
+ >>> a = np.array([1,3,5])
+ >>> a[1] = np.NA
+ Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ ValueError: Cannot assign NA to an array which does not support NAs
+ >>> a.flags.maskna = True
+ >>> a[1] = np.NA
+ >>> a
+ array([1, NA, 5])
+
+Most array construction functions have a new parameter *maskna*, which
+can be set to True to produce an array with an NA-mask::
+
+ >>> np.arange(5., maskna=True)
+ array([ 0., 1., 2., 3., 4.], maskna=True)
+ >>> np.eye(3, maskna=True)
+ array([[ 1., 0., 0.],
+ [ 0., 1., 0.],
+ [ 0., 0., 1.]], maskna=True)
+ >>> np.array([1,3,5], maskna=True)
+ array([1, 3, 5], maskna=True)
+
+Creating NA-Masked Views
+========================
+
+It will sometimes be desirable to view an array with an NA-mask, without
+adding an NA-mask to that array. This is possible by taking an NA-masked
+view of the array. There are two ways to do this, one which simply
+guarantees that the view has an NA-mask, and another which guarantees that the
+view has its own NA-mask, even if the array already had an NA-mask.
+
+Starting with a non-masked array, we can use the :meth:`ndarray.view` method
+to get an NA-masked view::
+
+ >>> a = np.array([1,3,5])
+ >>> b = a.view(maskna=True)
+
+ >>> b[2] = np.NA
+ >>> a
+ array([1, 3, 5])
+ >>> b
+ array([1, 3, NA])
+
+ >>> b[0] = 2
+ >>> a
+ array([2, 3, 5])
+ >>> b
+ array([2, 3, NA])
+
+
+It is important to be cautious here, though, since if the array already
+has a mask, this will also take a view of that mask. This means the original
+array's mask will be affected by assigning NA to the view::
+
+ >>> a = np.array([1,np.NA,5])
+ >>> b = a.view(maskna=True)
+
+ >>> b[2] = np.NA
+ >>> a
+ array([1, NA, NA])
+ >>> b
+ array([1, NA, NA])
+
+ >>> b[1] = 4
+ >>> a
+ array([1, 4, NA])
+ >>> b
+ array([1, 4, NA])
+
+
+To guarantee that the view created has its own NA-mask, there is another
+flag *ownmaskna*. Using this flag will cause a copy of the array's mask
+to be created for the view when the array already has a mask::
+
+ >>> a = np.array([1,np.NA,5])
+ >>> b = a.view(ownmaskna=True)
+
+ >>> b[2] = np.NA
+ >>> a
+ array([1, NA, 5])
+ >>> b
+ array([1, NA, NA])
+
+ >>> b[1] = 4
+ >>> a
+ array([1, NA, 5])
+ >>> b
+ array([1, 4, NA])
+
+
+In general, when an NA-masked view of an array has been taken, any time
+an NA is assigned to an element of the array the data for that element
+will remain untouched. This mechanism allows for multiple temporary
+views with NAs of the same original array.
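A loosely analogous behaviour can be tried with the existing :mod:`numpy.ma` module. This sketch is only an analogy: it uses ``numpy.ma.MaskedArray`` and ``numpy.ma.masked`` rather than the ``maskna`` interface described here, but it shows the same idea of hiding an element in a view while leaving the underlying data untouched.

```python
import numpy as np

a = np.array([1, 3, 5])
b = a.view(np.ma.MaskedArray)  # shares a's data buffer and carries its own mask

b[2] = np.ma.masked            # hides the element; the stored value is not touched
b[0] = 2                       # an ordinary assignment writes through to a

print(a)                       # the data as seen without a mask
print(b)                       # the same data with the last element hidden
```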
+
+NA-Masked Reductions
+====================
+
+Many of NumPy's reductions like :func:`numpy.sum` and :func:`numpy.std`
+have been extended to work with NA-masked arrays. A consequence of the
+missing value model is that any NA value in an array will cause any
+output that includes that value to become NA::
+
+ >>> a = np.array([[1,2,np.NA,3], [0,np.NA,1,1]])
+ >>> a.sum(axis=0)
+ array([1, NA, NA, 4])
+ >>> a.sum(axis=1)
+ array([NA, NA], dtype=int64)
+
+This is not always the desired result, so NumPy includes a parameter
+*skipna* which causes the NA values to be skipped during computation::
+
+ >>> a = np.array([[1,2,np.NA,3], [0,np.NA,1,1]])
+ >>> a.sum(axis=0, skipna=True)
+ array([1, 2, 1, 4])
+ >>> a.sum(axis=1, skipna=True)
+ array([6, 2])
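The skipping behaviour can be compared against :mod:`numpy.ma`, whose reductions ignore masked elements. The sketch below is an analogy rather than the ``skipna`` implementation; the zeros stored at the hidden positions are arbitrary placeholder values chosen for the illustration.

```python
import numpy as np

# Same layout as the example above; hidden positions hold placeholder zeros.
values = np.array([[1, 2, 0, 3],
                   [0, 0, 1, 1]])
hidden = np.array([[False, False, True, False],
                   [False, True, False, False]])
a = np.ma.array(values, mask=hidden)

# Masked elements are left out of the sums along each axis.
col_sums = a.sum(axis=0)
row_sums = a.sum(axis=1)

print(col_sums, row_sums)
```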
+
+Iterating Over NA-Masked Arrays
+===============================
+
+The :class:`nditer` object can be used to iterate over arrays with
+NA values just like over normal arrays::
+
+ >>> a = np.array([1,3,np.NA])
+ >>> for x in np.nditer(a):
+ ... print x,
+ ...
+ 1 3 NA
+ >>> b = np.zeros(3, maskna=True)
+ >>> for x, y in np.nditer([a,b], op_flags=[['readonly'],
+ ... ['writeonly']]):
+ ... y[...] = -x
+ ...
+ >>> b
+ array([-1., -3., NA])
+
+When using the C-API version of the nditer, one must explicitly
+add the NPY_ITER_USE_MASKNA flag and take care to deal with the NA
+mask appropriately. In the Python exposure, this flag is added
+automatically.
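For reference, the same two-operand iteration pattern on ordinary arrays, without any NA values, might look as follows. This sketch assumes a NumPy version in which :class:`nditer` can be used as a context manager, so that writes to the output operand are flushed when the block exits.

```python
import numpy as np

a = np.array([1, 3, 5])
b = np.zeros(3)

# The first operand is only read; the second is only written.
with np.nditer([a, b], op_flags=[["readonly"], ["writeonly"]]) as it:
    for x, y in it:
        y[...] = -x  # write the negated input into the output element

print(b)
```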
+
+Planned Future Additions
+========================
+
+The NA support in 1.7 is fairly preliminary, and is focused on getting
+the basics solid. This particularly meant getting the API in C refined
+to a level where adding NA support to all of NumPy and to third party
+software using NumPy would be a reasonable task.
+
+The biggest missing feature within the core is supporting NA values with
+structured arrays. The design for this involves a mask slot for each
+field in the structured array, motivated by the fact that many important
+uses of structured arrays involve treating the structured fields like
+another dimension.
+
+Another feature that was discussed during the design process is the ability
+to support more than one NA value. The design created supports this multi-NA
+idea with the addition of a payload to the NA value and to the NA-mask.
+The API has been designed in such a way that adding this feature in a future
+release should be possible without changing existing API functions in any way.
+
+To see a more complete list of what is supported and unsupported in the
+1.7 release of NumPy, please refer to the release notes.
+
+During the design phase of this feature, two implementation approaches
+for NA values were discussed, called "mask" and "bitpattern". What
+has been implemented is the "mask" approach, but the design document,
+or "NEP", describes a way both approaches could co-operatively exist
+in NumPy, since each has both pros and cons. This design document is
+available in the file "doc/neps/missing-data.rst" of the NumPy source
+code.
diff --git a/doc/source/reference/arrays.rst b/doc/source/reference/arrays.rst
index 40c9f755d..91b43132a 100644
--- a/doc/source/reference/arrays.rst
+++ b/doc/source/reference/arrays.rst
@@ -44,6 +44,7 @@ of also more complicated arrangements of data.
arrays.indexing
arrays.nditer
arrays.classes
+ arrays.maskna
maskedarray
arrays.interface
arrays.datetime
diff --git a/doc/source/reference/c-api.array.rst b/doc/source/reference/c-api.array.rst
index 4b57945e9..46a215a12 100644
--- a/doc/source/reference/c-api.array.rst
+++ b/doc/source/reference/c-api.array.rst
@@ -21,13 +21,30 @@ Array structure and data access
-------------------------------
These macros all access the :ctype:`PyArrayObject` structure members. The input
-argument, obj, can be any :ctype:`PyObject *` that is directly interpretable
+argument, arr, can be any :ctype:`PyObject *` that is directly interpretable
as a :ctype:`PyArrayObject *` (any instance of the :cdata:`PyArray_Type` and its
sub-types).
-.. cfunction:: void *PyArray_DATA(PyObject *obj)
+.. cfunction:: int PyArray_NDIM(PyArrayObject *arr)
-.. cfunction:: char *PyArray_BYTES(PyObject *obj)
+ The number of dimensions in the array.
+
+.. cfunction:: npy_intp *PyArray_DIMS(PyArrayObject *arr)
+
+ Returns a pointer to the dimensions/shape of the array. The
+ number of elements matches the number of dimensions
+ of the array.
+
+.. cfunction:: npy_intp *PyArray_SHAPE(PyArrayObject *arr)
+
+ .. versionadded:: 1.7
+
+ A synonym for PyArray_DIMS, named to be consistent with the
+ 'shape' usage within Python.
+
+.. cfunction:: void *PyArray_DATA(PyArrayObject *arr)
+
+.. cfunction:: char *PyArray_BYTES(PyArrayObject *arr)
These two macros are similar and obtain the pointer to the
data-buffer for the array. The first macro can (and should be)
@@ -36,19 +53,21 @@ sub-types).
array then be sure you understand how to access the data in the
array to avoid memory and/or alignment problems.
-.. cfunction:: npy_intp *PyArray_DIMS(PyObject *arr)
+.. cfunction:: npy_intp *PyArray_STRIDES(PyArrayObject* arr)
-.. cfunction:: npy_intp *PyArray_STRIDES(PyObject* arr)
+ Returns a pointer to the strides of the array. The
+ number of elements matches the number of dimensions
+ of the array.
-.. cfunction:: npy_intp PyArray_DIM(PyObject* arr, int n)
+.. cfunction:: npy_intp PyArray_DIM(PyArrayObject* arr, int n)
Return the shape in the *n* :math:`^{\textrm{th}}` dimension.
-.. cfunction:: npy_intp PyArray_STRIDE(PyObject* arr, int n)
+.. cfunction:: npy_intp PyArray_STRIDE(PyArrayObject* arr, int n)
Return the stride in the *n* :math:`^{\textrm{th}}` dimension.
-.. cfunction:: PyObject *PyArray_BASE(PyObject* arr)
+.. cfunction:: PyObject *PyArray_BASE(PyArrayObject* arr)
This returns the base object of the array. In most cases, this
means the object which owns the memory the array is pointing at.
@@ -62,40 +81,97 @@ sub-types).
be copied upon destruction. This overloading of the base property
for two functions is likely to change in a future version of NumPy.
-.. cfunction:: PyArray_Descr *PyArray_DESCR(PyObject* arr)
+.. cfunction:: PyArray_Descr *PyArray_DESCR(PyArrayObject* arr)
+
+ Returns a borrowed reference to the dtype property of the array.
+
+.. cfunction:: PyArray_Descr *PyArray_DTYPE(PyArrayObject* arr)
+
+ .. versionadded:: 1.7
+
+ A synonym for PyArray_DESCR, named to be consistent with the
+ 'dtype' usage within Python.
+
+.. cfunction:: npy_bool PyArray_HASMASKNA(PyArrayObject* arr)
+
+ .. versionadded:: 1.7
+
+ Returns true if the array has an NA-mask, false otherwise.
+
+.. cfunction:: PyArray_Descr *PyArray_MASKNA_DTYPE(PyArrayObject* arr)
+
+ .. versionadded:: 1.7
+
+ Returns a borrowed reference to the dtype property for the NA mask
+ of the array, or NULL if the array has no NA mask. This function does
+ not raise an exception when it returns NULL; it simply returns
+ the appropriate field.
+
+.. cfunction:: char *PyArray_MASKNA_DATA(PyArrayObject* arr)
-.. cfunction:: int PyArray_FLAGS(PyObject* arr)
+ .. versionadded:: 1.7
-.. cfunction:: int PyArray_ITEMSIZE(PyObject* arr)
+ Returns a pointer to the raw data for the NA mask of the array,
+ or NULL if the array has no NA mask. This function does
+ not raise an exception when it returns NULL; it simply returns
+ the appropriate field.
+
+.. cfunction:: npy_intp *PyArray_MASKNA_STRIDES(PyArrayObject* arr)
+
+ .. versionadded:: 1.7
+
+ Returns a pointer to the strides of the NA mask of the array. If the
+ array has no NA mask, the values contained in the returned strides
+ will be invalid. The shape of the NA mask is identical to the shape of the
+ array itself, so the number of strides is always the same as the
+ number of array dimensions.
+
+.. cfunction:: void PyArray_ENABLEFLAGS(PyArrayObject* arr, int flags)
+
+ .. versionadded:: 1.7
+
+ Enables the specified array flags. This function does no validation,
+ and assumes that you know what you're doing.
+
+.. cfunction:: void PyArray_CLEARFLAGS(PyArrayObject* arr, int flags)
+
+ .. versionadded:: 1.7
+
+ Clears the specified array flags. This function does no validation,
+ and assumes that you know what you're doing.
+
+.. cfunction:: int PyArray_FLAGS(PyArrayObject* arr)
+
+.. cfunction:: int PyArray_ITEMSIZE(PyArrayObject* arr)
Return the itemsize for the elements of this array.
-.. cfunction:: int PyArray_TYPE(PyObject* arr)
+.. cfunction:: int PyArray_TYPE(PyArrayObject* arr)
Return the (builtin) typenumber for the elements of this array.
-.. cfunction:: PyObject *PyArray_GETITEM(PyObject* arr, void* itemptr)
+.. cfunction:: PyObject *PyArray_GETITEM(PyArrayObject* arr, void* itemptr)
Get a Python object from the ndarray, *arr*, at the location
pointed to by itemptr. Return ``NULL`` on failure.
-.. cfunction:: int PyArray_SETITEM(PyObject* arr, void* itemptr, PyObject* obj)
+.. cfunction:: int PyArray_SETITEM(PyArrayObject* arr, void* itemptr, PyObject* obj)
Convert obj and place it in the ndarray, *arr*, at the place
pointed to by itemptr. Return -1 if an error occurs or 0 on
success.
-.. cfunction:: npy_intp PyArray_SIZE(PyObject* arr)
+.. cfunction:: npy_intp PyArray_SIZE(PyArrayObject* arr)
Returns the total size (in number of elements) of the array.
-.. cfunction:: npy_intp PyArray_Size(PyObject* obj)
+.. cfunction:: npy_intp PyArray_Size(PyArrayObject* obj)
Returns 0 if *obj* is not a sub-class of bigndarray. Otherwise,
returns the total number of elements in the array. Safer version
of :cfunc:`PyArray_SIZE` (*obj*).
-.. cfunction:: npy_intp PyArray_NBYTES(PyObject* arr)
+.. cfunction:: npy_intp PyArray_NBYTES(PyArrayObject* arr)
Returns the total number of bytes consumed by the array.
@@ -149,48 +225,64 @@ From scratch
.. cfunction:: PyObject* PyArray_NewFromDescr(PyTypeObject* subtype, PyArray_Descr* descr, int nd, npy_intp* dims, npy_intp* strides, void* data, int flags, PyObject* obj)
This is the main array creation function. Most new arrays are
- created with this flexible function. The returned object is an
- object of Python-type *subtype*, which must be a subtype of
- :cdata:`PyArray_Type`. The array has *nd* dimensions, described by
- *dims*. The data-type descriptor of the new array is *descr*. If
- *subtype* is not :cdata:`&PyArray_Type` (*e.g.* a Python subclass of
- the ndarray), then *obj* is the object to pass to the
- :obj:`__array_finalize__` method of the subclass. If *data* is
- ``NULL``, then new memory will be allocated and *flags* can be
- non-zero to indicate a Fortran-style contiguous array. If *data*
- is not ``NULL``, then it is assumed to point to the memory to be
- used for the array and the *flags* argument is used as the new
- flags for the array (except the state of :cdata:`NPY_OWNDATA` and
- :cdata:`NPY_ARRAY_UPDATEIFCOPY` flags of the new array will be reset). In
- addition, if *data* is non-NULL, then *strides* can also be
- provided. If *strides* is ``NULL``, then the array strides are
- computed as C-style contiguous (default) or Fortran-style
+ created with this flexible function.
+
+ The returned object is an object of Python-type *subtype*, which
+ must be a subtype of :cdata:`PyArray_Type`. The array has *nd*
+ dimensions, described by *dims*. The data-type descriptor of the
+ new array is *descr*.
+
+ If *subtype* is of an array subclass instead of the base
+ :cdata:`&PyArray_Type`, then *obj* is the object to pass to
+ the :obj:`__array_finalize__` method of the subclass.
+
+ If *data* is ``NULL``, then new memory will be allocated and *flags*
+ can be non-zero to indicate a Fortran-style contiguous array. If
+ *data* is not ``NULL``, then it is assumed to point to the memory
+ to be used for the array and the *flags* argument is used as the
+ new flags for the array (except the state of :cdata:`NPY_OWNDATA`
+ and :cdata:`NPY_ARRAY_UPDATEIFCOPY` flags of the new array will
+ be reset).
+
+ In addition, if *data* is non-NULL, then *strides* can
+ also be provided. If *strides* is ``NULL``, then the array strides
+ are computed as C-style contiguous (default) or Fortran-style
contiguous (*flags* is nonzero for *data* = ``NULL`` or *flags* &
- :cdata:`NPY_ARRAY_F_CONTIGUOUS` is nonzero non-NULL *data*). Any provided
- *dims* and *strides* are copied into newly allocated dimension and
- strides arrays for the new array object.
+ :cdata:`NPY_ARRAY_F_CONTIGUOUS` is nonzero for non-NULL *data*). Any
+ provided *dims* and *strides* are copied into newly allocated
+ dimension and strides arrays for the new array object.
+
+ Because the flags are ignored when *data* is NULL, you cannot
+ create a new array from scratch with an NA mask. If one is desired,
+ call the function :cfunc:`PyArray_AllocateMaskNA` after the array
+ is created.
.. cfunction:: PyObject* PyArray_NewLikeArray(PyArrayObject* prototype, NPY_ORDER order, PyArray_Descr* descr, int subok)
.. versionadded:: 1.6
- This function steals a reference to *descr* if it is not NULL.
+ This function steals a reference to *descr* if it is not NULL.
+
+ This array creation routine allows for the convenient creation of
+ a new array matching an existing array's shape and memory layout,
+ possibly changing the layout and/or data type.
- This array creation routine allows for the convenient creation of
- a new array matching an existing array's shapes and memory layout,
- possibly changing the layout and/or data type.
+ When *order* is :cdata:`NPY_ANYORDER`, the result order is
+ :cdata:`NPY_FORTRANORDER` if *prototype* is a Fortran array,
+ :cdata:`NPY_CORDER` otherwise. When *order* is
+ :cdata:`NPY_KEEPORDER`, the result order matches that of *prototype*, even
+ when the axes of *prototype* aren't in C or Fortran order.
- When *order* is :cdata:`NPY_ANYORDER`, the result order is
- :cdata:`NPY_FORTRANORDER` if *prototype* is a fortran array,
- :cdata:`NPY_CORDER` otherwise. When *order* is
- :cdata:`NPY_KEEPORDER`, the result order matches that of *prototype*, even
- when the axes of *prototype* aren't in C or Fortran order.
+ If *descr* is NULL, the data type of *prototype* is used.
- If *descr* is NULL, the data type of *prototype* is used.
+ If *subok* is 1, the newly created array will use the sub-type of
+ *prototype* to create the new array, otherwise it will create a
+ base-class array.
- If *subok* is 1, the newly created array will use the sub-type of
- *prototype* to create the new array, otherwise it will create a
- base-class array.
+ The newly allocated array does not have an NA mask even if the
+ *prototype* provided does. If an NA mask is desired in the array,
+ call the function :cfunc:`PyArray_AllocateMaskNA` after the array
+ is created.
.. cfunction:: PyObject* PyArray_New(PyTypeObject* subtype, int nd, npy_intp* dims, int type_num, npy_intp* strides, void* data, int itemsize, int flags, PyObject* obj)
@@ -278,7 +370,9 @@ From scratch
increments of ``step``. Equivalent to arange( ``start``,
``stop``, ``step``, ``typenum`` ).
-.. cfunction:: int PyArray_SetBaseObject(PyArrayObject *arr, PyObject *obj)
+.. cfunction:: int PyArray_SetBaseObject(PyArrayObject* arr, PyObject* obj)
+
+ .. versionadded:: 1.7
If you construct an array by passing in your own memory buffer as
a parameter, you need to set the array's `base` property to ensure
@@ -377,6 +471,31 @@ From other objects
with, then an error is raised. If *op* is not already an array,
then this flag has no effect.
+ .. cvar:: NPY_ARRAY_MASKNA
+
+ .. versionadded:: 1.7
+
+ Make sure the array has an NA mask associated with its data.
+
+ .. cvar:: NPY_ARRAY_OWNMASKNA
+
+ .. versionadded:: 1.7
+
+ Make sure the array has an NA mask which it owns
+ associated with its data.
+
+ .. cvar:: NPY_ARRAY_ALLOWNA
+
+ .. versionadded:: 1.7
+
+ To prevent simple errors from slipping in, arrays with NA
+ masks are not permitted to pass through by default. Instead
+ an exception is raised indicating the operation doesn't support
+ NA masks yet. In order to enable NA mask support, this flag
+ must be passed in to allow the NA mask through, signalling that
+ the later code is written appropriately to handle NA mask
+ semantics.
+
.. cvar:: NPY_ARRAY_BEHAVED
:cdata:`NPY_ARRAY_ALIGNED` \| :cdata:`NPY_ARRAY_WRITEABLE`
@@ -1292,6 +1411,24 @@ or :cdata:`NPY_ARRAY_F_CONTIGUOUS` can be determined by the ``strides``,
would have returned an error because :cdata:`NPY_ARRAY_UPDATEIFCOPY`
would not have been possible.
+.. cvar:: NPY_ARRAY_MASKNA
+
+ If this flag is enabled, the array has an NA mask associated with
+ the data. C code which interacts with the NA mask must follow
+ specific semantic rules about when to overwrite data and when not
+ to. The mask can be accessed through the functions
+ :cfunc:`PyArray_MASKNA_DTYPE`, :cfunc:`PyArray_MASKNA_DATA`, and
+ :cfunc:`PyArray_MASKNA_STRIDES`.
+
+.. cvar:: NPY_ARRAY_OWNMASKNA
+
+ If this flag is enabled, the array owns its own NA mask. If it is not
+ enabled, the NA mask is a view into a different array's NA mask.
+
+ In order to ensure that an array owns its own NA mask, you can
+ call :cfunc:`PyArray_AllocateMaskNA` with the parameter *ownmaskna*
+ set to 1.
+
:cfunc:`PyArray_UpdateFlags` (obj, flags) will update the ``obj->flags``
for ``flags`` which can be any of :cdata:`NPY_ARRAY_C_CONTIGUOUS`,
:cdata:`NPY_ARRAY_F_CONTIGUOUS`, :cdata:`NPY_ARRAY_ALIGNED`, or
@@ -1536,18 +1673,20 @@ Conversion
copied into every location. A -1 is returned if an error occurs,
otherwise 0 is returned.
-.. cfunction:: PyObject* PyArray_View(PyArrayObject* self, PyArray_Descr* dtype)
+.. cfunction:: PyObject* PyArray_View(PyArrayObject* self, PyArray_Descr* dtype, PyTypeObject *ptype)
- Equivalent to :meth:`ndarray.view` (*self*, *dtype*). Return a new view of
- the array *self* as possibly a different data-type, *dtype*. If
- *dtype* is ``NULL``, then the returned array will have the same
- data type as *self*. The new data-type must be consistent with
- the size of *self*. Either the itemsizes must be identical, or
- *self* must be single-segment and the total number of bytes must
- be the same. In the latter case the dimensions of the returned
- array will be altered in the last (or first for Fortran-style
- contiguous arrays) dimension. The data area of the returned array
- and self is exactly the same.
+ Equivalent to :meth:`ndarray.view` (*self*, *dtype*). Return a new
+ view of the array *self* as possibly a different data-type, *dtype*,
+ and different array subclass *ptype*.
+
+ If *dtype* is ``NULL``, then the returned array will have the same
+ data type as *self*. The new data-type must be consistent with the
+ size of *self*. Either the itemsizes must be identical, or *self* must
+ be single-segment and the total number of bytes must be the same.
+ In the latter case the dimensions of the returned array will be
+ altered in the last (or first for Fortran-style contiguous arrays)
+ dimension. The data area of the returned array and self is exactly
+ the same.
Shape Manipulation
@@ -2152,50 +2291,6 @@ an element copier function as a primitive.::
A macro which calls the auxdata's clone function appropriately,
returning a deep copy of the auxiliary data.
-Masks for Selecting Elements to Modify
---------------------------------------
-
-.. versionadded:: 1.7.0
-
-The array iterator, :ctype:`NpyIter`, has some new flags which
-allow control over which elements are intended to be modified,
-providing the ability to do masking even when doing casts to a buffer
-of a different type. Some inline functions have been added
-to facilitate consistent usage of these masks.
-
-A mask dtype can be one of three different possibilities. It can
-be :cdata:`NPY_BOOL`, :cdata:`NPY_MASK`, or a struct dtype whose
-fields are all mask dtypes.
-
-A mask of :cdata:`NPY_BOOL` can just indicate True, with underlying
-value 1, for an element that is exposed, and False, with underlying
-value 0, for an element that is hidden.
-
-A mask of :cdata:`NPY_MASK` can additionally carry a payload which
-is a value from 0 to 127. This allows for missing data implementations
-based on such masks to support multiple reasons for data being missing.
-
-A mask of a struct dtype can only pair up with another struct dtype
-with the same field names. In this way, each field of the mask controls
-the masking for the corresponding field in the associated data array.
-
-Inline functions to work with masks are as follows.
-
-.. cfunction:: npy_bool NpyMask_IsExposed(npy_mask mask)
-
- Returns true if the data element corresponding to the mask element
- can be modified, false if not.
-
-.. cfunction:: npy_uint8 NpyMask_GetPayload(npy_mask mask)
-
- Returns the payload contained in the mask. The return value
- is between 0 and 127.
-
-.. cfunction:: npy_mask NpyMask_Create(npy_bool exposed, npy_int8 payload)
-
- Creates a mask from a flag indicating whether the element is exposed
- or not and a payload value.
-
Array Iterators
---------------
@@ -2442,6 +2537,9 @@ Array Scalars
if so, returns the appropriate array scalar. It should be used
whenever 0-dimensional arrays could be returned to Python.
+ If *arr* is a 0-dimensional NA-masked array with its value hidden,
+ an instance of :ctype:`NpyNA *` is returned.
+
.. cfunction:: PyObject* PyArray_Scalar(void* data, PyArray_Descr* dtype, PyObject* itemsize)
Return an array scalar object of the given enumerated *typenum*
@@ -2654,6 +2752,19 @@ to.
. No matter what is returned, you must DECREF the object returned
by this routine in *address* when you are done with it.
+ If the input is an array with NA support, this will either raise
+ an error if it contains any NAs, or will make a copy of the array
+ without NA support if it does not contain any NAs. Use the function
+ :cfunc:`PyArray_AllowNAConverter` to support NA-arrays directly
+ and more efficiently.
+
+.. cfunction:: int PyArray_AllowConverter(PyObject* obj, PyObject** address)
+
+ This is the same as :cfunc:`PyArray_Converter`, but allows arrays
+ with NA support to pass through untouched. This function was created
+ so that the existing converter could raise errors appropriately
+ for functions which have not been updated with NA support.
+
.. cfunction:: int PyArray_OutputConverter(PyObject* obj, PyArrayObject** address)
This is a default converter for output arrays given to
@@ -2662,6 +2773,17 @@ to.
*obj*) is TRUE then it is returned in *\*address* without
incrementing its reference count.
+ If the output is an array with NA support, this will raise an error.
+ Use the function :cfunc:`PyArray_OutputAllowNAConverter` to support
+ NA-arrays directly.
+
+.. cfunction:: int PyArray_OutputAllowNAConverter(PyObject* obj, PyArrayObject** address)
+
+ This is the same as :cfunc:`PyArray_OutputConverter`, but allows arrays
+ with NA support to pass through. This function was created
+ so that the existing output converter could raise errors appropriately
+ for functions which have not been updated with NA support.
+
.. cfunction:: int PyArray_IntpConverter(PyObject* obj, PyArray_Dims* seq)
Convert any Python sequence, *obj*, smaller than :cdata:`NPY_MAXDIMS`
diff --git a/doc/source/reference/c-api.iterator.rst b/doc/source/reference/c-api.iterator.rst
index 01385acfd..0915f765e 100644
--- a/doc/source/reference/c-api.iterator.rst
+++ b/doc/source/reference/c-api.iterator.rst
@@ -551,9 +551,10 @@ Construction and Destruction
.. cvar:: NPY_ITER_ALLOCATE
This is for output arrays, and requires that the flag
- :cdata:`NPY_ITER_WRITEONLY` be set. If ``op[i]`` is NULL,
- creates a new array with the final broadcast dimensions,
- and a layout matching the iteration order of the iterator.
+ :cdata:`NPY_ITER_WRITEONLY` or :cdata:`NPY_ITER_READWRITE`
+ be set. If ``op[i]`` is NULL, creates a new array with
+ the final broadcast dimensions, and a layout matching
+ the iteration order of the iterator.
When ``op[i]`` is NULL, the requested data type
``op_dtypes[i]`` may be NULL as well, in which case it is
@@ -586,6 +587,8 @@ Construction and Destruction
.. cvar:: NPY_ITER_ARRAYMASK
+ .. versionadded:: 1.7
+
Indicates that this operand is the mask to use for
selecting elements when writing to operands which have
the :cdata:`NPY_ITER_WRITEMASKED` flag applied to them.
@@ -609,6 +612,8 @@ Construction and Destruction
.. cvar:: NPY_ITER_WRITEMASKED
+ .. versionadded:: 1.7
+
Indicates that only elements which the operand with
the ARRAYMASK flag indicates are intended to be modified
by the iteration. In general, the iterator does not enforce
@@ -624,6 +629,35 @@ Construction and Destruction
returns true from the corresponding element in the ARRAYMASK
operand.
+ .. cvar:: NPY_ITER_USE_MASKNA
+
+ .. versionadded:: 1.7
+
+ Adds a new operand to the end of the operand list which
+ iterates over the mask of this operand. If this operand has
+ no mask and is read-only, it broadcasts a constant
+ one-valued mask to indicate every value is valid. If this
+ operand has no mask and is writeable, an error is raised.
+
+ Each operand which has this flag applied to it causes
+ an additional operand to be tacked on the end of the operand
+ list, in an order matching that of the operand array.
+ For example, if there are four operands, and operands with index
+ one and three have the flag :cdata:`NPY_ITER_USE_MASKNA`
+ specified, there will be six operands total, and they will
+ look like [op0, op1, op2, op3, op1_mask, op3_mask].
+
+ .. cvar:: NPY_ITER_IGNORE_MASKNA
+
+ .. versionadded:: 1.7
+
+ Under some circumstances, code doing an iteration will
+ have already called :cfunc:`PyArray_ContainsNA` on an
+ operand which has a mask, and seen that its return value
+ was false. When this occurs, it is safe to do the iteration
+ without simultaneously iterating over the mask, and this
+ flag allows that to be done.
+
.. cfunction:: NpyIter* NpyIter_AdvancedNew(npy_intp nop, PyArrayObject** op, npy_uint32 flags, NPY_ORDER order, NPY_CASTING casting, npy_uint32* op_flags, PyArray_Descr** op_dtypes, int oa_ndim, int** op_axes, npy_intp* itershape, npy_intp buffersize)
Extends :cfunc:`NpyIter_MultiNew` with several advanced options providing
@@ -955,6 +989,19 @@ Construction and Destruction
Returns the number of operands in the iterator.
+ When :cdata:`NPY_ITER_USE_MASKNA` is used on an operand, a new
+ operand is added to the end of the operand list in the iterator
+ to track that operand's NA mask. Thus, this equals the number
+ of construction operands plus the number of operands for
+ which the flag :cdata:`NPY_ITER_USE_MASKNA` was specified.
+
+.. cfunction:: int NpyIter_GetFirstMaskNAOp(NpyIter* iter)
+
+ .. versionadded:: 1.7
+
+ Returns the index of the first NA mask operand in the array. This
+ value is equal to the number of operands passed into the constructor.
+
.. cfunction:: npy_intp* NpyIter_GetAxisStrideArray(NpyIter* iter, int axis)
Gets the array of strides for the specified axis. Requires that
@@ -991,6 +1038,16 @@ Construction and Destruction
that are being iterated. The result points into ``iter``,
so the caller does not gain any references to the PyObjects.
+.. cfunction:: npy_int8* NpyIter_GetMaskNAIndexArray(NpyIter* iter)
+
+ .. versionadded:: 1.7
+
+ This gives back a pointer to the ``nop`` indices which map
+ construction operands with :cdata:`NPY_ITER_USE_MASKNA` flagged
+ to their corresponding NA mask operands and vice versa. For
+ operands which were not flagged with :cdata:`NPY_ITER_USE_MASKNA`,
+ this array contains negative values.
+
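The operand layout described for :cdata:`NPY_ITER_USE_MASKNA` and the index array above can be modelled in a few lines. This is a minimal sketch in plain C that makes no NumPy calls; the function name ``build_maskna_index`` and its parameters are hypothetical, and the code only mirrors the ordering rule stated in the text (mask operands appended in operand order, unflagged operands marked with a negative value).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Plain-C model (no NumPy calls) of the mask operand layout.
 * use_maskna[i] is nonzero when construction operand i was flagged.
 * index_out needs room for nop plus the number of flagged operands.
 * Returns the total operand count, including the mask operands.
 */
static int
build_maskna_index(int nop, const int *use_maskna, signed char *index_out)
{
    int i, total = nop;

    assert(nop >= 0);
    for (i = 0; i < nop; ++i) {
        if (use_maskna[i]) {
            index_out[i] = (signed char)total;  /* operand -> its mask */
            index_out[total] = (signed char)i;  /* mask -> its operand */
            ++total;
        }
        else {
            index_out[i] = -1;                  /* no mask operand */
        }
    }
    return total;
}
```

With four operands and the flag on operands one and three, this model yields six operands and the index array ``[-1, 4, -1, 5, 1, 3]``, which matches the ``[op0, op1, op2, op3, op1_mask, op3_mask]`` ordering.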
.. cfunction:: PyObject* NpyIter_GetIterView(NpyIter* iter, npy_intp i)
This gives back a reference to a new ndarray view, which is a view
@@ -1041,6 +1098,30 @@ Construction and Destruction
Returns ``NPY_SUCCEED`` or ``NPY_FAIL``.
+.. cfunction:: npy_bool NpyIter_IsFirstVisit(NpyIter* iter, int iop)
+
+ .. versionadded:: 1.7
+
+ Checks whether the elements of the specified reduction operand
+ which the iterator points at are being seen for the first time.
+ The function returns a reasonable answer for reduction operands
+ and when buffering is disabled. The answer may be incorrect for
+ buffered non-reduction operands.
+
+ This function is intended to be used in EXTERNAL_LOOP mode only,
+ and will produce some wrong answers when that mode is not enabled.
+
+ If this function returns true, the caller should also check the inner
+ loop stride of the operand, because if that stride is 0, then only
+ the first element of the innermost external loop is being visited
+ for the first time.
+
+ *WARNING*: For performance reasons, 'iop' is not bounds-checked,
+ it is not confirmed that 'iop' is actually a reduction operand,
+ and it is not confirmed that EXTERNAL_LOOP mode is enabled. These
+ checks are the responsibility of the caller, and should be done
+ outside of any inner loops.
+
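The stride check described for :cfunc:`NpyIter_IsFirstVisit` can be illustrated without the iterator itself. The following is a minimal sketch in plain C, not NumPy code: ``sum_inner_loop`` is a hypothetical inner loop, and its ``first_visit`` argument stands in for the value the function would return for the output operand. Pointers and strides are in bytes, as in EXTERNAL_LOOP mode.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical inner loop for a sum reduction of doubles.
 * first_visit models the result of NpyIter_IsFirstVisit for 'out'.
 */
static void
sum_inner_loop(char *out, ptrdiff_t out_stride,
               const char *in, ptrdiff_t in_stride,
               ptrdiff_t count, int first_visit)
{
    ptrdiff_t i;

    assert(count >= 0);
    if (first_visit) {
        if (out_stride == 0) {
            /* Every step hits the same output element: initialize once */
            if (count > 0) {
                *(double *)out = 0.0;
            }
        }
        else {
            /* Each step hits a distinct output element: initialize all */
            char *p = out;
            for (i = 0; i < count; ++i, p += out_stride) {
                *(double *)p = 0.0;
            }
        }
    }
    for (i = 0; i < count; ++i) {
        *(double *)out += *(const double *)in;
        out += out_stride;
        in += in_stride;
    }
}
```

When the output stride is zero, only one element exists to initialize, so the sketch resets it once and then accumulates into it for the whole inner loop.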
Functions For Iteration
-----------------------
diff --git a/doc/source/reference/c-api.maskna.rst b/doc/source/reference/c-api.maskna.rst
new file mode 100644
index 000000000..6abb624eb
--- /dev/null
+++ b/doc/source/reference/c-api.maskna.rst
@@ -0,0 +1,592 @@
+Array NA Mask API
+==================
+
+.. sectionauthor:: Mark Wiebe
+
+.. index::
+ pair: maskna; C-API
+ pair: C-API; maskna
+
+.. versionadded:: 1.7
+
+NA Masks in Arrays
+------------------
+
+NumPy supports the idea of NA (Not Available) missing values in its
+arrays. In the design document leading up to the implementation, two
+mechanisms for this were proposed, NA masks and NA bitpatterns. NA masks
+have been implemented as the first representation of these values. This
+mechanism supports working with NA values similar to what the R language
+provides, and when combined with views, allows one to temporarily mark
+elements as NA without affecting the original data.
+
+The C API has been updated with mechanisms to allow NumPy extensions
+to work with these masks, and this document provides some examples and
+reference for the NA mask-related functions.
+
+The NA Object
+-------------
+
+The main *numpy* namespace in Python has a new object called *NA*.
+This is an instance of :ctype:`NpyNA`, which is a Python object
+representing an NA value. This object is analogous to the NumPy
+scalars, and is returned by :cfunc:`PyArray_Return` instead of
+a scalar where appropriate.
+
+The global *numpy.NA* object is accessible from C as :cdata:`Npy_NA`.
+This is an NA value with no data type or multi-NA payload. Use it
+just as you would Py_None, except use :cfunc:`NpyNA_Check` to
+see if an object is an :ctype:`NpyNA`, because :cdata:`Npy_NA` isn't
+the only instance of NA possible.
+
+If you want to see whether a general PyObject* is NA, you should
+use the API function :cfunc:`NpyNA_FromObject` with *suppress_error*
+set to true. If this returns NULL, the object is not an NA, and if
+it returns an NpyNA instance, the object is NA and you can then
+access its *dtype* and *payload* fields as needed.
+
+To make new :ctype:`NpyNA` objects, use
+:cfunc:`NpyNA_FromDTypeAndPayload`. The functions
+:cfunc:`NpyNA_GetDType`, :cfunc:`NpyNA_IsMultiNA`, and
+:cfunc:`NpyNA_GetPayload` provide access to the data members.
+
+Working With NA-Masked Arrays
+-----------------------------
+
+The starting point for many C-API functions which manipulate NumPy
+arrays is the function :cfunc:`PyArray_FromAny`. This function converts
+a general PyObject* object into a NumPy ndarray, based on options
+specified in the flags. To avoid surprises, this function does
+not allow NA-masked arrays to pass through by default.
+
+To allow third-party code to work with NA-masked arrays which contain
+no NAs, :cfunc:`PyArray_FromAny` will make a copy of the array into
+a new array without an NA-mask, and return that. This allows for
+proper interoperability in cases where it's possible until functions
+are updated to provide optimal code paths for NA-masked arrays.
+
+To update a function with NA-mask support, add the flag
+:cdata:`NPY_ARRAY_ALLOWNA` when calling :cfunc:`PyArray_FromAny`.
+This allows NA-masked arrays to pass through untouched, and will
+convert PyObject lists containing NA values into NA-masked arrays
+instead of the alternative of switching to object arrays.
+
+To check whether an array has an NA-mask, use the function
+:cfunc:`PyArray_HASMASKNA`, which checks the appropriate flag.
+There are a number of things that one will typically want to do
+when encountering an NA-masked array. We'll go through a few
+of these cases.
+
+Forbidding Any NA Values
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The simplest case is to forbid any NA values. Note that it is better
+to still be aware of the NA mask and explicitly test for NA values
+than to leave out the :cdata:`NPY_ARRAY_ALLOWNA` flag, because the
+explicit test avoids the extra copy that :cfunc:`PyArray_FromAny`
+would otherwise make. The check for NAs will go something like this::
+
+ PyArrayObject *arr = ...;
+ int containsna;
+
+ /* ContainsNA checks HASMASKNA() for you */
+ containsna = PyArray_ContainsNA(arr, NULL, NULL);
+ /* Error case */
+ if (containsna < 0) {
+ return NULL;
+ }
+ /* If it found an NA */
+ else if (containsna) {
+ PyErr_SetString(PyExc_ValueError,
+ "this operation does not support arrays with NA values");
+ return NULL;
+ }
+
+After this check, you can be certain that the array doesn't contain any
+NA values, and can proceed accordingly. For example, if you iterate
+over the elements of the array, you may pass the flag
+:cdata:`NPY_ITER_IGNORE_MASKNA` to iterate over the data without
+touching the NA-mask at all.
+
+Manipulating NA Values
+~~~~~~~~~~~~~~~~~~~~~~
+
+The semantics of the NA-mask demand that whenever an array element
+is hidden by the NA-mask, no computations are permitted to modify
+the data backing that element. The :ctype:`NpyIter` provides
+a number of flags to assist with visiting both the array data
+and the mask data simultaneously, and preserving the masking semantics
+even when buffering is required.
+
+The main flag for iterating over NA-masked arrays is
+:cdata:`NPY_ITER_USE_MASKNA`. For each iterator operand which has this
+flag specified, a new operand is added to the end of the iterator operand
+list, and is set to iterate over the original operand's NA-mask. Operands
+which do not have an NA mask are permitted as well when they are flagged
+as read-only. The new operand in this case points to a single exposed
+mask value and all its strides are zero. The latter feature is useful
+when combining multiple read-only inputs, where some of them have masks.
+
+Accumulating NA Values
+~~~~~~~~~~~~~~~~~~~~~~
+
+More complex operations, like the NumPy ufunc reduce functions, need
+to take extra care to follow the masking semantics. If we accumulate
+the NA mask and the data values together, we could discover halfway
+through that the output is NA, and that we have violated the contract
+to never change the underlying output value when it is being assigned
+NA.
+
+The solution to this problem is to first accumulate the NA-mask as necessary
+to produce the output's NA-mask, then accumulate the data values without
+touching NA-masked values in the output. The parameter *preservena* in
+functions like :cfunc:`PyArray_AssignArray` can assist when initializing
+values in such an algorithm.
+
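The two-pass approach described above can be sketched in plain C. This is an illustration under stated assumptions rather than NumPy's implementation: ``masked_row_sum`` is a hypothetical function, the input is a C-contiguous ``rows`` by ``cols`` array, and masks use the boolean convention of 1 for exposed and 0 for hidden.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Two-pass masked row sum. Pass 1 settles every output mask value;
 * pass 2 writes data only to outputs that ended up exposed.
 */
static void
masked_row_sum(const double *data, const unsigned char *mask,
               size_t rows, size_t cols,
               double *out, unsigned char *out_mask)
{
    size_t r, c;

    assert(data && mask && out && out_mask);

    /* Pass 1: an output is exposed only if every input in its row is */
    for (r = 0; r < rows; ++r) {
        unsigned char exposed = 1;
        for (c = 0; c < cols; ++c) {
            exposed &= (unsigned char)(mask[r * cols + c] != 0);
        }
        out_mask[r] = exposed;
    }

    /* Pass 2: accumulate data, leaving hidden outputs untouched */
    for (r = 0; r < rows; ++r) {
        double total = 0.0;
        if (!out_mask[r]) {
            continue;
        }
        for (c = 0; c < cols; ++c) {
            total += data[r * cols + c];
        }
        out[r] = total;
    }
}
```

Because the output mask is final before any data is written, the value backing an NA output is never modified.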
+Example NA-Masked Operation in C
+--------------------------------
+
+As an example, let's implement a simple binary NA-masked operation
+for the double dtype. We'll make a divide operation which turns
+divide by zero into NA instead of Inf or NaN.
+
+To start, we define the function prototype and some basic
+:ctype:`NpyIter` boilerplate setup. We'll make a function which
+supports an optional *out* parameter, which may be NULL::
+
+ static PyArrayObject*
+ SpecialDivide(PyArrayObject* a, PyArrayObject* b, PyArrayObject *out)
+ {
+ NpyIter *iter = NULL;
+ PyArrayObject *op[3];
+ PyArray_Descr *dtypes[3];
+ npy_uint32 flags, op_flags[3];
+
+ /* Iterator construction parameters */
+ op[0] = a;
+ op[1] = b;
+ op[2] = out;
+
+ dtypes[0] = PyArray_DescrFromType(NPY_DOUBLE);
+ if (dtypes[0] == NULL) {
+ return NULL;
+ }
+ dtypes[1] = dtypes[0];
+ dtypes[2] = dtypes[0];
+
+ flags = NPY_ITER_BUFFERED |
+ NPY_ITER_EXTERNAL_LOOP |
+ NPY_ITER_GROWINNER |
+ NPY_ITER_REFS_OK |
+ NPY_ITER_ZEROSIZE_OK;
+
+ /* Every operand gets the flag NPY_ITER_USE_MASKNA */
+ op_flags[0] = NPY_ITER_READONLY |
+ NPY_ITER_ALIGNED |
+ NPY_ITER_USE_MASKNA;
+ op_flags[1] = op_flags[0];
+ op_flags[2] = NPY_ITER_WRITEONLY |
+ NPY_ITER_ALIGNED |
+ NPY_ITER_USE_MASKNA |
+ NPY_ITER_NO_BROADCAST |
+ NPY_ITER_ALLOCATE;
+
+ iter = NpyIter_MultiNew(3, op, flags, NPY_KEEPORDER,
+ NPY_SAME_KIND_CASTING, op_flags, dtypes);
+ /* Don't need the dtype reference anymore */
+ Py_DECREF(dtypes[0]);
+ if (iter == NULL) {
+ return NULL;
+ }
+
+At this point, the input operands have been validated according to
+the casting rule, the shapes of the arrays have been broadcast together,
+and any buffering necessary has been prepared. This means we can
+dive into the inner loop of this function::
+
+ ...
+ if (NpyIter_GetIterSize(iter) > 0) {
+ NpyIter_IterNextFunc *iternext;
+ char **dataptr;
+ npy_intp *stridesptr, *countptr;
+
+ /* Variables needed for looping */
+ iternext = NpyIter_GetIterNext(iter, NULL);
+ if (iternext == NULL) {
+ NpyIter_Deallocate(iter);
+ return NULL;
+ }
+ dataptr = NpyIter_GetDataPtrArray(iter);
+ stridesptr = NpyIter_GetInnerStrideArray(iter);
+ countptr = NpyIter_GetInnerLoopSizePtr(iter);
+
+The loop gets a bit messy when dealing with NA-masks, because it
+doubles the number of operands being processed in the iterator. Here
+we name each pointer and stride explicitly so that the innermost loop
+is easy to work with::
+
+ ...
+ do {
+ /* Data pointers and strides needed for innermost loop */
+ char *data_a = dataptr[0], *data_b = dataptr[1];
+ char *data_out = dataptr[2];
+ char *maskna_a = dataptr[3], *maskna_b = dataptr[4];
+ char *maskna_out = dataptr[5];
+ npy_intp stride_a = stridesptr[0], stride_b = stridesptr[1];
+ npy_intp stride_out = stridesptr[2];
+ npy_intp maskna_stride_a = stridesptr[3];
+ npy_intp maskna_stride_b = stridesptr[4];
+ npy_intp maskna_stride_out = stridesptr[5];
+ npy_intp i, count = *countptr;
+
+ for (i = 0; i < count; ++i) {
+
+Here is the code for performing one special division. We use
+the functions :cfunc:`NpyMaskValue_IsExposed` and
+:cfunc:`NpyMaskValue_Create` to work with the masks, in order to be
+as general as possible. These are inline functions, and the compiler
+optimizer should be able to produce the same result as if you performed
+these operations directly inline here::
+
+ ...
+ /* If neither of the inputs is NA */
+ if (NpyMaskValue_IsExposed((npy_mask)*maskna_a) &&
+ NpyMaskValue_IsExposed((npy_mask)*maskna_b)) {
+ double a_val = *(double *)data_a;
+ double b_val = *(double *)data_b;
+ /* Do the divide if 'b' isn't zero */
+ if (b_val != 0.0) {
+ *(double *)data_out = a_val / b_val;
+ /* Need to also set this element to exposed */
+ *maskna_out = NpyMaskValue_Create(1, 0);
+ }
+ /* Otherwise output an NA without touching its data */
+ else {
+ *maskna_out = NpyMaskValue_Create(0, 0);
+ }
+ }
+ /* Turn the output into NA without touching its data */
+ else {
+ *maskna_out = NpyMaskValue_Create(0, 0);
+ }
+
+ data_a += stride_a;
+ data_b += stride_b;
+ data_out += stride_out;
+ maskna_a += maskna_stride_a;
+ maskna_b += maskna_stride_b;
+ maskna_out += maskna_stride_out;
+ }
+ } while (iternext(iter));
+ }
+
+A little bit more boilerplate for returning the result from the iterator,
+and the function is done::
+
+ ...
+ if (out == NULL) {
+ out = NpyIter_GetOperandArray(iter)[2];
+ }
+ Py_INCREF(out);
+ NpyIter_Deallocate(iter);
+
+ return out;
+ }
+
+To run this example, you can create a simple module with a C-file spdiv_mod.c
+consisting of::
+
+ #include <Python.h>
+ #include <numpy/arrayobject.h>
+
+ /* INSERT SpecialDivide source code here */
+
+ static PyObject *
+ spdiv(PyObject *self, PyObject *args, PyObject *kwds)
+ {
+ PyArrayObject *a, *b, *out = NULL;
+ static char *kwlist[] = {"a", "b", "out", NULL};
+
+ if (!PyArg_ParseTupleAndKeywords(args, kwds, "O&O&|O&", kwlist,
+ &PyArray_AllowNAConverter, &a,
+ &PyArray_AllowNAConverter, &b,
+ &PyArray_OutputAllowNAConverter, &out)) {
+ return NULL;
+ }
+
+ /*
+ * The usual NumPy way is to only use PyArray_Return when
+ * the 'out' parameter is not provided.
+ */
+ if (out == NULL) {
+ return PyArray_Return(SpecialDivide(a, b, out));
+ }
+ else {
+ return (PyObject *)SpecialDivide(a, b, out);
+ }
+ }
+
+ static PyMethodDef SpDivMethods[] = {
+ {"spdiv", (PyCFunction)spdiv, METH_VARARGS | METH_KEYWORDS, NULL},
+ {NULL, NULL, 0, NULL}
+ };
+
+
+ PyMODINIT_FUNC initspdiv_mod(void)
+ {
+ PyObject *m;
+
+ m = Py_InitModule("spdiv_mod", SpDivMethods);
+ if (m == NULL) {
+ return;
+ }
+
+ /* Make sure NumPy is initialized */
+ import_array();
+ }
+
+Create a setup.py file like::
+
+ #!/usr/bin/env python
+ def configuration(parent_package='',top_path=None):
+ from numpy.distutils.misc_util import Configuration
+ config = Configuration('.',parent_package,top_path)
+ config.add_extension('spdiv_mod',['spdiv_mod.c'])
+ return config
+
+ if __name__ == "__main__":
+ from numpy.distutils.core import setup
+ setup(configuration=configuration)
+
+With these two files in a directory by themselves, run::
+
+ $ python setup.py build_ext --inplace
+
+and the file spdiv_mod.so (or .pyd on Windows) will be placed in the same
+directory. Now you can try out this sample to see how it behaves::
+
+ >>> import numpy as np
+ >>> from spdiv_mod import spdiv
+
+Because we used :cfunc:`PyArray_Return` when wrapping SpecialDivide,
+it returns scalars like any typical NumPy function does::
+
+ >>> spdiv(1, 2)
+ 0.5
+ >>> spdiv(2, 0)
+ NA(dtype='float64')
+ >>> spdiv(np.NA, 1.5)
+ NA(dtype='float64')
+
+Here we can see how NAs propagate, and how division by zero turns into NA
+as desired::
+
+ >>> a = np.arange(6)
+ >>> b = np.array([0,np.NA,0,2,1,0])
+ >>> spdiv(a, b)
+ array([ NA, NA, NA, 1.5, 4. , NA])
+
+Finally, we can see the masking behavior by creating a masked
+view of an array. The ones in *c_orig* are preserved wherever
+NA got assigned::
+
+ >>> c_orig = np.ones(6)
+ >>> c = c_orig.view(maskna=True)
+ >>> spdiv(a, b, out=c)
+ array([ NA, NA, NA, 1.5, 4. , NA])
+ >>> c_orig
+ array([ 1. , 1. , 1. , 1.5, 4. , 1. ])
+
+NA Object Data Type
+-------------------
+
+.. ctype:: NpyNA
+
+ This is the C object corresponding to objects of type
+ numpy.NAType. The fields themselves are hidden from consumers of the
+ API; you must use the functions provided to create new NA objects
+ and get their properties.
+
+ This object contains two fields, a :ctype:`PyArray_Descr *` dtype
+ which is either NULL or indicates the data type the NA represents,
+ and a payload which is there for the future addition of multi-NA support.
+
+.. cvar:: Npy_NA
+
+ This is a global singleton, similar to Py_None, which is the
+ *numpy.NA* object. Note that unlike Py_None, multiple NAs may be
+ created, for instance with different multi-NA payloads or with
+ different dtypes. If you want to return an NA with no payload
+ or dtype, return a new reference to Npy_NA.
+
+NA Object Functions
+-------------------
+
+.. cfunction:: NpyNA_Check(obj)
+
+ Evaluates to true if *obj* is an instance of :ctype:`NpyNA`.
+
+.. cfunction:: PyArray_Descr* NpyNA_GetDType(NpyNA* na)
+
+ Returns the *dtype* field of the NA object, which is NULL when
+ the NA has no dtype. Does not raise an error.
+
+.. cfunction:: npy_bool NpyNA_IsMultiNA(NpyNA* na)
+
+ Returns true if the NA has a multi-NA payload, false otherwise.
+
+.. cfunction:: int NpyNA_GetPayload(NpyNA* na)
+
+ Gets the multi-NA payload of the NA, or 0 if *na* doesn't have
+ a multi-NA payload.
+
+.. cfunction:: NpyNA* NpyNA_FromObject(PyObject* obj, int suppress_error)
+
+ If *obj* represents an object which is NA, for example if it
+ is an :ctype:`NpyNA`, or a zero-dimensional NA-masked array with
+ its value hidden by the mask, returns a new reference to an
+ :ctype:`NpyNA` object representing *obj*. Otherwise returns
+ NULL.
+
+ If *suppress_error* is true, this function returns NULL without
+ raising an exception when the input isn't NA; otherwise it sets
+ an exception before returning NULL.
+
+.. cfunction:: NpyNA* NpyNA_FromDTypeAndPayload(PyArray_Descr *dtype, int multina, int payload)
+
+
+ Constructs a new :ctype:`NpyNA` instance with the specified *dtype*
+ and *payload*. For an NA with no dtype, provide NULL in *dtype*.
+
+ Until multi-NA is implemented, just pass 0 for both *multina*
+ and *payload*.
+
+NA Mask Functions
+-----------------
+
+A mask dtype can be one of three different possibilities. It can
+be :cdata:`NPY_BOOL`, :cdata:`NPY_MASK`, or a struct dtype whose
+fields are all mask dtypes.
+
+A mask of :cdata:`NPY_BOOL` can just indicate True, with underlying
+value 1, for an element that is exposed, and False, with underlying
+value 0, for an element that is hidden.
+
+A mask of :cdata:`NPY_MASK` can additionally carry a payload which
+is a value from 0 to 127. This allows for missing data implementations
+based on such masks to support multiple reasons for data being missing.
+
+A mask of a struct dtype can only pair up with another struct dtype
+with the same field names. In this way, each field of the mask controls
+the masking for the corresponding field in the associated data array.
+
+Inline functions to work with masks are as follows.
+
+.. cfunction:: npy_bool NpyMaskValue_IsExposed(npy_mask mask)
+
+ Returns true if the data element corresponding to the mask element
+ can be modified, false if not.
+
+.. cfunction:: npy_uint8 NpyMaskValue_GetPayload(npy_mask mask)
+
+ Returns the payload contained in the mask. The return value
+ is between 0 and 127.
+
+.. cfunction:: npy_mask NpyMaskValue_Create(npy_bool exposed, npy_int8 payload)
+
+ Creates a mask from a flag indicating whether the element is exposed
+ or not and a payload value.
+
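To make the exposed flag and payload concrete, here is a minimal sketch of one possible 8-bit mask encoding in plain C. The bit layout is an assumption made for illustration (bit 0 holds the exposed flag, bits 1 to 7 hold the payload); the text above only states that the payload lies between 0 and 127. The names ``mask_t``, ``mask_create``, ``mask_is_exposed`` and ``mask_payload`` are hypothetical and are not the NumPy functions.

```c
#include <assert.h>

typedef unsigned char mask_t;   /* hypothetical stand-in for npy_mask */

/* Assumed layout: bit 0 = exposed flag, bits 1..7 = payload (0..127) */
static mask_t
mask_create(int exposed, unsigned payload)
{
    assert(payload <= 127);
    return (mask_t)((exposed ? 1u : 0u) | (payload << 1));
}

static int
mask_is_exposed(mask_t m)
{
    return (m & 0x01) != 0;
}

static unsigned
mask_payload(mask_t m)
{
    return (unsigned)(m >> 1);
}
```

Under this assumed layout, a mask with payload 0 takes the values 1 (exposed) and 0 (hidden), which coincide with the :cdata:`NPY_BOOL` mask values described above.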
+NA Mask Array Functions
+-----------------------
+
+.. cfunction:: int PyArray_AllocateMaskNA(PyArrayObject *arr, npy_bool ownmaskna, npy_bool multina, npy_mask defaultmask)
+
+ Allocates an NA mask for the array *arr* if necessary. If *ownmaskna*
+ is false, it only allocates an NA mask if none exists, but if
+ *ownmaskna* is true, it also allocates one if the NA mask is a view
+ into another array's NA mask. Here are the two most common usage
+ patterns::
+
+ /* Use this to make sure 'arr' has an NA mask */
+ if (PyArray_AllocateMaskNA(arr, 0, 0, 1) < 0) {
+ return NULL;
+ }
+
+ /* Use this to make sure 'arr' owns an NA mask */
+ if (PyArray_AllocateMaskNA(arr, 1, 0, 1) < 0) {
+ return NULL;
+ }
+
+ The parameter *multina* is provided for future expansion, when
+ multi-NA support is added to NumPy. This will affect the dtype of
+ the NA mask, which currently must always be NPY_BOOL, but will be
+ NPY_MASK for multi-NA arrays when this is implemented.
+
+ When a new NA mask is allocated, and the mask needs to be filled,
+ it uses the value *defaultmask*. In nearly all cases, this should be set
+ to 1, indicating that the elements are exposed. If a mask is allocated
+ just because of *ownmaskna*, the existing mask values are copied
+ into the newly allocated mask.
+
+ This function returns 0 for success, -1 for failure.
+
+.. cfunction:: npy_bool PyArray_HasNASupport(PyArrayObject *arr)
+
+ Returns true if *arr* is an array which supports NA. This function
+ exists because the design for adding NA proposed two mechanisms
+ for NAs in NumPy, NA masks and NA bitpatterns. Currently, just
+ NA masks have been implemented, but when NA bitpatterns are implemented
+ this would return true for arrays with an NA bitpattern dtype as well.
+
+.. cfunction:: int PyArray_ContainsNA(PyArrayObject *arr, PyArrayObject *wheremask, npy_bool *whichna)
+
+ Checks whether the array *arr* contains any NA values.
+
+ If *wheremask* is non-NULL, it must be an NPY_BOOL mask which can
+ broadcast onto *arr*. Wherever the where mask is True, *arr*
+ is checked for NA, and wherever it is False, the *arr* value is
+ ignored.
+
+ The parameter *whichna* is provided for future expansion to multi-NA
+ support. When implemented, this parameter will be a 128 element
+ array of npy_bool, with the value True for the NA values that are
+ being looked for.
+
+ This function returns 1 when the array contains NA values, 0 when
+ it does not, and -1 when an error has occurred.
+
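The effect of *wheremask* on the check can be modelled on flat arrays. This is a minimal sketch in plain C, not the NumPy implementation: ``contains_na`` is a hypothetical function, broadcasting is left out, and masks use 1 for exposed and 0 for hidden. A NULL mask models an array without an NA mask.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if any selected element is hidden (NA), 0 otherwise.
 * 'where' may be NULL, in which case every element is checked.
 */
static int
contains_na(const unsigned char *mask, const unsigned char *where,
            ptrdiff_t n)
{
    ptrdiff_t i;

    assert(n >= 0);
    if (mask == NULL) {
        return 0;               /* no NA mask, so nothing can be NA */
    }
    for (i = 0; i < n; ++i) {
        if (where != NULL && !where[i]) {
            continue;           /* element excluded by the where mask */
        }
        if (!mask[i]) {
            return 1;           /* found a hidden element */
        }
    }
    return 0;
}
```

An NA at a position where the where mask is False does not count, so the same array can report 1 or 0 depending on the where mask supplied.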
+.. cfunction:: int PyArray_AssignNA(PyArrayObject *arr, NpyNA *na, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
+
+ Assigns the given *na* value to elements of *arr*.
+
+ If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
+ onto *arr*, and only elements of *arr* with a corresponding value
+ of True in *wheremask* will have *na* assigned.
+
+ The parameters *preservena* and *preservewhichna* are provided for
+ future expansion to multi-NA support. With a single NA value, one
+ NA cannot be distinguished from another, so preserving NA values
+ does not make sense. With multiple NA values, preserving NA values
+ becomes an important concept because that implies not overwriting the
+ multi-NA payloads. The parameter *preservewhichna* will be a 128 element
+ array of npy_bool, indicating which NA payloads to preserve.
+
+ This function returns 0 for success, -1 for failure.
+
+.. cfunction:: int PyArray_AssignMaskNA(PyArrayObject *arr, npy_mask maskvalue, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
+
+ Assigns the given NA mask *maskvalue* to elements of *arr*.
+
+ If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
+ onto *arr*, and only elements of *arr* with a corresponding value
+ of True in *wheremask* will have the NA *maskvalue* assigned.
+
+ The parameters *preservena* and *preservewhichna* are provided for
+ future expansion to multi-NA support. With a single NA value, one
+ NA cannot be distinguished from another, so preserving NA values
+ does not make sense. With multiple NA values, preserving NA values
+ becomes an important concept because that implies not overwriting the
+ multi-NA payloads. The parameter *preservewhichna* will be a 128 element
+ array of npy_bool, indicating which NA payloads to preserve.
+
+ This function returns 0 for success, -1 for failure.
diff --git a/doc/source/reference/c-api.rst b/doc/source/reference/c-api.rst
index 7c7775889..0766fdf14 100644
--- a/doc/source/reference/c-api.rst
+++ b/doc/source/reference/c-api.rst
@@ -45,6 +45,7 @@ code.
c-api.dtype
c-api.array
c-api.iterator
+ c-api.maskna
c-api.ufunc
c-api.generalized-ufuncs
c-api.coremath
diff --git a/doc/source/reference/routines.maskna.rst b/doc/source/reference/routines.maskna.rst
new file mode 100644
index 000000000..2910acbac
--- /dev/null
+++ b/doc/source/reference/routines.maskna.rst
@@ -0,0 +1,11 @@
+NA-Masked Array Routines
+========================
+
+.. currentmodule:: numpy
+
+NA Values
+---------
+.. autosummary::
+ :toctree: generated/
+
+ isna
diff --git a/doc/source/reference/routines.rst b/doc/source/reference/routines.rst
index fb53aac3b..14b4f4d04 100644
--- a/doc/source/reference/routines.rst
+++ b/doc/source/reference/routines.rst
@@ -35,6 +35,7 @@ indentation.
routines.set
routines.window
routines.err
+ routines.maskna
routines.ma
routines.help
routines.other
diff --git a/doc/source/reference/routines.sort.rst b/doc/source/reference/routines.sort.rst
index c10252c69..517ea5897 100644
--- a/doc/source/reference/routines.sort.rst
+++ b/doc/source/reference/routines.sort.rst
@@ -37,3 +37,4 @@ Counting
:toctree: generated/
count_nonzero
+ count_reduce_items