author | Travis E. Oliphant <teoliphant@gmail.com> | 2012-06-21 01:37:01 -0700 |
---|---|---|
committer | Travis E. Oliphant <teoliphant@gmail.com> | 2012-06-21 01:37:01 -0700 |
commit | 134174c9265dd87ea802c89cac7a89478e3184f4 (patch) | |
tree | 5ec76af2359d8b038175fc2df3754c642c708805 /doc/source/reference | |
parent | 651ef74c4ebe7d24e727fd444b1985117ef16fae (diff) | |
parent | 3626d0c4fe510d615ef3e5ef3cf4ed2bfb52b53e (diff) | |
download | numpy-134174c9265dd87ea802c89cac7a89478e3184f4.tar.gz |
Merge pull request #297 from njsmith/separate-maskna
Split maskna support out of mainline into a branch
Diffstat (limited to 'doc/source/reference')
-rw-r--r-- | doc/source/reference/arrays.maskna.rst | 306 | ||||
-rw-r--r-- | doc/source/reference/arrays.rst | 1 | ||||
-rw-r--r-- | doc/source/reference/c-api.array.rst | 114 | ||||
-rw-r--r-- | doc/source/reference/c-api.maskna.rst | 592 | ||||
-rw-r--r-- | doc/source/reference/c-api.rst | 1 | ||||
-rw-r--r-- | doc/source/reference/routines.polynomials.classes.rst | 28 | ||||
-rw-r--r-- | doc/source/reference/routines.rst | 1 |
7 files changed, 0 insertions, 1043 deletions
diff --git a/doc/source/reference/arrays.maskna.rst b/doc/source/reference/arrays.maskna.rst deleted file mode 100644 index bd9516eba..000000000 --- a/doc/source/reference/arrays.maskna.rst +++ /dev/null @@ -1,306 +0,0 @@ -.. currentmodule:: numpy - -.. _arrays.maskna: - -**************** -NA-Masked Arrays -**************** - -.. versionadded:: 1.7.0 - -NumPy 1.7 adds preliminary support for missing values using an interface -based on an NA (Not Available) placeholder, implemented as masks in the -core ndarray. This system is highly flexible, allowing NAs to be used -with any underlying dtype, and supports creating multiple views of the same -data with different choices of NAs. - -.. note:: The NA API is *experimental*, and may undergo changes in future - versions of NumPy. The current implementation based on masks will likely be - supplemented by a second one based on bit-patterns, and it is possible that - a difference will be made between missing and ignored data. - -Other Missing Data Approaches -============================= - -The previous recommended approach for working with missing values was the -:mod:`numpy.ma` module, a subclass of ndarray written purely in Python. -By placing NA-masks directly in the NumPy core, it's possible to avoid -the need for calling "ma.<func>(arr)" instead of "np.<func>(arr)". - -Another approach many people have taken is to use NaN as the -placeholder for missing values. There are a few functions -like :func:`numpy.nansum` which behave similarly to usage of the -ufunc.reduce *skipna* parameter. - -As experienced in the R language, a programming interface based on an -NA placeholder is generally more intuitive to work with than direct -mask manipulation. - -Missing Data Model -================== - -The model adopted by NumPy for missing values is that NA is a -placeholder for a value which is there, but is unknown to computations. 
-The value may be temporarily hidden by the mask, or may be unknown -for any reason, but could be any value the dtype of the array is able -to hold. - -This model affects computations in specific, well-defined ways. Any time -we have a computation, like *c = NA + 1*, we must reason about whether -*c* will be an NA or not. The NA is not available now, but maybe a -measurement will be made later to determine what its value is, so anything -we calculate must be consistent with it eventually being revealed. One way -to do this is with thought experiments imagining we have discovered -the value of this NA. If the NA is 0, then *c* is 1. If the NA is -100, then *c* is 101. Because the value of *c* is ambiguous, it -isn't available either, so must be NA as well. - -A consequence of separating the NA model from the dtype is that, unlike -in R, NaNs are not considered to be NA. An NA is a value that is completely -unknown, whereas a NaN is usually the result of an invalid computation -as defined in the IEEE 754 floating point arithmetic specification. - -Most computations whose input is NA will output NA as well, a property -known as propagation. Some operations, however, always produce the -same result no matter what the value of the NA is. The clearest -example of this is with the logical operations *and* and *or*. Since both -np.logical_or(True, True) and np.logical_or(False, True) are True, -all possible boolean values on the left hand side produce the -same answer. This means that np.logical_or(np.NA, True) can produce -True instead of the more conservative np.NA. There is a similar case -for np.logical_and. - -A similar, but slightly deceptive, example is wanting to treat (NA * 0.0) -as 0.0 instead of as NA. This is invalid because the NA might be Inf -or NaN, in which case the result is NaN instead of 0.0. This idea is -valid for integer dtypes, but NumPy still chooses to return NA because -checking this special case would adversely affect performance. 
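Editorial illustration (not part of the removed file): the propagation rules described in the "Missing Data Model" section can be sketched in plain Python. This is a minimal model under stated assumptions, not the removed NumPy implementation. The names `_NA`, `is_na`, `na_add`, `na_logical_or` and `na_logical_and` are hypothetical and exist only for this example.

```python
class _NA:
    """Hypothetical stand-in for an unknown value (not the removed np.NA)."""

    def __repr__(self):
        return "NA"


NA = _NA()


def is_na(x):
    """Return True if x is the unknown-value placeholder."""
    return isinstance(x, _NA)


def na_add(a, b):
    # An unknown operand makes the sum unknown: NA + 1 could be anything.
    return NA if is_na(a) or is_na(b) else a + b


def na_logical_or(a, b):
    # If either known operand is True, the result is True whatever the
    # other operand turns out to be, so NA does not need to propagate.
    if (not is_na(a) and bool(a)) or (not is_na(b) and bool(b)):
        return True
    if is_na(a) or is_na(b):
        return NA
    return False


def na_logical_and(a, b):
    # Mirror image of the "or" case: a known False decides the result.
    if (not is_na(a) and not a) or (not is_na(b) and not b):
        return False
    if is_na(a) or is_na(b):
        return NA
    return True


print(na_add(NA, 1))              # NA
print(na_logical_or(NA, True))    # True
print(na_logical_or(NA, False))   # NA
print(na_logical_and(NA, False))  # False
```

The sketch deliberately has no special case for multiplication by zero: as the text notes, `NA * 0.0` must stay NA for floating point because the hidden value might be Inf or NaN.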
- -The NA Object -============= - -In the root numpy namespace, there is a new object NA. This is not -the only possible instance of an NA as is the case for None, since an NA -may have a dtype associated with it and has been designed for future -expansion to carry a multi-NA payload. It can be used in computations -like any value:: - - >>> np.NA - NA - >>> np.NA * 3 - NA(dtype='int64') - >>> np.sin(np.NA) - NA(dtype='float64') - -To check whether a value is NA, use the :func:`numpy.isna` function:: - - >>> np.isna(np.NA) - True - >>> np.isna(1.5) - False - >>> np.isna(np.nan) - False - >>> np.isna(np.NA * 3) - True - >>> (np.NA * 3) is np.NA - False - - -Creating NA-Masked Arrays -========================= - -Because having NA support adds some overhead to NumPy arrays, one -must explicitly request it when creating arrays. There are several ways -to get an NA-masked array. The easiest way is to include an NA -value in the list used to construct the array.:: - - >>> a = np.array([1,3,5]) - >>> a - array([1, 3, 5]) - >>> a.flags.maskna - False - - >>> b = np.array([1,3,np.NA]) - >>> b - array([1, 3, NA]) - >>> b.flags.maskna - True - -If one already has an array without an NA-mask, it can be added -by directly setting the *maskna* flag to True. 
Assigning an NA -to an array without NA support will raise an error rather than -automatically creating an NA-mask, with the idea that supporting -NA should be an explicit user choice.:: - - >>> a = np.array([1,3,5]) - >>> a[1] = np.NA - Traceback (most recent call last): - File "<stdin>", line 1, in <module> - ValueError: Cannot assign NA to an array which does not support NAs - >>> a.flags.maskna = True - >>> a[1] = np.NA - >>> a - array([1, NA, 5]) - -Most array construction functions have a new parameter *maskna*, which -can be set to True to produce an array with an NA-mask.:: - - >>> np.arange(5., maskna=True) - array([ 0., 1., 2., 3., 4.], maskna=True) - >>> np.eye(3, maskna=True) - array([[ 1., 0., 0.], - [ 0., 1., 0.], - [ 0., 0., 1.]], maskna=True) - >>> np.array([1,3,5], maskna=True) - array([1, 3, 5], maskna=True) - -Creating NA-Masked Views -======================== - -It will sometimes be desirable to view an array with an NA-mask, without -adding an NA-mask to that array. This is possible by taking an NA-masked -view of the array. There are two ways to do this, one which simply -guarantees that the view has an NA-mask, and another which guarantees that the -view has its own NA-mask, even if the array already had an NA-mask. - -Starting with a non-masked array, we can use the :func:`ndarray.view` method -to get an NA-masked view.:: - - >>> a = np.array([1,3,5]) - >>> b = a.view(maskna=True) - - >>> b[2] = np.NA - >>> a - array([1, 3, 5]) - >>> b - array([1, 3, NA]) - - >>> b[0] = 2 - >>> a - array([2, 3, 5]) - >>> b - array([2, 3, NA]) - - -It is important to be cautious here, though, since if the array already -has a mask, this will also take a view of that mask. 
This means the original -array's mask will be affected by assigning NA to the view.:: - - >>> a = np.array([1,np.NA,5]) - >>> b = a.view(maskna=True) - - >>> b[2] = np.NA - >>> a - array([1, NA, NA]) - >>> b - array([1, NA, NA]) - - >>> b[1] = 4 - >>> a - array([1, 4, NA]) - >>> b - array([1, 4, NA]) - - -To guarantee that the view created has its own NA-mask, there is another -flag *ownmaskna*. Using this flag will cause a copy of the array's mask -to be created for the view when the array already has a mask.:: - - >>> a = np.array([1,np.NA,5]) - >>> b = a.view(ownmaskna=True) - - >>> b[2] = np.NA - >>> a - array([1, NA, 5]) - >>> b - array([1, NA, NA]) - - >>> b[1] = 4 - >>> a - array([1, NA, 5]) - >>> b - array([1, 4, NA]) - - -In general, when an NA-masked view of an array has been taken, any time -an NA is assigned to an element of the array the data for that element -will remain untouched. This mechanism allows for multiple temporary -views with NAs of the same original array. - -NA-Masked Reductions -==================== - -Many of NumPy's reductions like :func:`numpy.sum` and :func:`numpy.std` -have been extended to work with NA-masked arrays. 
A consequence of the -missing value model is that any NA value in an array will cause the -output including that value to become NA.:: - - >>> a = np.array([[1,2,np.NA,3], [0,np.NA,1,1]]) - >>> a.sum(axis=0) - array([1, NA, NA, 4]) - >>> a.sum(axis=1) - array([NA, NA], dtype=int64) - -This is not always the desired result, so NumPy includes a parameter -*skipna* which causes the NA values to be skipped during computation.:: - - >>> a = np.array([[1,2,np.NA,3], [0,np.NA,1,1]]) - >>> a.sum(axis=0, skipna=True) - array([1, 2, 1, 4]) - >>> a.sum(axis=1, skipna=True) - array([6, 2]) - -Iterating Over NA-Masked Arrays -=============================== - -The :class:`nditer` object can be used to iterate over arrays with -NA values just like over normal arrays.:: - - >>> a = np.array([1,3,np.NA]) - >>> for x in np.nditer(a): - ... print x, - ... - 1 3 NA - >>> b = np.zeros(3, maskna=True) - >>> for x, y in np.nditer([a,b], op_flags=[['readonly'], - ... ['writeonly']]): - ... y[...] = -x - ... - >>> b - array([-1., -3., NA]) - -When using the C-API version of the nditer, one must explicitly -add the NPY_ITER_USE_MASKNA flag and take care to deal with the NA -mask appropriately. In the Python exposure, this flag is added -automatically. - -Planned Future Additions -======================== - -The NA support in 1.7 is fairly preliminary, and is focused on getting -the basics solid. This particularly meant getting the API in C refined -to a level where adding NA support to all of NumPy and to third party -software using NumPy would be a reasonable task. - -The biggest missing feature within the core is supporting NA values with -structured arrays. The design for this involves a mask slot for each -field in the structured array, motivated by the fact that many important -uses of structured arrays involve treating the structured fields like -another dimension. - -Another feature that was discussed during the design process is the ability -to support more than one NA value. 
The design created supports this multi-NA -idea with the addition of a payload to the NA value and to the NA-mask. -The API has been designed in such a way that adding this feature in a future -release should be possible without changing existing API functions in any way. - -To see a more complete list of what is supported and unsupported in the -1.7 release of NumPy, please refer to the release notes. - -During the design phase of this feature, two implementation approaches -for NA values were discussed, called "mask" and "bitpattern". What -has been implemented is the "mask" approach, but the design document, -or "NEP", describes a way both approaches could co-operatively exist -in NumPy, since each has both pros and cons. This design document is -available in the file "doc/neps/missing-data.rst" of the NumPy source -code. diff --git a/doc/source/reference/arrays.rst b/doc/source/reference/arrays.rst index 91b43132a..40c9f755d 100644 --- a/doc/source/reference/arrays.rst +++ b/doc/source/reference/arrays.rst @@ -44,7 +44,6 @@ of also more complicated arrangements of data. arrays.indexing arrays.nditer arrays.classes - arrays.maskna maskedarray arrays.interface arrays.datetime diff --git a/doc/source/reference/c-api.array.rst b/doc/source/reference/c-api.array.rst index 8eedc689a..8736cbc3f 100644 --- a/doc/source/reference/c-api.array.rst +++ b/doc/source/reference/c-api.array.rst @@ -92,40 +92,6 @@ sub-types). A synonym for PyArray_DESCR, named to be consistent with the 'dtype' usage within Python. -.. cfunction:: npy_bool PyArray_HASMASKNA(PyArrayObject* arr) - - .. versionadded:: 1.7 - - Returns true if the array has an NA-mask, false otherwise. - -.. cfunction:: PyArray_Descr *PyArray_MASKNA_DTYPE(PyArrayObject* arr) - - .. versionadded:: 1.7 - - Returns a borrowed reference to the dtype property for the NA mask - of the array, or NULL if the array has no NA mask. 
This function does - not raise an exception when it returns NULL, it is simply returning - the appropriate field. - -.. cfunction:: char *PyArray_MASKNA_DATA(PyArrayObject* arr) - - .. versionadded:: 1.7 - - Returns a pointer to the raw data for the NA mask of the array, - or NULL if the array has no NA mask. This function does - not raise an exception when it returns NULL, it is simply returning - the appropriate field. - -.. cfunction:: npy_intp *PyArray_MASKNA_STRIDES(PyArrayObject* arr) - - .. versionadded:: 1.7 - - Returns a pointer to strides of the NA mask of the array, If the - array has no NA mask, the values contained in the array will be - invalid. The shape of the NA mask is identical to the shape of the - array itself, so the number of strides is always the same as the - number of array dimensions. - .. cfunction:: void PyArray_ENABLEFLAGS(PyArrayObject* arr, int flags) .. versionadded:: 1.7 @@ -254,11 +220,6 @@ From scratch provided *dims* and *strides* are copied into newly allocated dimension and strides arrays for the new array object. - Because the flags are ignored when *data* is NULL, you cannot - create a new array from scratch with an NA mask. If one is desired, - call the function :cfunc:`PyArray_AllocateMaskNA` after the array - is created. - .. cfunction:: PyObject* PyArray_NewLikeArray(PyArrayObject* prototype, NPY_ORDER order, PyArray_Descr* descr, int subok) .. versionadded:: 1.6 @@ -281,11 +242,6 @@ From scratch *prototype* to create the new array, otherwise it will create a base-class array. - The newly allocated array does not have an NA mask even if the - *prototype* provided does. If an NA mask is desired in the array, - call the function :cfunc:`PyArray_AllocateMaskNA` after the array - is created. - .. cfunction:: PyObject* PyArray_New(PyTypeObject* subtype, int nd, npy_intp* dims, int type_num, npy_intp* strides, void* data, int itemsize, int flags, PyObject* obj) This is similar to :cfunc:`PyArray_DescrNew` (...) 
except you @@ -475,31 +431,6 @@ From other objects with, then an error is raised. If *op* is not already an array, then this flag has no effect. - .. cvar:: NPY_ARRAY_MASKNA - - .. versionadded:: 1.7 - - Make sure the array has an NA mask associated with its data. - - .. cvar:: NPY_ARRAY_OWNMASKNA - - .. versionadded:: 1.7 - - Make sure the array has an NA mask which it owns - associated with its data. - - .. cvar:: NPY_ARRAY_ALLOWNA - - .. versionadded:: 1.7 - - To prevent simple errors from slipping in, arrays with NA - masks are not permitted to pass through by default. Instead - an exception is raised indicating the operation doesn't support - NA masks yet. In order to enable NA mask support, this flag - must be passed in to allow the NA mask through, signalling that - the later code is written appropriately to handle NA mask - semantics. - .. cvar:: NPY_ARRAY_BEHAVED :cdata:`NPY_ARRAY_ALIGNED` \| :cdata:`NPY_ARRAY_WRITEABLE` @@ -1415,24 +1346,6 @@ or :cdata:`NPY_ARRAY_F_CONTIGUOUS` can be determined by the ``strides``, would have returned an error because :cdata:`NPY_ARRAY_UPDATEIFCOPY` would not have been possible. -.. cvar:: NPY_ARRAY_MASKNA - - If this flag is enabled, the array has an NA mask associated with - the data. C code which interacts with the NA mask must follow - specific semantic rules about when to overwrite data and when not - to. The mask can be accessed through the functions - :cfunc:`PyArray_MASKNA_DTYPE`, :cfunc:`PyArray_MASKNA_DATA`, and - :cfunc:`PyArray_MASKNA_STRIDES`. - -.. cvar:: NPY_ARRAY_OWNMASKNA - - If this flag is enabled, the array owns its own NA mask. If it is not - enabled, the NA mask is a view into a different array's NA mask. - - In order to ensure that an array owns its own NA mask, you can - call :cfunc:`PyArray_AllocateMaskNA` with the parameter *ownmaskna* - set to 1. 
- :cfunc:`PyArray_UpdateFlags` (obj, flags) will update the ``obj->flags`` for ``flags`` which can be any of :cdata:`NPY_ARRAY_C_CONTIGUOUS`, :cdata:`NPY_ARRAY_F_CONTIGUOUS`, :cdata:`NPY_ARRAY_ALIGNED`, or @@ -2541,9 +2454,6 @@ Array Scalars if so, returns the appropriate array scalar. It should be used whenever 0-dimensional arrays could be returned to Python. - If *arr* is a 0-dimensional NA-masked array with its value hidden, - an instance of :ctype:`NpyNA *` is returned. - .. cfunction:: PyObject* PyArray_Scalar(void* data, PyArray_Descr* dtype, PyObject* itemsize) Return an array scalar object of the given enumerated *typenum* @@ -2756,19 +2666,6 @@ to. . No matter what is returned, you must DECREF the object returned by this routine in *address* when you are done with it. - If the input is an array with NA support, this will either raise - an error if it contains any NAs, or will make a copy of the array - without NA support if it does not contain any NAs. Use the function - :cfunc:`PyArray_AllowNAConverter` to support NA-arrays directly - and more efficiently. - -.. cfunction:: int PyArray_AllowConverter(PyObject* obj, PyObject** address) - - This is the same as :cfunc:`PyArray_Converter`, but allows arrays - with NA support to pass through untouched. This function was created - so that the existing converter could raise errors appropriately - for functions which have not been updated with NA support - .. cfunction:: int PyArray_OutputConverter(PyObject* obj, PyArrayObject** address) This is a default converter for output arrays given to @@ -2777,17 +2674,6 @@ to. *obj*) is TRUE then it is returned in *\*address* without incrementing its reference count. - If the output is an array with NA support, this will raise an error. - Use the function :cfunc:`PyArray_OutputAllowNAConverter` to support - NA-arrays directly. - -.. 
cfunction:: int PyArray_OutputAllowNAConverter(PyObject* obj, PyArrayObject** address) - - This is the same as :cfunc:`PyArray_OutputConverter`, but allows arrays - with NA support to pass through. This function was created - so that the existing output converter could raise errors appropriately - for functions which have not been updated with NA support - .. cfunction:: int PyArray_IntpConverter(PyObject* obj, PyArray_Dims* seq) Convert any Python sequence, *obj*, smaller than :cdata:`NPY_MAXDIMS` diff --git a/doc/source/reference/c-api.maskna.rst b/doc/source/reference/c-api.maskna.rst deleted file mode 100644 index 6abb624eb..000000000 --- a/doc/source/reference/c-api.maskna.rst +++ /dev/null @@ -1,592 +0,0 @@ -Array NA Mask API -================== - -.. sectionauthor:: Mark Wiebe - -.. index:: - pair: maskna; C-API - pair: C-API; maskna - -.. versionadded:: 1.7 - -NA Masks in Arrays ------------------- - -NumPy supports the idea of NA (Not Available) missing values in its -arrays. In the design document leading up to the implementation, two -mechanisms for this were proposed, NA masks and NA bitpatterns. NA masks -have been implemented as the first representation of these values. This -mechanism supports working with NA values similar to what the R language -provides, and when combined with views, allows one to temporarily mark -elements as NA without affecting the original data. - -The C API has been updated with mechanisms to allow NumPy extensions -to work with these masks, and this document provides some examples and -reference for the NA mask-related functions. - -The NA Object -------------- - -The main *numpy* namespace in Python has a new object called *NA*. -This is an instance of :ctype:`NpyNA`, which is a Python object -representing an NA value. This object is analogous to the NumPy -scalars, and is returned by :cfunc:`PyArray_Return` instead of -a scalar where appropriate. - -The global *numpy.NA* object is accessible from C as :cdata:`Npy_NA`. 
-This is an NA value with no data type or multi-NA payload. Use it -just as you would Py_None, except use :cfunc:`NpyNA_Check` to -see if an object is an :ctype:`NpyNA`, because :cdata:`Npy_NA` isn't -the only instance of NA possible. - -If you want to see whether a general PyObject* is NA, you should -use the API function :cfunc:`NpyNA_FromObject` with *suppress_error* -set to true. If this returns NULL, the object is not an NA, and if -it returns an NpyNA instance, the object is NA and you can then -access its *dtype* and *payload* fields as needed. - -To make new :ctype:`NpyNA` objects, use -:cfunc:`NpyNA_FromDTypeAndPayload`. The functions -:cfunc:`NpyNA_GetDType`, :cfunc:`NpyNA_IsMultiNA`, and -:cfunc:`NpyNA_GetPayload` provide access to the data members. - -Working With NA-Masked Arrays ------------------------------ - -The starting point for many C-API functions which manipulate NumPy -arrays is the function :cfunc:`PyArray_FromAny`. This function converts -a general PyObject* object into a NumPy ndarray, based on options -specified in the flags. To avoid surprises, this function does -not allow NA-masked arrays to pass through by default. - -To allow third-party code to work with NA-masked arrays which contain -no NAs, :cfunc:`PyArray_FromAny` will make a copy of the array into -a new array without an NA-mask, and return that. This allows for -proper interoperability in cases where it's possible until functions -are updated to provide optimal code paths for NA-masked arrays. - -To update a function with NA-mask support, add the flag -:cdata:`NPY_ARRAY_ALLOWNA` when calling :cfunc:`PyArray_FromAny`. -This allows NA-masked arrays to pass through untouched, and will -convert PyObject lists containing NA values into NA-masked arrays -instead of the alternative of switching to object arrays. - -To check whether an array has an NA-mask, use the function -:cfunc:`PyArray_HASMASKNA`, which checks the appropriate flag. 
-There are a number of things that one will typically want to do -when encountering an NA-masked array. We'll go through a few -of these cases. - -Forbidding Any NA Values -~~~~~~~~~~~~~~~~~~~~~~~~ - -The simplest case is to forbid any NA values. Note that it is better -to still be aware of the NA mask and explicitly test for NA values -than to leave out the :cdata:`NPY_ARRAY_ALLOWNA`, because it is possible -to avoid the extra copy that :cfunc:`PyArray_FromAny` will make. The -check for NAs will go something like this:: - - PyArrayObject *arr = ...; - int containsna; - - /* ContainsNA checks HASMASKNA() for you */ - containsna = PyArray_ContainsNA(arr, NULL, NULL); - /* Error case */ - if (containsna < 0) { - return NULL; - } - /* If it found an NA */ - else if (containsna) { - PyErr_SetString(PyExc_ValueError, - "this operation does not support arrays with NA values"); - return NULL; - } - -After this check, you can be certain that the array doesn't contain any -NA values, and can proceed accordingly. For example, if you iterate -over the elements of the array, you may pass the flag -:cdata:`NPY_ITER_IGNORE_MASKNA` to iterate over the data without -touching the NA-mask at all. - -Manipulating NA Values -~~~~~~~~~~~~~~~~~~~~~~ - -The semantics of the NA-mask demand that whenever an array element -is hidden by the NA-mask, no computations are permitted to modify -the data backing that element. The :ctype:`NpyIter` provides -a number of flags to assist with visiting both the array data -and the mask data simultaneously, and preserving the masking semantics -even when buffering is required. - -The main flag for iterating over NA-masked arrays is -:cdata:`NPY_ITER_USE_MASKNA`. For each iterator operand which has this -flag specified, a new operand is added to the end of the iterator operand -list, and is set to iterate over the original operand's NA-mask. Operands -which do not have an NA mask are permitted as well when they are flagged -as read-only. 
The new operand in this case points to a single exposed -mask value and all its strides are zero. The latter feature is useful -when combining multiple read-only inputs, where some of them have masks. - -Accumulating NA Values -~~~~~~~~~~~~~~~~~~~~~~ - -More complex operations, like the NumPy ufunc reduce functions, need -to take extra care to follow the masking semantics. If we accumulate -the NA mask and the data values together, we could discover half way -through that the output is NA, and that we have violated the contract -to never change the underlying output value when it is being assigned -NA. - -The solution to this problem is to first accumulate the NA-mask as necessary -to produce the output's NA-mask, then accumulate the data values without -touching NA-masked values in the output. The parameter *preservena* in -functions like :cfunc:`PyArray_AssignArray` can assist when initializing -values in such an algorithm. - -Example NA-Masked Operation in C --------------------------------- - -As an example, let's implement a simple binary NA-masked operation -for the double dtype. We'll make a divide operation which turns -divide by zero into NA instead of Inf or NaN. - -To start, we define the function prototype and some basic -:ctype:`NpyIter` boilerplate setup. 
We'll make a function which -supports an optional *out* parameter, which may be NULL.:: - - static PyArrayObject* - SpecialDivide(PyArrayObject* a, PyArrayObject* b, PyArrayObject *out) - { - NpyIter *iter = NULL; - PyArrayObject *op[3]; - PyArray_Descr *dtypes[3]; - npy_uint32 flags, op_flags[3]; - - /* Iterator construction parameters */ - op[0] = a; - op[1] = b; - op[2] = out; - - dtypes[0] = PyArray_DescrFromType(NPY_DOUBLE); - if (dtypes[0] == NULL) { - return NULL; - } - dtypes[1] = dtypes[0]; - dtypes[2] = dtypes[0]; - - flags = NPY_ITER_BUFFERED | - NPY_ITER_EXTERNAL_LOOP | - NPY_ITER_GROWINNER | - NPY_ITER_REFS_OK | - NPY_ITER_ZEROSIZE_OK; - - /* Every operand gets the flag NPY_ITER_USE_MASKNA */ - op_flags[0] = NPY_ITER_READONLY | - NPY_ITER_ALIGNED | - NPY_ITER_USE_MASKNA; - op_flags[1] = op_flags[0]; - op_flags[2] = NPY_ITER_WRITEONLY | - NPY_ITER_ALIGNED | - NPY_ITER_USE_MASKNA | - NPY_ITER_NO_BROADCAST | - NPY_ITER_ALLOCATE; - - iter = NpyIter_MultiNew(3, op, flags, NPY_KEEPORDER, - NPY_SAME_KIND_CASTING, op_flags, dtypes); - /* Don't need the dtype reference anymore */ - Py_DECREF(dtypes[0]); - if (iter == NULL) { - return NULL; - } - -At this point, the input operands have been validated according to -the casting rule, the shapes of the arrays have been broadcast together, -and any buffering necessary has been prepared. This means we can -dive into the inner loop of this function.:: - - ... - if (NpyIter_GetIterSize(iter) > 0) { - NpyIter_IterNextFunc *iternext; - char **dataptr; - npy_intp *stridesptr, *countptr; - - /* Variables needed for looping */ - iternext = NpyIter_GetIterNext(iter, NULL); - if (iternext == NULL) { - NpyIter_Deallocate(iter); - return NULL; - } - dataptr = NpyIter_GetDataPtrArray(iter); - stridesptr = NpyIter_GetInnerStrideArray(iter); - countptr = NpyIter_GetInnerLoopSizePtr(iter); - -The loop gets a bit messy when dealing with NA-masks, because it -doubles the number of operands being processed in the iterator. 
Here -we are naming things clearly so that the content of the innermost loop -can be easy to work with.:: - - ... - do { - /* Data pointers and strides needed for innermost loop */ - char *data_a = dataptr[0], *data_b = dataptr[1]; - char *data_out = dataptr[2]; - char *maskna_a = dataptr[3], *maskna_b = dataptr[4]; - char *maskna_out = dataptr[5]; - npy_intp stride_a = stridesptr[0], stride_b = stridesptr[1]; - npy_intp stride_out = stridesptr[2]; - npy_intp maskna_stride_a = stridesptr[3]; - npy_intp maskna_stride_b = stridesptr[4]; - npy_intp maskna_stride_out = stridesptr[5]; - npy_intp i, count = *countptr; - - for (i = 0; i < count; ++i) { - -Here is the code for performing one special division. We use -the functions :cfunc:`NpyMaskValue_IsExposed` and -:cfunc:`NpyMaskValue_Create` to work with the masks, in order to be -as general as possible. These are inline functions, and the compiler -optimizer should be able to produce the same result as if you performed -these operations directly inline here.:: - - ... - /* If neither of the inputs is NA */ - if (NpyMaskValue_IsExposed((npy_mask)*maskna_a) && - NpyMaskValue_IsExposed((npy_mask)*maskna_b)) { - double a_val = *(double *)data_a; - double b_val = *(double *)data_b; - /* Do the divide if 'b' isn't zero */ - if (b_val != 0.0) { - *(double *)data_out = a_val / b_val; - /* Need to also set this element to exposed */ - *maskna_out = NpyMaskValue_Create(1, 0); - } - /* Otherwise output an NA without touching its data */ - else { - *maskna_out = NpyMaskValue_Create(0, 0); - } - } - /* Turn the output into NA without touching its data */ - else { - *maskna_out = NpyMaskValue_Create(0, 0); - } - - data_a += stride_a; - data_b += stride_b; - data_out += stride_out; - maskna_a += maskna_stride_a; - maskna_b += maskna_stride_b; - maskna_out += maskna_stride_out; - } - } while (iternext(iter)); - } - -A little bit more boilerplate for returning the result from the iterator, -and the function is done.:: - - ... 
- if (out == NULL) { - out = NpyIter_GetOperandArray(iter)[2]; - } - Py_INCREF(out); - NpyIter_Deallocate(iter); - - return out; - } - -To run this example, you can create a simple module with a C-file spdiv_mod.c -consisting of:: - - #include <Python.h> - #include <numpy/arrayobject.h> - - /* INSERT SpecialDivide source code here */ - - static PyObject * - spdiv(PyObject *self, PyObject *args, PyObject *kwds) - { - PyArrayObject *a, *b, *out = NULL; - static char *kwlist[] = {"a", "b", "out", NULL}; - - if (!PyArg_ParseTupleAndKeywords(args, kwds, "O&O&|O&", kwlist, - &PyArray_AllowNAConverter, &a, - &PyArray_AllowNAConverter, &b, - &PyArray_OutputAllowNAConverter, &out)) { - return NULL; - } - - /* - * The usual NumPy way is to only use PyArray_Return when - * the 'out' parameter is not provided. - */ - if (out == NULL) { - return PyArray_Return(SpecialDivide(a, b, out)); - } - else { - return (PyObject *)SpecialDivide(a, b, out); - } - } - - static PyMethodDef SpDivMethods[] = { - {"spdiv", (PyCFunction)spdiv, METH_VARARGS | METH_KEYWORDS, NULL}, - {NULL, NULL, 0, NULL} - }; - - - PyMODINIT_FUNC initspdiv_mod(void) - { - PyObject *m; - - m = Py_InitModule("spdiv_mod", SpDivMethods); - if (m == NULL) { - return; - } - - /* Make sure NumPy is initialized */ - import_array(); - } - -Create a setup.py file like:: - - #!/usr/bin/env python - def configuration(parent_package='',top_path=None): - from numpy.distutils.misc_util import Configuration - config = Configuration('.',parent_package,top_path) - config.add_extension('spdiv_mod',['spdiv_mod.c']) - return config - - if __name__ == "__main__": - from numpy.distutils.core import setup - setup(configuration=configuration) - -With these two files in a directory by itself, run:: - - $ python setup.py build_ext --inplace - -and the file spdiv_mod.so (or .dll) will be placed in the same directory. 
-Now you can try out this sample, to see how it behaves.:: - - >>> import numpy as np - >>> from spdiv_mod import spdiv - -Because we used :cfunc:`PyArray_Return` when wrapping SpecialDivide, -it returns scalars like any typical NumPy function does:: - - >>> spdiv(1, 2) - 0.5 - >>> spdiv(2, 0) - NA(dtype='float64') - >>> spdiv(np.NA, 1.5) - NA(dtype='float64') - -Here we can see how NAs propagate, and how division by zero turns into NA -as desired.:: - - >>> a = np.arange(6) - >>> b = np.array([0,np.NA,0,2,1,0]) - >>> spdiv(a, b) - array([ NA, NA, NA, 1.5, 4. , NA]) - -Finally, we can see the masking behavior by creating a masked -view of an array. The ones in *c_orig* are preserved wherever -NA got assigned.:: - - >>> c_orig = np.ones(6) - >>> c = c_orig.view(maskna=True) - >>> spdiv(a, b, out=c) - array([ NA, NA, NA, 1.5, 4. , NA]) - >>> c_orig - array([ 1. , 1. , 1. , 1.5, 4. , 1. ]) - -NA Object Data Type -------------------- - -.. ctype:: NpyNA - - This is the C object corresponding to objects of type - numpy.NAType. The fields themselves are hidden from consumers of the - API; you must use the functions provided to create new NA objects - and get their properties. - - This object contains two fields, a :ctype:`PyArray_Descr *` dtype - which is either NULL or indicates the data type the NA represents, - and a payload which is there for the future addition of multi-NA support. - -.. cvar:: Npy_NA - - This is a global singleton, similar to Py_None, which is the - *numpy.NA* object. Note that unlike Py_None, multiple NAs may be - created, for instance with different multi-NA payloads or with - different dtypes. If you want to return an NA with no payload - or dtype, return a new reference to Npy_NA. - -NA Object Functions -------------------- - -.. cfunction:: NpyNA_Check(obj) - - Evaluates to true if *obj* is an instance of :ctype:`NpyNA`. - -.. 
cfunction:: PyArray_Descr* NpyNA_GetDType(NpyNA* na)
-
-    Returns the *dtype* field of the NA object, which is NULL when
-    the NA has no dtype. Does not raise an error.
-
-.. cfunction:: npy_bool NpyNA_IsMultiNA(NpyNA* na)
-
-    Returns true if the NA has a multi-NA payload, false otherwise.
-
-.. cfunction:: int NpyNA_GetPayload(NpyNA* na)
-
-    Gets the multi-NA payload of the NA, or 0 if *na* doesn't have
-    a multi-NA payload.
-
-.. cfunction:: NpyNA* NpyNA_FromObject(PyObject* obj, int suppress_error)
-
-    If *obj* represents an object which is NA, for example if it
-    is an :ctype:`NpyNA`, or a zero-dimensional NA-masked array with
-    its value hidden by the mask, returns a new reference to an
-    :ctype:`NpyNA` object representing *obj*. Otherwise returns
-    NULL.
-
-    If *suppress_error* is true, this function doesn't raise an exception
-    when the input isn't NA and it returns NULL, otherwise it does.
-
-.. cfunction:: NpyNA* NpyNA_FromDTypeAndPayload(PyArray_Descr *dtype, int multina, int payload)
-
-
-    Constructs a new :ctype:`NpyNA` instance with the specified *dtype*
-    and *payload*. For an NA with no dtype, provide NULL in *dtype*.
-
-    Until multi-NA is implemented, just pass 0 for both *multina*
-    and *payload*.
-
-NA Mask Functions
------------------
-
-A mask dtype can be one of three different possibilities. It can
-be :cdata:`NPY_BOOL`, :cdata:`NPY_MASK`, or a struct dtype whose
-fields are all mask dtypes.
-
-A mask of :cdata:`NPY_BOOL` can just indicate True, with underlying
-value 1, for an element that is exposed, and False, with underlying
-value 0, for an element that is hidden.
-
-A mask of :cdata:`NPY_MASK` can additionally carry a payload which
-is a value from 0 to 127. This allows for missing data implementations
-based on such masks to support multiple reasons for data being missing.
-
-A mask of a struct dtype can only pair up with another struct dtype
-with the same field names.
In this way, each field of the mask controls
-the masking for the corresponding field in the associated data array.
-
-Inline functions to work with masks are as follows.
-
-.. cfunction:: npy_bool NpyMaskValue_IsExposed(npy_mask mask)
-
-    Returns true if the data element corresponding to the mask element
-    can be modified, false if not.
-
-.. cfunction:: npy_uint8 NpyMaskValue_GetPayload(npy_mask mask)
-
-    Returns the payload contained in the mask. The return value
-    is between 0 and 127.
-
-.. cfunction:: npy_mask NpyMaskValue_Create(npy_bool exposed, npy_int8 payload)
-
-    Creates a mask from a flag indicating whether the element is exposed
-    or not and a payload value.
-
-NA Mask Array Functions
------------------------
-
-.. cfunction:: int PyArray_AllocateMaskNA(PyArrayObject *arr, npy_bool ownmaskna, npy_bool multina, npy_mask defaultmask)
-
-    Allocates an NA mask for the array *arr* if necessary. If *ownmaskna*
-    if false, it only allocates an NA mask if none exists, but if
-    *ownmaskna* is true, it also allocates one if the NA mask is a view
-    into another array's NA mask. Here are the two most common usage
-    patterns::
-
-        /* Use this to make sure 'arr' has an NA mask */
-        if (PyArray_AllocateMaskNA(arr, 0, 0, 1) < 0) {
-            return NULL;
-        }
-
-        /* Use this to make sure 'arr' owns an NA mask */
-        if (PyArray_AllocateMaskNA(arr, 1, 0, 1) < 0) {
-            return NULL;
-        }
-
-    The parameter *multina* is provided for future expansion, when
-    mult-NA support is added to NumPy. This will affect the dtype of
-    the NA mask, which currently must be always NPY_BOOL, but will be
-    NPY_MASK for arrays multi-NA when this is implemented.
-
-    When a new NA mask is allocated, and the mask needs to be filled,
-    it uses the value *defaultmask*. In nearly all cases, this should be set
-    to 1, indicating that the elements are exposed. If a mask is allocated
-    just because of *ownmaskna*, the existing mask values are copied
-    into the newly allocated mask.
-
-    This function returns 0 for success, -1 for failure.
-
-.. cfunction:: npy_bool PyArray_HasNASupport(PyArrayObject *arr)
-
-    Returns true if *arr* is an array which supports NA. This function
-    exists because the design for adding NA proposed two mechanisms
-    for NAs in NumPy, NA masks and NA bitpatterns. Currently, just
-    NA masks have been implemented, but when NA bitpatterns are implemented
-    this would return true for arrays with an NA bitpattern dtype as well.
-
-.. cfunction:: int PyArray_ContainsNA(PyArrayObject *arr, PyArrayObject *wheremask, npy_bool *whichna)
-
-    Checks whether the array *arr* contains any NA values.
-
-    If *wheremask* is non-NULL, it must be an NPY_BOOL mask which can
-    broadcast onto *arr*. Whereever the where mask is True, *arr*
-    is checked for NA, and whereever it is False, the *arr* value is
-    ignored.
-
-    The parameter *whichna* is provided for future expansion to multi-NA
-    support. When implemented, this parameter will be a 128 element
-    array of npy_bool, with the value True for the NA values that are
-    being looked for.
-
-    This function returns 1 when the array contains NA values, 0 when
-    it does not, and -1 when a error has occurred.
-
-.. cfunction:: int PyArray_AssignNA(PyArrayObject *arr, NpyNA *na, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
-
-    Assigns the given *na* value to elements of *arr*.
-
-    If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
-    onto *arr*, and only elements of *arr* with a corresponding value
-    of True in *wheremask* will have *na* assigned.
-
-    The parameters *preservena* and *preservewhichna* are provided for
-    future expansion to multi-NA support. With a single NA value, one
-    NA cannot be distinguished from another, so preserving NA values
-    does not make sense. With multiple NA values, preserving NA values
-    becomes an important concept because that implies not overwriting the
-    multi-NA payloads.
The parameter *preservewhichna* will be a 128 element
-    array of npy_bool, indicating which NA payloads to preserve.
-
-    This function returns 0 for success, -1 for failure.
-
-.. cfunction:: int PyArray_AssignMaskNA(PyArrayObject *arr, npy_mask maskvalue, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
-
-    Assigns the given NA mask *maskvalue* to elements of *arr*.
-
-    If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
-    onto *arr*, and only elements of *arr* with a corresponding value
-    of True in *wheremask* will have the NA *maskvalue* assigned.
-
-    The parameters *preservena* and *preservewhichna* are provided for
-    future expansion to multi-NA support. With a single NA value, one
-    NA cannot be distinguished from another, so preserving NA values
-    does not make sense. With multiple NA values, preserving NA values
-    becomes an important concept because that implies not overwriting the
-    multi-NA payloads. The parameter *preservewhichna* will be a 128 element
-    array of npy_bool, indicating which NA payloads to preserve.
-
-    This function returns 0 for success, -1 for failure.
diff --git a/doc/source/reference/c-api.rst b/doc/source/reference/c-api.rst
index 6e97cec36..b1a5eb477 100644
--- a/doc/source/reference/c-api.rst
+++ b/doc/source/reference/c-api.rst
@@ -45,7 +45,6 @@ code.
    c-api.dtype
    c-api.array
    c-api.iterator
-   c-api.maskna
    c-api.ufunc
    c-api.generalized-ufuncs
    c-api.coremath
diff --git a/doc/source/reference/routines.polynomials.classes.rst b/doc/source/reference/routines.polynomials.classes.rst
index 2cfbec5d9..9294728c8 100644
--- a/doc/source/reference/routines.polynomials.classes.rst
+++ b/doc/source/reference/routines.polynomials.classes.rst
@@ -322,31 +322,3 @@ illustrated below for a fit to a noisy sin curve.
     >>> p.window
     array([-1., 1.])
     >>> plt.show()
-
-The fit will ignore data points masked with NA.
We demonstrate this with
-the previous example, but add an outlier that messes up the fit, then mask
-it out.
-
-.. plot::
-
-    >>> import numpy as np
-    >>> import matplotlib.pyplot as plt
-    >>> from numpy.polynomial import Chebyshev as T
-    >>> np.random.seed(11)
-    >>> x = np.linspace(0, 2*np.pi, 20)
-    >>> y = np.sin(x) + np.random.normal(scale=.1, size=x.shape)
-    >>> y[10] = 2
-    >>> p = T.fit(x, y, 5)
-    >>> plt.plot(x, y, 'o')
-    [<matplotlib.lines.Line2D object at 0x2136c10>]
-    >>> xx, yy = p.linspace()
-    >>> plt.plot(xx, yy, lw=2, label="unmasked")
-    [<matplotlib.lines.Line2D object at 0x1cf2890>]
-    >>> ym = y.view(maskna=1)
-    >>> ym[10] = np.NA
-    >>> p = T.fit(x, ym, 5)
-    >>> xx, yy = p.linspace()
-    >>> plt.plot(xx, yy, lw=2, label="masked")
-    >>> plt.legend(loc="upper right")
-    <matplotlib.legend.Legend object at 0x3b3ee10>
-    >>> plt.show()
diff --git a/doc/source/reference/routines.rst b/doc/source/reference/routines.rst
index 10d12330c..37b16de59 100644
--- a/doc/source/reference/routines.rst
+++ b/doc/source/reference/routines.rst
@@ -34,7 +34,6 @@ indentation.
    routines.linalg
    routines.logic
    routines.ma
-   routines.maskna
    routines.math
    routines.matlib
    routines.numarray