Diffstat (limited to 'doc')
-rw-r--r-- doc/source/reference/arrays.maskna.rst 44
1 files changed, 34 insertions, 10 deletions
diff --git a/doc/source/reference/arrays.maskna.rst b/doc/source/reference/arrays.maskna.rst
index cd075a91b..c55432639 100644
--- a/doc/source/reference/arrays.maskna.rst
+++ b/doc/source/reference/arrays.maskna.rst
@@ -14,10 +14,19 @@ core ndarray. This system is highly flexible, allowing NAs to be used
with any underlying dtype, and supports creating multiple views of the same
data with different choices of NAs.
+Other Missing Data Approaches
+=============================
+
The previous recommended approach for working with missing values was the
:mod:`numpy.ma` module, a subclass of ndarray written purely in Python.
By placing NA-masks directly in the NumPy core, it's possible to avoid
the need for calling "ma.<func>(arr)" instead of "np.<func>(arr)".
+
+Another common approach is to use NaN as the placeholder for
+missing values. A few functions, such as :func:`numpy.nansum`,
+behave much like a ufunc.reduce call that uses the *skipna*
+parameter.
+
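As a rough illustration of the NaN-placeholder approach described above, the sketch below contrasts an ordinary sum with :func:`numpy.nansum`. It relies only on NaN and `numpy.nansum`, not on the NA-mask feature; the array contents are made up for the example.

```python
import numpy as np

# NaN stands in for a missing value (this only works for float dtypes).
data = np.array([1.0, np.nan, 3.0])

# The ordinary reduction propagates NaN into the result.
plain_total = np.sum(data)

# nansum skips the NaN entries by treating them as zero.
skipped_total = np.nansum(data)

print(plain_total, skipped_total)
```

One limitation of this approach is that NaN exists only for floating point dtypes, whereas the NA mask applies to any dtype.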
As experienced in the R language, a programming interface based on an
NA placeholder is generally more intuitive to work with than direct
mask manipulation.
@@ -28,13 +37,15 @@ Missing Data Model
The model adopted by NumPy for missing values is that NA is a
placeholder for a value which is there, but is unknown to computations.
The value may be temporarily hidden by the mask, or may be unknown
-for any reason, but all computations must reason as if there is a value
-in existence hidden in a box.
+for any reason.
This model is layered on top of the existing NumPy dtypes, so the value
-behind the NA may be any value the dtype can take on. For example, an
-NA with a floating point dtype could be any finite number, Inf, or NaN,
-computations may not assume anything about its value.
+behind the NA may be any value the dtype can take on. Consider substituting
+different values for the NA in a computation, and see what comes out. If
+different substitutions produce different outputs, that output must
+be NA as well. If all possible substitutions produce the same output,
+it is acceptable, though not strictly necessary, to return that output
+instead of an NA.
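The substitution rule above can be sketched in plain Python. This is not how NumPy implements NA propagation; `NA`, `same`, `propagate`, and `FLOAT_SAMPLES` are hypothetical names used only for illustration, and the float candidates are a small sample rather than every possible value.

```python
import math

NA = object()  # hypothetical stand-in for np.NA, for illustration only

# A small sample of values a float NA could be hiding (not exhaustive).
FLOAT_SAMPLES = [0.0, 1.0, -1.0, math.inf, math.nan]


def same(a, b):
    """Compare two results, treating NaN as equal to NaN."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def propagate(func, candidates):
    """Evaluate func for each candidate the NA might hide.

    Return the common result if every substitution agrees, else NA.
    """
    results = [func(value) for value in candidates]
    first = results[0]
    for other in results[1:]:
        if not same(first, other):
            return NA
    return first


# Every boolean substitution gives True, so True is an acceptable answer.
or_result = propagate(lambda v: v or True, [False, True])

# The substitutions disagree (False vs. True), so the result must be NA.
and_result = propagate(lambda v: v and True, [False, True])
```

Returning NA in the first case would also be acceptable under the model; returning the common value is only permitted, not required.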
A consequence of separating the NA model from the dtype is that unlike
in R, NaNs are not considered to be NA. An NA is a value that is completely
@@ -48,6 +59,12 @@ same answer. This means that np.logical_or(np.NA, True) can produce
True instead of the more conservative np.NA. There is a similar case
for np.logical_and.
+A similar, but slightly deceptive example is wanting to treat (NA * 0.0)
+as 0.0 instead of as NA. This is invalid because the NA might be Inf
+or NaN, in which case the result is NaN instead of 0.0. This idea is
+valid for integer dtypes, but NumPy still chooses to return NA because
+checking this special case would adversely affect performance.
+
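The float case can be checked with ordinary Python floats, which follow IEEE 754 arithmetic. The sketch below uses no NA at all; it only shows why 0.0 is not a safe answer for (NA * 0.0) when the hidden value could be Inf or NaN, and why the integer case differs. The sample values are arbitrary.

```python
import math

# A finite hidden value times zero gives zero.
finite_product = 2.5 * 0.0

# Inf or NaN times zero gives NaN.
inf_product = math.inf * 0.0
nan_product = math.nan * 0.0

# Integers have no Inf or NaN, so every substitution yields zero.
int_products = {hidden * 0 for hidden in (-3, 0, 7, 2**40)}

print(finite_product, inf_product, nan_product, int_products)
```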
The NA Singleton
================
@@ -72,6 +89,11 @@ To check whether a value is NA, use the :func:`numpy.isna` function::
False
>>> np.isna(np.nan)
False
+ >>> np.isna(np.NA * 3)
+ True
+ >>> (np.NA * 3) is np.NA
+ False
+
Creating NA-Masked Arrays
=========================
@@ -221,9 +243,7 @@ Iterating Over NA-Masked Arrays
===============================
The :class:`nditer` object can be used to iterate over arrays with
-NA values just like over normal arrays. The one additional detail to
-be aware of is that the per-operand flag 'use_maskna' must be specified
-when they are being used.::
+NA values just like over normal arrays::
>>> a = np.array([1,3,np.NA])
>>> for x in np.nditer(a):
@@ -231,13 +251,17 @@ when they are being used.::
...
1 3 NA
>>> b = np.zeros(3, maskna=True)
- >>> for x, y in np.nditer([a,b], op_flags=[['readonly','use_maskna'],
- ... ['writeonly', 'use_maskna']]):
+ >>> for x, y in np.nditer([a,b], op_flags=[['readonly'],
+ ... ['writeonly']]):
... y[...] = -x
...
>>> b
array([-1., -3., NA])
+When using the C-API version of the nditer, one must explicitly
+add the NPY_ITER_USE_MASKNA flag and take care to deal with the NA
+mask appropriately. In the Python exposure, this flag is added
+automatically.
Planned Future Additions
========================