1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
|
Array NA Mask API
==================
.. sectionauthor:: Mark Wiebe
.. index::
pair: maskna; C-API
pair: C-API; maskna
.. versionadded:: 1.7
NA Masks in Arrays
------------------
NumPy supports the idea of NA (Not Available) missing values in its arrays.
In the design document leading up to the implementation, two mechanisms
for this were proposed, NA masks and NA bitpatterns. NA masks have been
implemented as the first representation of these values. This mechanism
supports working with NA values similar to the approach taking in the R
project, while when combined with views, allows one to temporarily mark
elements as NA since the mask is independent of the raw array data.
The C API has been updated with mechanisms to allow NumPy extensions
to work with these masks, and this document provides some examples and
reference for the NA mask-related functions.
The NA Object
-------------
The main *numpy* namespace in Python has a new object called *NA*.
This is an instance of :ctype:`NpyNA *`, which is a Python object
representing an NA value. This object is analogous to the NumPy
scalars, and is returned by :cfunc:`PyArray_Return` instead of
a scalar where appropriate.
The global *numpy.NA* object is accessible from C as :cdata:`Npy_NA`.
This is an NA value with no data type or multi-NA payload. Use it
just as you would Py_None, except use :cfunc:`NpyNA_Check` to
see if an object is an :ctype:`NpyNA *`, because :cdata:`Npy_NA` isn't
the only instance of NA possible.
If you want to see whether a general PyObject* is NA, you should
use the API function :cfunc:`NpyNA_FromObject` with *suppress_error*
set to true. If this returns NULL, the object is not an NA, and if
it returns an NpyNA instance, the object is NA and you can then
access its *dtype* and *payload* fields as needed.
To make new :ctype:`NpyNA *` objects, use
:cfunc:`NpyNA_FromDTypeAndPayload`, and the functions
:cfunc:`NpyNA_GetDType`, :cfunc:`NpyNA_IsMultiNA`, and
:cfunc:`NpyNA_GetPayload` provide access to the data members.
Example NA-Masked Operation in C
--------------------------------
NA Object Data Type
-------------------
.. ctype:: NpyNA *
This is the C object corresponding to objects of type
numpy.NAType. The fields themselves are hidden from consumers of the
API, you must use the functions provided to create new NA objects
and get their properties.
This object contains two fields, a :ctype:`PyArray_Descr *` dtype
which is either NULL or indicates the data type the NA represents,
and a payload which is there for the future addition of multi-NA support.
.. cvar:: Npy_NA
This is a global singleton, similar to Py_None, which is the
*numpy.NA* object. Note that unlike Py_None, multiple NAs may be
created, for instance with different multi-NA payloads or with
different dtypes. If you want to return an NA with no payload
or dtype, return a new reference to Npy_NA.
NA Object Functions
-------------------
.. cfunction:: NpyNA_Check(obj)
Evaluates to true if *obj* is an instance of :ctype:`NpyNA *`.
.. cfunction:: PyArray_Descr* NpyNA_GetDType(NpyNA* na)
Returns the *dtype* field of the NA object, which is NULL when
the NA has no dtype. Does not raise an error.
.. cfunction:: npy_bool NpyNA_IsMultiNA(NpyNA* na)
Returns true if the NA has a multi-NA payload, false otherwise.
.. cfunction:: int NpyNA_GetPayload(NpyNA* na)
Gets the multi-NA payload of the NA, or 0 if *na* doesn't have
a multi-NA payload.
.. cfunction:: NpyNA* NpyNA_FromObject(PyObject* obj, int suppress_error)
If *obj* represents an object which is NA, for example if it
is an :ctype:`NpyNA *`, or a zero-dimensional NA-masked array with
its value hidden by the mask, returns a new reference to an
:ctype:`NpyNA *` object representing *obj*. Otherwise returns
NULL.
If *suppress_error* is true, this function doesn't raise an exception
when the input isn't NA and it returns NULL, otherwise it does.
.. cfunction:: NpyNA* NpyNA_FromDTypeAndPayload(PyArray_Descr *dtype, int multina, int payload)
Constructs a new :ctype:`NpyNA *` instance with the specified *dtype*
and *payload*. For an NA with no dtype, provide NULL in *dtype*.
Until multi-NA is implemented, just pass 0 for both *multina*
and *payload*.
NA Mask Functions
-----------------
.. cfunction:: int PyArray_AllocateMaskNA(PyArrayObject *arr, npy_bool ownmaskna, npy_bool multina, npy_mask defaultmask)
Allocates an NA mask for the array *arr* if necessary. If *ownmaskna*
if false, it only allocates an NA mask if none exists, but if
*ownmaskna* is true, it also allocates one if the NA mask is a view
into another array's NA mask. Here are the two most common usage
patterns::
/* Use this to make sure 'arr' has an NA mask */
if (PyArray_AllocateMaskNA(arr, 0, 0, 1) < 0) {
return NULL;
}
/* Use this to make sure 'arr' owns an NA mask */
if (PyArray_AllocateMaskNA(arr, 1, 0, 1) < 0) {
return NULL;
}
The parameter *multina* is provided for future expansion, when
mult-NA support is added to NumPy. This will affect the dtype of
the NA mask, which currently must be always NPY_BOOL, but will be
NPY_MASK for arrays multi-NA when this is implemented.
When a new NA mask is allocated, and the mask needs to be filled,
it uses the value *defaultmask*. In nearly all cases, this should be set
to 1, indicating that the elements are exposed. If a mask is allocated
just because of *ownmaskna*, the existing mask values are copied
into the newly allocated mask.
This function returns 0 for success, -1 for failure.
.. cfunction:: npy_bool PyArray_HasNASupport(PyArrayObject *arr)
Returns true if *arr* is an array which supports NA. This function
exists because the design for adding NA proposed two mechanisms
for NAs in NumPy, NA masks and NA bitpatterns. Currently, just
NA masks have been implemented, but when NA bitpatterns are implemented
this would return true for arrays with an NA bitpattern dtype as well.
.. cfunction:: int PyArray_ContainsNA(PyArrayObject *arr, PyArrayObject *wheremask, npy_bool *whichna)
Checks whether the array *arr* contains any NA values.
If *wheremask* is non-NULL, it must be an NPY_BOOL mask which can
broadcast onto *arr*. Whereever the where mask is True, *arr*
is checked for NA, and whereever it is False, the *arr* value is
ignored.
The parameter *whichna* is provided for future expansion to multi-NA
support. When implemented, this parameter will be a 128 element
array of npy_bool, with the value True for the NA values that are
being looked for.
This function returns 1 when the array contains NA values, 0 when
it does not, and -1 when a error has occurred.
.. cfunction:: int PyArray_AssignNA(PyArrayObject *arr, NpyNA *na, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
Assigns the given *na* value to elements of *arr*.
If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
onto *arr*, and only elements of *arr* with a corresponding value
of True in *wheremask* will have *na* assigned.
The parameters *preservena* and *preservewhichna* are provided for
future expansion to multi-NA support. With a single NA value, one
NA cannot be distinguished from another, so preserving NA values
does not make sense. With multiple NA values, preserving NA values
becomes an important concept because that implies not overwriting the
multi-NA payloads. The parameter *preservewhichna* will be a 128 element
array of npy_bool, indicating which NA payloads to preserve.
This function returns 0 for success, -1 for failure.
.. cfunction:: int PyArray_AssignMaskNA(PyArrayObject *arr, npy_mask maskvalue, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)
Assigns the given NA mask *maskvalue* to elements of *arr*.
If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
onto *arr*, and only elements of *arr* with a corresponding value
of True in *wheremask* will have the NA *maskvalue* assigned.
The parameters *preservena* and *preservewhichna* are provided for
future expansion to multi-NA support. With a single NA value, one
NA cannot be distinguished from another, so preserving NA values
does not make sense. With multiple NA values, preserving NA values
becomes an important concept because that implies not overwriting the
multi-NA payloads. The parameter *preservewhichna* will be a 128 element
array of npy_bool, indicating which NA payloads to preserve.
This function returns 0 for success, -1 for failure.
|