Array NA Mask API
==================

.. sectionauthor:: Mark Wiebe

.. index::
   pair: maskna; C-API
   pair: C-API; maskna

.. versionadded:: 1.7

NA Masks in Arrays
------------------

NumPy supports NA (Not Available) missing values in its arrays.
The design document leading up to the implementation proposed two
mechanisms for this, NA masks and NA bitpatterns. NA masks have been
implemented as the first representation of these values. This mechanism
supports working with NA values in a way similar to the approach taken in
the R project. Because the mask is independent of the raw array data,
combining it with views also allows one to temporarily mark elements as NA.

The C API has been updated with mechanisms to allow NumPy extensions
to work with these masks, and this document provides some examples and
reference for the NA mask-related functions.

The NA Object
-------------

The main *numpy* namespace in Python has a new object called *NA*.
This is an instance of :ctype:`NpyNA *`, which is a Python object
representing an NA value. This object is analogous to the NumPy
scalars, and is returned by :cfunc:`PyArray_Return` instead of
a scalar where appropriate.

The global *numpy.NA* object is accessible from C as :cdata:`Npy_NA`.
This is an NA value with no data type or multi-NA payload. Use it
just as you would Py_None, except use :cfunc:`NpyNA_Check` to
see if an object is an :ctype:`NpyNA *`, because :cdata:`Npy_NA` isn't
the only instance of NA possible.

If you want to see whether a general PyObject* is NA, you should
use the API function :cfunc:`NpyNA_FromObject` with *suppress_error*
set to true. If this returns NULL, the object is not an NA, and if
it returns an NpyNA instance, the object is NA and you can then
access its *dtype* and *payload* fields as needed.

To make new :ctype:`NpyNA *` objects, use
:cfunc:`NpyNA_FromDTypeAndPayload`. The functions
:cfunc:`NpyNA_GetDType`, :cfunc:`NpyNA_IsMultiNA`, and
:cfunc:`NpyNA_GetPayload` provide access to the data members.

Example NA-Masked Operation in C
--------------------------------
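
The following is a minimal sketch of the idea behind an NA-masked
operation, written in plain C so that it compiles without NumPy. It does
not use the NumPy C API; the names ``toy_mask`` and ``toy_masked_sum`` are
hypothetical and exist only for illustration. The sketch assumes the mask
convention used throughout this document, where a mask value of 1 means
the element is exposed and 0 means it is NA, and it chooses to skip NA
elements rather than propagate them.

```c
#include <stddef.h>

/*
 * Toy model, not the NumPy API: 1 means the element is exposed,
 * 0 means the element is NA (hidden by the mask).
 */
typedef unsigned char toy_mask;

/*
 * Sums the exposed elements of 'data'. Stores the sum in '*out_sum'
 * and returns how many elements were exposed.
 */
static size_t
toy_masked_sum(const double *data, const toy_mask *mask, size_t n,
               double *out_sum)
{
    size_t i, count = 0;
    double sum = 0.0;

    for (i = 0; i < n; ++i) {
        if (mask[i]) {
            /* Exposed element: its value takes part in the sum */
            sum += data[i];
            ++count;
        }
    }
    *out_sum = sum;
    return count;
}
```

Because the mask is stored separately from the data, setting a mask entry
back to 1 exposes the original value again; the data buffer is never
modified by hiding an element.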

NA Object Data Type
-------------------

.. ctype:: NpyNA *

    This is the C object corresponding to objects of type
    numpy.NAType. The fields themselves are hidden from consumers of the
    API; you must use the functions provided to create new NA objects
    and get their properties.

    This object contains two fields, a :ctype:`PyArray_Descr *` dtype
    which is either NULL or indicates the data type the NA represents,
    and a payload which is there for the future addition of multi-NA support.

.. cvar:: Npy_NA

    This is a global singleton, similar to Py_None, which is the
    *numpy.NA* object. Note that unlike Py_None, multiple NAs may be
    created, for instance with different multi-NA payloads or with
    different dtypes. If you want to return an NA with no payload
    or dtype, return a new reference to Npy_NA.

NA Object Functions
-------------------

.. cfunction:: NpyNA_Check(obj)

    Evaluates to true if *obj* is an instance of :ctype:`NpyNA *`.

.. cfunction:: PyArray_Descr* NpyNA_GetDType(NpyNA* na)

    Returns the *dtype* field of the NA object, which is NULL when
    the NA has no dtype.  Does not raise an error.

.. cfunction:: npy_bool NpyNA_IsMultiNA(NpyNA* na)

    Returns true if the NA has a multi-NA payload, false otherwise.

.. cfunction:: int NpyNA_GetPayload(NpyNA* na)

    Gets the multi-NA payload of the NA, or 0 if *na* doesn't have
    a multi-NA payload.

.. cfunction:: NpyNA* NpyNA_FromObject(PyObject* obj, int suppress_error)

    If *obj* represents an object which is NA, for example if it
    is an :ctype:`NpyNA *`, or a zero-dimensional NA-masked array with
    its value hidden by the mask, returns a new reference to an
    :ctype:`NpyNA *` object representing *obj*. Otherwise returns
    NULL.

    If *suppress_error* is true, this function returns NULL without
    raising an exception when the input isn't NA; otherwise it raises
    an exception in that case.

.. cfunction:: NpyNA* NpyNA_FromDTypeAndPayload(PyArray_Descr *dtype, int multina, int payload)


    Constructs a new :ctype:`NpyNA *` instance with the specified *dtype*
    and *payload*. For an NA with no dtype, provide NULL in *dtype*.

    Until multi-NA is implemented, just pass 0 for both *multina*
    and *payload*.

NA Mask Functions
-----------------

.. cfunction:: int PyArray_AllocateMaskNA(PyArrayObject *arr, npy_bool ownmaskna, npy_bool multina, npy_mask defaultmask)

    Allocates an NA mask for the array *arr* if necessary. If *ownmaskna*
    is false, it only allocates an NA mask if none exists, but if
    *ownmaskna* is true, it also allocates one if the NA mask is a view
    into another array's NA mask. Here are the two most common usage
    patterns::

        /* Use this to make sure 'arr' has an NA mask */
        if (PyArray_AllocateMaskNA(arr, 0, 0, 1) < 0) {
            return NULL;
        }

        /* Use this to make sure 'arr' owns an NA mask */
        if (PyArray_AllocateMaskNA(arr, 1, 0, 1) < 0) {
            return NULL;
        }

    The parameter *multina* is provided for future expansion, when
    multi-NA support is added to NumPy. This will affect the dtype of
    the NA mask, which currently must always be NPY_BOOL, but will be
    NPY_MASK for multi-NA arrays when this is implemented.

    When a new NA mask is allocated, and the mask needs to be filled,
    it uses the value *defaultmask*. In nearly all cases, this should be set
    to 1, indicating that the elements are exposed. If a mask is allocated
    just because of *ownmaskna*, the existing mask values are copied
    into the newly allocated mask.

    This function returns 0 for success, -1 for failure.
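
The allocation rules described above can be modelled in a few lines of
plain C. This is only a toy sketch of the documented behaviour, not
NumPy's implementation: the ``toy_array`` struct, its fields, and
``toy_allocate_mask`` are hypothetical names, and the *multina* parameter
is left out.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned char toy_mask;

/* Hypothetical stand-in for the NA mask state of an array */
typedef struct {
    toy_mask *mask;   /* NULL when the array has no NA mask */
    int owns_mask;    /* nonzero when 'mask' belongs to this array */
    size_t size;      /* number of elements */
} toy_array;

/*
 * Allocates a mask when there is none, or when 'ownmask' is requested
 * and the current mask is borrowed from elsewhere.
 * Returns 0 for success, -1 for failure.
 */
static int
toy_allocate_mask(toy_array *arr, int ownmask, toy_mask defaultmask)
{
    toy_mask *fresh;

    if (arr->mask != NULL && (arr->owns_mask || !ownmask)) {
        return 0;   /* an acceptable mask already exists */
    }
    fresh = malloc(arr->size ? arr->size : 1);
    if (fresh == NULL) {
        return -1;
    }
    if (arr->mask != NULL) {
        /* Borrowed mask: copy its values into the private mask */
        memcpy(fresh, arr->mask, arr->size);
    }
    else {
        /* No previous mask: fill with the default value */
        memset(fresh, defaultmask, arr->size);
    }
    arr->mask = fresh;
    arr->owns_mask = 1;
    return 0;
}
```

In this model, calling ``toy_allocate_mask(&a, 0, 1)`` corresponds to the
first usage pattern above and ``toy_allocate_mask(&a, 1, 1)`` to the second.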

.. cfunction:: npy_bool PyArray_HasNASupport(PyArrayObject *arr)

    Returns true if *arr* is an array which supports NA. This function
    exists because the design for adding NA proposed two mechanisms
    for NAs in NumPy, NA masks and NA bitpatterns. Currently, just
    NA masks have been implemented, but when NA bitpatterns are implemented
    this would return true for arrays with an NA bitpattern dtype as well.

.. cfunction:: int PyArray_ContainsNA(PyArrayObject *arr, PyArrayObject *wheremask, npy_bool *whichna)

    Checks whether the array *arr* contains any NA values.

    If *wheremask* is non-NULL, it must be an NPY_BOOL mask which can
    broadcast onto *arr*. Wherever the where mask is True, *arr*
    is checked for NA, and wherever it is False, the *arr* value is
    ignored.

    The parameter *whichna* is provided for future expansion to multi-NA
    support. When implemented, this parameter will be a 128 element
    array of npy_bool, with the value True for the NA values that are
    being looked for.

    This function returns 1 when the array contains NA values, 0 when
    it does not, and -1 when an error has occurred.
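
As a rough illustration of how the where mask restricts the check, here is
a plain C sketch over flat arrays. It is not the NumPy implementation:
``toy_contains_na`` is a hypothetical name, no broadcasting is performed
(both arrays are assumed to have the same length), and the error return
of -1 is not modelled.

```c
#include <stddef.h>

typedef unsigned char toy_mask;

/*
 * Returns 1 if any selected element is NA (mask value 0), otherwise 0.
 * 'where' may be NULL, in which case every element is checked.
 */
static int
toy_contains_na(const toy_mask *mask, const unsigned char *where, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i) {
        /* Elements with a false 'where' value are ignored */
        if ((where == NULL || where[i]) && !mask[i]) {
            return 1;
        }
    }
    return 0;
}
```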

.. cfunction:: int PyArray_AssignNA(PyArrayObject *arr, NpyNA *na, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)

    Assigns the given *na* value to elements of *arr*.

    If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
    onto *arr*, and only elements of *arr* with a corresponding value
    of True in *wheremask* will have *na* assigned.

    The parameters *preservena* and *preservewhichna* are provided for
    future expansion to multi-NA support. With a single NA value, one
    NA cannot be distinguished from another, so preserving NA values
    does not make sense. With multiple NA values, preserving NA values
    becomes an important concept because that implies not overwriting the
    multi-NA payloads. The parameter *preservewhichna* will be a 128 element
    array of npy_bool, indicating which NA payloads to preserve.

    This function returns 0 for success, -1 for failure.

.. cfunction:: int PyArray_AssignMaskNA(PyArrayObject *arr, npy_mask maskvalue, PyArrayObject *wheremask, npy_bool preservena, npy_bool *preservewhichna)

    Assigns the given NA mask *maskvalue* to elements of *arr*.

    If *wheremask* is non-NULL, it must be an NPY_BOOL array broadcastable
    onto *arr*, and only elements of *arr* with a corresponding value
    of True in *wheremask* will have the NA *maskvalue* assigned.

    The parameters *preservena* and *preservewhichna* are provided for
    future expansion to multi-NA support. With a single NA value, one
    NA cannot be distinguished from another, so preserving NA values
    does not make sense. With multiple NA values, preserving NA values
    becomes an important concept because that implies not overwriting the
    multi-NA payloads. The parameter *preservewhichna* will be a 128 element
    array of npy_bool, indicating which NA payloads to preserve.

    This function returns 0 for success, -1 for failure.
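
The effect of *wheremask* on a mask assignment can be sketched in plain C
as follows. This is a toy model under stated assumptions, not the NumPy
implementation: ``toy_assign_mask`` is a hypothetical name, the arrays are
flat and of equal length (no broadcasting), and the *preservena* and
*preservewhichna* parameters are left out.

```c
#include <stddef.h>

typedef unsigned char toy_mask;

/*
 * Writes 'maskvalue' into every selected element of 'mask'.
 * 'where' may be NULL, in which case every element is assigned.
 */
static void
toy_assign_mask(toy_mask *mask, toy_mask maskvalue,
                const unsigned char *where, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i) {
        if (where == NULL || where[i]) {
            mask[i] = maskvalue;
        }
    }
}
```

With the convention used in this document, assigning a *maskvalue* of 0
hides the selected elements as NA, and assigning 1 exposes them again.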