=========================
NumPy 2.0.0 Release Notes
=========================

[Possibly 1.7.0 release notes, as ABI compatibility is still being maintained]

Highlights
==========


New features
============


Mask-based NA missing values
----------------------------

Support for NA missing values similar to those in R has been implemented.
This was done by adding optional NA masks to the core array object.

While a significant amount of the NumPy functionality has been extended to
support NA masks, not everything is yet supported. Here is a list of things
that do and do not work with NA values:

What works with NA:
    * Basic indexing and slicing, as well as full boolean mask indexing.
    * All element-wise ufuncs.
    * UFunc.reduce methods, with a new skipna parameter.
    * The nditer object.
    * Array methods and functions:
       + ndarray.clip, ndarray.min, ndarray.max, ndarray.sum, ndarray.prod,
         ndarray.conjugate, ndarray.diagonal, ndarray.flatten
       + numpy.concatenate, numpy.column_stack, numpy.hstack,
         numpy.vstack, numpy.dstack, numpy.squeeze

What doesn't work with NA:
    * Fancy indexing, such as with lists and partial boolean masks.
    * ndarray.flat and any other methods that use the old iterator
      mechanism instead of the newer nditer.
    * Struct dtypes, which will have corresponding struct masks with
      one mask value per primitive field of the struct dtype.
    * UFunc.reduce of multi-dimensional arrays, with skipna=True and a ufunc
      that doesn't have an identity.
    * UFunc.accumulate, UFunc.reduceat.
    * np.logical_and, np.logical_or, np.all, and np.any don't satisfy the
      rules NA | True == True and NA & False == False yet.
    * Array methods and functions:
       + ndarray.argmax, ndarray.argmin
       + numpy.repeat, numpy.delete (relies on fancy indexing),
         numpy.append, numpy.insert (relies on fancy indexing),
         numpy.where

Differences with R:
    * R's parameter na.rm=T is spelled skipna=True in NumPy.
    * np.isna(nan) is False, but R's is.na(NaN) is TRUE. This is because
      NumPy's NA is treated independently of the underlying data type.
    * Boolean indexing, where the result is compressed to just
      the elements with true in the mask, raises an exception if the
      boolean mask has an NA value in it. This is because that value could
      be either True or False, meaning the count of the output array is
      actually NA. R treats this case in a manner inconsistent with the NA
      model, returning NA values in the spots where the boolean index has NA.
      This may have a practical advantage in spite of violating the
      NA theoretical model, so NumPy could adopt the behavior if necessary.



Custom formatter for printing arrays
------------------------------------



Changes
=======

The default casting rule for UFunc out= parameters has been changed from
'unsafe' to 'same_kind'.  Most usages which violate the 'same_kind'
rule are likely bugs, so this change may expose previously undetected
errors in projects that depend on NumPy.
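
As a rough illustration of the stricter rule, the sketch below writes a
floating-point result into an integer out= array. It assumes a NumPy version
in which the 'same_kind' default is enforced; the array names are made up
for the example.

```python
import numpy as np

ints = np.arange(3)                  # integer output buffer: [0, 1, 2]
floats = np.array([0.5, 1.5, 2.5])   # floating-point operand

# float64 -> integer is not a 'same_kind' cast, so the default rule rejects it.
try:
    np.add(ints, floats, out=ints)
    rejected = False
except TypeError:
    rejected = True

# The old behaviour is still available, but has to be requested explicitly.
np.add(ints, floats, out=ints, casting='unsafe')
```

Passing casting='unsafe' explicitly documents that the truncation of the
fractional part is intended rather than accidental.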

The functions np.diag, np.diagonal, and <ndarray>.diagonal now return a
view into the original array instead of making a copy. This makes these
functions more consistent with NumPy's general approach of taking views
where possible, and performs much faster as well.
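
A minimal sketch of what the view semantics mean in practice, assuming a
NumPy version in which the change is in effect. np.shares_memory is used only
as a convenient check and is not part of this change; the variable names are
illustrative.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
d = a.diagonal()

# True when d refers to the same memory as a, i.e. when it is a view.
shares = np.shares_memory(a, d)

# A change made through the original array is visible through the view.
a[1, 1] = 99
seen_through_view = int(d[1])

# Code that relied on receiving an independent copy should ask for one.
independent = a.diagonal().copy()
a[0, 0] = -1   # does not affect `independent`
```

Code that modified the returned diagonal in place, expecting the original
array to stay untouched, is the kind of usage most likely to be affected.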

The function np.concatenate now tries to match the layout of its input
arrays. Previously, the layout did not follow any particular rule,
and depended in an undesirable way on the particular axis chosen for
concatenation. A bug was also fixed which silently allowed out of bounds
axis arguments.
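
Both points can be checked with a short sketch like the following. It assumes
two Fortran-ordered inputs (an arbitrary choice for the example) and a NumPy
version that includes these changes.

```python
import numpy as np

a = np.asfortranarray(np.zeros((2, 3)))
b = np.asfortranarray(np.ones((2, 3)))

# With Fortran-ordered inputs, the result is expected to follow their layout.
joined = np.concatenate([a, b], axis=0)
keeps_fortran_layout = joined.flags.f_contiguous

# An axis that does not exist for 2-d inputs is rejected instead of
# being silently accepted.
try:
    np.concatenate([a, b], axis=2)
    axis_rejected = False
except ValueError:
    axis_rejected = True
```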

The ufuncs logical_or, logical_and, and logical_not now follow Python's
behavior with object arrays, instead of trying to call methods on the
objects. For example, the expression (3 and 'test') produces the string
'test', and now np.logical_and(np.array(3, 'O'), np.array('test', 'O'))
produces 'test' as well.
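
The comparison with plain Python can be written out as a small sketch. The
variable names are illustrative, and the second pair covers the falsy case,
which is not spelled out above but follows from the same Python rule.

```python
import numpy as np

# Python's `and` returns the second operand when the first is truthy...
py_result = 3 and 'test'
np_result = np.logical_and(np.array(3, 'O'), np.array('test', 'O'))

# ...and returns the first operand when it is falsy.
py_falsy = 0 and 'test'
np_falsy = np.logical_and(np.array(0, 'O'), np.array('test', 'O'))
```

Note that for object arrays the result holds the operands themselves rather
than booleans, exactly as the Python expressions do.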

Deprecations
============

Specifying a custom string formatter with a ``_format`` array attribute is
deprecated. The new ``formatter`` keyword in ``numpy.set_printoptions`` or
``numpy.array2string`` can be used instead.
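
As a hedged sketch of the replacement, the ``formatter`` keyword takes a
dictionary mapping a type category to a callable that returns a string. The
``'float_kind'`` key and the two-decimal format below are just one possible
choice.

```python
import numpy as np

x = np.arange(3.)

# Format every floating-point element with two decimal places.
text = np.array2string(x, formatter={'float_kind': lambda v: "%.2f" % v})
# text == '[0.00 1.00 2.00]'
```

The same dictionary can be passed to ``numpy.set_printoptions`` to change how
arrays are printed globally instead of for a single call.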

In the C API, direct access to the fields of PyArrayObject* has been
deprecated. Direct access has been discouraged for many releases, and
you can now test your code against the deprecated C API by #defining
NPY_NO_DEPRECATED_API before including any NumPy headers. Expect
something similar for PyArray_Descr* and other core objects in the
future as preparation for NumPy 2.0.