========================= NumPy 2.0.0 Release Notes ========================= [Possibly 1.7.0 release notes, as ABI compatibility is still being maintained] Highlights ========== New features ============ Mask-based NA missing values ---------------------------- Preliminary support for NA missing values similar to those in R has been implemented. This was done by adding optional NA masks to the core array object. While a significant amount of the NumPy functionality has been extended to support NA masks, not everything is yet supported. Here is an (incomplete) list of things that do and do not work with NA values: What works with NA: * Basic indexing and slicing, as well as full boolean mask indexing. * All element-wise ufuncs. * All UFunc.reduce methods, with a new skipna parameter. * np.all and np.any satisfy the rules NA | True == True and NA & False == False * The nditer object. * Array methods: + ndarray.clip, ndarray.min, ndarray.max, ndarray.sum, ndarray.prod, ndarray.conjugate, ndarray.diagonal, ndarray.flatten + numpy.concatenate, numpy.column_stack, numpy.hstack, numpy.vstack, numpy.dstack, numpy.squeeze, numpy.mean, numpy.std, numpy.var What doesn't work with NA: * Fancy indexing, such as with lists and partial boolean masks. * ndarray.flat and any other methods that use the old iterator mechanism instead of the newer nditer. * Struct dtypes, which will have corresponding struct masks with one mask value per primitive field of the struct dtype. * UFunc.accumulate, UFunc.reduceat. * Ufunc calls with both NA masks and a where= mask at the same time. * File I/O has not been extended to support NA-masks. * np.logical_and and np.logical_or don't satisfy the rules NA | True == True and NA & False == False yet. * Array methods: + ndarray.argmax, ndarray.argmin, + numpy.repeat, numpy.delete (relies on fancy indexing), numpy.append, numpy.insert (relies on fancy indexing), numpy.where, Differences with R: * R's parameter rm.na=T is spelled skipna=True in NumPy. * np.isna(nan) is False, but R's is.na(nan) is TRUE. This is because NumPy's NA is treated independently of the underlying data type. * Boolean indexing, where the result is compressed to just the elements with true in the mask, raises if the boolean mask has an NA value in it. This is because that value could be either True or False, meaning the count of the output array is actually NA. R treats this case in a manner inconsistent with the NA model, returning NA values in the spots where the boolean index has NA. This may have a practical advantage in spite of violating the NA theoretical model, so NumPy could adopt the behavior if necessary Reduction UFuncs Generalize axis= Parameter ------------------------------------------- Any ufunc.reduce function call, as well as other reductions like sum, prod, any, all, max and min support the ability to choose a subset of the axes to reduce over. Previously, one could say axis=None to mean all the axes or axis=# to pick a single axis. Now, one can also say axis=(#,#) to pick a list of axes for reduction. Reduction UFuncs New keepdims= Parameter ---------------------------------------- There is a new keepdims= parameter, which if set to True, doesn't throw away the reduction axes but instead sets them to have size one. When this option is set, the reduction result will broadcast correctly to the original operand which was reduced. Custom formatter for printing arrays ------------------------------------ New function numpy.random.choice --------------------------------- A generic sampling function has been added which will generate samples from a given array-like. The samples can be with or without replacement, and with uniform or given non-uniform probabilities. Preliminary multi-dimensional support in the polynomial package --------------------------------------------------------------- Axis keywords have been added to the integration and differentiation functions and a tensor keyword was added to the evaluation functions. These additions allow multi-dimensional coefficient arrays to be used in those functions. New functions for evaluating 2-D and 3-D coefficient arrays on grids or sets of points were added together with 2-D and 3-D pseudo-Vandermonde matrices that can be used for fitting. Support for mask-based NA values in the polynomial package fits --------------------------------------------------------------- The fitting functions recognize and remove masked data from the fit. Changes ======= The default casting rule for UFunc out= parameters has been changed from 'unsafe' to 'same_kind'. Most usages which violate the 'same_kind' rule are likely bugs, so this change may expose previously undetected errors in projects that depend on NumPy. Full-array boolean indexing used to allow boolean arrays with a size non-broadcastable to the array size. Now it forces this to be broadcastable. Since this affects some legacy code, this change will require discussion during alpha or early beta testing, and a decision to either keep the stricter behavior, or add in a hack to allow the previous behavior to work. The functions np.diag, np.diagonal, and .diagonal now return a view into the original array instead of making a copy. This makes these functions more consistent with NumPy's general approach of taking views where possible, and performs much faster as well. This has the potential to break code that assumes a copy is made instead of a view. The .reduce functions evaluates some reductions in a different order than in previous versions of NumPy, generally providing higher performance. Because of the nature of floating-point arithmetic, this may subtly change some results, just as linking NumPy to a different BLAS implementations such as MKL can. The function np.concatenate tries to match the layout of its input arrays. Previously, the layout did not follow any particular reason, and depended in an undesirable way on the particular axis chosen for concatenation. A bug was also fixed which silently allowed out of bounds axis arguments. The ufuncs logical_or, logical_and, and logical_not now follow Python's behavior with object arrays, instead of trying to call methods on the objects. For example the expression (3 and 'test') produces the string 'test', and now np.logical_and(np.array(3, 'O'), np.array('test', 'O')) produces 'test' as well. Deprecations ============ Specifying a custom string formatter with a `_format` array attribute is deprecated. The new `formatter` keyword in ``numpy.set_printoptions`` or ``numpy.array2string`` can be used instead. In the C API, direct access to the fields of PyArrayObject* has been deprecated. Direct access has been recommended against for many releases, but now you can test your code against the deprecated C API by #defining NPY_NO_DEPRECATED_API before including any NumPy headers. Expect something similar for PyArray_Descr* and other core objects in the future as preparation for NumPy 2.0. The deprecated imports in the polynomial package have been removed.