-rw-r--r-- | doc/release/upcoming_changes/19226.compatibility.rst | 85 |
1 files changed, 85 insertions, 0 deletions
diff --git a/doc/release/upcoming_changes/19226.compatibility.rst b/doc/release/upcoming_changes/19226.compatibility.rst
new file mode 100644
index 000000000..0f5932567
--- /dev/null
+++ b/doc/release/upcoming_changes/19226.compatibility.rst
@@ -0,0 +1,85 @@
+Changes to structured (void) dtype promotion and comparisons
+------------------------------------------------------------
+NumPy usually uses field position of structured dtypes when assigning
+from one structured dtype to another. This means that::
+
+    arr[["field1", "field2"]] = arr[["field2", "field1"]]
+
+swaps the data of the two fields. However, until now this behaviour
+was not matched for ``np.concatenate``. NumPy was also overly
+restrictive when comparing two structured dtypes. For example::
+
+    np.ones(3, dtype="i,i") == np.ones(3, dtype="i,d")
+
+now succeeds instead of giving a ``FutureWarning`` and returning ``False``.
+
+In general, NumPy now defines correct, but slightly limited, promotion for
+structured dtypes::
+
+    >>> np.result_type(np.dtype("i,i"), np.dtype("i,d"))
+    dtype([('f0', '<i4'), ('f1', '<f8')])
+
+For promotion, matching field names, order, and titles are enforced; however,
+padding is ignored.
+Note that this also now always ensures native byte-order for all fields,
+which can change the result (this can affect ``np.concatenate``)::
+
+    >>> np.result_type(np.dtype("i,>i"))
+    dtype([('f0', '<i4'), ('f1', '<i4')])
+    >>> np.result_type(np.dtype("i,>i"), np.dtype("i,i"))
+    dtype([('f0', '<i4'), ('f1', '<i4')])
+
+which previously returned the first dtype unmodified.
+
+Further, the new result of ``np.result_type`` and promotion in general
+is considered "canonical". In addition to ensuring native byte-order
+for all fields, the result will also be "packed". This means that
+all fields are ordered contiguously and any unnecessary padding
+is now removed::
+
+    >>> dt = np.dtype("i1,V3,i4,V1")[["f0", "f2"]]
+    >>> dt
+    dtype({'names':['f0','f2'], 'formats':['i1','<i4'], 'offsets':[0,4], 'itemsize':9})
+    >>> np.result_type(dt)
+    dtype([('f0', 'i1'), ('f2', '<i4')])
+
+Note that the result prints without ``offsets`` or ``itemsize``, indicating no
+additional padding.
+If a structured dtype is created with ``align=True``, ensuring that
+``dtype.isalignedstruct`` is true, this property is preserved::
+
+    >>> dt = np.dtype("i1,V3,i4,V1", align=True)[["f0", "f2"]]
+    >>> dt
+    dtype({'names':['f0','f2'], 'formats':['i1','<i4'], 'offsets':[0,4], 'itemsize':12}, align=True)
+    >>> np.result_type(dt)
+    dtype([('f0', 'i1'), ('f2', '<i4')], align=True)
+    >>> np.result_type(dt).isalignedstruct
+    True
+
+When promoting multiple dtypes, the result is aligned if any of the inputs is::
+
+    >>> np.result_type(np.dtype("i,i"), np.dtype("i,i", align=True))
+    dtype([('f0', '<i4'), ('f1', '<i4')], align=True)
+
+The ``repr`` of aligned structures will now never print the long form
+including ``offsets`` and ``itemsize`` unless the struct includes padding
+not guaranteed by ``align=True``.
+
+
+Changes to structured dtype casting safety
+------------------------------------------
+In alignment with the above changes to the promotion logic, the
+casting safety has been updated:
+
+* ``"equiv"`` enforces matching names and titles. The itemsize
+  is allowed to differ due to padding.
+* ``"safe"`` allows mismatching field names and titles.
+* The cast safety is limited by the cast safety of each included
+  field.
+* The order of fields is used to decide cast safety of each
+  individual field. Previously, the field names were used and
+  only unsafe casts were possible when names mismatched.
+
+The most important change here is that name mismatches are now
+considered "safe" casts.
+
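The behaviour described in this release note can be checked with a short script. The following is a minimal sketch, not part of the patch. It assumes a NumPy version that already includes this change and a platform where ``"i"`` is a 32-bit integer. All variable names (``promoted``, ``padded``, ``src``, ``renamed``, ``narrowing``, and so on) are illustrative only.

```python
import numpy as np

# Promotion is done field by field, by position:
# (int32, int32) promoted with (int32, float64).
promoted = np.result_type(np.dtype("i,i"), np.dtype("i,d"))
print(promoted)

# Arrays with different but promotable structured dtypes can be compared.
equal = np.ones(3, dtype="i,i") == np.ones(3, dtype="i,d")
print(equal)

# Selecting fields keeps the original offsets, so this dtype has padding.
padded = np.dtype("i1,V3,i4,V1")[["f0", "f2"]]
# Promotion returns the canonical, packed form without that padding.
packed = np.result_type(padded)
print(padded.itemsize, packed.itemsize)

# Casting safety: a pure field-name mismatch is "safe" but not "equiv".
src = np.dtype([("x", "i4"), ("y", "f8")])
renamed = np.dtype([("u", "i4"), ("v", "f8")])
safe_ok = np.can_cast(src, renamed, casting="safe")
equiv_ok = np.can_cast(src, renamed, casting="equiv")
print(safe_ok, equiv_ok)

# A difference in padding alone still counts as "equiv".
padding_equiv = np.can_cast(padded, packed, casting="equiv")
print(padding_equiv)

# Each field limits the overall safety: float64 -> int32 is not a safe cast,
# so the struct-to-struct cast is not safe either.
narrowing = np.dtype([("x", "i4"), ("y", "i4")])
narrowing_safe = np.can_cast(src, narrowing, casting="safe")
print(narrowing_safe)
```

Each check uses only ``np.result_type`` and ``np.can_cast``, so the script exercises the promotion and casting rules directly without depending on ``np.concatenate`` or assignment.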