author | Matti Picus <matti.picus@gmail.com> | 2022-05-19 20:00:27 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-05-19 20:00:27 +0300 |
commit | db481babcfa7ebc70833e77985858e9295a3135b (patch) | |
tree | 12d0ddc95df5688213117cfb9ded9161979a7d16 /doc | |
parent | 1cedba6501848a915ec9076108e319235ded7689 (diff) | |
parent | 64e3c5c519bcf3684c19037e81b62f2245420a76 (diff) | |
download | numpy-db481babcfa7ebc70833e77985858e9295a3135b.tar.gz |
Merge pull request #19226 from seberg/fix-void-cast-safety-promotion-and-comparison
API: Fix structured dtype cast-safety, promotion, and comparison
Diffstat (limited to 'doc')
-rw-r--r-- | doc/release/upcoming_changes/19226.compatibility.rst | 39 |
-rw-r--r-- | doc/source/user/basics.rec.rst | 74 |
2 files changed, 101 insertions, 12 deletions
diff --git a/doc/release/upcoming_changes/19226.compatibility.rst b/doc/release/upcoming_changes/19226.compatibility.rst
new file mode 100644
index 000000000..8422bf8eb
--- /dev/null
+++ b/doc/release/upcoming_changes/19226.compatibility.rst
@@ -0,0 +1,39 @@
+Changes to structured (void) dtype promotion and comparisons
+------------------------------------------------------------
+In general, NumPy now defines correct, but slightly limited, promotion for
+structured dtypes by promoting the subtypes of each field instead of raising
+an exception::
+
+    >>> np.result_type(np.dtype("i,i"), np.dtype("i,d"))
+    dtype([('f0', '<i4'), ('f1', '<f8')])
+
+For promotion matching field names, order, and titles are enforced, however
+padding is ignored.
+Promotion involving structured dtypes now always ensures native byte-order for
+all fields (which may change the result of ``np.concatenate``)
+and ensures that the result will be "packed", i.e. all fields are ordered
+contiguously and padding is removed.
+See :ref:`structured_dtype_comparison_and_promotion` for further details.
+
+The ``repr`` of aligned structures will now never print the long form
+including ``offsets`` and ``itemsize`` unless the struct includes padding
+not guaranteed by ``align=True``.
+
+
+Changes to structured dtype casting safety
+------------------------------------------
+In alignment with the above changes to the promotion logic, the
+casting safety has been updated:
+
+* ``"equiv"`` enforces matching names and titles. The itemsize
+  is allowed to differ due to padding.
+* ``"safe"`` allows mismatching field names and titles
+* The cast safety is limited by the cast safety of each included
+  field.
+* The order of fields is used to decide cast safety of each
+  individual field. Previously, the field names were used and
+  only unsafe casts were possible when names mismatched.
+
+The main important change here is that name mismatches are now
+considered "safe" casts.
+
diff --git a/doc/source/user/basics.rec.rst b/doc/source/user/basics.rec.rst
index eec2394e9..98589b472 100644
--- a/doc/source/user/basics.rec.rst
+++ b/doc/source/user/basics.rec.rst
@@ -550,29 +550,79 @@ In order to prevent clobbering object pointers in fields of
 :class:`object` type, numpy currently does not allow views of structured
 arrays containing objects.
 
-Structure Comparison
---------------------
+.. _structured_dtype_comparison_and_promotion:
+
+Structure Comparison and Promotion
+----------------------------------
 
 If the dtypes of two void structured arrays are equal, testing the equality of
 the arrays will result in a boolean array with the dimensions of the original
 arrays, with elements set to ``True`` where all fields of the corresponding
-structures are equal. Structured dtypes are equal if the field names,
-dtypes and titles are the same, ignoring endianness, and the fields are in
-the same order::
+structures are equal::
 
-    >>> a = np.zeros(2, dtype=[('a', 'i4'), ('b', 'i4')])
-    >>> b = np.ones(2, dtype=[('a', 'i4'), ('b', 'i4')])
+    >>> a = np.array([(1, 1), (2, 2)], dtype=[('a', 'i4'), ('b', 'i4')])
+    >>> b = np.array([(1, 1), (2, 3)], dtype=[('a', 'i4'), ('b', 'i4')])
     >>> a == b
-    array([False, False])
+    array([True, False])
+
+NumPy will promote individual field datatypes to perform the comparison.
+So the following is also valid (note the ``'f4'`` dtype for the ``'a'`` field):
 
-Currently, if the dtypes of two void structured arrays are not equivalent the
-comparison fails, returning the scalar value ``False``. This behavior is
-deprecated as of numpy 1.10 and will raise an error or perform elementwise
-comparison in the future.
+    >>> b = np.array([(1.0, 1), (2.5, 2)], dtype=[("a", "f4"), ("b", "i4")])
+    >>> a == b
+    array([True, False])
+
+To compare two structured arrays, it must be possible to promote them to a
+common dtype as returned by `numpy.result_type` and `np.promote_types`.
+This enforces that the number of fields, the field names, and the field titles
+must match precisely.
+When promotion is not possible, for example due to mismatching field names,
+NumPy will raise an error.
+Promotion between two structured dtypes results in a canonical dtype that
+ensures native byte-order for all fields::
+
+    >>> np.result_type(np.dtype("i,>i"))
+    dtype([('f0', '<i4'), ('f1', '<i4')])
+    >>> np.result_type(np.dtype("i,>i"), np.dtype("i,i"))
+    dtype([('f0', '<i4'), ('f1', '<i4')])
+
+The resulting dtype from promotion is also guaranteed to be packed, meaning
+that all fields are ordered contiguously and any unnecessary padding is
+removed::
+
+    >>> dt = np.dtype("i1,V3,i4,V1")[["f0", "f2"]]
+    >>> dt
+    dtype({'names':['f0','f2'], 'formats':['i1','<i4'], 'offsets':[0,4], 'itemsize':9})
+    >>> np.result_type(dt)
+    dtype([('f0', 'i1'), ('f2', '<i4')])
+
+Note that the result prints without ``offsets`` or ``itemsize`` indicating no
+additional padding.
+If a structured dtype is created with ``align=True`` ensuring that
+``dtype.isalignedstruct`` is true, this property is preserved::
+
+    >>> dt = np.dtype("i1,V3,i4,V1", align=True)[["f0", "f2"]]
+    >>> dt
+    dtype({'names':['f0','f2'], 'formats':['i1','<i4'], 'offsets':[0,4], 'itemsize':12}, align=True)
+    >>> np.result_type(dt)
+    dtype([('f0', 'i1'), ('f2', '<i4')], align=True)
+    >>> np.result_type(dt).isalignedstruct
+    True
+
+When promoting multiple dtypes, the result is aligned if any of the inputs is::
+
+    >>> np.result_type(np.dtype("i,i"), np.dtype("i,i", align=True))
+    dtype([('f0', '<i4'), ('f1', '<i4')], align=True)
 
 The ``<`` and ``>`` operators always return ``False`` when comparing void
 structured arrays, and arithmetic and bitwise operations are not supported.
 
+.. versionchanged:: 1.23
+   Before NumPy 1.23, a warning was given and ``False`` returned when
+   promotion to a common dtype failed.
+   Further, promotion was much more restrictive: It would reject the mixed
+   float/integer comparison example above.
+
 Record Arrays
 =============
 
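The promotion and cast-safety rules documented in this diff can be tried out with a short script. The following is a minimal sketch, not part of the commit, and it assumes NumPy 1.23 or later (the version named in the ``versionchanged`` note). The field names ``a``, ``b``, and ``x`` and all variable names are illustrative only.

```python
import numpy as np

# Promotion works field by field: the second field promotes i4 with f8.
promoted = np.result_type(np.dtype("i,i"), np.dtype("i,d"))

# Mismatching field names cannot be promoted and raise an error.
names_a = np.dtype([("a", "i4"), ("b", "i4")])
names_x = np.dtype([("x", "i4"), ("b", "i4")])
try:
    np.result_type(names_a, names_x)
    promotion_failed = False
except TypeError:
    promotion_failed = True

# Cast safety: a name mismatch is a "safe" cast but not an "equiv" cast.
safe_with_other_names = np.can_cast(names_a, names_x, casting="safe")
equiv_with_other_names = np.can_cast(names_a, names_x, casting="equiv")

# Cast safety is limited by each field: f8 -> i4 in the second field is unsafe.
safe_float_to_int = np.can_cast(np.dtype("i,d"), np.dtype("i,i"), casting="safe")

# Comparison promotes the fields first, so mixed i4/f4 fields can be compared.
a = np.array([(1, 1), (2, 2)], dtype=[("a", "i4"), ("b", "i4")])
b = np.array([(1.0, 1), (2.5, 2)], dtype=[("a", "f4"), ("b", "i4")])
equal = a == b

print(promoted)
print(promotion_failed, safe_with_other_names, equiv_with_other_names)
print(safe_float_to_int, equal)
```

On releases before 1.23 the same calls behave differently, as the release note describes, so the script is only meaningful on a NumPy build that includes this merge.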