author: Matti Picus <matti.picus@gmail.com> 2022-05-19 20:00:27 +0300
committer: GitHub <noreply@github.com> 2022-05-19 20:00:27 +0300
commit: db481babcfa7ebc70833e77985858e9295a3135b
tree: 12d0ddc95df5688213117cfb9ded9161979a7d16 /doc
parent: 1cedba6501848a915ec9076108e319235ded7689
parent: 64e3c5c519bcf3684c19037e81b62f2245420a76
Merge pull request #19226 from seberg/fix-void-cast-safety-promotion-and-comparison
API: Fix structured dtype cast-safety, promotion, and comparison
Diffstat (limited to 'doc')
-rw-r--r-- doc/release/upcoming_changes/19226.compatibility.rst 39
-rw-r--r-- doc/source/user/basics.rec.rst 74
2 files changed, 101 insertions, 12 deletions
diff --git a/doc/release/upcoming_changes/19226.compatibility.rst b/doc/release/upcoming_changes/19226.compatibility.rst
new file mode 100644
index 000000000..8422bf8eb
--- /dev/null
+++ b/doc/release/upcoming_changes/19226.compatibility.rst
@@ -0,0 +1,39 @@
+Changes to structured (void) dtype promotion and comparisons
+------------------------------------------------------------
+In general, NumPy now defines correct, but slightly limited, promotion for
+structured dtypes by promoting the subtypes of each field instead of raising
+an exception::
+
+ >>> np.result_type(np.dtype("i,i"), np.dtype("i,d"))
+ dtype([('f0', '<i4'), ('f1', '<f8')])
+
+For promotion matching field names, order, and titles are enforced, however
+padding is ignored.
+Promotion involving structured dtypes now always ensures native byte-order for
+all fields (which may change the result of ``np.concatenate``)
+and ensures that the result will be "packed", i.e. all fields are ordered
+contiguously and padding is removed.
+See :ref:`structured_dtype_comparison_and_promotion` for further details.
+
+The ``repr`` of aligned structures will now never print the long form
+including ``offsets`` and ``itemsize`` unless the struct includes padding
+not guaranteed by ``align=True``.
+
+
+Changes to structured dtype casting safety
+------------------------------------------
+In alignment with the above changes to the promotion logic, the
+casting safety has been updated:
+
+* ``"equiv"`` enforces matching names and titles. The itemsize
+ is allowed to differ due to padding.
+* ``"safe"`` allows mismatching field names and titles
+* The cast safety is limited by the cast safety of each included
+ field.
+* The order of fields is used to decide cast safety of each
+ individual field. Previously, the field names were used and
+ only unsafe casts were possible when names mismatched.
+
+The main important change here is that name mismatches are now
+considered "safe" casts.
+
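
Note (not part of the patch): the cast-safety rules listed in the release note above can be checked with ``np.can_cast``. The following is a minimal sketch, assuming NumPy 1.23 or later, which is the release this change targets. The dtype names (``named_ab``, ``named_xy``, ``wide``, ``padded``, ``packed``) are illustrative only and do not appear in the patch.

```python
import numpy as np

# Two structured dtypes with identical field types but different field names.
named_ab = np.dtype([("a", "i4"), ("b", "i4")])
named_xy = np.dtype([("x", "i4"), ("y", "i4")])

# Same names as named_ab, but wider integer fields.
wide = np.dtype([("a", "i8"), ("b", "i8")])

# Same fields, once with alignment padding and once packed.
padded = np.dtype([("a", "i1"), ("b", "i4")], align=True)
packed = np.dtype([("a", "i1"), ("b", "i4")])

# A name mismatch is accepted under "safe" but rejected under "equiv".
name_mismatch_safe = np.can_cast(named_ab, named_xy, casting="safe")
name_mismatch_equiv = np.can_cast(named_ab, named_xy, casting="equiv")

# A difference that is only padding still counts as "equiv".
padding_equiv = np.can_cast(padded, packed, casting="equiv")

# The overall safety is limited by each field: i8 -> i4 is not "safe",
# although it is allowed under "same_kind".
narrowing_safe = np.can_cast(wide, named_ab, casting="safe")
narrowing_same_kind = np.can_cast(wide, named_ab, casting="same_kind")

print(name_mismatch_safe, name_mismatch_equiv, padding_equiv)
print(narrowing_safe, narrowing_same_kind)
```

Because fields are matched by position rather than by name, code that relied on a name mismatch being rejected under ``"safe"`` may need an explicit name check.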
diff --git a/doc/source/user/basics.rec.rst b/doc/source/user/basics.rec.rst
index eec2394e9..98589b472 100644
--- a/doc/source/user/basics.rec.rst
+++ b/doc/source/user/basics.rec.rst
@@ -550,29 +550,79 @@ In order to prevent clobbering object pointers in fields of
:class:`object` type, numpy currently does not allow views of structured
arrays containing objects.
-Structure Comparison
---------------------
+.. _structured_dtype_comparison_and_promotion:
+
+Structure Comparison and Promotion
+----------------------------------
If the dtypes of two void structured arrays are equal, testing the equality of
the arrays will result in a boolean array with the dimensions of the original
arrays, with elements set to ``True`` where all fields of the corresponding
-structures are equal. Structured dtypes are equal if the field names,
-dtypes and titles are the same, ignoring endianness, and the fields are in
-the same order::
+structures are equal::
- >>> a = np.zeros(2, dtype=[('a', 'i4'), ('b', 'i4')])
- >>> b = np.ones(2, dtype=[('a', 'i4'), ('b', 'i4')])
+ >>> a = np.array([(1, 1), (2, 2)], dtype=[('a', 'i4'), ('b', 'i4')])
+ >>> b = np.array([(1, 1), (2, 3)], dtype=[('a', 'i4'), ('b', 'i4')])
>>> a == b
- array([False, False])
+ array([True, False])
+
+NumPy will promote individual field datatypes to perform the comparison.
+So the following is also valid (note the ``'f4'`` dtype for the ``'a'`` field):
-Currently, if the dtypes of two void structured arrays are not equivalent the
-comparison fails, returning the scalar value ``False``. This behavior is
-deprecated as of numpy 1.10 and will raise an error or perform elementwise
-comparison in the future.
+ >>> b = np.array([(1.0, 1), (2.5, 2)], dtype=[("a", "f4"), ("b", "i4")])
+ >>> a == b
+ array([True, False])
+
+To compare two structured arrays, it must be possible to promote them to a
+common dtype as returned by `numpy.result_type` and `np.promote_types`.
+This enforces that the number of fields, the field names, and the field titles
+must match precisely.
+When promotion is not possible, for example due to mismatching field names,
+NumPy will raise an error.
+Promotion between two structured dtypes results in a canonical dtype that
+ensures native byte-order for all fields::
+
+ >>> np.result_type(np.dtype("i,>i"))
+ dtype([('f0', '<i4'), ('f1', '<i4')])
+ >>> np.result_type(np.dtype("i,>i"), np.dtype("i,i"))
+ dtype([('f0', '<i4'), ('f1', '<i4')])
+
+The resulting dtype from promotion is also guaranteed to be packed, meaning
+that all fields are ordered contiguously and any unnecessary padding is
+removed::
+
+ >>> dt = np.dtype("i1,V3,i4,V1")[["f0", "f2"]]
+ >>> dt
+ dtype({'names':['f0','f2'], 'formats':['i1','<i4'], 'offsets':[0,4], 'itemsize':9})
+ >>> np.result_type(dt)
+ dtype([('f0', 'i1'), ('f2', '<i4')])
+
+Note that the result prints without ``offsets`` or ``itemsize`` indicating no
+additional padding.
+If a structured dtype is created with ``align=True`` ensuring that
+``dtype.isalignedstruct`` is true, this property is preserved::
+
+ >>> dt = np.dtype("i1,V3,i4,V1", align=True)[["f0", "f2"]]
+ >>> dt
+ dtype({'names':['f0','f2'], 'formats':['i1','<i4'], 'offsets':[0,4], 'itemsize':12}, align=True)
+ >>> np.result_type(dt)
+ dtype([('f0', 'i1'), ('f2', '<i4')], align=True)
+ >>> np.result_type(dt).isalignedstruct
+ True
+
+When promoting multiple dtypes, the result is aligned if any of the inputs is::
+
+ >>> np.result_type(np.dtype("i,i"), np.dtype("i,i", align=True))
+ dtype([('f0', '<i4'), ('f1', '<i4')], align=True)
The ``<`` and ``>`` operators always return ``False`` when comparing void
structured arrays, and arithmetic and bitwise operations are not supported.
+.. versionchanged:: 1.23
+ Before NumPy 1.23, a warning was given and ``False`` returned when
+ promotion to a common dtype failed.
+ Further, promotion was much more restrictive: It would reject the mixed
+ float/integer comparison example above.
+
Record Arrays
=============
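
Note (not part of the patch): the promotion and comparison rules documented in the "Structure Comparison and Promotion" section of this diff can be exercised together. This is a minimal sketch, assuming NumPy 1.23 or later; the variable names are illustrative, and the dtypes are taken from the examples in the patch where possible.

```python
import numpy as np

# Field-wise promotion: (int, int) with (int, double) gives (int, double).
promoted = np.result_type(np.dtype("i,i"), np.dtype("i,d"))

# Promotion canonicalizes byte order, so a big-endian field becomes native.
swapped = np.result_type(np.dtype("i,>i"))
f1_is_native = swapped.fields["f1"][0].isnative

# Promotion also packs the result: padding in the input is dropped.
padded = np.dtype("i1,V3,i4,V1")[["f0", "f2"]]
packed_itemsize = np.result_type(padded).itemsize

# Mismatching field names cannot be promoted. The error is caught as
# TypeError here; newer releases may raise a more specific subclass.
try:
    np.promote_types(np.dtype([("a", "i4")]), np.dtype([("b", "i4")]))
    mismatch_raised = False
except TypeError:
    mismatch_raised = True

# Comparison promotes each field first, so i4 and f4 fields can be compared.
a = np.array([(1, 1), (2, 2)], dtype=[("a", "i4"), ("b", "i4")])
b = np.array([(1.0, 1), (2.5, 2)], dtype=[("a", "f4"), ("b", "i4")])
equal = a == b

print(promoted, f1_is_native, packed_itemsize, mismatch_raised, equal)
```

Since promotion returns a packed, native-byte-order dtype, the result of operations such as ``np.concatenate`` on structured arrays may have a different layout than any of the inputs.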