1 files changed, 113 insertions, 0 deletions
diff --git a/doc/source/dev/alignment.rst b/doc/source/dev/alignment.rst
new file mode 100644
index 000000000..bb1198ebf
--- /dev/null
+++ b/doc/source/dev/alignment.rst
@@ -0,0 +1,113 @@
+.. currentmodule:: numpy
+
+.. _alignment:
+
+****************
+Memory Alignment
+****************
+
+NumPy alignment goals
+=====================
+
+There are three use-cases related to memory alignment in NumPy (as of 1.14):
+
+ 1. Creating :term:`structured datatypes <structured data type>` with
+    :term:`fields <field>` aligned like in a C-struct.
+ 2. Speeding up copy operations by using :class:`uint` assignment instead of
+    ``memcpy``.
+ 3. Guaranteeing safe aligned access for ufuncs/setitem/casting code.
+
+NumPy uses two different forms of alignment to achieve these goals:
+"True alignment" and "Uint alignment".
+
+"True" alignment refers to the architecture-dependent alignment of the
+equivalent C type. For example, in x64 systems :attr:`float64` is
+equivalent to ``double`` in C. On most systems, this has an alignment of either
+4 or 8 bytes (and this can be controlled in GCC by the option
+``-malign-double``).  A variable is aligned in memory if its memory offset is a
+multiple of its alignment. On some systems (e.g. SPARC) memory alignment is
+required; on others, it gives a speedup.
+
+"Uint" alignment depends on the size of a datatype. It is defined to be the
+"True alignment" of the uint used by NumPy's copy-code to copy the datatype, or
+undefined/unaligned if there is no equivalent uint. Currently, NumPy uses
+``uint8``, ``uint16``, ``uint32``, ``uint64``, and (again) ``uint64`` to copy
+data of size 1, 2, 4, 8, and 16 bytes respectively; datatypes of any other size
+cannot be uint-aligned.
+
+For example, on a (typical Linux x64 GCC) system, the NumPy :attr:`complex64`
+datatype is implemented as ``struct { float real, imag; }``. This has "true"
+alignment of 4 and "uint" alignment of 8 (equal to the true alignment of
+``uint64``).
+
+Some cases where uint and true alignment are different (default GCC Linux):
+   ======   =========   ========    ========
+   arch     type        true-aln    uint-aln
+   ======   =========   ========    ========
+   x86_64   complex64          4           8
+   x86_64   float128          16           8
+   x86      float96            4          \-
+   ======   =========   ========    ========
+
+
+Variables in NumPy which control and describe alignment
+=======================================================
+
+There are four relevant uses of the word ``align`` in NumPy:
+
+ * The :attr:`dtype.alignment` attribute (``descr->alignment`` in C). This is
+   meant to reflect the "true alignment" of the type. It has arch-dependent
+   default values for all datatypes, except for the structured types created
+   with ``align=True`` as described below.
+ * The ``ALIGNED`` flag of an ndarray, computed in ``IsAligned`` and checked
+   by :c:func:`PyArray_ISALIGNED`. This is computed from
+   :attr:`dtype.alignment`.
+   It is set to ``True`` if every item in the array is at a memory location
+   consistent with :attr:`dtype.alignment`, which is the case if the
+   ``data ptr`` and all strides of the array are multiples of that alignment.
+ * The ``align`` keyword of the dtype constructor, which only affects
+   :ref:`structured_arrays`. If the structure's field offsets are not manually
+   provided, NumPy determines offsets automatically. In that case,
+   ``align=True`` pads the structure so that each field is "true" aligned in
+   memory and sets :attr:`dtype.alignment` to be the largest of the field
+   "true" alignments. This is like what C-structs usually do. Otherwise, if
+   offsets or itemsize were manually provided, ``align=True`` simply checks that
+   all the fields are "true" aligned and that the total itemsize is a multiple
+   of the largest field alignment. In either case, :attr:`dtype.isalignedstruct`
+   is also set to ``True``.
+ * ``IsUintAligned`` is used to determine if an ndarray is "uint aligned" in
+   an analogous way to how ``IsAligned`` checks for true alignment.
+
+Consequences of alignment
+=========================
+
+Here is how the variables above are used:
+
+ 1. Creating aligned structs: To know how to offset a field when
+    ``align=True``, NumPy looks up ``field.dtype.alignment``. This includes
+    fields that are nested structured arrays.
+ 2. Ufuncs: If the ``ALIGNED`` flag of an array is False, ufuncs will
+    buffer/cast the array before evaluation. This is needed since ufunc inner
+    loops access raw elements directly, which might fail on some architectures
+    if the elements are not true-aligned.
+ 3. Getitem/setitem/copyswap functions: Similar to ufuncs, these functions
+    generally have two code paths. If ``ALIGNED`` is False they will
+    use a code path that buffers the arguments so they are true-aligned.
+ 4. Strided copy code: Here, "uint alignment" is used instead.  If the itemsize
+    of an array is equal to 1, 2, 4, 8 or 16 bytes and the array is uint
+    aligned, then NumPy will do ``*(uintN*)dst = *(uintN*)src`` for the
+    appropriate N. Otherwise, NumPy copies by doing ``memcpy(dst, src, N)``.
+ 5. Nditer code: Since this often calls the strided copy code, it must
+    check for "uint alignment".
+ 6. Cast code: This checks for "true" alignment, as it does
+    ``*dst = CASTFUNC(*src)`` if aligned. Otherwise, it does
+    ``memmove(srcval, src); dstval = CASTFUNC(srcval); memmove(dst, dstval)``
+    where dstval/srcval are aligned.
+
+Note that the strided-copy and strided-cast code are deeply intertwined and so
+any arrays being processed by them must be both uint and true aligned, even
+though the copy-code only needs uint alignment and the cast code only true
+alignment.  If there is ever a big rewrite of this code, it would be good to
+allow them to use different alignments.
+
+
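
The Python-visible pieces described in this patch (``dtype.alignment``, the ``align`` keyword, ``dtype.isalignedstruct`` and the ``ALIGNED`` array flag) can be inspected with a short script. The following is only a minimal sketch, not part of the patch: the field names ``flag`` and ``value`` and the helper ``describe`` are hypothetical, and the concrete numbers printed depend on the platform, so none are shown.

```python
import numpy as np

# Same two fields, declared with and without C-struct-like padding.
# The field names "flag" and "value" are made up for this illustration.
fields = [("flag", np.uint8), ("value", np.float64)]
packed = np.dtype(fields)               # align=False (default): no padding
aligned = np.dtype(fields, align=True)  # pad so each field is true-aligned


def describe(dt):
    """Hypothetical helper: collect the alignment-related dtype attributes."""
    offsets = {name: dt.fields[name][1] for name in dt.names}
    return {
        "itemsize": dt.itemsize,
        "alignment": dt.alignment,
        "offsets": offsets,
        "isalignedstruct": dt.isalignedstruct,
    }


print("packed :", describe(packed))
print("aligned:", describe(aligned))

# A float64 view that starts one byte into a buffer is not true-aligned,
# so the ALIGNED flag of the resulting array is expected to be False.
raw = np.zeros(8 * 4 + 1, dtype=np.uint8)
shifted = raw[1:].view(np.float64)
print("shifted.flags.aligned:", shifted.flags.aligned)
```

With ``align=True`` the offset of ``value`` is a multiple of the true alignment of ``float64``, and the structure's own alignment equals the largest field alignment, as the patch describes.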