1 files changed, 96 insertions, 0 deletions
diff --git a/doc/source/dev/alignment.rst b/doc/source/dev/alignment.rst
new file mode 100644
index 000000000..f067f0d03
--- /dev/null
+++ b/doc/source/dev/alignment.rst
@@ -0,0 +1,96 @@
+.. _alignment:
+
+
+Numpy Alignment Goals
+=====================
+
+There are three use-cases related to memory alignment in numpy (as of 1.14):
+
+ 1. Creating structured datatypes with fields aligned like in a C-struct.
+ 2. Speeding up copy operations by using uint assignment in instead of memcpy
+ 3. Guaranteeing safe aligned access for ufuncs/setitem/casting code
+
+Numpy uses two different forms of alignment to achieve these goals:
+"True alignment" and "Uint alignment".
+
+"True" alignment refers to the architecture-dependent alignment of an
+equivalent C-type in C. For example, in x64 systems ``numpy.float64`` is
+equivalent to ``double`` in C. On most systems this has either an alignment of
+4 or 8 bytes (and this can be controlled in gcc by the option
+``malign-double``).  A variable is aligned in memory if its memory offset is a
+multiple of its alignment. On some systems (eg sparc) memory alignment is
+required, on others it gives a speedup.
+
+"Uint" alignment depends on the size of a datatype. It is defined to be the
+"True alignment" of the uint used by numpy's copy-code to copy the datatype, or
+undefined/unaligned if there is no equivalent uint. Currently numpy uses uint8,
+uint16, uint32, uint64 and uint64 to copy data of size 1,2,4,8,16 bytes
+respectively, and all other sized datatypes cannot be uint-aligned.
+
+For example, on a (typical linux x64 gcc) system, the numpy ``complex64``
+datatype is implemented as ``struct { float real, imag; }``. This has "true"
+alignment of 4 and "uint" alignment of 8 (equal to the true alignment of
+``uint64``).
+
+Variables in Numpy which control and describe alignment
+=======================================================
+
+There are 4 relevant uses of the word ``align`` used in numpy:
+
+ * The ``dtype.alignment`` attribute (``descr->alignment`` in C). This is meant
+   to reflect the "true alignment" of the type. It has arch-dependent default
+   values for all datatypes, with the exception of structured types created
+   with ``align=True`` as described below.
+ * The ``ALIGNED`` flag of an ndarray, computed in ``IsAligned`` and checked
+   by ``PyArray_ISALIGNED``. This is computed from ``dtype.alignment``.
+   It is set to ``True`` if every item in the array is at a memory location
+   consistent with ``dtype.alignment``, which is the case if the data ptr and
+   all strides of the array are multiples of that alignment.
+ * The ``align`` keyword of the dtype constructor, which only affects structured
+   arrays. If the structure's field offsets are not manually provided numpy
+   determines offsets automatically. In that case, ``align=True`` pads the
+   structure so that each field is "true" aligned in memory and sets
+   ``dtype.alignment`` to be the largest of the field "true" alignments. This
+   is like what C-structs usually do. Otherwise if offsets or itemsize were
+   manually provided ``align=True`` simply checks that all the fields are
+   "true" aligned and that the total itemsize is a multiple of the largest
+   field alignment. In either case ``dtype.isalignedstruct`` is also set to
+   True.
+ * ``IsUintAligned`` is used to determine if an ndarray is "uint aligned" in
+   an analagous way to how ``IsAligned`` checks for true-alignment.
+
+Consequences of alignment
+=========================
+
+Here is how the variables above are used:
+
+ 1. Creating aligned structs: In order to know how to offset a field when
+    ``align=True``, numpy looks up ``field.dtype.alignment``. This includes
+    fields which are nested structured arrays.
+ 2. Ufuncs: If the ``ALIGNED`` flag of an array is False, ufuncs will
+    buffer/cast the array before evaluation. This is needed since ufunc inner
+    loops access raw elements directly, which might fail on some archs if the
+    elements are not true-aligned.
+ 3. Getitem/setitem/copyswap function: Similar to ufuncs, these functions
+    generally have two code paths. If ``ALIGNED`` is False they will
+    use a code path that buffers the arguments so they are true-aligned.
+ 4. Strided copy code: Here, "uint alignment" is used instead.  If the itemsize
+    of an array is equal to 1, 2, 4, 8 or 16 bytes and the array is uint
+    aligned then instead numpy will do ``*(uintN*)dst) = *(uintN*)src)`` for
+    appropriate N. Otherwise numpy copies by doing ``memcpy(dst, src, N)``.
+ 5. Nditer code: Since this often calls the strided copy code, it must
+    check for "uint alignment".
+ 6. Cast code: if the array is "uint aligned" this will essentially do
+    ``*dst = CASTFUNC(*src)``. If not, it does
+    ``memmove(srcval, src); dstval = CASTFUNC(srcval); memmove(dst, dstval)``
+    where dstval/srcval are aligned.
+
+Note that in principle, only "true alignment" is required for casting code.
+However, because the casting code and copy code are deeply intertwined they
+both use "uint" alignment. This should be safe assuming uint alignment is
+always larger than true alignment, though it can cause unnecessary buffering if
+an array is "true aligned" but not "uint aligned". If there is ever a big
+rewrite of this code it would be good to allow them to use different
+alignments.
+
+