Diffstat (limited to 'doc/source/reference')
-rw-r--r--  doc/source/reference/alignment.rst                    | 101
-rw-r--r--  doc/source/reference/arrays.datetime.rst              |  18
-rw-r--r--  doc/source/reference/arrays.dtypes.rst                |  17
-rw-r--r--  doc/source/reference/arrays.ndarray.rst               |   8
-rw-r--r--  doc/source/reference/arrays.scalars.rst               |  13
-rw-r--r--  doc/source/reference/c-api/array.rst                  | 116
-rw-r--r--  doc/source/reference/c-api/data_memory.rst            | 158
-rw-r--r--  doc/source/reference/c-api/index.rst                  |   1
-rw-r--r--  doc/source/reference/c-api/iterator.rst               |   2
-rw-r--r--  doc/source/reference/c-api/types-and-structures.rst   |  77
-rw-r--r--  doc/source/reference/global_state.rst                 |  10
-rw-r--r--  doc/source/reference/index.rst                        |   1
-rw-r--r--  doc/source/reference/internals.code-explanations.rst  | 615
-rw-r--r--  doc/source/reference/internals.rst                    | 164
-rw-r--r--  doc/source/reference/random/bit_generators/index.rst  |   2
-rw-r--r--  doc/source/reference/random/index.rst                 |   2
-rw-r--r--  doc/source/reference/random/performance.rst           |   2
-rw-r--r--  doc/source/reference/routines.ma.rst                  |  11
-rw-r--r--  doc/source/reference/routines.math.rst                |  21
-rw-r--r--  doc/source/reference/routines.polynomials.rst         |   4
-rw-r--r--  doc/source/reference/routines.statistics.rst          |   6
-rw-r--r--  doc/source/reference/simd/simd-optimizations.rst      |  16
-rw-r--r--  doc/source/reference/ufuncs.rst                       |   2
23 files changed, 402 insertions, 965 deletions
diff --git a/doc/source/reference/alignment.rst b/doc/source/reference/alignment.rst
index 5e4315b38..70ded916a 100644
--- a/doc/source/reference/alignment.rst
+++ b/doc/source/reference/alignment.rst
@@ -1,104 +1,13 @@
-.. _alignment:
+:orphan:
+****************
Memory Alignment
-================
+****************
-Numpy Alignment Goals
----------------------
+.. This document has been moved to ../dev/alignment.rst.
-There are three use-cases related to memory alignment in numpy (as of 1.14):
+This document has been moved to :ref:`alignment`.
- 1. Creating structured datatypes with fields aligned like in a C-struct.
- 2. Speeding up copy operations by using uint assignment instead of memcpy.
- 3. Guaranteeing safe aligned access for ufuncs/setitem/casting code.
-Numpy uses two different forms of alignment to achieve these goals:
-"True alignment" and "Uint alignment".
-
-"True" alignment refers to the architecture-dependent alignment of an
-equivalent C-type in C. For example, in x64 systems ``numpy.float64`` is
-equivalent to ``double`` in C. On most systems this has either an alignment of
-4 or 8 bytes (and this can be controlled in gcc by the option
-``malign-double``). A variable is aligned in memory if its memory offset is a
-multiple of its alignment. On some systems (e.g. sparc) memory alignment is
-required; on others it gives a speedup.
-
-"Uint" alignment depends on the size of a datatype. It is defined to be the
-"True alignment" of the uint used by numpy's copy-code to copy the datatype, or
-undefined/unaligned if there is no equivalent uint. Currently numpy uses
-uint8, uint16, uint32, uint64 and pairs of uint64 to copy data of size
-1, 2, 4, 8 and 16 bytes respectively; all other sized datatypes cannot be
-uint-aligned.
-
-For example, on a (typical linux x64 gcc) system, the numpy ``complex64``
-datatype is implemented as ``struct { float real, imag; }``. This has "true"
-alignment of 4 and "uint" alignment of 8 (equal to the true alignment of
-``uint64``).
-
-Some cases where uint and true alignment are different (default gcc linux):
-
-   arch     type        true-aln   uint-aln
-   ----     ----        --------   --------
-   x86_64   complex64   4          8
-   x86_64   float128    16         8
-   x86      float96     4          -
-
-
-Variables in Numpy which control and describe alignment
--------------------------------------------------------
-
-There are 4 relevant uses of the word ``align`` in numpy:
-
- * The ``dtype.alignment`` attribute (``descr->alignment`` in C). This is meant
- to reflect the "true alignment" of the type. It has arch-dependent default
- values for all datatypes, with the exception of structured types created
- with ``align=True`` as described below.
- * The ``ALIGNED`` flag of an ndarray, computed in ``IsAligned`` and checked
- by ``PyArray_ISALIGNED``. This is computed from ``dtype.alignment``.
- It is set to ``True`` if every item in the array is at a memory location
- consistent with ``dtype.alignment``, which is the case if the data ptr and
- all strides of the array are multiples of that alignment.
- * The ``align`` keyword of the dtype constructor, which only affects structured
- arrays. If the structure's field offsets are not manually provided numpy
- determines offsets automatically. In that case, ``align=True`` pads the
- structure so that each field is "true" aligned in memory and sets
- ``dtype.alignment`` to be the largest of the field "true" alignments. This
- is like what C-structs usually do. Otherwise if offsets or itemsize were
- manually provided ``align=True`` simply checks that all the fields are
- "true" aligned and that the total itemsize is a multiple of the largest
- field alignment. In either case ``dtype.isalignedstruct`` is also set to
- True.
- * ``IsUintAligned`` is used to determine if an ndarray is "uint aligned" in
- an analogous way to how ``IsAligned`` checks for true-alignment.
-
-Consequences of alignment
--------------------------
-
-Here is how the variables above are used:
-
- 1. Creating aligned structs: In order to know how to offset a field when
- ``align=True``, numpy looks up ``field.dtype.alignment``. This includes
- fields which are nested structured arrays.
- 2. Ufuncs: If the ``ALIGNED`` flag of an array is False, ufuncs will
- buffer/cast the array before evaluation. This is needed since ufunc inner
- loops access raw elements directly, which might fail on some archs if the
- elements are not true-aligned.
- 3. Getitem/setitem/copyswap function: Similar to ufuncs, these functions
- generally have two code paths. If ``ALIGNED`` is False they will
- use a code path that buffers the arguments so they are true-aligned.
- 4. Strided copy code: Here, "uint alignment" is used instead. If the itemsize
-    of an array is equal to 1, 2, 4, 8 or 16 bytes and the array is uint
-    aligned then numpy will do ``*(uintN*)dst = *(uintN*)src`` for the
-    appropriate N. Otherwise numpy copies by doing ``memcpy(dst, src, N)``
-    (see the sketch at the end of this section).
- 5. Nditer code: Since this often calls the strided copy code, it must
- check for "uint alignment".
- 6. Cast code: This checks for "true" alignment, as it does
-    ``*dst = CASTFUNC(*src)`` if aligned. Otherwise, it does
-    ``memmove(&srcval, src, N); dstval = CASTFUNC(srcval); memmove(dst, &dstval, N)``
-    where dstval/srcval are aligned temporaries.
-
-Note that the strided-copy and strided-cast code are deeply intertwined and so
-any arrays being processed by them must be both uint and true aligned, even
-though the copy-code only needs uint alignment and the cast code only true
-alignment. If there is ever a big rewrite of this code it would be good to
-allow them to use different alignments.
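
The following is a minimal C sketch of the strided-copy dispatch described in
item 4 of the list above. It is illustrative only (``copy_item`` is a
hypothetical helper, not NumPy's actual code), showing just the 8-byte case:

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: use a fixed-size integer assignment when the
       item is "uint aligned", otherwise fall back to memcpy. */
    static void
    copy_item(char *dst, const char *src, size_t itemsize)
    {
        if (itemsize == 8 &&
            (uintptr_t)dst % 8 == 0 && (uintptr_t)src % 8 == 0) {
            /* uint-aligned fast path: one 8-byte assignment */
            *(uint64_t *)dst = *(const uint64_t *)src;
        }
        else {
            /* other sizes or unaligned pointers */
            memcpy(dst, src, itemsize);
        }
    }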
diff --git a/doc/source/reference/arrays.datetime.rst b/doc/source/reference/arrays.datetime.rst
index e3b8d270d..63c93821b 100644
--- a/doc/source/reference/arrays.datetime.rst
+++ b/doc/source/reference/arrays.datetime.rst
@@ -25,7 +25,7 @@ form of the string, and can be either a :ref:`date unit <arrays.dtypes.dateunits
:ref:`time unit <arrays.dtypes.timeunits>`. The date units are years ('Y'),
months ('M'), weeks ('W'), and days ('D'), while the time units are
hours ('h'), minutes ('m'), seconds ('s'), milliseconds ('ms'), and
-some additional SI-prefix seconds-based units. The datetime64 data type
+some additional SI-prefix seconds-based units. The datetime64 data type
also accepts the string "NAT", in any combination of lowercase/uppercase
letters, for a "Not A Time" value.
@@ -74,6 +74,18 @@ datetime type with generic units.
array(['2001-01-01T12:00:00.000', '2002-02-03T13:56:03.172'],
dtype='datetime64[ms]')
+An array of datetimes can be constructed from integers representing
+POSIX timestamps with the given unit.
+
+.. admonition:: Example
+
+ >>> np.array([0, 1577836800], dtype='datetime64[s]')
+ array(['1970-01-01T00:00:00', '2020-01-01T00:00:00'],
+ dtype='datetime64[s]')
+
+ >>> np.array([0, 1577836800000]).astype('datetime64[ms]')
+ array(['1970-01-01T00:00:00.000', '2020-01-01T00:00:00.000'],
+ dtype='datetime64[ms]')
The datetime type works with many common NumPy functions; for
example, :func:`arange` can be used to generate ranges of dates.
@@ -120,9 +132,9 @@ Datetime and Timedelta Arithmetic
NumPy allows the subtraction of two Datetime values, an operation which
produces a number with a time unit. Because NumPy doesn't have a physical
quantities system in its core, the timedelta64 data type was created
-to complement datetime64. The arguments for timedelta64 are a number,
+to complement datetime64. The arguments for timedelta64 are a number,
to represent the number of units, and a date/time unit, such as
-(D)ay, (M)onth, (Y)ear, (h)ours, (m)inutes, or (s)econds. The timedelta64
+(D)ay, (M)onth, (Y)ear, (h)ours, (m)inutes, or (s)econds. The timedelta64
data type also accepts the string "NAT" in place of the number for a "Not A Time" value.
.. admonition:: Example
diff --git a/doc/source/reference/arrays.dtypes.rst b/doc/source/reference/arrays.dtypes.rst
index b5ffa1a8b..8606bc8f1 100644
--- a/doc/source/reference/arrays.dtypes.rst
+++ b/doc/source/reference/arrays.dtypes.rst
@@ -562,3 +562,20 @@ The following methods implement the pickle protocol:
dtype.__reduce__
dtype.__setstate__
+
+Utility method for typing:
+
+.. autosummary::
+ :toctree: generated/
+
+ dtype.__class_getitem__
+
+Comparison operations:
+
+.. autosummary::
+ :toctree: generated/
+
+ dtype.__ge__
+ dtype.__gt__
+ dtype.__le__
+ dtype.__lt__
diff --git a/doc/source/reference/arrays.ndarray.rst b/doc/source/reference/arrays.ndarray.rst
index f2204752d..0f703b475 100644
--- a/doc/source/reference/arrays.ndarray.rst
+++ b/doc/source/reference/arrays.ndarray.rst
@@ -249,7 +249,6 @@ Other attributes
ndarray.real
ndarray.imag
ndarray.flat
- ndarray.ctypes
.. _arrays.ndarray.array-interface:
@@ -621,3 +620,10 @@ String representations:
ndarray.__str__
ndarray.__repr__
+
+Utility method for typing:
+
+.. autosummary::
+ :toctree: generated/
+
+ ndarray.__class_getitem__
diff --git a/doc/source/reference/arrays.scalars.rst b/doc/source/reference/arrays.scalars.rst
index abef66692..c691e802f 100644
--- a/doc/source/reference/arrays.scalars.rst
+++ b/doc/source/reference/arrays.scalars.rst
@@ -196,10 +196,10 @@ Inexact types
``f16`` prints as ``0.1`` because it is as close to that value as possible,
whereas the other types do not as they have more precision and therefore have
closer values.
-
+
Conversely, floating-point scalars of different precisions which approximate
the same decimal value may compare unequal despite printing identically:
-
+
>>> f16 = np.float16("0.1")
>>> f32 = np.float32("0.1")
>>> f64 = np.float64("0.1")
@@ -399,7 +399,7 @@ are also provided.
complex256
Alias for `numpy.clongdouble`, named after its size in bits.
- The existance of these aliases depends on the platform.
+ The existence of these aliases depends on the platform.
Other aliases
~~~~~~~~~~~~~
@@ -498,6 +498,13 @@ The exceptions to the above rules are given below:
generic.__setstate__
generic.setflags
+Utility method for typing:
+
+.. autosummary::
+ :toctree: generated/
+
+ number.__class_getitem__
+
Defining new types
==================
diff --git a/doc/source/reference/c-api/array.rst b/doc/source/reference/c-api/array.rst
index 26a8f643d..bb4405825 100644
--- a/doc/source/reference/c-api/array.rst
+++ b/doc/source/reference/c-api/array.rst
@@ -325,8 +325,7 @@ From scratch
should be increased after the pointer is passed in, and the base member
of the returned ndarray should point to the Python object that owns
the data. This will ensure that the provided memory is not
- freed while the returned array is in existence. To free memory as soon
- as the ndarray is deallocated, set the OWNDATA flag on the returned ndarray.
+ freed while the returned array is in existence.
.. c:function:: PyObject* PyArray_SimpleNewFromDescr( \
int nd, npy_intp const* dims, PyArray_Descr* descr)
@@ -519,34 +518,40 @@ From other objects
:c:data:`NPY_ARRAY_CARRAY`
- .. c:macro:: NPY_ARRAY_IN_ARRAY
+..
+ dedented to allow internal linking, pending a refactoring
- :c:data:`NPY_ARRAY_C_CONTIGUOUS` \| :c:data:`NPY_ARRAY_ALIGNED`
+.. c:macro:: NPY_ARRAY_IN_ARRAY
+
+ :c:data:`NPY_ARRAY_C_CONTIGUOUS` \| :c:data:`NPY_ARRAY_ALIGNED`
.. c:macro:: NPY_ARRAY_IN_FARRAY
:c:data:`NPY_ARRAY_F_CONTIGUOUS` \| :c:data:`NPY_ARRAY_ALIGNED`
- .. c:macro:: NPY_OUT_ARRAY
+.. c:macro:: NPY_OUT_ARRAY
- :c:data:`NPY_ARRAY_C_CONTIGUOUS` \| :c:data:`NPY_ARRAY_WRITEABLE` \|
- :c:data:`NPY_ARRAY_ALIGNED`
+ :c:data:`NPY_ARRAY_C_CONTIGUOUS` \| :c:data:`NPY_ARRAY_WRITEABLE` \|
+ :c:data:`NPY_ARRAY_ALIGNED`
- .. c:macro:: NPY_ARRAY_OUT_ARRAY
+.. c:macro:: NPY_ARRAY_OUT_ARRAY
- :c:data:`NPY_ARRAY_C_CONTIGUOUS` \| :c:data:`NPY_ARRAY_ALIGNED` \|
- :c:data:`NPY_ARRAY_WRITEABLE`
+ :c:data:`NPY_ARRAY_C_CONTIGUOUS` \| :c:data:`NPY_ARRAY_ALIGNED` \|
+ :c:data:`NPY_ARRAY_WRITEABLE`
.. c:macro:: NPY_ARRAY_OUT_FARRAY
:c:data:`NPY_ARRAY_F_CONTIGUOUS` \| :c:data:`NPY_ARRAY_WRITEABLE` \|
:c:data:`NPY_ARRAY_ALIGNED`
- .. c:macro:: NPY_ARRAY_INOUT_ARRAY
+..
+ dedented to allow internal linking, pending a refactoring
- :c:data:`NPY_ARRAY_C_CONTIGUOUS` \| :c:data:`NPY_ARRAY_WRITEABLE` \|
- :c:data:`NPY_ARRAY_ALIGNED` \| :c:data:`NPY_ARRAY_WRITEBACKIFCOPY` \|
- :c:data:`NPY_ARRAY_UPDATEIFCOPY`
+.. c:macro:: NPY_ARRAY_INOUT_ARRAY
+
+ :c:data:`NPY_ARRAY_C_CONTIGUOUS` \| :c:data:`NPY_ARRAY_WRITEABLE` \|
+ :c:data:`NPY_ARRAY_ALIGNED` \| :c:data:`NPY_ARRAY_WRITEBACKIFCOPY` \|
+ :c:data:`NPY_ARRAY_UPDATEIFCOPY`
.. c:macro:: NPY_ARRAY_INOUT_FARRAY
@@ -584,6 +589,9 @@ From other objects
did not have the _ARRAY_ macro namespace in them. That form
of the constant names is deprecated in 1.7.
+..
+ dedented to allow internal linking, pending a refactoring
+
.. c:macro:: NPY_ARRAY_NOTSWAPPED
Make sure the returned array has a data-type descriptor that is in
@@ -595,9 +603,13 @@ From other objects
not in machine byte- order), then a new data-type descriptor is
created and used with its byte-order field set to native.
-.. c:macro:: NPY_ARRAY_BEHAVED_NS
+ .. c:macro:: NPY_ARRAY_BEHAVED_NS
- :c:data:`NPY_ARRAY_ALIGNED` \| :c:data:`NPY_ARRAY_WRITEABLE` \| :c:data:`NPY_ARRAY_NOTSWAPPED`
+ :c:data:`NPY_ARRAY_ALIGNED` \| :c:data:`NPY_ARRAY_WRITEABLE` \|
+ :c:data:`NPY_ARRAY_NOTSWAPPED`
+
+..
+ dedented to allow internal linking, pending a refactoring
.. c:macro:: NPY_ARRAY_ELEMENTSTRIDES
@@ -723,6 +735,13 @@ From other objects
broadcastable to the shape of ``dest``. The data areas of dest
and src must not overlap.
+.. c:function:: int PyArray_CopyObject(PyArrayObject* dest, PyObject* src)
+
+ Assign an object ``src`` to a NumPy array ``dest`` according to
+ array-coercion rules. This is basically identical to
+ :c:func:`PyArray_FromAny`, but assigns directly to the output array.
+ Returns 0 on success and -1 on failure.
+
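A minimal usage sketch for this function (``fill_from_object`` is a
hypothetical helper name, not part of the API):

.. code-block:: c

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Coerce an arbitrary Python object into an existing array,
       following the coercion rules described above. */
    static int
    fill_from_object(PyArrayObject *dest, PyObject *src)
    {
        if (PyArray_CopyObject(dest, src) < 0) {
            return -1;  /* a Python exception has been set */
        }
        return 0;
    }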
.. c:function:: int PyArray_MoveInto(PyArrayObject* dest, PyArrayObject* src)
Move data from the source array, ``src``, into the destination
@@ -1303,7 +1322,7 @@ User-defined data types
data-type object, *descr*, of the given *scalar* kind. Use
*scalar* = :c:data:`NPY_NOSCALAR` to register that an array of data-type
*descr* can be cast safely to a data-type whose type_number is
- *totype*.
+ *totype*. The return value is 0 on success or -1 on failure.
.. c:function:: int PyArray_TypeNumFromName( \
char const *str)
@@ -1443,7 +1462,9 @@ of the constant names is deprecated in 1.7.
.. c:macro:: NPY_ARRAY_OWNDATA
- The data area is owned by this array.
+ The data area is owned by this array. Should never be set manually;
+ instead, create a ``PyObject`` wrapping the data and set the array's
+ base to that object. For an example, see the test in ``test_mem_policy``.
.. c:macro:: NPY_ARRAY_ALIGNED
@@ -2707,6 +2728,45 @@ cost of a slight overhead.
neighborhood. Calling this function after every point of the
neighborhood has been visited is undefined.
+Array mapping
+-------------
+
+Array mapping is the machinery behind advanced indexing.
+
+.. c:function:: PyObject* PyArray_MapIterArray(PyArrayObject *a, \
+ PyObject *index)
+
+ Use advanced indexing to iterate an array.
+
+.. c:function:: void PyArray_MapIterSwapAxes(PyArrayMapIterObject *mit, \
+ PyArrayObject **ret, int getmap)
+
+ Swap the axes to or from their inserted form. ``MapIter`` always puts the
+ advanced (array) indices first in the iteration. But if they are
+ consecutive, it will insert/transpose them back before returning.
+ This is stored as ``mit->consec != 0`` (the place where they are inserted).
+ For assignments, the opposite happens: the values to be assigned are
+ transposed (``getmap=1`` instead of ``getmap=0``). ``getmap=0`` and
+ ``getmap=1`` undo each other.
+
+.. c:function:: void PyArray_MapIterNext(PyArrayMapIterObject *mit)
+
+ This function needs to update the state of the map iterator
+ and point ``mit->dataptr`` to the memory location of the next object.
+
+ Note that this function never handles an extra operand but provides
+ compatibility for an old (exposed) API.
+
+.. c:function:: PyObject* PyArray_MapIterArrayCopyIfOverlap(PyArrayObject *a, \
+ PyObject *index, int copy_if_overlap, PyArrayObject *extra_op)
+
+ Similar to :c:func:`PyArray_MapIterArray` but with an additional
+ ``copy_if_overlap`` argument. If ``copy_if_overlap != 0``, checks if ``a``
+ has memory overlap with any of the arrays in ``index`` and with
+ ``extra_op``, and makes copies as appropriate to avoid problems if the
+ input is modified during the iteration. ``iter->array`` may contain a
+ copied array (UPDATEIFCOPY/WRITEBACKIFCOPY set).
+
Array Scalars
-------------
@@ -2719,13 +2779,19 @@ Array Scalars
whenever 0-dimensional arrays could be returned to Python.
.. c:function:: PyObject* PyArray_Scalar( \
- void* data, PyArray_Descr* dtype, PyObject* itemsize)
-
- Return an array scalar object of the given enumerated *typenum*
- and *itemsize* by **copying** from memory pointed to by *data*
- . If *swap* is nonzero then this function will byteswap the data
- if appropriate to the data-type because array scalars are always
- in correct machine-byte order.
+ void* data, PyArray_Descr* dtype, PyObject* base)
+
+ Return an array scalar object of the given *dtype* by **copying**
+ from memory pointed to by *data*. *base* is expected to be the
+ array object that is the owner of the data. *base* is required
+ if ``dtype`` is a ``void`` scalar, or if the ``NPY_USE_GETITEM``
+ flag is set and it is known that the ``getitem`` method uses
+ the ``arr`` argument without checking if it is ``NULL``. Otherwise
+ ``base`` may be ``NULL``.
+
+ If the data is not in native byte order (as indicated by
+ ``dtype->byteorder``) then this function will byteswap the data,
+ because array scalars are always in correct machine-byte order.
.. c:function:: PyObject* PyArray_ToScalar(void* data, PyArrayObject* arr)
diff --git a/doc/source/reference/c-api/data_memory.rst b/doc/source/reference/c-api/data_memory.rst
new file mode 100644
index 000000000..11a37adc4
--- /dev/null
+++ b/doc/source/reference/c-api/data_memory.rst
@@ -0,0 +1,158 @@
+.. _data_memory:
+
+Memory management in NumPy
+==========================
+
+The `numpy.ndarray` is a Python class. It requires additional memory allocations
+to hold `numpy.ndarray.strides`, `numpy.ndarray.shape` and
+`numpy.ndarray.data` attributes. These attributes are specially allocated
+after creating the Python object in ``__new__``. The ``strides`` and
+``shape`` are stored in a piece of memory allocated internally.
+
+The ``data`` allocation used to store the actual array values (which could be
+pointers in the case of ``object`` arrays) can be very large, so NumPy has
+provided interfaces to manage its allocation and release. This document details
+how those interfaces work.
+
+Historical overview
+-------------------
+
+Since version 1.7.0, NumPy has exposed a set of ``PyDataMem_*`` functions
+(:c:func:`PyDataMem_NEW`, :c:func:`PyDataMem_FREE`, :c:func:`PyDataMem_RENEW`)
+which are backed by `alloc`, `free`, `realloc` respectively. In that version
+NumPy also exposed the `PyDataMem_EventHook` function described below, which
+wraps the OS-level calls.
+
+Since those early days, Python also improved its memory management
+capabilities, and began providing
+various :ref:`management policies <memoryoverview>` beginning in version
+3.4. These routines are divided into a set of domains; each domain has a
+:c:type:`PyMemAllocatorEx` structure of routines for memory management. Python also
+added a `tracemalloc` module to trace calls to the various routines. These
+tracking hooks were added to the NumPy ``PyDataMem_*`` routines.
+
+NumPy added a small cache of allocated memory in its internal
+``npy_alloc_cache``, ``npy_alloc_cache_zero``, and ``npy_free_cache``
+functions. These wrap ``alloc``, ``alloc-and-memset(0)`` and ``free``
+respectively, but when ``npy_free_cache`` is called, it adds the pointer to a
+short list of available blocks marked by size. These blocks can be re-used by
+subsequent calls to ``npy_alloc*``, avoiding memory thrashing.
+
+Configurable memory routines in NumPy (NEP 49)
+----------------------------------------------
+
+Users may wish to override the internal data memory routines with ones of their
+own. Since NumPy does not use the Python domain strategy to manage data memory,
+it provides an alternative set of C-APIs to change memory routines. There are
+no Python domain-wide strategies for large chunks of object data, so those are
+less suited to NumPy's needs. Users who wish to change the NumPy data memory
+management routines can use :c:func:`PyDataMem_SetHandler`, which uses a
+:c:type:`PyDataMem_Handler` structure to hold pointers to functions used to
+manage the data memory. The calls are still wrapped by internal routines to
+call :c:func:`PyTraceMalloc_Track`, :c:func:`PyTraceMalloc_Untrack`, and will
+use the :c:func:`PyDataMem_EventHookFunc` mechanism. Since the functions may
+change during the lifetime of the process, each ``ndarray`` carries with it the
+functions used at the time of its instantiation, and these will be used to
+reallocate or free the data memory of the instance.
+
+.. c:type:: PyDataMem_Handler
+
+ A struct to hold function pointers used to manipulate memory
+
+ .. code-block:: c
+
+ typedef struct {
+ char name[128]; /* multiple of 64 to keep the struct aligned */
+ PyDataMemAllocator allocator;
+ } PyDataMem_Handler;
+
+ where the allocator structure is
+
+ .. code-block:: c
+
+ /* The declaration of free differs from PyMemAllocatorEx */
+ typedef struct {
+ void *ctx;
+ void* (*malloc) (void *ctx, size_t size);
+ void* (*calloc) (void *ctx, size_t nelem, size_t elsize);
+ void* (*realloc) (void *ctx, void *ptr, size_t new_size);
+ void (*free) (void *ctx, void *ptr, size_t size);
+ } PyDataMemAllocator;
+
+.. c:function:: PyObject * PyDataMem_SetHandler(PyObject *handler)
+
+ Set a new allocation policy. If the input value is ``NULL``, will reset the
+ policy to the default. Return the previous policy, or
+ return ``NULL`` if an error has occurred. We wrap the user-provided functions
+so they will still call the Python and NumPy memory management callback
+hooks.
+
+.. c:function:: PyObject * PyDataMem_GetHandler()
+
+ Return the current policy that will be used to allocate data for the
+ next ``PyArrayObject``. On failure, return ``NULL``.
+
+For an example of setting up and using the PyDataMem_Handler, see the test in
+:file:`numpy/core/tests/test_mem_policy.py`.
+
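A minimal sketch of installing a custom policy, matching the struct layout
documented above. It assumes the handler is passed to
:c:func:`PyDataMem_SetHandler` wrapped in a ``PyCapsule`` named
``"mem_handler"``, as done in ``test_mem_policy``; the pass-through allocator
here is illustrative only:

.. code-block:: c

    #include <Python.h>
    #include <numpy/arrayobject.h>
    #include <stdlib.h>

    /* A pass-through allocator; a real policy might add pooling or tracking. */
    static void *my_malloc(void *ctx, size_t size) { return malloc(size); }
    static void *my_calloc(void *ctx, size_t nelem, size_t elsize) {
        return calloc(nelem, elsize);
    }
    static void *my_realloc(void *ctx, void *ptr, size_t new_size) {
        return realloc(ptr, new_size);
    }
    static void my_free(void *ctx, void *ptr, size_t size) { free(ptr); }

    static PyDataMem_Handler my_handler = {
        "my_handler",
        {NULL, my_malloc, my_calloc, my_realloc, my_free},
    };

    static int
    install_policy(void)
    {
        PyObject *capsule = PyCapsule_New(&my_handler, "mem_handler", NULL);
        if (capsule == NULL) {
            return -1;
        }
        /* the previous policy is returned; drop our reference to it */
        PyObject *old = PyDataMem_SetHandler(capsule);
        Py_DECREF(capsule);
        if (old == NULL) {
            return -1;
        }
        Py_DECREF(old);
        return 0;
    }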
+.. c:function:: void PyDataMem_EventHookFunc(void *inp, void *outp, size_t size, void *user_data);
+
+ This function will be called during data memory manipulation.
+
+.. c:function:: PyDataMem_EventHookFunc * PyDataMem_SetEventHook(PyDataMem_EventHookFunc *newhook, void *user_data, void **old_data)
+
+ Sets the allocation event hook for numpy array data.
+
+ Returns a pointer to the previous hook or ``NULL``. If old_data is
+ non-``NULL``, the previous user_data pointer will be copied to it.
+
+ If not ``NULL``, the hook will be called at the end of each ``PyDataMem_NEW/FREE/RENEW``:
+
+ .. code-block:: c
+
+ result = PyDataMem_NEW(size) -> (*hook)(NULL, result, size, user_data)
+ PyDataMem_FREE(ptr) -> (*hook)(ptr, NULL, 0, user_data)
+ result = PyDataMem_RENEW(ptr, size) -> (*hook)(ptr, result, size, user_data)
+
+ When the hook is called, the GIL will be held by the calling
+ thread. The hook should be written to be reentrant if it performs
+ operations that might cause new allocation events (such as the
+ creation/destruction of numpy objects, or creating/destroying Python
+ objects which might cause a gc).
+
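A minimal event-hook sketch following the calling pattern above (``log_hook``
and ``install_hook`` are hypothetical names). Note it avoids creating Python
or NumPy objects, per the reentrancy caveat:

.. code-block:: c

    #include <Python.h>
    #include <numpy/arrayobject.h>
    #include <stdio.h>

    /* inp/outp are the old/new data pointers; inp is NULL for an
       allocation and outp is NULL for a free. */
    static void
    log_hook(void *inp, void *outp, size_t size, void *user_data)
    {
        fprintf((FILE *)user_data, "data event: %p -> %p (%zu bytes)\n",
                inp, outp, size);
    }

    static void
    install_hook(FILE *log)
    {
        void *old_data = NULL;
        PyDataMem_EventHookFunc *old_hook =
            PyDataMem_SetEventHook(log_hook, log, &old_data);
        (void)old_hook;  /* keep if the previous hook must be restored later */
    }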
+What happens when deallocating if there is no policy set
+--------------------------------------------------------
+
+A rare but useful technique is to allocate a buffer outside NumPy, use
+:c:func:`PyArray_NewFromDescr` to wrap the buffer in an ``ndarray``, then switch
+the ``OWNDATA`` flag to true. When the ``ndarray`` is released, the
+appropriate function from the ``ndarray``'s ``PyDataMem_Handler`` should be
+called to free the buffer. But since the ``PyDataMem_Handler`` field was never
+set, it will be ``NULL``. For backward compatibility, NumPy will call ``free()``
+to release the buffer. If ``NUMPY_WARN_IF_NO_MEM_POLICY`` is set to ``1``, a
+warning will be emitted. The current default is not to emit a warning; this may
+change in a future version of NumPy.
+
+A better technique would be to use a ``PyCapsule`` as a base object:
+
+.. code-block:: c
+
+   /* define a PyCapsule_Destructor, using the correct deallocator for buff */
+   void free_wrap(PyObject *capsule){
+       void *obj = PyCapsule_GetPointer(capsule, PyCapsule_GetName(capsule));
+       free(obj);
+   }
+
+ /* then inside the function that creates arr from buff */
+ ...
+ arr = PyArray_NewFromDescr(... buf, ...);
+ if (arr == NULL) {
+ return NULL;
+ }
+ capsule = PyCapsule_New(buf, "my_wrapped_buffer",
+ (PyCapsule_Destructor)&free_wrap);
+ if (PyArray_SetBaseObject(arr, capsule) == -1) {
+ Py_DECREF(arr);
+ return NULL;
+ }
+ ...
diff --git a/doc/source/reference/c-api/index.rst b/doc/source/reference/c-api/index.rst
index bb1ed154e..6288ff33b 100644
--- a/doc/source/reference/c-api/index.rst
+++ b/doc/source/reference/c-api/index.rst
@@ -49,3 +49,4 @@ code.
generalized-ufuncs
coremath
deprecations
+ data_memory
diff --git a/doc/source/reference/c-api/iterator.rst b/doc/source/reference/c-api/iterator.rst
index 2208cdd2f..83644d8b2 100644
--- a/doc/source/reference/c-api/iterator.rst
+++ b/doc/source/reference/c-api/iterator.rst
@@ -1230,7 +1230,7 @@ Functions For Iteration
.. c:function:: npy_intp* NpyIter_GetIndexPtr(NpyIter* iter)
This gives back a pointer to the index being tracked, or NULL
- if no index is being tracked. It is only useable if one of
+ if no index is being tracked. It is only usable if one of
the flags :c:data:`NPY_ITER_C_INDEX` or :c:data:`NPY_ITER_F_INDEX`
were specified during construction.
diff --git a/doc/source/reference/c-api/types-and-structures.rst b/doc/source/reference/c-api/types-and-structures.rst
index 39a17cc72..605a4ae71 100644
--- a/doc/source/reference/c-api/types-and-structures.rst
+++ b/doc/source/reference/c-api/types-and-structures.rst
@@ -94,7 +94,7 @@ PyArray_Type and PyArrayObject
PyArray_Descr *descr;
int flags;
PyObject *weakreflist;
- /* version dependend private members */
+ /* version dependent private members */
} PyArrayObject;
.. c:macro:: PyObject_HEAD
@@ -178,7 +178,7 @@ PyArray_Type and PyArrayObject
.. note::
- Further members are considered private and version dependend. If the size
+ Further members are considered private and version dependent. If the size
of the struct is important for your code, special care must be taken.
A possible use-case when this is relevant is subclassing in C.
If your code relies on ``sizeof(PyArrayObject)`` to be constant,
@@ -286,48 +286,54 @@ PyArrayDescr_Type and PyArray_Descr
array like behavior. Each bit in this member is a flag which are named
as:
- .. c:macro:: NPY_ITEM_REFCOUNT
+..
+ dedented to allow internal linking, pending a refactoring
- Indicates that items of this data-type must be reference
- counted (using :c:func:`Py_INCREF` and :c:func:`Py_DECREF` ).
+.. c:macro:: NPY_ITEM_REFCOUNT
+
+ Indicates that items of this data-type must be reference
+ counted (using :c:func:`Py_INCREF` and :c:func:`Py_DECREF` ).
.. c:macro:: NPY_ITEM_HASOBJECT
Same as :c:data:`NPY_ITEM_REFCOUNT`.
- .. c:macro:: NPY_LIST_PICKLE
+..
+ dedented to allow internal linking, pending a refactoring
+
+.. c:macro:: NPY_LIST_PICKLE
- Indicates arrays of this data-type must be converted to a list
- before pickling.
+ Indicates arrays of this data-type must be converted to a list
+ before pickling.
- .. c:macro:: NPY_ITEM_IS_POINTER
+.. c:macro:: NPY_ITEM_IS_POINTER
- Indicates the item is a pointer to some other data-type
+ Indicates the item is a pointer to some other data-type
- .. c:macro:: NPY_NEEDS_INIT
+.. c:macro:: NPY_NEEDS_INIT
- Indicates memory for this data-type must be initialized (set
- to 0) on creation.
+ Indicates memory for this data-type must be initialized (set
+ to 0) on creation.
- .. c:macro:: NPY_NEEDS_PYAPI
+.. c:macro:: NPY_NEEDS_PYAPI
- Indicates this data-type requires the Python C-API during
- access (so don't give up the GIL if array access is going to
- be needed).
+ Indicates this data-type requires the Python C-API during
+ access (so don't give up the GIL if array access is going to
+ be needed).
- .. c:macro:: NPY_USE_GETITEM
+.. c:macro:: NPY_USE_GETITEM
- On array access use the ``f->getitem`` function pointer
- instead of the standard conversion to an array scalar. Must
- use if you don't define an array scalar to go along with
- the data-type.
+ On array access use the ``f->getitem`` function pointer
+ instead of the standard conversion to an array scalar. Must
+ use if you don't define an array scalar to go along with
+ the data-type.
- .. c:macro:: NPY_USE_SETITEM
+.. c:macro:: NPY_USE_SETITEM
- When creating a 0-d array from an array scalar use
- ``f->setitem`` instead of the standard copy from an array
- scalar. Must use if you don't define an array scalar to go
- along with the data-type.
+ When creating a 0-d array from an array scalar use
+ ``f->setitem`` instead of the standard copy from an array
+ scalar. Must use if you don't define an array scalar to go
+ along with the data-type.
.. c:macro:: NPY_FROM_FIELDS
@@ -961,8 +967,8 @@ PyUFunc_Type and PyUFuncObject
.. deprecated:: 1.22
Some fallback support for this slot exists, but will be removed
- eventually. A univiersal function which relied on this will have
- eventually have to be ported.
+ eventually. A universal function that relied on this will
+ have to be ported eventually.
See :ref:`NEP 41 <NEP41>` and :ref:`NEP 43 <NEP43>`.
.. c:member:: void *reserved2
@@ -989,14 +995,17 @@ PyUFunc_Type and PyUFuncObject
For each distinct core dimension, a set of ``UFUNC_CORE_DIM*`` flags
- .. c:macro:: UFUNC_CORE_DIM_CAN_IGNORE
+..
+ dedented to allow internal linking, pending a refactoring
+
+.. c:macro:: UFUNC_CORE_DIM_CAN_IGNORE
- if the dim name ends in ``?``
+ if the dim name ends in ``?``
- .. c:macro:: UFUNC_CORE_DIM_SIZE_INFERRED
+.. c:macro:: UFUNC_CORE_DIM_SIZE_INFERRED
- if the dim size will be determined from the operands
- and not from a :ref:`frozen <frozen>` signature
+ if the dim size will be determined from the operands
+ and not from a :ref:`frozen <frozen>` signature
.. c:member:: PyObject *identity_value
diff --git a/doc/source/reference/global_state.rst b/doc/source/reference/global_state.rst
index f18481235..20874ceaa 100644
--- a/doc/source/reference/global_state.rst
+++ b/doc/source/reference/global_state.rst
@@ -84,3 +84,13 @@ contiguous in memory.
Most users will have no reason to change these; for details
see the :ref:`memory layout <memory-layout>` documentation.
+
+Warn if no memory allocation policy when deallocating data
+----------------------------------------------------------
+
+Some users might pass ownership of the data pointer to the ``ndarray`` by
+setting the ``OWNDATA`` flag. If they do this without setting (manually) a
+memory allocation policy, the default will be to call ``free``. If
+``NUMPY_WARN_IF_NO_MEM_POLICY`` is set to ``"1"``, a ``RuntimeWarning`` will
+be emitted. A better alternative is to use a ``PyCapsule`` with a deallocator
+and set it as the ``ndarray.base``.
diff --git a/doc/source/reference/index.rst b/doc/source/reference/index.rst
index f12d923df..a18211cca 100644
--- a/doc/source/reference/index.rst
+++ b/doc/source/reference/index.rst
@@ -26,7 +26,6 @@ For learning how to use NumPy, see the :ref:`complete documentation <numpy_docs_
distutils
distutils_guide
c-api/index
- internals
simd/simd-optimizations
swig
diff --git a/doc/source/reference/internals.code-explanations.rst b/doc/source/reference/internals.code-explanations.rst
index e8e428f2e..d34314610 100644
--- a/doc/source/reference/internals.code-explanations.rst
+++ b/doc/source/reference/internals.code-explanations.rst
@@ -1,618 +1,9 @@
-.. currentmodule:: numpy
+:orphan:
*************************
NumPy C Code Explanations
*************************
- Fanaticism consists of redoubling your efforts when you have forgotten
- your aim.
- --- *George Santayana*
+.. This document has been moved to ../dev/internals.code-explanations.rst.
- An authority is a person who can tell you more about something than
- you really care to know.
- --- *Unknown*
-
-This chapter attempts to explain the logic behind some of the new
-pieces of code. The purpose behind these explanations is to enable
-somebody to understand the ideas behind the implementation
-somewhat more easily than just staring at the code. Perhaps in this
-way, the algorithms can be improved on, borrowed from, and/or
-optimized by more people.
-
-
-Memory model
-============
-
-.. index::
- pair: ndarray; memory model
-
-One fundamental aspect of the ndarray is that an array is seen as a
-"chunk" of memory starting at some location. The interpretation of
-this memory depends on the stride information. For each dimension in
-an :math:`N` -dimensional array, an integer (stride) dictates how many
-bytes must be skipped to get to the next element in that dimension.
-Unless you have a single-segment array, this stride information must
-be consulted when traversing through an array. It is not difficult to
-write code that accepts strides; you just have to use (char \*)
-pointers because strides are in units of bytes. Keep in mind also that
-strides do not have to be unit-multiples of the element size. Also,
-remember that if the number of dimensions of the array is 0 (sometimes
-called a rank-0 array), then the strides and dimensions variables are
-NULL.
-
-Besides the structural information contained in the strides and
-dimensions members of the :c:type:`PyArrayObject`, the flags contain
-important information about how the data may be accessed. In particular,
-the :c:data:`NPY_ARRAY_ALIGNED` flag is set when the memory is on a
-suitable boundary according to the data-type array. Even if you have
-a contiguous chunk of memory, you cannot just assume it is safe to
-dereference a data-type-specific pointer to an element. Only if the
-:c:data:`NPY_ARRAY_ALIGNED` flag is set is this a safe operation (on
-some platforms it will work but on others, like Solaris, it will cause
-a bus error). The :c:data:`NPY_ARRAY_WRITEABLE` flag should also be checked
-if you plan on writing to the memory area of the array. It is also
-possible to obtain a pointer to an unwritable memory area. Sometimes,
-writing to the memory area when the :c:data:`NPY_ARRAY_WRITEABLE` flag is not
-set will just be rude. Other times it can cause program crashes ( *e.g.*
-a data-area that is a read-only memory-mapped file).
-
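As a hedged illustration of the traversal just described (not NumPy source
code), here is how a C extension might sum a 2-D ``NPY_DOUBLE`` array by
consulting the strides, assuming the ``ALIGNED`` flag is set so the
``double*`` dereference is safe:

.. code-block:: c

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Walk a 2-D double array using char* arithmetic, since strides
       are in units of bytes. */
    static double
    sum2d(PyArrayObject *arr)
    {
        char *row = PyArray_BYTES(arr);
        npy_intp *shape = PyArray_DIMS(arr);
        npy_intp *strides = PyArray_STRIDES(arr);
        double total = 0.0;

        for (npy_intp i = 0; i < shape[0]; i++, row += strides[0]) {
            char *item = row;
            for (npy_intp j = 0; j < shape[1]; j++, item += strides[1]) {
                total += *(double *)item;
            }
        }
        return total;
    }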
-
-Data-type encapsulation
-=======================
-
-.. index::
- single: dtype
-
-The data-type is an important abstraction of the ndarray. Operations
-will look to the data-type to provide the key functionality that is
-needed to operate on the array. This functionality is provided in the
-list of function pointers pointed to by the 'f' member of the
-:c:type:`PyArray_Descr` structure. In this way, the number of data-types can be
-extended simply by providing a :c:type:`PyArray_Descr` structure with suitable
-function pointers in the 'f' member. For built-in types there are some
-optimizations that bypass this mechanism, but the point of the
-data-type abstraction is to allow new data-types to be added.
-
-One of the built-in data-types, the void data-type, allows for
-arbitrary structured types containing 1 or more fields as elements of the
-array. A field is simply another data-type object along with an offset
-into the current structured type. In order to support arbitrarily nested
-fields, several recursive implementations of data-type access are
-implemented for the void type. A common idiom is to cycle through the
-elements of the dictionary and perform a specific operation based on
-the data-type object stored at the given offset. These offsets can be
-arbitrary numbers. Therefore, the possibility of encountering
-misaligned data must be recognized and taken into account if necessary.
-
-
-N-D Iterators
-=============
-
-.. index::
- single: array iterator
-
-A very common operation in much of NumPy code is the need to iterate
-over all the elements of a general, strided, N-dimensional array. This
-operation of a general-purpose N-dimensional loop is abstracted in the
-notion of an iterator object. To write an N-dimensional loop, you only
-have to create an iterator object from an ndarray, work with the
-dataptr member of the iterator object structure and call the macro
-:c:func:`PyArray_ITER_NEXT` (it) on the iterator object to move to the next
-element. The "next" element is always in C-contiguous order. The macro
-works by first special casing the C-contiguous, 1-D, and 2-D cases
-which work very simply.
-
-For the general case, the iteration works by keeping track of a list
-of coordinate counters in the iterator object. At each iteration, the
-last coordinate counter (which starts at 0) is examined. If this
-counter is smaller than one less than the size of the array in that
-dimension (a pre-computed and stored value), then the counter is
-increased and the dataptr member is increased by the strides in that
-dimension and the macro ends. If the end of a dimension is reached,
-the counter for the last dimension is reset to zero and the dataptr is
-moved back to the beginning of that dimension by subtracting the
-strides value times one less than the number of elements in that
-dimension (this is also pre-computed and stored in the backstrides
-member of the iterator object). In this case, the macro does not end,
-but a local dimension counter is decremented so that the next-to-last
-dimension replaces the role that the last dimension played and the
-previously-described tests are executed again on the next-to-last
-dimension. In this way, the dataptr is adjusted appropriately for
-arbitrary striding.
-
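The counter/backstrides logic above can be sketched as a stand-alone C
function (an illustration of the algorithm, not the actual
:c:func:`PyArray_ITER_NEXT` macro):

.. code-block:: c

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Advance an N-dimensional "odometer": coords, dims, strides and
       backstrides all have length nd; dataptr moves in bytes.
       backstrides[i] == strides[i] * (dims[i] - 1). */
    static void
    iter_next(int nd, npy_intp *coords, const npy_intp *dims,
              const npy_intp *strides, const npy_intp *backstrides,
              char **dataptr)
    {
        for (int i = nd - 1; i >= 0; i--) {
            if (coords[i] < dims[i] - 1) {
                coords[i]++;
                *dataptr += strides[i];   /* step within this dimension */
                return;
            }
            coords[i] = 0;                /* reset, carry into the next dim */
            *dataptr -= backstrides[i];   /* rewind to start of this dim */
        }
    }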
-The coordinates member of the :c:type:`PyArrayIterObject` structure maintains
-the current N-d counter unless the underlying array is C-contiguous in
-which case the coordinate counting is by-passed. The index member of
-the :c:type:`PyArrayIterObject` keeps track of the current flat index of the
-iterator. It is updated by the :c:func:`PyArray_ITER_NEXT` macro.
-
-
-Broadcasting
-============
-
-.. index::
- single: broadcasting
-
-In Numeric, the ancestor of Numpy, broadcasting was implemented in several
-lines of code buried deep in ufuncobject.c. In NumPy, the notion of broadcasting
-has been abstracted so that it can be performed in multiple places.
-Broadcasting is handled by the function :c:func:`PyArray_Broadcast`. This
-function requires a :c:type:`PyArrayMultiIterObject` (or something that is a
-binary equivalent) to be passed in. The :c:type:`PyArrayMultiIterObject` keeps
-track of the broadcast number of dimensions and size in each
-dimension along with the total size of the broadcast result. It also
-keeps track of the number of arrays being broadcast and a pointer to
-an iterator for each of the arrays being broadcast.
-
-The :c:func:`PyArray_Broadcast` function takes the iterators that have already
-been defined and uses them to determine the broadcast shape in each
-dimension (to create the iterators at the same time that broadcasting
-occurs, use the :c:func:`PyArray_MultiIterNew` function).
-Then, the iterators are
-adjusted so that each iterator thinks it is iterating over an array
-with the broadcast size. This is done by adjusting the iterators'
-number of dimensions and the shape in each dimension. This works
-because the iterator strides are also adjusted. Broadcasting only
-adjusts (or adds) length-1 dimensions. For these dimensions, the
-strides variable is simply set to 0 so that the data-pointer for the
-iterator over that array doesn't move as the broadcasting operation
-operates over the extended dimension.
-
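The 0-stride trick can be demonstrated without NumPy at all; this
self-contained C sketch broadcasts a conceptual (3, 1) column against a
(1, 4) row by giving the length-1 axes a stride of 0:

.. code-block:: c

    #include <stddef.h>
    #include <stdio.h>

    int main(void)
    {
        double col[3] = {1, 2, 3};          /* conceptually shape (3, 1) */
        double row[4] = {10, 20, 30, 40};   /* conceptually shape (1, 4) */
        /* broadcast to (3, 4): stride 0 freezes the pointer on that axis */
        ptrdiff_t cs[2] = {sizeof(double), 0};
        ptrdiff_t rs[2] = {0, sizeof(double)};

        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 4; j++) {
                double a = *(double *)((char *)col + i*cs[0] + j*cs[1]);
                double b = *(double *)((char *)row + i*rs[0] + j*rs[1]);
                printf("%6.1f", a + b);
            }
            printf("\n");
        }
        return 0;
    }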
-Broadcasting was always implemented in Numeric using 0-valued strides
-for the extended dimensions. It is done in exactly the same way in
-NumPy. The big difference is that now the array of strides is kept
-track of in a :c:type:`PyArrayIterObject`, the iterators involved in a
-broadcast result are kept track of in a :c:type:`PyArrayMultiIterObject`,
-and the :c:func:`PyArray_Broadcast` call implements the broadcasting rules.
-
-
-Array Scalars
-=============
-
-.. index::
- single: array scalars
-
-The array scalars offer a hierarchy of Python types that allow a
-one-to-one correspondence between the data-type stored in an array and the
-Python-type that is returned when an element is extracted from the
-array. An exception to this rule was made with object arrays. Object
-arrays are heterogeneous collections of arbitrary Python objects. When
-you select an item from an object array, you get back the original
-Python object (and not an object array scalar which does exist but is
-rarely used for practical purposes).
-
-The array scalars also offer the same methods and attributes as arrays
-with the intent that the same code can be used to support arbitrary
-dimensions (including 0-dimensions). The array scalars are read-only
-(immutable) with the exception of the void scalar which can also be
-written to so that structured array field setting works more naturally
-(``a[0]['f1'] = value``).
-
-
-Indexing
-========
-
-.. index::
- single: indexing
-
-All python indexing operations ``arr[index]`` are organized by first preparing
-the index and finding the index type. The supported index types are:
-
-* integer
-* newaxis
-* slice
-* ellipsis
-* integer arrays/array-likes (fancy)
-* boolean (single boolean array); if there is more than one boolean array as
- index or the shape does not match exactly, the boolean array will be
- converted to an integer array instead.
-* 0-d boolean (and also integer); 0-d boolean arrays are a special
- case which has to be handled in the advanced indexing code. They signal
- that a 0-d boolean array had to be interpreted as an integer array.
-
-There is also the scalar array special case, which signals that an integer
-array was interpreted as an integer index. This is important because an integer
-array index forces a copy but is ignored if a scalar is returned (full integer
-index). The prepared index is guaranteed to be valid with the exception of
-out of bound values and broadcasting errors for advanced indexing. This
-includes that an ellipsis is added for incomplete indices, for example, when
-a two-dimensional array is indexed with a single integer.
-
-The next step depends on the type of index which was found. If all
-dimensions are indexed with an integer a scalar is returned or set. A
-single boolean indexing array will call specialized boolean functions.
-Indices containing an ellipsis or slice but no advanced indexing will
-always create a view into the old array by calculating the new strides and
-memory offset. This view can then either be returned or, for assignments,
-filled using :c:func:`PyArray_CopyObject`. Note that `PyArray_CopyObject`
-may also be called on temporary arrays in other branches to support
-complicated assignments when the array is of object dtype.
-
-Advanced indexing
------------------
-
-By far the most complex case is advanced indexing, which may or may not be
-combined with typical view based indexing. Here integer indices are
-interpreted as view based. Before trying to understand this, you may want
-to make yourself familiar with its subtleties. The advanced indexing code
-has three different branches and one special case:
-
-* There is one indexing array and it, as well as the assignment array, can
- be iterated trivially. For example they may be contiguous. Also the
- indexing array must be of `intp` type and the value array in assignments
- should be of the correct type. This is purely a fast path.
-* There are only integer array indices so that no subarray exists.
-* View based and advanced indexing is mixed. In this case the view based
- indexing defines a collection of subarrays that are combined by the
- advanced indexing. For example, ``arr[[1, 2, 3], :]`` is created by
- vertically stacking the subarrays ``arr[1, :]``, ``arr[2,:]``, and
- ``arr[3, :]``.
-* There is a subarray but it has exactly one element. This case can be handled
- as if there is no subarray, but needs some care during setup.
-
-Deciding what case applies, checking broadcasting, and determining the kind
-of transposition needed are all done in `PyArray_MapIterNew`. After setting
-up, there are two cases. If there is no subarray or it only has one
-element, no subarray iteration is necessary and an iterator is prepared
-which iterates all indexing arrays *as well as* the result or value array.
-If there is a subarray, there are three iterators prepared. One for the
-indexing arrays, one for the result or value array (minus its subarray),
-and one for the subarrays of the original and the result/assignment array.
-The first two iterators give (or allow calculation of) the pointers into
-the start of the subarray, which then allows restarting the subarray
-iteration.
-
-When advanced indices are next to each other, transposing may be necessary.
-All necessary transposing is handled by :c:func:`PyArray_MapIterSwapAxes` and
-has to be handled by the caller unless `PyArray_MapIterNew` is asked to
-allocate the result.
-
-After preparation, getting and setting is relatively straightforward,
-although the different modes of iteration need to be considered. Unless
-there is only a single indexing array during item getting, the validity of
-the indices is checked beforehand. Otherwise it is handled in the inner
-loop itself for optimization.
-
-
-Universal Functions
-===================
-
-.. index::
- single: ufunc
-
-Universal functions are callable objects that take :math:`N` inputs
-and produce :math:`M` outputs by wrapping basic 1-D loops that work
-element-by-element into full easy-to-use functions that seamlessly
-implement broadcasting, type-checking and buffered coercion, and
-output-argument handling. New universal functions are normally created
-in C, although there is a mechanism for creating ufuncs from Python
-functions (:func:`frompyfunc`). The user must supply a 1-D loop that
-implements the basic function taking the input scalar values and
-placing the resulting scalars into the appropriate output slots as
-explained in implementation.
-
-
-Setup
------
-
-Every ufunc calculation involves some overhead related to setting up
-the calculation. The practical significance of this overhead is that
-even though the actual calculation of the ufunc is very fast, you will
-be able to write array and type-specific code that will work faster
-for small arrays than the ufunc. In particular, using ufuncs to
-perform many calculations on 0-D arrays will be slower than other
-Python-based solutions (the silently-imported scalarmath module exists
-precisely to give array scalars the look-and-feel of ufunc based
-calculations with significantly reduced overhead).
-
-When a ufunc is called, many things must be done. The information
-collected from these setup operations is stored in a loop-object. This
-loop object is a C-structure (that could become a Python object but is
-not initialized as such because it is only used internally). This loop
-object has the layout needed to be used with PyArray_Broadcast so that
-the broadcasting can be handled in the same way as it is handled in
-other sections of code.
-
-The first thing done is to look-up in the thread-specific global
-dictionary the current values for the buffer-size, the error mask, and
-the associated error object. The state of the error mask controls what
-happens when an error condition is found. It should be noted that
-checking of the hardware error flags is only performed after each 1-D
-loop is executed. This means that if the input and output arrays are
-contiguous and of the correct type so that a single 1-D loop is
-performed, then the flags may not be checked until all elements of the
-array have been calculated. Looking up these values in a
-thread-specific dictionary takes time which is easily ignored for all but
-very small arrays.
-
-After checking the thread-specific global variables, the inputs are
-evaluated to determine how the ufunc should proceed and the input and
-output arrays are constructed if necessary. Any inputs which are not
-arrays are converted to arrays (using context if necessary). Which of
-the inputs are scalars (and therefore converted to 0-D arrays) is
-noted.
-
-Next, an appropriate 1-D loop is selected from the 1-D loops available
-to the ufunc based on the input array types. This 1-D loop is selected
-by trying to match the signature of the data-types of the inputs
-against the available signatures. The signatures corresponding to
-built-in types are stored in the types member of the ufunc structure.
-The signatures corresponding to user-defined types are stored in a
-linked-list of function-information with the head element stored as a
-``CObject`` in the userloops dictionary keyed by the data-type number
-(the first user-defined type in the argument list is used as the key).
-The signatures are searched until a signature is found to which the
-input arrays can all be cast safely (ignoring any scalar arguments
-which are not allowed to determine the type of the result). The
-implication of this search procedure is that "lesser types" should be
-placed below "larger types" when the signatures are stored. If no 1-D
-loop is found, then an error is reported. Otherwise, the argument_list
-is updated with the stored signature --- in case casting is necessary
-and to fix the output types assumed by the 1-D loop.
-
-If the ufunc has 2 inputs and 1 output and the second input is an
-Object array then a special-case check is performed so that
-NotImplemented is returned if the second input is not an ndarray, has
-the __array_priority\__ attribute, and has an __r{op}\__ special
-method. In this way, Python is signaled to give the other object a
-chance to complete the operation instead of using generic object-array
-calculations. This allows (for example) sparse matrices to override
-the multiplication operator 1-D loop.
-
-For input arrays that are smaller than the specified buffer size,
-copies are made of all non-contiguous, mis-aligned, or
-out-of-byteorder arrays to ensure that for small arrays, a single loop is
-used. Then, array iterators are created for all the input arrays and
-the resulting collection of iterators is broadcast to a single shape.
-
-The output arguments (if any) are then processed and any missing
-return arrays are constructed. If any provided output array doesn't
-have the correct type (or is mis-aligned) and is smaller than the
-buffer size, then a new output array is constructed with the special
-:c:data:`NPY_ARRAY_WRITEBACKIFCOPY` flag set. At the end of the function,
-:c:func:`PyArray_ResolveWritebackIfCopy` is called so that
-its contents will be copied back into the output array.
-Iterators for the output arguments are then processed.
-
-Finally, the decision is made about how to execute the looping
-mechanism to ensure that all elements of the input arrays are combined
-to produce the output arrays of the correct type. The options for loop
-execution are one-loop (for contiguous, aligned, and correct data
-type), strided-loop (for non-contiguous but still aligned and correct
-data type), and a buffered loop (for mis-aligned or incorrect data
-type situations). Depending on which execution method is called for,
-the loop is then set up and computed.
-
-
-Function call
--------------
-
-This section describes how the basic universal function computation loop is
-set up and executed for each of the three different kinds of execution. If
-:c:data:`NPY_ALLOW_THREADS` is defined during compilation, then as long as
-no object arrays are involved, the Python Global Interpreter Lock (GIL) is
-released prior to calling the loops. It is re-acquired if necessary to
-handle error conditions. The hardware error flags are checked only after
-the 1-D loop is completed.
-
-
-One Loop
-^^^^^^^^
-
-This is the simplest case of all. The ufunc is executed by calling the
-underlying 1-D loop exactly once. This is possible only when we have
-aligned data of the correct type (including byte-order) for both input
-and output and all arrays have uniform strides (either contiguous,
-0-D, or 1-D). In this case, the 1-D computational loop is called once
-to compute the calculation for the entire array. Note that the
-hardware error flags are only checked after the entire calculation is
-complete.
-
-
-Strided Loop
-^^^^^^^^^^^^
-
-When the input and output arrays are aligned and of the correct type,
-but the striding is not uniform (non-contiguous and 2-D or larger),
-then a second looping structure is employed for the calculation. This
-approach converts all of the iterators for the input and output
-arguments to iterate over all but the largest dimension. The inner
-loop is then handled by the underlying 1-D computational loop. The
-outer loop is a standard iterator loop on the converted iterators. The
-hardware error flags are checked after each 1-D loop is completed.
-
-
-Buffered Loop
-^^^^^^^^^^^^^
-
-This is the code that handles the situation whenever the input and/or
-output arrays are either misaligned or of the wrong data-type
-(including being byte-swapped) from what the underlying 1-D loop
-expects. The arrays are also assumed to be non-contiguous. The code
-works very much like the strided-loop except that the inner 1-D loop is
-modified so that pre-processing is performed on the inputs and
-post-processing is performed on the outputs in bufsize chunks (where
-bufsize is a user-settable parameter). The underlying 1-D
-computational loop is called on data that is copied over (if it needs
-to be). The setup code and the loop code are considerably more
-complicated in this case because they have to handle:
-
-- memory allocation of the temporary buffers
-
-- deciding whether or not to use buffers on the input and output data
- (mis-aligned and/or wrong data-type)
-
-- copying and possibly casting data for any inputs or outputs for which
- buffers are necessary.
-
-- special-casing Object arrays so that reference counts are properly
- handled when copies and/or casts are necessary.
-
-- breaking up the inner 1-D loop into bufsize chunks (with a possible
- remainder).
-
-Again, the hardware error flags are checked at the end of each 1-D
-loop.
-
-
-Final output manipulation
--------------------------
-
-Ufuncs allow other array-like classes to be passed seamlessly through
-the interface in that inputs of a particular class will induce the
-outputs to be of that same class. The mechanism by which this works is
-the following. If any of the inputs are not ndarrays and define the
-:obj:`~numpy.class.__array_wrap__` method, then the class with the largest
-:obj:`~numpy.class.__array_priority__` attribute determines the type of all the
-outputs (with the exception of any output arrays passed in). The
-:obj:`~numpy.class.__array_wrap__` method of the input array will be called with the
-ndarray being returned from the ufunc as its input. Two
-calling styles of the :obj:`~numpy.class.__array_wrap__` function are supported. The first
-takes the ndarray as the first argument and a tuple of "context" as
-the second argument. The context is (ufunc, arguments, output argument
-number). This is the first call tried. If a TypeError occurs, then the
-function is called with just the ndarray as the first argument.
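-
-A minimal sketch of a class participating in this mechanism (the class name
-and priority value are made up for illustration)::
-
-    import numpy as np
-
-    class MyArray(np.ndarray):
-        __array_priority__ = 1.0      # higher than ndarray's default of 0.0
-
-        def __array_wrap__(self, obj, context=None):
-            # context, when supplied, is (ufunc, arguments, output index)
-            return np.asarray(obj).view(MyArray)
-
-    a = np.arange(3).view(MyArray)
-    print(type(np.add(a, 1)))         # <class '__main__.MyArray'>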
-
-
-Methods
--------
-
-There are three methods of ufuncs that require calculation similar to
-the general-purpose ufuncs. These are reduce, accumulate, and
-reduceat. Each of these methods requires a setup command followed by a
-loop. There are four loop styles possible for the methods
-corresponding to no-elements, one-element, strided-loop, and buffered-
-loop. These are the same basic loop styles as implemented for the
-general-purpose function call, except for the no-element and one-
-element cases, which are special cases occurring when the input array
-objects have 0 and 1 elements respectively.
-
-
-Setup
-^^^^^
-
-The setup function for all three methods is ``construct_reduce``.
-This function creates a reducing loop object and fills it with
-parameters needed to complete the loop. All of the methods only work
-on ufuncs that take two inputs and return one output. Therefore, the
-underlying 1-D loop is selected assuming a signature of [ ``otype``,
-``otype``, ``otype`` ] where ``otype`` is the requested reduction
-data-type. The buffer size and error handling are then retrieved from
-(per-thread) global storage. For small arrays that are mis-aligned or
-have incorrect data-type, a copy is made so that the un-buffered
-section of code is used. Then, the looping strategy is selected. If
-there is 1 element or 0 elements in the array, then a simple looping
-method is selected. If the array is not mis-aligned and has the
-correct data-type, then strided looping is selected. Otherwise,
-buffered looping must be performed. Looping parameters are then
-established, and the return array is constructed. The output array is
-of a different shape depending on whether the method is reduce,
-accumulate, or reduceat. If an output array is already provided, then
-its shape is checked. If the output array is not C-contiguous,
-aligned, and of the correct data type, then a temporary copy is made
-with the WRITEBACKIFCOPY flag set. In this way, the methods will be able
-to work with a well-behaved output array but the result will be copied
-back into the true output array when :c:func:`PyArray_ResolveWritebackIfCopy`
-is called at function completion.
-Finally, iterators are set up to loop over the correct axis
-(depending on the value of axis provided to the method) and the setup
-routine returns to the actual computation routine.
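-
-The shape differences are easy to see from Python; for example, with
-``np.add``::
-
-    import numpy as np
-
-    a = np.arange(6.0).reshape(2, 3)
-    print(np.add.reduce(a, axis=1).shape)            # (2,): axis removed
-    print(np.add.accumulate(a, axis=1).shape)        # (2, 3): same as input
-    print(np.add.reduceat(a, [0, 2], axis=1).shape)  # (2, 2): one per index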
-
-
-Reduce
-^^^^^^
-
-.. index::
- triple: ufunc; methods; reduce
-
-All of the ufunc methods use the same underlying 1-D computational
-loops with input and output arguments adjusted so that the appropriate
-reduction takes place. For example, the key to the functioning of
-reduce is that the 1-D loop is called with the output and the second
-input pointing to the same position in memory and both having a step-
-size of 0. The first input is pointing to the input array with a step-
-size given by the appropriate stride for the selected axis. In this
-way, the operation performed is
-
-.. math::
- :nowrap:
-
- \begin{align*}
- o & = & i[0] \\
- o & = & i[k]\textrm{<op>}o\quad k=1\ldots N
- \end{align*}
-
-where :math:`N+1` is the number of elements in the input, :math:`i`,
-:math:`o` is the output, and :math:`i[k]` is the
-:math:`k^{\textrm{th}}` element of :math:`i` along the selected axis.
-This basic operation is repeated for arrays with greater than 1
-dimension so that the reduction takes place for every 1-D sub-array
-along the selected axis. An iterator with the selected dimension
-removed handles this looping.
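-
-A pure-Python sketch of what the 1-D loop computes for a single reduction
-(a commutative operation is used, so the operand order in the formula does
-not matter)::
-
-    import numpy as np
-
-    i = np.array([2.0, 3.0, 5.0, 7.0])
-    o = i[0]
-    for k in range(1, len(i)):
-        o = i[k] + o               # the second input aliases the output
-    print(o)                       # 17.0
-    print(np.add.reduce(i))        # 17.0, the same result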
-
-For buffered loops, care must be taken to copy and cast data before
-the loop function is called because the underlying loop expects
-aligned data of the correct data-type (including byte-order). The
-buffered loop must handle this copying and casting prior to calling
-the loop function on chunks no greater than the user-specified
-bufsize.
-
-
-Accumulate
-^^^^^^^^^^
-
-.. index::
- triple: ufunc; methods; accumulate
-
-The accumulate function is very similar to the reduce function in that
-the second input also points into the output array. The
-difference is that the second input points to memory one stride behind
-the current output pointer. Thus, the operation performed is
-
-.. math::
- :nowrap:
-
- \begin{align*}
- o[0] & = & i[0] \\
- o[k] & = & i[k]\textrm{<op>}o[k-1]\quad k=1\ldots N.
- \end{align*}
-
-The output has the same shape as the input and each 1-D loop operates
-over :math:`N` elements when the shape in the selected axis is :math:`N+1`.
-Again, buffered loops take care to copy and cast the data before
-calling the underlying 1-D computational loop.
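-
-The same recurrence, sketched in pure Python alongside the real method::
-
-    import numpy as np
-
-    i = np.array([1.0, 2.0, 4.0, 8.0])
-    o = np.empty_like(i)
-    o[0] = i[0]
-    for k in range(1, len(i)):
-        o[k] = i[k] + o[k - 1]     # second input trails the output by one
-    print(o)                       # [ 1.  3.  7. 15.]
-    print(np.add.accumulate(i))    # [ 1.  3.  7. 15.], the same result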
-
-
-Reduceat
-^^^^^^^^
-
-.. index::
- triple: ufunc; methods; reduceat
- single: ufunc
-
-The reduceat function is a generalization of both the reduce and
-accumulate functions. It implements a reduce over ranges of the input
-array specified by indices. The extra indices argument is checked to
-be sure that every index is not too large for the input array along
-the selected dimension before the loop calculations take place. The
-loop implementation is handled using code that is very similar to the
-reduce code repeated as many times as there are elements in the
-indices input. In particular: the first input pointer passed to the
-underlying 1-D computational loop points to the input array at the
-correct location indicated by the index array. In addition, the output
-pointer and the second input pointer passed to the underlying 1-D loop
-point to the same position in memory. The size of the 1-D
-computational loop is fixed to be the difference between the current
-index and the next index (when the current index is the last index,
-then the next index is assumed to be the length of the array along the
-selected dimension). In this way, the 1-D loop will implement a reduce
-over the specified indices.
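-
-For example, each index selects a range that is reduced as just described::
-
-    import numpy as np
-
-    a = np.arange(8)
-    idx = [0, 4, 6]
-    # the segments are a[0:4], a[4:6], and a[6:8] (the last index runs to
-    # the end of the axis), each reduced with np.add
-    print(np.add.reduceat(a, idx))   # [ 6  9 13]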
-
-Mis-alignment, or a loop data-type that does not match the input and/or
-output data-type, is handled using buffered code wherein data is
-copied to a temporary buffer and cast to the correct data-type if
-necessary prior to calling the underlying 1-D function. The temporary
-buffers are created in (element) sizes no bigger than the user-
-settable buffer-size value. Thus, the loop must be flexible enough to
-call the underlying 1-D computational loop enough times to complete
-the total calculation in chunks no bigger than the buffer-size.
+This document has been moved to :ref:`c-code-explanations`.
\ No newline at end of file
diff --git a/doc/source/reference/internals.rst b/doc/source/reference/internals.rst
index ed8042c08..7a5e6374c 100644
--- a/doc/source/reference/internals.rst
+++ b/doc/source/reference/internals.rst
@@ -1,168 +1,10 @@
-.. _numpy-internals:
+:orphan:
***************
NumPy internals
***************
-.. toctree::
-
- internals.code-explanations
- alignment
-
-Internal organization of numpy arrays
-=====================================
-
-Understanding a bit about how numpy arrays are handled under the covers helps in understanding numpy better. This section will not go into great detail. Those wishing to understand the full details are referred to Travis Oliphant's book "Guide to NumPy".
-
-NumPy arrays consist of two major components, the raw array data (from now on,
-referred to as the data buffer), and the information about the raw array data.
-The data buffer is typically what people think of as arrays in C or Fortran,
-a contiguous (and fixed) block of memory containing fixed sized data items.
-NumPy also keeps a significant set of metadata that describes how to interpret
-the data in the data buffer. This extra information includes, among other
-things (see the sketch after this list):
-
- 1) The basic data element's size in bytes
- 2) The start of the data within the data buffer (an offset relative to the
- beginning of the data buffer).
- 3) The number of dimensions and the size of each dimension
- 4) The separation between elements for each dimension (the 'stride'). This
- does not have to be a multiple of the element size
- 5) The byte order of the data (which may not be the native byte order)
- 6) Whether the buffer is read-only
- 7) Information (via the dtype object) about the interpretation of the basic
-    data element. The basic data element may be as simple as an int or a float,
- or it may be a compound object (e.g., struct-like), a fixed character field,
- or Python object pointers.
- 8) Whether the array is to be interpreted as C-order or Fortran-order.
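-
-Much of this metadata is visible from Python; a quick illustrative sketch::
-
-    import numpy as np
-
-    a = np.arange(12, dtype=np.int32).reshape(3, 4)
-    print(a.itemsize)       # 4: the basic element's size in bytes
-    print(a.ndim, a.shape)  # 2 (3, 4): dimensions and their sizes
-    print(a.strides)        # (16, 4): byte separation between elements
-    print(a.dtype)          # int32: how each element is interpreted
-    print(a.flags)          # writeability, C/Fortran order, and so on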
-
-This arrangement allows for very flexible use of arrays. One thing that it allows
-is simple changes of the metadata to change the interpretation of the array buffer.
-Changing the byteorder of the array is a simple change involving no rearrangement
-of the data. The shape of the array can be changed very easily without changing
-anything in the data buffer or any data copying at all.
-
-Among other things, this arrangement makes it possible to create a new array
-metadata object that uses the same data buffer, yielding a new view of that
-data buffer with a different interpretation (e.g., different shape, offset,
-byte order, strides, etc.) but sharing the same data bytes. Many operations
-in numpy do just this, such as slicing. Other operations, such as transpose,
-don't move data elements around in the array, but rather change the
-information about the shape and strides so that the indexing of the array
-changes, but the data in the buffer doesn't move.
-
-Typically these new arrays, with their own metadata but the same data buffer,
-are new 'views' into the data buffer. There is a different ndarray object, but
-it uses the same data buffer. This is why it is necessary to force copies
-through use of the ``.copy()`` method if one really wants to make a new and
-independent copy of the data buffer.
-
-New views into arrays mean the object reference counts for the data buffer
-increase. Simply doing away with the original array object will not remove the
-data buffer if other views of it still exist.
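-
-A short sketch of the view/copy distinction described above::
-
-    import numpy as np
-
-    a = np.arange(6)
-    v = a.reshape(2, 3)              # new metadata object, same data buffer
-    print(np.shares_memory(a, v))    # True
-    v[0, 0] = 99
-    print(a[0])                      # 99: the change is visible through both
-
-    c = a.copy()                     # a genuinely independent buffer
-    c[0] = 0
-    print(a[0])                      # still 99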
-
-Multidimensional Array Indexing Order Issues
-============================================
-
-What is the right way to index
-multi-dimensional arrays? Before you jump to conclusions about the one and
-true way to index multi-dimensional arrays, it pays to understand why this is
-a confusing issue. This section will try to explain in detail how numpy
-indexing works and why we adopt the convention we do for images, and when it
-may be appropriate to adopt other conventions.
-
-The first thing to understand is
-that there are two conflicting conventions for indexing 2-dimensional arrays.
-Matrix notation uses the first index to indicate which row is being selected and
-the second index to indicate which column is selected. This is opposite the
-geometrically-oriented convention for images, where people generally think the
-first index represents x position (i.e., column) and the second represents y
-position (i.e., row). This alone is the source of much confusion;
-matrix-oriented users and image-oriented users expect two different things with
-regard to indexing.
-
-The second issue to understand is how indices correspond
-to the order the array is stored in memory. In Fortran the first index is the
-most rapidly varying index when moving through the elements of a two
-dimensional array as it is stored in memory. If you adopt the matrix
-convention for indexing, then this means the matrix is stored one column at a
-time (since the first index moves to the next row as it changes). Thus Fortran
-is considered a Column-major language. C has just the opposite convention. In
-C, the last index changes most rapidly as one moves through the array as
-stored in memory. Thus C is a Row-major language. The matrix is stored by
-rows. Note that in both cases it presumes that the matrix convention for
-indexing is being used, i.e., for both Fortran and C, the first index is the
-row. Note this convention implies that the indexing convention is invariant
-and that the data order changes to keep that so.
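-
-In numpy the two conventions show up directly in an array's strides; for
-example, with 8-byte floats::
-
-    import numpy as np
-
-    c = np.zeros((2, 3), order='C')
-    f = np.zeros((2, 3), order='F')
-    print(c.strides)   # (24, 8): the last index varies fastest in memory
-    print(f.strides)   # (8, 16): the first index varies fastest in memory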
-
-But that's not the only way
-to look at it. Suppose one has large two-dimensional arrays (images or
-matrices) stored in data files. Suppose the data are stored by rows rather than
-by columns. If we are to preserve our index convention (whether matrix or
-image) that means that depending on the language we use, we may be forced to
-reorder the data if it is read into memory to preserve our indexing
-convention. For example if we read row-ordered data into memory without
-reordering, it will match the matrix indexing convention for C, but not for
-Fortran. Conversely, it will match the image indexing convention for Fortran,
-but not for C. For C, if one is using data stored in row order, and one wants
-to preserve the image index convention, the data must be reordered when
-reading into memory.
-
-In the end, which you do for Fortran or C depends on
-which is more important, not reordering data or preserving the indexing
-convention. For large images, reordering data is potentially expensive, and
-often the indexing convention is inverted to avoid that.
-
-The situation with
-numpy makes this issue yet more complicated. The internal machinery of numpy
-arrays is flexible enough to accept any ordering of indices. One can simply
-reorder indices by manipulating the internal stride information for arrays
-without reordering the data at all. NumPy will know how to map the new index
-order to the data without moving the data.
-
-So if this is true, why not choose
-the index order that matches what you most expect? In particular, why not define
-row-ordered images to use the image convention? (This is sometimes referred
-to as the Fortran convention vs the C convention, thus the 'C' and 'FORTRAN'
-order options for array ordering in numpy.) The drawback of doing this is
-potential performance penalties. It's common to access the data sequentially,
-either implicitly in array operations or explicitly by looping over rows of an
-image. When that is done, then the data will be accessed in non-optimal order.
-As the first index is incremented, what is actually happening is that elements
-spaced far apart in memory are being sequentially accessed, with usually poor
-memory access speeds. Consider, for example, a two-dimensional image ``im``,
-defined so that ``im[0, 10]`` represents the value at x=0, y=10. To be
-consistent with usual Python behavior, ``im[0]`` would then represent a
-column at x=0. Yet that data would be spread over the whole array since the
-data are stored in row order. Despite the flexibility of numpy's indexing,
-it can't really paper over the fact that basic operations are rendered
-inefficient because of data order, or that getting contiguous subarrays is
-still awkward (e.g., ``im[:,0]`` for the first row, vs ``im[0]``). Thus one
-can't use an idiom such as ``for row in im``; ``for col in im`` does
-work, but doesn't yield contiguous column data.
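-
-A small sketch of the contiguity asymmetry just described::
-
-    import numpy as np
-
-    im = np.zeros((200, 100))               # rows are contiguous (C order)
-    print(im[0].flags['C_CONTIGUOUS'])      # True: one contiguous slice
-    print(im[:, 0].flags['C_CONTIGUOUS'])   # False: elements 800 bytes apart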
-
-As it turns out, numpy is
-smart enough when dealing with ufuncs to determine which index is the most
-rapidly varying one in memory and uses that for the innermost loop. Thus for
-ufuncs there is no large intrinsic advantage to either approach in most cases.
-On the other hand, use of ``.flat`` with a FORTRAN-ordered array will lead to
-non-optimal memory access as adjacent elements in the flattened array (iterator,
-actually) are not contiguous in memory.
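-
-For instance (a minimal sketch; the strides shown assume 8-byte integers)::
-
-    import numpy as np
-
-    f = np.asfortranarray(np.arange(6).reshape(2, 3))
-    print(f.strides)      # (8, 16): column-major layout
-    # .flat iterates in C order, so consecutive flattened elements of a
-    # Fortran-ordered array are not adjacent in memory
-    print(list(f.flat))   # [0, 1, 2, 3, 4, 5]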
-
-Indeed, the fact is that Python
-indexing on lists and other sequences naturally leads to an outside-to-inside
-ordering (the first index gets the largest grouping, the next the next largest,
-and the last gets the smallest element). Since image data are normally stored
-by rows, this corresponds to position within rows being the last item indexed.
-
-If you do want to use Fortran ordering, realize that
-there are two approaches to consider: 1) accept that the first index is just not
-the most rapidly changing in memory and have all your I/O routines reorder
-your data when going from memory to disk or vice versa, or 2) use numpy's
-mechanism for mapping the first index to the most rapidly varying data. We
-recommend the former if possible. The disadvantage of the latter is that many
-of numpy's functions will yield arrays without Fortran ordering unless you are
-careful to use the 'order' keyword. Doing this would be highly inconvenient.
-
-Otherwise we recommend simply learning to reverse the usual order of indices
-when accessing elements of an array. Granted, it goes against the grain, but
-it is more in line with Python semantics and the natural order of the data.
+.. This document has been moved to ../dev/internals.rst.
+This document has been moved to :ref:`numpy-internals`.
diff --git a/doc/source/reference/random/bit_generators/index.rst b/doc/source/reference/random/bit_generators/index.rst
index c5c349806..211f0d60e 100644
--- a/doc/source/reference/random/bit_generators/index.rst
+++ b/doc/source/reference/random/bit_generators/index.rst
@@ -4,7 +4,7 @@ Bit Generators
--------------
The random values produced by :class:`~Generator`
-orignate in a BitGenerator. The BitGenerators do not directly provide
+originate in a BitGenerator. The BitGenerators do not directly provide
random numbers and only contain methods used for seeding, getting or
setting the state, jumping or advancing the state, and for accessing
low-level wrappers for consumption by code that can efficiently
diff --git a/doc/source/reference/random/index.rst b/doc/source/reference/random/index.rst
index 96cd47017..aaabc9b39 100644
--- a/doc/source/reference/random/index.rst
+++ b/doc/source/reference/random/index.rst
@@ -55,7 +55,7 @@ properties than the legacy `MT19937` used in `RandomState`.
more_vals = random.standard_normal(10)
`Generator` can be used as a replacement for `RandomState`. Both class
-instances hold a internal `BitGenerator` instance to provide the bit
+instances hold an internal `BitGenerator` instance to provide the bit
stream; it is accessible as ``gen.bit_generator``. Some long-overdue API
cleanup means that legacy and compatibility methods have been removed from
`Generator`
diff --git a/doc/source/reference/random/performance.rst b/doc/source/reference/random/performance.rst
index 85855be59..cb9b94113 100644
--- a/doc/source/reference/random/performance.rst
+++ b/doc/source/reference/random/performance.rst
@@ -13,7 +13,7 @@ full-featured, and fast on most platforms, but somewhat slow when compiled for
parallelism would indicate using `PCG64DXSM`.
`Philox` is fairly slow, but its statistical properties have
-very high quality, and it is easy to get assuredly-independent stream by using
+very high quality, and it is easy to get an assuredly-independent stream by using
unique keys. If that is the style you wish to use for parallel streams, or you
are porting from another system that uses that style, then
`Philox` is your choice.
diff --git a/doc/source/reference/routines.ma.rst b/doc/source/reference/routines.ma.rst
index d961cbf02..5404c43d8 100644
--- a/doc/source/reference/routines.ma.rst
+++ b/doc/source/reference/routines.ma.rst
@@ -44,7 +44,9 @@ Ones and zeros
ma.masked_all
ma.masked_all_like
ma.ones
+ ma.ones_like
ma.zeros
+ ma.zeros_like
_____
@@ -287,11 +289,11 @@ Filling a masked array
_____
-Masked arrays arithmetics
-=========================
+Masked arrays arithmetic
+========================
-Arithmetics
-~~~~~~~~~~~
+Arithmetic
+~~~~~~~~~~
.. autosummary::
:toctree: generated/
@@ -331,6 +333,7 @@ Minimum/maximum
ma.max
ma.min
ma.ptp
+ ma.diff
ma.MaskedArray.argmax
ma.MaskedArray.argmin
diff --git a/doc/source/reference/routines.math.rst b/doc/source/reference/routines.math.rst
index 3c2f96830..2a09b8d20 100644
--- a/doc/source/reference/routines.math.rst
+++ b/doc/source/reference/routines.math.rst
@@ -143,6 +143,21 @@ Handling complex numbers
conj
conjugate
+Extrema finding
+---------------
+.. autosummary::
+ :toctree: generated/
+
+ maximum
+ fmax
+ amax
+ nanmax
+
+ minimum
+ fmin
+ amin
+ nanmin
+
Miscellaneous
-------------
@@ -160,11 +175,7 @@ Miscellaneous
fabs
sign
heaviside
- maximum
- minimum
- fmax
- fmin
-
+
nan_to_num
real_if_close
diff --git a/doc/source/reference/routines.polynomials.rst b/doc/source/reference/routines.polynomials.rst
index ecfb012f0..4aea963c0 100644
--- a/doc/source/reference/routines.polynomials.rst
+++ b/doc/source/reference/routines.polynomials.rst
@@ -22,7 +22,7 @@ Therefore :mod:`numpy.polynomial` is recommended for new coding.
the polynomial functions prefixed with *poly* accessible from the `numpy`
namespace (e.g. `numpy.polyadd`, `numpy.polyval`, `numpy.polyfit`, etc.).
- The term *polynomial package* refers to the new API definied in
+ The term *polynomial package* refers to the new API defined in
`numpy.polynomial`, which includes the convenience classes for the
different kinds of polynomials (`numpy.polynomial.Polynomial`,
`numpy.polynomial.Chebyshev`, etc.).
@@ -110,7 +110,7 @@ See the documentation for the
`convenience classes <routines.polynomials.classes>`_ for further details on
the ``domain`` and ``window`` attributes.
-Another major difference bewteen the legacy polynomial module and the
+Another major difference between the legacy polynomial module and the
polynomial package is polynomial fitting. In the old module, fitting was
done via the `~numpy.polyfit` function. In the polynomial package, the
`~numpy.polynomial.polynomial.Polynomial.fit` class method is preferred. For
diff --git a/doc/source/reference/routines.statistics.rst b/doc/source/reference/routines.statistics.rst
index c675b6090..cd93e6025 100644
--- a/doc/source/reference/routines.statistics.rst
+++ b/doc/source/reference/routines.statistics.rst
@@ -9,11 +9,7 @@ Order statistics
.. autosummary::
:toctree: generated/
-
- amin
- amax
- nanmin
- nanmax
+
ptp
percentile
nanpercentile
diff --git a/doc/source/reference/simd/simd-optimizations.rst b/doc/source/reference/simd/simd-optimizations.rst
index 956824321..9de6d1734 100644
--- a/doc/source/reference/simd/simd-optimizations.rst
+++ b/doc/source/reference/simd/simd-optimizations.rst
@@ -14,7 +14,7 @@ written only once. There are three layers:
written using the maximum set of intrinsics possible.
- At *compile* time, a distutils command is used to define the minimum and
maximum features to support, based on user choice and compiler support. The
- appropriate macros are overlayed with the platform / architecture intrinsics,
+ appropriate macros are overlaid with the platform / architecture intrinsics,
and the three loops are compiled.
- At *runtime import*, the CPU is probed for the set of supported intrinsic
features. A mechanism is used to grab the pointer to the most appropriate
@@ -89,7 +89,7 @@ NOTES
~~~~~~~~~~~~~
- CPU features and other options are case-insensitive.
-- The order of the requsted optimizations doesn't matter.
+- The order of the requested optimizations doesn't matter.
- Either commas or spaces can be used as a separator, e.g. ``--cpu-dispatch``\ =
"avx2 avx512f" or ``--cpu-dispatch``\ = "avx2, avx512f" both work, but the
@@ -113,7 +113,7 @@ NOTES
compiler native flag ``-march=native`` or ``-xHost`` or ``QxHost`` is
enabled through environment variable ``CFLAGS``
-- The validation process for the requsted optimizations when it comes to
+- The validation process for the requested optimizations when it comes to
``--cpu-baseline`` isn't strict. For example, if the user requested
``AVX2`` but the compiler doesn't support it then we just skip it and return
the maximum optimization that the compiler can handle depending on the
@@ -379,15 +379,15 @@ through ``--cpu-dispatch``, but it can also represent other options such as:
#include "numpy/utils.h" // NPY_CAT, NPY_TOSTR
#ifndef NPY__CPU_TARGET_CURRENT
- // wrapping the dispatch-able source only happens to the addtional optimizations
- // but if the keyword 'baseline' provided within the configuration statments,
+ // wrapping the dispatch-able source only happens for the additional optimizations
+ // but if the keyword 'baseline' is provided within the configuration statements,
// the infrastructure will add extra compiling for the dispatch-able source by
// passing it as-is to the compiler without any changes.
#define CURRENT_TARGET(X) X
#define NPY__CPU_TARGET_CURRENT baseline // for printing only
#else
// since we reached this point, that means we're dealing with
- // the addtional optimizations, so it could be SSE42 or AVX512F
+ // the additional optimizations, so it could be SSE42 or AVX512F
#define CURRENT_TARGET(X) NPY_CAT(NPY_CAT(X, _), NPY__CPU_TARGET_CURRENT)
#endif
// Macro 'CURRENT_TARGET' adds the current target as a suffix to the exported symbols,
@@ -418,7 +418,7 @@ through ``--cpu-dispatch``, but it can also represent other options such as:
#undef NPY__CPU_DISPATCH_BASELINE_CALL
#undef NPY__CPU_DISPATCH_CALL
// nothing strange here, just a normal preprocessor callback
- // enabled only if 'baseline' spesfied withiin the configration statments
+ // enabled only if 'baseline' specified within the configuration statements
#define NPY__CPU_DISPATCH_BASELINE_CALL(CB, ...) \
NPY__CPU_DISPATCH_EXPAND_(CB(__VA_ARGS__))
// 'NPY__CPU_DISPATCH_CALL' is an abstract macro used for dispatching
@@ -427,7 +427,7 @@ through ``--cpu-dispatch``, but it can also represent other options such as:
// @param CHK, expects a macro that can be used to detect CPU features
// at runtime, which takes a CPU feature name without string quotes and
// returns the testing result as a boolean value.
- // NumPy already has macro called "NPY_CPU_HAVE", which fit this requirment.
+ // NumPy already has macro called "NPY_CPU_HAVE", which fits this requirement.
//
// @param CB, a callback macro that is expected to be called multiple times depending
// on the required optimizations; the callback should receive the following arguments:
diff --git a/doc/source/reference/ufuncs.rst b/doc/source/reference/ufuncs.rst
index b832dad04..6ace5b233 100644
--- a/doc/source/reference/ufuncs.rst
+++ b/doc/source/reference/ufuncs.rst
@@ -185,7 +185,7 @@ attribute of the ufunc. (This list may be missing DTypes not defined
by NumPy.)
The ``signature`` only specifies the DType class/type. For example, it
-can specifiy that the operation should be ``datetime64`` or ``float64``
+can specify that the operation should be a ``datetime64`` or ``float64``
operation. It does not specify the ``datetime64`` time-unit or the
``float64`` byte-order.