author: Matti Picus <matti.picus@gmail.com> 2021-10-25 21:53:48 +0300
committer: GitHub <noreply@github.com> 2021-10-25 13:53:48 -0500
commit: 84e0707afa587e7655410561324ac36085db2b95
tree: 9f7db2f514f0c33dcaece64076025de94b9e1c70 /doc/source/reference/c-api
parent: 48e6ac6e120c6d408d85d4fdd3c4867e0195a758
ENH: Configurable allocator (#17582)
Fixes gh-17467. Adds a public struct to hold memory manipulation routines PyDataMem_Handler and two new API functions PyDataMem_SetHandler to replace the current routines with the new ones, and PyDataMem_GetHandlerName to get the string name of the current routines (either globally or for a specific ndarray object). This also changes the size of the ndarray object to hold the PyDataMem_Handler active when it was created so subsequent actions on its data memory will remain consistent.

Tests and documentation are included. Along the way, I found some places in the code where the current policy is inconsistent (all data memory handling should have gone through npy_*_cache not PyDataMem_*) so even if this is rejected it might improve the cache handling.

The PyDataMem_Handler has fields to override memcpy, these are currently not implemented: memcpy in the code base is untouched. I think this PR is invasive enough as-is, if desired memcpy can be handled in a follow-up PR.

* ENH: add and use global configurable memory routines
* ENH: add tests and a way to compile c-extensions from tests
* fix allocation/free exposed by tests
* DOC: document the new APIs (and some old ones too)
* BUG: return void from FREE, also some cleanup
* MAINT: changes from review
* fixes from linter
* setting ndarray->descr on 0d or scalars mess with FREE
* make scalar allocation more consistent wrt np_alloc_cache
* change formatting for sphinx
* remove memcpy variants
* update to match NEP 49
* ENH: add a python-level get_handler_name
* ENH: add core.multiarray.get_handler_name
* Allow closure-like definition of the data mem routines
* Fix incompatible pointer warnings
* Note PyDataMemAllocator and PyMemAllocatorEx differentiation
  Co-authored-by: Matti Picus <matti.picus@gmail.com>
* Redefine default allocator handling
* Always allocate new arrays using the current_handler
* Search for the mem_handler name of the data owner
* Sub-comparisons don't need a local mem_handler
* Make the default_handler a valid PyDataMem_Handler
* Fix PyDataMem_SetHandler description (NEP discussion)
* Pass the allocators by reference
* Implement allocator context-locality
* Fix documentation, make PyDataMem_GetHandler return const
* remove import of setuptools==49.1.3, doesn't work on python3.10
* Fix refcount leaks
* fix function signatures in test
* Return early on PyDataMem_GetHandler error (VOID_compare)
* Add context/thread-locality tests, allow testing custom policies
* try to fix cygwin extension building
* YAPF mem_policy test
* Less empty lines, more comments (tests)
* Apply suggestions from code review (set an exception and)
  Co-authored-by: Matti Picus <matti.picus@gmail.com>
* skip test on cygwin
* update API hash for changed signature
* TST: add gc.collect to make sure cycles are broken
* Implement thread-locality for PyPy
  Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
* Update numpy/core/tests/test_mem_policy.py
  Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
* fixes from review
* update circleci config
* fix test
* make the connection between OWNDATA and having a allocator handle more explicit
* improve docstring, fix flake8 for tests
* update PyDataMem_GetHandler() from review
* Implement allocator lifetime management
* update NEP and add best-effort handling of error in PyDataMem_UserFREE
* ENH: fix and test for blindly taking ownership of data
* Update doc/neps/nep-0049.rst
  Co-authored-by: Elias Koromilas <elias.koromilas@gmail.com>
Diffstat (limited to 'doc/source/reference/c-api')
-rw-r--r-- doc/source/reference/c-api/data_memory.rst | 119
-rw-r--r-- doc/source/reference/c-api/index.rst | 1
2 files changed, 120 insertions, 0 deletions
diff --git a/doc/source/reference/c-api/data_memory.rst b/doc/source/reference/c-api/data_memory.rst
new file mode 100644
index 000000000..8e2989403
--- /dev/null
+++ b/doc/source/reference/c-api/data_memory.rst
@@ -0,0 +1,119 @@
+Memory management in NumPy
+==========================
+
+The `numpy.ndarray` is a Python class. It requires additional memory allocations
+to hold the `numpy.ndarray.strides`, `numpy.ndarray.shape` and
+`numpy.ndarray.data` attributes. These attributes are specially allocated
+after creating the Python object in `__new__`. The ``strides`` and
+``shape`` are stored in a piece of memory allocated internally.
+
+The ``data`` allocation used to store the actual array values (which could be
+pointers in the case of ``object`` arrays) can be very large, so NumPy has
+provided interfaces to manage its allocation and release. This document details
+how those interfaces work.
+
+Historical overview
+-------------------
+
+Since version 1.7.0, NumPy has exposed a set of ``PyDataMem_*`` functions
+(:c:func:`PyDataMem_NEW`, :c:func:`PyDataMem_FREE`, :c:func:`PyDataMem_RENEW`)
+which are backed by `malloc`, `free`, `realloc` respectively. In that version
+NumPy also exposed the `PyDataMem_EventHook` function described below, which
+wraps the OS-level calls.
+
+Since those early days, Python has also improved its memory management
+capabilities, providing various :ref:`management policies <memoryoverview>`
+since version 3.4. These routines are divided into a set of domains; each
+domain has a :c:type:`PyMemAllocatorEx` structure of routines for memory
+management. Python also added a `tracemalloc` module to trace calls to the
+various routines. These tracking hooks were added to the NumPy
+``PyDataMem_*`` routines.
+
+NumPy added a small cache of allocated memory in its internal
+``npy_alloc_cache``, ``npy_alloc_cache_zero``, and ``npy_free_cache``
+functions. These wrap ``alloc``, ``alloc-and-memset(0)`` and ``free``
+respectively, but when ``npy_free_cache`` is called, it adds the pointer to a
+short list of available blocks marked by size. These blocks can be re-used by
+subsequent calls to ``npy_alloc*``, avoiding memory thrashing.
+
+Configurable memory routines in NumPy (NEP 49)
+----------------------------------------------
+
+Users may wish to override the internal data memory routines with ones of their
+own. Since NumPy does not use the Python domain strategy to manage data memory,
+it provides an alternative set of C-APIs to change memory routines. There are
+no Python domain-wide strategies for large chunks of object data, so those are
+less suited to NumPy's needs. Users who wish to change the NumPy data memory
+management routines can use :c:func:`PyDataMem_SetHandler`, which uses a
+:c:type:`PyDataMem_Handler` structure to hold pointers to functions used to
+manage the data memory. The calls are still wrapped by internal routines to
+call :c:func:`PyTraceMalloc_Track`, :c:func:`PyTraceMalloc_Untrack`, and will
+use the :c:func:`PyDataMem_EventHookFunc` mechanism. Since the functions may
+change during the lifetime of the process, each ``ndarray`` carries with it the
+functions used at the time of its instantiation, and these will be used to
+reallocate or free the data memory of the instance.
+
+.. c:type:: PyDataMem_Handler
+
+    A struct to hold function pointers used to manipulate memory
+
+    .. code-block:: c
+
+        typedef struct {
+            char name[128];  /* multiple of 64 to keep the struct aligned */
+            PyDataMemAllocator allocator;
+        } PyDataMem_Handler;
+
+    where the allocator structure is
+
+    .. code-block:: c
+
+        /* The declaration of free differs from PyMemAllocatorEx */
+        typedef struct {
+            void *ctx;
+            void* (*malloc) (void *ctx, size_t size);
+            void* (*calloc) (void *ctx, size_t nelem, size_t elsize);
+            void* (*realloc) (void *ctx, void *ptr, size_t new_size);
+            void (*free) (void *ctx, void *ptr, size_t size);
+        } PyDataMemAllocator;
+
+.. c:function:: PyObject * PyDataMem_SetHandler(PyObject *handler)
+
+    Set a new allocation policy. If the input value is ``NULL``, the policy is
+    reset to the default. Return the previous policy, or ``NULL`` if an error
+    has occurred. The user-provided functions are wrapped so that they still
+    call the Python and NumPy memory management callback hooks.
+
+.. c:function:: PyObject * PyDataMem_GetHandler()
+
+    Return the current policy that will be used to allocate data for the
+    next ``PyArrayObject``. On failure, return ``NULL``.
+
+For an example of setting up and using the ``PyDataMem_Handler``, see the test
+in :file:`numpy/core/tests/test_mem_policy.py`.
+
+.. c:function:: void PyDataMem_EventHookFunc(void *inp, void *outp, size_t size, void *user_data)
+
+    This function will be called during data memory manipulation.
+
+.. c:function:: PyDataMem_EventHookFunc * PyDataMem_SetEventHook(PyDataMem_EventHookFunc *newhook, void *user_data, void **old_data)
+
+    Sets the allocation event hook for NumPy array data.
+
+    Returns a pointer to the previous hook or ``NULL``. If ``old_data`` is
+    non-``NULL``, the previous ``user_data`` pointer will be copied to it.
+
+    If not ``NULL``, the hook will be called at the end of each
+    ``PyDataMem_NEW/FREE/RENEW``:
+
+    .. code-block:: c
+
+        result = PyDataMem_NEW(size)        -> (*hook)(NULL, result, size, user_data)
+        PyDataMem_FREE(ptr)                 -> (*hook)(ptr, NULL, 0, user_data)
+        result = PyDataMem_RENEW(ptr, size) -> (*hook)(ptr, result, size, user_data)
+
+    When the hook is called, the GIL will be held by the calling
+    thread. The hook should be written to be reentrant if it performs
+    operations that might cause new allocation events (such as the
+    creation/destruction of NumPy objects, or creating/destroying Python
+    objects which might cause a gc).
diff --git a/doc/source/reference/c-api/index.rst b/doc/source/reference/c-api/index.rst
index bb1ed154e..6288ff33b 100644
--- a/doc/source/reference/c-api/index.rst
+++ b/doc/source/reference/c-api/index.rst
@@ -49,3 +49,4 @@ code.
    generalized-ufuncs
    coremath
    deprecations
+   data_memory
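
As a rough illustration of the ``PyDataMem_Handler`` layout documented in data_memory.rst above, the following standalone C sketch fills in a handler whose routines count live blocks through the ``ctx`` pointer. It is not taken from this commit: the two structs are redeclared locally so that the file compiles without the NumPy headers, and ``CountingStats``, the ``counting_*`` functions, ``g_stats`` and ``counting_handler`` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Local mirror of the structs shown in data_memory.rst. Real extension code
 * would take these from the NumPy headers instead of redeclaring them. */
typedef struct {
    void *ctx;
    void *(*malloc)(void *ctx, size_t size);
    void *(*calloc)(void *ctx, size_t nelem, size_t elsize);
    void *(*realloc)(void *ctx, void *ptr, size_t new_size);
    void (*free)(void *ctx, void *ptr, size_t size);
} PyDataMemAllocator;

typedef struct {
    char name[128];
    PyDataMemAllocator allocator;
} PyDataMem_Handler;

/* Hypothetical bookkeeping reached through the ctx pointer. */
typedef struct {
    size_t live_blocks;  /* blocks handed out and not yet freed */
    size_t bytes_freed;  /* sum of the sizes reported to free() */
} CountingStats;

static void *counting_malloc(void *ctx, size_t size) {
    CountingStats *stats = ctx;
    void *p = malloc(size ? size : 1);  /* avoid zero-size requests */
    if (p != NULL) {
        stats->live_blocks++;
    }
    return p;
}

static void *counting_calloc(void *ctx, size_t nelem, size_t elsize) {
    CountingStats *stats = ctx;
    void *p = calloc(nelem ? nelem : 1, elsize ? elsize : 1);
    if (p != NULL) {
        stats->live_blocks++;
    }
    return p;
}

static void *counting_realloc(void *ctx, void *ptr, size_t new_size) {
    CountingStats *stats = ctx;
    void *p = realloc(ptr, new_size ? new_size : 1);
    if (p != NULL && ptr == NULL) {
        stats->live_blocks++;  /* realloc(NULL, n) behaves like malloc(n) */
    }
    return p;
}

/* Unlike PyMemAllocatorEx, free also receives the size of the block. */
static void counting_free(void *ctx, void *ptr, size_t size) {
    CountingStats *stats = ctx;
    assert(stats != NULL);
    if (ptr == NULL) {
        return;
    }
    free(ptr);
    stats->live_blocks--;
    stats->bytes_freed += size;
}

CountingStats g_stats = {0, 0};

PyDataMem_Handler counting_handler = {
    "counting_allocator",
    {&g_stats, counting_malloc, counting_calloc, counting_realloc,
     counting_free}
};
```

Every routine receives ``counting_handler.allocator.ctx`` as its first argument, which is what allows per-handler state without globals inside the routines themselves. In an actual extension module the handler would be handed to ``PyDataMem_SetHandler`` as a ``PyObject *``; as far as I can tell from NEP 49, that object is a ``PyCapsule`` wrapping the struct, but the test in numpy/core/tests/test_mem_policy.py is the authoritative example.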