author     Julian Taylor <jtaylor.debian@googlemail.com>    2016-08-30 20:58:29 +0200
committer  Julian Taylor <jtaylor.debian@googlemail.com>    2017-02-24 19:21:05 +0100
commit     e6c397b974a72d6970a9ec3a7858355951d43e0a (patch)
tree       b53c2fbbed17666a3627d11ce35c64a6e3f8287c /numpy/core/setup_common.py
parent     5f5ccecbfc116284ed8c8d53cd8b203ceef5f7c7 (diff)
download   numpy-e6c397b974a72d6970a9ec3a7858355951d43e0a.tar.gz
ENH: avoid temporary arrays in expressions
Temporary arrays generated in expressions are expensive, as they imply
extra memory bandwidth, which is the bottleneck in most numpy operations.
For example:
r = -a + b
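At the C level this roughly expands to the call sequence below (a sketch
only; a and b stand for the operand array objects):

    #include <Python.h>

    /* Roughly what evaluating "r = -a + b" amounts to through the C-API:
     * the result of PyNumber_Negative is a full-size temporary array that
     * is read once by PyNumber_Add and then thrown away, costing an extra
     * pass over memory. */
    static PyObject *
    neg_then_add(PyObject *a, PyObject *b)
    {
        PyObject *tmp = PyNumber_Negative(a);  /* allocates the temporary -a */
        if (tmp == NULL) {
            return NULL;
        }
        PyObject *r = PyNumber_Add(tmp, b);    /* second pass over memory */
        Py_DECREF(tmp);                        /* temporary is discarded */
        return r;
    }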
One can avoid temporaries when the reference count of one of the
operands is one, as it cannot be referenced by any other Python code
(an rvalue in C++ terms).
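A minimal sketch of that condition (the helper name is made up for
illustration; a real check must also verify that the operand is a
well-behaved ndarray that owns its data):

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Illustrative only: an operand's buffer can be reused for the result
     * when nothing else can hold a reference to it, i.e. its reference
     * count is exactly 1 (the single reference owned by the expression
     * being evaluated). */
    static int
    operand_is_disposable(PyObject *operand)
    {
        return PyArray_Check(operand) && Py_REFCNT(operand) == 1;
    }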
Python itself uses this method to optimize string concatenation.
The tricky part is that, via the C-API, one can call the PyNumber_
functions directly and skip increasing the reference count when it is
not needed.
The Python stack cannot be used to determine this, as Python does not
export the stack size, only the current position.
To avoid this problem we collect the backtrace of the call up to the
Python frame evaluation function; if it consists exclusively of
functions inside Python or the C library, we can assume no other
library is using the C-API in between.
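A simplified sketch of that check, assuming glibc's backtrace() and
dladdr() (the string matching on shared-object names is a stand-in for
the more careful address checks a real implementation would do):

    #define _GNU_SOURCE
    #include <execinfo.h>   /* backtrace(), glibc */
    #include <dlfcn.h>      /* dladdr() */
    #include <string.h>

    /* Walk the return addresses of the current call.  Elision is only safe
     * if every frame up to the Python frame evaluation loop lives in the
     * Python interpreter, the C library, or numpy itself; any other shared
     * object in between could hold a borrowed reference obtained via the
     * C-API. */
    static int
    callers_are_trusted(void)
    {
        void *frames[128];
        int n = backtrace(frames, 128);

        for (int i = 0; i < n; i++) {
            Dl_info info;
            if (dladdr(frames[i], &info) == 0 || info.dli_fname == NULL) {
                return 0;               /* unresolvable frame: be conservative */
            }
            if (strstr(info.dli_fname, "python") != NULL) {
                return 1;               /* reached the interpreter loop */
            }
            if (strstr(info.dli_fname, "libc") == NULL &&
                strstr(info.dli_fname, "multiarray") == NULL) {
                return 0;               /* an unknown library is on the stack */
            }
        }
        return 0;                       /* never saw the interpreter: refuse */
    }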
One issue is that the reliability of backtraces is unknown on non-GNU
platforms. On GNU systems and amd64 it should be reliable enough, even
without frame pointers, as glibc will use GCC's stack unwinder via the
mandatory DWARF stack annotations.
Other platforms that provide backtrace need to be tested.
Another problem is that stack unwinding is very expensive. Unwinding a
100-function-deep stack (which is not uncommon from Python) takes
around 35 us, so the elision can only be done for relatively large
arrays.
Heuristically it appears to be beneficial from around 256 KiB array
sizes (which is about the typical L2 cache size).
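The rough arithmetic behind that cutoff, as an assumption-laden sketch
(the bandwidth figure is an assumption, not a measurement from this
commit):

    #include <stddef.h>

    /* Back-of-the-envelope: unwinding ~100 frames costs ~35 us.  Assuming
     * an effective memory bandwidth on the order of 10 GB/s, writing a
     * 256 KiB temporary and reading it back moves ~512 KiB, i.e. roughly
     * 50 us of bandwidth, so only around this size does skipping the
     * temporary pay for the cost of the unwind. */
    #define ELIDE_MIN_BYTES (256 * 1024)    /* illustrative threshold */

    static int
    array_large_enough_to_elide(size_t nbytes)
    {
        return nbytes >= ELIDE_MIN_BYTES;
    }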
The performance gain is quite significant: operations with temporaries
can be observed to run around 1.5 to 2 times faster. The speed is
similar to rewriting the operations as in-place operations manually.
Diffstat (limited to 'numpy/core/setup_common.py')
-rw-r--r--  numpy/core/setup_common.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/numpy/core/setup_common.py b/numpy/core/setup_common.py
index 357051cdb..7691a2aeb 100644
--- a/numpy/core/setup_common.py
+++ b/numpy/core/setup_common.py
@@ -107,7 +107,8 @@ MANDATORY_FUNCS = ["sin", "cos", "tan", "sinh", "cosh", "tanh", "fabs",
 OPTIONAL_STDFUNCS = ["expm1", "log1p", "acosh", "asinh", "atanh",
         "rint", "trunc", "exp2", "log2", "hypot", "atan2", "pow",
         "copysign", "nextafter", "ftello", "fseeko",
-        "strtoll", "strtoull", "cbrt", "strtold_l", "fallocate"]
+        "strtoll", "strtoull", "cbrt", "strtold_l", "fallocate",
+        "backtrace"]


 OPTIONAL_HEADERS = [
@@ -116,6 +117,7 @@ OPTIONAL_HEADERS = [
                 "emmintrin.h",  # SSE2
                 "features.h",  # for glibc version linux
                 "xlocale.h"  # see GH#8367
+                "dlfcn.h", # dladdr
 ]

 # optional gcc compiler builtins and their call arguments and optional a
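These checks typically surface as feature macros in numpy's generated
config header, so the elision code can be guarded on them and fall back
to ordinary temporaries elsewhere (a sketch; the exact macro names are
an assumption here):

    /* Fall back to ordinary temporaries when the platform lacks the
     * backtrace()/dladdr() support probed for above. */
    #if defined(HAVE_BACKTRACE) && defined(HAVE_DLFCN_H)
    #include <execinfo.h>
    #include <dlfcn.h>
    #define CAN_ELIDE_TEMPORARIES 1
    #else
    #define CAN_ELIDE_TEMPORARIES 0
    #endif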