path: root/numpy/core/setup_common.py
author      Julian Taylor <jtaylor.debian@googlemail.com>  2016-08-30 20:58:29 +0200
committer   Julian Taylor <jtaylor.debian@googlemail.com>  2017-02-24 19:21:05 +0100
commit      e6c397b974a72d6970a9ec3a7858355951d43e0a (patch)
tree        b53c2fbbed17666a3627d11ce35c64a6e3f8287c /numpy/core/setup_common.py
parent      5f5ccecbfc116284ed8c8d53cd8b203ceef5f7c7 (diff)
download    numpy-e6c397b974a72d6970a9ec3a7858355951d43e0a.tar.gz
ENH: avoid temporary arrays in expressions
Temporary arrays generated in expressions are expensive, as they imply extra memory bandwidth, which is the bottleneck in most NumPy operations. For example:

    r = -a + b

One can avoid the temporary when the reference count of one of the operands is one, as it cannot be referenced by any other Python code (an rvalue in C++ terms). Python itself uses this method to optimize string concatenation.

The tricky part is that via the C-API one can call the PyNumber_ functions directly and skip increasing the reference count when it is not needed. The Python stack cannot be used to detect this, as Python does not export the stack size, only the current position. To avoid the problem we collect the backtrace of the call up to the Python frame evaluation function, and if it consists exclusively of functions inside Python or the C library, we can assume no other library is using the C-API in between.

One issue is that the reliability of backtraces is unknown on non-GNU platforms. On GNU and amd64 it should be reliable enough, even without frame pointers, as glibc will use GCC's stack unwinder via the mandatory DWARF stack annotations. Other platforms with backtrace need to be tested.

Another problem is that stack unwinding is very expensive. Unwinding a 100-function-deep stack (which is not uncommon from Python) takes around 35 us, so the elision can only be done for relatively large arrays. Heuristically it seems to be beneficial around 256 KiB array sizes (which is about the typical L2 cache size).

The performance gain is quite significant: operations with temporaries run around 1.5 to 2 times faster. The speed is similar to rewriting the operations as in-place operations manually.
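The safety check described above — unwind the C call stack and confirm every return address resolves to Python itself or the C library — can be sketched with the two glibc facilities this diff starts probing for: `backtrace()` from `<execinfo.h>` and `dladdr()` from `<dlfcn.h>`. This is a minimal illustration under those assumptions, not NumPy's actual implementation; `stack_is_trusted` and its substring whitelist are hypothetical stand-ins.

```c
#define _GNU_SOURCE     /* dladdr() is a GNU extension on glibc */
#include <execinfo.h>   /* backtrace(): the "backtrace" function probed above */
#include <dlfcn.h>      /* dladdr(): the reason for the "dlfcn.h" header check */
#include <string.h>

/* Walk the current C call stack and return 1 only if every return address
 * resolves to a shared object whose path contains one of the trusted
 * substrings (e.g. "libpython", "libc").  An unresolvable frame, or a frame
 * from an unknown library, means third-party code may sit between the
 * interpreter and us holding a borrowed reference to an operand, so eliding
 * the temporary would be unsafe. */
static int stack_is_trusted(const char *const trusted[], int ntrusted)
{
    void *frames[128];
    int depth = backtrace(frames, 128);
    for (int i = 0; i < depth; i++) {
        Dl_info info;
        if (dladdr(frames[i], &info) == 0 || info.dli_fname == NULL)
            return 0;                    /* cannot identify this frame */
        int known = 0;
        for (int j = 0; j < ntrusted; j++)
            if (strstr(info.dli_fname, trusted[j]) != NULL)
                known = 1;
        if (!known)
            return 0;                    /* frame from an unknown library */
    }
    return 1;
}
```

Link with -ldl on older glibc. The real check would also stop unwinding at the interpreter's frame evaluation function and, per the cost figures above (around 35 us for a deep stack), only run for arrays large enough to amortize it.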
Diffstat (limited to 'numpy/core/setup_common.py')
-rw-r--r--  numpy/core/setup_common.py  |  4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/numpy/core/setup_common.py b/numpy/core/setup_common.py
index 357051cdb..7691a2aeb 100644
--- a/numpy/core/setup_common.py
+++ b/numpy/core/setup_common.py
@@ -107,7 +107,8 @@ MANDATORY_FUNCS = ["sin", "cos", "tan", "sinh", "cosh", "tanh", "fabs",
OPTIONAL_STDFUNCS = ["expm1", "log1p", "acosh", "asinh", "atanh",
"rint", "trunc", "exp2", "log2", "hypot", "atan2", "pow",
"copysign", "nextafter", "ftello", "fseeko",
- "strtoll", "strtoull", "cbrt", "strtold_l", "fallocate"]
+ "strtoll", "strtoull", "cbrt", "strtold_l", "fallocate",
+ "backtrace"]
OPTIONAL_HEADERS = [
@@ -116,6 +117,7 @@ OPTIONAL_HEADERS = [
"emmintrin.h", # SSE2
"features.h", # for glibc version linux
"xlocale.h", # see GH#8367
+ "dlfcn.h", # dladdr
]
# optional gcc compiler builtins and their call arguments and optional a
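When these probes succeed, NumPy's generated config header defines the corresponding `HAVE_`-prefixed macros (here `HAVE_BACKTRACE` and `HAVE_DLFCN_H`), which the C side uses to compile the unwinding code only where it exists. A hedged sketch of that guard pattern — the fallback function name `can_elide` is illustrative, not NumPy's API:

```c
/* Sketch: compile the stack check only where the configure step found
 * backtrace() and <dlfcn.h>; elsewhere always report "unsafe", so the
 * temporary-elision optimization is simply skipped on such platforms. */
#if defined(HAVE_BACKTRACE) && defined(HAVE_DLFCN_H)
#include <execinfo.h>
#include <dlfcn.h>

static int can_elide(void)
{
    void *frames[128];
    /* a real check would inspect each frame with dladdr(), as above */
    return backtrace(frames, 128) > 0;
}
#else
static int can_elide(void)
{
    return 0;   /* no reliable backtrace: never elide temporaries */
}
#endif
```

This keeps the optimization strictly opt-in per platform: a missing feature degrades to the old behavior (temporaries are kept) rather than to a build failure.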