author | Julian Taylor <jtaylor.debian@googlemail.com> | 2015-11-11 19:34:23 +0100 |
---|---|---|
committer | Julian Taylor <jtaylor.debian@googlemail.com> | 2015-11-16 21:10:46 +0100 |
commit | 904da7c202384c8a2a6ec88cece378f70e2dd956 (patch) | |
tree | 92bdf5542fc7e4483ebbd4d94135450a0ccf5a0c /numpy/core/setup_common.py | |
parent | 1d511429ac04d137c3d9ec7da9160bec7baa2829 (diff) | |
download | numpy-904da7c202384c8a2a6ec88cece378f70e2dd956.tar.gz |
ENH: use prefetching for summation
It seems the small block size (128) confuses the hardware prefetcher,
which would usually handle this iteration pattern well.
Fix this by using software prefetching. This improves performance for large
sums by 15%-30%. Tested on Core 2 Duo, Xeon E5-4620, i5-3470 and AMD Phenom II X4.
Prefer __builtin_prefetch because, unlike the SSE _mm_prefetch, it also works
on capable non-x86 CPUs.
Diffstat (limited to 'numpy/core/setup_common.py')
-rw-r--r-- | numpy/core/setup_common.py | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/numpy/core/setup_common.py b/numpy/core/setup_common.py
index 68efd1791..d93e475e3 100644
--- a/numpy/core/setup_common.py
+++ b/numpy/core/setup_common.py
@@ -125,7 +125,10 @@ OPTIONAL_INTRINSICS = [("__builtin_isnan", '5.'),
                        ("__builtin_expect", '5, 0'),
                        ("__builtin_mul_overflow", '5, 5, (int*)5'),
                        ("_mm_load_ps", '(float*)0', "xmmintrin.h"),  # SSE
+                       ("_mm_prefetch", '(float*)0, _MM_HINT_NTA',
+                        "xmmintrin.h"),  # SSE
                        ("_mm_load_pd", '(double*)0', "emmintrin.h"),  # SSE2
+                       ("__builtin_prefetch", "(float*)0, 0, 3"),
                        ]
 
 # function attributes
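Each entry added to OPTIONAL_INTRINSICS is a tuple of an intrinsic name, a string of call arguments, and optionally a header. For `__builtin_prefetch`, the arguments `0, 3` request a read prefetch with high temporal locality; for `_mm_prefetch`, `_MM_HINT_NTA` is the non-temporal hint. The sketch below shows one way such a tuple could be turned into a small C probe program that a build system might try to compile and link. It is only an illustration: `probe_source` and `have_macro` are hypothetical helpers, not NumPy functions, and the `HAVE_` macro naming is an assumed convention.

```python
# Hypothetical illustration; NumPy's real configuration code lives elsewhere
# and may differ in detail.
OPTIONAL_INTRINSICS = [
    ("_mm_prefetch", "(float*)0, _MM_HINT_NTA", "xmmintrin.h"),  # SSE
    ("__builtin_prefetch", "(float*)0, 0, 3"),
]


def probe_source(entry):
    """Build a tiny C program that calls the intrinsic with the given arguments."""
    name, args = entry[0], entry[1]
    headers = entry[2:]  # zero or more header names after the argument string
    lines = ["#include <%s>" % header for header in headers]
    lines += [
        "int main(void)",
        "{",
        "    %s(%s);" % (name, args),
        "    return 0;",
        "}",
    ]
    return "\n".join(lines) + "\n"


def have_macro(name):
    """Feature macro a successful probe might define (assumed naming scheme)."""
    return "HAVE_" + name.upper()


if __name__ == "__main__":
    for entry in OPTIONAL_INTRINSICS:
        print(have_macro(entry[0]))
        print(probe_source(entry))
```

If the probe compiles and links, the C sources can guard the prefetch call with the corresponding macro and fall back to no prefetching otherwise, which is why listing both intrinsics is harmless on compilers that support only one.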