author     Julian Taylor <jtaylor.debian@googlemail.com>   2016-11-29 00:19:21 +0100
committer  Julian Taylor <jtaylor.debian@googlemail.com>   2017-01-12 17:17:07 +0100
commit     f0f7ad80f2ef2d7525965dfe27c0e2ab68647197 (patch)
tree       885b69a2c89ca924395da8f908f5ba2c48383864 /numpy/core/setup_common.py
parent     7e6091c9a3fc4536ccbadb337e88650b2c901313 (diff)
download   numpy-f0f7ad80f2ef2d7525965dfe27c0e2ab68647197.tar.gz
ENH: vectorize packbits with SSE2
SSE2 has a special instruction to pack bytes into bits,
available as the intrinsic _mm_movemask_epi8. It is significantly faster
than the per-byte loop currently being used.
Unfortunately packbits is bitwise "big endian": the first input byte
maps to the most significant bit of each output byte, while
_mm_movemask_epi8 is little endian (byte 0 lands in bit 0), so the
input has to be byteswapped first. Even with the byteswap it is still
about 8-10 times faster than the scalar code.
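
As a rough illustration of the technique (a minimal sketch, not NumPy's
actual kernel; the helper name pack16_sse2 and the zero/nonzero handling
are assumptions), packing 16 input bytes into 2 output bytes with plain
SSE2 could look like this:

    #include <stdint.h>
    #include <emmintrin.h>  /* SSE2 intrinsics, including _mm_movemask_epi8 */

    /* Minimal sketch, not NumPy's actual kernel: pack 16 input bytes
     * (zero -> bit clear, nonzero -> bit set) into 2 output bytes in
     * packbits' "big endian" bit order, where the first input byte
     * becomes the most significant bit of the first output byte. */
    static void pack16_sse2(const uint8_t *in, uint8_t *out)
    {
        __m128i v = _mm_loadu_si128((const __m128i *)in);

        /* Byteswap the vector (byte 15 ... byte 0). SSE2 has no pshufb,
         * so reverse the 32-bit lanes, swap the 16-bit words inside each
         * lane, then swap the bytes inside each word. */
        v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
        v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
        v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
        v = _mm_or_si128(_mm_slli_epi16(v, 8), _mm_srli_epi16(v, 8));

        /* _mm_movemask_epi8 collects the MSB of each byte, so first turn
         * "byte is nonzero" into 0xFF by comparing against zero and
         * inverting the resulting mask. */
        __m128i is_zero = _mm_cmpeq_epi8(v, _mm_setzero_si128());
        uint16_t mask = (uint16_t)~_mm_movemask_epi8(is_zero);

        /* After the byteswap, mask bit 15 corresponds to input byte 0,
         * so the high half of the mask is output byte 0. */
        out[0] = (uint8_t)(mask >> 8);
        out[1] = (uint8_t)(mask & 0xff);
    }

The compare against zero is needed because packbits sets a bit for any
nonzero input byte, while _mm_movemask_epi8 only reads each byte's most
significant bit.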
Diffstat (limited to 'numpy/core/setup_common.py')
-rw-r--r--   numpy/core/setup_common.py   2
1 file changed, 2 insertions, 0 deletions
diff --git a/numpy/core/setup_common.py b/numpy/core/setup_common.py
index 18066d991..596b3996c 100644
--- a/numpy/core/setup_common.py
+++ b/numpy/core/setup_common.py
@@ -130,6 +130,8 @@ OPTIONAL_INTRINSICS = [("__builtin_isnan", '5.'),
                        # broken on OSX 10.11, make sure its not optimized away
                        ("volatile int r = __builtin_cpu_supports", '"sse"',
                         "stdio.h", "__BUILTIN_CPU_SUPPORTS"),
+                       # MMX only needed for icc, but some clangs don't have it
+                       ("_m_from_int64", '0', "emmintrin.h"),
                        ("_mm_load_ps", '(float*)0', "xmmintrin.h"),  # SSE
                        ("_mm_prefetch", '(float*)0, _MM_HINT_NTA',
                         "xmmintrin.h"),  # SSE
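
Each OPTIONAL_INTRINSICS entry is a tuple of (expression, arguments,
header) with an optional feature-macro name. At build time the entry is
compiled into a tiny probe program, roughly like the sketch below (the
exact code generation lives in numpy's setup machinery, so treat the
shape of the probe and the HAVE__M_FROM_INT64 macro name as
assumptions); the intrinsic is only used if the probe compiles:

    /* Hypothetical probe for the ("_m_from_int64", '0', "emmintrin.h")
     * entry: if this translation unit compiles and links, the build is
     * assumed to define HAVE__M_FROM_INT64 and may use the intrinsic. */
    #include <emmintrin.h>

    int main(void)
    {
        _m_from_int64(0);
        return 0;
    }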