author     Julian Taylor <jtaylor.debian@googlemail.com>  2013-05-19 17:04:27 +0200
committer  Julian Taylor <jtaylor.debian@googlemail.com>  2013-05-25 17:36:00 +0200
commit     0adccaaa910ab495e993f453956fd983775604f3
tree       575e6b1bc7066bbe24ade1fee8576e4e31f2f7ef /numpy/lib/utils.py
parent     8ff5e37bff03925da4c1b121b38188f9fd779b4d
ENH: vectorize sqrt ufunc using SSE2
Specialize the sqrt ufunc for float and double and vectorize it using SSE2. This improves performance by a factor of 4 for float and 2 for double, unless the loop is memory bound because the data is not in cache. Performance is better on all tested machines (AMD Phenom X2, Intel Xeon 5xxx/7xxx, Core 2 Duo, Core i7). This version does not set errno on invalid input, but NumPy only checks the FPU flags, so the behavior is unchanged. In principle the compiler could autovectorize the loop if it were built with -ffast-math (so errno is not set), specialized for the vectorizable strides, and given some hints (restrict, __builtin_assume_aligned, etc.), but it is simpler and more reliable to vectorize it by hand.
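The hand-vectorization described above can be sketched in C roughly as follows. This is a minimal sketch, not the code from this commit: the names sse2_sqrt_float, sse2_sqrt_double, sqrt1_f and sqrt1_d are hypothetical, and it assumes contiguous (unit-stride) arrays on an x86 target with SSE2. It also illustrates where the 4x/2x figures come from: a 128-bit SSE register holds four floats but only two doubles. The unaligned head and the tail use the scalar _mm_sqrt_ss/_mm_sqrt_sd instructions instead of the libm sqrt(), so, like the commit, nothing sets errno; negative input still yields NaN and sets the invalid-operation flag in MXCSR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics; also includes the SSE float ones */

/* Hypothetical scalar helpers: one sqrt via SSE scalar instructions,
 * which report errors only through the FP status flags, never errno. */
static inline float sqrt1_f(float x)
{
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

static inline double sqrt1_d(double x)
{
    __m128d v = _mm_set_sd(x);
    return _mm_cvtsd_f64(_mm_sqrt_sd(v, v)); /* sqrt of the low lane */
}

/* Hypothetical loop (not NumPy's): out[i] = sqrt(in[i]), 4 floats per step. */
static void sse2_sqrt_float(float *out, const float *in, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (in != NULL && out != NULL));

    /* Scalar head: advance until in + i is 16-byte aligned. */
    for (; i < n && ((uintptr_t)(in + i) & 15u) != 0; i++)
        out[i] = sqrt1_f(in[i]);

    /* Vector body: aligned loads; unaligned stores, since out may have a
     * different alignment than in. */
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_sqrt_ps(_mm_load_ps(in + i)));

    /* Scalar tail: at most three leftover elements. */
    for (; i < n; i++)
        out[i] = sqrt1_f(in[i]);
}

/* Same structure for double, 2 elements per step. */
static void sse2_sqrt_double(double *out, const double *in, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (in != NULL && out != NULL));

    for (; i < n && ((uintptr_t)(in + i) & 15u) != 0; i++)
        out[i] = sqrt1_d(in[i]);

    for (; i + 2 <= n; i += 2)
        _mm_storeu_pd(out + i, _mm_sqrt_pd(_mm_load_pd(in + i)));

    for (; i < n; i++)
        out[i] = sqrt1_d(in[i]);
}
```

Because each lane is loaded before it is stored, calling either function in place (out == in) also works. Strided input, which the commit leaves to the generic loop, is not handled by this sketch.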
Diffstat (limited to 'numpy/lib/utils.py')
0 files changed, 0 insertions, 0 deletions