author | Julian Taylor <jtaylor.debian@googlemail.com> | 2013-05-19 17:04:27 +0200 |
---|---|---|
committer | Julian Taylor <jtaylor.debian@googlemail.com> | 2013-05-25 17:36:00 +0200 |
commit | 0adccaaa910ab495e993f453956fd983775604f3 | |
tree | 575e6b1bc7066bbe24ade1fee8576e4e31f2f7ef /numpy/core/numeric.py | |
parent | 8ff5e37bff03925da4c1b121b38188f9fd779b4d | |
ENH: vectorize sqrt ufunc using SSE2
specialize the sqrt ufunc for float and double and vectorize it using
SSE2.
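improves performance by roughly 4x for float and 2x for double (matching
the 4 float / 2 double lanes of a 128-bit SSE2 register), unless the
loop is memory bound because the data is not in cache.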
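The specialized loop can be pictured as a scalar peel (until the output is 16-byte aligned), a vector body, and a scalar tail. This is a minimal sketch under stated assumptions, not the code from this patch: the names `sqrt_f64_sse2`, `sqrt_f32_sse2`, `sqrt1_f64` and `sqrt1_f32` are hypothetical, only unit-stride (contiguous) data is handled, and an x86 target with SSE2 is assumed. Each `_mm_sqrt_pd` processes 2 doubles and each `_mm_sqrt_ps` 4 floats, which is where the 2x/4x figures come from.

```c
#include <emmintrin.h> /* SSE2 intrinsics (also provides the SSE1 float ones) */
#include <stddef.h>    /* size_t */
#include <stdint.h>    /* uintptr_t */

/* Hypothetical scalar helpers: use the SSE scalar sqrt instructions
 * (sqrtsd/sqrtss) so peel and tail need no libm and match the vector body. */
static double sqrt1_f64(double x)
{
    __m128d v = _mm_set_sd(x);
    return _mm_cvtsd_f64(_mm_sqrt_sd(v, v));
}

static float sqrt1_f32(float x)
{
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

/* double: 2 lanes per 128-bit register (hypothetical name, contiguous data only) */
void sqrt_f64_sse2(const double *in, double *out, size_t n)
{
    size_t i = 0;
    /* Peel until `out` is 16-byte aligned so the body can use aligned stores.
     * If `out` can never reach 16-byte alignment, everything runs scalar. */
    while (i < n && ((uintptr_t)(out + i) & 15u) != 0) {
        out[i] = sqrt1_f64(in[i]);
        i++;
    }
    /* Vector body: unaligned loads, since `in` may be aligned differently. */
    for (; i + 2 <= n; i += 2)
        _mm_store_pd(out + i, _mm_sqrt_pd(_mm_loadu_pd(in + i)));
    /* Scalar remainder. */
    for (; i < n; i++)
        out[i] = sqrt1_f64(in[i]);
}

/* float: 4 lanes per 128-bit register (hypothetical name, contiguous data only) */
void sqrt_f32_sse2(const float *in, float *out, size_t n)
{
    size_t i = 0;
    while (i < n && ((uintptr_t)(out + i) & 15u) != 0) {
        out[i] = sqrt1_f32(in[i]);
        i++;
    }
    for (; i + 4 <= n; i += 4)
        _mm_store_ps(out + i, _mm_sqrt_ps(_mm_loadu_ps(in + i)));
    for (; i < n; i++)
        out[i] = sqrt1_f32(in[i]);
}
```

Aligning only the output and accepting unaligned loads is one possible design choice; strided (non-contiguous) arrays would still need a separate generic loop, which this sketch leaves out.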
performance was better on every tested machine (AMD Phenom X2,
Intel Xeon 5xxx/7xxx, Core 2 Duo, Core i7).
This version does not set errno on invalid input, but numpy only checks
the FPU flags, so the observable behavior is the same.
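The errno point can be checked directly on x86. This is a minimal sketch assuming an SSE2 target; the helper `sse_sqrt_raises_invalid` is hypothetical. A packed SSE2 sqrt of a negative number leaves errno untouched but sets the sticky invalid-operation flag in MXCSR, which is the kind of floating-point status flag that NumPy's error handling inspects after a ufunc loop.

```c
#include <errno.h>     /* errno */
#include <xmmintrin.h> /* _MM_GET/SET_EXCEPTION_STATE, _MM_EXCEPT_INVALID */
#include <emmintrin.h> /* _mm_sqrt_pd, _mm_set1_pd, _mm_cvtsd_f64 */

/* Hypothetical helper: returns 1 if a packed SSE2 sqrt of x raised the
 * invalid-operation flag in MXCSR, and reports errno afterwards. */
int sse_sqrt_raises_invalid(double x, int *errno_after)
{
    volatile double src = x; /* volatile read: prevents constant folding */
    volatile double dst;

    errno = 0;
    _MM_SET_EXCEPTION_STATE(0); /* clear the sticky SSE exception flags */

    /* Storing to a volatile forces the sqrt to execute before the flags are read. */
    dst = _mm_cvtsd_f64(_mm_sqrt_pd(_mm_set1_pd(src)));

    int invalid = (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_INVALID) != 0;
    *errno_after = errno; /* nothing above writes errno */
    return invalid;
}
```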
In principle the compiler could autovectorize the loop if built with
-ffast-math (to drop errno handling), with the loop specialized for the
vectorizable strides and given some hints (restrict,
__builtin_assume_aligned, etc.), but it is simpler and more reliable to
vectorize it by hand.
Diffstat (limited to 'numpy/core/numeric.py')
0 files changed, 0 insertions, 0 deletions