author | Julian Taylor <jtaylor.debian@googlemail.com> | 2013-06-07 19:22:51 +0200 |
---|---|---|
committer | Julian Taylor <jtaylor.debian@googlemail.com> | 2013-06-08 20:44:05 +0200 |
commit | 9d5884b0acf935401f0f8e64912b98abc73f62c3 (patch) | |
tree | ead794dec512d321c01b3527ba90a431988d3bbd /numpy/testing/utils.py | |
parent | abad5e3a753a2d0f5bbd7bdf4e8769cf9a4ef02d (diff) | |
download | numpy-9d5884b0acf935401f0f8e64912b98abc73f62c3.tar.gz |
ENH: Vectorize float absolute operation with sse2
fabs on x86 can be implemented by masking out the sign bit.
Obtaining such a bit pattern is best done by a bitwise not on the
negative zero.
This is the same operation the compiler will convert fabs to on amd64.
Improves performance by a factor of ~1.7/3.5 for float/double on cached
data and ~1.4/1.1 on non-cached data.
If one simplifies the loops, gcc could also autovectorize them, but even
with all hints it's almost the same code length and slightly worse assembly.
The code can easily be extended to support AVX by changing vpre and
vtype to 256.
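
The sign-bit masking described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names `abs_float_sse2` and `abs_double_sse2` are hypothetical, unaligned loads are assumed for simplicity, and the leftover elements are handled with the scalar forms of the same mask operation.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */
#include <stddef.h>

/* Hypothetical helper, not NumPy's actual loop: out[i] = |in[i]|. */
static void abs_float_sse2(float *out, const float *in, size_t n)
{
    /* -0.0f has only the sign bit set. andnot(a, b) computes (~a) & b,
     * so the bitwise not of negative zero masks out the sign bit. */
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    size_t i = 0;

    /* Four floats per 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(in + i);
        _mm_storeu_ps(out + i, _mm_andnot_ps(sign_mask, v));
    }
    /* Remaining elements, one at a time, with the same mask. */
    for (; i < n; i++) {
        __m128 v = _mm_load_ss(in + i);
        _mm_store_ss(out + i, _mm_andnot_ps(sign_mask, v));
    }
}

/* Same idea for doubles: two elements per 128-bit register. */
static void abs_double_sse2(double *out, const double *in, size_t n)
{
    const __m128d sign_mask = _mm_set1_pd(-0.0);
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(in + i);
        _mm_storeu_pd(out + i, _mm_andnot_pd(sign_mask, v));
    }
    for (; i < n; i++) {
        __m128d v = _mm_load_sd(in + i);
        _mm_store_sd(out + i, _mm_andnot_pd(sign_mask, v));
    }
}
```

Because the mask is applied bitwise, negative zero becomes positive zero, which a comparison-based `x < 0 ? -x : x` would not guarantee.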