field | value | date |
---|---|---|
author | Charles Harris <charlesr.harris@gmail.com> | 2013-06-08 14:57:15 -0700 |
committer | Charles Harris <charlesr.harris@gmail.com> | 2013-06-08 14:57:15 -0700 |
commit | 3e471304393244eb02215d1601d2396b3f94eb8d | |
tree | c837f5bd2272256f361cb2ec6b0d04a84b023a8d /doc | |
parent | 23a27572994e1c731605c6a78a1564014c2a62c8 | |
parent | 7fb8b714906a92516905cc0f03e45511bd1ac1ed | |
download | numpy-3e471304393244eb02215d1601d2396b3f94eb8d.tar.gz | |
Merge pull request #3411 from juliantaylor/vectorize-fabs
ENH: Vectorize float absolute operation with sse2
Diffstat (limited to 'doc')
-rw-r--r-- | doc/release/1.8.0-notes.rst | 18 |
1 files changed, 18 insertions, 0 deletions
diff --git a/doc/release/1.8.0-notes.rst b/doc/release/1.8.0-notes.rst
index 76dcf50c2..c5315e6cd 100644
--- a/doc/release/1.8.0-notes.rst
+++ b/doc/release/1.8.0-notes.rst
@@ -142,6 +142,24 @@ The `pad` function has a new implementation, greatly improving performance for
 all inputs except `mode=<function>` (retained for backwards compatibility).
 Scaling with dimensionality is dramatically improved for rank >= 4.
 
+Performance improvements to `isnan`, `isinf`, `isfinite` and `byteswap`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`isnan`, `isinf`, `isfinite` and `byteswap` have been improved to take
+advantage of compiler builtins to avoid expensive calls to libc.
+This improves performance of these operations by about a factor of two on gnu
+libc systems.
+
+Performance improvements to `sqrt` and `abs`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The `sqrt` and `abs` functions for unit stride elementary operations have been
+improved to make use of SSE2 CPU SIMD instructions.
+This improves performance of these operations up to 4x/2x for float32/float64
+depending on the location of the data in the CPU caches. The performance gain
+is greatest for in-place operations.
+In order to use the improved functions the SSE2 instruction set must be enabled
+at compile time. It is enabled by default on x86_64 systems. On x86_32 with a
+capable CPU it must be enabled by passing the appropriate flag to CFLAGS build
+variable (-msse2 with gcc).
 
 Changes
 =======
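
The release notes added above refer to "unit stride" and "in-place" operations. The following is a minimal sketch of what those terms look like from the Python side, assuming NumPy is installed and imported as `np`. The array names and sizes are made up for illustration. Whether the SSE2 code path is actually taken depends on how NumPy was built and cannot be observed from this code; it only shows the array layouts and call forms the notes describe.

```python
import numpy as np

# A float32 array with unit stride: consecutive elements are adjacent in memory.
a = np.linspace(-4.0, 4.0, 1024, dtype=np.float32)
is_unit_stride = a.strides == (a.itemsize,)

# In-place operations: `out=a` writes the result back into the input buffer.
np.abs(a, out=a)
np.sqrt(a, out=a)  # safe here because abs() made every value non-negative

# A sliced view that skips elements is not unit stride.
b = np.arange(-8, 8, dtype=np.float64)
strided = b[::2]                          # steps over every other element
compact = np.ascontiguousarray(strided)   # copy laid out with unit stride

# The element-wise tests named in the first new section of the notes.
x = np.array([1.0, np.nan, np.inf, -np.inf])
nan_mask = np.isnan(x)
inf_mask = np.isinf(x)
finite_mask = np.isfinite(x)

# byteswap() reverses the byte order of each element and returns a copy.
swapped = np.array([1, 256], dtype=np.int16).byteswap()
```

Copying a strided view with `np.ascontiguousarray` costs an extra pass over the data, so it is only worth doing when the same data is processed repeatedly.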