author Charles Harris <charlesr.harris@gmail.com> 2013-06-08 14:57:15 -0700
committer Charles Harris <charlesr.harris@gmail.com> 2013-06-08 14:57:15 -0700
commit 3e471304393244eb02215d1601d2396b3f94eb8d
tree c837f5bd2272256f361cb2ec6b0d04a84b023a8d /doc
parent 23a27572994e1c731605c6a78a1564014c2a62c8
parent 7fb8b714906a92516905cc0f03e45511bd1ac1ed
Merge pull request #3411 from juliantaylor/vectorize-fabs
ENH: Vectorize float absolute operation with sse2
Diffstat (limited to 'doc')
-rw-r--r-- doc/release/1.8.0-notes.rst | 18
1 files changed, 18 insertions, 0 deletions
diff --git a/doc/release/1.8.0-notes.rst b/doc/release/1.8.0-notes.rst
index 76dcf50c2..c5315e6cd 100644
--- a/doc/release/1.8.0-notes.rst
+++ b/doc/release/1.8.0-notes.rst
@@ -142,6 +142,24 @@ The `pad` function has a new implementation, greatly improving performance for
all inputs except `mode=<function>` (retained for backwards compatibility).
Scaling with dimensionality is dramatically improved for rank >= 4.
+Performance improvements to `isnan`, `isinf`, `isfinite` and `byteswap`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`isnan`, `isinf`, `isfinite` and `byteswap` have been improved to take
+advantage of compiler builtins to avoid expensive calls to libc.
+This improves performance of these operations by about a factor of two on GNU
+libc systems.
+
+Performance improvements to `sqrt` and `abs`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The `sqrt` and `abs` functions for unit-stride elementary operations have been
+improved to make use of SSE2 CPU SIMD instructions.
+This improves performance of these operations by up to 4x/2x for
+float32/float64, depending on the location of the data in the CPU caches. The
+performance gain is greatest for in-place operations.
+In order to use the improved functions, the SSE2 instruction set must be enabled
+at compile time. It is enabled by default on x86_64 systems. On x86_32 with a
+capable CPU, it must be enabled by passing the appropriate flag to the CFLAGS
+build variable (-msse2 with gcc).
Changes
=======
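
The two notes added by this hunk rest on simple floating-point ideas: a NaN or infinity check can be a plain comparison rather than a library call, and an absolute value can be computed by clearing the IEEE 754 sign bit, a bitwise operation that SIMD units can apply to several values at once. The following is only a minimal scalar sketch of those ideas in pure Python. It is not NumPy's implementation, and the helper names (`isnan_inline`, `isinf_inline`, `isfinite_inline`, `fabs_bitmask`) are hypothetical, chosen for illustration.

```python
import math
import struct


def isnan_inline(x: float) -> bool:
    # NaN is the only float that compares unequal to itself, so the check
    # is a single comparison with no library call.
    return x != x


def isinf_inline(x: float) -> bool:
    # Infinity compares equal only to an infinity of the same sign.
    return x == math.inf or x == -math.inf


def isfinite_inline(x: float) -> bool:
    # x - x is 0.0 for every finite x, and NaN for infinities and NaN.
    return (x - x) == 0.0


def fabs_bitmask(x: float) -> float:
    # Reinterpret the float64 as a 64-bit integer, clear bit 63 (the sign
    # bit), and reinterpret the result as a float64 again.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    cleared = bits & 0x7FFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", cleared))[0]


# Cross-check the sketches against the standard library.
for value in (0.0, -0.0, 1.5, -2.5, math.inf, -math.inf, math.nan):
    assert isnan_inline(value) == math.isnan(value)
    assert isinf_inline(value) == math.isinf(value)
    assert isfinite_inline(value) == math.isfinite(value)
    if not math.isnan(value):
        assert fabs_bitmask(value) == math.fabs(value)
```

The sign-bit mask treats `-0.0` and `-inf` the same way as any other negative value, which is one reason the bitwise form is convenient for vectorized code: no branching on special values is needed.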