| author | Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> | 2019-03-05 09:13:55 -0800 |
|---|---|---|
| committer | Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> | 2019-04-19 10:47:15 -0700 |
| commit | 9754a207828f377654c79873e38d475bb87d98de (patch) | |
| tree | 6512d0febf26593ac946722d9c38ca57a4bbbdfb /numpy/core/setup_common.py | |
| parent | 31e71d7ce8d447cb74b9fb83875361cf7dba4579 (diff) | |
| download | numpy-9754a207828f377654c79873e38d475bb87d98de.tar.gz | |
ENH: vectorizing float32 implementation of np.exp & np.log
This commit implements vectorized single-precision exponential and
natural-log functions using AVX2 and AVX512.
Accuracy:
| Function | Max ULP Error | Max Relative Error |
|----------|---------------|--------------------|
| np.exp | 2.52 | 2.1E-07 |
| np.log | 3.83 | 2.4E-07 |
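
ULP error of this kind can be estimated with a short script. The sketch below is illustrative, not the exact harness used for the table above: the sampling range and sample count are assumptions, and it compares the float32 result against a float64 reference, using `np.spacing` as the ULP size at each output value.

```python
import numpy as np

# Sample float32 inputs over an assumed range (not the range used for the table).
x = np.random.uniform(-10, 10, 100_000).astype(np.float32)

got = np.exp(x)                       # float32 result under test
ref = np.exp(x.astype(np.float64))    # higher-precision reference

# np.spacing(v) is the gap to the next representable float32 at v,
# i.e. one ULP; dividing the absolute error by it gives error in ULPs.
ulp_err = np.abs(got.astype(np.float64) - ref) / np.spacing(np.abs(got))
print(ulp_err.max())
```

The same pattern works for `np.log` by restricting the sample range to positive values.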
Performance:
(1) Micro-benchmarks: we measured the execution time of np.exp and np.log
using the timeit module in Python. Each function was executed 1000 times,
and this was repeated 100 times. The standard deviation across all runs was
less than 2% of the mean and is therefore omitted from the data. The
vectorized implementation was up to 7.6x faster than the scalar version.
| Function | NumPy1.16 | AVX2 | AVX512 | AVX2 speedup | AVX512 speedup |
| -------- | --------- | ------ | ------ | ------------ | -------------- |
| np.exp | 0.395s | 0.112s | 0.055s | 3.56x | 7.25x |
| np.log | 0.456s | 0.147s | 0.059s | 3.10x | 7.64x |
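
A micro-benchmark in this style can be reproduced with `timeit`; the array size and repeat counts below are illustrative assumptions, not the ones behind the table above.

```python
import timeit
import numpy as np

# float32 input; shifted away from zero so np.log stays well-defined.
a = np.random.rand(1_000_000).astype(np.float32) + 0.1

# timeit.repeat returns one total time per repeat; take the minimum,
# which is the conventional choice for reducing scheduling noise.
t_exp = min(timeit.repeat(lambda: np.exp(a), number=10, repeat=3))
t_log = min(timeit.repeat(lambda: np.log(a), number=10, repeat=3))
print(f"np.exp: {t_exp:.4f}s  np.log: {t_log:.4f}s")
```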
(2) Logistic regression: exp and log are heavily used in training neural
networks (in the sigmoid activation function and the loss function,
respectively). This patch significantly speeds up training a logistic
regression model. As an example, we measured the time to train a model
with 15 features on 1000 training data points, and observed a 2x speedup
in training the model to a loss-function error < 10E-04.
| Function | NumPy1.16 | AVX2 | AVX512 | AVX2 speedup | AVX512 speedup |
| -------------- | ---------- | ------ | ------ | ------------ | -------------- |
| logistic.train | 121.0s | 75.02s | 60.60s | 1.61x | 2.02x |
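
The workload above can be sketched in a few lines: the sigmoid calls np.exp on every step and the cross-entropy loss calls np.log, so both sit on the hot path. The dataset, learning rate, and iteration count below are illustrative assumptions, not the benchmark's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 15)).astype(np.float32)   # 1000 points, 15 features
true_w = rng.normal(size=15).astype(np.float32)
y = (X @ true_w + rng.normal(scale=0.1, size=1000) > 0).astype(np.float32)

w = np.zeros(15, dtype=np.float32)
lr = np.float32(0.1)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))               # sigmoid: np.exp
    loss = -np.mean(y * np.log(p + 1e-7)             # cross-entropy: np.log
                    + (1 - y) * np.log(1 - p + 1e-7))
    w -= lr * (X.T @ (p - y)) / len(y)               # full-batch gradient step
print(float(loss))
```

Because X, y, and w are all float32, the np.exp and np.log calls inside the loop dispatch to the single-precision kernels this patch vectorizes.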
Diffstat (limited to 'numpy/core/setup_common.py')
-rw-r--r-- | numpy/core/setup_common.py | 5 |
1 file changed, 5 insertions, 0 deletions
```diff
diff --git a/numpy/core/setup_common.py b/numpy/core/setup_common.py
index f837df112..a9c044da9 100644
--- a/numpy/core/setup_common.py
+++ b/numpy/core/setup_common.py
@@ -118,6 +118,7 @@ OPTIONAL_HEADERS = [
 # sse headers only enabled automatically on amd64/x32 builds
                 "xmmintrin.h",  # SSE
                 "emmintrin.h",  # SSE2
+                "immintrin.h",  # AVX
                 "features.h",  # for glibc version linux
                 "xlocale.h",  # see GH#8367
                 "dlfcn.h",  # dladdr
@@ -149,6 +150,8 @@ OPTIONAL_INTRINSICS = [("__builtin_isnan", '5.'),
                                      "stdio.h", "LINK_AVX"),
                        ("__asm__ volatile", '"vpand %ymm1, %ymm2, %ymm3"',
                                      "stdio.h", "LINK_AVX2"),
+                       ("__asm__ volatile", '"vpaddd %zmm1, %zmm2, %zmm3"',
+                                     "stdio.h", "LINK_AVX512F"),
                        ("__asm__ volatile", '"xgetbv"', "stdio.h", "XGETBV"),
                        ]
@@ -165,6 +168,8 @@ OPTIONAL_FUNCTION_ATTRIBUTES = [('__attribute__((optimize("unroll-loops")))',
                                 'attribute_target_avx'),
                                 ('__attribute__((target ("avx2")))',
                                 'attribute_target_avx2'),
+                                ('__attribute__((target ("avx512f")))',
+                                 'attribute_target_avx512f'),
                                 ]
 # variable attributes tested via "int %s a" % attribute
```