author: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> 2019-03-05 09:13:55 -0800
committer: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> 2019-04-19 10:47:15 -0700
commit: 9754a207828f377654c79873e38d475bb87d98de
tree: 6512d0febf26593ac946722d9c38ca57a4bbbdfb /numpy/core/setup_common.py
parent: 31e71d7ce8d447cb74b9fb83875361cf7dba4579
ENH: vectorizing float32 implementation of np.exp & np.log
This commit implements vectorized single precision exponential and natural log using AVX2 and AVX512.

Accuracy:

| Function | Max ULP Error | Max Relative Error |
|----------|---------------|--------------------|
| np.exp   | 2.52          | 2.1E-07            |
| np.log   | 3.83          | 2.4E-07            |

Performance:

(1) Micro-benchmarks: measured the execution time of np.exp and np.log using the timeit package in Python. Each function is executed 1000 times and this is repeated 100 times. The standard deviation across all runs was less than 2% of the mean and is therefore not included in the data. The vectorized implementation was up to 7.6x faster than the scalar version.

| Function | NumPy1.16 | AVX2   | AVX512 | AVX2 speedup | AVX512 speedup |
| -------- | --------- | ------ | ------ | ------------ | -------------- |
| np.exp   | 0.395s    | 0.112s | 0.055s | 3.56x        | 7.25x          |
| np.log   | 0.456s    | 0.147s | 0.059s | 3.10x        | 7.64x          |

(2) Logistic regression: exp and log are heavily used in training neural networks (as part of the sigmoid activation function and the loss function, respectively). This patch significantly speeds up training a logistic regression model. As an example, we measured how long it takes to train a model with 15 features on 1000 training data points. We observed a 2x speedup in training the model to a loss function error < 10E-04.

| Function       | NumPy1.16  | AVX2   | AVX512 | AVX2 speedup | AVX512 speedup |
| -------------- | ---------- | ------ | ------ | ------------ | -------------- |
| logistic.train | 121.0s     | 75.02s | 60.60s | 1.61x        | 2.02x          |
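The micro-benchmark procedure described in the commit message (1000 executions per run, 100 runs, standard deviation compared against the mean) could be reproduced roughly as follows. This is only a sketch using the standard library: the helper name `benchmark` is hypothetical, and the commit does not show its benchmark script or state the input array size, so the usage in the trailing comment is an assumption.

```python
import statistics
import timeit


def benchmark(func, arg, number=1000, repeat=100):
    """Time func(arg): `number` calls per run, repeated `repeat` times.

    Returns (mean seconds per run, standard deviation / mean).
    """
    runs = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    mean = statistics.mean(runs)
    rel_std = statistics.stdev(runs) / mean
    return mean, rel_std


# Assumed usage with NumPy installed (array size and contents are illustrative):
#
#   import numpy as np
#   x = np.random.rand(10000).astype(np.float32)
#   for name, fn in (("np.exp", np.exp), ("np.log", np.log)):
#       mean, rel_std = benchmark(fn, x)
#       print("%s: %.3fs per run, rel. std %.1f%%" % (name, mean, 100 * rel_std))
```

Reporting the relative standard deviation alongside the mean makes it possible to check a claim such as "less than 2% of the mean" directly.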
Diffstat (limited to 'numpy/core/setup_common.py')
-rw-r--r-- numpy/core/setup_common.py | 5
1 files changed, 5 insertions, 0 deletions
diff --git a/numpy/core/setup_common.py b/numpy/core/setup_common.py
index f837df112..a9c044da9 100644
--- a/numpy/core/setup_common.py
+++ b/numpy/core/setup_common.py
@@ -118,6 +118,7 @@ OPTIONAL_HEADERS = [
# sse headers only enabled automatically on amd64/x32 builds
"xmmintrin.h", # SSE
"emmintrin.h", # SSE2
+ "immintrin.h", # AVX
"features.h", # for glibc version linux
"xlocale.h", # see GH#8367
"dlfcn.h", # dladdr
@@ -149,6 +150,8 @@ OPTIONAL_INTRINSICS = [("__builtin_isnan", '5.'),
"stdio.h", "LINK_AVX"),
("__asm__ volatile", '"vpand %ymm1, %ymm2, %ymm3"',
"stdio.h", "LINK_AVX2"),
+ ("__asm__ volatile", '"vpaddd %zmm1, %zmm2, %zmm3"',
+ "stdio.h", "LINK_AVX512F"),
("__asm__ volatile", '"xgetbv"', "stdio.h", "XGETBV"),
]
@@ -165,6 +168,8 @@ OPTIONAL_FUNCTION_ATTRIBUTES = [('__attribute__((optimize("unroll-loops")))',
'attribute_target_avx'),
('__attribute__((target ("avx2")))',
'attribute_target_avx2'),
+ ('__attribute__((target ("avx512f")))',
+ 'attribute_target_avx512f'),
]
# variable attributes tested via "int %s a" % attribute
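Each OPTIONAL_INTRINSICS entry in the diff above is a tuple of a construct, its arguments, a header, and a name. As a rough illustration of how such an entry could become a compile-and-link probe, the sketch below renders a small C program from one tuple. The helper `render_probe`, the C template, and the `HAVE_` macro prefix are assumptions made for illustration; NumPy's build performs these checks through its own configuration machinery, and the source it generates may differ.

```python
def render_probe(entry):
    """Render a C test program for one (func, args, header, name) tuple.

    Returns (macro_name, c_source). If the program compiles and links,
    a build system could define `macro_name` for the C sources.
    """
    func, args, header, name = entry
    source = "\n".join([
        "#include <%s>" % header,
        "int main(void)",
        "{",
        "    %s(%s);" % (func, args),
        "    return 0;",
        "}",
        "",
    ])
    # Assumed naming convention: HAVE_ followed by the upper-cased name.
    return "HAVE_%s" % name.upper(), source


# The AVX512F entry added by this commit.
avx512f = ("__asm__ volatile", '"vpaddd %zmm1, %zmm2, %zmm3"',
           "stdio.h", "LINK_AVX512F")
macro, source = render_probe(avx512f)
print(macro)
print(source)
```

A probe of this kind only shows that the toolchain can assemble and link an AVX512F instruction. Whether the CPU supports it at runtime is a separate check.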