| author | Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> | 2019-03-05 09:13:55 -0800 |
|---|---|---|
| committer | Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> | 2019-04-19 10:47:15 -0700 |
| commit | 9754a207828f377654c79873e38d475bb87d98de (patch) | |
| tree | 6512d0febf26593ac946722d9c38ca57a4bbbdfb /numpy/core/setup_common.py | |
| parent | 31e71d7ce8d447cb74b9fb83875361cf7dba4579 (diff) | |
| download | numpy-9754a207828f377654c79873e38d475bb87d98de.tar.gz | |
ENH: vectorizing float32 implementation of np.exp & np.log
This commit implements vectorized single-precision exponential and
natural-log functions using AVX2 and AVX512.
Accuracy:
| Function | Max ULP Error | Max Relative Error |
|----------|---------------|--------------------|
| np.exp | 2.52 | 2.1E-07 |
| np.log | 3.83 | 2.4E-07 |
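
ULP error of this kind can be estimated with a short script. The sketch below is illustrative, not the exact harness used for the table above: the sampling range and sample count are assumptions, and it compares the float32 result against a float64 reference, using `np.spacing` as the ULP size at each output value.

```python
import numpy as np

# Sample float32 inputs over an assumed range (not the range used for the table).
x = np.random.uniform(-10, 10, 100_000).astype(np.float32)

got = np.exp(x)                       # float32 result under test
ref = np.exp(x.astype(np.float64))    # higher-precision reference

# np.spacing(v) is the gap to the next representable float32 at v,
# i.e. one ULP; dividing the absolute error by it gives error in ULPs.
ulp_err = np.abs(got.astype(np.float64) - ref) / np.spacing(np.abs(got))
print(ulp_err.max())
```

The same pattern works for `np.log` by restricting the sample range to positive values.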
Performance:
(1) Micro-benchmarks: we measured the execution time of np.exp and np.log
using the timeit module in Python. Each function was executed 1000 times,
and this was repeated 100 times. The standard deviation across all runs was
less than 2% of the mean and is therefore omitted from the data. The
vectorized implementation was up to 7.6x faster than the scalar version.
| Function | NumPy1.16 | AVX2 | AVX512 | AVX2 speedup | AVX512 speedup |
| -------- | --------- | ------ | ------ | ------------ | -------------- |
| np.exp | 0.395s | 0.112s | 0.055s | 3.56x | 7.25x |
| np.log | 0.456s | 0.147s | 0.059s | 3.10x | 7.64x |
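
A micro-benchmark in this style can be reproduced with `timeit`; the array size and repeat counts below are illustrative assumptions, not the ones behind the table above.

```python
import timeit
import numpy as np

# float32 input; shifted away from zero so np.log stays well-defined.
a = np.random.rand(1_000_000).astype(np.float32) + 0.1

# timeit.repeat returns one total time per repeat; take the minimum,
# which is the conventional choice for reducing scheduling noise.
t_exp = min(timeit.repeat(lambda: np.exp(a), number=10, repeat=3))
t_log = min(timeit.repeat(lambda: np.log(a), number=10, repeat=3))
print(f"np.exp: {t_exp:.4f}s  np.log: {t_log:.4f}s")
```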
(2) Logistic regression: exp and log are heavily used in training neural
networks (in the sigmoid activation function and the loss function,
respectively). This patch significantly speeds up training a logistic
regression model. As an example, we measured the time to train a model
with 15 features on 1000 training data points, and observed a 2x speedup
in training the model to a loss-function error < 10E-04.
| Function | NumPy1.16 | AVX2 | AVX512 | AVX2 speedup | AVX512 speedup |
| -------------- | ---------- | ------ | ------ | ------------ | -------------- |
| logistic.train | 121.0s | 75.02s | 60.60s | 1.61x | 2.02x |
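
The workload above can be sketched in a few lines: the sigmoid calls np.exp on every step and the cross-entropy loss calls np.log, so both sit on the hot path. The dataset, learning rate, and iteration count below are illustrative assumptions, not the benchmark's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 15)).astype(np.float32)   # 1000 points, 15 features
true_w = rng.normal(size=15).astype(np.float32)
y = (X @ true_w + rng.normal(scale=0.1, size=1000) > 0).astype(np.float32)

w = np.zeros(15, dtype=np.float32)
lr = np.float32(0.1)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))               # sigmoid: np.exp
    loss = -np.mean(y * np.log(p + 1e-7)             # cross-entropy: np.log
                    + (1 - y) * np.log(1 - p + 1e-7))
    w -= lr * (X.T @ (p - y)) / len(y)               # full-batch gradient step
print(float(loss))
```

Because X, y, and w are all float32, the np.exp and np.log calls inside the loop dispatch to the single-precision kernels this patch vectorizes.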
Diffstat (limited to 'numpy/core/setup_common.py')
-rw-r--r-- | numpy/core/setup_common.py | 5 |
1 file changed, 5 insertions, 0 deletions
```diff
diff --git a/numpy/core/setup_common.py b/numpy/core/setup_common.py
index f837df112..a9c044da9 100644
--- a/numpy/core/setup_common.py
+++ b/numpy/core/setup_common.py
@@ -118,6 +118,7 @@ OPTIONAL_HEADERS = [
 # sse headers only enabled automatically on amd64/x32 builds
                 "xmmintrin.h",  # SSE
                 "emmintrin.h",  # SSE2
+                "immintrin.h",  # AVX
                 "features.h",  # for glibc version linux
                 "xlocale.h",  # see GH#8367
                 "dlfcn.h",  # dladdr
@@ -149,6 +150,8 @@ OPTIONAL_INTRINSICS = [("__builtin_isnan", '5.'),
                                      "stdio.h", "LINK_AVX"),
                        ("__asm__ volatile", '"vpand %ymm1, %ymm2, %ymm3"',
                                      "stdio.h", "LINK_AVX2"),
+                       ("__asm__ volatile", '"vpaddd %zmm1, %zmm2, %zmm3"',
+                                     "stdio.h", "LINK_AVX512F"),
                        ("__asm__ volatile", '"xgetbv"', "stdio.h", "XGETBV"),
                        ]
@@ -165,6 +168,8 @@ OPTIONAL_FUNCTION_ATTRIBUTES = [('__attribute__((optimize("unroll-loops")))',
                                 'attribute_target_avx'),
                                 ('__attribute__((target ("avx2")))',
                                 'attribute_target_avx2'),
+                                ('__attribute__((target ("avx512f")))',
+                                 'attribute_target_avx512f'),
                                 ]
 # variable attributes tested via "int %s a" % attribute
```