author | Developer-Ecosystem-Engineering <65677710+Developer-Ecosystem-Engineering@users.noreply.github.com> | 2022-08-23 12:32:19 -0700 |
---|---|---|
committer | Developer-Ecosystem-Engineering <65677710+Developer-Ecosystem-Engineering@users.noreply.github.com> | 2022-08-23 12:32:19 -0700 |
commit | 6bddc28787fab819c31480da7c309ee0d18496b6 (patch) | |
tree | 5936c1eed6bae21549b25754dd5b6cb6f26b1370 /numpy/array_api/_searching_functions.py | |
parent | d8c09c50ef2e90f0db7395e70d2d8fa11921abc5 (diff) | |
download | numpy-6bddc28787fab819c31480da7c309ee0d18496b6.tar.gz |
ENH: Improve tanh for architectures without efficient gather/scatter instructions
NumPy implements tanh with a lookup table that isn't set up well for Apple silicon. Transposing the lookup table makes it more efficient to load all coefficients of the polynomial.
- float32: 1.8x faster
- float64: 1.3x faster
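To illustrate the layout change, here is a minimal Python sketch, not NumPy's actual C implementation. The table sizes, coefficient values, and every name in it (`coeff_major`, `interval_major`, `load_coeff_major`, `load_interval_major`, `horner`) are hypothetical. It assumes the original table groups values by coefficient, so fetching one interval's polynomial needs one strided access per coefficient, while the transposed table keeps each interval's coefficients contiguous.

```python
# Hypothetical sizes and values, chosen only for illustration.
N_INTERVALS = 4   # number of lookup intervals
N_COEFFS = 3      # polynomial coefficients per interval

# coeffs[i][j] is coefficient j of the polynomial for interval i.
coeffs = [[float(10 * i + j) for j in range(N_COEFFS)]
          for i in range(N_INTERVALS)]

# Layout A (coefficient-major): all c0 values, then all c1 values, and so on.
# Coefficient j of interval i lives at index j * N_INTERVALS + i.
coeff_major = [coeffs[i][j]
               for j in range(N_COEFFS) for i in range(N_INTERVALS)]

# Layout B (transposed, interval-major): every coefficient of interval 0,
# then every coefficient of interval 1, and so on.
interval_major = [coeffs[i][j]
                  for i in range(N_INTERVALS) for j in range(N_COEFFS)]


def load_coeff_major(table, i):
    # One strided access per coefficient (gather-like access pattern).
    return [table[j * N_INTERVALS + i] for j in range(N_COEFFS)]


def load_interval_major(table, i):
    # One contiguous slice holds every coefficient for interval i.
    start = i * N_COEFFS
    return table[start:start + N_COEFFS]


def horner(cs, x):
    # Evaluate c0 + c1*x + c2*x**2 + ... with Horner's rule.
    acc = 0.0
    for c in reversed(cs):
        acc = acc * x + c
    return acc


# Both layouts yield the same coefficients, hence the same polynomial value.
for i in range(N_INTERVALS):
    a = load_coeff_major(coeff_major, i)
    b = load_interval_major(interval_major, i)
    assert a == b
    assert horner(a, 0.5) == horner(b, 0.5)
```

The results are identical either way. Only the memory access pattern differs, and that is what matters on hardware without efficient gather instructions.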
Apple M1 native (arm64):
```
before after ratio
[7c143834] [c3762e7a]
<main> <tanh/upstream-pr>
- 564±2μs 491±2μs 0.87 bench_ufunc.UFunc.time_ufunc_types('tanh')
- 410±0.04μs 324±0.03μs 0.79 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'd')
- 429±0.2μs 336±0.3μs 0.78 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'd')
- 450±0.05μs 350±0.1μs 0.78 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'd')
- 452±0.3μs 352±0.8μs 0.78 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'd')
- 432±0.5μs 335±0.5μs 0.78 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'd')
- 435±0.8μs 337±1μs 0.77 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'd')
- 466±0.2μs 360±0.5μs 0.77 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'd')
- 444±0.3μs 343±0.3μs 0.77 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'd')
- 467±0.3μs 359±0.3μs 0.77 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'd')
- 237±0.1μs 147±0.03μs 0.62 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'f')
- 230±0.4μs 143±0.2μs 0.62 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'f')
- 224±0.1μs 138±0.8μs 0.61 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'f')
- 199±0.2μs 122±1μs 0.61 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'f')
- 216±0.6μs 131±0.2μs 0.61 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'f')
- 208±0.4μs 125±0.3μs 0.60 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'f')
- 203±0.3μs 120±0.9μs 0.59 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'f')
- 216±0.7μs 125±0.03μs 0.58 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'f')
- 190±0.04μs 110±0.06μs 0.58 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'f')
```
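The ratio column above is after/before, so the speedup factor is its inverse. As a quick check, the sketch below recomputes the speedup for the two contiguous cases (`1, 1, 'f'` and `1, 1, 'd'`) from the timings in the table. The helper name `speedup` is illustrative only.

```python
def speedup(before_us, after_us):
    # Speedup factor is the inverse of the asv ratio (after / before).
    return before_us / after_us


# Timings in microseconds, copied from the benchmark table above.
f32 = speedup(190, 110)   # time_ufunc(<ufunc 'tanh'>, 1, 1, 'f')
f64 = speedup(410, 324)   # time_ufunc(<ufunc 'tanh'>, 1, 1, 'd')

print(f"float32: {f32:.2f}x, float64: {f64:.2f}x")
# prints "float32: 1.73x, float64: 1.27x"
```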
Diffstat (limited to 'numpy/array_api/_searching_functions.py')
0 files changed, 0 insertions, 0 deletions