author: Developer-Ecosystem-Engineering <65677710+Developer-Ecosystem-Engineering@users.noreply.github.com> 2022-08-23 12:32:19 -0700
committer: Developer-Ecosystem-Engineering <65677710+Developer-Ecosystem-Engineering@users.noreply.github.com> 2022-08-23 12:32:19 -0700
commit: 6bddc28787fab819c31480da7c309ee0d18496b6 (patch)
tree: 5936c1eed6bae21549b25754dd5b6cb6f26b1370 /numpy/array_api/_searching_functions.py
parent: d8c09c50ef2e90f0db7395e70d2d8fa11921abc5 (diff)
download: numpy-6bddc28787fab819c31480da7c309ee0d18496b6.tar.gz
ENH: Improve tanh for architectures without efficient gather/scatter instructions
NumPy implements tanh with a lookup table that isn't set up well for Apple silicon. Transposing the lookup table makes it more efficient to load all coefficients of the polynomial.

- float32: 1.8x faster
- float64: 1.3x faster

Apple M1 native (arm64):

```
       before           after         ratio
     [7c143834]       [c3762e7a]
     <main>           <tanh/upstream-pr>
-         564±2μs          491±2μs     0.87  bench_ufunc.UFunc.time_ufunc_types('tanh')
-      410±0.04μs       324±0.03μs     0.79  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'd')
-       429±0.2μs        336±0.3μs     0.78  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'd')
-      450±0.05μs        350±0.1μs     0.78  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'd')
-       452±0.3μs        352±0.8μs     0.78  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'd')
-       432±0.5μs        335±0.5μs     0.78  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'd')
-       435±0.8μs          337±1μs     0.77  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'd')
-       466±0.2μs        360±0.5μs     0.77  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'd')
-       444±0.3μs        343±0.3μs     0.77  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'd')
-       467±0.3μs        359±0.3μs     0.77  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'd')
-       237±0.1μs       147±0.03μs     0.62  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'f')
-       230±0.4μs        143±0.2μs     0.62  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'f')
-       224±0.1μs        138±0.8μs     0.61  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'f')
-       199±0.2μs          122±1μs     0.61  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'f')
-       216±0.6μs        131±0.2μs     0.61  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'f')
-       208±0.4μs        125±0.3μs     0.60  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'f')
-       203±0.3μs        120±0.9μs     0.59  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'f')
-       216±0.7μs       125±0.03μs     0.58  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'f')
-      190±0.04μs       110±0.06μs     0.58  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'f')
```
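
To illustrate what transposing a coefficient lookup table means, here is a minimal Python sketch. It is not the NumPy kernel: the sizes, the placeholder coefficient values, and all names (`rows`, `interval_major`, `coeff_major`, `horner`) are assumptions made for the example, and it does not claim which of the two layouts the patch moves to. It only shows that both layouts hold the same polynomial coefficients and differ in which entries sit next to each other in memory.

```python
# Hypothetical sizes, not the dimensions of NumPy's real tanh table.
N_INTERVALS = 4  # number of lookup intervals
N_COEFFS = 3     # polynomial coefficients stored per interval

# Placeholder values: rows[i][k] stands for coefficient k of interval i.
rows = [[float(10 * i + k) for k in range(N_COEFFS)] for i in range(N_INTERVALS)]

# Layout A (interval-major): the coefficients of one interval are contiguous.
interval_major = [c for row in rows for c in row]

# Layout B (coefficient-major): the transpose of A, so the k-th coefficients
# of all intervals are contiguous.
coeff_major = [rows[i][k] for k in range(N_COEFFS) for i in range(N_INTERVALS)]


def coeffs_interval_major(i):
    """Read the coefficients of interval i with one contiguous slice."""
    start = i * N_COEFFS
    return interval_major[start:start + N_COEFFS]


def coeffs_coeff_major(i):
    """Read the coefficients of interval i with a stride of N_INTERVALS."""
    return [coeff_major[k * N_INTERVALS + i] for k in range(N_COEFFS)]


def horner(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... using Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


# Both layouts yield the same coefficients for every interval.
for i in range(N_INTERVALS):
    assert coeffs_interval_major(i) == coeffs_coeff_major(i) == rows[i]
```

Which layout is faster depends on how the hardware fetches the entries: vector code that processes several inputs at once needs one coefficient for many different intervals, so the cost of the access pattern varies with the available gather or table-lookup instructions.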
Diffstat (limited to 'numpy/array_api/_searching_functions.py')
0 files changed, 0 insertions, 0 deletions