author | Sebastian Pop <spop@amazon.com> | 2019-07-03 20:10:38 +0000 |
---|---|---|
committer | Dmitry Stogov <dmitry@zend.com> | 2019-07-11 12:04:29 +0300 |
commit | 3b73c9fb8692a6ffc0f4cb6e66eb649871dfed34 (patch) | |
tree | 134ec1adc671faad79666100f2ff430fadeb826d /ext/reflection/php_reflection.c | |
parent | 2a535a9707c89502df8bc0bd785f2e9192929422 (diff) | |
download | php-git-3b73c9fb8692a6ffc0f4cb6e66eb649871dfed34.tar.gz |
neon vectorization for base64
A similar algorithm is used to vectorize on x86_64, with a good description in
https://arxiv.org/abs/1704.00605 . On AArch64 the implementation differs in that
instead of using multiplies to shift bits around, it uses the vld3+vst4 and
vld4+vst3 combinations to load and store interleaved data. This patch is based
on the NEON implementation of Wojciech Mula:
https://github.com/WojciechMula/base64simd/blob/master/encode/encode.neon.cpp
https://github.com/WojciechMula/base64simd/blob/master/encode/lookup.neon.cpp
adapted to php/ext/standard/base64.c and vectorized with factor 16 instead of 8.
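
The load/store idea described above can be illustrated with a portable scalar
sketch. This is not the code from the patch and uses no NEON intrinsics: plain
loops emulate what a de-interleaving load (vld3/vld4) and an interleaving store
(vst4/vst3) do across 16 lanes. The names `LANES`, `b64_table`, `b64_index`,
`encode_block` and `decode_block` are hypothetical, the table lookup is an
ordinary array index rather than a vector lookup, and the decoder assumes every
input character is a valid base64 alphabet character (no padding, no error
handling).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 16 /* vectorization factor named in the commit message */

static const char b64_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Scalar reverse lookup; assumes ch is in the base64 alphabet. */
static uint8_t b64_index(char ch)
{
    const char *p = strchr(b64_table, ch);
    return (uint8_t)(p - b64_table);
}

/* Encode 3*LANES input bytes into 4*LANES output characters. */
static void encode_block(const uint8_t *in, char *out)
{
    uint8_t a[LANES], b[LANES], c[LANES];

    /* Emulated vld3: byte 3*i+k of the input lands in stream k, lane i. */
    for (size_t i = 0; i < LANES; i++) {
        a[i] = in[3 * i];
        b[i] = in[3 * i + 1];
        c[i] = in[3 * i + 2];
    }

    /* Lane-wise shifts and masks build four 6-bit indices per lane;
     * writing them to 4*i+k emulates the interleaving vst4 store. */
    for (size_t i = 0; i < LANES; i++) {
        uint8_t i0 = (uint8_t)(a[i] >> 2);
        uint8_t i1 = (uint8_t)(((a[i] & 0x03) << 4) | (b[i] >> 4));
        uint8_t i2 = (uint8_t)(((b[i] & 0x0f) << 2) | (c[i] >> 6));
        uint8_t i3 = (uint8_t)(c[i] & 0x3f);
        out[4 * i]     = b64_table[i0];
        out[4 * i + 1] = b64_table[i1];
        out[4 * i + 2] = b64_table[i2];
        out[4 * i + 3] = b64_table[i3];
    }
}

/* Decode 4*LANES base64 characters into 3*LANES output bytes. */
static void decode_block(const char *in, uint8_t *out)
{
    uint8_t s0[LANES], s1[LANES], s2[LANES], s3[LANES];

    /* Emulated vld4 plus lookup: character 4*i+k goes to stream k. */
    for (size_t i = 0; i < LANES; i++) {
        s0[i] = b64_index(in[4 * i]);
        s1[i] = b64_index(in[4 * i + 1]);
        s2[i] = b64_index(in[4 * i + 2]);
        s3[i] = b64_index(in[4 * i + 3]);
    }

    /* Recombine 6-bit values into bytes; writing to 3*i+k emulates vst3. */
    for (size_t i = 0; i < LANES; i++) {
        out[3 * i]     = (uint8_t)((s0[i] << 2) | (s1[i] >> 4));
        out[3 * i + 1] = (uint8_t)((s1[i] << 4) | (s2[i] >> 2));
        out[3 * i + 2] = (uint8_t)((s2[i] << 6) | s3[i]);
    }
}
```

Because the input is split into per-position streams at load time, each lane
needs only shifts, masks and ORs, which is why no multiply-based bit shuffling
is required in this scheme.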
On a Graviton A1 instance, running the synthetic benchmarks in
https://github.com/lemire/fastbase64 I see a 175% speedup on base64 encoding and
a 60% speedup on base64 decoding compared to the scalar implementation.
The patch passes `make test` regression testing on aarch64-linux.