author | Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> | 2019-03-20 15:12:50 -0700 |
---|---|---|
committer | Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> | 2019-08-03 10:48:43 -0700 |
commit | bd2c82bf141852b4737da297c081e5f621604317 (patch) | |
tree | db9b0bf70cabe5fb3199f5fa3b9693de841c0b8c /numpy/core/include | |
parent | 71c8a1030d5a32342edc2e2311cb71dc38a7374e (diff) | |
ENH: Use AVX for float32 implementation of np.sin & np.cos
This commit implements vectorized single-precision sine and cosine using
AVX2 and AVX512. Both functions are computed with polynomial
approximations that are accurate for inputs in [-PI/4, PI/4]. The original
input is reduced to this range using a 3-step Cody-Waite range reduction.
The reduction is only accurate for inputs in [-71476.0625f, 71476.0625f]
for cosine and [-117435.992f, 117435.992f] for sine. The algorithm
identifies elements outside these ranges and calls glibc in a scalar loop
to compute their output.
The algorithm is a vectorized version of the methods presented
here: https://stackoverflow.com/questions/30463616/payne-hanek-algorithm-implementation-in-c/30465751#30465751
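A minimal scalar sketch of the scheme described above, in Python and for illustration only. The constants are copied from the npy_math.h hunk in this commit. The function names (`reduce_arg`, `sketch_sin`, `sketch_cos`) are hypothetical, and the quadrant selection is an assumption modeled on the linked answer. The arithmetic runs in double precision rather than in float32 with AVX, so this shows the structure of the method, not the committed kernel or its rounding behavior.

```python
import math

# Constants copied from the npy_math.h hunk (C "f" suffix dropped).
TWO_O_PI = float.fromhex("0x1.45f306p-1")
PI_O_2_HIGH = float.fromhex("-0x1.921fb0p+00")
PI_O_2_MED = float.fromhex("-0x1.5110b4p-22")
PI_O_2_LOW = float.fromhex("-0x1.846988p-48")

# Polynomial coefficients, highest degree first (for Horner evaluation).
SIN_COEFFS = [float.fromhex(h) for h in (
    "0x1.7d3bbcp-19", "-0x1.a06bbap-13", "0x1.11119ap-07", "-0x1.555556p-03")]
COS_COEFFS = [float.fromhex(h) for h in (
    "0x1.98e616p-16", "-0x1.6c06dcp-10", "0x1.55553cp-05",
    "-0x1.000000p-01", "0x1.000000p+00")]


def reduce_arg(x):
    """3-step Cody-Waite reduction: return (q, r) with x ~= q*pi/2 + r."""
    q = round(x * TWO_O_PI)        # nearest multiple of pi/2
    r = x + q * PI_O_2_HIGH        # subtract pi/2 in three pieces,
    r = r + q * PI_O_2_MED         # largest piece first
    r = r + q * PI_O_2_LOW
    return q, r


def horner(coeffs, s):
    acc = 0.0
    for c in coeffs:
        acc = acc * s + c
    return acc


def sin_poly(r):
    s = r * r
    return r + r * s * horner(SIN_COEFFS, s)


def cos_poly(r):
    return horner(COS_COEFFS, r * r)


def _select(q, r):
    # Odd quadrants use the cosine polynomial; quadrants 2 and 3 flip the sign.
    val = cos_poly(r) if q & 1 else sin_poly(r)
    return -val if q & 2 else val


def sketch_sin(x):
    q, r = reduce_arg(x)
    return _select(q, r)


def sketch_cos(x):
    q, r = reduce_arg(x)
    return _select(q + 1, r)       # cos(x) = sin(x + pi/2)


print(sketch_sin(1000.0), math.sin(1000.0))
```

The sketch does not handle NaN, infinities, or inputs outside the ranges quoted in the commit message.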
Accuracy: maximum ULP error = 1.49
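The commit message does not say how the 1.49 figure was measured. One common way to express an error in float32 ULPs is to divide the absolute error by the float32 spacing at the reference value. The helper below is a sketch of that idea; `ulp_error_f32` is a hypothetical name, and it assumes a nonzero reference in the normal float32 range.

```python
import math


def ulp_error_f32(computed, reference):
    """Error of `computed` against `reference`, in float32 ULPs.

    Assumes `reference` is nonzero and in the normal float32 range
    (subnormals and zero are not handled).
    """
    _, exponent = math.frexp(reference)     # reference = m * 2**exponent, 0.5 <= |m| < 1
    spacing = math.ldexp(1.0, exponent - 24)  # float32 has a 24-bit significand
    return abs(computed - reference) / spacing


# One float32 step above 1.0 is exactly 1 ULP away from 1.0.
print(ulp_error_f32(1.0 + 2.0 ** -23, 1.0))
```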
Performance: The speed-up depends on the values in the input array. It is
largest when all input values lie within the ranges specified above;
details are provided below. The worst case is when all array elements lie
outside those ranges, which leads to a performance reduction of about
1-2%.
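The split between in-range and out-of-range elements can be pictured with the sketch below. It is not the committed code: `apply_with_fallback` is a hypothetical helper, `fast_fn` stands in for the vectorized path, `scalar_fn` stands in for the glibc call, and treating the limits as inclusive is an assumption. The limits themselves are the ones quoted in the commit message.

```python
import math

COS_LIMIT = 71476.0625     # from the commit message
SIN_LIMIT = 117435.992     # from the commit message


def apply_with_fallback(values, limit, fast_fn, scalar_fn):
    """Use fast_fn for |v| <= limit, then scalar_fn for everything else.

    Returns (results, number_of_fallback_elements).
    """
    out = [0.0] * len(values)
    pending = []
    for i, v in enumerate(values):
        if abs(v) <= limit:        # False for NaN, so NaN takes the fallback
            out[i] = fast_fn(v)
        else:
            pending.append(i)
    for i in pending:              # scalar loop over the out-of-range elements
        out[i] = scalar_fn(values[i])
    return out, len(pending)


# math.cos is used for both paths here purely as a placeholder.
results, n_fallback = apply_with_fallback(
    [0.25, 1.0e6, -3.0], COS_LIMIT, math.cos, math.cos)
print(results, n_fallback)
```

When every element falls back, the range check is pure overhead on top of the scalar loop, which is consistent with the small worst-case slowdown reported above.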
Three sets of benchmark data are provided, each measured with the timeit
package in Python. Each function is executed 1000 times and this is
repeated 100 times. The standard deviation across runs was less than 2%
of the mean in every case and is therefore not included in the data.
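A sketch of a measurement loop matching that description, using `timeit.repeat` with `number=1000` and `repeat=100`. The helper name `bench` is hypothetical, and the exact script used for the numbers below is not part of the commit, so this is only one plausible way to reproduce the procedure.

```python
import math
import statistics
import timeit


def bench(func, arg, number=1000, repeat=100):
    """Return (mean, standard deviation) of the per-call time in seconds."""
    totals = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


# Small demonstration with a stdlib function and reduced counts.
mean_s, stdev_s = bench(math.cos, 0.5, number=100, repeat=5)
print(f"mean {mean_s:.3e} s, stdev {stdev_s:.3e} s")
```

With NumPy installed, a call such as `bench(np.cos, arr)` with a float32 array of 10000 elements would correspond to the micro-benchmark, though the array contents used in the commit are not stated.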
(1) Micro-benchmarking:
Array size = 10000, Command = "%timeit np.cos([myarr])"
|---------------+------------+--------+---------+---------------+-----------------|
| Function Name | NumPy 1.16 | AVX2   | AVX512  | AVX2 speed-up | AVX512 speed-up |
|---------------+------------+--------+---------+---------------+-----------------|
| np.cos        | 1.5174     | 0.1553 | 0.06634 | 9.77          | 22.87           |
| np.sin        | 1.4738     | 0.1517 | 0.06406 | 9.71          | 23.00           |
|---------------+------------+--------+---------+---------------+-----------------|
(2) Package ai.cs provides an API to transform spherical coordinates to
the Cartesian system:
Array size = 10000, Command = "%timeit ai.cs.sp2cart(r,theta,phi)"
|---------------+------------+--------+--------+---------------+-----------------|
| Function Name | NumPy 1.16 | AVX2   | AVX512 | AVX2 speed-up | AVX512 speed-up |
|---------------+------------+--------+--------+---------------+-----------------|
| ai.cs.sp2cart | 0.6371     | 0.1066 | 0.0605 | 5.97          | 10.53           |
|---------------+------------+--------+--------+---------------+-----------------|
(3) Package photutils provides an API to find the best fit of first and
second harmonic functions to a set of (angle, intensity) pairs:
Array size = 1000, Command = "%timeit fit_first_and_second_harmonics(E, data)"
|--------------------------------+------------+--------+--------+---------------+-----------------|
| Function Name                  | NumPy 1.16 | AVX2   | AVX512 | AVX2 speed-up | AVX512 speed-up |
|--------------------------------+------------+--------+--------+---------------+-----------------|
| fit_first_and_second_harmonics | 1.598      | 0.8709 | 0.7761 | 1.83          | 2.05            |
|--------------------------------+------------+--------+--------+---------------+-----------------|
Diffstat (limited to 'numpy/core/include')
-rw-r--r-- | numpy/core/include/numpy/npy_math.h | 17 |
1 files changed, 16 insertions, 1 deletions
diff --git a/numpy/core/include/numpy/npy_math.h b/numpy/core/include/numpy/npy_math.h
index 6a78ff3c2..7831dd3d7 100644
--- a/numpy/core/include/numpy/npy_math.h
+++ b/numpy/core/include/numpy/npy_math.h
@@ -144,7 +144,22 @@ NPY_INLINE static float __npy_nzerof(void)
 #define NPY_COEFF_Q3_LOGf 9.864942958519418960339e-01f
 #define NPY_COEFF_Q4_LOGf 1.546476374983906719538e-01f
 #define NPY_COEFF_Q5_LOGf 5.875095403124574342950e-03f
-
+/*
+ * Constants used in vector implementation of sinf/cosf(x)
+ */
+#define NPY_TWO_O_PIf 0x1.45f306p-1f
+#define NPY_CODY_WAITE_PI_O_2_HIGHf -0x1.921fb0p+00f
+#define NPY_CODY_WAITE_PI_O_2_MEDf -0x1.5110b4p-22f
+#define NPY_CODY_WAITE_PI_O_2_LOWf -0x1.846988p-48f
+#define NPY_COEFF_INVF0_COSINEf 0x1.000000p+00f
+#define NPY_COEFF_INVF2_COSINEf -0x1.000000p-01f
+#define NPY_COEFF_INVF4_COSINEf 0x1.55553cp-05f
+#define NPY_COEFF_INVF6_COSINEf -0x1.6c06dcp-10f
+#define NPY_COEFF_INVF8_COSINEf 0x1.98e616p-16f
+#define NPY_COEFF_INVF3_SINEf -0x1.555556p-03f
+#define NPY_COEFF_INVF5_SINEf 0x1.11119ap-07f
+#define NPY_COEFF_INVF7_SINEf -0x1.a06bbap-13f
+#define NPY_COEFF_INVF9_SINEf 0x1.7d3bbcp-19f
 /*
  * Integer functions.
  */