author: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> 2019-03-20 15:12:50 -0700
committer: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com> 2019-08-03 10:48:43 -0700
commit: bd2c82bf141852b4737da297c081e5f621604317
tree: db9b0bf70cabe5fb3199f5fa3b9693de841c0b8c /numpy/lib/recfunctions.py
parent: 71c8a1030d5a32342edc2e2311cb71dc38a7374e
ENH: Use AVX for float32 implementation of np.sin & np.cos
This commit implements vectorized single-precision sine and cosine using AVX2 and AVX512. Both functions are computed with polynomial approximations that are accurate for values in [-PI/4, PI/4]. The original input is reduced to this range using a 3-step Cody-Waite range reduction. The reduction is only accurate for inputs in [-71476.0625f, 71476.0625f] for cosine and [-117435.992f, 117435.992f] for sine. The algorithm identifies elements outside this range and calls glibc in a scalar loop to compute their output. The algorithm is a vectorized version of the methods presented here: https://stackoverflow.com/questions/30463616/payne-hanek-algorithm-implementation-in-c/30465751#30465751

Accuracy: maximum ULP error = 1.49

Performance: The speed-up this implementation provides depends on the values in the input array. It is largest when all input values are within the range specified above; details are provided below. The worst case is when all array elements are outside the range, which leads to about a 1-2% reduction in performance.

Three sets of benchmark data are provided, each measured with the timeit package in Python. Each function is executed 1000 times, and this is repeated 100 times. The standard deviation across all runs was less than 2% of the mean value and is therefore not included in the data.
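The reduce-then-approximate scheme with a scalar fallback can be sketched in plain Python. This is an illustration only, not the code in this commit: float32 rounding is emulated with struct, the 8-bit split of PI/2 into C1, C2 and C3 is an assumed split chosen so the products stay exact, math.sin and math.cos stand in for both the polynomial kernels and the glibc fallback, and all function names are hypothetical. Only the cosine limit is taken from the commit message.

```python
import math
import struct

def f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def chop(x, keep_bits):
    """Truncate x to its top `keep_bits` significand bits."""
    m, e = math.frexp(x)                      # x = m * 2**e, 0.5 <= |m| < 1
    scale = 2.0 ** keep_bits
    return math.ldexp(math.trunc(m * scale) / scale, e)

# Assumed 3-way split of pi/2 (NOT the constants used in the commit).
# C1 and C2 hold 8 significand bits each, so q*C1 and q*C2 are exactly
# representable in float32 whenever |q| < 2**16.
HALF_PI = math.pi / 2
C1 = chop(HALF_PI, 8)
C2 = chop(HALF_PI - C1, 8)
C3 = f32(HALF_PI - C1 - C2)
TWO_OVER_PI = f32(2 / math.pi)

COS_LIMIT = 71476.0625   # cosine range limit quoted in the commit message

def reduce_cody_waite(x):
    """Return (q, r) with x ~= q*pi/2 + r and r roughly in [-pi/4, pi/4]."""
    q = round(f32(x * TWO_OVER_PI))           # nearest multiple of pi/2
    r = f32(x - q * C1)                       # step 1: remove the leading part
    r = f32(r - q * C2)                       # step 2: remove the middle part
    r = f32(r - f32(q * C3))                  # step 3: remove the tail
    return q, r

def cos_sketch(values):
    """Cosine of each value: fast path in range, scalar fallback otherwise."""
    out = []
    for v in values:
        x = f32(v)
        if not abs(x) <= COS_LIMIT:           # out-of-range or NaN input
            out.append(math.cos(x))           # stands in for the glibc call
            continue
        q, r = reduce_cody_waite(x)
        k = q % 4                             # quadrant selects the identity
        if k == 0:
            out.append(math.cos(r))
        elif k == 1:
            out.append(-math.sin(r))
        elif k == 2:
            out.append(-math.cos(r))
        else:
            out.append(math.sin(r))
    return out
```

In a vectorized version the per-element branch would typically become a mask, with the fallback loop run only over the masked-out lanes; the sketch keeps a scalar loop for readability.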
(1) Micro-benchmarking: Array size = 10000, Command = "%timeit np.cos([myarr])"

|---------------+------------+--------+---------+----------+----------|
| Function Name | NumPy 1.16 | AVX2   | AVX512  | AVX2     | AVX512   |
|               |            |        |         | speed up | speed up |
|---------------+------------+--------+---------+----------+----------|
| np.cos        | 1.5174     | 0.1553 | 0.06634 | 9.77     | 22.87    |
| np.sin        | 1.4738     | 0.1517 | 0.06406 | 9.71     | 23.00    |
|---------------+------------+--------+---------+----------+----------|

(2) Package ai.cs provides an API to transform spherical coordinates to a Cartesian system: Array size = 10000, Command = "%timeit ai.cs.sp2cart(r,theta,phi)"

|---------------+------------+--------+--------+----------+----------|
| Function Name | NumPy 1.16 | AVX2   | AVX512 | AVX2     | AVX512   |
|               |            |        |        | speed up | speed up |
|---------------+------------+--------+--------+----------+----------|
| ai.cs.sp2cart | 0.6371     | 0.1066 | 0.0605 | 5.97     | 10.53    |
|---------------+------------+--------+--------+----------+----------|

(3) Package photutils provides an API to find the best fit of first and second harmonic functions to a set of (angle, intensity) pairs: Array size = 1000, Command = "%timeit fit_first_and_second_harmonics(E, data)"

|--------------------------------+------------+--------+--------+----------+----------|
| Function Name                  | NumPy 1.16 | AVX2   | AVX512 | AVX2     | AVX512   |
|                                |            |        |        | speed up | speed up |
|--------------------------------+------------+--------+--------+----------+----------|
| fit_first_and_second_harmonics | 1.598      | 0.8709 | 0.7761 | 1.83     | 2.05     |
|--------------------------------+------------+--------+--------+----------+----------|
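The measurement procedure described in the commit message (1000 executions per run, 100 runs, spread reported relative to the mean) could be reproduced along the following lines. This is a sketch using only the standard library; the helper name bench is hypothetical, and the exact commands behind the tables above were the %timeit invocations quoted with each table, not this function.

```python
import statistics
import timeit

def bench(stmt, number=1000, repeat=100):
    """Time `stmt` and return (mean seconds per call, relative std dev).

    `stmt` is a zero-argument callable. It is executed `number` times per
    run, and the run is repeated `repeat` times.
    """
    totals = timeit.repeat(stmt, number=number, repeat=repeat)
    per_call = [t / number for t in totals]   # seconds per single call
    mean = statistics.mean(per_call)
    rel_sd = statistics.stdev(per_call) / mean
    return mean, rel_sd
```

For example, with NumPy installed and an array myarr already defined, bench(lambda: np.cos(myarr)) would return the mean time per call together with the run-to-run spread as a fraction of that mean.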
Diffstat (limited to 'numpy/lib/recfunctions.py')
0 files changed, 0 insertions, 0 deletions