diff options
author | Developer-Ecosystem-Engineering <65677710+Developer-Ecosystem-Engineering@users.noreply.github.com> | 2021-09-25 15:48:32 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2021-09-25 16:48:32 -0600 |
commit | 0b01b48d9c86af834e24a290339ec83e8bc9334a (patch) | |
tree | 49a36a952fb1971635057a82bbee6d5773f2f9f6 /numpy/array_api/_creation_functions.py | |
parent | 2d112a98ed7597c4120b31908384ae09b0304659 (diff) | |
download | numpy-0b01b48d9c86af834e24a290339ec83e8bc9334a.tar.gz |
BUG: Resolve Divide by Zero on Apple silicon + test failures (#19926)
* Resolve divide by zero in reciprocal #18555
clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's `reciprocal` function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's.
The changes here insert a dummy `volatile` variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's.
`volatile` can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply `volatile`. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the `reciprocal` function. Moreover, `volatile` is only needed when a vector is partially loaded.
Testing:
Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite.
Before:
```
FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised
FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09)
```
After:
```
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10)
```
* Enhancement on top of workaround for clang bug in reciprocal
Enhancement on top of workaround for clang bug in reciprocal (numpy/numpy#18555)
Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector.
The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop.
* Address Azure CI failures with older versions of clang
- -ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64
- Apple Clang and Clang have different, but overlapping versions
- Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64
Testing was done against Apple Clang versions
- v11 / x86_64 - failed previously, now passes (azure failure)
- v12+ / x86_64 - passes before and after
- v13 / arm64 - failed before initial patch, passes after
Diffstat (limited to 'numpy/array_api/_creation_functions.py')
0 files changed, 0 insertions, 0 deletions