| author | Pieter Eendebak <pieter.eendebak@gmail.com> | 2022-05-14 11:26:57 +0200 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2022-05-14 11:26:57 +0200 |
| commit | cfbbde8ae720ba4fc653aabe64f9acc3d4c48183 (patch) | |
| tree | 6c235c0362637683a0eb70a360b02b327d9e311f /numpy | |
| parent | c93faa47504c8dc1cc59d60212a537395f946a7c (diff) | |
| download | numpy-cfbbde8ae720ba4fc653aabe64f9acc3d4c48183.tar.gz | |
PERF: Fast check on equivalent arrays in PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK (#21464)
Addresses one of the items in #21455
* fast check on equivalent arrays in PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK
* fix for case stride1==0
* remove assert statements
* whitespace
* DOC: Add comment that output broadcasting is rejected by overlap detection
That is a bit of a weird dynamic here, and it would probably be nice to clean
it up. However, that is no reason to hold back the fast path right now.
---
Note that it may also be good to just use a data-pointer comparison or push this fast-path into the overlap detection function itself.
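The data-pointer alternative mentioned in the note could look roughly like the sketch below. It is an illustration only: `view1d` and `same_iteration` are hypothetical names, not NumPy API, and the commit does not say which fields such a comparison would actually inspect. The sketch assumes 1-D views with equal item size and treats two views as equivalent when they start at the same address and step through memory identically, which would also catch two distinct array objects that wrap the same data.

```c
#include <stddef.h>

/* Hypothetical 1-D view; a stand-in for the fields a real check would read. */
typedef struct {
    char *data;        /* address of the first element */
    ptrdiff_t size;    /* number of elements */
    ptrdiff_t stride;  /* bytes between elements; 0 models a broadcast view */
} view1d;

/*
 * Return 1 when both views visit exactly the same addresses in the same
 * order. An element-wise loop then reads each element before writing it,
 * so the overlap is harmless. Broadcast views (stride 0) are rejected,
 * mirroring the stride1 != 0 condition in the commit.
 */
static int
same_iteration(const view1d *x, const view1d *y)
{
    return x->data == y->data
        && x->size == y->size
        && x->stride == y->stride
        && x->stride != 0;
}
```

With `p` and `q` built separately over the same buffer, `same_iteration(&p, &q)` returns 1 even though `&p != &q`, a case the object-identity check `arr1 == arr2` would pass on to the slower solver.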
Diffstat (limited to 'numpy')
| -rw-r--r-- | numpy/core/src/common/lowlevel_strided_loops.h | 16 |
| -rw-r--r-- | numpy/core/src/umath/ufunc_object.c | 1 |
2 files changed, 14 insertions, 3 deletions
diff --git a/numpy/core/src/common/lowlevel_strided_loops.h b/numpy/core/src/common/lowlevel_strided_loops.h
index ad86c0489..118ce9cb1 100644
--- a/numpy/core/src/common/lowlevel_strided_loops.h
+++ b/numpy/core/src/common/lowlevel_strided_loops.h
@@ -692,6 +692,19 @@ PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK(PyArrayObject *arr1, PyArrayObject *arr
         return 1;
     }
 
+    size1 = PyArray_SIZE(arr1);
+    stride1 = PyArray_TRIVIAL_PAIR_ITERATION_STRIDE(size1, arr1);
+
+    /*
+     * arr1 == arr2 is common for in-place operations, so we fast-path it here.
+     * TODO: The stride1 != 0 check rejects broadcast arrays. This may affect
+     * self-overlapping arrays, but seems only necessary due to
+     * `try_trivial_single_output_loop` not rejecting broadcast outputs.
+     */
+    if (arr1 == arr2 && stride1 != 0) {
+        return 1;
+    }
+
     if (solve_may_share_memory(arr1, arr2, 1) == 0) {
         return 1;
     }
@@ -701,10 +714,7 @@ PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK(PyArrayObject *arr1, PyArrayObject *arr
      * arrays stride ahead faster than output arrays.
      */
 
-    size1 = PyArray_SIZE(arr1);
     size2 = PyArray_SIZE(arr2);
-
-    stride1 = PyArray_TRIVIAL_PAIR_ITERATION_STRIDE(size1, arr1);
     stride2 = PyArray_TRIVIAL_PAIR_ITERATION_STRIDE(size2, arr2);
 
     /*
diff --git a/numpy/core/src/umath/ufunc_object.c b/numpy/core/src/umath/ufunc_object.c
index e4f1df79f..290ed24a6 100644
--- a/numpy/core/src/umath/ufunc_object.c
+++ b/numpy/core/src/umath/ufunc_object.c
@@ -1206,6 +1206,7 @@ prepare_ufunc_output(PyUFuncObject *ufunc,
  * cannot broadcast any other array (as it requires a single stride).
  * The function accepts all 1-D arrays, and N-D arrays that are either all
  * C- or all F-contiguous.
+ * NOTE: Broadcast outputs are implicitly rejected in the overlap detection.
  *
  * Returns -2 if a trivial loop is not possible, 0 on success and -1 on error.
  */
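To see the effect of the reordering in isolation, here is a minimal standalone model of the fast path, written without the NumPy headers. Everything in it is an assumption made for illustration: `toy_array`, `toy_extents_disjoint`, and `toy_overlap_ok` are hypothetical names, the extent test is only a crude stand-in for `solve_may_share_memory`, and the stride comparison that follows in the real function is left out. The `slow_checks` counter exists only to show when the expensive check is skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a trivially iterable PyArrayObject. */
typedef struct {
    const char *data;    /* address of the first element */
    ptrdiff_t size;      /* number of elements (assumed >= 1) */
    ptrdiff_t stride;    /* bytes between elements; 0 models broadcasting */
    ptrdiff_t itemsize;  /* bytes per element */
} toy_array;

/*
 * Crude replacement for the expensive overlap solver: returns 1 when the
 * byte extents of the two arrays do not intersect. Assumes strides >= 0.
 */
static int
toy_extents_disjoint(const toy_array *a, const toy_array *b)
{
    uintptr_t a_lo = (uintptr_t)a->data;
    uintptr_t b_lo = (uintptr_t)b->data;
    uintptr_t a_hi = a_lo + (uintptr_t)((a->size - 1) * a->stride + a->itemsize);
    uintptr_t b_hi = b_lo + (uintptr_t)((b->size - 1) * b->stride + b->itemsize);
    return a_hi <= b_lo || b_hi <= a_lo;
}

/*
 * Model of the patched control flow: identical, non-broadcast operands are
 * accepted before the expensive check runs. Returns 1 if overlap is fine.
 */
static int
toy_overlap_ok(const toy_array *arr1, const toy_array *arr2, int *slow_checks)
{
    /* Fast path: same object and a nonzero stride (not broadcast). */
    if (arr1 == arr2 && arr1->stride != 0) {
        return 1;
    }
    /* Slow path: only reached when the fast path does not apply. */
    (*slow_checks)++;
    return toy_extents_disjoint(arr1, arr2);
}
```

Calling `toy_overlap_ok(&a, &a, &n)` on an ordinary strided array returns 1 and leaves `n` unchanged, while the same call on a stride-0 array falls through to the slow check, which matches the `stride1 != 0` guard in the patch.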
