| Commit message (Collapse) | Author | Age | Files | Lines |
BUG: Fix that reducelikes honour out always (and live in the future)
|
Reducelikes should have lived in the future, where the `out` dtype
is always honoured correctly and used as one of the *inputs*.
However, when the legacy fallback occurs, this leads to problems because
the legacy code path has 0-D fallbacks.
There are two possible solutions to this:
* Live with the weird value-based stuff here, even though it was never
  actually better, especially for reducelikes
  (enforce value-based promotion).
* Avoid value-based promotion completely.
This does the second one, using a terrible hack: it just mutates
the dimension of `out` to tell the resolvers that value-based logic
cannot be used.
Is that hack safe? Yes, so long as nobody has super-crazy custom type
resolvers (the only one I know of is pyerfa, and they are fine; PyGEOS,
I think, has no custom type resolver).
It also relies on the GIL of course, but...
The future? We need to ditch this value-based stuff, do annoying
acrobatics with dynamically created DType classes, or something similar
(so ditching seems best; it is topping my TODO list currently).
Testing this is tricky. Running the test
```
python runtests.py -t numpy/core/tests/test_ufunc.py::TestUfunc::test_reducelike_out_promotes
```
triggers it, but because reducelikes do not enforce value-based promotion
the failure can be "hidden" (which is why the test succeeds in a full test
run).
Closes gh-20739
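A minimal sketch of the behaviour the test above is named for. The array and `out` values are illustrative assumptions, not copied from this commit: when `out` is honoured as an input to promotion, a reduction over a small integer type accumulates in the wider `out` dtype.

```python
import numpy as np

# Illustrative data (an assumption, not taken from the commit):
# 1000 ones stored as uint8, whose sum does not fit in uint8.
arr = np.ones(1000, dtype=np.uint8)

# A 0-D `out` array with a wider dtype; the reduction should honour it.
out = np.zeros((), dtype=np.uint16)

result = np.add.reduce(arr, out=out)

print(result, out.dtype)
```

The 0-D `out` is the interesting case here, since the message attributes the problem to the 0-D fallbacks of the legacy code path.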
|
* BUG: `array_api.argsort(descending=True)` respects relative order
* Regression test for stable descending `array_api.argsort()`
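One way to get a descending argsort that keeps equal elements in their original relative order can be sketched with plain NumPy. The helper name `argsort_descending_stable` is hypothetical and the sketch only handles 1-D input; it is not necessarily how `array_api.argsort` is implemented.

```python
import numpy as np

def argsort_descending_stable(x):
    """Indices sorting 1-D `x` in descending order, ties kept in input order."""
    x = np.asarray(x)
    last = x.shape[0] - 1
    # Stable ascending argsort of the reversed array, then reverse the result:
    # ties now come out in original order, but the indices still refer to
    # positions in the reversed array.
    order_in_reversed = np.argsort(x[::-1], kind="stable")[::-1]
    # Map positions in the reversed array back to positions in `x`.
    return last - order_in_reversed

x = np.array([1, 3, 3, 2])
naive = np.argsort(x, kind="stable")[::-1]  # simply flipping reverses the ties
stable = argsort_descending_stable(x)

print(naive.tolist())   # [2, 1, 3, 0]
print(stable.tolist())  # [1, 2, 3, 0]
```

Flipping an ascending result puts the two `3` entries in the order 2, 1, while the stable variant keeps them as 1, 2.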
|
TYP: Type the NEP 35 `like` parameter via a `__array_function__` protocol
|
BUG, DOC: Fixes SciPy docs build warnings
|
The new f2py symbolic parser writes ternary expressions with
spaces surrounding the colon operator, which causes the generated
docstrings to be incorrectly parsed. Removing the spaces solves the
issue.
|
BUG: min/max is slow, re-implement using NEON (#17989)
|
- Avoid unrolling vectorized max/min loops by x6/x8 when SIMD width > 128
to avoid a memory bandwidth bottleneck
- tune reduce max/min
- vectorize non-contiguous max/min
- fix code style
- call npyv_cleanup() at end of inner loop
|
We've incorporated the changes you've requested for scalar operations.
**Testing**
- Apple silicon M1 native (arm64 / aarch64) -- No test failures
- Apple silicon M1 Rosetta (x86_64) -- No new test failures
- iMacPro1,1 (AVX512F) -- No test failures
- Ubuntu VM (aarch64) -- No test failures
**Benchmarks**
Again, Apple silicon M1 native (arm64 / aarch64) looks similar to original patch (comparison below)
Also, x86_64 (both Apple silicon M1 Rosetta and iMacPro1,1 AVX512F) have varying results. Some are better. Some are worse. Compared to previous re-org, we see improvements though.
Apple silicon M1 native (arm64 / aarch64) comparison to previous commit:
```
before after ratio
[8b01e839] [18565b27]
<gh-issue-17989/feedback/round-1> <gh-issue-17989/feedback/round-2>
+ 176±0.2μs 196±1μs 1.11 bench_function_base.Sort.time_sort('heap', 'int16', ('ordered',))
+ 234±0.2μs 261±1μs 1.11 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'exp'>, 2, 4, 'f')
+ 43.4±0.4μs 48.3±0.4μs 1.11 bench_function_base.Sort.time_sort('quick', 'int64', ('uniform',))
+ 22.5±0.1μs 25.1±0.3μs 1.11 bench_shape_base.Block2D.time_block2d((512, 512), 'uint8', (2, 2))
+ 4.75±0.05μs 5.28±0.07μs 1.11 bench_ma.UFunc.time_scalar(True, True, 1000)
+ 224±0.2μs 248±0.9μs 1.11 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'exp2'>, 1, 1, 'f')
+ 233±0.5μs 258±1μs 1.11 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'exp'>, 4, 2, 'f')
+ 8.81±0.02μs 9.72±0.1μs 1.10 bench_shape_base.Block2D.time_block2d((32, 32), 'uint16', (2, 2))
+ 8.71±0.1μs 9.58±0.3μs 1.10 bench_indexing.ScalarIndexing.time_assign_cast(2)
+ 96.2±0.03μs 105±3μs 1.09 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'fabs'>, 1, 1, 'd')
+ 20.2±0.1μs 22.0±0.5μs 1.09 bench_shape_base.Block.time_block_simple_row_wise(100)
+ 469±4μs 510±7μs 1.09 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'cos'>, 2, 1, 'd')
+ 43.9±0.02μs 46.4±2μs 1.06 bench_function_base.Median.time_odd_inplace
+ 4.75±0μs 5.02±0.2μs 1.06 bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'int64')
- 16.4±0.07μs 15.6±0.4μs 0.95 bench_ufunc.UFunc.time_ufunc_types('left_shift')
- 127±6μs 120±0.1μs 0.94 bench_ufunc.UFunc.time_ufunc_types('deg2rad')
- 10.9±0.5μs 10.3±0.01μs 0.94 bench_function_base.Sort.time_sort('merge', 'int64', ('reversed',))
- 115±5μs 108±0.2μs 0.94 bench_function_base.Bincount.time_bincount
- 17.0±0.4μs 15.9±0.03μs 0.94 bench_ufunc.UFunc.time_ufunc_types('right_shift')
- 797±30ns 743±0.5ns 0.93 bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.]), axis=0))
- 18.4±1μs 17.2±0.04μs 0.93 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 10000, <class 'bool'>)
- 241±7μs 224±0.3μs 0.93 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'exp2'>, 2, 1, 'f')
- 105±1μs 96.7±0.02μs 0.92 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'deg2rad'>, 2, 4, 'f')
- 23.3±0.2μs 21.4±0.02μs 0.92 bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'edge')
- 833±20μs 766±2μs 0.92 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'arctanh'>, 1, 1, 'd')
- 86.8±4μs 79.5±0.4μs 0.92 bench_ufunc.UFunc.time_ufunc_types('conjugate')
- 2.58±0.1μs 2.36±0μs 0.91 bench_ufunc.CustomScalar.time_divide_scalar2(<class 'numpy.float32'>)
- 102±4μs 92.8±0.7μs 0.91 bench_ufunc.UFunc.time_ufunc_types('logical_not')
- 46.6±0.4μs 42.1±0.07μs 0.90 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 1, 'd')
- 158±0.7μs 142±0.07μs 0.90 bench_lib.Pad.time_pad((4, 4, 4, 4), 1, 'linear_ramp')
- 729±6μs 657±1μs 0.90 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'arccos'>, 4, 4, 'f')
- 63.6±0.9μs 56.2±1μs 0.88 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'd')
- 730±40μs 605±3μs 0.83 bench_lib.Pad.time_pad((1024, 1024), 1, 'reflect')
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
```
|
https://github.com/Developer-Ecosystem-Engineering/numpy into as_min_max
|
Thank you @seiko2plus for the excellent example.
Reorganized code so that it can be used for other architectures. Core implementations and unroll factors should be the same as before for ARM NEON. Beyond reorganizing, we've added default implementations using universal intrinsics for non-ARM-NEON. Additionally, we've moved most min, max, fmin, fmax implementations to a new dispatchable source file: numpy/core/src/umath/loops_minmax.dispatch.c.src
**Testing**
- Apple silicon M1 native (arm64 / aarch64) -- No test failures
- Apple silicon M1 Rosetta (x86_64) -- No new test failures
- iMacPro1,1 (AVX512F) -- No test failures
**Benchmarks**
- Apple silicon M1 native (arm64 / aarch64)
- Similar improvements as before reorg (comparison below)
- x86_64 (both Apple silicon M1 Rosetta and iMacPro1,1 AVX512F)
- Some x86_64 benchmarks are better, some are worse
Apple silicon M1 native (arm64 / aarch64) comparison to original implementation / before reorg:
```
before after ratio
[559ddede] [a3463b09]
<gh-issue-17989/improve-neon-min-max> <gh-issue-17989/feedback/round-1>
+ 6.45±0.04μs 7.07±0.09μs 1.10 bench_lib.Nan.time_nanargmin(200, 0.1)
+ 32.1±0.3μs 35.2±0.2μs 1.10 bench_ufunc_strides.Unary.time_ufunc(<ufunc '_ones_like'>, 2, 1, 'd')
+ 29.1±0.02μs 31.8±0.05μs 1.10 bench_core.Core.time_array_int_l1000
+ 69.0±0.2μs 75.3±3μs 1.09 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'logical_not'>, 2, 4, 'f')
+ 92.0±1μs 99.5±0.5μs 1.08 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'logical_not'>, 4, 4, 'd')
+ 9.29±0.1μs 9.99±0.5μs 1.08 bench_ma.UFunc.time_1d(True, True, 10)
+ 338±0.6μs 362±10μs 1.07 bench_function_base.Sort.time_sort('quick', 'int16', ('random',))
+ 4.21±0.03μs 4.48±0.2μs 1.07 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <class 'str'>)
+ 12.3±0.06μs 13.1±0.7μs 1.06 bench_function_base.Median.time_even_small
+ 1.27±0μs 1.35±0.06μs 1.06 bench_itemselection.PutMask.time_dense(False, 'float16')
+ 139±1ns 147±6ns 1.06 bench_core.Core.time_array_1
+ 33.7±0.01μs 35.5±2μs 1.05 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 4, 'f')
+ 69.4±0.1μs 73.1±0.2μs 1.05 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'logical_not'>, 4, 4, 'f')
+ 225±0.09μs 237±9μs 1.05 bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint32'>, 2047])
- 15.7±0.5μs 14.9±0.03μs 0.95 bench_core.CountNonzero.time_count_nonzero_axis(2, 10000, <class 'numpy.int64'>)
- 34.2±2μs 32.0±0.03μs 0.94 bench_ufunc_strides.Unary.time_ufunc(<ufunc '_ones_like'>, 4, 2, 'f')
- 1.03±0.05ms 955±3μs 0.92 bench_lib.Nan.time_nanargmax(200000, 50.0)
- 6.97±0.08μs 6.43±0.02μs 0.92 bench_ma.UFunc.time_scalar(True, False, 10)
- 5.41±0μs 4.98±0.01μs 0.92 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 2, 'F')
- 22.4±0.01μs 20.6±0.02μs 0.92 bench_core.Core.time_array_float64_l1000
- 1.51±0.01ms 1.38±0ms 0.92 bench_core.CorrConv.time_correlate(1000, 10000, 'same')
- 10.1±0.2μs 9.27±0.09μs 0.92 bench_ufunc.UFunc.time_ufunc_types('invert')
- 8.50±0.02μs 7.80±0.09μs 0.92 bench_indexing.ScalarIndexing.time_assign_cast(1)
- 29.5±0.2μs 26.6±0.03μs 0.90 bench_ma.Concatenate.time_it('masked', 100)
- 2.09±0.02ms 1.87±0ms 0.90 bench_ma.UFunc.time_2d(True, True, 1000)
- 298±10μs 267±0.3μs 0.89 bench_app.MaxesOfDots.time_it
- 10.7±0.2μs 9.60±0.02μs 0.89 bench_ma.UFunc.time_1d(True, True, 100)
- 567±3μs 505±2μs 0.89 bench_lib.Nan.time_nanargmax(200000, 90.0)
- 342±0.9μs 282±5μs 0.83 bench_lib.Nan.time_nanargmax(200000, 2.0)
- 307±0.7μs 244±0.8μs 0.80 bench_lib.Nan.time_nanargmax(200000, 0.1)
- 309±1μs 241±0.1μs 0.78 bench_lib.Nan.time_nanargmax(200000, 0)
```
|
This fixes numpy/numpy#17989 by adding ARM NEON implementations for min/max and fmin/fmax.
Before: Rosetta faster than native arm64 by `1.2x - 8.6x`.
After: Native arm64 faster than Rosetta by `1.6x - 6.7x`. (2.8x - 15.5x improvement)
**Benchmarks**
```
before after ratio
[b0e1a445] [8301ffd7]
<main> <gh-issue-17989/improve-neon-min-max>
+ 32.6±0.04μs 37.5±0.08μs 1.15 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 1, 'd')
+ 32.6±0.06μs 37.5±0.04μs 1.15 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 1, 'd')
+ 37.8±0.09μs 43.2±0.09μs 1.14 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'f')
+ 37.7±0.09μs 42.9±0.1μs 1.14 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 2, 'd')
+ 37.9±0.2μs 43.0±0.02μs 1.14 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 2, 'd')
+ 37.7±0.01μs 42.3±1μs 1.12 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'conjugate'>, 2, 2, 'd')
+ 34.2±0.07μs 38.1±0.05μs 1.12 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 2, 'f')
+ 32.6±0.03μs 35.8±0.04μs 1.10 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 1, 'f')
+ 37.1±0.1μs 40.3±0.1μs 1.09 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 2, 'd')
+ 37.2±0.1μs 40.3±0.04μs 1.08 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'f')
+ 37.1±0.09μs 40.3±0.07μs 1.08 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 2, 'd')
+ 68.6±0.5μs 74.2±0.3μs 1.08 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'd')
+ 37.1±0.2μs 40.0±0.1μs 1.08 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'conjugate'>, 1, 2, 'd')
+ 2.42±0μs 2.61±0.05μs 1.08 bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <class 'numpy.int16'>)
+ 69.1±0.7μs 73.5±0.7μs 1.06 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'conjugate'>, 4, 4, 'd')
+ 54.7±0.3μs 58.0±0.2μs 1.06 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'd')
+ 54.5±0.2μs 57.8±0.2μs 1.06 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'conjugate'>, 2, 4, 'd')
+ 3.78±0.04μs 4.00±0.02μs 1.06 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <class 'str'>)
+ 54.8±0.2μs 57.9±0.3μs 1.06 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'd')
+ 3.68±0.01μs 3.87±0.02μs 1.05 bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <class 'object'>)
+ 69.6±0.2μs 73.1±0.2μs 1.05 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'd')
+ 229±2μs 241±0.2μs 1.05 bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint64'>, 1535])
- 73.0±0.8μs 69.5±0.2μs 0.95 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 4, 'd')
- 37.6±0.1μs 35.7±0.3μs 0.95 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 4, 'f')
- 88.7±0.04μs 84.2±0.7μs 0.95 bench_lib.Pad.time_pad((256, 128, 1), 1, 'wrap')
- 57.9±0.2μs 54.8±0.2μs 0.95 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 4, 'd')
- 39.9±0.2μs 37.2±0.04μs 0.93 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'positive'>, 1, 2, 'd')
- 2.66±0.01μs 2.47±0.01μs 0.93 bench_lib.Nan.time_nanmin(200, 0)
- 2.65±0.02μs 2.46±0.04μs 0.93 bench_lib.Nan.time_nanmin(200, 50.0)
- 2.64±0.01μs 2.45±0.01μs 0.93 bench_lib.Nan.time_nanmax(200, 90.0)
- 2.64±0μs 2.44±0.02μs 0.92 bench_lib.Nan.time_nanmax(200, 0)
- 2.68±0.02μs 2.48±0μs 0.92 bench_lib.Nan.time_nanmax(200, 2.0)
- 40.2±0.01μs 37.1±0.1μs 0.92 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'f')
- 2.69±0μs 2.47±0μs 0.92 bench_lib.Nan.time_nanmin(200, 2.0)
- 2.70±0.02μs 2.48±0.02μs 0.92 bench_lib.Nan.time_nanmax(200, 0.1)
- 2.70±0μs 2.47±0μs 0.91 bench_lib.Nan.time_nanmin(200, 90.0)
- 2.70±0μs 2.46±0μs 0.91 bench_lib.Nan.time_nanmin(200, 0.1)
- 2.70±0μs 2.42±0.01μs 0.90 bench_lib.Nan.time_nanmax(200, 50.0)
- 11.8±0.6ms 10.6±0.6ms 0.89 bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <class 'str'>)
- 42.7±0.1μs 37.8±0.02μs 0.88 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'positive'>, 2, 2, 'd')
- 42.8±0.03μs 37.8±0.2μs 0.88 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 2, 'd')
- 43.1±0.2μs 37.7±0.09μs 0.87 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'f')
- 37.5±0.07μs 32.6±0.06μs 0.87 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 1, 'd')
- 41.7±0.03μs 36.3±0.07μs 0.87 bench_ufunc_strides.Unary.time_ufunc(<ufunc '_ones_like'>, 1, 4, 'd')
- 166±0.8μs 144±1μs 0.87 bench_ufunc.UFunc.time_ufunc_types('fmin')
- 11.6±0.8ms 10.0±0.01ms 0.87 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <class 'str'>)
- 167±0.9μs 144±2μs 0.86 bench_ufunc.UFunc.time_ufunc_types('minimum')
- 168±4μs 143±0.5μs 0.85 bench_ufunc.UFunc.time_ufunc_types('fmax')
- 167±1μs 142±0.8μs 0.85 bench_ufunc.UFunc.time_ufunc_types('maximum')
- 7.10±0μs 4.97±0.01μs 0.70 bench_ufunc_strides.AVX_BFunc.time_ufunc('minimum', 'd', 2)
- 7.11±0.07μs 4.96±0.01μs 0.70 bench_ufunc_strides.AVX_BFunc.time_ufunc('maximum', 'd', 2)
- 7.05±0.07μs 4.68±0μs 0.66 bench_ufunc_strides.AVX_BFunc.time_ufunc('minimum', 'f', 4)
- 7.13±0μs 4.68±0.01μs 0.66 bench_ufunc_strides.AVX_BFunc.time_ufunc('maximum', 'f', 4)
- 461±0.2μs 297±7μs 0.64 bench_app.MaxesOfDots.time_it
- 7.04±0.07μs 3.95±0μs 0.56 bench_ufunc_strides.AVX_BFunc.time_ufunc('maximum', 'f', 2)
- 7.06±0.06μs 3.95±0.01μs 0.56 bench_ufunc_strides.AVX_BFunc.time_ufunc('minimum', 'f', 2)
- 7.09±0.06μs 3.24±0μs 0.46 bench_ufunc_strides.AVX_BFunc.time_ufunc('minimum', 'd', 1)
- 7.12±0.07μs 3.25±0.02μs 0.46 bench_ufunc_strides.AVX_BFunc.time_ufunc('maximum', 'd', 1)
- 14.5±0.02μs 3.98±0μs 0.27 bench_reduce.MinMax.time_max(<class 'numpy.int64'>)
- 14.6±0.1μs 4.00±0.01μs 0.27 bench_reduce.MinMax.time_min(<class 'numpy.int64'>)
- 6.88±0.06μs 1.34±0μs 0.19 bench_ufunc_strides.AVX_BFunc.time_ufunc('maximum', 'f', 1)
- 7.00±0μs 1.33±0μs 0.19 bench_ufunc_strides.AVX_BFunc.time_ufunc('minimum', 'f', 1)
- 39.4±0.01μs 3.95±0.01μs 0.10 bench_reduce.MinMax.time_min(<class 'numpy.float64'>)
- 39.4±0.01μs 3.95±0.02μs 0.10 bench_reduce.MinMax.time_max(<class 'numpy.float64'>)
- 254±0.02μs 22.8±0.2μs 0.09 bench_lib.Nan.time_nanmax(200000, 50.0)
- 253±0.1μs 22.7±0.1μs 0.09 bench_lib.Nan.time_nanmin(200000, 0)
- 254±0.06μs 22.7±0.09μs 0.09 bench_lib.Nan.time_nanmin(200000, 2.0)
- 254±0.01μs 22.7±0.03μs 0.09 bench_lib.Nan.time_nanmin(200000, 0.1)
- 254±0.04μs 22.7±0.02μs 0.09 bench_lib.Nan.time_nanmin(200000, 50.0)
- 253±0.1μs 22.7±0.04μs 0.09 bench_lib.Nan.time_nanmax(200000, 0.1)
- 253±0.03μs 22.7±0.04μs 0.09 bench_lib.Nan.time_nanmin(200000, 90.0)
- 253±0.02μs 22.7±0.07μs 0.09 bench_lib.Nan.time_nanmax(200000, 0)
- 254±0.03μs 22.7±0.02μs 0.09 bench_lib.Nan.time_nanmax(200000, 90.0)
- 254±0.09μs 22.7±0.04μs 0.09 bench_lib.Nan.time_nanmax(200000, 2.0)
- 39.2±0.01μs 2.51±0.01μs 0.06 bench_reduce.MinMax.time_max(<class 'numpy.float32'>)
- 39.2±0.01μs 2.50±0.01μs 0.06 bench_reduce.MinMax.time_min(<class 'numpy.float32'>)
```
Size change of _multiarray_umath.cpython-39-darwin.so:
Before: 3,890,723
After: 3,924,035
Change: +33,312 (~ +0.856 %)
|
ENH: Make ndarray.__array_finalize__ a callable no-op
|
In the process, __array_finalize__ is looked up on the subclass
instead of the instance, which is closer to how Python itself looks up
methods like these.
It cannot make a difference, since the instance is created in the
same routine, so the instance method is guaranteed to be the same as
that on the class.
|
This helps subclasses, which can now call super() in their own
implementation.
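A short sketch of what this enables, assuming a NumPy version that includes this change. The `Tagged` class and its `tag` attribute are hypothetical names used only for illustration.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical subclass that carries a `tag` attribute along."""

    def __array_finalize__(self, obj):
        # With a callable base implementation this call is a harmless no-op,
        # so cooperative subclasses need not special-case ndarray.
        super().__array_finalize__(obj)
        self.tag = getattr(obj, "tag", None)

a = np.arange(3).view(Tagged)
a.tag = "units: m"
b = a[1:]  # slicing creates a new Tagged and calls __array_finalize__(a)

print(b.tag)
```

On versions where `ndarray.__array_finalize__` is still `None`, the `super()` call would fail, which is the situation this commit removes.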
|
PERF: Optimize array check for bounded 0,1 values
|
Optimize frequent check for probabilities when they are doubles
|
MAINT: removed duplicate 'int' type in ScalarType
|
TYP: Allow time manipulation functions to accept `date` and `timedelta` objects
|
MAINT: Relax asserts to match relaxed reducelike resolution behaviour
|
This closes gh-20751, which was caused by the assert going unnoticed
(not sure why) when it triggered during the initial CI run.
The behaviour is relaxed, so the assert must also be relaxed.
|
* BUG: Added check for NULL data in ufuncs
* DOC: Made NULL refs more explicit
* DOC: Added `.. versionchanged::` tag
|
ENH: Removed requirement for C-contiguity when changing to dtype of different size
|
Expires deprecated F-contiguous behavior.
Simplifies C code of dtype set descriptor.
Adds tests that verify which error condition is triggered.
Introduces extra long exception message that upsets linter.
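As a rough illustration of the relaxed requirement, assuming a NumPy version that includes this change: a view with a different-size dtype can be taken of an array that is not C-contiguous as a whole, as long as its last axis is contiguous. The array shapes below are arbitrary examples.

```python
import numpy as np

base = np.zeros((4, 6), dtype=np.int32)

# Taking every other row leaves the last axis contiguous,
# but the array as a whole is no longer C-contiguous.
rows = base[::2, :]

# Reinterpret each int32 as two int16 values; only the last axis grows.
halves = rows.view(np.int16)

print(rows.flags.c_contiguous, halves.shape)
```

Versions without this change are expected to reject the `view` call, because they required the whole array to be C-contiguous.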
|
BUG: Remove trailing dec point in dragon4positional
|
fixes #12441
|
BUG: Relax dtype identity check in reductions
|
In some cases, e.g. ensure-native-byte-order returns not the
default descriptor, but a copy of it.
This (and maybe metadata) makes it somewhat annoying to ensure
exact identity between descriptors for reduce "operands" as
returned by the resolve-descriptors method of the ArrayMethod.
To avoid this problem, we check for no-casting (which implies
viewable with `offset == 0`) rather than strict identity.
Unfortunately, this means that descriptor resolution must be slightly
more careful, but in general this should all be perfectly well
defined.
Closes gh-20699
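The difference between descriptor identity and a no-casting match can be sketched at the Python level. This is only an analogy for the C-level check described above, and the metadata dictionary is an arbitrary example.

```python
import numpy as np

plain = np.dtype("int64")
# Attaching metadata yields a distinct descriptor object of the same type.
tagged = np.dtype("int64", metadata={"note": "example"})

same_object = plain is tagged
no_cast_needed = np.can_cast(plain, tagged, casting="no")

# A byte-swapped descriptor, by contrast, is not a no-casting match.
swapped_ok = np.can_cast(np.dtype(">i8"), np.dtype("<i8"), casting="no")

print(same_object, no_cast_needed, swapped_ok)
```

A strict identity test would reject `tagged` even though no conversion is required, which is the kind of false negative the relaxed check avoids.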
|
BLD: Add NPY_DISABLE_SVML env var to opt out of SVML
|
* DOC: Document that dtype, strides, shape attributes should not be set
This adds a `warning` directive and makes sure that `view` is mentioned.
(Assignment to data is already deprecated, so it is not included here.)
* Update numpy/core/_add_newdocs.py
Co-authored-by: Matti Picus <matti.picus@gmail.com>
|
ENH: add support for operator() in crackfortran.
|
Some interface names may contain parentheses when used
with an operator, like:
    interface operator(==)
        module procedure my_type_equals
    end interface operator(==)
Make sure the end part is properly detected, and also store
the operator ('==' in that case) in the name.
Also implement support for listing the "implemented by" procedures
in any interface declaration.
|
TYP: add a few type annotations to `numpy.array_api.Array`
|
Co-authored-by: Bas van Beek <43369155+BvB93@users.noreply.github.com>
|
This fixes the majority of the complaints for `$ mypy numpy/array_api`.
The comment indicating that one fix is blocked by lack of support in
Mypy for `NotImplemented` is responsible for another several dozen
errors.
[skip ci]
|
* BUG: Fix array dimensions solver for multidimensional arguments in f2py. See #20709
|