path: root/numpy
Commit message  Author  Age  Files  Lines
* Merge pull request #20793 from seberg/evil-reducelike-no-value-based  Charles Harris  2022-01-12  1  -0/+18
|\
| |
| |   BUG: Fix that reducelikes honour out always (and live in the future)
| * BUG: Fix that reducelikes honour out always (and live in the future)  Sebastian Berg  2022-01-11  1  -0/+18
| |
| |   Reducelikes should have lived in the future where the `out` dtype is
| |   correctly honoured always and used as one of the *inputs*.
| |   However, when legacy fallback occurs, this leads to problems because
| |   the legacy code path has 0-D fallbacks.
| |
| |   There are two probable solutions to this:
| |
| |   * Live with weird value-based stuff here even though it was never
| |     actually better especially for reducelikes.
| |     (enforce value-based promotion)
| |   * Avoid value based promotion completely.
| |
| |   This does the second one, using a terrible hack by just mutating
| |   the dimension of `out` to tell the resolvers that value-based logic
| |   cannot be used.
| |
| |   Is that hack safe?  Yes, so long as nobody has super-crazy custom type
| |   resolvers (the only one I know is pyerfa and they are fine, PyGEOS
| |   I think has no custom type resolver).
| |   It also relies on the GIL of course, but...
| |
| |   The future?  We need to ditch this value-based stuff, do annoying
| |   acrobatics with dynamically created DType classes, or something similar
| |   (so ditching seems best, it is topping my TODO list currently).
| |
| |   Testing this is tricky, running the test:
| |   ```
| |   python runtests.py -t numpy/core/tests/test_ufunc.py::TestUfunc::test_reducelike_out_promotes
| |   ```
| |   triggers it, but because reducelikes do not enforce value-based
| |   promotion the failure can be "hidden" (which is why the test succeeds
| |   in a full test run).
| |
| |   Closes gh-20739
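The behaviour this fix protects can be illustrated with a short sketch. It assumes a NumPy build that includes the change; the array size and variable names are illustrative only. Because `out` is `uint16`, the reduction is expected to run in `uint16`, so summing 1000 `uint8` ones should not wrap around at 256.

```python
import numpy as np

# 1000 ones stored as uint8; a uint8 accumulator would wrap around at 256.
arr = np.ones(1000, dtype=np.uint8)

# 0-D output with a wider dtype. With the fix, the dtype of `out`
# (not its current value) takes part in promotion for the reduction.
out = np.zeros((), dtype=np.uint16)

np.add.reduce(arr, out=out)
total = int(out)

# Same idea for a product: ten 2s and 990 ones.
arr[:10] = 2
np.multiply.reduce(arr, out=out)
product = int(out)

print(total, product)
```

Using a 0-D `out` matters here: according to the commit message, the 0-D fallbacks of the legacy path are what could make the failure appear.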
* | BUG: `array_api.argsort(descending=True)` respects relative sort order (#20788)  Matthew Barber  2022-01-12  2  -3/+37
| |
| |   * BUG: `array_api.argsort(descending=True)` respects relative order
| |
| |   * Regression test for stable descending `array_api.argsort()`
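One way to obtain a descending argsort that keeps ties in their input order is sketched below for the 1-D case. This is an illustration of the idea, not necessarily the code in this commit; `stable_argsort_descending` is a hypothetical helper name.

```python
import numpy as np

def stable_argsort_descending(x):
    """Indices sorting 1-D `x` in descending order, ties kept in input order."""
    x = np.asarray(x)
    # Stable ascending argsort of the reversed array ...
    rev_order = np.argsort(x[::-1], kind="stable")
    # ... read backwards gives descending values, with ties ordered as in `x`,
    # but the indices still refer to positions in the reversed array.
    rev_order = rev_order[::-1]
    # Map positions in the reversed array back to positions in `x`.
    return (x.shape[0] - 1) - rev_order

x = np.array([0, 1, 0, 1])
naive = np.argsort(x, kind="stable")[::-1]  # simply reversing flips the ties
stable = stable_argsort_descending(x)
print(naive.tolist(), stable.tolist())
```

The naive reversal returns the two `1` entries as indices 3 then 1, while the stable variant returns them as 1 then 3.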
* | Merge pull request #20794 from BvB93/like  Charles Harris  2022-01-11  9  -77/+104
|\ \
| | |
| | |   TYP: Type the NEP 35 `like` parameter via a `__array_function__` protocol
| * | ENH: Type the `like` parameter via a `__array_function__` protocol  Bas van Beek  2022-01-11  7  -77/+89
| | |
| * | TYP: Add a protocol class representing `__array_function__`  Bas van Beek  2022-01-11  2  -0/+15
| |/
* | Merge pull request #20786 from melissawm/f2py-scipy-docs  Charles Harris  2022-01-11  2  -2/+2
|\ \
| |/
|/|   BUG, DOC: Fixes SciPy docs build warnings
| * BUG, DOC: Fixes SciPy docs build warnings  melissawm  2022-01-11  2  -2/+2
| |
| |   The new f2py symbolic parser writes ternary expressions with spaces
| |   surrounding the colon operator, which causes the generated docstrings
| |   to be incorrectly parsed. Removing the spaces solves the issue.
* | Merge pull request #20131 from Developer-Ecosystem-Engineering/as_min_max  Matti Picus  2022-01-11  6  -301/+596
|\ \
| | |
| | |   BUG: min/max is slow, re-implement using NEON (#17989)
| * | ENH, SIMD: several improvements for max/min  Sayed Adel  2022-01-06  1  -80/+121
| | |
| | |   - Avoid unrolling vectorized max/min loops by x6/x8 when
| | |     SIMD width > 128 to avoid memory bandwidth bottleneck
| | |   - tune reduce max/min
| | |   - vectorize non-contiguous max/min
| | |   - fix code style
| | |   - call npyv_cleanup() at end of inner loop
| * | ENH: remove raw x86 SIMD of max/min  Sayed Adel  2021-12-31  2  -218/+3
| | |
| * | Integrate requested changes, improve scalar operations, address linux aarch64  Developer-Ecosystem-Engineering  2021-12-13  1  -15/+53
| | |
| | |   We've incorporated the changes you've requested for scalar operations.
| | |
| | |   **Testing**
| | |   - Apple silicon M1 native (arm64 / aarch64) -- No test failures
| | |   - Apple silicon M1 Rosetta (x86_64) -- No new test failures
| | |   - iMacPro1,1 (AVX512F) -- No test failures
| | |   - Ubuntu VM (aarch64) -- No test failures
| | |
| | |   **Benchmarks**
| | |   Again, Apple silicon M1 native (arm64 / aarch64) looks similar to original patch (comparison below)
| | |
| | |   Also, x86_64 (both Apple silicon M1 Rosetta and iMacPro1,1 AVX512F) have varying results.
| | |   Some are better. Some are worse. Compared to previous re-org, we see improvements though.
| | |
| | |   Apple silicon M1 native (arm64 / aarch64) comparison to previous commit:
| | |   ```
| | |           before         after  ratio
| | |       [8b01e839]    [18565b27]
| | |       <gh-issue-17989/feedback/round-1>  <gh-issue-17989/feedback/round-2>
| | |   +    176±0.2μs       196±1μs  1.11  bench_function_base.Sort.time_sort('heap', 'int16', ('ordered',))
| | |   +    234±0.2μs       261±1μs  1.11  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'exp'>, 2, 4, 'f')
| | |   +   43.4±0.4μs    48.3±0.4μs  1.11  bench_function_base.Sort.time_sort('quick', 'int64', ('uniform',))
| | |   +   22.5±0.1μs    25.1±0.3μs  1.11  bench_shape_base.Block2D.time_block2d((512, 512), 'uint8', (2, 2))
| | |   +  4.75±0.05μs   5.28±0.07μs  1.11  bench_ma.UFunc.time_scalar(True, True, 1000)
| | |   +    224±0.2μs     248±0.9μs  1.11  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'exp2'>, 1, 1, 'f')
| | |   +    233±0.5μs       258±1μs  1.11  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'exp'>, 4, 2, 'f')
| | |   +  8.81±0.02μs    9.72±0.1μs  1.10  bench_shape_base.Block2D.time_block2d((32, 32), 'uint16', (2, 2))
| | |   +   8.71±0.1μs    9.58±0.3μs  1.10  bench_indexing.ScalarIndexing.time_assign_cast(2)
| | |   +  96.2±0.03μs       105±3μs  1.09  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'fabs'>, 1, 1, 'd')
| | |   +   20.2±0.1μs    22.0±0.5μs  1.09  bench_shape_base.Block.time_block_simple_row_wise(100)
| | |   +      469±4μs       510±7μs  1.09  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'cos'>, 2, 1, 'd')
| | |   +  43.9±0.02μs      46.4±2μs  1.06  bench_function_base.Median.time_odd_inplace
| | |   +     4.75±0μs    5.02±0.2μs  1.06  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'int64')
| | |   -  16.4±0.07μs    15.6±0.4μs  0.95  bench_ufunc.UFunc.time_ufunc_types('left_shift')
| | |   -      127±6μs     120±0.1μs  0.94  bench_ufunc.UFunc.time_ufunc_types('deg2rad')
| | |   -   10.9±0.5μs   10.3±0.01μs  0.94  bench_function_base.Sort.time_sort('merge', 'int64', ('reversed',))
| | |   -      115±5μs     108±0.2μs  0.94  bench_function_base.Bincount.time_bincount
| | |   -   17.0±0.4μs   15.9±0.03μs  0.94  bench_ufunc.UFunc.time_ufunc_types('right_shift')
| | |   -     797±30ns     743±0.5ns  0.93  bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.]), axis=0))
| | |   -     18.4±1μs   17.2±0.04μs  0.93  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 10000, <class 'bool'>)
| | |   -      241±7μs     224±0.3μs  0.93  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'exp2'>, 2, 1, 'f')
| | |   -      105±1μs   96.7±0.02μs  0.92  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'deg2rad'>, 2, 4, 'f')
| | |   -   23.3±0.2μs   21.4±0.02μs  0.92  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'edge')
| | |   -     833±20μs       766±2μs  0.92  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'arctanh'>, 1, 1, 'd')
| | |   -     86.8±4μs    79.5±0.4μs  0.92  bench_ufunc.UFunc.time_ufunc_types('conjugate')
| | |   -   2.58±0.1μs      2.36±0μs  0.91  bench_ufunc.CustomScalar.time_divide_scalar2(<class 'numpy.float32'>)
| | |   -      102±4μs    92.8±0.7μs  0.91  bench_ufunc.UFunc.time_ufunc_types('logical_not')
| | |   -   46.6±0.4μs   42.1±0.07μs  0.90  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 1, 'd')
| | |   -    158±0.7μs    142±0.07μs  0.90  bench_lib.Pad.time_pad((4, 4, 4, 4), 1, 'linear_ramp')
| | |   -      729±6μs       657±1μs  0.90  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'arccos'>, 4, 4, 'f')
| | |   -   63.6±0.9μs      56.2±1μs  0.88  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'd')
| | |   -     730±40μs       605±3μs  0.83  bench_lib.Pad.time_pad((1024, 1024), 1, 'reflect')
| | |
| | |   SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
| | |   PERFORMANCE DECREASED.
| | |   ```
| * | Merge branch 'as_min_max' of ↵  Developer-Ecosystem-Engineering  2021-11-18  258  -5242/+12478
| |\ \
| | | |
| | | |   https://github.com/Developer-Ecosystem-Engineering/numpy into as_min_max
| | * \ Merge branch 'numpy:main' into as_min_max  Developer-Ecosystem-Engineering  2021-11-18  258  -5242/+12478
| | |\ \
| * | | | Delete .org file, unnecessary  Developer-Ecosystem-Engineering  2021-11-18  1  -717/+0
| |/ / /
| * | | Reorganize NEON min/max implementation to be more generic  Developer-Ecosystem-Engineering  2021-11-18  6  -758/+1105
| | | |
| | | |   Thank you @seiko2plus for the excellent example.
| | | |
| | | |   Reorganized code so that it can be used for other architectures.
| | | |   Core implementations and unroll factors should be the same as before for ARM NEON.
| | | |   Beyond reorganizing, we've added default implementations using universal intrinsics for non-ARM-NEON.
| | | |
| | | |   Additionally, we've moved most min, max, fmin, fmax implementations to a new dispatchable source file:
| | | |   numpy/core/src/umath/loops_minmax.dispatch.c.src
| | | |
| | | |   **Testing**
| | | |   - Apple silicon M1 native (arm64 / aarch64) -- No test failures
| | | |   - Apple silicon M1 Rosetta (x86_64) -- No new test failures
| | | |   - iMacPro1,1 (AVX512F) -- No test failures
| | | |
| | | |   **Benchmarks**
| | | |   - Apple silicon M1 native (arm64 / aarch64)
| | | |     - Similar improvements as before reorg (comparison below)
| | | |   - x86_64 (both Apple silicon M1 Rosetta and iMacPro1,1 AVX512F)
| | | |     - Some x86_64 benchmarks are better, some are worse
| | | |
| | | |   Apple silicon M1 native (arm64 / aarch64) comparison to original implementation / before reorg:
| | | |   ```
| | | |           before         after  ratio
| | | |       [559ddede]    [a3463b09]
| | | |       <gh-issue-17989/improve-neon-min-max>  <gh-issue-17989/feedback/round-1>
| | | |   +  6.45±0.04μs   7.07±0.09μs  1.10  bench_lib.Nan.time_nanargmin(200, 0.1)
| | | |   +   32.1±0.3μs    35.2±0.2μs  1.10  bench_ufunc_strides.Unary.time_ufunc(<ufunc '_ones_like'>, 2, 1, 'd')
| | | |   +  29.1±0.02μs   31.8±0.05μs  1.10  bench_core.Core.time_array_int_l1000
| | | |   +   69.0±0.2μs      75.3±3μs  1.09  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'logical_not'>, 2, 4, 'f')
| | | |   +     92.0±1μs    99.5±0.5μs  1.08  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'logical_not'>, 4, 4, 'd')
| | | |   +   9.29±0.1μs    9.99±0.5μs  1.08  bench_ma.UFunc.time_1d(True, True, 10)
| | | |   +    338±0.6μs      362±10μs  1.07  bench_function_base.Sort.time_sort('quick', 'int16', ('random',))
| | | |   +  4.21±0.03μs    4.48±0.2μs  1.07  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <class 'str'>)
| | | |   +  12.3±0.06μs    13.1±0.7μs  1.06  bench_function_base.Median.time_even_small
| | | |   +     1.27±0μs   1.35±0.06μs  1.06  bench_itemselection.PutMask.time_dense(False, 'float16')
| | | |   +      139±1ns       147±6ns  1.06  bench_core.Core.time_array_1
| | | |   +  33.7±0.01μs      35.5±2μs  1.05  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 4, 'f')
| | | |   +   69.4±0.1μs    73.1±0.2μs  1.05  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'logical_not'>, 4, 4, 'f')
| | | |   +   225±0.09μs       237±9μs  1.05  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint32'>, 2047])
| | | |   -   15.7±0.5μs   14.9±0.03μs  0.95  bench_core.CountNonzero.time_count_nonzero_axis(2, 10000, <class 'numpy.int64'>)
| | | |   -     34.2±2μs   32.0±0.03μs  0.94  bench_ufunc_strides.Unary.time_ufunc(<ufunc '_ones_like'>, 4, 2, 'f')
| | | |   -  1.03±0.05ms       955±3μs  0.92  bench_lib.Nan.time_nanargmax(200000, 50.0)
| | | |   -  6.97±0.08μs   6.43±0.02μs  0.92  bench_ma.UFunc.time_scalar(True, False, 10)
| | | |   -     5.41±0μs   4.98±0.01μs  0.92  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 2, 'F')
| | | |   -  22.4±0.01μs   20.6±0.02μs  0.92  bench_core.Core.time_array_float64_l1000
| | | |   -  1.51±0.01ms      1.38±0ms  0.92  bench_core.CorrConv.time_correlate(1000, 10000, 'same')
| | | |   -   10.1±0.2μs   9.27±0.09μs  0.92  bench_ufunc.UFunc.time_ufunc_types('invert')
| | | |   -  8.50±0.02μs   7.80±0.09μs  0.92  bench_indexing.ScalarIndexing.time_assign_cast(1)
| | | |   -   29.5±0.2μs   26.6±0.03μs  0.90  bench_ma.Concatenate.time_it('masked', 100)
| | | |   -  2.09±0.02ms      1.87±0ms  0.90  bench_ma.UFunc.time_2d(True, True, 1000)
| | | |   -     298±10μs     267±0.3μs  0.89  bench_app.MaxesOfDots.time_it
| | | |   -   10.7±0.2μs   9.60±0.02μs  0.89  bench_ma.UFunc.time_1d(True, True, 100)
| | | |   -      567±3μs       505±2μs  0.89  bench_lib.Nan.time_nanargmax(200000, 90.0)
| | | |   -    342±0.9μs       282±5μs  0.83  bench_lib.Nan.time_nanargmax(200000, 2.0)
| | | |   -    307±0.7μs     244±0.8μs  0.80  bench_lib.Nan.time_nanargmax(200000, 0.1)
| | | |   -      309±1μs     241±0.1μs  0.78  bench_lib.Nan.time_nanargmax(200000, 0)
| | | |   ```
| * | | Remove extraneous .org files  Developer-Ecosystem-Engineering  2021-10-18  2  -1730/+0
| | | |
| * | | BUG: NEON min/max is slow (#17989)  Developer-Ecosystem-Engineering  2021-10-18  8  -0/+2531
| | | |
| | | |   This fixes numpy/numpy#17989 by adding ARM NEON implementations for min/max and fmin/max.
| | | |
| | | |   Before: Rosetta faster than native arm64 by `1.2x - 8.6x`.
| | | |   After: Native arm64 faster than Rosetta by `1.6x - 6.7x`. (2.8x - 15.5x improvement)
| | | |
| | | |   **Benchmarks**
| | | |   ```
| | | |           before         after  ratio
| | | |       [b0e1a445]    [8301ffd7]
| | | |       <main>        <gh-issue-17989/improve-neon-min-max>
| | | |   +  32.6±0.04μs   37.5±0.08μs  1.15  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 1, 'd')
| | | |   +  32.6±0.06μs   37.5±0.04μs  1.15  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 1, 'd')
| | | |   +  37.8±0.09μs   43.2±0.09μs  1.14  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'f')
| | | |   +  37.7±0.09μs    42.9±0.1μs  1.14  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 2, 'd')
| | | |   +   37.9±0.2μs   43.0±0.02μs  1.14  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 2, 'd')
| | | |   +  37.7±0.01μs      42.3±1μs  1.12  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'conjugate'>, 2, 2, 'd')
| | | |   +  34.2±0.07μs   38.1±0.05μs  1.12  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 2, 'f')
| | | |   +  32.6±0.03μs   35.8±0.04μs  1.10  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 1, 'f')
| | | |   +   37.1±0.1μs    40.3±0.1μs  1.09  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 2, 'd')
| | | |   +   37.2±0.1μs   40.3±0.04μs  1.08  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'f')
| | | |   +  37.1±0.09μs   40.3±0.07μs  1.08  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 2, 'd')
| | | |   +   68.6±0.5μs    74.2±0.3μs  1.08  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'd')
| | | |   +   37.1±0.2μs    40.0±0.1μs  1.08  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'conjugate'>, 1, 2, 'd')
| | | |   +     2.42±0μs   2.61±0.05μs  1.08  bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <class 'numpy.int16'>)
| | | |   +   69.1±0.7μs    73.5±0.7μs  1.06  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'conjugate'>, 4, 4, 'd')
| | | |   +   54.7±0.3μs    58.0±0.2μs  1.06  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'd')
| | | |   +   54.5±0.2μs    57.8±0.2μs  1.06  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'conjugate'>, 2, 4, 'd')
| | | |   +  3.78±0.04μs   4.00±0.02μs  1.06  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <class 'str'>)
| | | |   +   54.8±0.2μs    57.9±0.3μs  1.06  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'd')
| | | |   +  3.68±0.01μs   3.87±0.02μs  1.05  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <class 'object'>)
| | | |   +   69.6±0.2μs    73.1±0.2μs  1.05  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'd')
| | | |   +      229±2μs     241±0.2μs  1.05  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint64'>, 1535])
| | | |   -   73.0±0.8μs    69.5±0.2μs  0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 4, 'd')
| | | |   -   37.6±0.1μs    35.7±0.3μs  0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 4, 'f')
| | | |   -  88.7±0.04μs    84.2±0.7μs  0.95  bench_lib.Pad.time_pad((256, 128, 1), 1, 'wrap')
| | | |   -   57.9±0.2μs    54.8±0.2μs  0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 4, 'd')
| | | |   -   39.9±0.2μs   37.2±0.04μs  0.93  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'positive'>, 1, 2, 'd')
| | | |   -  2.66±0.01μs   2.47±0.01μs  0.93  bench_lib.Nan.time_nanmin(200, 0)
| | | |   -  2.65±0.02μs   2.46±0.04μs  0.93  bench_lib.Nan.time_nanmin(200, 50.0)
| | | |   -  2.64±0.01μs   2.45±0.01μs  0.93  bench_lib.Nan.time_nanmax(200, 90.0)
| | | |   -     2.64±0μs   2.44±0.02μs  0.92  bench_lib.Nan.time_nanmax(200, 0)
| | | |   -  2.68±0.02μs      2.48±0μs  0.92  bench_lib.Nan.time_nanmax(200, 2.0)
| | | |   -  40.2±0.01μs    37.1±0.1μs  0.92  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'f')
| | | |   -     2.69±0μs      2.47±0μs  0.92  bench_lib.Nan.time_nanmin(200, 2.0)
| | | |   -  2.70±0.02μs   2.48±0.02μs  0.92  bench_lib.Nan.time_nanmax(200, 0.1)
| | | |   -     2.70±0μs      2.47±0μs  0.91  bench_lib.Nan.time_nanmin(200, 90.0)
| | | |   -     2.70±0μs      2.46±0μs  0.91  bench_lib.Nan.time_nanmin(200, 0.1)
| | | |   -     2.70±0μs   2.42±0.01μs  0.90  bench_lib.Nan.time_nanmax(200, 50.0)
| | | |   -   11.8±0.6ms    10.6±0.6ms  0.89  bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <class 'str'>)
| | | |   -   42.7±0.1μs   37.8±0.02μs  0.88  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'positive'>, 2, 2, 'd')
| | | |   -  42.8±0.03μs    37.8±0.2μs  0.88  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 2, 'd')
| | | |   -   43.1±0.2μs   37.7±0.09μs  0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'f')
| | | |   -  37.5±0.07μs   32.6±0.06μs  0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 1, 'd')
| | | |   -  41.7±0.03μs   36.3±0.07μs  0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc '_ones_like'>, 1, 4, 'd')
| | | |   -    166±0.8μs       144±1μs  0.87  bench_ufunc.UFunc.time_ufunc_types('fmin')
| | | |   -   11.6±0.8ms   10.0±0.01ms  0.87  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <class 'str'>)
| | | |   -    167±0.9μs       144±2μs  0.86  bench_ufunc.UFunc.time_ufunc_types('minimum')
| | | |   -      168±4μs     143±0.5μs  0.85  bench_ufunc.UFunc.time_ufunc_types('fmax')
| | | |   -      167±1μs     142±0.8μs  0.85  bench_ufunc.UFunc.time_ufunc_types('maximum')
| | | |   -     7.10±0μs   4.97±0.01μs  0.70  bench_ufunc_strides.AVX_BFunc.time_ufunc('minimum', 'd', 2)
| | | |   -  7.11±0.07μs   4.96±0.01μs  0.70  bench_ufunc_strides.AVX_BFunc.time_ufunc('maximum', 'd', 2)
| | | |   -  7.05±0.07μs      4.68±0μs  0.66  bench_ufunc_strides.AVX_BFunc.time_ufunc('minimum', 'f', 4)
| | | |   -     7.13±0μs   4.68±0.01μs  0.66  bench_ufunc_strides.AVX_BFunc.time_ufunc('maximum', 'f', 4)
| | | |   -    461±0.2μs       297±7μs  0.64  bench_app.MaxesOfDots.time_it
| | | |   -  7.04±0.07μs      3.95±0μs  0.56  bench_ufunc_strides.AVX_BFunc.time_ufunc('maximum', 'f', 2)
| | | |   -  7.06±0.06μs   3.95±0.01μs  0.56  bench_ufunc_strides.AVX_BFunc.time_ufunc('minimum', 'f', 2)
| | | |   -  7.09±0.06μs      3.24±0μs  0.46  bench_ufunc_strides.AVX_BFunc.time_ufunc('minimum', 'd', 1)
| | | |   -  7.12±0.07μs   3.25±0.02μs  0.46  bench_ufunc_strides.AVX_BFunc.time_ufunc('maximum', 'd', 1)
| | | |   -  14.5±0.02μs      3.98±0μs  0.27  bench_reduce.MinMax.time_max(<class 'numpy.int64'>)
| | | |   -   14.6±0.1μs   4.00±0.01μs  0.27  bench_reduce.MinMax.time_min(<class 'numpy.int64'>)
| | | |   -  6.88±0.06μs      1.34±0μs  0.19  bench_ufunc_strides.AVX_BFunc.time_ufunc('maximum', 'f', 1)
| | | |   -     7.00±0μs      1.33±0μs  0.19  bench_ufunc_strides.AVX_BFunc.time_ufunc('minimum', 'f', 1)
| | | |   -  39.4±0.01μs   3.95±0.01μs  0.10  bench_reduce.MinMax.time_min(<class 'numpy.float64'>)
| | | |   -  39.4±0.01μs   3.95±0.02μs  0.10  bench_reduce.MinMax.time_max(<class 'numpy.float64'>)
| | | |   -   254±0.02μs    22.8±0.2μs  0.09  bench_lib.Nan.time_nanmax(200000, 50.0)
| | | |   -    253±0.1μs    22.7±0.1μs  0.09  bench_lib.Nan.time_nanmin(200000, 0)
| | | |   -   254±0.06μs   22.7±0.09μs  0.09  bench_lib.Nan.time_nanmin(200000, 2.0)
| | | |   -   254±0.01μs   22.7±0.03μs  0.09  bench_lib.Nan.time_nanmin(200000, 0.1)
| | | |   -   254±0.04μs   22.7±0.02μs  0.09  bench_lib.Nan.time_nanmin(200000, 50.0)
| | | |   -    253±0.1μs   22.7±0.04μs  0.09  bench_lib.Nan.time_nanmax(200000, 0.1)
| | | |   -   253±0.03μs   22.7±0.04μs  0.09  bench_lib.Nan.time_nanmin(200000, 90.0)
| | | |   -   253±0.02μs   22.7±0.07μs  0.09  bench_lib.Nan.time_nanmax(200000, 0)
| | | |   -   254±0.03μs   22.7±0.02μs  0.09  bench_lib.Nan.time_nanmax(200000, 90.0)
| | | |   -   254±0.09μs   22.7±0.04μs  0.09  bench_lib.Nan.time_nanmax(200000, 2.0)
| | | |   -  39.2±0.01μs   2.51±0.01μs  0.06  bench_reduce.MinMax.time_max(<class 'numpy.float32'>)
| | | |   -  39.2±0.01μs   2.50±0.01μs  0.06  bench_reduce.MinMax.time_min(<class 'numpy.float32'>)
| | | |   ```
| | | |
| | | |   Size change of _multiarray_umath.cpython-39-darwin.so:
| | | |   Before: 3,890,723
| | | |   After:  3,924,035
| | | |   Change: +33,312 (~ +0.856 %)
* | | | Merge pull request #20766 from mhvk/ndarray_array_finalize  Sebastian Berg  2022-01-10  8  -30/+88
|\ \ \ \
| | | | |
| | | | |   ENH: Make ndarray.__array_finalize__ a callable no-op
| * | | | DEP: warn that __array_finalize__ = None is deprecated  Marten van Kerkwijk  2022-01-09  3  -9/+27
| | | | |
| * | | | MAINT: correct typing for ndarray.__array_finalize__  Marten van Kerkwijk  2022-01-09  1  -1/+1
| | | | |
| * | | | MAINT: Speed up subtypes that do not override __array_finalize__  Marten van Kerkwijk  2022-01-08  2  -5/+37
| | | | |
| | | | |   In the process, __array_finalize__ is looked up on the subclass instead
| | | | |   of the instance, which is more like python for methods like these.
| | | | |   It cannot make a difference, since the instance is created in the same
| | | | |   routine, so the instance method is guaranteed to be the same as that
| | | | |   on the class.
| * | | | ENH: Let ndarray.__array_finalize__ be callable.  Marten van Kerkwijk  2022-01-08  5  -25/+33
| | | | |
| | | | |   This helps subclasses, who can now do super() in their own implementation.
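A minimal sketch of what the callable no-op allows, assuming a NumPy release that contains this change (NumPy 1.23 or later). The class name `Tagged` and its `tag` attribute are hypothetical and only serve the illustration.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical ndarray subclass that carries a `tag` attribute."""

    def __array_finalize__(self, obj):
        # With a callable ndarray.__array_finalize__, cooperative
        # subclasses can chain up without checking for None first.
        super().__array_finalize__(obj)
        # Inherit the tag from the source object when there is one.
        self.tag = getattr(obj, "tag", "untagged")

a = np.arange(4).view(Tagged)   # created from a plain ndarray
fresh_tag = a.tag
a.tag = "x"
b = a[1:]                       # slicing goes through __array_finalize__
print(fresh_tag, b.tag)
```

On older NumPy versions, where `ndarray.__array_finalize__` is `None`, the `super()` call in this sketch would fail, which is the situation the commit addresses.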
* | | | | Merge pull request #20643 from bashtage/experiment-array-cons-check  Sebastian Berg  2022-01-10  1  -3/+19
|\ \ \ \ \
| | | | | |
| | | | | |   PERF: Optimize array check for bounded 0,1 values
| * | | | | PERF: Optimize array check for bounded 0,1 values  Kevin Sheppard  2021-12-22  1  -3/+20
| | | | | |
| | | | | |   Optimize frequent check for probabilities when they are doubles
* | | | | | Merge pull request #20779 from gdementen/remove-duplicate-int-type  Sebastian Berg  2022-01-10  4  -8/+7
|\ \ \ \ \ \
| |_|_|_|_|/
|/| | | | |   MAINT: removed duplicate 'int' type in ScalarType
| * | | | | MAINT: removed duplicate 'int' type in ScalarType  Gaëtan de Menten  2022-01-10  4  -8/+7
| | |/ / /
| |/| | |
* | | | | Method without self argument should be static  Dimitri Papadopoulos  2022-01-10  1  -0/+1
|/ / / /
* | | | Merge pull request #20750 from BvB93/datetime  Charles Harris  2022-01-06  3  -36/+49
|\ \ \ \
| | | | |
| | | | |   TYP: Allow time manipulation functions to accept `data` and `timedelta` objects
| * | | | TYP: Allow time manipulation functions to accept `data` and `timedelta` objects  Bas van Beek  2022-01-06  3  -36/+49
| | | | |
* | | | | Merge pull request #20754 from seberg/fixup-relax-dtype-identity  Charles Harris  2022-01-06  1  -4/+12
|\ \ \ \ \
| | | | | |
| | | | | |   MAINT: Relax asserts to match relaxed reducelike resolution behaviour
| * | | | | MAINT: Relax asserts to match relaxed reducelike resolution behaviour  Sebastian Berg  2022-01-06  1  -4/+12
| | | | | |
| | | | | |   This closes gh-20751, which happened because the triggered assert
| | | | | |   went unnoticed (not sure why) during the initial CI run.
| | | | | |
| | | | | |   The behaviour is relaxed, so the assert must also be relaxed.
* | | | | | BUG: Added check for NULL data in ufuncs (#20689)  Joseph Fox-Rabinovitz  2022-01-06  2  -2/+2
|/ / / / /
| | | | |
| | | | |   * BUG: Added check for NULL data in ufuncs
| | | | |
| | | | |   * DOC: Made NULL refs more explicit
| | | | |
| | | | |   * DOC: Added ..versionchanged:: tag
* | | | | Merge pull request #20722 from madphysicist/dtype-checking-1  Matti Picus  2022-01-06  5  -52/+102
|\ \ \ \ \
| | | | | |
| | | | | |   ENH: Removed requirement for C-contiguity when changing to dtype of different size
| * | | | | ENH: Support for changing dtype in non-C-contiguous views  Joseph R. Fox-Rabinovitz  2022-01-05  5  -52/+102
| | | | | |
| | | | | |   Expires deprecated F-contiguous behavior.
| | | | | |   Simplifies C code of dtype set descriptor.
| | | | | |   Adds tests that verify which error condition is triggered.
| | | | | |   Introduces extra long exception message that upsets linter.
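A small sketch of the kind of view this enhancement permits, assuming NumPy 1.23 or later, where (per the `ndarray.view` documentation) only the last axis has to be contiguous. The array shapes are arbitrary and chosen for illustration.

```python
import numpy as np

base = np.arange(24, dtype=np.int16).reshape(6, 4)

# Every other row: the result is not C-contiguous as a whole,
# but each row (the last axis) is still contiguous in memory.
rows = base[::2]

# Reinterpret pairs of int16 values as one int32 value.
# The last axis shrinks from 4 to 2 elements; no data is copied.
wide = rows.view(np.int32)

print(rows.flags.c_contiguous, wide.shape, np.shares_memory(wide, base))
```

Before this change the same call required the whole array to be C-contiguous.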
* | | | | | Merge pull request #20678 from corneliusroemer/fix/remove-trailing-point  Matti Picus  2022-01-06  2  -3/+11
|\ \ \ \ \ \
| | | | | | |
| | | | | | |   BUG: Remove trailing dec point in dragon4positional
| * | | | | | MAINT: Refactor test to shorten line length  Cornelius Roemer  2021-12-29  1  -2/+1
| | | | | | |
| * | | | | | BUG: Add test for scalarprint bugfix  Cornelius Roemer  2021-12-29  1  -0/+2
| | | | | | |
| * | | | | | BUG: Remove trailing dec point in dragon4positional  Cornelius Roemer  2021-12-29  1  -3/+10
| | | | | | |
| | | | | | |   fixes #12441
* | | | | | | Merge pull request #20729 from seberg/relax-reduce-dtype-identity-check  Matti Picus  2022-01-06  2  -2/+17
|\ \ \ \ \ \ \
| |_|_|/ / / /
|/| | | | | |   BUG: Relax dtype identity check in reductions
| * | | | | | BUG: Relax dtype identity check in reductions  Sebastian Berg  2022-01-05  2  -2/+17
| | |/ / / /
| |/| | | |
| | | | | |   In some cases, e.g. ensure-native-byte-order will return not the
| | | | | |   default, but a copy of the descriptor.  This (and maybe metadata)
| | | | | |   makes it somewhat annoying to ensure exact identity between
| | | | | |   descriptors for reduce "operands" as returned by the
| | | | | |   resolve-descriptors method of the ArrayMethod.
| | | | | |
| | | | | |   To avoid this problem, we check for no-casting (which implies viewable
| | | | | |   with `offset == 0`) rather than strict identity.
| | | | | |
| | | | | |   Unfortunately, this means that descriptor resolution must be slightly
| | | | | |   more careful, but in general this should all be perfectly well defined.
| | | | | |
| | | | | |   Closes gh-20699
* | | | | | Merge pull request #20695 from ahesford/x-the-avx  Matti Picus  2022-01-05  1  -0/+7
|\ \ \ \ \ \
| | | | | | |
| | | | | | |   BLD: Add NPY_DISABLE_SVML env var to opt out of SVML
| * | | | | | BLD: Add NPY_DISABLE_SVML env var to opt out of SVML  Andrew J. Hesford  2022-01-05  1  -0/+7
| |/ / / / /
* | | | | | DOC: Document that dtype, strides, shape attributes should not be set (#20730)  Sebastian Berg  2022-01-05  1  -0/+19
|/ / / / /
| | | | |
| | | | |   * DOC: Document that dtype, strides, shape attributes should not be set
| | | | |
| | | | |   This adds a `warning` directive and makes sure that `view` is mentioned.
| | | | |
| | | | |   (assignment to data is already deprecated, so not including it here.)
| | | | |
| | | | |   * Update numpy/core/_add_newdocs.py
| | | | |
| | | | |   Co-authored-by: Matti Picus <matti.picus@gmail.com>
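As an illustration of what the documented warning steers users toward, the sketch below creates new views instead of assigning to `dtype`, `shape`, or `strides`. Only `view` is named in the commit message; the use of `reshape` and `numpy.lib.stride_tricks.as_strided` here is an assumption about reasonable alternatives, not a quote from the added documentation.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(6, dtype=np.int32)

# Instead of `a.dtype = np.uint8`: make a reinterpreting view.
as_bytes = a.view(np.uint8)

# Instead of `a.shape = (2, 3)`: ask for a reshaped array.
grid = a.reshape(2, 3)

# Instead of assigning to `a.strides`: build a strided view explicitly.
# (as_strided does no bounds checking, so shape and strides must be valid.)
every_other = as_strided(a, shape=(3,), strides=(2 * a.itemsize,))

print(as_bytes.shape, grid.shape, every_other.tolist())
```

In each case the original array `a` keeps its own dtype, shape, and strides, so other references to it are not affected.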
* | | | | Merge pull request #15006 from dcaliste/crack  Charles Harris  2022-01-04  3  -4/+95
|\ \ \ \ \
| | | | | |
| | | | | |   ENH: add support for operator() in crackfortran.
| * | | | | ENH: add support for operator() in crackfortran.  Damien Caliste  2022-01-04  3  -4/+95
| | | | | |
| | | | | |   Some interface names may contain parentheses when used with operator, like:
| | | | | |
| | | | | |       interface operator(==)
| | | | | |         module procedure my_type_equals
| | | | | |       end interface operator(==)
| | | | | |
| | | | | |   Make the end part properly detected, and store also the operator
| | | | | |   ('==' in that case) in the name.
| | | | | |
| | | | | |   Also implement support to list the implemented by in any interface declaration.
* | | | | | Merge pull request #20720 from rgommers/arrayapi-annotation-fixes  Matti Picus  2022-01-04  1  -2/+3
|\ \ \ \ \ \
| |/ / / / /
|/| | | | |   TYP: add a few type annotations to `numpy.array_api.Array`
| * | | | | TYP: accept review comment on ignoring NotImplemented in type checking  Ralf Gommers  2022-01-04  1  -4/+1
| | | | | |
| | | | | |   Co-authored-by: Bas van Beek <43369155+BvB93@users.noreply.github.com>
| * | | | | TYP: add a few type annotations to `numpy.array_api.Array`  Ralf Gommers  2022-01-03  1  -1/+5
| | | | | |
| | | | | |   This fixes the majority of the complaints for `$ mypy numpy/array_api`.
| | | | | |   The comment indicating that one fix is blocked by lack of support in
| | | | | |   Mypy for `NotImplemented` is responsible for another several dozen errors.
| | | | | |
| | | | | |   [skip ci]
* | | | | | BUG: Fix array dimensions solver for multidimensional arguments in f2py (#20721)  Pearu Peterson  2022-01-04  2  -4/+9
| | | | | |
| | | | | |   * BUG: Fix array dimensions solver for multidimensional arguments in f2py. See #20709