<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/python-packages/numpy.git/numpy/core/src/umath, branch main</title>
<subtitle>github.com: numpy/numpy.git
</subtitle>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/numpy.git/'/>
<entry>
<title>Merge pull request #23763 from seberg/nep50-fixes-part2</title>
<updated>2023-05-16T19:00:16+00:00</updated>
<author>
<name>Charles Harris</name>
<email>charlesr.harris@gmail.com</email>
</author>
<published>2023-05-16T19:00:16+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/numpy.git/commit/?id=0200e4a00c6ea90ab433962479f47a927a13ed3e'/>
<id>0200e4a00c6ea90ab433962479f47a927a13ed3e</id>
<content type='text'>
BUG: Fix weak scalar logic for large ints in ufuncs</content>
</entry>
<entry>
<title>MAINT: fix signed/unsigned int comparison warnings</title>
<updated>2023-05-15T16:02:51+00:00</updated>
<author>
<name>Nathan Goldbaum</name>
<email>nathan.goldbaum@gmail.com</email>
</author>
<published>2023-05-15T16:02:51+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/numpy.git/commit/?id=7a6a867cfb785eca60c6ee0cbd7d7816ece510bd'/>
<id>7a6a867cfb785eca60c6ee0cbd7d7816ece510bd</id>
<content type='text'>
</content>
</entry>
<entry>
<title>BUG: Fix weak scalar logic for large ints in ufuncs</title>
<updated>2023-05-15T10:43:30+00:00</updated>
<author>
<name>Sebastian Berg</name>
<email>sebastianb@nvidia.com</email>
</author>
<published>2023-05-10T13:11:38+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/numpy.git/commit/?id=21602a8b1673a7b468d032ef19c20c53ac15c0b9'/>
<id>21602a8b1673a7b468d032ef19c20c53ac15c0b9</id>
<content type='text'>
This fixes it, but (partially) breaks the warnings; most or all of
those paths should be errors anyway.
</content>
</entry>
<entry>
<title>ENH: Make signed/unsigned integer comparisons exact</title>
<updated>2023-05-04T14:33:27+00:00</updated>
<author>
<name>Sebastian Berg</name>
<email>sebastianb@nvidia.com</email>
</author>
<published>2023-05-04T14:33:27+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/numpy.git/commit/?id=ec8d5db302c0e8597feb058f58863d5e9a6554c1'/>
<id>ec8d5db302c0e8597feb058f58863d5e9a6554c1</id>
<content type='text'>
This makes comparisons between signed and unsigned integers exact
by special-casing promotion for comparisons: integers are never
promoted to floats, but instead to int64 or uint64, and a specific
loop is used for that purpose.
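
A minimal scalar sketch of why the mixed int64/uint64 case benefits from its own loop. It uses a hypothetical helper `int64_lt_uint64` and is not NumPy's actual loop. Promoting both operands to float64 can round distinct values to the same double. An exact comparison only has to handle the sign of the int64 operand first.

```c
#include "assert.h"
#include "stdbool.h"
#include "stdint.h"

/* Exact "a is less than b" for an int64 a and a uint64 b (hypothetical helper). */
static bool int64_lt_uint64(int64_t a, uint64_t b)
{
    if (a >= 0) {
        /* A non-negative int64 converts to uint64 without loss. */
        return b > (uint64_t)a;
    }
    /* Every negative int64 is smaller than every uint64. */
    return true;
}

/* The float64 promotion this change avoids: values above 2^53 may round. */
static bool float64_lt(int64_t a, uint64_t b)
{
    return (double)b > (double)a;
}
```

Take a = 2^62 + 1 and b = 2^62 + 2. Here `int64_lt_uint64` returns true, while `float64_lt` returns false on IEEE-754 platforms with round-to-nearest, because both values round to the double 2^62.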

This is a somewhat lazy implementation: it does not make the scalar paths
fast (they never were), nor does it try to vectorize the loop.
Thus, cases that are not already int64/uint64 and require a cast either
way should be a bit slower.  On the other hand, this was never really fast,
and the int64/uint64 mix is probably faster now since it avoids casting.

---

The reason I was looking into this is that I had hoped
it would help with NEP 50 (weak scalar typing) to allow:

    uint64(1) &lt; -1  # annoying that it fails with NEP 50

but it doesn't, because if I use int64 for the -1, very large numbers
would be a problem.
I might be able to add a *specific* "Python integer" ArrayMethod for
comparisons that picks the `object` dtype and thus gets the original Python
object (in practice, the loop could then assume a scalar value).
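
Below is a minimal sketch of that idea. It assumes a hypothetical `pyint_summary` struct that such a loop could fill in once from the Python integer; how those fields are extracted from the object is not shown. Knowing the sign, and whether the value exceeds the uint64 range, is enough to compare exactly. No int64 cast is involved, so nothing can overflow.

```c
#include "assert.h"
#include "stdbool.h"
#include "stdint.h"

/* Hypothetical summary of an arbitrarily large Python integer, computed
   once, since the Python operand is a scalar. */
typedef struct {
    bool negative;      /* value is below zero */
    bool above_uint64;  /* value exceeds UINT64_MAX */
    uint64_t value;     /* exact value when neither flag is set */
} pyint_summary;

/* Exact "u is less than p" for a uint64 element u and a Python integer p. */
static bool uint64_lt_pyint(uint64_t u, pyint_summary p)
{
    if (p.negative) {
        return false;  /* no uint64 is below a negative number */
    }
    if (p.above_uint64) {
        return true;   /* every uint64 is below a value past UINT64_MAX */
    }
    return p.value > u;
}
```

For the `uint64(1) &lt; -1` example, a summary with `negative` set makes `uint64_lt_pyint` return false, which is the mathematically correct answer.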

---

In any case, this works, and unless we want to keep the old behavior,
we might as well do this (potentially with follow-ups to speed it up).
</content>
</entry>
<entry>
<title>BUG: Correct sin/cos float64 range check functions</title>
<updated>2023-05-02T14:48:30+00:00</updated>
<author>
<name>Chris Sidebottom</name>
<email>chris.sidebottom@arm.com</email>
</author>
<published>2023-05-02T09:03:04+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/numpy.git/commit/?id=c43ae85bdd44174c0f2867adc85fe94cbc626873'/>
<id>c43ae85bdd44174c0f2867adc85fe94cbc626873</id>
<content type='text'>
When I translated range checks for [sin](https://github.com/ARM-software/optimized-routines/blob/91d5bbc3091fa568e6856c7c41f9d7492d5957df/math/v_sin.c#L68):

```c
cmp = v_cond_u64 ((ir &gt;&gt; 52) - TinyBound &gt;= Thresh);
```

and [cos](https://github.com/ARM-software/optimized-routines/blob/91d5bbc3091fa568e6856c7c41f9d7492d5957df/math/v_cos.c#L56):

```c
cmp = v_cond_u64 (v_as_u64_f64 (r) &gt;= v_as_u64_f64 (RangeVal));
```

They ended up the wrong way around; this corrects it.
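
The two quoted checks are not interchangeable. Below is a scalar sketch of both, assuming illustrative bounds of 2^-48 ("tiny") and 2^23 (range limit). The constant and function names are assumptions, not values copied from Optimized Routines.

- **sin-style check:** subtracts a bias from the exponent field of |x|. A single unsigned comparison then flags both very small and very large magnitudes, because small exponents wrap around to huge unsigned values.
- **cos-style check:** compares the bit pattern of |x| against that of the range limit. It only flags large magnitudes, infinities and NaNs.

```c
#include "assert.h"
#include "math.h"
#include "stdbool.h"
#include "stdint.h"

/* Illustrative bounds (assumptions): tiny = 2^-48, range limit = 2^23. */
#define TINY_BOUND 0x3cfu                 /* biased exponent of 2^-48 (1023 - 48) */
#define RANGE_EXP  0x416u                 /* biased exponent of 2^23 (1023 + 23) */
#define THRESH     (RANGE_EXP - TINY_BOUND)
#define RANGE_VAL  0x1p23

/* Reinterpret the bits of a double as an unsigned 64-bit integer. */
static uint64_t as_u64(double x)
{
    union { double f; uint64_t u; } pun = { .f = x };
    return pun.u;
}

/* sin-style: flags |x| below 2^-48 (including 0), |x| at or above 2^23,
   and inf/NaN, using one unsigned comparison. */
static bool sin_style_special(double x)
{
    uint64_t ir = as_u64(fabs(x));
    /* For tiny inputs the subtraction wraps around to a huge value. */
    return (ir >> 52) - TINY_BOUND >= THRESH;
}

/* cos-style: non-negative doubles order like their bit patterns, so this
   flags |x| at or above 2^23 and inf/NaN, but never tiny inputs. */
static bool cos_style_special(double x)
{
    return as_u64(fabs(x)) >= as_u64(RANGE_VAL);
}
```

For example, x = 1e-20 is flagged by the sin-style check but not by the cos-style one, while x = 2^23 is flagged by both.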
</content>
</entry>
<entry>
<title>MAINT: Fixup handling of subarray dtype in ufunc.resolve_dtypes</title>
<updated>2023-04-26T12:57:52+00:00</updated>
<author>
<name>Sebastian Berg</name>
<email>sebastianb@nvidia.com</email>
</author>
<published>2023-04-26T12:26:59+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/numpy.git/commit/?id=ca3df13ea111d08ac1b365040b45c13117510ded'/>
<id>ca3df13ea111d08ac1b365040b45c13117510ded</id>
<content type='text'>
This is now OK to simply support: we won't replace the dtype, and things
should (probably) work out for the most part.
</content>
</entry>
<entry>
<title>MAINT: Refactor internal array creation to also allow dtype preservation</title>
<updated>2023-04-26T12:57:52+00:00</updated>
<author>
<name>Sebastian Berg</name>
<email>sebastianb@nvidia.com</email>
</author>
<published>2023-04-26T12:04:03+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/numpy.git/commit/?id=9b62c3859f11094b664546e2f4a0fc92ed5c493c'/>
<id>9b62c3859f11094b664546e2f4a0fc92ed5c493c</id>
<content type='text'>
In some cases we know that we want to use the *exact* dtype that we already
have (mainly when taking views).  This is also useful internally because there
are very rare code paths where we even create temporary arrays that contain
subarray dtypes.
</content>
</entry>
<entry>
<title>ENH: float64 sin/cos using Numpy intrinsics (#23399)</title>
<updated>2023-04-25T16:56:37+00:00</updated>
<author>
<name>Christopher Sidebottom</name>
<email>chris.sidebottom@arm.com</email>
</author>
<published>2023-04-25T16:56:37+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/numpy.git/commit/?id=fe5472fa4eae131ff9646d7c980c6c4081c10386'/>
<id>fe5472fa4eae131ff9646d7c980c6c4081c10386</id>
<content type='text'>
This takes the [sin](https://github.com/ARM-software/optimized-routines/blob/master/math/v_sin.c) and [cos](https://github.com/ARM-software/optimized-routines/blob/master/math/v_cos.c) algorithms from Optimized Routines (MIT license) and converts them to NumPy intrinsics.

The routines stay within the ULP bounds of the other vectorised math routines (&lt;4ULP). They reduce performance in some special cases but improve it in normal cases. Compared to the SVML implementation, these routines are faster in special cases, so we consider the performance acceptable for AArch64 as well.
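
To make the ULP bound concrete: a result's error can be counted as the number of representable doubles between it and a correctly rounded reference. The sketch below maps each double's bit pattern onto a monotonically ordered integer, so that adjacent doubles differ by exactly one. The helper names are hypothetical and not NumPy code; on the NumPy side, `numpy.testing.assert_array_max_ulp` performs this kind of check.

```c
#include "assert.h"
#include "float.h"
#include "stdint.h"

/* Map a double onto an ordered signed integer line: adjacent doubles map to
   adjacent integers, and +0.0 and -0.0 both map to 0. */
static int64_t ordered_bits(double x)
{
    union { double f; int64_t i; } pun = { .f = x };
    /* Negative doubles have the sign bit set; mirror them below zero. */
    return pun.i >= 0 ? pun.i : INT64_MIN - pun.i;
}

/* Distance in ULPs between two non-NaN doubles. */
static uint64_t ulp_distance(double a, double b)
{
    int64_t oa = ordered_bits(a), ob = ordered_bits(b);
    /* Unsigned subtraction avoids signed overflow for far-apart values. */
    return oa >= ob ? (uint64_t)oa - (uint64_t)ob : (uint64_t)ob - (uint64_t)oa;
}
```

Comparing a routine's output against a correctly rounded reference with such a helper gives an integer approximation of the ULP error quoted above.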

| performance ratio (lower is better)  | benchmark |
| ----  | ---- |
| 1.8   | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'cos'&gt;	4	2	'd') |
| 1.79  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'cos'&gt;	4	4	'd') |
| 1.77  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'cos'&gt;	4	1	'd') |
| 1.74  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'cos'&gt;	2	2	'd') |
| 1.74  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'cos'&gt;	2	4	'd') |
| 1.72  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'cos'&gt;	2	1	'd') |
| 1.6   | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'cos'&gt;	1	2	'd') |
| 1.6   | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'cos'&gt;	1	4	'd') |
| 1.56  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'cos'&gt;	1	1	'd') |
| 1.42  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'sin'&gt;	2	2	'd') |
| 1.41  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'sin'&gt;	2	4	'd') |
| 1.37  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'sin'&gt;	2	1	'd') |
| 1.26  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'sin'&gt;	4	2	'd') |
| 1.26  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'sin'&gt;	4	4	'd') |
| 1.2   | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'sin'&gt;	4	1	'd') |
| 1.18  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'sin'&gt;	1	2	'd') |
| 1.18  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'sin'&gt;	1	4	'd') |
| 1.12  | bench_ufunc_strides.UnaryFPSpecial.time_unary(&lt;ufunc	'sin'&gt;	1	1	'd') |
| 0.65  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'cos'&gt;	4	2	'd') |
| 0.64  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'cos'&gt;	2	4	'd') |
| 0.64  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'cos'&gt;	4	4	'd') |
| 0.64  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'cos'&gt;	2	2	'd') |
| 0.61  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'cos'&gt;	1	4	'd') |
| 0.61  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'cos'&gt;	1	2	'd') |
| 0.6   | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'cos'&gt;	2	1	'd') |
| 0.6   | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'cos'&gt;	4	1	'd') |
| 0.56  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'cos'&gt;	1	1	'd') |
| 0.52  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'sin'&gt;	4	2	'd') |
| 0.52  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'sin'&gt;	4	4	'd') |
| 0.52  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'sin'&gt;	2	4	'd') |
| 0.52  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'sin'&gt;	2	2	'd') |
| 0.47  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'sin'&gt;	1	4	'd') |
| 0.47  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'sin'&gt;	1	2	'd') |
| 0.46  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'sin'&gt;	4	1	'd') |
| 0.46  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'sin'&gt;	2	1	'd') |
| 0.42  | bench_ufunc_strides.UnaryFP.time_unary(&lt;ufunc	'sin'&gt;	1	1	'd') |

Co-authored-by: Pierre Blanchard &lt;Pierre.Blanchard@arm.com&gt;</content>
</entry>
<entry>
<title>BUG: in the fastest path for ufunc.at, properly increment args[2]</title>
<updated>2023-03-27T06:55:35+00:00</updated>
<author>
<name>mattip</name>
<email>matti.picus@gmail.com</email>
</author>
<published>2023-03-27T06:55:35+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/numpy.git/commit/?id=47df67bb6c83d68c3254b69a44426b02d3f24caa'/>
<id>47df67bb6c83d68c3254b69a44426b02d3f24caa</id>
<content type='text'>
</content>
</entry>
<entry>
<title>ENH: Modified `PyUFunc_CheckOverride` to allow the `where` argument to override `__array_ufunc__`.</title>
<updated>2023-02-24T08:21:55+00:00</updated>
<author>
<name>Roy Smart</name>
<email>roytsmart@gmail.com</email>
</author>
<published>2023-02-17T22:29:41+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/numpy.git/commit/?id=f3f108d313a8b8a4f7a90fb932867f17dc48b1f6'/>
<id>f3f108d313a8b8a4f7a90fb932867f17dc48b1f6</id>
<content type='text'>
</content>
</entry>
</feed>
