summaryrefslogtreecommitdiff
path: root/doc/release/1.9.0-notes.rst
blob: 2e457ad8471fde089406b752e5495787ebb926c3 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
NumPy 1.9.0 Release Notes
*************************

This release supports  Python 2.6 -2.7 and 3.2 - 3.3.


Highlights
==========


Dropped Support
===============

* The oldnumeric and numarray modules have been removed.
* The doc/pyrex and doc/cython directories have been removed.

Future Changes
==============


Compatibility notes
===================

Special scalar float values don't cause upcast to double anymore
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In previous numpy versions operations involving floating point scalars
containing special values ``NaN``, ``Inf`` and ``-Inf`` caused the result
type to be at least ``float64``.  As the special values can be represented
in the smallest available floating point type, the upcast is not performed
anymore.

For example the dtype of:

    ``np.array([1.], dtype=np.float32) * float('nan')``

now remains ``float32`` instead of being cast to ``float64``.
Operations involving non-special values have not been changed.

Percentile output changes
~~~~~~~~~~~~~~~~~~~~~~~~~
If given more than one percentile to compute numpy.percentile returns an
array instead of a list. A single percentile still returns a scalar.  The
array is equivalent to converting the list returned in older versions
to an array via ``np.array``.

If the ``overwrite_input`` option is used the input is only partially
instead of fully sorted.

``npy_3kcompat.h`` header change
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The unused ``simple_capsule_dtor`` function has been removed from
``npy_3kcompat.h``.  Note that this header is not meant to be used outside
of numpy; other projects should be using their own copy of this file when
needed.

Negative indices in C-Api ``sq_item`` and ``sq_ass_item`` sequence methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When directly accessing the ``sq_item`` or ``sq_ass_item`` PyObject slots
for item getting, negative indices will not be supported anymore.
``PySequence_GetItem`` and ``PySequence_SetItem`` however fix negative
indices so that they can be used there.

ndarray.tofile exception type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All ``tofile`` exceptions are now ``IOError``, some were previously
``ValueError``.


New Features
============

Percentile supports more interpolation options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``np.percentile`` now has the interpolation keyword argument to specify in
which way points should be interpolated if the percentiles fall between two
values.  See the documentation for the available options.

Ufunc and Dot Overrides
~~~~~~~~~~~~~~~~~~~~~~~

For better compatibility with external objects you can now override
universal functions (ufuncs), ``numpy.core._dotblas.dot``, and
``numpy.core.multiarray.dot`` (the numpy.dot functions). By defining a
``__numpy_ufunc__`` method.

Dtype parameter added to ``np.linspace`` and ``np.logspace``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The returned data type from the ``linspace`` and ``logspace`` functions can
now be specified using the dtype parameter.

More general ``np.triu`` and ``np.tril`` broadcasting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For arrays with ``ndim`` exceeding 2, these functions will now apply to the
final two axes instead of raising an exception.

``tobytes`` alias for ``tostring`` method
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``ndarray.tobytes`` and ``MaskedArray.tobytes`` have been added as aliases
for ``tostring`` which exports arrays as ``bytes``. This is more consistent
in Python 3 where ``str`` and ``bytes`` are not the same.


Improvements
============

Percentile implemented in terms of ``np.partition``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``np.percentile`` has been implemented in terms of ``np.partition`` which
only partially sorts the data via a selection algorithm. This improves the
time complexity from ``O(nlog(n))`` to ``O(n)``.

Performance improvement for ``np.array``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The performance of converting lists containing arrays to arrays using
``np.array`` has been improved. It is now equivalent in speed to
``np.vstack(list)``.

Performance improvement for ``np.searchsorted``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For the built-in numeric types, ``np.searchsorted`` no longer relies on the
data type's ``compare`` function to perform the search, but is now
implemented by type specific functions. Depending on the size of the
inputs, this can result in performance improvements over 2x.

Full broadcasting support for ``np.cross``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``np.cross`` now properly broadcasts its two input arrays, even if they
have different number of dimensions. In earlier versions this would result
in either an error being raised, or wrong results computed.

Changes
=======

Argmin and argmax out argument
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``out`` argument to ``np.argmin`` and ``np.argmax`` and their
equivalent C-API functions is now checked to match the desired output shape
exactly.  If the check fails a ``ValueError`` instead of ``TypeError`` is
raised.


Einsum
~~~~~~
Remove unnecessary broadcasting notation restrictions.
``np.einsum('ijk,j->ijk', A, B)`` can also be written as
``np.einsum('ij...,j->ij...', A, B)`` (ellipsis is no longer required on 'j')


Indexing
~~~~~~~~

The NumPy indexing has seen a complete rewrite in this version. This makes
most advanced integer indexing operations much faster and should have no
other implications.  However some subtle changes and deprecations were
introduced in advanced indexing operations:

* Boolean indexing into scalar arrays will always return a new 1-d array.
  This means that ``array(1)[array(True)]`` gives ``array([1])`` and
  not the original array.

* Advanced indexing into one dimensional arrays used to have
  (undocumented) special handling regarding repeating the value array in
  assignments when the shape of the value array was too small or did not
  match.  Code using this will raise an error. For compatibility you can
  use ``arr.flat[index] = values``, which uses the old code branch.  (for
  example ``a = np.ones(10); a[np.arange(10)] = [1, 2, 3]``)

* The iteration order over advanced indexes used to be always C-order.
  In NumPy 1.9. the iteration order adapts to the inputs and is not
  guaranteed (with the exception of a *single* advanced index which is
  never reversed for compatibility reasons). This means that the result
  is undefined if multiple values are assigned to the same element.  An
  example for this is ``arr[[0, 0], [1, 1]] = [1, 2]``, which may set
  ``arr[0, 1]`` to either 1 or 2.

* Equivalent to the iteration order, the memory layout of the advanced
  indexing result is adapted for faster indexing and cannot be predicted.

* All indexing operations return a view or a copy. No indexing operation
  will return the original array object. (For example ``arr[...]``)

* In the future Boolean array-likes (such as lists of python bools) will
  always be treated as Boolean indexes and Boolean scalars (including
  python ``True``) will be a legal *boolean* index. At this time, this is
  already the case for scalar arrays to allow the general
  ``positive = a[a > 0]`` to work when ``a`` is zero dimensional.

* In NumPy 1.8 it was possible to use ``array(True)`` and
  ``array(False)`` equivalent to 1 and 0 if the result of the operation
  was a scalar.  This will raise an error in NumPy 1.9 and, as noted
  above, treated as a boolean index in the future.

* All non-integer array-likes are deprecated, object arrays of custom
  integer like objects may have to be cast explicitly.

* The error reporting for advanced indexing is more informative, however
  the error type has changed in some cases. (Broadcasting errors of
  indexing arrays are reported as ``IndexError``)

* Indexing with more then one ellipsis (``...``) is deprecated.


``promote_types`` and string dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``promote_types`` function now returns a valid string length when given an
integer or float dtype as one argument and a string dtype as another
argument.  Previously it always returned the input string dtype, even if it
wasn't long enough to store the max integer/float value converted to a
string.


``can_cast`` and string dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``can_cast`` function now returns False in "safe" casting mode for
integer/float dtype and string dtype if the string dtype length is not long
enough to store the max integer/float value converted to a string.
Previously ``can_cast`` in "safe" mode returned True for integer/float
dtype and a string dtype of any length.


astype and string dtype
~~~~~~~~~~~~~~~~~~~~~~~
The ``astype`` method now returns an error if the string dtype to cast to
is not long enough in "safe" casting mode to hold the max value of
integer/float array that is being casted. Previously the casting was
allowed even if the result was truncated.


NDIter
~~~~~~
When ``NpyIter_RemoveAxis`` is now called, the iterator range will be reset.

When a multi index is being tracked and an iterator is not buffered, it is
possible to use ``NpyIter_RemoveAxis``. In this case an iterator can shrink
in size. Because the total size of an iterator is limited, the iterator
may be too large before these calls. In this case its size will be set to ``-1``
and an error issued not at construction time but when removing the multi
index, setting the iterator range, or getting the next function.

This has no effect on currently working code, but highlights the necessity
of checking for an error return if these conditions can occur. In most
cases the arrays being iterated are as large as the iterator so that such
a problem cannot occur.


C-API
~~~~~


Deprecations
============

Non-integer scalars for sequence repetition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Using non-integer numpy scalars to repeat python sequences is deprecated.
For example ``np.float_(2) * [1]`` will be an error in the future.

C-API
~~~~~

The utility function npy_PyFile_Dup and npy_PyFile_DupClose are broken by the
internal buffering python 3 applies to its file objects.
To fix this two new functions npy_PyFile_Dup2 and npy_PyFile_DupClose2 are
declared in npy_3kcompat.h and the old functions are deprecated.
Due to the fragile nature of these functions it is recommended to instead use
the python API when possible.


New Features
============