.. _numpyrandom:

.. py:module:: numpy.random

.. currentmodule:: numpy.random

Random sampling (:mod:`numpy.random`)
=====================================

.. _random-quick-start:

Quick Start
-----------

The :mod:`numpy.random` module implements pseudo-random number generators
(PRNGs) with the ability to draw samples from a variety of probability
distributions. In general, users will create a `Generator` instance with
`default_rng` and call the various methods on it to obtain samples from
different distributions.

::

  >>> import numpy as np
  >>> rng = np.random.default_rng()
  # Generate one random float uniformly distributed over the range [0, 1)
  >>> rng.random()  #doctest: +SKIP
  0.06369197489564249  # may vary
  # Generate an array of 10 numbers according to a unit Gaussian distribution.
  >>> rng.standard_normal(10)  #doctest: +SKIP
  array([-0.31018314, -1.8922078 , -0.3628523 , -0.63526532,  0.43181166,  # may vary
          0.51640373,  1.25693945,  0.07779185,  0.84090247, -2.13406828])
  # Generate an array of 5 integers uniformly over the range [0, 10).
  >>> rng.integers(low=0, high=10, size=5)  #doctest: +SKIP
  array([8, 7, 6, 2, 0])  # may vary

PRNGs generate deterministic sequences, which can be reproduced by specifying
a seed to control the initial state. By default, with no seed, `default_rng`
will create the PRNG using nondeterministic data from the operating system and
therefore generate different numbers each time. The resulting pseudorandom
sequences will be practically independent of each other.

::

    >>> rng1 = np.random.default_rng()
    >>> rng1.random()  #doctest: +SKIP
    0.6596288841243357  # may vary
    >>> rng2 = np.random.default_rng()
    >>> rng2.random()  #doctest: +SKIP
    0.11885628817151628  # may vary

Seeds are usually large positive integers. `default_rng` can take positive
integers of any size. We recommend using very large, unique numbers to ensure
that your seed is different from anyone else's. This is good practice to ensure
that your results are statistically independent from theirs, unless you are
intentionally *trying* to reproduce their result. A convenient way to get
such a seed is to use :py:func:`secrets.randbits` to get an
arbitrary 128-bit integer.

::

    >>> import secrets
    >>> import numpy as np
    >>> secrets.randbits(128)  #doctest: +SKIP
    122807528840384100672342137672332424406  # may vary
    >>> rng1 = np.random.default_rng(122807528840384100672342137672332424406)
    >>> rng1.random()
    0.5363922081269535
    >>> rng2 = np.random.default_rng(122807528840384100672342137672332424406)
    >>> rng2.random()
    0.5363922081269535

See the documentation on `default_rng` and `SeedSequence` for more advanced
options for controlling the seed in specialized scenarios.

`Generator` and its associated infrastructure were introduced in NumPy version
1.17.0. There is still a lot of code that uses the older `RandomState` and the
functions in `numpy.random`. While there are no plans to remove them at this
time, we do recommend transitioning to `Generator` when you can. The algorithms
are faster, more flexible, and will receive more improvements in the future.
For the most part, `Generator` can be used as a replacement for `RandomState`.
See :ref:`legacy` for information on the legacy infrastructure,
:ref:`new-or-different` for information on transitioning, and :ref:`NEP 19
<NEP19>` for some of the reasoning for the transition.

Design
------

Users primarily interact with `Generator` instances. Each `Generator` instance
owns a `BitGenerator` instance that implements the core PRNG algorithm. The
`BitGenerator` has a limited set of responsibilities. It manages state and
provides functions to produce random doubles and random unsigned 32- and 64-bit
values.

The `Generator` takes the stream provided by the bit generator and transforms
it into more useful distributions, e.g., simulated normal random values. This
structure allows alternative bit generators to be used with little code
duplication.

NumPy provides several `BitGenerator` classes, each implementing a
different PRNG algorithm. `default_rng` currently uses `~PCG64` as the
default `BitGenerator`. It has better statistical properties and performance
than the `~MT19937` algorithm used in the legacy `RandomState`. See
:ref:`random-bit-generators` for more details on the supported BitGenerators.
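
As a minimal sketch of this layering, a `Generator` can be constructed
explicitly around a chosen `BitGenerator`. The seed value and variable names
below are arbitrary and only for illustration.

.. code-block:: python

    from numpy.random import Generator, MT19937, PCG64

    seed = 12345  # arbitrary example seed

    # Wrap two different core PRNG algorithms in the same Generator interface.
    rng_pcg = Generator(PCG64(seed))
    rng_mt = Generator(MT19937(seed))

    # The distribution methods are identical; only the underlying bit stream
    # differs.
    samples_pcg = rng_pcg.standard_normal(3)
    samples_mt = rng_mt.standard_normal(3)

    # The owned BitGenerator is available as an attribute.
    bit_gen = rng_pcg.bit_generator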

`default_rng` and BitGenerators delegate the conversion of seeds into PRNG
states to `SeedSequence` internally. `SeedSequence` implements a sophisticated
algorithm that intermediates between the user's input and the internal
implementation details of each `BitGenerator` algorithm, each of which can
require a different number of bits for its state. Importantly, it lets you use
arbitrary-sized integers and arbitrary sequences of such integers to mix
together into the PRNG state. This is a useful primitive for constructing
a :ref:`flexible pattern for parallel PRNG streams <seedsequence-spawn>`.
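
The following sketch illustrates both ideas: seeding from a sequence of
integers, and spawning child seed sequences for parallel streams. The root
seed is the example value from the Quick Start, and the worker indices are
hypothetical.

.. code-block:: python

    from numpy.random import SeedSequence, default_rng

    root_seed = 122807528840384100672342137672332424406  # example value

    # A sequence of integers is mixed together into a single initial state,
    # so a (hypothetical) worker index can be combined with a shared root seed.
    rng_worker0 = default_rng([0, root_seed])
    rng_worker1 = default_rng([1, root_seed])

    # Alternatively, spawn child SeedSequences from one root SeedSequence.
    children = SeedSequence(root_seed).spawn(3)
    streams = [default_rng(child) for child in children]
    draws = [stream.random() for stream in streams]

Either approach is reproducible from the single root seed, which is the only
value that needs to be recorded.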

For backward compatibility, we still maintain the legacy `RandomState` class.
It continues to use the `~MT19937` algorithm by default, and old seeds continue
to reproduce the same results. The convenience :ref:`functions-in-numpy-random`
are still aliases to the methods on a single global `RandomState` instance. See
:ref:`legacy` for the complete details.
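
As a small illustration of that aliasing (shown for understanding legacy code,
not as a recommendation for new code), seeding the global state and seeding
a separate `RandomState` with the same arbitrary value should give the same
draws:

.. code-block:: python

    import numpy as np

    seed = 12345  # arbitrary example seed

    # An explicit legacy RandomState instance.
    legacy = np.random.RandomState(seed)
    from_instance = legacy.random_sample(3)

    # The module-level functions operate on a hidden global RandomState.
    np.random.seed(seed)
    from_module = np.random.random_sample(3)

    same = np.array_equal(from_instance, from_module)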

Parallel Generation
~~~~~~~~~~~~~~~~~~~

The included generators can be used in parallel, distributed applications in
a number of ways:

* :ref:`seedsequence-spawn`
* :ref:`sequence-of-seeds`
* :ref:`independent-streams`
* :ref:`parallel-jumped`

Users with a very large amount of parallelism will want to consult
:ref:`upgrading-pcg64`.
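
As one hedged example of these approaches, the sketch below uses the
``jumped`` method of `PCG64` to derive several bit generators from a single
seeded one. The seed and the number of streams are arbitrary choices; see the
linked pages for the trade-offs between the approaches.

.. code-block:: python

    from numpy.random import Generator, PCG64

    seed = 98765  # arbitrary example seed
    n_streams = 4  # arbitrary number of parallel streams

    # Each call to jumped() returns a new bit generator whose state has been
    # advanced far ahead of the one it was called on.
    bit_generators = [PCG64(seed)]
    for _ in range(n_streams - 1):
        bit_generators.append(bit_generators[-1].jumped())

    rngs = [Generator(bit_gen) for bit_gen in bit_generators]
    first_draws = [rng.random() for rng in rngs]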

What's New or Different
~~~~~~~~~~~~~~~~~~~~~~~
.. warning::

  The Box-Muller method used to produce NumPy's normals is no longer available
  in `Generator`.  It is not possible to reproduce the exact random
  values using Generator for the normal distribution or any other
  distribution that relies on the normal such as the `RandomState.gamma` or
  `RandomState.standard_t`. If you require bitwise backward compatible
  streams, use `RandomState`.

`Generator` can be used as a replacement for `RandomState`. Both class
instances hold an internal `BitGenerator` instance to provide the bit
stream, which is accessible as ``gen.bit_generator``. Some long-overdue API
cleanup means that legacy and compatibility methods have been removed from
`Generator`.

=================== ============== ============
`RandomState`       `Generator`    Notes
------------------- -------------- ------------
``random_sample``,  ``random``     Compatible with `random.random`
``rand``
------------------- -------------- ------------
``randint``,        ``integers``   Add an ``endpoint`` kwarg
``random_integers``
------------------- -------------- ------------
``tomaxint``        removed        Use ``integers(0, np.iinfo(np.int_).max,``
                                   ``endpoint=False)``
------------------- -------------- ------------
``seed``            removed        Use `SeedSequence.spawn`
=================== ============== ============

Something like the following code can be used to support both ``RandomState``
and ``Generator``, with the understanding that the interfaces are slightly
different.

.. code-block:: python

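    import numpy as np

    # ``rng`` may be either a Generator or a legacy RandomState instance.
    rng = np.random.default_rng()
    try:
        rng_integers = rng.integers  # Generator
    except AttributeError:
        rng_integers = rng.randint  # RandomState
    a = rng_integers(1000)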

* The Generator's normal, exponential and gamma functions use 256-step Ziggurat
  methods which are 2-10 times faster than NumPy's Box-Muller or inverse CDF
  implementations.
* Optional ``dtype`` argument that accepts ``np.float32`` or ``np.float64``
  to produce either single or double precision uniform random variables for
  select distributions.
* Optional ``out`` argument that allows existing arrays to be filled for
  select distributions.
* All BitGenerators can produce doubles, uint64s and uint32s via CTypes
  (`PCG64.ctypes`) and CFFI (`PCG64.cffi`). This allows the bit generators
  to be used in Numba.
* The bit generators can be used in downstream projects via
  :ref:`Cython <random_cython>`.
* `Generator.integers` is now the canonical way to generate integer
  random numbers from a discrete uniform distribution. The ``rand`` and
  ``randn`` methods are only available through the legacy `RandomState`.
  The ``endpoint`` keyword can be used to specify open or closed intervals.
  This replaces both ``randint`` and the deprecated ``random_integers``.
* `Generator.random` is now the canonical way to generate floating-point
  random numbers, which replaces `RandomState.random_sample`,
  `RandomState.sample`, and `RandomState.ranf`. This is consistent with
  Python's `random.random`.
* All BitGenerators in numpy use `SeedSequence` to convert seeds into
  initialized states.
* The addition of an ``axis`` keyword argument to methods such as
  `Generator.choice`, `Generator.permutation`, and `Generator.shuffle`
  improves support for sampling from and shuffling multi-dimensional arrays.
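
A few of the keyword arguments listed above are sketched here. The seed,
array shapes, and variable names are arbitrary and only for illustration.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(2024)  # arbitrary example seed

    # ``dtype``: draw single-precision uniform floats.
    singles = rng.random(4, dtype=np.float32)

    # ``out``: fill an existing float64 array in place.
    buffer = np.empty(5)
    rng.standard_normal(out=buffer)

    # ``endpoint``: sample from the closed interval [1, 6].
    dice = rng.integers(1, 6, size=1000, endpoint=True)

    # ``axis``: permute the columns of a 2-D array, keeping rows in place.
    grid = np.arange(12).reshape(3, 4)
    shuffled_columns = rng.permutation(grid, axis=1)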

See :ref:`new-or-different` for a complete list of improvements and
differences from the traditional ``RandomState``.

.. _random-compatibility:

Compatibility Policy
--------------------

`numpy.random` has a somewhat stricter compatibility policy than the rest of
NumPy. Users of pseudorandomness often need to be able to
reproduce runs in fine detail given the same seed (so-called "stream
compatibility"), and so we try to balance those needs with the flexibility to
enhance our algorithms. :ref:`NEP 19 <NEP19>` describes the evolution of this
policy.

The main kind of compatibility that we enforce is stream-compatibility from run
to run under certain conditions. If you create a `Generator` with the same
`BitGenerator`, with the same seed, perform the same sequence of method calls
with the same arguments, on the same build of ``numpy``, in the same
environment, on the same machine, you should get the same stream of numbers.
Note that these conditions are very strict. There are a number of factors
outside of NumPy's control that limit our ability to guarantee much more than
this. For example, different CPUs implement floating point arithmetic
differently, and this can cause differences in certain edge cases that cascade
to the rest of the stream. `Generator.multivariate_normal`, for another
example, uses a matrix decomposition from ``numpy.linalg``. Even on the same
platform, a different build of ``numpy`` may use a different version of this
matrix decomposition algorithm from the LAPACK that it links to, causing
`Generator.multivariate_normal` to return completely different (but equally
valid!) results. We strive to prefer algorithms that are more resistant to
these effects, but this is always imperfect.
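
The run-to-run guarantee can be sketched as follows, assuming both runs happen
on the same machine, environment, and ``numpy`` build. The helper name
``draw`` and the seed are hypothetical.

.. code-block:: python

    import numpy as np
    from numpy.random import Generator, PCG64

    def draw(seed):
        """Perform a fixed sequence of method calls on a fresh Generator."""
        rng = Generator(PCG64(seed))
        return (
            rng.random(3),
            rng.integers(0, 10, size=3),
            rng.standard_normal(3),
        )

    first = draw(2023)
    second = draw(2023)

    # Same BitGenerator, same seed, same calls, same arguments: same stream.
    matches = all(np.array_equal(a, b) for a, b in zip(first, second))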

.. note::

   Most of the `Generator` methods allow you to draw multiple values from
   a distribution as arrays. The requested size of this array is a parameter,
   for the purposes of the above policy. Calling ``rng.random()`` 5 times is
   not *guaranteed* to give the same numbers as ``rng.random(5)``. We reserve
   the ability to decide to use different algorithms for different-sized
   blocks. In practice, this happens rarely.

Like the rest of NumPy, we generally maintain API source
compatibility from version to version. If we *must* make an API-breaking
change, then we will only do so with an appropriate deprecation period and
warnings, according to :ref:`general NumPy policy <NEP23>`.

Breaking stream-compatibility in order to introduce new features or
improve performance in `Generator` or `default_rng` will be *allowed* with
*caution*. Such changes will be considered features, and as such will happen no
faster than the standard release cadence of features (i.e. on ``X.Y`` releases,
never ``X.Y.Z``).  Slowness will not be considered a bug for this purpose.
Correctness bug fixes that break stream-compatibility can happen on bugfix
releases, per usual, but developers should consider if they can wait until the
next feature release. We encourage developers to strongly weigh users' pain
from the break in stream-compatibility against the improvements. One example
of a worthwhile improvement would be to change algorithms for a significant
increase in performance, for example, moving from the `Box-Muller transform
<https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform>`_ method of
Gaussian variate generation to the faster `Ziggurat algorithm
<https://en.wikipedia.org/wiki/Ziggurat_algorithm>`_. An example of
a discouraged improvement would be tweaking the Ziggurat tables just a little
bit for a small performance improvement.

.. note::

    In particular, `default_rng` is allowed to change the default
    `BitGenerator` that it uses (again, with *caution* and plenty of advance
    warning).

In general, `BitGenerator` classes have stronger guarantees of
version-to-version stream compatibility. This allows them to be a firmer
building block for downstream users that need it. Their limited API surface
makes it easier to maintain this compatibility from version to version. See
the docstrings of each `BitGenerator` class for their individual compatibility
guarantees.
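
As a sketch of using a `BitGenerator` directly as such a building block, the
example below saves the state of a `PCG64`, draws raw values, and then
restores the state to replay them. The seed is arbitrary.

.. code-block:: python

    from numpy.random import PCG64

    bit_gen = PCG64(4242)  # arbitrary example seed

    # The state is exposed as a dict that can be stored and restored later.
    saved_state = bit_gen.state

    # Draw raw unsigned 64-bit values straight from the core algorithm.
    first = bit_gen.random_raw(3)

    # Restoring the saved state replays the same raw values.
    bit_gen.state = saved_state
    replayed = bit_gen.random_raw(3)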

The legacy `RandomState` and the :ref:`associated convenience functions
<random-functions-in-numpy-random>` have a stricter version-to-version
compatibility guarantee. For reasons outlined in :ref:`NEP 19 <NEP19>`, we made
stronger promises about their version-to-version stability early in NumPy's
development. There are still some limited use cases for this kind of
compatibility (like generating data for tests), so we maintain as much
compatibility as we can. There will be no more modifications to `RandomState`,
not even to fix correctness bugs. There are a few gray areas where we can make
minor fixes to keep `RandomState` working without segfaulting as NumPy's
internals change, and some docstring fixes. However, the previously mentioned
caveats about the variability from machine to machine and build to build still
apply to `RandomState` just as much as they do to `Generator`.

Concepts
--------
.. toctree::
   :maxdepth: 1

   generator
   Legacy Generator (RandomState) <legacy>
   BitGenerators, SeedSequences <bit_generators/index>
   Upgrading PCG64 with PCG64DXSM <upgrading-pcg64>

Features
--------
.. toctree::
   :maxdepth: 2

   Parallel Applications <parallel>
   Multithreaded Generation <multithreading>
   new-or-different
   Comparing Performance <performance>
   c-api
   Examples of using Numba, Cython, CFFI <extending>

Original Source of the Generator and BitGenerators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This package was developed independently of NumPy and was integrated in version
1.17.0. The original repo is at https://github.com/bashtage/randomgen.