Merge pull request #13849 from rkern/doc/random-cleanups

DOC: np.random documentation cleanup and expansion.
author: Charles Harris <charlesr.harris@gmail.com> 2019-06-27 19:42:43 -0700
committer: GitHub <noreply@github.com> 2019-06-27 19:42:43 -0700
commit: 588310c7bdf75715955e7eefce4151c0569dc730 (patch)
tree: 6d7a7008c568a57b98f8250faa5ed7c31fd0a685 /doc/source/reference/random
parent: 1461f1da00fc34082d7a2693deb35c1990fd1b41 (diff)
parent: 49450483628d96d0bdf9caf18e627f05ab9a3431 (diff)
download: numpy-588310c7bdf75715955e7eefce4151c0569dc730.tar.gz
6 files changed, 289 insertions, 180 deletions
diff --git a/doc/source/reference/random/bit_generators/index.rst b/doc/source/reference/random/bit_generators/index.rst
index 4540f60d9..35d9e5d09 100644
--- a/doc/source/reference/random/bit_generators/index.rst
+++ b/doc/source/reference/random/bit_generators/index.rst
@@ -17,21 +17,23 @@ Supported BitGenerators
 
 The included BitGenerators are:
 
+* PCG-64 - The default. A fast generator that supports many parallel streams
+  and can be advanced by an arbitrary amount. See the documentation for
+  :meth:`~.PCG64.advance`. PCG-64 has a period of :math:`2^{128}`. See the `PCG
+  author's page`_ for more details about this class of PRNG.
 * MT19937 - The standard Python BitGenerator. Adds a `~mt19937.MT19937.jumped`
-  function that returns a new generator with state as-if ``2**128`` draws have
+  function that returns a new generator with state as-if :math:`2^{128}` draws have
   been made.
-* PCG-64 - Fast generator that support many parallel streams and
-  can be advanced by an arbitrary amount. See the documentation for
-  :meth:`~.PCG64.advance`. PCG-64 has a period of
-  :math:`2^{128}`. See the `PCG author's page`_ for more details about
-  this class of PRNG.
-* Philox - a counter-based generator capable of being advanced an
+* Philox - A counter-based generator capable of being advanced an
   arbitrary number of steps or generating independent streams. See the
   `Random123`_ page for more details about this class of bit generators.
+* SFC64 - A fast generator based on random invertible mappings. Usually the
+  fastest generator of the four. See the `SFC author's page`_ for (a little)
+  more detail.
 
 .. _`PCG author's page`: http://www.pcg-random.org/
 .. _`Random123`: https://www.deshawresearch.com/resources_random123.html
-
+.. _`SFC author's page`: http://pracrand.sourceforge.net/RNG_engines.txt
 
 .. toctree::
    :maxdepth: 1
@@ -46,26 +48,65 @@ Seeding and Entropy
 -------------------
 
 A BitGenerator provides a stream of random values. In order to generate
-reproducableis streams, BitGenerators support setting their initial state via a
-seed. But how best to seed the BitGenerator? On first impulse one would like to
-do something like ``[bg(i) for i in range(12)]`` to obtain 12 non-correlated,
-independent BitGenerators. However using a highly correlated set of seeds could
-generate BitGenerators that are correlated or overlap within a few samples.
-
-NumPy uses a `SeedSequence` class to mix the seed in a reproducible way that
-introduces the necessary entropy to produce independent and largely non-
-overlapping streams. Small seeds are unable to fill the complete range of
-initializaiton states, and lead to biases among an ensemble of small-seed
-runs. For many cases, that doesn't matter. If you just want to hold things in
-place while you debug something, biases aren't a concern.  For actual
-simulations whose results you care about, let ``SeedSequence(None)`` do its
-thing and then log/print the `SeedSequence.entropy` for repeatable
-`BitGenerator` streams.
+reproducible streams, BitGenerators support setting their initial state via a
+seed. All of the provided BitGenerators will take an arbitrary-sized
+non-negative integer, or a list of such integers, as a seed. BitGenerators
+need to take those inputs and process them into a high-quality internal state
+for the BitGenerator. All of the BitGenerators in numpy delegate that task to
+`~SeedSequence`, which uses hashing techniques to ensure that even low-quality
+seeds generate high-quality initial states.
+
+.. code-block:: python
+
+  from numpy.random import PCG64
+
+  bg = PCG64(12345678903141592653589793)
+
+.. end_block
+
+`~SeedSequence` is designed to be convenient for implementing best practices.
+We recommend that a stochastic program defaults to using entropy from the OS so
+that each run is different. The program should print out or log that entropy.
+In order to reproduce a past value, the program should allow the user to
+provide that value through some mechanism, a command-line argument is common,
+so that the user can then re-enter that entropy to reproduce the result.
+`~SeedSequence` can take care of everything except for communicating with the
+user, which is up to you.
+
+.. code-block:: python
+
+  from numpy.random import PCG64, SeedSequence
+
+  # Get the user's seed somehow, maybe through `argparse`.
+  # If the user did not provide a seed, it should return `None`.
+  seed = get_user_seed()
+  ss = SeedSequence(seed)
+  print('seed = {}'.format(ss.entropy))
+  bg = PCG64(ss)
+
+.. end_block
+
+We default to using a 128-bit integer using entropy gathered from the OS. This
+is a good amount of entropy to initialize all of the generators that we have in
+numpy. We do not recommend using small seeds below 32 bits for general use.
+Using just a small set of seeds to instantiate larger state spaces means that
+there are some initial states that are impossible to reach. This creates some
+biases if everyone uses such values.
+
+There will not be anything *wrong* with the results, per se; even a seed of
+0 is perfectly fine thanks to the processing that `~SeedSequence` does. If you
+just need *some* fixed value for unit tests or debugging, feel free to use
+whatever seed you like. But if you want to make inferences from the results or
+publish them, drawing from a larger set of seeds is good practice.
+
+If you need to generate a good seed "offline", then ``SeedSequence().entropy``
+or using ``secrets.randbits(128)`` from the standard library are both
+convenient ways.
 
 .. autosummary::
    :toctree: generated/
 
+    SeedSequence
     bit_generator.ISeedSequence
     bit_generator.ISpawnableSeedSequence
-    SeedSequence
     bit_generator.SeedlessSeedSequence
diff --git a/doc/source/reference/random/index.rst b/doc/source/reference/random/index.rst
index f32853e7c..b862788ea 100644
--- a/doc/source/reference/random/index.rst
+++ b/doc/source/reference/random/index.rst
@@ -1,18 +1,16 @@
 .. _numpyrandom:
 
+.. py:module:: numpy.random
+
 .. currentmodule:: numpy.random
 
-numpy.random
-============
+Random sampling (:mod:`numpy.random`)
+=====================================
 
 Numpy's random number routines produce pseudo random numbers using
 combinations of a `BitGenerator` to create sequences and a `Generator`
 to use those sequences to sample from different statistical distributions:
 
-* SeedSequence: Objects that provide entropy for the initial state of a
-  BitGenerator. A good SeedSequence will provide initializations across the
-  entire range of possible states for the BitGenerator, otherwise biases may
-  creep into the generated bit streams.
 * BitGenerators: Objects that generate random numbers. These are typically
   unsigned integer words filled with sequences of either 32 or 64 random bits.
 * Generators: Objects that transform sequences of random bits from a
@@ -24,18 +22,18 @@ Since Numpy version 1.17.0 the Generator can be initialized with a
 number of different BitGenerators. It exposes many different probability
 distributions. See `NEP 19 <https://www.numpy.org/neps/
 nep-0019-rng-policy.html>`_ for context on the updated random Numpy number
-routines. The legacy `RandomState` random number routines are still
+routines. The legacy `.RandomState` random number routines are still
 available, but limited to a single BitGenerator.
 
-For convenience and backward compatibility, a single `RandomState`
+For convenience and backward compatibility, a single `~.RandomState`
 instance's methods are imported into the numpy.random namespace, see
 :ref:`legacy` for the complete list.
 
 Quick Start
 -----------
 
-By default, `Generator` uses normals provided by `PCG64` which will be
-statistically more reliable than the legacy methods in `RandomState`
+By default, `~Generator` uses normals provided by `~pcg64.PCG64` which will be
+statistically more reliable than the legacy methods in `~.RandomState`
 
 .. code-block:: python
 
@@ -43,9 +41,9 @@ statistically more reliable than the legacy methods in `RandomState`
   from numpy import random
   random.standard_normal()
 
-`Generator` can be used as a direct replacement for `~RandomState`, although
-the random values are generated by `~PCG64`. The
-`Generator` holds an instance of a BitGenerator. It is accessible as
+`~Generator` can be used as a direct replacement for `~.RandomState`, although
+the random values are generated by `~.PCG64`. The
+`~Generator` holds an instance of a BitGenerator. It is accessible as
 ``gen.bit_generator``.
 
 .. code-block:: python
@@ -69,45 +67,37 @@ is wrapped with a `~.Generator`.
 
 Introduction
 ------------
-RandomGen takes a different approach to producing random numbers from the
-`RandomState` object.  Random number generation is separated into three
-components, a seed sequence, a bit generator and a random generator.
+The new infrastructure takes a different approach to producing random numbers
+from the `~.RandomState` object.  Random number generation is separated into
+two components, a bit generator and a random generator.
 
 The `BitGenerator` has a limited set of responsibilities. It manages state
 and provides functions to produce random doubles and random unsigned 32- and
 64-bit values.
 
-The `SeedSequence` takes a seed and provides the initial state for the
-`BitGenerator`. Since consecutive seeds can cause bad effects when comparing
-`BitGenerator` streams, the `SeedSequence` uses current best-practice methods
-to spread the initial state out. However small seeds may still be unable to
-reach all possible initialization states, which can cause biases among an
-ensemble of small-seed runs. For many cases, that doesn't matter. If you just
-want to hold things in place while you debug something, biases aren't a
-concern.  For actual simulations whose results you care about, let
-``SeedSequence(None)`` do its thing and then log/print the
-`SeedSequence.entropy` for repeatable `BitGenerator` streams.
-
 The `random generator <Generator>` takes the
 bit generator-provided stream and transforms them into more useful
 distributions, e.g., simulated normal random values. This structure allows
 alternative bit generators to be used with little code duplication.
 
 The `Generator` is the user-facing object that is nearly identical to
-`RandomState`. The canonical method to initialize a generator passes a
-`~mt19937.MT19937` bit generator, the underlying bit generator in Python -- as
-the sole argument. Note that the BitGenerator must be instantiated.
+`.RandomState`. The canonical method to initialize a generator passes a
+`~.PCG64` bit generator as the sole argument.
+
 .. code-block:: python
 
-  from numpy.random import Generator, PCG64
-  rg = Generator(PCG64())
+  from numpy.random import default_gen
+  rg = default_gen(12345)
   rg.random()
 
-Seed information is directly passed to the bit generator.
+One can also instantiate `Generator` directly with a `BitGenerator` instance.
+To use the older `~mt19937.MT19937` algorithm, one can instantiate it directly
+and pass it to `Generator`.
 
 .. code-block:: python
 
-  rg = Generator(PCG64(12345))
+  from numpy.random import Generator, MT19937
+  rg = Generator(MT19937(12345))
   rg.random()
 
 What's New or Different
@@ -117,9 +107,9 @@ What's New or Different
   The Box-Muller method used to produce NumPy's normals is no longer available
   in `Generator`.  It is not possible to reproduce the exact random
   values using Generator for the normal distribution or any other
-  distribution that relies on the normal such as the `numpy.random.gamma` or
-  `numpy.random.standard_t`. If you require bitwise backward compatible
-  streams, use `RandomState`.
+  distribution that relies on the normal such as the `.RandomState.gamma` or
+  `.RandomState.standard_t`. If you require bitwise backward compatible
+  streams, use `.RandomState`.
 
 * The Generator's normal, exponential and gamma functions use 256-step Ziggurat
   methods which are 2-10 times faster than NumPy's Box-Muller or inverse CDF
@@ -133,9 +123,8 @@ What's New or Different
   source of randomness that is used in cryptographic applications (e.g.,
   ``/dev/urandom`` on Unix).
 * All BitGenerators can produce doubles, uint64s and uint32s via CTypes
-  (`~PCG64.ctypes`) and CFFI
-  (:meth:`~PCG64.cffi`). This allows the bit generators to
-  be used in numba.
+  (`~.PCG64.ctypes`) and CFFI (`~.PCG64.cffi`). This allows the bit generators
+  to be used in numba.
 * The bit generators can be used in downstream projects via
   :ref:`Cython <randomgen_cython>`.
 * `~.Generator.integers` is now the canonical way to generate integer
@@ -144,8 +133,11 @@ What's New or Different
   The ``endpoint`` keyword can be used to specify open or closed intervals.
   This replaces both ``randint`` and the deprecated ``random_integers``.
 * `~.Generator.random` is now the canonical way to generate floating-point
-  random numbers, which replaces `random_sample`, `sample`, and `ranf`. This
-  is consistent with Python's `random.random`.
+  random numbers, which replaces `.RandomState.random_sample`,
+  `.RandomState.sample`, and `.RandomState.ranf`. This is consistent with
+  Python's `random.random`.
+* All BitGenerators in numpy use `~SeedSequence` to convert seeds into
+  initialized states.
 
 See :ref:`new-or-different` for a complete list of improvements and
 differences from the traditional ``Randomstate``.
@@ -154,10 +146,11 @@ Parallel Generation
 ~~~~~~~~~~~~~~~~~~~
 
 The included generators can be used in parallel, distributed applications in
-one of two ways:
+one of three ways:
 
+* :ref:`seedsequence-spawn`
 * :ref:`independent-streams`
-* :ref:`jump-and-advance`
+* :ref:`parallel-jumped`
 
 Concepts
 --------
diff --git a/doc/source/reference/random/legacy.rst b/doc/source/reference/random/legacy.rst
index d9391e9e2..04d4d3569 100644
--- a/doc/source/reference/random/legacy.rst
+++ b/doc/source/reference/random/legacy.rst
@@ -1,3 +1,5 @@
+.. currentmodule:: numpy.random
+
 .. _legacy:
 
 Legacy Random Generation
@@ -8,7 +10,7 @@ no further improvements.  It is guaranteed to produce the same values
 as the final point release of NumPy v1.16. These all depend on Box-Muller
 normals or inverse CDF exponentials or gammas. This class should only be used
 if it is essential to have randoms that are identical to what
-would have been produced by NumPy.
+would have been produced by previous versions of NumPy.
 
 `~mtrand.RandomState` adds additional information
 to the state which is required when using Box-Muller normals since these
@@ -16,38 +18,33 @@ are produced in pairs. It is important to use
 `~mtrand.RandomState.get_state`, and not the underlying bit generators
 `state`, when accessing the state so that these extra values are saved.
 
-.. warning::
-
-  :class:`~randomgen.legacy.LegacyGenerator` only contains functions
-  that have changed.  Since it does not contain other functions, it
-  is not directly possible to replace :class:`~numpy.random.RandomState`.
-  In order to full replace :class:`~numpy.random.RandomState`, it is
-  necessary to use both :class:`~randomgen.legacy.LegacyGenerator`
-  and :class:`~randomgen.generator.RandomGenerator` both driven
-  by the same basic RNG. Methods present in :class:`~randomgen.legacy.LegacyGenerator`
-  must be called from :class:`~randomgen.legacy.LegacyGenerator`.  Other Methods
-  should be called from :class:`~randomgen.generator.RandomGenerator`.
-
+Although we provide the `~mt19937.MT19937` BitGenerator for use independent of
+`~mtrand.RandomState`, note that its default seeding uses `~SeedSequence`
+rather than the legacy seeding algorithm. `~mtrand.RandomState` will use the
+legacy seeding algorithm. The methods to use the legacy seeding algorithm are
+currently private as the main reason to use them is just to implement
+`~mtrand.RandomState`. However, one can reset the state of `~mt19937.MT19937`
+using the state of the `~mtrand.RandomState`:
 
 .. code-block:: python
 
    from numpy.random import MT19937
    from numpy.random import RandomState
 
-   # Use same seed
    rs = RandomState(12345)
-   mt19937 = MT19937(12345)
-   lg = RandomState(mt19937)
+   mt19937 = MT19937()
+   mt19937.state = rs.get_state()
+   rs2 = RandomState(mt19937)
 
-   # Identical output
+   # Same output
    rs.standard_normal()
-   lg.standard_normal()
+   rs2.standard_normal()
 
    rs.random()
-   lg.random()
+   rs2.random()
 
    rs.standard_exponential()
-   lg.standard_exponential()
+   rs2.standard_exponential()
 
 
 .. currentmodule:: numpy.random.mtrand
diff --git a/doc/source/reference/random/multithreading.rst b/doc/source/reference/random/multithreading.rst
index 849d64d4e..6883d3672 100644
--- a/doc/source/reference/random/multithreading.rst
+++ b/doc/source/reference/random/multithreading.rst
@@ -1,9 +1,11 @@
 Multithreaded Generation
 ========================
 
-The four core distributions all allow existing arrays to be filled using the
-``out`` keyword argument.  Existing arrays need to be contiguous and
-well-behaved (writable and aligned).  Under normal circumstances, arrays
+The four core distributions (:meth:`~.Generator.random`,
+:meth:`~.Generator.standard_normal`, :meth:`~.Generator.standard_exponential`,
+and :meth:`~.Generator.standard_gamma`) all allow existing arrays to be filled
+using the ``out`` keyword argument. Existing arrays need to be contiguous and
+well-behaved (writable and aligned). Under normal circumstances, arrays
 created using the common constructors such as :meth:`numpy.empty` will satisfy
 these requirements.
 
diff --git a/doc/source/reference/random/parallel.rst b/doc/source/reference/random/parallel.rst
index 36e173ef2..18060defe 100644
--- a/doc/source/reference/random/parallel.rst
+++ b/doc/source/reference/random/parallel.rst
@@ -5,59 +5,147 @@ There are three strategies implemented that can be used to produce
 repeatable pseudo-random numbers across multiple processes (local
 or distributed).
 
-.. _independent-streams:
-
 .. currentmodule:: numpy.random
 
-Independent Streams
--------------------
+.. _seedsequence-spawn:
+
+`~SeedSequence` spawning
+------------------------
+
+`~SeedSequence` `implements an algorithm`_ to process a user-provided seed,
+typically as an integer of some size, and to convert it into an initial state for
+a `~BitGenerator`. It uses hashing techniques to ensure that low-quality seeds
+are turned into high quality initial states (at least, with very high
+probability).
+
+For example, `~mt19937.MT19937` has a state consisting of 624
+`uint32` integers. A naive way to take a 32-bit integer seed would be to just set
+the last element of the state to the 32-bit seed and leave the rest 0s. This is
+a valid state for `~mt19937.MT19937`, but not a good one. The Mersenne Twister
+algorithm `suffers if there are too many 0s`_. Similarly, two adjacent 32-bit
+integer seeds (i.e. ``12345`` and ``12346``) would produce very similar
+streams.
+
+`~SeedSequence` avoids these problems by using successions of integer hashes
+with good `avalanche properties`_ to ensure that flipping any bit in the input
+input has about a 50% chance of flipping any bit in the output. Two input seeds
+that are very close to each other will produce initial states that are very far
+from each other (with very high probability). It is also constructed in such
+a way that you can provide arbitrary-sized integers or lists of integers.
+`~SeedSequence` will take all of the bits that you provide and mix them
+together to produce however many bits the consuming `~BitGenerator` needs to
+initialize itself.
+
+These properties together mean that we can safely mix together the usual
+user-provided seed with simple incrementing counters to get `~BitGenerator`
+states that are (to very high probability) independent of each other. We can
+wrap this together into an API that is easy to use and difficult to misuse.
+
+.. code-block:: python
+
+  from numpy.random import SeedSequence, default_gen
 
-:class:`~pcg64.PCG64`, :class:`~threefry.ThreeFry`
-and :class:`~philox.Philox` support independent streams.  This
-example shows how many streams can be created by passing in different index
-values in the second input while using the same seed in the first.
+  ss = SeedSequence(12345)
+
+  # Spawn off 10 child SeedSequences to pass to child processes.
+  child_seeds = ss.spawn(10)
+  streams = [default_gen(s) for s in child_seeds]
+
+.. end_block
+
+Child `~SeedSequence` objects can also spawn to make grandchildren, and so on.
+Each `~SeedSequence` has its position in the tree of spawned `~SeedSequence`
+objects mixed in with the user-provided seed to generate independent (with very
+high probability) streams.
 
 .. code-block:: python
 
-  from numpy.random.entropy import random_entropy
-  from numpy.random import PCG64
+  grandchildren = child_seeds[0].spawn(4)
+  grand_streams = [default_gen(s) for s in grandchildren]
+
+.. end_block
+
+This feature lets you make local decisions about when and how to split up
+streams without coordination between processes. You do not have to preallocate
+space to avoid overlapping or request streams from a common global service. This
+general "tree-hashing" scheme is `not unique to numpy`_ but not yet widespread.
+Python has increasingly-flexible mechanisms for parallelization available, and
+this scheme fits in very well with that kind of use.
+
+Using this scheme, an upper bound on the probability of a collision can be
+estimated if one knows the number of streams that you derive. `~SeedSequence`
+hashes its inputs, both the seed and the spawn-tree-path, down to a 128-bit
+pool by default. The probability that there is a collision in
+that pool, pessimistically-estimated ([1]_), will be about :math:`n^2*2^{-128}` where
+`n` is the number of streams spawned. If a program uses an aggressive million
+streams, about :math:`2^{20}`, then the probability that at least one pair of
+them are identical is about :math:`2^{-88}`, which is in solidly-ignorable
+territory ([2]_).
+
+.. [1] The algorithm is carefully designed to eliminate a number of possible
+       ways to collide. For example, if one only does one level of spawning, it
+       is guaranteed that all states will be unique. But it's easier to
+       estimate the naive upper bound on a napkin and take comfort knowing
+       that the probability is actually lower.
+
+.. [2] In this calculation, we can ignore the amount of numbers drawn from each
+       stream. Each of the PRNGs we provide has some extra protection built in
+       that avoids overlaps if the `~SeedSequence` pools differ in the
+       slightest bit. `~pcg64.PCG64` has :math:`2^{127}` separate cycles
+       determined by the seed in addition to the position in the
+       :math:`2^{128}` long period for each cycle, so one has to both get on or
+       near the same cycle *and* seed a nearby position in the cycle.
+       `~philox.Philox` has completely independent cycles determined by the seed.
+       `~sfc64.SFC64` incorporates a 64-bit counter so every unique seed is at
+       least :math:`2^{64}` iterations away from any other seed. And
+       finally, `~mt19937.MT19937` has just an unimaginably huge period. Getting
+       a collision internal to `~SeedSequence` is the way a failure would be
+       observed.
+
+.. _`implements an algorithm`: http://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html
+.. _`suffers if there are too many 0s`: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html
+.. _`avalanche properties`: https://en.wikipedia.org/wiki/Avalanche_effect
+.. _`not unique to numpy`: https://www.iro.umontreal.ca/~lecuyer/myftp/papers/parallel-rng-imacs.pdf
 
-  entropy = random_entropy(4)
-  # 128-bit number as a seed
-  seed = sum([int(entropy[i]) * 2 ** (32 * i) for i in range(4)])
-  streams = [PCG64(seed, stream) for stream in range(10)]
 
+.. _independent-streams:
+
+Independent Streams
+-------------------
 
-:class:`~philox.Philox` and :class:`~threefry.ThreeFry` are
-counter-based RNGs which use a counter and key.  Different keys can be used
-to produce independent streams.
+:class:`~philox.Philox` is a counter-based RNG based which generates values by
+encrypting an incrementing counter using weak cryptographic primitives. The
+seed determines the key that is used for the encryption. Unique keys create
+unique, independent streams. :class:`~philox.Philox` lets you bypass the
+seeding algorithm to directly set the 128-bit key. Similar, but different, keys
+will still create independent streams.
 
 .. code-block:: python
 
-  import numpy as np
-  from numpy.random import ThreeFry
+  import secrets
+  from numpy.random import Philox
 
-  key = random_entropy(8)
-  key = key.view(np.uint64)
-  key[0] = 0
-  step = np.zeros(4, dtype=np.uint64)
-  step[0] = 1
-  streams = [ThreeFry(key=key + stream * step) for stream in range(10)]
+  # 128-bit number as a seed
+  root_seed = secrets.getrandbits(128)
+  streams = [Philox(key=root_seed + stream_id) for stream_id in range(10)]
 
-.. _jump-and-advance:
+.. end_block
 
-Jump/Advance the BitGenerator state
------------------------------------
+This scheme does require that you avoid reusing stream IDs. This may require
+coordination between the parallel processes.
 
-Jumped
-******
+
+.. _parallel-jumped:
+
+Jumping the BitGenerator state
+------------------------------
 
 ``jumped`` advances the state of the BitGenerator *as-if* a large number of
 random numbers have been drawn, and returns a new instance with this state.
 The specific number of draws varies by BitGenerator, and ranges from
-:math:`2^{64}` to :math:`2^{512}`.  Additionally, the *as-if* draws also depend
+:math:`2^{64}` to :math:`2^{128}`.  Additionally, the *as-if* draws also depend
 on the size of the default random number produced by the specific BitGenerator.
-The BitGenerator that support ``jumped``, along with the period of the
+The BitGenerators that support ``jumped``, along with the period of the
 BitGenerator, the size of the jump and the bits in the default unsigned random
 are listed below.
 
@@ -66,70 +154,40 @@ are listed below.
 +=================+=========================+=========================+=========================+
 | MT19937         | :math:`2^{19937}`       | :math:`2^{128}`         | 32                      |
 +-----------------+-------------------------+-------------------------+-------------------------+
-| PCG64           | :math:`2^{128}`         | :math:`2^{64}`          | 64                      |
+| PCG64           | :math:`2^{128}`         | :math:`~2^{127}` ([3]_) | 64                      |
 +-----------------+-------------------------+-------------------------+-------------------------+
 | Philox          | :math:`2^{256}`         | :math:`2^{128}`         | 64                      |
 +-----------------+-------------------------+-------------------------+-------------------------+
-| ThreeFry        | :math:`2^{256}`         | :math:`2^{128}`         | 64                      |
-+-----------------+-------------------------+-------------------------+-------------------------+
+
+.. [3] The jump size is :math:`(\phi-1)*2^{128}` where :math:`\phi` is the
+       golden ratio. As the jumps wrap around the period, the actual distances
+       between neighboring streams will slowly grow smaller than the jump size,
+       but using the golden ratio this way is a classic method of constructing
+       a low-discrepancy sequence that spreads out the states around the period
+       optimally. You will not be able to jump enough to make those distances
+       small enough to overlap in your lifetime.
 
 ``jumped`` can be used to produce long blocks which should be long enough to not
 overlap.
 
 .. code-block:: python
 
-  from numpy.random.entropy import random_entropy
+  import secrets
   from numpy.random import PCG64
 
-  entropy = random_entropy(2).astype(np.uint64)
-  # 64-bit number as a seed
-  seed = entropy[0] * 2**32 + entropy[1]
+  seed = secrets.getrandbits(128)
   blocked_rng = []
   rng = PCG64(seed)
   for i in range(10):
       blocked_rng.append(rng.jumped(i))
 
-Advance
-*******
-``advance`` can be used to jump the state an arbitrary number of steps, and so
-is a more general approach than ``jumped``.  :class:`~pcg64.PCG64`,
-:class:`~threefry.ThreeFry` and :class:`~philox.Philox`
-support ``advance``, and since these also support
-independent streams, it is not usually necessary to use ``advance``.
-
-Advancing a BitGenerator updates the underlying state as-if a given number of
-calls to the BitGenerator have been made. In general there is not a
-one-to-one relationship between the number output random values from a
-particular distribution and the number of draws from the core BitGenerator.
-This occurs for two reasons:
-
-* The random values are simulated using a rejection-based method
-  and so more than one value from the underlying BitGenerator can be required
-  to generate an single draw.
-* The number of bits required to generate a simulated value differs from the
-  number of bits generated by the underlying BitGenerator. For example, two
-  16-bit integer values can be simulated from a single draw of a 32-bit value.
-
-Advancing the BitGenerator state resets any pre-computed random numbers. This
-is required to ensure exact reproducibility.
-
-This example uses ``advance`` to advance a :class:`~pcg64.PCG64`
-generator 2 ** 127 steps to set a sequence of random number generators.
-
-.. code-block:: python
-
-   from numpy.random import PCG64
-   bit_generator = PCG64()
-   bit_generator_copy = PCG64()
-   bit_generator_copy.state = bit_generator.state
-
-   advance = 2**127
-   bit_generators = [bit_generator]
-   for _ in range(9):
-       bit_generator_copy.advance(advance)
-       bit_generator = PCG64()
-       bit_generator.state = bit_generator_copy.state
-       bit_generators.append(bit_generator)
-
-.. end block
+.. end_block
 
+When using ``jumped``, one does have to take care not to jump to a stream that
+was already used. In the above example, one could not later use
+``blocked_rng[0].jumped()`` as it would overlap with ``blocked_rng[1]``. Like
+with the independent streams, if the main process here wants to split off 10
+more streams by jumping, then it needs to start with ``range(10, 20)``,
+otherwise it would recreate the same streams. On the other hand, if you
+carefully construct the streams, then you are guaranteed to have streams that
+do not overlap.
diff --git a/doc/source/reference/random/performance.rst b/doc/source/reference/random/performance.rst
index 9f59a78c7..4e6f2d9d7 100644
--- a/doc/source/reference/random/performance.rst
+++ b/doc/source/reference/random/performance.rst
@@ -1,13 +1,31 @@
 Performance
 -----------
 
-.. py:module:: numpy.random
-
 .. currentmodule:: numpy.random
 
 Recommendation
 **************
-The recommended generator for single use is :class:`~PCG64`.
+The recommended generator for general use is :class:`~pcg64.PCG64`. It is
+statistically high quality, full-featured, and fast on most platforms, but
+somewhat slow when compiled for 32-bit processes.
+
+:class:`~philox.Philox` is fairly slow, but its statistical properties have
+very high quality, and it is easy to get assuredly-independent stream by using
+unique keys. If that is the style you wish to use for parallel streams, or you
+are porting from another system that uses that style, then
+:class:`~philox.Philox` is your choice.
+
+:class:`~sfc64.SFC64` is statistically high quality and very fast. However, it
+lacks jumpability. If you are not using that capability and want lots of speed,
+even on 32-bit processes, this is your choice.
+
+:class:`~mt19937.MT19937` `fails some statistical tests`_ and is not especially
+fast compared to modern PRNGs. For these reasons, we mostly do not recommend
+using it on its own, only through the legacy `~.RandomState` for
+reproducing old results. That said, it has a very long history as a default in
+many systems.
+
+.. _`fails some statistical tests`: https://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf
 
 Timings
 *******
@@ -20,13 +38,13 @@ faster generators.
 Integer performance has a similar ordering.
 
 The pattern is similar for other, more complex generators. The normal
-performance of the legacy :class:`~mtrand.RandomState` generator is much
+performance of the legacy :class:`~.RandomState` generator is much
 lower than the other since it uses the Box-Muller transformation rather
 than the Ziggurat generator. The performance gap for Exponentials is also
 large due to the cost of computing the log function to invert the CDF.
 The column labeled MT19973 is used the same 32-bit generator as
-:class:`~mtrand.RandomState` but produces random values using
-:class:`~generator.Generator`.
+:class:`~.RandomState` but produces random values using
+:class:`~Generator`.
 
 .. csv-table::
     :header: ,MT19937,PCG64,Philox,SFC64,RandomState
@@ -43,7 +61,7 @@ The column labeled MT19973 is used the same 32-bit generator as
     Poissons,67.9,57.7,68.3,58.7,80.8
 
 The next table presents the performance in percentage relative to values
-generated by the legagy generator, `RandomState(MT19937())`. The overall
+generated by the legacy generator, `RandomState(MT19937())`. The overall
 performance was computed using a geometric mean.
 
 .. csv-table::
author	Charles Harris <charlesr.harris@gmail.com>	2019-06-27 19:42:43 -0700
committer	GitHub <noreply@github.com>	2019-06-27 19:42:43 -0700
commit	588310c7bdf75715955e7eefce4151c0569dc730 (patch)
tree	6d7a7008c568a57b98f8250faa5ed7c31fd0a685 /doc/source/reference/random
parent	1461f1da00fc34082d7a2693deb35c1990fd1b41 (diff)
parent	49450483628d96d0bdf9caf18e627f05ab9a3431 (diff)
download	numpy-588310c7bdf75715955e7eefce4151c0569dc730.tar.gz