author: Robert Kern <robert.kern@gmail.com>, 2018-06-16 00:22:56 -0700
committer: Robert Kern <robert.kern@gmail.com>, 2018-06-16 00:23:26 -0700
commit: 0dbc462726db740777891b1822a1665f9aeab7b9
tree: f16d4b68bef89596f9383343bd4f998125f717cb
parent: 263339b5db580cc0a83985dec22437696b8d6d17
NEP: Update NEP 19 with discussion results
-rw-r--r-- doc/neps/nep-0019-rng-policy.rst | 222
1 files changed, 159 insertions, 63 deletions
diff --git a/doc/neps/nep-0019-rng-policy.rst b/doc/neps/nep-0019-rng-policy.rst
index 7080a99e3..d1cd2719e 100644
--- a/doc/neps/nep-0019-rng-policy.rst
+++ b/doc/neps/nep-0019-rng-policy.rst
@@ -91,27 +91,12 @@ those contributors simply walked away.
Implementation
--------------
-We propose first freezing ``RandomState`` as it is and developing a new RNG
-subsystem alongside it. This allows anyone who has been relying on our old
-stream-compatibility guarantee to have plenty of time to migrate.
-``RandomState`` will be considered deprecated, but with a long deprecation
-cycle, at least a few years. Deprecation warnings will start silent but become
-increasingly noisy over time. Bugs in the current state of the code will *not*
-be fixed if fixing them would impact the stream. However, if changes in the
-rest of ``numpy`` would break something in the ``RandomState`` code, we will
-fix ``RandomState`` to continue working (for example, some change in the
-C API). No new features will be added to ``RandomState``. Users should
-migrate to the new subsystem as they are able to.
-
-Work on a proposed `new PRNG subsystem
-<https://github.com/bashtage/randomgen>`_ is already underway. The specifics
-of the new design are out of scope for this NEP and up for much discussion, but
-we will discuss general policies that will guide the evolution of whatever code
-is adopted.
-
-
-New Policy
-::::::::::
+Work on a proposed new PRNG subsystem is already underway in the randomgen_
+project. The specifics of the new design are out of scope for this NEP and up
+for much discussion, but we will discuss general policies that will guide the
+evolution of whatever code is adopted. We will also outline just a few of the
+requirements that such a new system must have to support the policy proposed in
+this NEP.
First, we will maintain API source compatibility just as we do with the rest of
``numpy``. If we *must* make a breaking change, we will only do so with an
@@ -120,59 +105,132 @@ appropriate deprecation period and warnings.
Second, breaking stream-compatibility in order to introduce new features or
improve performance will be *allowed* with *caution*. Such changes will be
considered features, and as such will be no faster than the standard release
-cadence of features (i.e. on ``X.Y`` releases, never ``X.Y.Z``). Slowness is
-not a bug. Correctness bug fixes that break stream-compatibility can happen on
-bugfix releases, per usual, but developers should consider if they can wait
-until the next feature release. We encourage developers to strongly weight
-user’s pain from the break in stream-compatibility against the improvements.
-One example of a worthwhile improvement would be to change algorithms for
-a significant increase in performance, for example, moving from the `Box-Muller
-transform <https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform>`_ method
-of Gaussian variate generation to the faster `Ziggurat algorithm
-<https://en.wikipedia.org/wiki/Ziggurat_algorithm>`_. An example of an
-unworthy improvement would be tweaking the Ziggurat tables just a little bit.
+cadence of features (i.e. on ``X.Y`` releases, never ``X.Y.Z``). Slowness will
+not be considered a bug for this purpose. Correctness bug fixes that break
+stream-compatibility can happen on bugfix releases, per usual, but developers
+should consider if they can wait until the next feature release. We encourage
+developers to strongly weigh users’ pain from the break in
+stream-compatibility against the improvements. One example of a worthwhile
+improvement would be to change algorithms for a significant increase in
+performance, for example, moving from the `Box-Muller transform
+<https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform>`_ method of
+Gaussian variate generation to the faster `Ziggurat algorithm
+<https://en.wikipedia.org/wiki/Ziggurat_algorithm>`_. An example of a
+discouraged improvement would be tweaking the Ziggurat tables just a little bit
+for a small performance improvement.
Any new design for the RNG subsystem will provide a choice of different core
-uniform PRNG algorithms. We will be more strict about a select subset of
-methods on these core PRNG objects. They MUST guarantee stream-compatibility
-for a minimal, specified set of methods which are chosen to make it easier to
-compose them to build other distributions. Namely,
+uniform PRNG algorithms. A promising design choice is to make these core
+uniform PRNGs their own lightweight objects with a minimal set of methods
+(randomgen_ calls them “basic RNGs”). The broader set of non-uniform
+distributions will be its own class that holds a reference to one of these core
+uniform PRNG objects and simply delegates to the core uniform PRNG object when
+it needs uniform random numbers. To borrow an example from randomgen_, the
+class ``MT19937`` is a basic RNG that implements the classic Mersenne Twister
+algorithm. The class ``RandomGenerator`` wraps around the basic RNG to provide
+all of the non-uniform distribution methods::
+
+ # This is not the only way to instantiate this object.
+ # This is just handy for demonstrating the delegation.
+ >>> brng = MT19937(seed)
+ >>> rg = RandomGenerator(brng)
+ >>> x = rg.standard_normal(10)
+
+We will be more strict about a select subset of methods on these basic RNG
+objects. They MUST guarantee stream-compatibility for a specified set
+of methods which are chosen to make it easier to compose them to build other
+distributions and which are needed to abstract over the implementation details
+of the variety of core PRNG algorithms. Namely,
* ``.bytes()``
* ``.random_uintegers()``
* ``.random_sample()``
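
The delegation design described in this hunk can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: ``ToyBasicRNG`` and ``ToyGenerator`` are stand-ins invented for illustration, not numpy or randomgen classes, and the 64-bit linear congruential step is only a placeholder for a real core PRNG. Only the method names ``random_uintegers`` and ``random_sample`` are borrowed from the list above. The point is that a distribution method needs nothing from the core PRNG except its small stable interface.

```python
import math


class ToyBasicRNG:
    """Hypothetical basic RNG (illustration only, not a numpy/randomgen class)."""

    _MASK = (1 << 64) - 1

    def __init__(self, seed):
        self.state = seed & self._MASK

    def random_uintegers(self):
        # One step of a simple 64-bit linear congruential generator.
        self.state = (6364136223846793005 * self.state
                      + 1442695040888963407) & self._MASK
        return self.state

    def random_sample(self):
        # Keep the top 53 bits so the result is a float in [0, 1).
        return (self.random_uintegers() >> 11) / float(1 << 53)


class ToyGenerator:
    """Hypothetical distributions class that delegates to a basic RNG."""

    def __init__(self, brng):
        self.brng = brng  # holds a reference, does not copy the state

    def exponential(self, scale=1.0):
        # Inverse-transform sampling built only on random_sample().
        u = self.brng.random_sample()
        return -scale * math.log(1.0 - u)


rg = ToyGenerator(ToyBasicRNG(42))
samples = [rg.exponential() for _ in range(5)]
```

Because ``ToyGenerator`` only calls ``random_sample()``, any object with that method could be swapped in as the core PRNG without touching the distribution code.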
+The distributions class (``RandomGenerator``) SHOULD have all of the same
+distribution methods as ``RandomState`` with close-enough function signatures
+such that almost all code that currently works with ``RandomState`` instances
+will work with ``RandomGenerator`` instances (ignoring the precise stream
+values). Some variance will be allowed for integer distributions: in order to
+avoid some of the cross-platform problems described above, these SHOULD be
+rewritten to work with ``uint64`` numbers on all platforms.
+
+.. _randomgen: https://github.com/bashtage/randomgen
+
Supporting Unit Tests
:::::::::::::::::::::
-Furthermore, the new design should also provide one generator class (we shall
-call it ``StableRandom`` for discussion purposes) that provides a slightly
-broader subset of distribution methods for which stream-compatibility is
-*guaranteed*. The point of ``StableRandom`` is to provide something that can
-be used in unit tests so projects that currently have tests which rely on the
-precise stream can be migrated off of ``RandomState``. For the best
-transition, ``StableRandom`` should use as its core uniform PRNG the current
-MT19937 algorithm. As best as possible, the API for the distribution methods
-that are provided on ``StableRandom`` should match their counterparts on
-``RandomState``. They should provide the same stream that the current version
-of ``RandomState`` does. Because their intended use is for unit tests, we do
-not need the performance improvements from the new algorithms that will be
-introduced by the new subsystem.
-
-The list of ``StableRandom`` methods should be chosen to support unit tests:
-
- * ``.randint()``
- * ``.uniform()``
- * ``.normal()``
- * ``.standard_normal()``
- * ``.choice()``
- * ``.shuffle()``
- * ``.permutation()``
-
-
-Not Versioning
---------------
+Because we did make a strong stream-compatibility guarantee early in numpy’s
+life, reliance on stream-compatibility has grown beyond reproducible
+simulations. One use case that remains for stream-compatibility across numpy
+versions is to use pseudorandom streams to generate test data in unit tests.
+With care, many of the cross-platform instabilities can be avoided in the
+context of small unit tests.
+
+The new PRNG subsystem MUST provide a second, legacy distributions class that
+uses the same implementations of the distribution methods as the current
+version of ``numpy.random.RandomState``. The methods of this class will keep
+the same strict stream-compatibility guarantees. It is intended that this
+class will no longer be modified, except to keep it working when numpy
+internals change. All new development should go into the primary distributions
+class. The purpose of ``RandomState`` will be documented as providing certain
+fixed functionality for backwards compatibility and stable numbers for the
+limited purpose of unit testing, and not making whole programs reproducible
+across numpy versions.
+
+This legacy distributions class MUST be accessible under the name
+``numpy.random.RandomState`` for backwards compatibility. All current ways of
+instantiating ``numpy.random.RandomState`` with a given state should
+instantiate the Mersenne Twister basic RNG with the same state. The legacy
+distributions class MUST be capable of accepting other basic RNGs. The purpose
+here is to ensure that one can write a program with a consistent basic RNG
+state with a mixture of libraries that may or may not have upgraded from
+``RandomState``. Instances of the legacy distributions class MUST respond
+``True`` to ``isinstance(rg, numpy.random.RandomState)`` because there is
+current utility code that relies on that check. Similarly, old pickles of
+``numpy.random.RandomState`` instances MUST unpickle correctly.
+
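
The requirement that the legacy class accept other basic RNGs, so that upgraded and non-upgraded libraries can share one state, can be sketched as follows. This is an assumption-laden toy: ``ToyBasicRNG``, ``LegacyDistributions`` and ``NewDistributions`` are hypothetical names, and the basic RNG simply wraps Python's standard-library ``random.Random`` (itself a Mersenne Twister) rather than any numpy object.

```python
import random


class ToyBasicRNG:
    """Hypothetical basic RNG wrapping the stdlib Mersenne Twister."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def random_sample(self):
        return self._rng.random()


class LegacyDistributions:
    """Stand-in for the frozen, stream-compatible distributions class."""

    def __init__(self, brng):
        self.brng = brng

    def uniform(self, low=0.0, high=1.0):
        return low + (high - low) * self.brng.random_sample()


class NewDistributions:
    """Stand-in for the primary, still-evolving distributions class."""

    def __init__(self, brng):
        self.brng = brng

    def uniform(self, low=0.0, high=1.0):
        return low + (high - low) * self.brng.random_sample()


# One seeded basic RNG shared by an "old" and a "new" library.
brng = ToyBasicRNG(12345)
old_lib = LegacyDistributions(brng)
new_lib = NewDistributions(brng)
mixed = [old_lib.uniform(), new_lib.uniform(), old_lib.uniform()]

# The interleaved draws consume a single stream, in order.
reference = ToyBasicRNG(12345)
expected = [reference.random_sample() for _ in range(3)]
```

In this sketch there is only one PRNG state to seed and manage, regardless of which distributions class each library happens to use.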
+
+``numpy.random.*``
+::::::::::::::::::
+
+The preferred best practice for getting reproducible pseudorandom numbers is to
+instantiate a generator object with a seed and pass it around. The implicit
+global ``RandomState`` behind the ``numpy.random.*`` convenience functions can
+cause problems, especially when threads or other forms of concurrency are
+involved. Global state is always problematic. We categorically recommend
+avoiding using the convenience functions when reproducibility is involved.
+
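
As a rough illustration of the recommended practice, the sketch below passes an explicitly seeded generator object into the code that needs randomness. It uses Python's standard-library ``random.Random`` as an analogy for a numpy generator object; the function name ``noisy_measurements`` is hypothetical.

```python
import random


def noisy_measurements(rng, n):
    """Draw n standard-normal values from the generator that was passed in."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Two independent generator objects created with the same seed.
rng_a = random.Random(2018)
rng_b = random.Random(2018)

first = noisy_measurements(rng_a, 3)

# Unrelated code reseeding and consuming the implicit global generator.
random.seed(0)
random.random()

second = noisy_measurements(rng_b, 3)
```

Here ``first`` and ``second`` are identical because neither depends on the module-level global state, which other code is free to reseed or advance at any time.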
+That said, people do use them and use ``numpy.random.seed()`` to control the
+state underneath them. It can be hard to categorize and count API usages
+consistently and usefully, but a very common usage is in unit tests where many
+of the problems of global state are less likely.
+
+The initial release of the new PRNG subsystem MUST leave these convenience
+functions as aliases to the methods on a global ``RandomState`` that is
+initialized with a Mersenne Twister basic RNG object. A call to
+``numpy.random.seed()`` will be forwarded to that basic RNG object. In order
+to allow certain workarounds, it MUST be possible to replace the basic RNG
+underneath the global ``RandomState`` with any other basic RNG object (we leave
+the precise API details up to the new subsystem). Calling ``numpy.random.seed()``
+thereafter SHOULD just pass the given seed to the current basic RNG object and
+not attempt to reset the basic RNG to the Mersenne Twister.
+
+The set of ``numpy.random.*`` convenience functions SHALL remain the same as
+they currently are. They SHALL be aliases to the ``RandomState`` methods and
+not the new less-stable distributions class (``RandomGenerator``, in the
+examples above). Users who want to get the fastest, best distributions can
+follow best practices and instantiate generator objects explicitly.
+
+After we have experience with the new PRNG subsystem, we can and should revisit
+these issues in future NEPs.
+
+
+Alternatives
+------------
+
+Versioning
+::::::::::
For a long time, we considered that the way to allow algorithmic improvements
while maintaining the stream was to apply some form of versioning. That is,
@@ -208,6 +266,44 @@ is to pin the release of ``numpy`` as a whole, versioning ``RandomState`` alone
is superfluous.
+``StableRandom``
+::::::::::::::::
+
+A previous version of this NEP proposed to leave ``RandomState`` completely
+alone for a deprecation period and build the new subsystem alongside with new
+names. To satisfy the unit testing use case, it proposed introducing a small
+distributions class nominally called ``StableRandom``. It would have provided
+a small subset of distribution methods that were considered most useful in unit
+testing, but not the full set such that it would be too likely to be used
+outside of the testing context.
+
+During discussion about this proposal, it became apparent that there was no
+satisfactory subset. At least some projects used a fairly broad selection of
+the ``RandomState`` methods in unit tests.
+
+Downstream project owners would have been forced to modify their code to
+accommodate the new PRNG subsystem. Some modifications might be simply
+mechanical, but the bulk of the work would have been tedious churn for no
+positive improvement to the downstream project, just avoiding being broken.
+
+Furthermore, under this old proposal, we would have had a quite lengthy
+deprecation period where ``RandomState`` existed alongside the new system of
+basic RNGs and distribution classes. Leaving the implementation of
+``RandomState`` fixed meant that it could not use the new basic RNG state
+objects. Developing programs that use a mixture of libraries that have and
+have not upgraded would require managing two sets of PRNG states. This would
+notionally have been time-limited, but we intended the deprecation to be very
+long.
+
+The current proposal solves all of these problems. All current usages of
+``RandomState`` will continue to work in perpetuity, though some may be
+discouraged through documentation. Unit tests can continue to use the full
+complement of ``RandomState`` methods. Mixed ``RandomState/RandomGenerator``
+code can safely share the common basic RNG state. Unmodified ``RandomState``
+code can make use of the new features of alternative basic RNGs like settable
+streams.
+
+
Discussion
----------