Diffstat (limited to 'doc/source')
28 files changed, 620 insertions, 544 deletions
diff --git a/doc/source/dev/development_environment.rst b/doc/source/dev/development_environment.rst index 445ce3204..1d119ebce 100644 --- a/doc/source/dev/development_environment.rst +++ b/doc/source/dev/development_environment.rst @@ -3,6 +3,8 @@ Setting up and using your development environment ================================================= +.. _recommended-development-setup: + Recommended development setup ----------------------------- @@ -147,9 +149,9 @@ That also takes extra arguments, like ``--pdb`` which drops you into the Python debugger when a test fails or an exception is raised. Running tests with `tox`_ is also supported. For example, to build NumPy and -run the test suite with Python 3.4, use:: +run the test suite with Python 3.7, use:: - $ tox -e py34 + $ tox -e py37 For more extensive information, see :ref:`testing-guidelines` diff --git a/doc/source/dev/gitwash/development_setup.rst b/doc/source/dev/gitwash/development_setup.rst index 1ebd4b486..9027dda64 100644 --- a/doc/source/dev/gitwash/development_setup.rst +++ b/doc/source/dev/gitwash/development_setup.rst @@ -25,6 +25,8 @@ to the instructions at http://help.github.com/forking/ - please see that page for more detail. We're repeating some of it here just to give the specifics for the NumPy_ project, and to suggest some default names. +.. 
_set-up-and-configure-a-github-account: + Set up and configure a github_ account ====================================== diff --git a/doc/source/dev/index.rst b/doc/source/dev/index.rst index f0b81ba5d..a8bd0bb46 100644 --- a/doc/source/dev/index.rst +++ b/doc/source/dev/index.rst @@ -44,11 +44,11 @@ Here's the short summary, complete TOC links are below: git checkout -b linspace-speedups * Commit locally as you progress (``git add`` and ``git commit``) - Use a `properly formatted <writing-the-commit-message>` commit message, + Use a :ref:`properly formatted<writing-the-commit-message>` commit message, write tests that fail before your change and pass afterward, run all the - `tests locally <development-environment>`. Be sure to document any + :ref:`tests locally<development-environment>`. Be sure to document any changed behavior in docstrings, keeping to the NumPy docstring - `standard <howto-document>`. + :ref:`standard<howto-document>`. 3. To submit your contribution: @@ -57,8 +57,8 @@ Here's the short summary, complete TOC links are below: git push origin linspace-speedups * Enter your GitHub username and password (repeat contributors or advanced - users can remove this step by connecting to GitHub with `SSH <set-up-and- - configure-a-github-account>`. + users can remove this step by connecting to GitHub with + :ref:`SSH<set-up-and-configure-a-github-account>` . * Go to GitHub. The new branch will show up with a green Pull Request button. Make sure the title and message are clear, concise, and self- @@ -92,8 +92,9 @@ Here's the short summary, complete TOC links are below: coding style of your branch. The CI tests must pass before your PR can be merged. If CI fails, you can find out why by clicking on the "failed" icon (red cross) and inspecting the build and test log. To avoid overuse - and waste of this resource, `test your work <recommended-development- - setup>` locally before committing. 
+ and waste of this resource, + :ref:`test your work<recommended-development-setup>` locally before + committing. * A PR must be **approved** by at least one core team member before merging. Approval means the core team member has carefully reviewed the changes, @@ -131,13 +132,14 @@ Divergence between ``upstream/master`` and your feature branch If GitHub indicates that the branch of your Pull Request can no longer be merged automatically, you have to incorporate changes that have been made since you started into your branch. Our recommended way to do this is to -`rebase on master <rebasing-on-master>`. +:ref:`rebase on master<rebasing-on-master>`. Guidelines ---------- * All code should have tests (see `test coverage`_ below for more details). -* All code should be `documented <docstring-standard>`. +* All code should be `documented <https://numpydoc.readthedocs.io/ + en/latest/format.html#docstring-standard>`_. * No changes are ever committed without review and approval by a core team member.Please ask politely on the PR or on the `mailing list`_ if you get no response to your pull request within a week. @@ -156,14 +158,14 @@ Stylistic Guidelines import numpy as np -* For C code, see the `numpy-c-style-guide` +* For C code, see the :ref:`numpy-c-style-guide<style_guide>` Test coverage ------------- Pull requests (PRs) that modify code should either have new tests, or modify existing -tests to fail before the PR and pass afterwards. You should `run the tests +tests to fail before the PR and pass afterwards. You should :ref:`run the tests <development-environment>` before pushing a PR. 
Tests for a module should ideally cover all code in that module, @@ -175,7 +177,7 @@ and then run:: $ python runtests.py --coverage -This will create a report in `build/coverage`, which can be viewed with:: +This will create a report in ``build/coverage``, which can be viewed with:: $ firefox build/coverage/index.html @@ -226,6 +228,7 @@ The rest of the story releasing governance/index -NumPy-specific workflow is in `numpy-development-workflow`. +NumPy-specific workflow is in :ref:`numpy-development-workflow +<development-workflow>`. .. _`mailing list`: https://mail.python.org/mailman/listinfo/numpy-devel diff --git a/doc/source/reference/arrays.classes.rst b/doc/source/reference/arrays.classes.rst index f17cb932a..3b13530c7 100644 --- a/doc/source/reference/arrays.classes.rst +++ b/doc/source/reference/arrays.classes.rst @@ -43,10 +43,6 @@ NumPy provides several hooks that classes can customize: .. versionadded:: 1.13 - .. note:: The API is `provisional - <https://docs.python.org/3/glossary.html#term-provisional-api>`_, - i.e., we do not yet guarantee backward compatibility. - Any class, ndarray subclass or not, can define this method or set it to :obj:`None` in order to override the behavior of NumPy's ufuncs. This works quite similarly to Python's ``__mul__`` and other binary operation routines. @@ -452,7 +448,7 @@ object, then the Python code:: some code involving val ... -calls ``val = myiter.next()`` repeatedly until :exc:`StopIteration` is +calls ``val = next(myiter)`` repeatedly until :exc:`StopIteration` is raised by the iterator. There are several ways to iterate over an array that may be useful: default iteration, flat iteration, and :math:`N`-dimensional enumeration. 
diff --git a/doc/source/reference/c-api.array.rst b/doc/source/reference/c-api.array.rst index bd6062b16..3d6246baa 100644 --- a/doc/source/reference/c-api.array.rst +++ b/doc/source/reference/c-api.array.rst @@ -20,27 +20,44 @@ Array API Array structure and data access ------------------------------- -These macros all access the :c:type:`PyArrayObject` structure members. The input -argument, arr, can be any :c:type:`PyObject *<PyObject>` that is directly interpretable -as a :c:type:`PyArrayObject *` (any instance of the :c:data:`PyArray_Type` and its -sub-types). +These macros access the :c:type:`PyArrayObject` structure members and are +defined in ``ndarraytypes.h``. The input argument, *arr*, can be any +:c:type:`PyObject *<PyObject>` that is directly interpretable as a +:c:type:`PyArrayObject *` (any instance of the :c:data:`PyArray_Type` +and itssub-types). .. c:function:: int PyArray_NDIM(PyArrayObject *arr) The number of dimensions in the array. -.. c:function:: npy_intp *PyArray_DIMS(PyArrayObject *arr) +.. c:function:: int PyArray_FLAGS(PyArrayObject* arr) - Returns a pointer to the dimensions/shape of the array. The - number of elements matches the number of dimensions - of the array. Can return ``NULL`` for 0-dimensional arrays. + Returns an integer representing the :ref:`array-flags<array-flags>`. -.. c:function:: npy_intp *PyArray_SHAPE(PyArrayObject *arr) +.. c:function:: int PyArray_TYPE(PyArrayObject* arr) + + Return the (builtin) typenumber for the elements of this array. + +.. c:function:: int PyArray_SETITEM( \ + PyArrayObject* arr, void* itemptr, PyObject* obj) + + Convert obj and place it in the ndarray, *arr*, at the place + pointed to by itemptr. Return -1 if an error occurs or 0 on + success. + +.. c:function:: void PyArray_ENABLEFLAGS(PyArrayObject* arr, int flags) + + .. versionadded:: 1.7 + + Enables the specified array flags. This function does no validation, + and assumes that you know what you're doing. + +.. 
c:function:: void PyArray_CLEARFLAGS(PyArrayObject* arr, int flags) .. versionadded:: 1.7 - A synonym for PyArray_DIMS, named to be consistent with the - 'shape' usage within Python. + Clears the specified array flags. This function does no validation, + and assumes that you know what you're doing. .. c:function:: void *PyArray_DATA(PyArrayObject *arr) @@ -53,6 +70,19 @@ sub-types). array then be sure you understand how to access the data in the array to avoid memory and/or alignment problems. +.. c:function:: npy_intp *PyArray_DIMS(PyArrayObject *arr) + + Returns a pointer to the dimensions/shape of the array. The + number of elements matches the number of dimensions + of the array. Can return ``NULL`` for 0-dimensional arrays. + +.. c:function:: npy_intp *PyArray_SHAPE(PyArrayObject *arr) + + .. versionadded:: 1.7 + + A synonym for :c:func:`PyArray_DIMS`, named to be consistent with the + `shape <numpy.ndarray.shape>` usage within Python. + .. c:function:: npy_intp *PyArray_STRIDES(PyArrayObject* arr) Returns a pointer to the strides of the array. The @@ -67,6 +97,27 @@ sub-types). Return the stride in the *n* :math:`^{\textrm{th}}` dimension. +.. c:function:: npy_intp PyArray_ITEMSIZE(PyArrayObject* arr) + + Return the itemsize for the elements of this array. + + Note that, in the old API that was deprecated in version 1.7, this function + had the return type ``int``. + +.. c:function:: npy_intp PyArray_SIZE(PyArrayObject* arr) + + Returns the total size (in number of elements) of the array. + +.. c:function:: npy_intp PyArray_Size(PyArrayObject* obj) + + Returns 0 if *obj* is not a sub-class of ndarray. Otherwise, + returns the total number of elements in the array. Safer version + of :c:func:`PyArray_SIZE` (*obj*). + +.. c:function:: npy_intp PyArray_NBYTES(PyArrayObject* arr) + + Returns the total number of bytes consumed by the array. + .. c:function:: PyObject *PyArray_BASE(PyArrayObject* arr) This returns the base object of the array. 
In most cases, this @@ -93,60 +144,12 @@ sub-types). A synonym for PyArray_DESCR, named to be consistent with the 'dtype' usage within Python. -.. c:function:: void PyArray_ENABLEFLAGS(PyArrayObject* arr, int flags) - - .. versionadded:: 1.7 - - Enables the specified array flags. This function does no validation, - and assumes that you know what you're doing. - -.. c:function:: void PyArray_CLEARFLAGS(PyArrayObject* arr, int flags) - - .. versionadded:: 1.7 - - Clears the specified array flags. This function does no validation, - and assumes that you know what you're doing. - -.. c:function:: int PyArray_FLAGS(PyArrayObject* arr) - -.. c:function:: npy_intp PyArray_ITEMSIZE(PyArrayObject* arr) - - Return the itemsize for the elements of this array. - - Note that, in the old API that was deprecated in version 1.7, this function - had the return type ``int``. - -.. c:function:: int PyArray_TYPE(PyArrayObject* arr) - - Return the (builtin) typenumber for the elements of this array. - .. c:function:: PyObject *PyArray_GETITEM(PyArrayObject* arr, void* itemptr) Get a Python object of a builtin type from the ndarray, *arr*, at the location pointed to by itemptr. Return ``NULL`` on failure. `numpy.ndarray.item` is identical to PyArray_GETITEM. - -.. c:function:: int PyArray_SETITEM( \ - PyArrayObject* arr, void* itemptr, PyObject* obj) - - Convert obj and place it in the ndarray, *arr*, at the place - pointed to by itemptr. Return -1 if an error occurs or 0 on - success. - -.. c:function:: npy_intp PyArray_SIZE(PyArrayObject* arr) - - Returns the total size (in number of elements) of the array. - -.. c:function:: npy_intp PyArray_Size(PyArrayObject* obj) - - Returns 0 if *obj* is not a sub-class of ndarray. Otherwise, - returns the total number of elements in the array. Safer version - of :c:func:`PyArray_SIZE` (*obj*). - -.. c:function:: npy_intp PyArray_NBYTES(PyArrayObject* arr) - - Returns the total number of bytes consumed by the array. 
Data access @@ -1397,6 +1400,7 @@ Special functions for NPY_OBJECT Returns 0 for success, -1 for failure. +.. _array-flags: Array flags ----------- @@ -3466,6 +3470,10 @@ Other constants The maximum number of dimensions allowed in arrays. +.. c:var:: NPY_MAXARGS + + The maximum number of array arguments that can be used in functions. + .. c:var:: NPY_VERSION The current version of the ndarray object (check to see if this @@ -3558,10 +3566,18 @@ Enumerated Types .. c:type:: NPY_SORTKIND - A special variable-type which can take on the values :c:data:`NPY_{KIND}` - where ``{KIND}`` is + A special variable-type which can take on different values to indicate + the sorting algorithm being used. + + .. c:var:: NPY_QUICKSORT + + .. c:var:: NPY_HEAPSORT + + .. c:var:: NPY_MERGESORT + + .. c:var:: NPY_STABLESORT - **QUICKSORT**, **HEAPSORT**, **MERGESORT**, **STABLESORT** + Used as an alias of :c:data:`NPY_MERGESORT` and vica versa. .. c:var:: NPY_NSORTS diff --git a/doc/source/reference/c-api.types-and-structures.rst b/doc/source/reference/c-api.types-and-structures.rst index a716b5a06..336dff211 100644 --- a/doc/source/reference/c-api.types-and-structures.rst +++ b/doc/source/reference/c-api.types-and-structures.rst @@ -1,3 +1,4 @@ + ***************************** Python Types and C-Structures ***************************** @@ -75,7 +76,8 @@ PyArray_Type and PyArrayObject these structure members should normally be accessed using the provided macros. If you need a shorter name, then you can make use of :c:type:`NPY_AO` (deprecated) which is defined to be equivalent to - :c:type:`PyArrayObject`. + :c:type:`PyArrayObject`. Direct access to the struct fields are + deprecated. Use the `PyArray_*(arr)` form instead. .. code-block:: c @@ -103,7 +105,8 @@ PyArray_Type and PyArrayObject .. c:member:: char *PyArrayObject.data - A pointer to the first element of the array. 
This pointer can + Accessible via :c:data:`PyArray_DATA`, this data member is a + pointer to the first element of the array. This pointer can (and normally should) be recast to the data type of the array. .. c:member:: int PyArrayObject.nd @@ -111,26 +114,29 @@ PyArray_Type and PyArrayObject An integer providing the number of dimensions for this array. When nd is 0, the array is sometimes called a rank-0 array. Such arrays have undefined dimensions and strides and - cannot be accessed. :c:data:`NPY_MAXDIMS` is the largest number of - dimensions for any array. + cannot be accessed. Macro :c:data:`PyArray_NDIM` defined in + ``ndarraytypes.h`` points to this data member. :c:data:`NPY_MAXDIMS` + is the largest number of dimensions for any array. .. c:member:: npy_intp PyArrayObject.dimensions An array of integers providing the shape in each dimension as long as nd :math:`\geq` 1. The integer is always large enough to hold a pointer on the platform, so the dimension size is - only limited by memory. + only limited by memory. :c:data:`PyArray_DIMS` is the macro + associated with this data member. .. c:member:: npy_intp *PyArrayObject.strides An array of integers providing for each dimension the number of bytes that must be skipped to get to the next element in that - dimension. + dimension. Associated with macro :c:data:`PyArray_STRIDES`. .. c:member:: PyObject *PyArrayObject.base - This member is used to hold a pointer to another Python object that - is related to this array. There are two use cases: + Pointed to by :c:data:`PyArray_BASE`, this member is used to hold a + pointer to another Python object that is related to this array. + There are two use cases: - If this array does not own its own memory, then base points to the Python object that owns it (perhaps another array object) @@ -149,11 +155,13 @@ PyArray_Type and PyArrayObject descriptor structure for each data type supported. 
This descriptor structure contains useful information about the type as well as a pointer to a table of function pointers to - implement specific functionality. + implement specific functionality. As the name suggests, it is + associated with the macro :c:data:`PyArray_DESCR`. .. c:member:: int PyArrayObject.flags - Flags indicating how the memory pointed to by data is to be + Pointed to by the macro :c:data:`PyArray_FLAGS`, this data member represents + the flags indicating how the memory pointed to by data is to be interpreted. Possible flags are :c:data:`NPY_ARRAY_C_CONTIGUOUS`, :c:data:`NPY_ARRAY_F_CONTIGUOUS`, :c:data:`NPY_ARRAY_OWNDATA`, :c:data:`NPY_ARRAY_ALIGNED`, :c:data:`NPY_ARRAY_WRITEABLE`, diff --git a/doc/source/reference/c-api.ufunc.rst b/doc/source/reference/c-api.ufunc.rst index 0499ccf5b..ba5673cc3 100644 --- a/doc/source/reference/c-api.ufunc.rst +++ b/doc/source/reference/c-api.ufunc.rst @@ -21,7 +21,17 @@ Constants .. c:var:: PyUFunc_{VALUE} - ``{VALUE}`` can be **One** (1), **Zero** (0), or **None** (-1) + .. c:var:: PyUFunc_One + + .. c:var:: PyUFunc_Zero + + .. c:var:: PyUFunc_MinusOne + + .. c:var:: PyUFunc_ReorderableNone + + .. c:var:: PyUFunc_None + + .. c:var:: PyUFunc_IdentityValue Macros diff --git a/doc/source/reference/random/bit_generators/bitgenerators.rst b/doc/source/reference/random/bit_generators/bitgenerators.rst new file mode 100644 index 000000000..1474f7dac --- /dev/null +++ b/doc/source/reference/random/bit_generators/bitgenerators.rst @@ -0,0 +1,11 @@ +:orphan: + +BitGenerator +------------ + +.. currentmodule:: numpy.random.bit_generator + +.. autosummary:: + :toctree: generated/ + + BitGenerator diff --git a/doc/source/reference/random/bit_generators/index.rst b/doc/source/reference/random/bit_generators/index.rst index 4d3d39ae2..35d9e5d09 100644 --- a/doc/source/reference/random/bit_generators/index.rst +++ b/doc/source/reference/random/bit_generators/index.rst @@ -1,10 +1,10 @@ .. _bit_generator: +.. 
currentmodule:: numpy.random + Bit Generators -------------- -.. currentmodule:: numpy.random - The random values produced by :class:`~Generator` orignate in a BitGenerator. The BitGenerators do not directly provide random numbers and only contains methods used for seeding, getting or @@ -12,17 +12,101 @@ setting the state, jumping or advancing the state, and for accessing low-level wrappers for consumption by code that can efficiently access the functions provided, e.g., `numba <https://numba.pydata.org>`_. -Stable RNGs -=========== +Supported BitGenerators +======================= + +The included BitGenerators are: + +* PCG-64 - The default. A fast generator that supports many parallel streams + and can be advanced by an arbitrary amount. See the documentation for + :meth:`~.PCG64.advance`. PCG-64 has a period of :math:`2^{128}`. See the `PCG + author's page`_ for more details about this class of PRNG. +* MT19937 - The standard Python BitGenerator. Adds a `~mt19937.MT19937.jumped` + function that returns a new generator with state as-if :math:`2^{128}` draws have + been made. +* Philox - A counter-based generator capable of being advanced an + arbitrary number of steps or generating independent streams. See the + `Random123`_ page for more details about this class of bit generators. +* SFC64 - A fast generator based on random invertible mappings. Usually the + fastest generator of the four. See the `SFC author's page`_ for (a little) + more detail. + +.. _`PCG author's page`: http://www.pcg-random.org/ +.. _`Random123`: https://www.deshawresearch.com/resources_random123.html +.. _`SFC author's page`: http://pracrand.sourceforge.net/RNG_engines.txt .. toctree:: :maxdepth: 1 + BitGenerator <bitgenerators> MT19937 <mt19937> - PCG32 <pcg32> PCG64 <pcg64> Philox <philox> - ThreeFry <threefry> - Xoshiro256** <xoshiro256> - Xoshiro512** <xoshiro512> + SFC64 <sfc64> + +Seeding and Entropy +------------------- + +A BitGenerator provides a stream of random values. 
In order to generate +reproducible streams, BitGenerators support setting their initial state via a +seed. All of the provided BitGenerators will take an arbitrary-sized +non-negative integer, or a list of such integers, as a seed. BitGenerators +need to take those inputs and process them into a high-quality internal state +for the BitGenerator. All of the BitGenerators in numpy delegate that task to +`~SeedSequence`, which uses hashing techniques to ensure that even low-quality +seeds generate high-quality initial states. + +.. code-block:: python + + from numpy.random import PCG64 + + bg = PCG64(12345678903141592653589793) + +.. end_block + +`~SeedSequence` is designed to be convenient for implementing best practices. +We recommend that a stochastic program defaults to using entropy from the OS so +that each run is different. The program should print out or log that entropy. +In order to reproduce a past value, the program should allow the user to +provide that value through some mechanism, a command-line argument is common, +so that the user can then re-enter that entropy to reproduce the result. +`~SeedSequence` can take care of everything except for communicating with the +user, which is up to you. + +.. code-block:: python + + from numpy.random import PCG64, SeedSequence + + # Get the user's seed somehow, maybe through `argparse`. + # If the user did not provide a seed, it should return `None`. + seed = get_user_seed() + ss = SeedSequence(seed) + print('seed = {}'.format(ss.entropy)) + bg = PCG64(ss) + +.. end_block + +We default to using a 128-bit integer using entropy gathered from the OS. This +is a good amount of entropy to initialize all of the generators that we have in +numpy. We do not recommend using small seeds below 32 bits for general use. +Using just a small set of seeds to instantiate larger state spaces means that +there are some initial states that are impossible to reach. This creates some +biases if everyone uses such values. 
+ +There will not be anything *wrong* with the results, per se; even a seed of +0 is perfectly fine thanks to the processing that `~SeedSequence` does. If you +just need *some* fixed value for unit tests or debugging, feel free to use +whatever seed you like. But if you want to make inferences from the results or +publish them, drawing from a larger set of seeds is good practice. + +If you need to generate a good seed "offline", then ``SeedSequence().entropy`` +or using ``secrets.randbits(128)`` from the standard library are both +convenient ways. + +.. autosummary:: + :toctree: generated/ + SeedSequence + bit_generator.ISeedSequence + bit_generator.ISpawnableSeedSequence + bit_generator.SeedlessSeedSequence diff --git a/doc/source/reference/random/bit_generators/mt19937.rst b/doc/source/reference/random/bit_generators/mt19937.rst index f5843ccf0..25ba1d7b5 100644 --- a/doc/source/reference/random/bit_generators/mt19937.rst +++ b/doc/source/reference/random/bit_generators/mt19937.rst @@ -8,13 +8,12 @@ Mersenne Twister (MT19937) .. autoclass:: MT19937 :exclude-members: -Seeding and State -================= +State +===== .. autosummary:: :toctree: generated/ - ~MT19937.seed ~MT19937.state Parallel generation diff --git a/doc/source/reference/random/bit_generators/pcg32.rst b/doc/source/reference/random/bit_generators/pcg32.rst deleted file mode 100644 index faaccaf9b..000000000 --- a/doc/source/reference/random/bit_generators/pcg32.rst +++ /dev/null @@ -1,34 +0,0 @@ -Parallel Congruent Generator (32-bit, PCG32) --------------------------------------------- - -.. module:: numpy.random.pcg32 - -.. currentmodule:: numpy.random.pcg32 - -.. autoclass:: PCG32 - :exclude-members: - -Seeding and State -================= - -.. autosummary:: - :toctree: generated/ - - ~PCG32.seed - ~PCG32.state - -Parallel generation -=================== -.. autosummary:: - :toctree: generated/ - - ~PCG32.advance - ~PCG32.jumped - -Extending -========= -.. 
autosummary:: - :toctree: generated/ - - ~PCG32.cffi - ~PCG32.ctypes diff --git a/doc/source/reference/random/bit_generators/pcg64.rst b/doc/source/reference/random/bit_generators/pcg64.rst index fa719cea4..7aef1e0dd 100644 --- a/doc/source/reference/random/bit_generators/pcg64.rst +++ b/doc/source/reference/random/bit_generators/pcg64.rst @@ -8,13 +8,12 @@ Parallel Congruent Generator (64-bit, PCG64) .. autoclass:: PCG64 :exclude-members: -Seeding and State -================= +State +===== .. autosummary:: :toctree: generated/ - ~PCG64.seed ~PCG64.state Parallel generation diff --git a/doc/source/reference/random/bit_generators/philox.rst b/doc/source/reference/random/bit_generators/philox.rst index 7ef451d4b..5e581e094 100644 --- a/doc/source/reference/random/bit_generators/philox.rst +++ b/doc/source/reference/random/bit_generators/philox.rst @@ -8,13 +8,12 @@ Philox Counter-based RNG .. autoclass:: Philox :exclude-members: -Seeding and State -================= +State +===== .. autosummary:: :toctree: generated/ - ~Philox.seed ~Philox.state Parallel generation diff --git a/doc/source/reference/random/bit_generators/sfc64.rst b/doc/source/reference/random/bit_generators/sfc64.rst new file mode 100644 index 000000000..dc03820ae --- /dev/null +++ b/doc/source/reference/random/bit_generators/sfc64.rst @@ -0,0 +1,28 @@ +SFC64 Small Fast Chaotic PRNG +----------------------------- + +.. module:: numpy.random.sfc64 + +.. currentmodule:: numpy.random.sfc64 + +.. autoclass:: SFC64 + :exclude-members: + +State +===== + +.. autosummary:: + :toctree: generated/ + + ~SFC64.state + +Extending +========= +.. 
autosummary:: + :toctree: generated/ + + ~SFC64.cffi + ~SFC64.ctypes + + + diff --git a/doc/source/reference/random/bit_generators/threefry.rst b/doc/source/reference/random/bit_generators/threefry.rst deleted file mode 100644 index 951108d72..000000000 --- a/doc/source/reference/random/bit_generators/threefry.rst +++ /dev/null @@ -1,36 +0,0 @@ -ThreeFry Counter-based RNG --------------------------- - -.. module:: numpy.random.threefry - -.. currentmodule:: numpy.random.threefry - -.. autoclass:: ThreeFry - :exclude-members: - -Seeding and State -================= - -.. autosummary:: - :toctree: generated/ - - ~ThreeFry.seed - ~ThreeFry.state - -Parallel generation -=================== -.. autosummary:: - :toctree: generated/ - - ~ThreeFry.advance - ~ThreeFry.jumped - -Extending -========= -.. autosummary:: - :toctree: generated/ - - ~ThreeFry.cffi - ~ThreeFry.ctypes - - diff --git a/doc/source/reference/random/bit_generators/xoshiro256.rst b/doc/source/reference/random/bit_generators/xoshiro256.rst deleted file mode 100644 index fedc61b33..000000000 --- a/doc/source/reference/random/bit_generators/xoshiro256.rst +++ /dev/null @@ -1,35 +0,0 @@ -Xoshiro256** ------------- - -.. module:: numpy.random.xoshiro256 - -.. currentmodule:: numpy.random.xoshiro256 - -.. autoclass:: Xoshiro256 - :exclude-members: - -Seeding and State -================= - -.. autosummary:: - :toctree: generated/ - - ~Xoshiro256.seed - ~Xoshiro256.state - -Parallel generation -=================== -.. autosummary:: - :toctree: generated/ - - ~Xoshiro256.jumped - -Extending -========= -.. autosummary:: - :toctree: generated/ - - ~Xoshiro256.cffi - ~Xoshiro256.ctypes - - diff --git a/doc/source/reference/random/bit_generators/xoshiro512.rst b/doc/source/reference/random/bit_generators/xoshiro512.rst deleted file mode 100644 index e39346cd6..000000000 --- a/doc/source/reference/random/bit_generators/xoshiro512.rst +++ /dev/null @@ -1,35 +0,0 @@ -Xoshiro512** ------------- - -.. 
module:: numpy.random.xoshiro512 - -.. currentmodule:: numpy.random.xoshiro512 - -.. autoclass:: Xoshiro512 - :exclude-members: - -Seeding and State -================= - -.. autosummary:: - :toctree: generated/ - - ~Xoshiro512.seed - ~Xoshiro512.state - -Parallel generation -=================== -.. autosummary:: - :toctree: generated/ - - ~Xoshiro512.jumped - -Extending -========= -.. autosummary:: - :toctree: generated/ - - ~Xoshiro512.cffi - ~Xoshiro512.ctypes - - diff --git a/doc/source/reference/random/extending.rst b/doc/source/reference/random/extending.rst index 28db4021c..22f9cb7e4 100644 --- a/doc/source/reference/random/extending.rst +++ b/doc/source/reference/random/extending.rst @@ -18,11 +18,11 @@ provided by ``ctypes.next_double``. .. code-block:: python - from numpy.random import Xoshiro256 + from numpy.random import PCG64 import numpy as np import numba as nb - x = Xoshiro256() + x = PCG64() f = x.ctypes.next_double s = x.ctypes.state state_addr = x.ctypes.state_address @@ -50,7 +50,7 @@ provided by ``ctypes.next_double``. # Must use state address not state with numba normalsj(1, state_addr) %timeit normalsj(1000000, state_addr) - print('1,000,000 Box-Muller (numba/Xoshiro256) randoms') + print('1,000,000 Box-Muller (numba/PCG64) randoms') %timeit np.random.standard_normal(1000000) print('1,000,000 Box-Muller (NumPy) randoms') @@ -66,7 +66,7 @@ Cython ====== Cython can be used to unpack the ``PyCapsule`` provided by a BitGenerator. -This example uses `~xoshiro256.Xoshiro256` and +This example uses `~pcg64.PCG64` and ``random_gauss_zig``, the Ziggurat-based generator for normals, to fill an array. 
The usual caveats for writing high-performance code using Cython -- removing bounds checks and wrap around, providing array alignment information @@ -80,7 +80,7 @@ removing bounds checks and wrap around, providing array alignment information from cpython.pycapsule cimport PyCapsule_IsValid, PyCapsule_GetPointer from numpy.random.common cimport * from numpy.random.distributions cimport random_gauss_zig - from numpy.random import Xoshiro256 + from numpy.random import PCG64 @cython.boundscheck(False) @@ -91,7 +91,7 @@ removing bounds checks and wrap around, providing array alignment information cdef const char *capsule_name = "BitGenerator" cdef double[::1] random_values - x = Xoshiro256() + x = PCG64() capsule = x.capsule if not PyCapsule_IsValid(capsule, capsule_name): raise ValueError("Invalid pointer to anon_func_state") @@ -117,7 +117,7 @@ RNG structure. cdef const char *capsule_name = "BitGenerator" cdef double[::1] random_values - x = Xoshiro256() + x = PCG64() capsule = x.capsule # Optional check that the capsule if from a BitGenerator if not PyCapsule_IsValid(capsule, capsule_name): diff --git a/doc/source/reference/random/generator.rst b/doc/source/reference/random/generator.rst index 8b086e901..c3803bcab 100644 --- a/doc/source/reference/random/generator.rst +++ b/doc/source/reference/random/generator.rst @@ -8,10 +8,12 @@ a wide range of distributions, and served as a replacement for the two is that ``Generator`` relies on an additional BitGenerator to manage state and generate the random bits, which are then transformed into random values from useful distributions. The default BitGenerator used by -``Generator`` is :class:`~xoshiro256.Xoshiro256`. The BitGenerator +``Generator`` is `~PCG64`. The BitGenerator can be changed by passing an instantized BitGenerator to ``Generator``. +.. autofunction:: default_rng + .. 
autoclass:: Generator :exclude-members: diff --git a/doc/source/reference/random/index.rst b/doc/source/reference/random/index.rst index 42956590a..51e72513d 100644 --- a/doc/source/reference/random/index.rst +++ b/doc/source/reference/random/index.rst @@ -1,9 +1,11 @@ .. _numpyrandom: +.. py:module:: numpy.random + .. currentmodule:: numpy.random -numpy.random -============ +Random sampling (:mod:`numpy.random`) +===================================== Numpy's random number routines produce pseudo random numbers using combinations of a `BitGenerator` to create sequences and a `Generator` @@ -20,18 +22,18 @@ Since Numpy version 1.17.0 the Generator can be initialized with a number of different BitGenerators. It exposes many different probability distributions. See `NEP 19 <https://www.numpy.org/neps/ nep-0019-rng-policy.html>`_ for context on the updated random Numpy number -routines. The legacy `RandomState` random number routines are still +routines. The legacy `.RandomState` random number routines are still available, but limited to a single BitGenerator. -For convenience and backward compatibility, a single `RandomState` +For convenience and backward compatibility, a single `~.RandomState` instance's methods are imported into the numpy.random namespace, see :ref:`legacy` for the complete list. Quick Start ----------- -By default, `Generator` uses normals provided by `xoshiro256.Xoshiro256` -which will be faster than the legacy methods in `RandomState` +By default, `~Generator` uses normals provided by `~pcg64.PCG64` which will be +statistically more reliable than the legacy methods in `~.RandomState` .. code-block:: python @@ -39,41 +41,40 @@ which will be faster than the legacy methods in `RandomState` from numpy import random random.standard_normal() -`Generator` can be used as a direct replacement for `~RandomState`, although -the random values are generated by `~xoshiro256.Xoshiro256`. The -`Generator` holds an instance of a BitGenerator. 
It is accessible as +`~Generator` can be used as a direct replacement for `~.RandomState`, although +the random values are generated by `~.PCG64`. The +`~Generator` holds an instance of a BitGenerator. It is accessible as ``gen.bit_generator``. .. code-block:: python - # As replacement for RandomState() - from numpy.random import Generator - rg = Generator() + # As replacement for RandomState(); default_rng() instantiates Generator with + # the default PCG64 BitGenerator. + from numpy.random import default_rng + rg = default_rng() rg.standard_normal() rg.bit_generator - -Seeds can be passed to any of the BitGenerators. Here `mt19937.MT19937` is used -and is the wrapped with a `~.Generator`. - +Seeds can be passed to any of the BitGenerators. The provided value is mixed +via `~.SeedSequence` to spread a possible sequence of seeds across a wider +range of initialization states for the BitGenerator. Here `~.PCG64` is used and +is wrapped with a `~.Generator`. .. code-block:: python - from numpy.random import Generator, MT19937 - rg = Generator(MT19937(12345)) + from numpy.random import Generator, PCG64 + rg = Generator(PCG64(12345)) rg.standard_normal() - Introduction ------------ -RandomGen takes a different approach to producing random numbers from the -`RandomState` object. Random number generation is separated into two -components, a bit generator and a random generator. +The new infrastructure takes a different approach to producing random numbers +from the `~.RandomState` object. Random number generation is separated into +two components, a bit generator and a random generator. -The bit generator has a limited set of responsibilities. It manages state +The `BitGenerator` has a limited set of responsibilities. It manages state and provides functions to produce random doubles and random unsigned 32- and -64-bit values. The bit generator also handles all seeding which varies with -different bit generators. +64-bit values. 
The `random generator <Generator>` takes the bit generator-provided stream and transforms them into more useful @@ -81,19 +82,22 @@ distributions, e.g., simulated normal random values. This structure allows alternative bit generators to be used with little code duplication. The `Generator` is the user-facing object that is nearly identical to -`RandomState`. The canonical method to initialize a generator passes a -`~mt19937.MT19937` bit generator, the underlying bit generator in Python -- as -the sole argument. Note that the BitGenerator must be instantiated. +`.RandomState`. The canonical method to initialize a generator passes a +`~.PCG64` bit generator as the sole argument. + .. code-block:: python - from numpy.random import Generator, MT19937 - rg = Generator(MT19937()) + from numpy.random import default_rng + rg = default_rng(12345) rg.random() -Seed information is directly passed to the bit generator. +One can also instantiate `Generator` directly with a `BitGenerator` instance. +To use the older `~mt19937.MT19937` algorithm, one can instantiate it directly +and pass it to `Generator`. .. code-block:: python + from numpy.random import Generator, MT19937 rg = Generator(MT19937(12345)) rg.random() @@ -104,9 +108,9 @@ What's New or Different The Box-Muller method used to produce NumPy's normals is no longer available in `Generator`. It is not possible to reproduce the exact random values using Generator for the normal distribution or any other - distribution that relies on the normal such as the `numpy.random.gamma` or - `numpy.random.standard_t`. If you require bitwise backward compatible - streams, use `RandomState`. + distribution that relies on the normal such as the `.RandomState.gamma` or + `.RandomState.standard_t`. If you require bitwise backward compatible + streams, use `.RandomState`. 
* The Generator's normal, exponential and gamma functions use 256-step Ziggurat methods which are 2-10 times faster than NumPy's Box-Muller or inverse CDF @@ -120,9 +124,8 @@ What's New or Different source of randomness that is used in cryptographic applications (e.g., ``/dev/urandom`` on Unix). * All BitGenerators can produce doubles, uint64s and uint32s via CTypes - (`~xoshiro256.Xoshiro256.ctypes`) and CFFI - (:meth:`~xoshiro256.Xoshiro256.cffi`). This allows the bit generators to - be used in numba. + (`~.PCG64.ctypes`) and CFFI (`~.PCG64.cffi`). This allows the bit generators + to be used in numba. * The bit generators can be used in downstream projects via :ref:`Cython <randomgen_cython>`. * `~.Generator.integers` is now the canonical way to generate integer @@ -131,8 +134,11 @@ What's New or Different The ``endpoint`` keyword can be used to specify open or closed intervals. This replaces both ``randint`` and the deprecated ``random_integers``. * `~.Generator.random` is now the canonical way to generate floating-point - random numbers, which replaces `random_sample`, `sample`, and `ranf`. This - is consistent with Python's `random.random`. + random numbers, which replaces `.RandomState.random_sample`, + `.RandomState.sample`, and `.RandomState.ranf`. This is consistent with + Python's `random.random`. +* All BitGenerators in numpy use `~SeedSequence` to convert seeds into + initialized states. See :ref:`new-or-different` for a complete list of improvements and differences from the traditional ``Randomstate``. @@ -141,47 +147,20 @@ Parallel Generation ~~~~~~~~~~~~~~~~~~~ The included generators can be used in parallel, distributed applications in -one of two ways: +one of three ways: +* :ref:`seedsequence-spawn` * :ref:`independent-streams` -* :ref:`jump-and-advance` - -Supported BitGenerators ------------------------ -The included BitGenerators are: - -* MT19937 - The standard Python BitGenerator. 
Produces identical results to - Python using the same seed/state. Adds a `~mt19937.MT19937.jumped` function - that returns a new generator with state as-if ``2**128`` draws have been made. -* Xorshiro256** and Xorshiro512** - The most recently introduced XOR, - shift, and rotate generator. Supports ``jumped`` and so can be used in - parallel applications. See the documentation for - `~xoshiro256.Xoshirt256.jumped` for details. More information about these bit - generators is available at the `xorshift, xoroshiro and xoshiro authors' - page`_. -* ThreeFry and Philox - counter-based generators capable of being advanced an - arbitrary number of steps or generating independent streams. See the - `Random123`_ page for more details about this class of bit generators. - -.. _`PCG author's page`: http://www.pcg-random.org/ -.. _`xorshift, xoroshiro and xoshiro authors' page`: http://xoroshiro.di.unimi.it/ -.. _`Random123`: https://www.deshawresearch.com/resources_random123.html - -Generator ---------- +* :ref:`parallel-jumped` + +Concepts +-------- .. toctree:: :maxdepth: 1 generator legacy mtrand <legacy> - -BitGenerators -------------- - -.. toctree:: - :maxdepth: 1 - - BitGenerators <bit_generators/index> + BitGenerators, SeedSequences <bit_generators/index> Features -------- diff --git a/doc/source/reference/random/legacy.rst b/doc/source/reference/random/legacy.rst index d9391e9e2..04d4d3569 100644 --- a/doc/source/reference/random/legacy.rst +++ b/doc/source/reference/random/legacy.rst @@ -1,3 +1,5 @@ +.. currentmodule:: numpy.random + .. _legacy: Legacy Random Generation @@ -8,7 +10,7 @@ no further improvements. It is guaranteed to produce the same values as the final point release of NumPy v1.16. These all depend on Box-Muller normals or inverse CDF exponentials or gammas. This class should only be used if it is essential to have randoms that are identical to what -would have been produced by NumPy. +would have been produced by previous versions of NumPy. 
`~mtrand.RandomState` adds additional information to the state which is required when using Box-Muller normals since these @@ -16,38 +18,33 @@ are produced in pairs. It is important to use `~mtrand.RandomState.get_state`, and not the underlying bit generators `state`, when accessing the state so that these extra values are saved. -.. warning:: - - :class:`~randomgen.legacy.LegacyGenerator` only contains functions - that have changed. Since it does not contain other functions, it - is not directly possible to replace :class:`~numpy.random.RandomState`. - In order to full replace :class:`~numpy.random.RandomState`, it is - necessary to use both :class:`~randomgen.legacy.LegacyGenerator` - and :class:`~randomgen.generator.RandomGenerator` both driven - by the same basic RNG. Methods present in :class:`~randomgen.legacy.LegacyGenerator` - must be called from :class:`~randomgen.legacy.LegacyGenerator`. Other Methods - should be called from :class:`~randomgen.generator.RandomGenerator`. - +Although we provide the `~mt19937.MT19937` BitGenerator for use independent of +`~mtrand.RandomState`, note that its default seeding uses `~SeedSequence` +rather than the legacy seeding algorithm. `~mtrand.RandomState` will use the +legacy seeding algorithm. The methods to use the legacy seeding algorithm are +currently private as the main reason to use them is just to implement +`~mtrand.RandomState`. However, one can reset the state of `~mt19937.MT19937` +using the state of the `~mtrand.RandomState`: .. 
code-block:: python from numpy.random import MT19937 from numpy.random import RandomState - # Use same seed rs = RandomState(12345) - mt19937 = MT19937(12345) - lg = RandomState(mt19937) + mt19937 = MT19937() + mt19937.state = rs.get_state() + rs2 = RandomState(mt19937) - # Identical output + # Same output rs.standard_normal() - lg.standard_normal() + rs2.standard_normal() rs.random() - lg.random() + rs2.random() rs.standard_exponential() - lg.standard_exponential() + rs2.standard_exponential() .. currentmodule:: numpy.random.mtrand diff --git a/doc/source/reference/random/multithreading.rst b/doc/source/reference/random/multithreading.rst index 7ce90af99..6883d3672 100644 --- a/doc/source/reference/random/multithreading.rst +++ b/doc/source/reference/random/multithreading.rst @@ -1,30 +1,32 @@ Multithreaded Generation ======================== -The four core distributions all allow existing arrays to be filled using the -``out`` keyword argument. Existing arrays need to be contiguous and -well-behaved (writable and aligned). Under normal circumstances, arrays +The four core distributions (:meth:`~.Generator.random`, +:meth:`~.Generator.standard_normal`, :meth:`~.Generator.standard_exponential`, +and :meth:`~.Generator.standard_gamma`) all allow existing arrays to be filled +using the ``out`` keyword argument. Existing arrays need to be contiguous and +well-behaved (writable and aligned). Under normal circumstances, arrays created using the common constructors such as :meth:`numpy.empty` will satisfy these requirements. This example makes use of Python 3 :mod:`concurrent.futures` to fill an array using multiple threads. Threads are long-lived so that repeated calls do not require any additional overheads from thread creation. 
The underlying -BitGenerator is `Xoshiro256` which is fast, has a long period and supports -using `Xoshiro256.jumped` to return a new generator while advancing the +BitGenerator is `PCG64` which is fast, has a long period and supports +using `PCG64.jumped` to return a new generator while advancing the state. The random numbers generated are reproducible in the sense that the same seed will produce the same outputs. .. code-block:: ipython - from numpy.random import Generator, Xoshiro256 + from numpy.random import Generator, PCG64 import multiprocessing import concurrent.futures import numpy as np class MultithreadedRNG(object): def __init__(self, n, seed=None, threads=None): - rg = Xoshiro256(seed) + rg = PCG64(seed) if threads is None: threads = multiprocessing.cpu_count() self.threads = threads @@ -89,7 +91,7 @@ The single threaded call directly uses the BitGenerator. .. code-block:: ipython In [5]: values = np.empty(10000000) - ...: rg = Generator(Xoshiro256()) + ...: rg = Generator(PCG64()) ...: %timeit rg.standard_normal(out=values) 99.6 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) @@ -100,7 +102,7 @@ that does not use an existing array due to array creation overhead. .. code-block:: ipython - In [6]: rg = Generator(Xoshiro256()) + In [6]: rg = Generator(PCG64()) ...: %timeit rg.standard_normal(10000000) 125 ms ± 309 µs per loop (mean ± std. dev. 
of 7 runs, 10 loops each) diff --git a/doc/source/reference/random/new-or-different.rst b/doc/source/reference/random/new-or-different.rst index a6de9c8dc..4eb175d57 100644 --- a/doc/source/reference/random/new-or-different.rst +++ b/doc/source/reference/random/new-or-different.rst @@ -19,15 +19,15 @@ Quick comparison of legacy `mtrand <legacy>`_ to the new `Generator` ================== ==================== ============= Feature Older Equivalent Notes ------------------ -------------------- ------------- -`Generator` `RandomState` ``Generator`` requires a stream +`~.Generator` `~.RandomState` ``Generator`` requires a stream source, called a `BitGenerator <bit_generators>` A number of these are provided. ``RandomState`` uses - only the Box- Muller method. + only the Mersenne Twister. ------------------ -------------------- ------------- -``np.random.`` ``np.random.`` Access the values in a BitGenerator, -``Generator().`` ``random_sample()`` convert them to ``float64`` in the -``random()`` interval ``[0.0.,`` `` 1.0)``. +``random`` ``random_sample`` Access the values in a BitGenerator, + convert them to ``float64`` in the + interval ``[0.0.,`` `` 1.0)``. In addition to the ``size`` kwarg, now supports ``dtype='d'`` or ``dtype='f'``, and an ``out`` kwarg to fill a user- @@ -36,8 +36,8 @@ Feature Older Equivalent Notes Many other distributions are also supported. ------------------ -------------------- ------------- -``Generator().`` ``randint``, Use the ``endpoint`` kwarg to adjust -``integers()`` ``random_integers`` the inclusion or exclution of the +``integers`` ``randint``, Use the ``endpoint`` kwarg to adjust + ``random_integers`` the inclusion or exclution of the ``high`` interval endpoint ================== ==================== ============= @@ -56,10 +56,9 @@ And in more detail: random numbers from a discrete uniform distribution. The ``rand`` and ``randn`` methods are only available through the legacy `~.RandomState`. 
This replaces both ``randint`` and the deprecated ``random_integers``. -* The Box-Muller used to produce NumPy's normals is no longer available. +* The Box-Muller method used to produce NumPy's normals is no longer available. * All bit generators can produce doubles, uint64s and - uint32s via CTypes (`~.xoshiro256.Xoshiro256. - ctypes`) and CFFI (`~.xoshiro256.Xoshiro256.cffi`). + uint32s via CTypes (`~PCG64.ctypes`) and CFFI (`~PCG64.cffi`). This allows these bit generators to be used in numba. * The bit generators can be used in downstream projects via Cython. @@ -67,9 +66,9 @@ And in more detail: .. ipython:: python - from numpy.random import Generator, Xoshiro256 + from numpy.random import Generator, PCG64 import numpy.random - rg = Generator(Xoshiro256()) + rg = Generator(PCG64()) %timeit rg.standard_normal(100000) %timeit numpy.random.standard_normal(100000) @@ -94,9 +93,8 @@ And in more detail: .. ipython:: python - rg.bit_generator.seed(0) + rg = Generator(PCG64(0)) rg.random(3, dtype='d') - rg.bit_generator.seed(0) rg.random(3, dtype='f') * Optional ``out`` argument that allows existing arrays to be filled for diff --git a/doc/source/reference/random/parallel.rst b/doc/source/reference/random/parallel.rst index 6c495cc29..2f79f22d8 100644 --- a/doc/source/reference/random/parallel.rst +++ b/doc/source/reference/random/parallel.rst @@ -5,59 +5,147 @@ There are three strategies implemented that can be used to produce repeatable pseudo-random numbers across multiple processes (local or distributed). -.. _independent-streams: - .. currentmodule:: numpy.random -Independent Streams -------------------- +.. _seedsequence-spawn: + +`~SeedSequence` spawning +------------------------ + +`~SeedSequence` `implements an algorithm`_ to process a user-provided seed, +typically as an integer of some size, and to convert it into an initial state for +a `~BitGenerator`. 
It uses hashing techniques to ensure that low-quality seeds +are turned into high quality initial states (at least, with very high +probability). + +For example, `~mt19937.MT19937` has a state consisting of 624 +`uint32` integers. A naive way to take a 32-bit integer seed would be to just set +the last element of the state to the 32-bit seed and leave the rest 0s. This is +a valid state for `~mt19937.MT19937`, but not a good one. The Mersenne Twister +algorithm `suffers if there are too many 0s`_. Similarly, two adjacent 32-bit +integer seeds (i.e. ``12345`` and ``12346``) would produce very similar +streams. + +`~SeedSequence` avoids these problems by using successions of integer hashes +with good `avalanche properties`_ to ensure that flipping any bit in the input +input has about a 50% chance of flipping any bit in the output. Two input seeds +that are very close to each other will produce initial states that are very far +from each other (with very high probability). It is also constructed in such +a way that you can provide arbitrary-sized integers or lists of integers. +`~SeedSequence` will take all of the bits that you provide and mix them +together to produce however many bits the consuming `~BitGenerator` needs to +initialize itself. + +These properties together mean that we can safely mix together the usual +user-provided seed with simple incrementing counters to get `~BitGenerator` +states that are (to very high probability) independent of each other. We can +wrap this together into an API that is easy to use and difficult to misuse. + +.. code-block:: python + + from numpy.random import SeedSequence, default_rng + + ss = SeedSequence(12345) -:class:`~pcg64.PCG64`, :class:`~threefry.ThreeFry` -and :class:`~philox.Philox` support independent streams. This -example shows how many streams can be created by passing in different index -values in the second input while using the same seed in the first. 
+ # Spawn off 10 child SeedSequences to pass to child processes. + child_seeds = ss.spawn(10) + streams = [default_rng(s) for s in child_seeds] + +.. end_block + +Child `~SeedSequence` objects can also spawn to make grandchildren, and so on. +Each `~SeedSequence` has its position in the tree of spawned `~SeedSequence` +objects mixed in with the user-provided seed to generate independent (with very +high probability) streams. .. code-block:: python - from numpy.random.entropy import random_entropy - from numpy.random import PCG64 + grandchildren = child_seeds[0].spawn(4) + grand_streams = [default_rng(s) for s in grandchildren] + +.. end_block + +This feature lets you make local decisions about when and how to split up +streams without coordination between processes. You do not have to preallocate +space to avoid overlapping or request streams from a common global service. This +general "tree-hashing" scheme is `not unique to numpy`_ but not yet widespread. +Python has increasingly-flexible mechanisms for parallelization available, and +this scheme fits in very well with that kind of use. + +Using this scheme, an upper bound on the probability of a collision can be +estimated if one knows the number of streams that you derive. `~SeedSequence` +hashes its inputs, both the seed and the spawn-tree-path, down to a 128-bit +pool by default. The probability that there is a collision in +that pool, pessimistically-estimated ([1]_), will be about :math:`n^2*2^{-128}` where +`n` is the number of streams spawned. If a program uses an aggressive million +streams, about :math:`2^{20}`, then the probability that at least one pair of +them are identical is about :math:`2^{-88}`, which is in solidly-ignorable +territory ([2]_). + +.. [1] The algorithm is carefully designed to eliminate a number of possible + ways to collide. For example, if one only does one level of spawning, it + is guaranteed that all states will be unique. 
But it's easier to + estimate the naive upper bound on a napkin and take comfort knowing + that the probability is actually lower. + +.. [2] In this calculation, we can ignore the amount of numbers drawn from each + stream. Each of the PRNGs we provide has some extra protection built in + that avoids overlaps if the `~SeedSequence` pools differ in the + slightest bit. `~pcg64.PCG64` has :math:`2^{127}` separate cycles + determined by the seed in addition to the position in the + :math:`2^{128}` long period for each cycle, so one has to both get on or + near the same cycle *and* seed a nearby position in the cycle. + `~philox.Philox` has completely independent cycles determined by the seed. + `~sfc64.SFC64` incorporates a 64-bit counter so every unique seed is at + least :math:`2^{64}` iterations away from any other seed. And + finally, `~mt19937.MT19937` has just an unimaginably huge period. Getting + a collision internal to `~SeedSequence` is the way a failure would be + observed. + +.. _`implements an algorithm`: http://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html +.. _`suffers if there are too many 0s`: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html +.. _`avalanche properties`: https://en.wikipedia.org/wiki/Avalanche_effect +.. _`not unique to numpy`: https://www.iro.umontreal.ca/~lecuyer/myftp/papers/parallel-rng-imacs.pdf - entropy = random_entropy(4) - # 128-bit number as a seed - seed = sum([int(entropy[i]) * 2 ** (32 * i) for i in range(4)]) - streams = [PCG64(seed, stream) for stream in range(10)] +.. _independent-streams: -:class:`~philox.Philox` and :class:`~threefry.ThreeFry` are -counter-based RNGs which use a counter and key. Different keys can be used -to produce independent streams. +Independent Streams +------------------- + +:class:`~philox.Philox` is a counter-based RNG based which generates values by +encrypting an incrementing counter using weak cryptographic primitives. 
The +seed determines the key that is used for the encryption. Unique keys create +unique, independent streams. :class:`~philox.Philox` lets you bypass the +seeding algorithm to directly set the 128-bit key. Similar, but different, keys +will still create independent streams. .. code-block:: python - import numpy as np - from numpy.random import ThreeFry + import secrets + from numpy.random import Philox + + # 128-bit number as a seed + root_seed = secrets.getrandbits(128) + streams = [Philox(key=root_seed + stream_id) for stream_id in range(10)] + +.. end_block - key = random_entropy(8) - key = key.view(np.uint64) - key[0] = 0 - step = np.zeros(4, dtype=np.uint64) - step[0] = 1 - streams = [ThreeFry(key=key + stream * step) for stream in range(10)] +This scheme does require that you avoid reusing stream IDs. This may require +coordination between the parallel processes. -.. _jump-and-advance: -Jump/Advance the BitGenerator state ------------------------------------ +.. _parallel-jumped: -Jumped -****** +Jumping the BitGenerator state +------------------------------ ``jumped`` advances the state of the BitGenerator *as-if* a large number of random numbers have been drawn, and returns a new instance with this state. The specific number of draws varies by BitGenerator, and ranges from -:math:`2^{64}` to :math:`2^{512}`. Additionally, the *as-if* draws also depend +:math:`2^{64}` to :math:`2^{128}`. Additionally, the *as-if* draws also depend on the size of the default random number produced by the specific BitGenerator. -The BitGenerator that support ``jumped``, along with the period of the +The BitGenerators that support ``jumped``, along with the period of the BitGenerator, the size of the jump and the bits in the default unsigned random are listed below. @@ -66,74 +154,40 @@ are listed below. 
+=================+=========================+=========================+=========================+ | MT19937 | :math:`2^{19937}` | :math:`2^{128}` | 32 | +-----------------+-------------------------+-------------------------+-------------------------+ -| PCG64 | :math:`2^{128}` | :math:`2^{64}` | 64 | +| PCG64 | :math:`2^{128}` | :math:`~2^{127}` ([3]_) | 64 | +-----------------+-------------------------+-------------------------+-------------------------+ | Philox | :math:`2^{256}` | :math:`2^{128}` | 64 | +-----------------+-------------------------+-------------------------+-------------------------+ -| ThreeFry | :math:`2^{256}` | :math:`2^{128}` | 64 | -+-----------------+-------------------------+-------------------------+-------------------------+ -| Xoshiro256** | :math:`2^{256}` | :math:`2^{128}` | 64 | -+-----------------+-------------------------+-------------------------+-------------------------+ -| Xoshiro512** | :math:`2^{512}` | :math:`2^{256}` | 64 | -+-----------------+-------------------------+-------------------------+-------------------------+ + +.. [3] The jump size is :math:`(\phi-1)*2^{128}` where :math:`\phi` is the + golden ratio. As the jumps wrap around the period, the actual distances + between neighboring streams will slowly grow smaller than the jump size, + but using the golden ratio this way is a classic method of constructing + a low-discrepancy sequence that spreads out the states around the period + optimally. You will not be able to jump enough to make those distances + small enough to overlap in your lifetime. ``jumped`` can be used to produce long blocks which should be long enough to not overlap. .. 
code-block:: python - from numpy.random.entropy import random_entropy - from numpy.random import Xoshiro256 + import secrets + from numpy.random import PCG64 - entropy = random_entropy(2).astype(np.uint64) - # 64-bit number as a seed - seed = entropy[0] * 2**32 + entropy[1] + seed = secrets.getrandbits(128) blocked_rng = [] - rng = Xoshiro256(seed) + rng = PCG64(seed) for i in range(10): blocked_rng.append(rng.jumped(i)) -Advance -******* -``advance`` can be used to jump the state an arbitrary number of steps, and so -is a more general approach than ``jumped``. :class:`~pcg64.PCG64`, -:class:`~threefry.ThreeFry` and :class:`~philox.Philox` -support ``advance``, and since these also support -independent streams, it is not usually necessary to use ``advance``. - -Advancing a BitGenerator updates the underlying state as-if a given number of -calls to the BitGenerator have been made. In general there is not a -one-to-one relationship between the number output random values from a -particular distribution and the number of draws from the core BitGenerator. -This occurs for two reasons: - -* The random values are simulated using a rejection-based method - and so more than one value from the underlying BitGenerator can be required - to generate an single draw. -* The number of bits required to generate a simulated value differs from the - number of bits generated by the underlying BitGenerator. For example, two - 16-bit integer values can be simulated from a single draw of a 32-bit value. - -Advancing the BitGenerator state resets any pre-computed random numbers. This -is required to ensure exact reproducibility. - -This example uses ``advance`` to advance a :class:`~pcg64.PCG64` -generator 2 ** 127 steps to set a sequence of random number generators. - -.. 
code-block:: python - - from numpy.random import PCG64 - bit_generator = PCG64() - bit_generator_copy = PCG64() - bit_generator_copy.state = bit_generator.state - - advance = 2**127 - bit_generators = [bit_generator] - for _ in range(9): - bit_generator_copy.advance(advance) - bit_generator = PCG64() - bit_generator.state = bit_generator_copy.state - bit_generators.append(bit_generator) - -.. end block +.. end_block +When using ``jumped``, one does have to take care not to jump to a stream that +was already used. In the above example, one could not later use +``blocked_rng[0].jumped()`` as it would overlap with ``blocked_rng[1]``. Like +with the independent streams, if the main process here wants to split off 10 +more streams by jumping, then it needs to start with ``range(10, 20)``, +otherwise it would recreate the same streams. On the other hand, if you +carefully construct the streams, then you are guaranteed to have streams that +do not overlap. diff --git a/doc/source/reference/random/performance.py b/doc/source/reference/random/performance.py index f1dc50c9d..28a42eb0d 100644 --- a/doc/source/reference/random/performance.py +++ b/doc/source/reference/random/performance.py @@ -4,10 +4,9 @@ from timeit import repeat import pandas as pd import numpy as np -from numpy.random import MT19937, ThreeFry, PCG64, Philox, \ - Xoshiro256, Xoshiro512 +from numpy.random import MT19937, PCG64, Philox, SFC64 -PRNGS = [MT19937, PCG64, Philox, ThreeFry, Xoshiro256, Xoshiro512] +PRNGS = [MT19937, PCG64, Philox, SFC64] funcs = OrderedDict() integers = 'integers(0, 2**{bits},size=1000000, dtype="uint{bits}")' @@ -55,15 +54,16 @@ for key in npfuncs: col[key] = 1000 * min(t) table['RandomState'] = pd.Series(col) +columns = ['MT19937','PCG64','Philox','SFC64', 'RandomState'] table = pd.DataFrame(table) -table = table.reindex(table.mean(1).sort_values().index) order = np.log(table).mean().sort_values().index table = table.T -table = table.reindex(order) +table = 
table.reindex(columns) table = table.T table = table.reindex([k for k in funcs], axis=0) print(table.to_csv(float_format='%0.1f')) + rel = table.loc[:, ['RandomState']].values @ np.ones( (1, table.shape[1])) / table rel.pop('RandomState') @@ -73,3 +73,15 @@ rel *= 100 rel = np.round(rel) rel = rel.T print(rel.to_csv(float_format='%0d')) + +# Cross-platform table +rows = ['32-bit Unsigned Ints','64-bit Unsigned Ints','Uniforms','Normals','Exponentials'] +xplat = rel.reindex(rows, axis=0) +xplat = 100 * (xplat / xplat.MT19937.values[:,None]) +overall = np.exp(np.log(xplat).mean(0)) +xplat = xplat.T.copy() +xplat['Overall']=overall +print(xplat.T.round(1)) + + + diff --git a/doc/source/reference/random/performance.rst b/doc/source/reference/random/performance.rst index 014db19a3..2d5fca496 100644 --- a/doc/source/reference/random/performance.rst +++ b/doc/source/reference/random/performance.rst @@ -1,19 +1,31 @@ Performance ----------- -.. py:module:: numpy.random - .. currentmodule:: numpy.random Recommendation ************** -The recommended generator for single use is :class:`~.xoshiro256.Xoshiro256`. -The recommended generator for use in large-scale parallel applications is -:class:`~.xoshiro512.Xoshiro512` where the `jumped` method is used to advance -the state. For very large scale applications -- requiring 1,000+ independent -streams -- is the best choice. For very large scale applications -- requiring -1,000+ independent streams, :class:`~pcg64.PCG64` or :class:`~.philox.Philox` -are the best choices. +The recommended generator for general use is :class:`~pcg64.PCG64`. It is +statistically high quality, full-featured, and fast on most platforms, but +somewhat slow when compiled for 32-bit processes. + +:class:`~philox.Philox` is fairly slow, but its statistical properties have +very high quality, and it is easy to get assuredly-independent stream by using +unique keys. 
If that is the style you wish to use for parallel streams, or you +are porting from another system that uses that style, then +:class:`~philox.Philox` is your choice. + +:class:`~sfc64.SFC64` is statistically high quality and very fast. However, it +lacks jumpability. If you are not using that capability and want lots of speed, +even on 32-bit processes, this is your choice. + +:class:`~mt19937.MT19937` `fails some statistical tests`_ and is not especially +fast compared to modern PRNGs. For these reasons, we mostly do not recommend +using it on its own, only through the legacy `~.RandomState` for +reproducing old results. That said, it has a very long history as a default in +many systems. + +.. _`fails some statistical tests`: https://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf Timings ******* @@ -26,47 +38,46 @@ faster generators. Integer performance has a similar ordering. The pattern is similar for other, more complex generators. The normal -performance of the legacy :class:`~mtrand.RandomState` generator is much +performance of the legacy :class:`~.RandomState` generator is much lower than the other since it uses the Box-Muller transformation rather than the Ziggurat generator. The performance gap for Exponentials is also large due to the cost of computing the log function to invert the CDF. The column labeled MT19973 is used the same 32-bit generator as -:class:`~mtrand.RandomState` but produces random values using -:class:`~generator.Generator`. +:class:`~.RandomState` but produces random values using +:class:`~Generator`. .. 
csv-table:: - :header: ,Xoshiro256**,Xoshiro512**,DSFMT,PCG64,MT19937,Philox,RandomState,ThreeFry - :widths: 14,14,14,14,14,14,14,14,14 - - 32-bit Unsigned Ints,2.6,2.9,3.5,3.2,3.3,4.8,3.2,7.6 - 64-bit Unsigned Ints,3.3,4.3,5.7,4.8,5.7,6.9,5.7,12.8 - Uniforms,3.4,4.0,3.2,5.0,7.3,8.0,7.3,12.8 - Normals,7.9,9.0,11.8,11.3,13.0,13.7,34.4,18.1 - Exponentials,4.7,5.2,7.4,6.7,7.9,8.6,40.3,14.7 - Gammas,29.1,27.5,28.5,30.6,34.2,35.1,58.1,47.6 - Binomials,22.7,23.1,21.1,25.7,27.7,28.4,25.9,32.1 - Laplaces,38.5,38.1,36.9,41.1,44.5,45.4,46.9,50.2 - Poissons,46.9,50.9,46.4,58.1,68.4,70.2,86.0,88.2 - + :header: ,MT19937,PCG64,Philox,SFC64,RandomState + :widths: 14,14,14,14,14,14 + + 32-bit Unsigned Ints,3.2,2.7,4.9,2.7,3.2 + 64-bit Unsigned Ints,5.6,3.7,6.3,2.9,5.7 + Uniforms,7.3,4.1,8.1,3.1,7.3 + Normals,13.1,10.2,13.5,7.8,34.6 + Exponentials,7.9,5.4,8.5,4.1,40.3 + Gammas,34.8,28.0,34.7,25.1,58.1 + Binomials,25.0,21.4,26.1,19.5,25.2 + Laplaces,45.1,40.7,45.5,38.1,45.6 + Poissons,67.6,52.4,69.2,46.4,78.1 The next table presents the performance in percentage relative to values -generated by the legagy generator, `RandomState(MT19937())`. The overall +generated by the legacy generator, `RandomState(MT19937())`. The overall performance was computed using a geometric mean. .. 
csv-table:: - :header: ,Xoshiro256**,Xoshiro256**,DSFMT,PCG64,MT19937,Philox,ThreeFry - :widths: 14,14,14,14,14,14,14,14 - - 32-bit Unsigned Ints,124,113,93,100,99,67,43 - 64-bit Unsigned Ints,174,133,100,118,100,83,44 - Uniforms,212,181,229,147,100,91,57 - Normals,438,382,291,304,264,252,190 - Exponentials,851,770,547,601,512,467,275 - Gammas,200,212,204,190,170,166,122 - Binomials,114,112,123,101,93,91,81 - Laplaces,122,123,127,114,105,103,93 - Poissons,183,169,185,148,126,123,98 - Overall,212,194,180,167,145,131,93 + :header: ,MT19937,PCG64,Philox,SFC64 + :widths: 14,14,14,14,14 + + 32-bit Unsigned Ints,101,121,67,121 + 64-bit Unsigned Ints,102,156,91,199 + Uniforms,100,179,90,235 + Normals,263,338,257,443 + Exponentials,507,752,474,985 + Gammas,167,207,167,231 + Binomials,101,118,96,129 + Laplaces,101,112,100,120 + Poissons,116,149,113,168 + Overall,144,192,132,225 .. note:: @@ -87,33 +98,34 @@ across tables. 64-bit Linux ~~~~~~~~~~~~ -=================== ======= ========= ======= ======== ========== ============ -Distribution DSFMT MT19937 PCG64 Philox ThreeFry Xoshiro256 -=================== ======= ========= ======= ======== ========== ============ -32-bit Unsigned Int 99.3 100 113.9 72.1 48.3 117.1 -64-bit Unsigned Int 105.7 100 143.3 89.7 48.1 161.7 -Uniform 222.1 100 181.5 90.8 59.9 204.7 -Exponential 110.8 100 145.5 92.5 55.0 177.1 -Normal 113.2 100 121.4 98.3 71.9 162.0 -**Overall** 123.9 100 139.3 88.2 56.0 161.9 -=================== ======= ========= ======= ======== ========== ============ +=================== ========= ======= ======== ======= +Distribution MT19937 PCG64 Philox SFC64 +=================== ========= ======= ======== ======= +32-bit Unsigned Int 100 119.8 67.7 120.2 +64-bit Unsigned Int 100 152.9 90.8 213.3 +Uniforms 100 179.0 87.0 232.0 +Normals 100 128.5 99.2 167.8 +Exponentials 100 148.3 93.0 189.3 +**Overall** 100 144.3 86.8 180.0 +=================== ========= ======= ======== ======= 64-bit Windows ~~~~~~~~~~~~~~ -The performance 
on 64-bit Linux and 64-bit Windows is broadly similar. +The relative performance on 64-bit Linux and 64-bit Windows is broadly similar. + +=================== ========= ======= ======== ======= +Distribution MT19937 PCG64 Philox SFC64 +=================== ========= ======= ======== ======= +32-bit Unsigned Int 100 129.1 35.0 135.0 +64-bit Unsigned Int 100 146.9 35.7 176.5 +Uniforms 100 165.0 37.0 192.0 +Normals 100 128.5 48.5 158.0 +Exponentials 100 151.6 39.0 172.8 +**Overall** 100 143.6 38.7 165.7 +=================== ========= ======= ======== ======= -=================== ======= ========= ======= ======== ========== ============ -Distribution DSFMT MT19937 PCG64 Philox ThreeFry Xoshiro256 -=================== ======= ========= ======= ======== ========== ============ -32-bit Unsigned Int 122.8 100 134.9 44.1 72.3 133.1 -64-bit Unsigned Int 130.4 100 162.7 41.0 77.7 142.3 -Uniform 273.2 100 200.0 44.8 84.6 175.8 -Exponential 135.0 100 167.8 47.4 84.5 166.9 -Normal 115.3 100 135.6 60.3 93.6 169.6 -**Overall** 146.7 100 158.4 47.1 82.2 156.5 -=================== ======= ========= ======= ======== ========== ============ 32-bit Windows ~~~~~~~~~~~~~~ @@ -122,20 +134,20 @@ The performance of 64-bit generators on 32-bit Windows is much lower than on 64- operating systems due to register width. MT19937, the generator that has been in NumPy since 2005, operates on 32-bit integers. 
-=================== ======= ========= ======= ======== ========== ============ -Distribution DSFMT MT19937 PCG64 Philox ThreeFry Xoshiro256 -=================== ======= ========= ======= ======== ========== ============ -32-bit Unsigned Int 110.9 100 30.6 28.1 29.2 74.4 -64-bit Unsigned Int 104.7 100 24.2 23.7 22.7 72.7 -Uniform 247.0 100 26.7 28.4 27.8 78.8 -Exponential 110.1 100 32.1 32.6 30.5 89.6 -Normal 107.2 100 36.3 37.5 35.2 93.0 -**Overall** 127.6 100 29.7 29.7 28.8 81.3 -=================== ======= ========= ======= ======== ========== ============ +=================== ========= ======= ======== ======= +Distribution MT19937 PCG64 Philox SFC64 +=================== ========= ======= ======== ======= +32-bit Unsigned Int 100 30.5 21.1 77.9 +64-bit Unsigned Int 100 26.3 19.2 97.0 +Uniforms 100 28.0 23.0 106.0 +Normals 100 40.1 31.3 112.6 +Exponentials 100 33.7 26.3 109.8 +**Overall** 100 31.4 23.8 99.8 +=================== ========= ======= ======== ======= .. note:: - Linux timings used Ubuntu 18.04 and GCC 7.4. Windows timings were made on Windows 10 - using Microsoft C/C++ Optimizing Compiler Version 19 (Visual Studio 2015). All timings - were produced on a i5-3570 processor. + Linux timings used Ubuntu 18.04 and GCC 7.4. Windows timings were made on + Windows 10 using Microsoft C/C++ Optimizing Compiler Version 19 (Visual + Studio 2015). All timings were produced on a i5-3570 processor. diff --git a/doc/source/reference/routines.char.rst b/doc/source/reference/routines.char.rst index b58dd60b3..ed8393855 100644 --- a/doc/source/reference/routines.char.rst +++ b/doc/source/reference/routines.char.rst @@ -1,11 +1,13 @@ String operations ***************** -.. currentmodule:: numpy.core.defchararray +.. currentmodule:: numpy.char -This module provides a set of vectorized string operations for arrays -of type `numpy.string_` or `numpy.unicode_`. All of them are based on -the string methods in the Python standard library. +.. 
module:: numpy.char + +The `numpy.char` module provides a set of vectorized string +operations for arrays of type `numpy.string_` or `numpy.unicode_`. +All of them are based on the string methods in the Python standard library. String operations ----------------- diff --git a/doc/source/release.rst b/doc/source/release.rst index f8d83726f..8dfb8db1d 100644 --- a/doc/source/release.rst +++ b/doc/source/release.rst @@ -2,6 +2,7 @@ Release Notes ************* +.. include:: ../release/1.18.0-notes.rst .. include:: ../release/1.17.0-notes.rst .. include:: ../release/1.16.4-notes.rst .. include:: ../release/1.16.3-notes.rst |
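The performance.rst hunk above states that the "Overall" rows were computed using a geometric mean, and the benchmark script hunk does this with `np.exp(np.log(xplat).mean(0))`. The following is a minimal standard-library sketch of that calculation, not the script itself. The names `geometric_mean` and `linux64` are hypothetical, and the numbers are copied by hand from the 64-bit Linux table in the patch (MT19937 = 100).

```python
import math


def geometric_mean(values):
    """Return the geometric mean as exp(mean(log(v))); values must be > 0."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Relative speeds (MT19937 = 100) from the 64-bit Linux table in the patch.
# Row order: 32-bit ints, 64-bit ints, uniforms, normals, exponentials.
linux64 = {
    "PCG64": [119.8, 152.9, 179.0, 128.5, 148.3],
    "Philox": [67.7, 90.8, 87.0, 99.2, 93.0],
    "SFC64": [120.2, 213.3, 232.0, 167.8, 189.3],
}

# One "Overall" figure per bit generator, as in the table's last row.
overall = {name: geometric_mean(column) for name, column in linux64.items()}

for name, value in overall.items():
    print(f"{name}: {value:.1f}")
```

A geometric mean suits ratios like these because it treats a 2x speedup and a 2x slowdown symmetrically, which an arithmetic mean of percentages would not.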