Performance
-----------

.. currentmodule:: numpy.random

Recommendation
**************
The recommended generator for general use is `PCG64`. It is
statistically high quality, full-featured, and fast on most platforms, but
somewhat slow when compiled for 32-bit processes.
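
As a minimal sketch of that recommendation, the snippet below wraps `PCG64`
in a `Generator` and draws a few values. The seed ``12345`` and the variable
names are arbitrary choices made for this example.

.. code-block:: python

    import numpy as np

    # Wrap the recommended bit generator in a Generator to sample distributions.
    # The seed 12345 is an arbitrary value chosen for this example.
    rng = np.random.Generator(np.random.PCG64(12345))

    uniforms = rng.random(5)           # floats in the half-open interval [0, 1)
    normals = rng.standard_normal(5)   # draws from the standard normal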

`Philox` is fairly slow, but its statistical properties are of very high
quality, and it is easy to get assuredly independent streams by using
unique keys. If that is the style you wish to use for parallel streams, or you
are porting from another system that uses that style, then
`Philox` is your choice.
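
The key-based style described above might look like the following sketch. The
key values ``0`` to ``3`` and the number of streams are assumptions made for
illustration; the only requirement is that each stream receives a unique key.

.. code-block:: python

    import numpy as np

    # One Philox instance per stream, each with its own key.
    # The keys 0..3 are arbitrary example values; they only need to be unique.
    n_streams = 4
    generators = [
        np.random.Generator(np.random.Philox(key=k)) for k in range(n_streams)
    ]

    # Each generator now produces its own sequence of values.
    draws = np.array([g.random(3) for g in generators])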

`SFC64` is statistically high quality and very fast. However, it
lacks jumpability. If you are not using that capability and want lots of speed,
even on 32-bit processes, this is your choice.
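
To make the jumpability distinction concrete, the sketch below calls
``jumped()`` on a `PCG64` instance and then checks which bit generator classes
expose that method. The seed ``2024`` and the dictionary name ``has_jump`` are
choices made for this example only.

.. code-block:: python

    import numpy as np

    pcg = np.random.PCG64(2024)  # arbitrary example seed

    # jumped() returns a new bit generator whose state is advanced far ahead
    # of the original, which is one way to set up parallel streams.
    far_ahead = pcg.jumped()

    # Record which bit generator classes provide a jumped() method.
    has_jump = {
        cls.__name__: hasattr(cls, "jumped")
        for cls in (np.random.PCG64, np.random.Philox,
                    np.random.MT19937, np.random.SFC64)
    }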

`MT19937` `fails some statistical tests`_ and is not especially
fast compared to modern PRNGs. For these reasons, we mostly do not recommend
using it on its own, only through the legacy `~.RandomState` for
reproducing old results. That said, it has a very long history as a default in
many systems.

.. _`fails some statistical tests`: https://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf

Timings
*******

The timings below give the time in nanoseconds needed to produce one random
value from a specific distribution.  The original `MT19937` generator is
much slower because it requires two 32-bit values to match the 64-bit output
of the faster generators.

Integer performance has a similar ordering.

The pattern is similar for other, more complex distributions. The
performance of the legacy `RandomState` generator for normals is much
lower than that of the others since it uses the Box-Muller transformation
rather than the Ziggurat method. The performance gap for exponentials is also
large due to the cost of computing the log function to invert the CDF.
The column labeled MT19937 uses the same 32-bit generator as
`RandomState` but produces random values using
`Generator`.

.. csv-table::
    :header: ,MT19937,PCG64,Philox,SFC64,RandomState
    :widths: 14,14,14,14,14,14

    32-bit Unsigned Ints,3.2,2.7,4.9,2.7,3.2
    64-bit Unsigned Ints,5.6,3.7,6.3,2.9,5.7
    Uniforms,7.3,4.1,8.1,3.1,7.3
    Normals,13.1,10.2,13.5,7.8,34.6
    Exponentials,7.9,5.4,8.5,4.1,40.3
    Gammas,34.8,28.0,34.7,25.1,58.1
    Binomials,25.0,21.4,26.1,19.5,25.2
    Laplaces,45.1,40.7,45.5,38.1,45.6
    Poissons,67.6,52.4,69.2,46.4,78.1
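
Comparable per-value timings can be approximated with ``timeit``. The sketch
below is not the benchmark that produced the table above; the batch size, the
repeat counts, and the choice of normals are assumptions made for
illustration, and the numbers will vary by machine.

.. code-block:: python

    import timeit

    import numpy as np

    n = 100_000  # values drawn per call; an arbitrary size for this sketch

    # Generator(MT19937()) and RandomState share the same 32-bit bit generator
    # type but use different algorithms to produce normals.
    sources = {
        "MT19937": np.random.Generator(np.random.MT19937()).standard_normal,
        "PCG64": np.random.Generator(np.random.PCG64()).standard_normal,
        "RandomState": np.random.RandomState().standard_normal,
    }

    ns_per_value = {}
    for name, draw in sources.items():
        # Best of three repeats of ten calls, converted to ns per value.
        best = min(timeit.repeat(lambda: draw(n), number=10, repeat=3)) / 10
        ns_per_value[name] = best * 1e9 / n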

The next table presents performance as a percentage of the speed of the
legacy generator, ``RandomState(MT19937())``; values above 100 indicate faster
generation. The overall performance was computed using a geometric mean.

.. csv-table::
    :header: ,MT19937,PCG64,Philox,SFC64
    :widths: 14,14,14,14,14

    32-bit Unsigned Ints,101,121,67,121
    64-bit Unsigned Ints,102,156,91,199
    Uniforms,100,179,90,235
    Normals,263,338,257,443
    Exponentials,507,752,474,985
    Gammas,167,207,167,231
    Binomials,101,118,96,129
    Laplaces,101,112,100,120
    Poissons,116,149,113,168
    Overall,144,192,132,225
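
The "Overall" row can be checked by hand. The sketch below takes the PCG64
column of the table above and computes its geometric mean; the variable names
are choices made for this example. The result should round to the reported
value of 192.

.. code-block:: python

    import math

    # PCG64 column of the relative-performance table (RandomState = 100).
    pcg64_relative = [121, 156, 179, 338, 752, 207, 118, 112, 149]

    # Geometric mean: exponentiate the arithmetic mean of the logarithms.
    log_mean = sum(math.log(v) for v in pcg64_relative) / len(pcg64_relative)
    overall = math.exp(log_mean)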

.. note::

   All timings were taken using Linux on an i5-3570 processor.

Performance on different Operating Systems
******************************************
Performance differs across platforms because of differences in compilers and
hardware (e.g., register width). The default bit generator has been chosen
to perform well on 64-bit platforms.  Performance on 32-bit operating systems
is very different.

The values reported are normalized relative to the speed of MT19937 in
each table. A value of 100 indicates that the performance matches that of
MT19937. Higher values indicate improved performance. These values cannot be
compared across tables.

64-bit Linux
~~~~~~~~~~~~

===================   =========  =======  ========  =======
Distribution            MT19937    PCG64    Philox    SFC64
===================   =========  =======  ========  =======
32-bit Unsigned Int         100    119.8      67.7    120.2
64-bit Unsigned Int         100    152.9      90.8    213.3
Uniforms                    100    179.0      87.0    232.0
Normals                     100    128.5      99.2    167.8
Exponentials                100    148.3      93.0    189.3
**Overall**                 100    144.3      86.8    180.0
===================   =========  =======  ========  =======


64-bit Windows
~~~~~~~~~~~~~~
The relative performance on 64-bit Linux and 64-bit Windows is broadly similar,
with the notable exception of `Philox`, which is markedly slower relative to
MT19937 on Windows.


===================   =========  =======  ========  =======
Distribution            MT19937    PCG64    Philox    SFC64
===================   =========  =======  ========  =======
32-bit Unsigned Int         100    129.1      35.0    135.0
64-bit Unsigned Int         100    146.9      35.7    176.5
Uniforms                    100    165.0      37.0    192.0
Normals                     100    128.5      48.5    158.0
Exponentials                100    151.6      39.0    172.8
**Overall**                 100    143.6      38.7    165.7
===================   =========  =======  ========  =======


32-bit Windows
~~~~~~~~~~~~~~

The performance of 64-bit generators on 32-bit Windows is much lower than on 64-bit
operating systems due to register width. MT19937, the generator that has been
in NumPy since 2005, operates on 32-bit integers.

===================   =========  =======  ========  =======
Distribution            MT19937    PCG64    Philox    SFC64
===================   =========  =======  ========  =======
32-bit Unsigned Int         100     30.5      21.1     77.9
64-bit Unsigned Int         100     26.3      19.2     97.0
Uniforms                    100     28.0      23.0    106.0
Normals                     100     40.1      31.3    112.6
Exponentials                100     33.7      26.3    109.8
**Overall**                 100     31.4      23.8     99.8
===================   =========  =======  ========  =======
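
Given these differences, one option is to pick the bit generator according to
the width of the running Python process. The sketch below is only an
illustration of that idea, not a NumPy recommendation: the selection rule, the
seed ``7``, and the variable names are assumptions made for this example.

.. code-block:: python

    import sys

    import numpy as np

    # sys.maxsize exceeds 2**32 only in a 64-bit Python process.
    is_64bit = sys.maxsize > 2**32

    # Prefer PCG64 on 64-bit builds; fall back to SFC64 on 32-bit builds,
    # where the tables above show it holding up better than PCG64.
    bit_generator_cls = np.random.PCG64 if is_64bit else np.random.SFC64
    rng = np.random.Generator(bit_generator_cls(7))  # 7 is an arbitrary seed
    sample = rng.random(3)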


.. note::

   Linux timings used Ubuntu 18.04 and GCC 7.4.  Windows timings were made on
   Windows 10 using Microsoft C/C++ Optimizing Compiler Version 19 (Visual
   Studio 2015). All timings were produced on an i5-3570 processor.