Performance
-----------

.. py:module:: numpy.random

.. currentmodule:: numpy.random

Recommendation
**************
The recommended generator for single use is :class:`~.xoshiro256.Xoshiro256`.
The recommended generator for use in large-scale parallel applications is
:class:`~.xoshiro512.Xoshiro512`, where the `jumped` method is used to advance
the state. For very large scale applications, requiring 1,000+ independent
streams, :class:`~pcg64.PCG64` or :class:`~.philox.Philox` are the best
choices.

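The jump-based pattern for parallel streams can be sketched as follows. This
is a minimal sketch, assuming a NumPy release (1.17 or later) that provides
``Generator``, ``PCG64`` and ``PCG64.jumped``; the Xoshiro generators named
above may not be available in your release, so ``PCG64`` stands in for them
here. The helper ``make_parallel_generators`` is hypothetical.

.. code-block:: python

    import numpy as np


    def make_parallel_generators(seed, n_streams):
        """Hypothetical helper: one Generator per stream, each built from a
        PCG64 state that has been jumped a different number of times."""
        base = np.random.PCG64(seed)
        # jumped(i) returns a new bit generator advanced as if i jumps had
        # been taken; the state of ``base`` itself is left unchanged.
        return [np.random.Generator(base.jumped(i)) for i in range(1, n_streams + 1)]


    streams = make_parallel_generators(seed=12345, n_streams=4)
    for i, rng in enumerate(streams):
        # Each stream could be handed to a separate worker process.
        print(i, rng.standard_normal(3))

Each jump advances the state by a very large fixed amount, so the streams do
not overlap in practice. ``np.random.Philox(seed).jumped(i)`` can be
substituted directly, and NumPy also offers
``np.random.SeedSequence(seed).spawn(n)`` as an alternative way to derive
independent streams.
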
Timings
*******

The timings below are the time, in nanoseconds, required to produce one random
value from a specific distribution. The original :class:`~mt19937.MT19937`
generator is much slower since it requires two 32-bit values to equal the
output of the faster generators.

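Per-value timings of this kind can be approximated by timing a large block of
draws and dividing by the block size. The sketch below is not the script used
to produce the tables; it assumes NumPy 1.17 or later, covers only the bit
generators shipped with NumPy, and the helper ``ns_per_value`` is
hypothetical. Absolute numbers will depend on your hardware and compiler.

.. code-block:: python

    import timeit

    import numpy as np


    def ns_per_value(bit_generator_cls, method="random", n=1_000_000, repeat=3, seed=0):
        """Hypothetical helper: best-of-``repeat`` time, in ns, per value drawn
        with ``Generator.<method>`` on top of ``bit_generator_cls``."""
        draw = getattr(np.random.Generator(bit_generator_cls(seed)), method)
        # Drawing one large block per call amortises the Python call overhead,
        # so the result approximates the per-value cost of the C loop.
        best = min(timeit.repeat(lambda: draw(n), number=1, repeat=repeat))
        return best / n * 1e9


    for cls in (np.random.MT19937, np.random.PCG64, np.random.Philox):
        for method in ("random", "standard_normal", "standard_exponential"):
            print(f"{cls.__name__:8s} {method:22s} {ns_per_value(cls, method):6.1f} ns")

Taking the minimum over several repeats reduces the influence of other
processes competing for the CPU.
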
Integer performance has a similar ordering, although DSFMT is slower since
it generates 53-bit floating point values rather than integer values.

The pattern is similar for other, more complex distributions. The performance
of normal variates from the legacy :class:`~mtrand.RandomState` generator is
much lower than the others since it uses the Box-Muller transformation rather
than the Ziggurat method. The performance gap for exponentials is also large
due to the cost of computing the log function to invert the CDF.
The column labeled MT19937 uses the same 32-bit generator as
:class:`~mtrand.RandomState` but produces random values using
:class:`~generator.Generator`.

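The cost difference comes from the transcendental functions each method
evaluates. The following is a minimal sketch, in vectorised NumPy, of the
textbook Box-Muller transform and of exponential sampling by CDF inversion.
It is an illustration, not NumPy's C implementation (the legacy
``RandomState`` uses the polar variant of Box-Muller, which replaces the sine
and cosine with a rejection step), and the function names are hypothetical.

.. code-block:: python

    import numpy as np


    def box_muller(u1, u2):
        """Map two independent U(0, 1] arrays to two independent N(0, 1) arrays."""
        # Every pair of outputs costs a log, a sqrt, a cos and a sin.
        r = np.sqrt(-2.0 * np.log(u1))
        theta = 2.0 * np.pi * u2
        return r * np.cos(theta), r * np.sin(theta)


    def exponential_by_inversion(u):
        """Invert the Exp(1) CDF F(x) = 1 - exp(-x), giving x = -log(1 - u)."""
        # One log per value; log1p(-u) stays accurate when u is small.
        return -np.log1p(-u)


    rng = np.random.default_rng(2019)
    u1 = 1.0 - rng.random(100_000)  # values in (0, 1], so log(u1) is finite
    u2 = rng.random(100_000)
    z1, z2 = box_muller(u1, u2)
    e = exponential_by_inversion(rng.random(100_000))
    print(z1.mean(), z1.std(), e.mean())

A Ziggurat sampler instead accepts most draws after a table lookup and a
single comparison, evaluating a logarithm or exponential only in rare cases.
``Generator.standard_exponential`` exposes both approaches through its
``method`` argument (``'zig'`` or ``'inv'``), which makes the gap easy to
measure.
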
.. csv-table::
    :header: ,Xoshiro256**,Xoshiro512**,DSFMT,PCG64,MT19937,Philox,RandomState,ThreeFry
    :widths: 14,14,14,14,14,14,14,14,14

    32-bit Unsigned Ints,2.6,2.9,3.5,3.2,3.3,4.8,3.2,7.6
    64-bit Unsigned Ints,3.3,4.3,5.7,4.8,5.7,6.9,5.7,12.8
    Uniforms,3.4,4.0,3.2,5.0,7.3,8.0,7.3,12.8
    Normals,7.9,9.0,11.8,11.3,13.0,13.7,34.4,18.1
    Exponentials,4.7,5.2,7.4,6.7,7.9,8.6,40.3,14.7
    Gammas,29.1,27.5,28.5,30.6,34.2,35.1,58.1,47.6
    Binomials,22.7,23.1,21.1,25.7,27.7,28.4,25.9,32.1
    Laplaces,38.5,38.1,36.9,41.1,44.5,45.4,46.9,50.2
    Poissons,46.9,50.9,46.4,58.1,68.4,70.2,86.0,88.2


The next table presents the performance of each generator as a percentage of
the performance of the legacy generator, `RandomState(MT19937())`; higher
values are faster. The overall performance was computed using a geometric mean.

.. csv-table::
    :header: ,Xoshiro256**,Xoshiro512**,DSFMT,PCG64,MT19937,Philox,ThreeFry
    :widths: 14,14,14,14,14,14,14,14

    32-bit Unsigned Ints,124,113,93,100,99,67,43
    64-bit Unsigned Ints,174,133,100,118,100,83,44
    Uniforms,212,181,229,147,100,91,57
    Normals,438,382,291,304,264,252,190
    Exponentials,851,770,547,601,512,467,275
    Gammas,200,212,204,190,170,166,122
    Binomials,114,112,123,101,93,91,81
    Laplaces,122,123,127,114,105,103,93
    Poissons,183,169,185,148,126,123,98
    Overall,212,194,180,167,145,131,93

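Both the percentages and the overall figure can be reproduced with a few lines
of code. This is a minimal sketch; the helper names are hypothetical, and the
input values are the Xoshiro256** percentages from the table above.

.. code-block:: python

    import math


    def relative_performance(reference_ns, candidate_ns):
        """Speed of ``candidate`` as a percentage of ``reference`` (100 = equal)."""
        # A faster candidate takes less time per value, so it scores above 100.
        return 100.0 * reference_ns / candidate_ns


    def geometric_mean(values):
        """Geometric mean, computed in log space to avoid overflow."""
        return math.exp(sum(math.log(v) for v in values) / len(values))


    # Per-distribution percentages from the Xoshiro256** column of the table above.
    xoshiro256 = [124, 174, 212, 438, 851, 200, 114, 122, 183]
    print(round(geometric_mean(xoshiro256)))  # 212, matching the Overall row

A geometric mean suits ratios because the resulting ranking does not depend on
which generator is chosen as the reference. Because the timings in the first
table are rounded, recomputing a percentage from them gives a slightly
different value (for example, 100 × 3.2 / 2.6 ≈ 123 rather than 124 for 32-bit
integers).
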
.. note::

   All timings were taken using Linux on an Intel Core i5-3570 processor.

Performance on different Operating Systems
******************************************
Performance differs across platforms because of differences in compilers and
in hardware features such as register width. The default bit generator has
been chosen to perform well on 64-bit platforms. Performance on 32-bit
operating systems is very different: generators that operate on 64-bit values
are much slower there.

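The relevant width is that of the running Python build: a 32-bit Python on
64-bit Windows is compiled for 32-bit registers and behaves like a 32-bit
platform. A program can check this at run time. The sketch below is
hypothetical; ``pick_bit_generator`` and its fallback rule are illustrative,
based on the 32-bit Windows figures in this section, and are not a NumPy API.

.. code-block:: python

    import struct

    import numpy as np


    def pointer_bits():
        """Width in bits of a C pointer in this Python build (32 or 64)."""
        return struct.calcsize("P") * 8


    def pick_bit_generator(seed=None):
        """Hypothetical helper: prefer a 64-bit generator on 64-bit builds."""
        # On 32-bit builds the 64-bit generators are considerably slower than
        # MT19937, so fall back to the 32-bit MT19937 there.
        if pointer_bits() == 64:
            return np.random.PCG64(seed)
        return np.random.MT19937(seed)


    rng = np.random.Generator(pick_bit_generator(42))
    print(pointer_bits(), type(rng.bit_generator).__name__)
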
The values reported are normalized relative to the speed of MT19937 in
each table. A value of 100 indicates that the performance matches that of
MT19937. Higher values indicate improved performance. These values cannot be
compared across tables.

64-bit Linux
~~~~~~~~~~~~

===================  =======  =========  =======  ========  ==========  ============
Distribution           DSFMT    MT19937    PCG64    Philox    ThreeFry    Xoshiro256
===================  =======  =========  =======  ========  ==========  ============
32-bit Unsigned Int     99.3        100    113.9      72.1        48.3         117.1
64-bit Unsigned Int    105.7        100    143.3      89.7        48.1         161.7
Uniform                222.1        100    181.5      90.8        59.9         204.7
Exponential            110.8        100    145.5      92.5        55.0         177.1
Normal                 113.2        100    121.4      98.3        71.9         162.0
**Overall**            123.9        100    139.3      88.2        56.0         161.9
===================  =======  =========  =======  ========  ==========  ============


64-bit Windows
~~~~~~~~~~~~~~
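The performance on 64-bit Linux and 64-bit Windows is broadly similar. The
largest differences are for Philox, which is relatively slower on Windows, and
ThreeFry, which is relatively faster.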


===================  =======  =========  =======  ========  ==========  ============
Distribution           DSFMT    MT19937    PCG64    Philox    ThreeFry    Xoshiro256
===================  =======  =========  =======  ========  ==========  ============
32-bit Unsigned Int    122.8        100    134.9      44.1        72.3         133.1
64-bit Unsigned Int    130.4        100    162.7      41.0        77.7         142.3
Uniform                273.2        100    200.0      44.8        84.6         175.8
Exponential            135.0        100    167.8      47.4        84.5         166.9
Normal                 115.3        100    135.6      60.3        93.6         169.6
**Overall**            146.7        100    158.4      47.1        82.2         156.5
===================  =======  =========  =======  ========  ==========  ============

32-bit Windows
~~~~~~~~~~~~~~

The performance of 64-bit generators on 32-bit Windows is much lower than on
64-bit operating systems because 64-bit arithmetic must be carried out with
32-bit registers. DSFMT uses SSE2 when available, and so is less affected by
the narrower general-purpose registers. MT19937, the generator that has been
in NumPy since 2005, operates on 32-bit integers and so performs close to DSFMT.

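The 32-bit nature of MT19937 can be observed directly. A minimal sketch,
assuming NumPy 1.17 or later, that uses ``BitGenerator.random_raw`` to read the
native output words; NumPy returns these as ``uint64`` regardless of the
generator's word size.

.. code-block:: python

    import numpy as np

    # random_raw exposes the bit generator's native outputs, bypassing any
    # transformation into a distribution.
    mt_words = np.random.MT19937(7).random_raw(1000)
    pcg_words = np.random.PCG64(7).random_raw(1000)

    # MT19937 produces 32-bit words, so every value fits in 32 bits ...
    print(mt_words.dtype, int(mt_words.max()) < 2**32)
    # ... while PCG64 fills all 64 bits of each word.
    print(pcg_words.dtype, int(pcg_words.max()) >= 2**32)
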
===================  =======  =========  =======  ========  ==========  ============
Distribution           DSFMT    MT19937    PCG64    Philox    ThreeFry    Xoshiro256
===================  =======  =========  =======  ========  ==========  ============
32-bit Unsigned Int    110.9        100     30.6      28.1        29.2          74.4
64-bit Unsigned Int    104.7        100     24.2      23.7        22.7          72.7
Uniform                247.0        100     26.7      28.4        27.8          78.8
Exponential            110.1        100     32.1      32.6        30.5          89.6
Normal                 107.2        100     36.3      37.5        35.2          93.0
**Overall**            127.6        100     29.7      29.7        28.8          81.3
===================  =======  =========  =======  ========  ==========  ============


.. note::

   Linux timings used Ubuntu 18.04 and GCC 7.4.  Windows timings were made on Windows 10
   using Microsoft C/C++ Optimizing Compiler Version 19 (Visual Studio 2015). All timings
   were produced on an i5-3570 processor.