MAINT: random: Rewrite the hypergeometric distribution.

Summary of the changes:

* Move the functions random_hypergeometric_hyp, random_hypergeometric_hrua
  and random_hypergeometric from distributions.c to legacy-distributions.c.
  These are now the legacy implementation of hypergeometric.
* Add the files logfactorial.c and logfactorial.h, containing the function
  logfactorial(int64_t k).
* Add the files random_hypergeometric.c and random_hypergeometric.h,
  containing the function random_hypergeometric (the new implementation of
  the hypergeometric distribution). See more details below.
* Fix two tests in numpy/random/tests/test_generator_mt19937.py that used
  values returned by the hypergeometric distribution. The new
  implementation changes the stream, so those tests needed to be updated.
* Remove another test obviated by an added constraint on the arguments of
  hypergeometric.

Details of the rewrite:

If you carefully step through the old function rk_hypergeometric_hyp(),
you'll see that the end result is basically the same as the new function
hypergeometric_sample(), but the new function accomplishes the result with
just integers. The floating point calculations in the old code caused
problems when the arguments were extremely large (explained in more detail
in the unmerged pull request https://github.com/numpy/numpy/pull/9834).

The new version of hypergeometric_hrua() is a new translation of
Stadlober's ratio-of-uniforms algorithm for the hypergeometric
distribution. It fixes a mistake in the old implementation that made the
method less efficient than it could be (see the details in the unmerged
pull request https://github.com/numpy/numpy/pull/11138), and uses a faster
function for computing log(k!).

The HRUA algorithm suffers from loss of floating point precision when the
arguments are *extremely* large (see the comments in github issue 11443).
To avoid these problems, the arguments `ngood` and `nbad` of hypergeometric
must be less than 10**9. This constraint obviates an existing regression
test that was run on systems with 64 bit long integers, so that test was
removed.
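
To illustrate what "accomplishes the result with just integers" can look
like, here is a minimal Python sketch of an integer-only sampling loop. It
is not the C code added by this commit: the function name
hypergeometric_sample_sketch is hypothetical, and Python's random.Random
stands in for the bit generator. The idea is to draw items one at a time
without replacement, using only integer comparisons, and to draw the
complement when more than half of the population is requested.

```python
import random


def hypergeometric_sample_sketch(rng, good, bad, sample):
    """Draw one hypergeometric variate using only integer arithmetic.

    Illustrative sketch only (hypothetical name, not NumPy's C code).
    `rng` is a random.Random instance standing in for the bit generator.
    """
    total = good + bad
    # Draw whichever is smaller: the sample itself or its complement.
    use_complement = sample > total // 2
    to_draw = total - sample if use_complement else sample

    remaining_total = total
    remaining_good = good
    while to_draw > 0 and remaining_good > 0 and remaining_total > remaining_good:
        remaining_total -= 1
        # randint is inclusive at both ends, so this picks one of the
        # items that were remaining before the decrement, uniformly.
        if rng.randint(0, remaining_total) < remaining_good:
            remaining_good -= 1  # the drawn item was "good"
        to_draw -= 1

    if remaining_total == remaining_good:
        # Only "good" items are left, so every remaining draw is "good".
        remaining_good -= to_draw

    if use_complement:
        # The drawn items were the ones left out of the sample.
        return remaining_good
    return good - remaining_good
```

For example, hypergeometric_sample_sketch(random.Random(0), 18, 3, 11)
always returns a value between 8 and 11, the support of the distribution
for those arguments. No floating point value appears anywhere in the loop,
which is the property the commit message attributes to the new function.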
author: Warren Weckesser <warren.weckesser@gmail.com> 2019-06-10 23:00:39 -0400
committer: Warren Weckesser <warren.weckesser@gmail.com> 2019-06-14 16:14:31 -0400
commit: b2d2b677c26c8da7052bbc653b20ad9717f078fe (patch)
tree: efbeb7af0d5ae96d7ee61cafc8625c937dad5d2b /numpy/random/tests
parent: e3eb3986dd87e700a694d6b4151c96ef92dfabe0 (diff)
download: numpy-b2d2b677c26c8da7052bbc653b20ad9717f078fe.tar.gz
2 files changed, 11 insertions, 13 deletions
diff --git a/numpy/random/tests/test_generator_mt19937.py b/numpy/random/tests/test_generator_mt19937.py
index c6259c42a..44324c9e3 100644
--- a/numpy/random/tests/test_generator_mt19937.py
+++ b/numpy/random/tests/test_generator_mt19937.py
@@ -897,9 +897,9 @@ class TestRandomDist(object):
     def test_hypergeometric(self):
         random.bit_generator.seed(self.seed)
         actual = random.hypergeometric(10.1, 5.5, 14, size=(3, 2))
-        desired = np.array([[10, 10],
-                            [10, 10],
-                            [9, 9]])
+        desired = np.array([[9, 9],
+                            [10, 9],
+                            [9, 10]])
         assert_array_equal(actual, desired)
 
         # Test nbad = 0
@@ -1875,7 +1875,7 @@ class TestBroadcast(object):
         bad_nsample_one = [-1]
         bad_nsample_two = [4]
         hypergeom = random.hypergeometric
-        desired = np.array([1, 1, 1])
+        desired = np.array([0, 0, 1])
 
         self.set_seed()
         actual = hypergeom(ngood * 3, nbad, nsample)
@@ -1906,6 +1906,11 @@ class TestBroadcast(object):
         assert_raises(ValueError, hypergeom, 10, 10, -1)
         assert_raises(ValueError, hypergeom, 10, 10, 25)
 
+        # ValueError for arguments that are too big.
+        assert_raises(ValueError, hypergeom, 2**30, 10, 20)
+        assert_raises(ValueError, hypergeom, 999, 2**31, 50)
+        assert_raises(ValueError, hypergeom, 999, [2**29, 2**30], 1000)
+
     def test_logseries(self):
         p = [0.5]
         bad_p_one = [2]
diff --git a/numpy/random/tests/test_generator_mt19937_regressions.py b/numpy/random/tests/test_generator_mt19937_regressions.py
index cd4e08d6f..7dca65071 100644
--- a/numpy/random/tests/test_generator_mt19937_regressions.py
+++ b/numpy/random/tests/test_generator_mt19937_regressions.py
@@ -23,15 +23,8 @@ class TestRegression(object):
         assert_(np.all(mt19937.hypergeometric(18, 3, 11, size=10) > 0))
 
         # Test for ticket #5623
-        args = [
-            (2**20 - 2, 2**20 - 2, 2**20 - 2),  # Check for 32-bit systems
-        ]
-        is_64bits = sys.maxsize > 2**32
-        if is_64bits and sys.platform != 'win32':
-            # Check for 64-bit systems
-            args.append((2**40 - 2, 2**40 - 2, 2**40 - 2))
-        for arg in args:
-            assert_(mt19937.hypergeometric(*arg) > 0)
+        args = (2**20 - 2, 2**20 - 2, 2**20 - 2)  # Check for 32-bit systems
+        assert_(mt19937.hypergeometric(*args) > 0)
 
     def test_logseries_convergence(self):
         # Test for ticket #923
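
The ValueError assertions added to TestBroadcast above exercise the new
argument limit. A rough pure-Python sketch of the kind of check they imply
is shown below. The helper name check_hypergeometric_args and the error
messages are hypothetical; the real validation lives in NumPy's own code
and is applied element-wise to broadcast array arguments, which this
scalar sketch does not attempt.

```python
# Limit stated in the commit message: ngood and nbad must be below 10**9.
HYPERGEOM_ARG_LIMIT = 10**9


def check_hypergeometric_args(ngood, nbad, nsample):
    """Raise ValueError for scalar arguments the tests expect to be rejected.

    Hypothetical helper for illustration; not NumPy's implementation.
    """
    if ngood < 0 or nbad < 0 or nsample < 0:
        raise ValueError("ngood, nbad and nsample must be non-negative")
    if ngood >= HYPERGEOM_ARG_LIMIT or nbad >= HYPERGEOM_ARG_LIMIT:
        raise ValueError("both ngood and nbad must be less than 10**9")
    if nsample > ngood + nbad:
        raise ValueError("nsample must not exceed ngood + nbad")
```

With this sketch, (2**30, 10, 20) and (999, 2**31, 50) are rejected because
2**30 = 1073741824 already exceeds 10**9, while (2**20 - 2, 2**20 - 2,
2**20 - 2), the case kept in the regression test, passes the check.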