Remove poorly cited reference

Clarify point about std computation
Use absolute value for p-value computation of 2-tailed test
author: François Le Lay <fly@spotify.com> 2021-02-12 09:26:45 -0500
committer: François Le Lay <fly@spotify.com> 2021-02-12 09:26:45 -0500
commit: 98bf466b42aea9bb804275af6f11d1c7cfdebbad (patch)
tree: 335e46bc8e422f383ee73dce9d6adf8c81444386
parent: e29d0955c7c4e482e4128dde68ba5ff2246b8dbf (diff)
download: numpy-98bf466b42aea9bb804275af6f11d1c7cfdebbad.tar.gz
2 files changed, 16 insertions, 24 deletions
diff --git a/numpy/random/_generator.pyx b/numpy/random/_generator.pyx
index 003c16113..3a84470ea 100644
--- a/numpy/random/_generator.pyx
+++ b/numpy/random/_generator.pyx
@@ -1722,9 +1722,6 @@ cdef class Generator:
                Springer, 2002.
         .. [2] Wikipedia, "Student's t-distribution"
                https://en.wikipedia.org/wiki/Student's_t-distribution
-        .. [3] Walker, Helen M., "Degrees of freedom" in Journal of 
-               Educational Psychology
-               http://www.nohsteachers.info/pcaso/ap_statistics/PDFs/DegreesOfFreedom.pdf
 
         Examples
         --------
@@ -1740,11 +1737,12 @@ cdef class Generator:
         either positive or negative, hence making our test 2-tailed. 
 
         Because we are estimating the mean and we have N=11 values in our sample,
-        we have N-1=10 degrees of freedom [3]_. We set our signifance level to 95% and 
-        compute the t statistic using the empirical mean and standard deviation
-        of our intake, setting the ddof parameter to the unbiased
-        value so the divisor in the standard deviation will be degrees of
-        freedom.
+        we have N-1=10 degrees of freedom. We set our signifance level to 95% and 
+        compute the t statistic using the empirical mean and empirical standard 
+        deviation of our intake. We use a ddof of 1 to base the computation of our 
+        empirical standard deviation on an unbiased estimate of the variance (note:
+        the final estimate is not unbiased due to the concave nature of the square 
+        root).
 
         >>> np.mean(intake)
         6753.636363636364
@@ -1764,10 +1762,8 @@ cdef class Generator:
         Does our t statistic land in one of the two critical regions found at 
         both tails of the distribution?
 
-        >>> np.sum(s<t) / float(len(s))
-        0.009159  #random <0.025, statistic is in critical region 
-        >>> 2*0.009159
-        0.018318  #random
+        >>> np.sum(np.abs(t) < np.abs(s)) / float(len(s))
+        0.018318  #random < 0.05, statistic is in critical region
 
         The probability value for this 2-tailed test is about 1.83%, which is 
         lower than the 5% pre-determined significance threshold. 
diff --git a/numpy/random/mtrand.pyx b/numpy/random/mtrand.pyx
index 8429cf17d..683705a05 100644
--- a/numpy/random/mtrand.pyx
+++ b/numpy/random/mtrand.pyx
@@ -2137,9 +2137,6 @@ cdef class RandomState:
                Springer, 2002.
         .. [2] Wikipedia, "Student's t-distribution"
                https://en.wikipedia.org/wiki/Student's_t-distribution
-        .. [3] Walker, Helen M., "Degrees of freedom" in Journal of 
-               Educational Psychology
-               http://www.nohsteachers.info/pcaso/ap_statistics/PDFs/DegreesOfFreedom.pdf
 
         Examples
         --------
@@ -2155,11 +2152,12 @@ cdef class RandomState:
         either positive or negative, hence making our test 2-tailed. 
 
         Because we are estimating the mean and we have N=11 values in our sample,
-        we have N-1=10 degrees of freedom [3]_. We set our signifance level to 95% and 
-        compute the t statistic using the empirical mean and standard deviation
-        of our intake, setting the ddof parameter to the unbiased
-        value so the divisor in the standard deviation will be degrees of
-        freedom.
+        we have N-1=10 degrees of freedom. We set our signifance level to 95% and 
+        compute the t statistic using the empirical mean and empirical standard 
+        deviation of our intake. We use a ddof of 1 to base the computation of our 
+        empirical standard deviation on an unbiased estimate of the variance (note:
+        the final estimate is not unbiased due to the concave nature of the square 
+        root).
 
         >>> np.mean(intake)
         6753.636363636364
@@ -2179,10 +2177,8 @@ cdef class RandomState:
         Does our t statistic land in one of the two critical regions found at 
         both tails of the distribution?
 
-        >>> np.sum(s<t) / float(len(s))
-        0.009159  #random < 0.025, statistic is in critical region
-        >>> 2*0.009159
-        0.018318  #random
+        >>> np.sum(np.abs(t) < np.abs(s)) / float(len(s))
+        0.018318  #random < 0.05, statistic is in critical region
 
         The probability value for this 2-tailed test is about 1.83%, which is 
         lower than the 5% pre-determined significance threshold.
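
For reference, here is a minimal sketch of the two-tailed computation this patch introduces in the docstring examples. The `intake` values and the reference value 7725 are not part of this diff. They are assumed from the surrounding `standard_t` docstring example (the array reproduces the mean of 6753.636363636364 shown in the hunk context). The seed and sample size are arbitrary choices for illustration.

```python
import numpy as np

# Assumed sample: taken from the standard_t docstring example, not from this diff.
intake = np.array([5260., 5470, 5640, 6180, 6390, 6515, 6805,
                   7515, 7515, 8230, 8770])
reference = 7725  # assumed recommended value that the sample is tested against

# ddof=1 divides the sum of squared deviations by N-1, which gives an
# unbiased estimate of the variance. Its square root is still a slightly
# biased estimate of the standard deviation, because sqrt is concave.
sample_std = intake.std(ddof=1)
t = (np.mean(intake) - reference) / (sample_std / np.sqrt(len(intake)))

# Draw from Student's t with N-1 degrees of freedom (seed chosen arbitrarily).
rng = np.random.default_rng(0)
s = rng.standard_t(len(intake) - 1, size=1_000_000)

# Two-tailed p-value: fraction of draws at least as extreme as t in either
# direction. Comparing absolute values covers both tails in one step.
p_two_tailed = np.sum(np.abs(t) < np.abs(s)) / float(len(s))

# The removed version: one tail only, then doubled by hand. This only works
# when t is negative, which is why the absolute-value form is more robust.
p_one_tail_doubled = 2 * np.sum(s < t) / float(len(s))

print(t, p_two_tailed, p_one_tail_doubled)
```

Because the t-distribution is symmetric, both estimates should agree up to sampling noise when t is negative. If t were positive, `np.sum(s < t)` would count almost every draw and the doubled value would exceed 1, whereas the absolute-value form still yields a valid p-value.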