author: François Le Lay <fly@spotify.com> 2021-02-12 09:26:45 -0500
committer: François Le Lay <fly@spotify.com> 2021-02-12 09:26:45 -0500
commit: 98bf466b42aea9bb804275af6f11d1c7cfdebbad
tree: 335e46bc8e422f383ee73dce9d6adf8c81444386
parent: e29d0955c7c4e482e4128dde68ba5ff2246b8dbf
Remove poorly cited reference

Clarify point about std computation
Use absolute value for p-value computation of 2-tailed test
-rw-r--r-- numpy/random/_generator.pyx | 20
-rw-r--r-- numpy/random/mtrand.pyx | 20
2 files changed, 16 insertions, 24 deletions
diff --git a/numpy/random/_generator.pyx b/numpy/random/_generator.pyx
index 003c16113..3a84470ea 100644
--- a/numpy/random/_generator.pyx
+++ b/numpy/random/_generator.pyx
@@ -1722,9 +1722,6 @@ cdef class Generator:
Springer, 2002.
.. [2] Wikipedia, "Student's t-distribution"
https://en.wikipedia.org/wiki/Student's_t-distribution
- .. [3] Walker, Helen M., "Degrees of freedom" in Journal of
- Educational Psychology
- http://www.nohsteachers.info/pcaso/ap_statistics/PDFs/DegreesOfFreedom.pdf
Examples
--------
@@ -1740,11 +1737,12 @@ cdef class Generator:
either positive or negative, hence making our test 2-tailed.
Because we are estimating the mean and we have N=11 values in our sample,
- we have N-1=10 degrees of freedom [3]_. We set our signifance level to 95% and
- compute the t statistic using the empirical mean and standard deviation
- of our intake, setting the ddof parameter to the unbiased
- value so the divisor in the standard deviation will be degrees of
- freedom.
+ we have N-1=10 degrees of freedom. We set our signifance level to 95% and
+ compute the t statistic using the empirical mean and empirical standard
+ deviation of our intake. We use a ddof of 1 to base the computation of our
+ empirical standard deviation on an unbiased estimate of the variance (note:
+ the final estimate is not unbiased due to the concave nature of the square
+ root).
>>> np.mean(intake)
6753.636363636364
@@ -1764,10 +1762,8 @@ cdef class Generator:
Does our t statistic land in one of the two critical regions found at
both tails of the distribution?
- >>> np.sum(s<t) / float(len(s))
- 0.009159 #random <0.025, statistic is in critical region
- >>> 2*0.009159
- 0.018318 #random
+ >>> np.sum(np.abs(t) < np.abs(s)) / float(len(s))
+ 0.018318 #random < 0.05, statistic is in critical region
The probability value for this 2-tailed test is about 1.83%, which is
lower than the 5% pre-determined significance threshold.
diff --git a/numpy/random/mtrand.pyx b/numpy/random/mtrand.pyx
index 8429cf17d..683705a05 100644
--- a/numpy/random/mtrand.pyx
+++ b/numpy/random/mtrand.pyx
@@ -2137,9 +2137,6 @@ cdef class RandomState:
Springer, 2002.
.. [2] Wikipedia, "Student's t-distribution"
https://en.wikipedia.org/wiki/Student's_t-distribution
- .. [3] Walker, Helen M., "Degrees of freedom" in Journal of
- Educational Psychology
- http://www.nohsteachers.info/pcaso/ap_statistics/PDFs/DegreesOfFreedom.pdf
Examples
--------
@@ -2155,11 +2152,12 @@ cdef class RandomState:
either positive or negative, hence making our test 2-tailed.
Because we are estimating the mean and we have N=11 values in our sample,
- we have N-1=10 degrees of freedom [3]_. We set our signifance level to 95% and
- compute the t statistic using the empirical mean and standard deviation
- of our intake, setting the ddof parameter to the unbiased
- value so the divisor in the standard deviation will be degrees of
- freedom.
+ we have N-1=10 degrees of freedom. We set our signifance level to 95% and
+ compute the t statistic using the empirical mean and empirical standard
+ deviation of our intake. We use a ddof of 1 to base the computation of our
+ empirical standard deviation on an unbiased estimate of the variance (note:
+ the final estimate is not unbiased due to the concave nature of the square
+ root).
>>> np.mean(intake)
6753.636363636364
@@ -2179,10 +2177,8 @@ cdef class RandomState:
Does our t statistic land in one of the two critical regions found at
both tails of the distribution?
- >>> np.sum(s<t) / float(len(s))
- 0.009159 #random < 0.025, statistic is in critical region
- >>> 2*0.009159
- 0.018318 #random
+ >>> np.sum(np.abs(t) < np.abs(s)) / float(len(s))
+ 0.018318 #random < 0.05, statistic is in critical region
The probability value for this 2-tailed test is about 1.83%, which is
lower than the 5% pre-determined significance threshold.
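
The revised docstring example can be reproduced end to end with a short script. The following is a minimal sketch, not part of the patch: the `intake` values and the recommended value of 7725 are assumed from the full `standard_t` docstring (they are not visible in the hunks above), and the seed and sample size are arbitrary choices. It computes the t statistic with `ddof=1`, then estimates the two-tailed p-value by comparing absolute values, which should come out close to twice the one-tailed estimate that the old doctest doubled by hand.

```python
import numpy as np

# Assumed data: daily energy intake (kJ) from the full standard_t docstring;
# these values are not shown in the diff hunks above.
intake = np.array([5260., 5470, 5640, 6180, 6390, 6515, 6805,
                   7515, 7515, 8230, 8770])
recommended = 7725.0  # assumed reference value for the null hypothesis

n = len(intake)

# ddof=1 divides the sum of squared deviations by N-1, which gives an
# unbiased estimate of the variance. Its square root is still a slightly
# biased estimate of the standard deviation.
sample_std = intake.std(ddof=1)

# One-sample t statistic against the recommended value.
t = (intake.mean() - recommended) / (sample_std / np.sqrt(n))

# Draw from Student's t with N-1 degrees of freedom (seed is arbitrary).
rng = np.random.default_rng(12345)
s = rng.standard_t(df=n - 1, size=1_000_000)

# Two-tailed p-value: fraction of draws at least as extreme as t in
# either direction, as in the patched doctest.
p_two_tailed = np.sum(np.abs(t) < np.abs(s)) / float(len(s))

# One-tailed estimate used by the old doctest, kept for comparison.
p_lower = np.sum(s < t) / float(len(s))

print(f"t = {t:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}, 2 * one-tailed p = {2 * p_lower:.4f}")
```

Because the t distribution is symmetric about zero, both estimates target the same quantity; the absolute-value form just does it in one step and works regardless of the sign of `t`.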