Python - Perform a 2-sample t-test
I have the mean, std dev, and n of sample 1 and sample 2. The samples are taken from the sample population, measured by different labs.
n is different for sample 1 and sample 2. I want to do a weighted (take n into account) two-tailed t-test.
I tried using the scipy.stats module by creating my numbers with np.random.normal, since it only takes data and not stat values like the mean and std dev (is there any way to use these values directly?). It didn't work, since the data arrays have to be of equal size.
Any help on how to get the p-value would be highly appreciated.
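If you have the original data as arrays a and b, you can use scipy.stats.ttest_ind with the argument equal_var=False (this performs Welch's t-test, which does not require equal variances or equal sample sizes):
    t, p = ttest_ind(a, b, equal_var=False)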
If you have only the summary statistics of the two data sets, you can calculate the t value using scipy.stats.ttest_ind_from_stats (added to scipy in version 0.16) or from the formula (http://en.wikipedia.org/wiki/welch%27s_t_test).
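For reference, the formula mentioned above can be written out as follows. The symbols are chosen here for illustration: \bar{x}_a, s_a^2 and n_a stand for the sample mean, the sample variance (computed with ddof=1) and the size of the first sample, and likewise for the second. The degrees of freedom use the Welch-Satterthwaite approximation, and F_\nu denotes the CDF of Student's t distribution with \nu degrees of freedom.

```latex
t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}},
\qquad
\nu = \frac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^2}
           {\dfrac{s_a^4}{n_a^2\,(n_a - 1)} + \dfrac{s_b^4}{n_b^2\,(n_b - 1)}},
\qquad
p = 2\,F_\nu\!\left(-|t|\right)
```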
The following script shows the possibilities.
    from __future__ import print_function

    import numpy as np
    from scipy.stats import ttest_ind, ttest_ind_from_stats
    from scipy.special import stdtr

    np.random.seed(1)

    # Create sample data.
    a = np.random.randn(40)
    b = 4*np.random.randn(50)

    # Use scipy.stats.ttest_ind.
    t, p = ttest_ind(a, b, equal_var=False)
    print("ttest_ind:            t = %g  p = %g" % (t, p))

    # Compute the descriptive statistics of a and b.
    abar = a.mean()
    avar = a.var(ddof=1)
    na = a.size
    adof = na - 1

    bbar = b.mean()
    bvar = b.var(ddof=1)
    nb = b.size
    bdof = nb - 1

    # Use scipy.stats.ttest_ind_from_stats.
    t2, p2 = ttest_ind_from_stats(abar, np.sqrt(avar), na,
                                  bbar, np.sqrt(bvar), nb,
                                  equal_var=False)
    print("ttest_ind_from_stats: t = %g  p = %g" % (t2, p2))

    # Use the formulas directly.
    tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
    dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
    pf = 2*stdtr(dof, -np.abs(tf))
    print("formula:              t = %g  p = %g" % (tf, pf))
The output:
    ttest_ind:            t = -1.5827  p = 0.118873
    ttest_ind_from_stats: t = -1.5827  p = 0.118873
    formula:              t = -1.5827  p = 0.118873
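For the situation in the question, where only the mean, std dev, and n of each sample are known, a minimal sketch might look like the following. The numbers are made up purely for illustration, and welch_from_stats is a hypothetical helper name, not part of scipy. It assumes the reported std devs are sample standard deviations (ddof=1), which is what ttest_ind_from_stats expects.

```python
import numpy as np
from scipy.stats import ttest_ind_from_stats, t as student_t


def welch_from_stats(mean1, std1, n1, mean2, std2, n2):
    """Two-tailed Welch's t-test from summary statistics (hypothetical helper)."""
    v1 = std1**2 / n1  # squared standard error of the first mean
    v2 = std2**2 / n2  # squared standard error of the second mean
    t_stat = (mean1 - mean2) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    # Two-tailed p-value from the survival function of Student's t.
    p_value = 2 * student_t.sf(abs(t_stat), dof)
    return t_stat, dof, p_value


# Made-up summary statistics for two labs with different sample sizes.
mean1, std1, n1 = 10.2, 1.5, 12
mean2, std2, n2 = 9.1, 2.3, 20

# Library route: no raw data arrays are needed.
t_lib, p_lib = ttest_ind_from_stats(mean1, std1, n1,
                                    mean2, std2, n2,
                                    equal_var=False)

# Manual route, as a cross-check of the formula.
t_man, dof, p_man = welch_from_stats(mean1, std1, n1, mean2, std2, n2)

print("scipy:  t = %g  p = %g" % (t_lib, p_lib))
print("manual: t = %g  p = %g  (dof = %g)" % (t_man, p_man, dof))
```

Both routes should agree to within floating-point precision. The different sample sizes enter through the std**2 / n terms, so no separate weighting step is needed.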