T Tests

import scipy.stats  # import the submodule explicitly so scipy.stats.ttest_ind is available
from IPython.display import Markdown as md, display

from nycschools import schools, exams
demo = schools.load_school_demographics()
df = exams.load_math_ela_long()
df = df[df["mean_scale_score"].notnull()]
df = df.merge(demo, how="inner", on=["dbn", "ay"])

df.head()
      dbn       grade      category  number_tested  mean_scale_score  level_1_n  level_1_pct  level_2_n  level_2_pct  level_3_n  ...  missing_race_ethnicity_data_pct  swd_n  swd_pct  ell_n  ell_pct  poverty_n  poverty_pct  eni_pct        clean_name    zip
0  01M015           3  All Students             29        301.551727        8.0     0.275862        9.0     0.310345        7.0  ...                              0.0     51    0.287     12    0.067        152        0.854    0.882  roberto clemente  10009
1  01M015           4  All Students             23        301.391296        6.0     0.260870        8.0     0.347826        8.0  ...                              0.0     51    0.287     12    0.067        152        0.854    0.882  roberto clemente  10009
2  01M015           5  All Students             17        322.000000        2.0     0.117647        5.0     0.294118        8.0  ...                              0.0     51    0.287     12    0.067        152        0.854    0.882  roberto clemente  10009
3  01M015  All Grades  All Students             69        306.536224       16.0     0.231884       22.0     0.318841       23.0  ...                              0.0     51    0.287     12    0.067        152        0.854    0.882  roberto clemente  10009
4  01M015           3       Not SWD             23        307.652161        4.0     0.173913        9.0     0.391304        6.0  ...                              0.0     51    0.287     12    0.067        152        0.854    0.882  roberto clemente  10009

5 rows × 69 columns

# compare charter schools (district 84) with non-charter schools
df["charter"] = df.district == 84
# keep community school districts 1-32 plus district 84 (excludes other special districts)
data = df[df.district.isin(list(range(1, 33)) + [84])]

# just get all students
data = data[data.category=="All Students"]

# remove null data
data = data[data["mean_scale_score"].notnull()]

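# show the correlation between charter status and test score
# NOTE: this selects from `df`, not the filtered `data`, so the district and
# category filters above do not apply to the results shown below
data = df[["mean_scale_score", "charter"]]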
data.corr()
                  mean_scale_score   charter
mean_scale_score          1.000000  0.015811
charter                   0.015811  1.000000
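Because `charter` is a boolean column, the Pearson coefficient above is a point-biserial correlation, and it carries the same information as an equal-variance two-sample t statistic through t = r * sqrt((n - 2) / (1 - r^2)). The sketch below checks that identity on synthetic data. The group sizes, means, spreads, and seed are made up for illustration and are not taken from the NYC results.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (hypothetical values, not the NYC school results)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=510, scale=100, size=200)   # plays the role of charter rows
group_b = rng.normal(loc=500, scale=100, size=2000)  # plays the role of community rows

scores = np.concatenate([group_a, group_b])
is_a = np.concatenate([np.ones(group_a.size), np.zeros(group_b.size)])

# Pearson correlation between the 0/1 indicator and the scores
r = np.corrcoef(is_a, scores)[0, 1]

# Convert r to a t statistic with n - 2 degrees of freedom
n = scores.size
t_from_r = r * np.sqrt((n - 2) / (1 - r**2))

# Student's (equal-variance) t-test on the same two groups
t_direct = stats.ttest_ind(group_a, group_b).statistic

print(f"r={r:.4f}, t from r={t_from_r:.4f}, t from ttest_ind={t_direct:.4f}")
```

The two t values agree up to floating-point error, which is why a very small correlation can still be highly significant when n is large.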
# split the data into two groups for the t test
charter = data[data.charter]
community = data[~data.charter]
# run a t-test to see if there is a statistically significant difference between charter and non-charter test results
t = scipy.stats.ttest_ind(charter.mean_scale_score, community.mean_scale_score)

# the scipy results include the t-value and the p-value
# t is the test statistic; p is the probability of seeing a difference at least
# this large if the two groups actually had the same mean (the null hypothesis)
t
Ttest_indResult(statistic=9.1114087193616, pvalue=8.175234619664718e-20)
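By default `scipy.stats.ttest_ind` runs Student's t-test, which assumes both groups share one variance. When group sizes are very unequal, as they are here, a common sanity check is Welch's variant (`equal_var=False`). The sketch below compares the two on synthetic data. The sample sizes, means, spreads, and seed are hypothetical and chosen only to mimic one small and one large group.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (hypothetical): one small group, one large group
rng = np.random.default_rng(42)
small_group = rng.normal(loc=515, scale=135, size=600)
large_group = rng.normal(loc=500, scale=140, size=30000)

# Student's t-test: assumes both groups share one variance (scipy's default)
student = stats.ttest_ind(small_group, large_group, equal_var=True)

# Welch's t-test: drops the equal-variance assumption
welch = stats.ttest_ind(small_group, large_group, equal_var=False)

print(f"Student: t={student.statistic:.4f}, p={student.pvalue:.3g}")
print(f"Welch:   t={welch.statistic:.4f}, p={welch.pvalue:.3g}")
```

If the two variants tell the same story, the equal-variance assumption is not driving the conclusion.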
# when reporting the results we also need these values

# sample size (number of school-level results in each group)
n_charter = len(charter)
n_community = len(community)

# mean
M_charter = charter.mean_scale_score.mean()
M_community = community.mean_scale_score.mean()

# standard deviation 
sd_charter = charter.mean_scale_score.std()
sd_community = community.mean_scale_score.std()


display(md(f"""
**T-Test results** comparing school-level average scores on the grade 3-8 ELA and Math exams
for charter schools (`n={n_charter}`) and community schools (`n={n_community}`).

- Charter test results: M={M_charter:.2f}, SD={sd_charter:.2f}
- Community test results: M={M_community:.2f}, SD={sd_community:.2f}
- T-score: {t.statistic:.4f}, p-val: {t.pvalue:.3g}

`n` values report the number of school average test results observed, not the number of test takers.
"""))

T-Test results comparing school-level average scores on the grade 3-8 ELA and Math exams for charter schools (n=5983) and community schools (n=326004).

  • Charter test results: M=517.14, SD=134.96

  • Community test results: M=500.42, SD=140.75

  • T-score: 9.1114, p-val: 8.18e-20

n values report the number of school average test results observed, not the number of test takers.
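Reporting n, M, and SD for each group is enough for a reader to rerun the test without the raw data. A minimal sketch using `scipy.stats.ttest_ind_from_stats`, with the rounded figures from the report above typed in by hand:

```python
from scipy import stats

# Summary statistics copied from the rounded report above
n_charter, M_charter, sd_charter = 5983, 517.14, 134.96
n_community, M_community, sd_community = 326004, 500.42, 140.75

# Equal-variance t-test computed from means, SDs, and counts alone
res = stats.ttest_ind_from_stats(
    mean1=M_charter, std1=sd_charter, nobs1=n_charter,
    mean2=M_community, std2=sd_community, nobs2=n_community,
    equal_var=True,
)
print(f"t={res.statistic:.2f}, p={res.pvalue:.3g}")
```

Because the inputs are rounded to two decimals, the statistic should match the reported t-score only approximately (about 9.11).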

pingouin stats wrapper

The pingouin library has a number of functions that “wrap” standard Python stats functions to add information and nicer formatting out of the box. ttest is one of these functions. The output below reports the t value and p value, as scipy.stats does, along with degrees of freedom, a 95% confidence interval, Cohen's d, a Bayes factor, and power, so we don't have to calculate these separately for each test. Passing correction=False requests the standard equal-variance test, which matches the scipy call above.

import pingouin as pg
pg.ttest(charter.mean_scale_score, community.mean_scale_score, correction=False)
               T     dof alternative         p-val           CI95%   cohen-d       BF10  power
T-test  9.111409  331985   two-sided  8.175235e-20  [13.12, 20.32]  0.118871  1.516e+16    1.0
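To see where some of these extra columns come from, the sketch below recomputes the degrees of freedom, Cohen's d, and the 95% confidence interval from the rounded summary statistics reported earlier in this section. It uses the textbook pooled-standard-deviation formulas, which are assumed (not confirmed from pingouin's source) to be what the library applies for an equal-variance test; the results agree with the table to rounding.

```python
import math
from scipy import stats

# Rounded summary statistics reported earlier in this section
n1, m1, s1 = 5983, 517.14, 134.96      # charter
n2, m2, s2 = 326004, 500.42, 140.75    # community

dof = n1 + n2 - 2  # degrees of freedom for the equal-variance test

# Pooled standard deviation across both groups
pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / dof)

# Cohen's d: mean difference in units of the pooled SD
cohen_d = (m1 - m2) / pooled_sd

# 95% confidence interval for the difference in means
se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, dof)
ci_low = (m1 - m2) - t_crit * se
ci_high = (m1 - m2) + t_crit * se

print(f"dof={dof}, d={cohen_d:.3f}, CI95%=[{ci_low:.2f}, {ci_high:.2f}]")
```

A Cohen's d near 0.12 is conventionally read as a small effect, so the tiny p-value here mostly reflects the very large number of observations.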