T Tests

import scipy.stats
from IPython.display import Markdown as md, display

from nycschools import schools, exams

demo = schools.load_school_demographics()
df = exams.load_math_ela_long()
df = df[df["mean_scale_score"].notnull()]
df = df.merge(demo, how="inner", on=["dbn", "ay"])

df.head()

	dbn	grade	category	number_tested	mean_scale_score	level_1_n	level_1_pct	level_2_n	level_2_pct	level_3_n	...	swd_n	swd_pct	ell_n	ell_pct	poverty_n	poverty_pct	eni_pct	clean_name	zip
0	01M015	3	All Students	29	301.551727	8.0	0.275862	9.0	0.310345	7.0	...	51	0.287	12	0.067	152	0.854	0.882	roberto clemente	10009
1	01M015	4	All Students	23	301.391296	6.0	0.260870	8.0	0.347826	8.0	...	51	0.287	12	0.067	152	0.854	0.882	roberto clemente	10009
2	01M015	5	All Students	17	322.000000	2.0	0.117647	5.0	0.294118	8.0	...	51	0.287	12	0.067	152	0.854	0.882	roberto clemente	10009
3	01M015	All Grades	All Students	69	306.536224	16.0	0.231884	22.0	0.318841	23.0	...	51	0.287	12	0.067	152	0.854	0.882	roberto clemente	10009
4	01M015	3	Not SWD	23	307.652161	4.0	0.173913	9.0	0.391304	6.0	...	51	0.287	12	0.067	152	0.854	0.882	roberto clemente	10009

5 rows × 69 columns

# let's look at charter schools vs non-charter schools 
df["charter"] = df.district == 84
# get the schools in the range of 1-32 and district 84 (excludes other special districts)
data = df[df.district.isin(list(range(1,33)) + [84])]

# just get all students
data = data[data.category=="All Students"]

# remove null data
data = data[data["mean_scale_score"].notnull()]

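# show the correlation between charter status and test score
# note: this selects from `df`, not the filtered `data` above, so the district and category filters are not applied to what follows
data = df[["mean_scale_score", "charter"]]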
data.corr()

	mean_scale_score	charter
mean_scale_score	1.000000	0.015811
charter	0.015811	1.000000

# split the data into two groups for the t-test
charter = data[data.charter]
community = data[~data.charter]
# run an independent-samples t-test to see if there is a statistically significant difference between charter and non-charter test results
t = scipy.stats.ttest_ind(charter.mean_scale_score, community.mean_scale_score)

# the scipy results include the t-value and the p-value
# t is the test statistic, and p is the probability of seeing a difference at least this large if the two groups truly had the same mean
t

Ttest_indResult(statistic=9.1114087193616, pvalue=8.175234619664718e-20)
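
The small correlation shown earlier and the large t statistic describe the same relationship. For a pooled-variance two-sample t-test, the Pearson correlation between the scores and a 0/1 group flag (the point-biserial correlation) equals t / sqrt(t^2 + dof), where dof = n1 + n2 - 2. The sketch below checks this identity on synthetic data; the group sizes, means, and spreads are made-up values for illustration, not the school data.

```python
import numpy as np
from scipy import stats

# Synthetic scores for two groups (illustrative values only, not the school data)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=517, scale=135, size=200)    # stands in for "charter"
group_b = rng.normal(loc=500, scale=140, size=5000)   # stands in for "community"

# Pooled-variance t-test, the default for scipy.stats.ttest_ind
res = stats.ttest_ind(group_a, group_b)
dof = len(group_a) + len(group_b) - 2

# Correlation implied by the t statistic
r_from_t = res.statistic / np.sqrt(res.statistic**2 + dof)

# Direct Pearson correlation between the scores and a 0/1 group flag
scores = np.concatenate([group_a, group_b])
flag = np.concatenate([np.ones(len(group_a)), np.zeros(len(group_b))])
r_direct = np.corrcoef(scores, flag)[0, 1]

print(f"r from t: {r_from_t:.6f}, direct r: {r_direct:.6f}")
```

Plugging in this notebook's t of about 9.11 with dof = 5983 + 326004 - 2 gives r of roughly 0.0158, in line with the correlation table. With samples this large, even a very weak association produces a tiny p-value.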

# when reporting the results we care about these variables too

# sample size (number of school-level results in each group)
n_charter = len(charter)
n_community = len(community)

# mean average
M_charter = charter.mean_scale_score.mean()
M_community = community.mean_scale_score.mean()

# standard deviation 
sd_charter = charter.mean_scale_score.std()
sd_community = community.mean_scale_score.std()


display(md(f"""
**T-test results** comparing school-average ELA and Math scale scores for grades 3-8
in charter schools (`n={n_charter}`) and community schools (`n={n_community}`).

- Charter test results: M={M_charter:.02f}, SD={sd_charter:.02f}
- Community test results: M={M_community:.02f}, SD={sd_community:.02f}
- T-score: {t.statistic:.04f}, p-val: {t.pvalue:.04f}

`n` values report the number of school average test results observed, not the number of test takers. 
"""))

T-test results comparing school-average ELA and Math scale scores for grades 3-8 in charter schools (n=5983) and community schools (n=326004).

Charter test results: M=517.14, SD=134.96
Community test results: M=500.42, SD=140.75
T-score: 9.1114, p-val: 0.0000

n values report the number of school average test results observed, not the number of test takers.
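
As a rough cross-check, the t statistic can be recovered from the rounded summary values above using `scipy.stats.ttest_ind_from_stats`. This is a sketch that assumes the default pooled-variance test. Because the means and standard deviations are rounded to two decimals, the result only approximately matches the value computed from the raw data.

```python
from scipy import stats

# Rounded summary statistics copied from the report above
n_charter, m_charter, sd_charter = 5983, 517.14, 134.96
n_community, m_community, sd_community = 326004, 500.42, 140.75

# Pooled-variance (equal_var=True) two-sample t-test from summary statistics
res = stats.ttest_ind_from_stats(
    m_charter, sd_charter, n_charter,
    m_community, sd_community, n_community,
    equal_var=True,
)

# Scientific notation keeps a very small p-value from displaying as 0.0000
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.2e}")
```

Formatting the p-value with `.2e` rather than `.04f` avoids reporting it as exactly zero.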

`pingouin` stats wrapper

The `pingouin` library provides functions that wrap standard Python statistics routines, adding extra information and nicer formatting out of the box. `ttest` is one of them. The output below includes the t value and p value (as `scipy.stats` does), plus degrees of freedom, a confidence interval, an effect size (Cohen's d), and more, without having to calculate each of these separately for every test. Passing `correction=False` turns off Welch's correction for unequal variances, so the result matches the pooled-variance t-test from scipy above.

import pingouin as pg
pg.ttest(charter.mean_scale_score, community.mean_scale_score, correction=False)

	T	dof	alternative	p-val	CI95%	cohen-d	BF10	power
T-test	9.111409	331985	two-sided	8.175235e-20	[13.12, 20.32]	0.118871	1.516e+16	1.0
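
The extra columns can be approximately reproduced by hand, which shows what they measure. The sketch below assumes the pooled-variance formulas and uses the rounded summary statistics reported earlier, so it agrees with the table only to within rounding.

```python
import math
from scipy import stats

# Rounded summary statistics reported earlier in this notebook
n1, m1, sd1 = 5983, 517.14, 134.96      # charter
n2, m2, sd2 = 326004, 500.42, 140.75    # community

# Degrees of freedom for the pooled-variance t-test
dof = n1 + n2 - 2

# Pooled standard deviation and standard error of the difference in means
sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / dof)
se_diff = sd_pooled * math.sqrt(1 / n1 + 1 / n2)

# Cohen's d: the difference in means in units of the pooled standard deviation
diff = m1 - m2
cohen_d = diff / sd_pooled

# 95% confidence interval for the difference in means
t_crit = stats.t.ppf(0.975, dof)
ci_low, ci_high = diff - t_crit * se_diff, diff + t_crit * se_diff

print(f"dof={dof}, d={cohen_d:.3f}, CI95%=[{ci_low:.2f}, {ci_high:.2f}]")
```

A d of about 0.12 sits below the 0.2 threshold conventionally labelled a small effect, so the difference is statistically detectable but modest in size.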

NYC Schools Open Data Portal
