{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "85984ae3", "metadata": {}, "source": [ "Merges & Joins\n", "==============\n", "Here, finally, we combine data sets together. I say finally because this is one of the _first_ things you will do when working with real world data. `nycschools` has already combined multiple data sets into its core DataFrames, but you will want to combine this data in new ways, make new DataFrames from your results, and pull in new data from the outside world." ] }, { "cell_type": "code", "execution_count": 1, "id": "0a4ea7ae", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from nycschools import schools, exams\n", "\n", "# school demographics in one frame\n", "demo = schools.load_school_demographics()\n", "\n", "# exam results in another frame\n", "ela = exams.load_ela()\n", "math = exams.load_math()\n" ] }, { "cell_type": "markdown", "id": "515f5d3b", "metadata": {}, "source": [ "Merging Data\n", "----------------------\n", "When two data sets have a shared key then combining them into a single dataframe is straightforward. Here we use the `merge()` function in `DataFrame` to combine the school demographcs and ela test results into a single data frame. We merge them \"on\" the `dbn` and `ay` columns because these cols represent a unique identifier for each row in both data sets. _Note_ that the result has only the intersecton of both datasets. If our demographic data has a DBN that's not in the test data set, that school's data will be dropped from the results." ] }, { "cell_type": "code", "execution_count": 2, "id": "d47561f0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dbnbedsdistrictgeo_districtboroschool_name_xshort_nameayyeartotal_enrollment...level_2_pctlevel_3_nlevel_3_pctlevel_4_nlevel_4_pctlevel_3_4_nlevel_3_4_pcttest_yearcharterschool_name_y
001M01531010001001511ManhattanP.S. 015 Roberto ClementePS 1520162016-17178...0.2962967.00.2592592.00.0740749.00.33333320170NaN
101M01531010001001511ManhattanP.S. 015 Roberto ClementePS 1520162016-17178...0.5217399.00.3913041.00.04347810.00.43478320170NaN
201M01531010001001511ManhattanP.S. 015 Roberto ClementePS 1520162016-17178...0.6470595.00.2941180.00.0000005.00.29411820170NaN
301M01531010001001511ManhattanP.S. 015 Roberto ClementePS 1520162016-17178...0.46268721.00.3134333.00.04477624.00.35820920170NaN
401M01531010001001511ManhattanP.S. 015 Roberto ClementePS 1520162016-17178...0.3333337.00.3333332.00.0952389.00.42857120170NaN
\n", "

5 rows × 68 columns

\n", "
" ], "text/plain": [ " dbn beds district geo_district boro \\\n", "0 01M015 310100010015 1 1 Manhattan \n", "1 01M015 310100010015 1 1 Manhattan \n", "2 01M015 310100010015 1 1 Manhattan \n", "3 01M015 310100010015 1 1 Manhattan \n", "4 01M015 310100010015 1 1 Manhattan \n", "\n", " school_name_x short_name ay year total_enrollment ... \\\n", "0 P.S. 015 Roberto Clemente PS 15 2016 2016-17 178 ... \n", "1 P.S. 015 Roberto Clemente PS 15 2016 2016-17 178 ... \n", "2 P.S. 015 Roberto Clemente PS 15 2016 2016-17 178 ... \n", "3 P.S. 015 Roberto Clemente PS 15 2016 2016-17 178 ... \n", "4 P.S. 015 Roberto Clemente PS 15 2016 2016-17 178 ... \n", "\n", " level_2_pct level_3_n level_3_pct level_4_n level_4_pct level_3_4_n \\\n", "0 0.296296 7.0 0.259259 2.0 0.074074 9.0 \n", "1 0.521739 9.0 0.391304 1.0 0.043478 10.0 \n", "2 0.647059 5.0 0.294118 0.0 0.000000 5.0 \n", "3 0.462687 21.0 0.313433 3.0 0.044776 24.0 \n", "4 0.333333 7.0 0.333333 2.0 0.095238 9.0 \n", "\n", " level_3_4_pct test_year charter school_name_y \n", "0 0.333333 2017 0 NaN \n", "1 0.434783 2017 0 NaN \n", "2 0.294118 2017 0 NaN \n", "3 0.358209 2017 0 NaN \n", "4 0.428571 2017 0 NaN \n", "\n", "[5 rows x 68 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "demo.merge(ela, on=[\"dbn\", \"ay\"]).head()" ] }, { "cell_type": "markdown", "id": "5f2884da", "metadata": {}, "source": [ "Astute readers will see a new column called `school_name_x` in our results. You can't see it in the truncated data above, but there is also a `school_name_y`. Both data sets had a `school_name` column, so `pandas` suffixed the one from `demo` with `x` and the one from `ela` with `y`.\n", "\n", "Because both columns _should_ contain the same data (because they have the save DBN for the same school year), in this case we can just drop one of the columns before the merge. We'll drop `school_name` from `ela` because the demographics data set should have our canoninical school names." ] }, { "cell_type": "code", "execution_count": 3, "id": "12048ebc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dbnbedsdistrictgeo_districtboroschool_nameshort_nameayyeartotal_enrollment...level_2_nlevel_2_pctlevel_3_nlevel_3_pctlevel_4_nlevel_4_pctlevel_3_4_nlevel_3_4_pcttest_yearcharter
001M01531010001001511ManhattanP.S. 015 Roberto ClementePS 1520162016-17178...8.00.2962967.00.2592592.00.0740749.00.33333320170
101M01531010001001511ManhattanP.S. 015 Roberto ClementePS 1520162016-17178...12.00.5217399.00.3913041.00.04347810.00.43478320170
201M01531010001001511ManhattanP.S. 015 Roberto ClementePS 1520162016-17178...11.00.6470595.00.2941180.00.0000005.00.29411820170
301M01531010001001511ManhattanP.S. 015 Roberto ClementePS 1520162016-17178...31.00.46268721.00.3134333.00.04477624.00.35820920170
401M01531010001001511ManhattanP.S. 015 Roberto ClementePS 1520162016-17178...7.00.3333337.00.3333332.00.0952389.00.42857120170
\n", "

5 rows × 67 columns

\n", "
" ], "text/plain": [ " dbn beds district geo_district boro \\\n", "0 01M015 310100010015 1 1 Manhattan \n", "1 01M015 310100010015 1 1 Manhattan \n", "2 01M015 310100010015 1 1 Manhattan \n", "3 01M015 310100010015 1 1 Manhattan \n", "4 01M015 310100010015 1 1 Manhattan \n", "\n", " school_name short_name ay year total_enrollment ... \\\n", "0 P.S. 015 Roberto Clemente PS 15 2016 2016-17 178 ... \n", "1 P.S. 015 Roberto Clemente PS 15 2016 2016-17 178 ... \n", "2 P.S. 015 Roberto Clemente PS 15 2016 2016-17 178 ... \n", "3 P.S. 015 Roberto Clemente PS 15 2016 2016-17 178 ... \n", "4 P.S. 015 Roberto Clemente PS 15 2016 2016-17 178 ... \n", "\n", " level_2_n level_2_pct level_3_n level_3_pct level_4_n level_4_pct \\\n", "0 8.0 0.296296 7.0 0.259259 2.0 0.074074 \n", "1 12.0 0.521739 9.0 0.391304 1.0 0.043478 \n", "2 11.0 0.647059 5.0 0.294118 0.0 0.000000 \n", "3 31.0 0.462687 21.0 0.313433 3.0 0.044776 \n", "4 7.0 0.333333 7.0 0.333333 2.0 0.095238 \n", "\n", " level_3_4_n level_3_4_pct test_year charter \n", "0 9.0 0.333333 2017 0 \n", "1 10.0 0.434783 2017 0 \n", "2 5.0 0.294118 2017 0 \n", "3 24.0 0.358209 2017 0 \n", "4 9.0 0.428571 2017 0 \n", "\n", "[5 rows x 67 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ela2 = ela.drop(columns=\"school_name\")\n", "demo.merge(ela2, on=[\"dbn\", \"ay\"]).head()" ] }, { "cell_type": "markdown", "id": "8a804218", "metadata": {}, "source": [ "Merge with suffixes: wide data\n", "--------------------------------------------\n", "When we have a data set where we use columns to represent different categories of data, we can call this \"wide data.\" In this example, we're going to make a test results data frame that has math and ela test scores in the same row.\n", "\n", "At the start of this notebook we loaded both the math and ela exam data. Note that they have exactly the same columns. When we _merge_ them, we're going to **suffix** the ela data with `_ela` and the math data with `_math`. We will drop `school_name` from one of the data sets and use `[\"dbn\", \"ay\", \"test_year\", \"grade\", \"category\"]` to join the two.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "38e0d542", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dbngradecategorynumber_tested_elamean_scale_score_elalevel_1_n_elalevel_1_pct_elalevel_2_n_elalevel_2_pct_elalevel_3_n_ela...level_2_n_mathlevel_2_pct_mathlevel_3_n_mathlevel_3_pct_mathlevel_4_n_mathlevel_4_pct_mathlevel_3_4_n_mathlevel_3_4_pct_mathcharter_mathschool_name
001M0153All Students27289.29629514.00.51851911.00.4074072.0...11.00.4074070.00.0000000.00.0000000.00.0000000NaN
101M0154All Students20277.6499948.00.40000011.00.5500001.0...6.00.3000001.00.0500001.00.0500002.00.1000000NaN
201M0155All Students24283.95834412.00.50000011.00.4583331.0...6.00.2500001.00.0416670.00.0000001.00.0416670NaN
301M015All GradesAll Students71284.21127334.00.47887333.00.4647894.0...23.00.3239442.00.0281691.00.0140853.00.0422540NaN
401M0153Not SWD19287.15789811.00.5789478.00.4210530.0...8.00.4210530.00.0000000.00.0000000.00.0000000NaN
\n", "

5 rows × 32 columns

\n", "
" ], "text/plain": [ " dbn grade category number_tested_ela mean_scale_score_ela \\\n", "0 01M015 3 All Students 27 289.296295 \n", "1 01M015 4 All Students 20 277.649994 \n", "2 01M015 5 All Students 24 283.958344 \n", "3 01M015 All Grades All Students 71 284.211273 \n", "4 01M015 3 Not SWD 19 287.157898 \n", "\n", " level_1_n_ela level_1_pct_ela level_2_n_ela level_2_pct_ela \\\n", "0 14.0 0.518519 11.0 0.407407 \n", "1 8.0 0.400000 11.0 0.550000 \n", "2 12.0 0.500000 11.0 0.458333 \n", "3 34.0 0.478873 33.0 0.464789 \n", "4 11.0 0.578947 8.0 0.421053 \n", "\n", " level_3_n_ela ... level_2_n_math level_2_pct_math level_3_n_math \\\n", "0 2.0 ... 11.0 0.407407 0.0 \n", "1 1.0 ... 6.0 0.300000 1.0 \n", "2 1.0 ... 6.0 0.250000 1.0 \n", "3 4.0 ... 23.0 0.323944 2.0 \n", "4 0.0 ... 8.0 0.421053 0.0 \n", "\n", " level_3_pct_math level_4_n_math level_4_pct_math level_3_4_n_math \\\n", "0 0.000000 0.0 0.000000 0.0 \n", "1 0.050000 1.0 0.050000 2.0 \n", "2 0.041667 0.0 0.000000 1.0 \n", "3 0.028169 1.0 0.014085 3.0 \n", "4 0.000000 0.0 0.000000 0.0 \n", "\n", " level_3_4_pct_math charter_math school_name \n", "0 0.000000 0 NaN \n", "1 0.100000 0 NaN \n", "2 0.041667 0 NaN \n", "3 0.042254 0 NaN \n", "4 0.000000 0 NaN \n", "\n", "[5 rows x 32 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ela2 = ela.drop(columns=\"school_name\")\n", "wide = ela2.merge(math, on=[\"dbn\", \"ay\", \"test_year\", \"grade\", \"category\"], suffixes=[\"_ela\", \"_math\"])\n", "wide.head()" ] }, { "cell_type": "markdown", "id": "48d5ab53", "metadata": {}, "source": [ "Now that we have the wide data, we can easily compare math and ela results for the same set school/year/grade/category. Below we will calculate a new column called `test_delta` that shows the difference between math and ela results for the same cohort of students in the same school. This will let us see, for example, which if any schools are \"unbalanced\" in math and ELA. Later, we might correlate this to other demographic data such as high percentages of ELL or SWD students." ] }, { "cell_type": "code", "execution_count": 5, "id": "e8285c63", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dbngradeaycategorymean_scale_score_mathmean_scale_score_elatest_delta
2825403M05482014Never ELL243.764710331.356811-87.592102
8204107X50082014Female210.500000296.057129-85.557129
2823003M05482014Female244.666672329.838715-85.172043
8202007X50082014Not SWD222.750000301.250000-78.500000
24073620K10552015Current ELL330.679260252.88888577.790375
\n", "
" ], "text/plain": [ " dbn grade ay category mean_scale_score_math \\\n", "28254 03M054 8 2014 Never ELL 243.764710 \n", "82041 07X500 8 2014 Female 210.500000 \n", "28230 03M054 8 2014 Female 244.666672 \n", "82020 07X500 8 2014 Not SWD 222.750000 \n", "240736 20K105 5 2015 Current ELL 330.679260 \n", "\n", " mean_scale_score_ela test_delta \n", "28254 331.356811 -87.592102 \n", "82041 296.057129 -85.557129 \n", "28230 329.838715 -85.172043 \n", "82020 301.250000 -78.500000 \n", "240736 252.888885 77.790375 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wide[\"test_delta\"] = wide.mean_scale_score_math - wide.mean_scale_score_ela\n", "wide[\"test_delta_abs\"] = abs(wide.mean_scale_score_math - wide.mean_scale_score_ela)\n", "\n", "wide = wide.sort_values(by=\"test_delta_abs\", ascending=False)\n", "\n", "wide[[\"dbn\",\"grade\",\"ay\",\"category\",\"mean_scale_score_math\", \"mean_scale_score_ela\", \"test_delta\"]].head()" ] }, { "cell_type": "markdown", "id": "26f91652", "metadata": {}, "source": [ "Concatenate DataFrames: long data\n", "---------------------------------------------------\n", "If we have two DataFrames with the same columns (like our math and ela results) we can concatenate the data from one df onto the other to create a longer dataframe. Generally, this is a more flexible format for analysis than the wide data format. In our math/ela example, it allows us to, for example, easily average the math and ela test scores together (filtering for group, year, etc).\n", "\n", "In the example here we will add a new column, `test`, which will have the value of \"ela\" or \"math\". This column will let us know whether that row reports a math test or ela test result." ] }, { "cell_type": "code", "execution_count": 6, "id": "f25522ca", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dbngradecategorynumber_testedmean_scale_scorelevel_1_nlevel_1_pctlevel_2_nlevel_2_pctlevel_3_nlevel_3_pctlevel_4_nlevel_4_pctlevel_3_4_nlevel_3_4_pcttest_yearaycharterschool_nametest
001M0153All Students27277.77777116.00.59259311.00.4074070.00.0000000.00.0000000.00.000000201320120NaNmath
701M0154All Students20277.39999412.00.6000006.00.3000001.00.0500001.00.0500002.00.100000201320120NaNmath
1401M0155All Students24274.00000017.00.7083336.00.2500001.00.0416670.00.0000001.00.041667201320120NaNmath
2101M015All GradesAll Students71276.39437945.00.63380323.00.3239442.00.0281691.00.0140853.00.042254201320120NaNmath
3260601M0153Not SWD19275.73684711.00.5789478.00.4210530.00.0000000.00.0000000.00.000000201320120NaNmath
...............................................................
43000284X7303All Students51602.0000002.00.03900022.00.43100023.00.4510004.00.07800027.00.529000201920181BRONX CHARTER SCHOOL FOR THE ARTSela
43000984X7304All Students49591.00000018.00.36700014.00.28600013.00.2650004.00.08200017.00.347000201920181BRONX CHARTER SCHOOL FOR THE ARTSela
43001684X7305All Students53597.00000018.00.34000020.00.3770008.00.1510007.00.13200015.00.283000201920181BRONX CHARTER SCHOOL FOR THE ARTSela
43001784X7306All Students113592.00000045.00.39800033.00.29200023.00.20400012.00.10600035.00.310000201920181BRONX CHARTER SCHOOL FOR THE ARTSela
43002484X730All GradesAll Students266594.00000083.00.31200089.00.33500067.00.25200027.00.10200094.00.353000201920181BRONX CHARTER SCHOOL FOR THE ARTSela
\n", "

856567 rows × 20 columns

\n", "
" ], "text/plain": [ " dbn grade category number_tested mean_scale_score \\\n", "0 01M015 3 All Students 27 277.777771 \n", "7 01M015 4 All Students 20 277.399994 \n", "14 01M015 5 All Students 24 274.000000 \n", "21 01M015 All Grades All Students 71 276.394379 \n", "32606 01M015 3 Not SWD 19 275.736847 \n", "... ... ... ... ... ... \n", "430002 84X730 3 All Students 51 602.000000 \n", "430009 84X730 4 All Students 49 591.000000 \n", "430016 84X730 5 All Students 53 597.000000 \n", "430017 84X730 6 All Students 113 592.000000 \n", "430024 84X730 All Grades All Students 266 594.000000 \n", "\n", " level_1_n level_1_pct level_2_n level_2_pct level_3_n \\\n", "0 16.0 0.592593 11.0 0.407407 0.0 \n", "7 12.0 0.600000 6.0 0.300000 1.0 \n", "14 17.0 0.708333 6.0 0.250000 1.0 \n", "21 45.0 0.633803 23.0 0.323944 2.0 \n", "32606 11.0 0.578947 8.0 0.421053 0.0 \n", "... ... ... ... ... ... \n", "430002 2.0 0.039000 22.0 0.431000 23.0 \n", "430009 18.0 0.367000 14.0 0.286000 13.0 \n", "430016 18.0 0.340000 20.0 0.377000 8.0 \n", "430017 45.0 0.398000 33.0 0.292000 23.0 \n", "430024 83.0 0.312000 89.0 0.335000 67.0 \n", "\n", " level_3_pct level_4_n level_4_pct level_3_4_n level_3_4_pct \\\n", "0 0.000000 0.0 0.000000 0.0 0.000000 \n", "7 0.050000 1.0 0.050000 2.0 0.100000 \n", "14 0.041667 0.0 0.000000 1.0 0.041667 \n", "21 0.028169 1.0 0.014085 3.0 0.042254 \n", "32606 0.000000 0.0 0.000000 0.0 0.000000 \n", "... ... ... ... ... ... \n", "430002 0.451000 4.0 0.078000 27.0 0.529000 \n", "430009 0.265000 4.0 0.082000 17.0 0.347000 \n", "430016 0.151000 7.0 0.132000 15.0 0.283000 \n", "430017 0.204000 12.0 0.106000 35.0 0.310000 \n", "430024 0.252000 27.0 0.102000 94.0 0.353000 \n", "\n", " test_year ay charter school_name test \n", "0 2013 2012 0 NaN math \n", "7 2013 2012 0 NaN math \n", "14 2013 2012 0 NaN math \n", "21 2013 2012 0 NaN math \n", "32606 2013 2012 0 NaN math \n", "... ... ... ... ... ... \n", "430002 2019 2018 1 BRONX CHARTER SCHOOL FOR THE ARTS ela \n", "430009 2019 2018 1 BRONX CHARTER SCHOOL FOR THE ARTS ela \n", "430016 2019 2018 1 BRONX CHARTER SCHOOL FOR THE ARTS ela \n", "430017 2019 2018 1 BRONX CHARTER SCHOOL FOR THE ARTS ela \n", "430024 2019 2018 1 BRONX CHARTER SCHOOL FOR THE ARTS ela \n", "\n", "[856567 rows x 20 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "math[\"test\"] = \"math\"\n", "ela[\"test\"] = \"ela\"\n", "long = pd.concat([math, ela])\n", "long" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.10.6 ('school-data')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "vscode": { "interpreter": { "hash": "c853444e20c489e5b96d8e1a4533affead1d94f1ba40ff9ef08cffb9c8ee794e" } } }, "nbformat": 4, "nbformat_minor": 5 }