{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "85984ae3", "metadata": {}, "source": [ "Merges & Joins\n", "==============\n", "Here, finally, we combine data sets together. I say finally because this is one of the _first_ things you will do when working with real world data. `nycschools` has already combined multiple data sets into its core DataFrames, but you will want to combine this data in new ways, make new DataFrames from your results, and pull in new data from the outside world." ] }, { "cell_type": "code", "execution_count": 1, "id": "0a4ea7ae", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from nycschools import schools, exams\n", "\n", "# school demographics in one frame\n", "demo = schools.load_school_demographics()\n", "\n", "# exam results in another frame\n", "ela = exams.load_ela()\n", "math = exams.load_math()\n" ] }, { "cell_type": "markdown", "id": "515f5d3b", "metadata": {}, "source": [ "Merging Data\n", "----------------------\n", "When two data sets have a shared key then combining them into a single dataframe is straightforward. Here we use the `merge()` function in `DataFrame` to combine the school demographcs and ela test results into a single data frame. We merge them \"on\" the `dbn` and `ay` columns because these cols represent a unique identifier for each row in both data sets. _Note_ that the result has only the intersecton of both datasets. If our demographic data has a DBN that's not in the test data set, that school's data will be dropped from the results." ] }, { "cell_type": "code", "execution_count": 2, "id": "d47561f0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | dbn | \n", "beds | \n", "district | \n", "geo_district | \n", "boro | \n", "school_name_x | \n", "short_name | \n", "ay | \n", "year | \n", "total_enrollment | \n", "... | \n", "level_2_pct | \n", "level_3_n | \n", "level_3_pct | \n", "level_4_n | \n", "level_4_pct | \n", "level_3_4_n | \n", "level_3_4_pct | \n", "test_year | \n", "charter | \n", "school_name_y | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "01M015 | \n", "310100010015 | \n", "1 | \n", "1 | \n", "Manhattan | \n", "P.S. 015 Roberto Clemente | \n", "PS 15 | \n", "2016 | \n", "2016-17 | \n", "178 | \n", "... | \n", "0.296296 | \n", "7.0 | \n", "0.259259 | \n", "2.0 | \n", "0.074074 | \n", "9.0 | \n", "0.333333 | \n", "2017 | \n", "0 | \n", "NaN | \n", "
1 | \n", "01M015 | \n", "310100010015 | \n", "1 | \n", "1 | \n", "Manhattan | \n", "P.S. 015 Roberto Clemente | \n", "PS 15 | \n", "2016 | \n", "2016-17 | \n", "178 | \n", "... | \n", "0.521739 | \n", "9.0 | \n", "0.391304 | \n", "1.0 | \n", "0.043478 | \n", "10.0 | \n", "0.434783 | \n", "2017 | \n", "0 | \n", "NaN | \n", "
2 | \n", "01M015 | \n", "310100010015 | \n", "1 | \n", "1 | \n", "Manhattan | \n", "P.S. 015 Roberto Clemente | \n", "PS 15 | \n", "2016 | \n", "2016-17 | \n", "178 | \n", "... | \n", "0.647059 | \n", "5.0 | \n", "0.294118 | \n", "0.0 | \n", "0.000000 | \n", "5.0 | \n", "0.294118 | \n", "2017 | \n", "0 | \n", "NaN | \n", "
3 | \n", "01M015 | \n", "310100010015 | \n", "1 | \n", "1 | \n", "Manhattan | \n", "P.S. 015 Roberto Clemente | \n", "PS 15 | \n", "2016 | \n", "2016-17 | \n", "178 | \n", "... | \n", "0.462687 | \n", "21.0 | \n", "0.313433 | \n", "3.0 | \n", "0.044776 | \n", "24.0 | \n", "0.358209 | \n", "2017 | \n", "0 | \n", "NaN | \n", "
4 | \n", "01M015 | \n", "310100010015 | \n", "1 | \n", "1 | \n", "Manhattan | \n", "P.S. 015 Roberto Clemente | \n", "PS 15 | \n", "2016 | \n", "2016-17 | \n", "178 | \n", "... | \n", "0.333333 | \n", "7.0 | \n", "0.333333 | \n", "2.0 | \n", "0.095238 | \n", "9.0 | \n", "0.428571 | \n", "2017 | \n", "0 | \n", "NaN | \n", "
5 rows × 68 columns
\n", "\n", " | dbn | \n", "beds | \n", "district | \n", "geo_district | \n", "boro | \n", "school_name | \n", "short_name | \n", "ay | \n", "year | \n", "total_enrollment | \n", "... | \n", "level_2_n | \n", "level_2_pct | \n", "level_3_n | \n", "level_3_pct | \n", "level_4_n | \n", "level_4_pct | \n", "level_3_4_n | \n", "level_3_4_pct | \n", "test_year | \n", "charter | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "01M015 | \n", "310100010015 | \n", "1 | \n", "1 | \n", "Manhattan | \n", "P.S. 015 Roberto Clemente | \n", "PS 15 | \n", "2016 | \n", "2016-17 | \n", "178 | \n", "... | \n", "8.0 | \n", "0.296296 | \n", "7.0 | \n", "0.259259 | \n", "2.0 | \n", "0.074074 | \n", "9.0 | \n", "0.333333 | \n", "2017 | \n", "0 | \n", "
1 | \n", "01M015 | \n", "310100010015 | \n", "1 | \n", "1 | \n", "Manhattan | \n", "P.S. 015 Roberto Clemente | \n", "PS 15 | \n", "2016 | \n", "2016-17 | \n", "178 | \n", "... | \n", "12.0 | \n", "0.521739 | \n", "9.0 | \n", "0.391304 | \n", "1.0 | \n", "0.043478 | \n", "10.0 | \n", "0.434783 | \n", "2017 | \n", "0 | \n", "
2 | \n", "01M015 | \n", "310100010015 | \n", "1 | \n", "1 | \n", "Manhattan | \n", "P.S. 015 Roberto Clemente | \n", "PS 15 | \n", "2016 | \n", "2016-17 | \n", "178 | \n", "... | \n", "11.0 | \n", "0.647059 | \n", "5.0 | \n", "0.294118 | \n", "0.0 | \n", "0.000000 | \n", "5.0 | \n", "0.294118 | \n", "2017 | \n", "0 | \n", "
3 | \n", "01M015 | \n", "310100010015 | \n", "1 | \n", "1 | \n", "Manhattan | \n", "P.S. 015 Roberto Clemente | \n", "PS 15 | \n", "2016 | \n", "2016-17 | \n", "178 | \n", "... | \n", "31.0 | \n", "0.462687 | \n", "21.0 | \n", "0.313433 | \n", "3.0 | \n", "0.044776 | \n", "24.0 | \n", "0.358209 | \n", "2017 | \n", "0 | \n", "
4 | \n", "01M015 | \n", "310100010015 | \n", "1 | \n", "1 | \n", "Manhattan | \n", "P.S. 015 Roberto Clemente | \n", "PS 15 | \n", "2016 | \n", "2016-17 | \n", "178 | \n", "... | \n", "7.0 | \n", "0.333333 | \n", "7.0 | \n", "0.333333 | \n", "2.0 | \n", "0.095238 | \n", "9.0 | \n", "0.428571 | \n", "2017 | \n", "0 | \n", "
5 rows × 67 columns
\n", "\n", " | dbn | \n", "grade | \n", "category | \n", "number_tested_ela | \n", "mean_scale_score_ela | \n", "level_1_n_ela | \n", "level_1_pct_ela | \n", "level_2_n_ela | \n", "level_2_pct_ela | \n", "level_3_n_ela | \n", "... | \n", "level_2_n_math | \n", "level_2_pct_math | \n", "level_3_n_math | \n", "level_3_pct_math | \n", "level_4_n_math | \n", "level_4_pct_math | \n", "level_3_4_n_math | \n", "level_3_4_pct_math | \n", "charter_math | \n", "school_name | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "01M015 | \n", "3 | \n", "All Students | \n", "27 | \n", "289.296295 | \n", "14.0 | \n", "0.518519 | \n", "11.0 | \n", "0.407407 | \n", "2.0 | \n", "... | \n", "11.0 | \n", "0.407407 | \n", "0.0 | \n", "0.000000 | \n", "0.0 | \n", "0.000000 | \n", "0.0 | \n", "0.000000 | \n", "0 | \n", "NaN | \n", "
1 | \n", "01M015 | \n", "4 | \n", "All Students | \n", "20 | \n", "277.649994 | \n", "8.0 | \n", "0.400000 | \n", "11.0 | \n", "0.550000 | \n", "1.0 | \n", "... | \n", "6.0 | \n", "0.300000 | \n", "1.0 | \n", "0.050000 | \n", "1.0 | \n", "0.050000 | \n", "2.0 | \n", "0.100000 | \n", "0 | \n", "NaN | \n", "
2 | \n", "01M015 | \n", "5 | \n", "All Students | \n", "24 | \n", "283.958344 | \n", "12.0 | \n", "0.500000 | \n", "11.0 | \n", "0.458333 | \n", "1.0 | \n", "... | \n", "6.0 | \n", "0.250000 | \n", "1.0 | \n", "0.041667 | \n", "0.0 | \n", "0.000000 | \n", "1.0 | \n", "0.041667 | \n", "0 | \n", "NaN | \n", "
3 | \n", "01M015 | \n", "All Grades | \n", "All Students | \n", "71 | \n", "284.211273 | \n", "34.0 | \n", "0.478873 | \n", "33.0 | \n", "0.464789 | \n", "4.0 | \n", "... | \n", "23.0 | \n", "0.323944 | \n", "2.0 | \n", "0.028169 | \n", "1.0 | \n", "0.014085 | \n", "3.0 | \n", "0.042254 | \n", "0 | \n", "NaN | \n", "
4 | \n", "01M015 | \n", "3 | \n", "Not SWD | \n", "19 | \n", "287.157898 | \n", "11.0 | \n", "0.578947 | \n", "8.0 | \n", "0.421053 | \n", "0.0 | \n", "... | \n", "8.0 | \n", "0.421053 | \n", "0.0 | \n", "0.000000 | \n", "0.0 | \n", "0.000000 | \n", "0.0 | \n", "0.000000 | \n", "0 | \n", "NaN | \n", "
5 rows × 32 columns
\n", "\n", " | dbn | \n", "grade | \n", "ay | \n", "category | \n", "mean_scale_score_math | \n", "mean_scale_score_ela | \n", "test_delta | \n", "
---|---|---|---|---|---|---|---|
28254 | \n", "03M054 | \n", "8 | \n", "2014 | \n", "Never ELL | \n", "243.764710 | \n", "331.356811 | \n", "-87.592102 | \n", "
82041 | \n", "07X500 | \n", "8 | \n", "2014 | \n", "Female | \n", "210.500000 | \n", "296.057129 | \n", "-85.557129 | \n", "
28230 | \n", "03M054 | \n", "8 | \n", "2014 | \n", "Female | \n", "244.666672 | \n", "329.838715 | \n", "-85.172043 | \n", "
82020 | \n", "07X500 | \n", "8 | \n", "2014 | \n", "Not SWD | \n", "222.750000 | \n", "301.250000 | \n", "-78.500000 | \n", "
240736 | \n", "20K105 | \n", "5 | \n", "2015 | \n", "Current ELL | \n", "330.679260 | \n", "252.888885 | \n", "77.790375 | \n", "
\n", " | dbn | \n", "grade | \n", "category | \n", "number_tested | \n", "mean_scale_score | \n", "level_1_n | \n", "level_1_pct | \n", "level_2_n | \n", "level_2_pct | \n", "level_3_n | \n", "level_3_pct | \n", "level_4_n | \n", "level_4_pct | \n", "level_3_4_n | \n", "level_3_4_pct | \n", "test_year | \n", "ay | \n", "charter | \n", "school_name | \n", "test | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "01M015 | \n", "3 | \n", "All Students | \n", "27 | \n", "277.777771 | \n", "16.0 | \n", "0.592593 | \n", "11.0 | \n", "0.407407 | \n", "0.0 | \n", "0.000000 | \n", "0.0 | \n", "0.000000 | \n", "0.0 | \n", "0.000000 | \n", "2013 | \n", "2012 | \n", "0 | \n", "NaN | \n", "math | \n", "
7 | \n", "01M015 | \n", "4 | \n", "All Students | \n", "20 | \n", "277.399994 | \n", "12.0 | \n", "0.600000 | \n", "6.0 | \n", "0.300000 | \n", "1.0 | \n", "0.050000 | \n", "1.0 | \n", "0.050000 | \n", "2.0 | \n", "0.100000 | \n", "2013 | \n", "2012 | \n", "0 | \n", "NaN | \n", "math | \n", "
14 | \n", "01M015 | \n", "5 | \n", "All Students | \n", "24 | \n", "274.000000 | \n", "17.0 | \n", "0.708333 | \n", "6.0 | \n", "0.250000 | \n", "1.0 | \n", "0.041667 | \n", "0.0 | \n", "0.000000 | \n", "1.0 | \n", "0.041667 | \n", "2013 | \n", "2012 | \n", "0 | \n", "NaN | \n", "math | \n", "
21 | \n", "01M015 | \n", "All Grades | \n", "All Students | \n", "71 | \n", "276.394379 | \n", "45.0 | \n", "0.633803 | \n", "23.0 | \n", "0.323944 | \n", "2.0 | \n", "0.028169 | \n", "1.0 | \n", "0.014085 | \n", "3.0 | \n", "0.042254 | \n", "2013 | \n", "2012 | \n", "0 | \n", "NaN | \n", "math | \n", "
32606 | \n", "01M015 | \n", "3 | \n", "Not SWD | \n", "19 | \n", "275.736847 | \n", "11.0 | \n", "0.578947 | \n", "8.0 | \n", "0.421053 | \n", "0.0 | \n", "0.000000 | \n", "0.0 | \n", "0.000000 | \n", "0.0 | \n", "0.000000 | \n", "2013 | \n", "2012 | \n", "0 | \n", "NaN | \n", "math | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
430002 | \n", "84X730 | \n", "3 | \n", "All Students | \n", "51 | \n", "602.000000 | \n", "2.0 | \n", "0.039000 | \n", "22.0 | \n", "0.431000 | \n", "23.0 | \n", "0.451000 | \n", "4.0 | \n", "0.078000 | \n", "27.0 | \n", "0.529000 | \n", "2019 | \n", "2018 | \n", "1 | \n", "BRONX CHARTER SCHOOL FOR THE ARTS | \n", "ela | \n", "
430009 | \n", "84X730 | \n", "4 | \n", "All Students | \n", "49 | \n", "591.000000 | \n", "18.0 | \n", "0.367000 | \n", "14.0 | \n", "0.286000 | \n", "13.0 | \n", "0.265000 | \n", "4.0 | \n", "0.082000 | \n", "17.0 | \n", "0.347000 | \n", "2019 | \n", "2018 | \n", "1 | \n", "BRONX CHARTER SCHOOL FOR THE ARTS | \n", "ela | \n", "
430016 | \n", "84X730 | \n", "5 | \n", "All Students | \n", "53 | \n", "597.000000 | \n", "18.0 | \n", "0.340000 | \n", "20.0 | \n", "0.377000 | \n", "8.0 | \n", "0.151000 | \n", "7.0 | \n", "0.132000 | \n", "15.0 | \n", "0.283000 | \n", "2019 | \n", "2018 | \n", "1 | \n", "BRONX CHARTER SCHOOL FOR THE ARTS | \n", "ela | \n", "
430017 | \n", "84X730 | \n", "6 | \n", "All Students | \n", "113 | \n", "592.000000 | \n", "45.0 | \n", "0.398000 | \n", "33.0 | \n", "0.292000 | \n", "23.0 | \n", "0.204000 | \n", "12.0 | \n", "0.106000 | \n", "35.0 | \n", "0.310000 | \n", "2019 | \n", "2018 | \n", "1 | \n", "BRONX CHARTER SCHOOL FOR THE ARTS | \n", "ela | \n", "
430024 | \n", "84X730 | \n", "All Grades | \n", "All Students | \n", "266 | \n", "594.000000 | \n", "83.0 | \n", "0.312000 | \n", "89.0 | \n", "0.335000 | \n", "67.0 | \n", "0.252000 | \n", "27.0 | \n", "0.102000 | \n", "94.0 | \n", "0.353000 | \n", "2019 | \n", "2018 | \n", "1 | \n", "BRONX CHARTER SCHOOL FOR THE ARTS | \n", "ela | \n", "
856567 rows × 20 columns
\n", "