{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "85984ae3",
   "metadata": {},
   "source": [
    "Merges & Joins\n",
    "==============\n",
    "Here, finally, we combine data sets. I say finally because this is one of the _first_ things you will do when working with real-world data. `nycschools` has already combined multiple data sets into its core DataFrames, but you will want to combine this data in new ways, make new DataFrames from your results, and pull in new data from the outside world."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "0a4ea7ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from nycschools import schools, exams\n",
    "\n",
    "# school demographics in one frame\n",
    "demo = schools.load_school_demographics()\n",
    "\n",
    "# exam results in another frame\n",
    "ela = exams.load_ela()\n",
    "math = exams.load_math()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "515f5d3b",
   "metadata": {},
   "source": [
    "Merging Data\n",
    "----------------------\n",
    "When two data sets share a key, combining them into a single DataFrame is straightforward. Here we use the `DataFrame.merge()` method to combine the school demographics and ela test results into a single DataFrame. We merge them \"on\" the `dbn` and `ay` columns because together these columns identify the same school and academic year in both data sets. The exam data has several rows per school and year (one per grade and category), so each demographics row is repeated for every matching exam row. _Note_ that the result contains only the intersection of the two data sets (an inner join, the default for `merge()`). If our demographic data has a DBN that's not in the test data set, that school's data will be dropped from the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "d47561f0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dbn</th>\n",
       "      <th>beds</th>\n",
       "      <th>district</th>\n",
       "      <th>geo_district</th>\n",
       "      <th>boro</th>\n",
       "      <th>school_name_x</th>\n",
       "      <th>short_name</th>\n",
       "      <th>ay</th>\n",
       "      <th>year</th>\n",
       "      <th>total_enrollment</th>\n",
       "      <th>...</th>\n",
       "      <th>level_2_pct</th>\n",
       "      <th>level_3_n</th>\n",
       "      <th>level_3_pct</th>\n",
       "      <th>level_4_n</th>\n",
       "      <th>level_4_pct</th>\n",
       "      <th>level_3_4_n</th>\n",
       "      <th>level_3_4_pct</th>\n",
       "      <th>test_year</th>\n",
       "      <th>charter</th>\n",
       "      <th>school_name_y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>01M015</td>\n",
       "      <td>310100010015</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Manhattan</td>\n",
       "      <td>P.S. 015 Roberto Clemente</td>\n",
       "      <td>PS 15</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016-17</td>\n",
       "      <td>178</td>\n",
       "      <td>...</td>\n",
       "      <td>0.296296</td>\n",
       "      <td>7.0</td>\n",
       "      <td>0.259259</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.074074</td>\n",
       "      <td>9.0</td>\n",
       "      <td>0.333333</td>\n",
       "      <td>2017</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>01M015</td>\n",
       "      <td>310100010015</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Manhattan</td>\n",
       "      <td>P.S. 015 Roberto Clemente</td>\n",
       "      <td>PS 15</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016-17</td>\n",
       "      <td>178</td>\n",
       "      <td>...</td>\n",
       "      <td>0.521739</td>\n",
       "      <td>9.0</td>\n",
       "      <td>0.391304</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.043478</td>\n",
       "      <td>10.0</td>\n",
       "      <td>0.434783</td>\n",
       "      <td>2017</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>01M015</td>\n",
       "      <td>310100010015</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Manhattan</td>\n",
       "      <td>P.S. 015 Roberto Clemente</td>\n",
       "      <td>PS 15</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016-17</td>\n",
       "      <td>178</td>\n",
       "      <td>...</td>\n",
       "      <td>0.647059</td>\n",
       "      <td>5.0</td>\n",
       "      <td>0.294118</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>5.0</td>\n",
       "      <td>0.294118</td>\n",
       "      <td>2017</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>01M015</td>\n",
       "      <td>310100010015</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Manhattan</td>\n",
       "      <td>P.S. 015 Roberto Clemente</td>\n",
       "      <td>PS 15</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016-17</td>\n",
       "      <td>178</td>\n",
       "      <td>...</td>\n",
       "      <td>0.462687</td>\n",
       "      <td>21.0</td>\n",
       "      <td>0.313433</td>\n",
       "      <td>3.0</td>\n",
       "      <td>0.044776</td>\n",
       "      <td>24.0</td>\n",
       "      <td>0.358209</td>\n",
       "      <td>2017</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>01M015</td>\n",
       "      <td>310100010015</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Manhattan</td>\n",
       "      <td>P.S. 015 Roberto Clemente</td>\n",
       "      <td>PS 15</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016-17</td>\n",
       "      <td>178</td>\n",
       "      <td>...</td>\n",
       "      <td>0.333333</td>\n",
       "      <td>7.0</td>\n",
       "      <td>0.333333</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.095238</td>\n",
       "      <td>9.0</td>\n",
       "      <td>0.428571</td>\n",
       "      <td>2017</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 68 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      dbn          beds  district  geo_district       boro  \\\n",
       "0  01M015  310100010015         1             1  Manhattan   \n",
       "1  01M015  310100010015         1             1  Manhattan   \n",
       "2  01M015  310100010015         1             1  Manhattan   \n",
       "3  01M015  310100010015         1             1  Manhattan   \n",
       "4  01M015  310100010015         1             1  Manhattan   \n",
       "\n",
       "               school_name_x short_name    ay     year  total_enrollment  ...  \\\n",
       "0  P.S. 015 Roberto Clemente      PS 15  2016  2016-17               178  ...   \n",
       "1  P.S. 015 Roberto Clemente      PS 15  2016  2016-17               178  ...   \n",
       "2  P.S. 015 Roberto Clemente      PS 15  2016  2016-17               178  ...   \n",
       "3  P.S. 015 Roberto Clemente      PS 15  2016  2016-17               178  ...   \n",
       "4  P.S. 015 Roberto Clemente      PS 15  2016  2016-17               178  ...   \n",
       "\n",
       "   level_2_pct  level_3_n  level_3_pct  level_4_n  level_4_pct  level_3_4_n  \\\n",
       "0     0.296296        7.0     0.259259        2.0     0.074074          9.0   \n",
       "1     0.521739        9.0     0.391304        1.0     0.043478         10.0   \n",
       "2     0.647059        5.0     0.294118        0.0     0.000000          5.0   \n",
       "3     0.462687       21.0     0.313433        3.0     0.044776         24.0   \n",
       "4     0.333333        7.0     0.333333        2.0     0.095238          9.0   \n",
       "\n",
       "   level_3_4_pct  test_year  charter  school_name_y  \n",
       "0       0.333333       2017        0            NaN  \n",
       "1       0.434783       2017        0            NaN  \n",
       "2       0.294118       2017        0            NaN  \n",
       "3       0.358209       2017        0            NaN  \n",
       "4       0.428571       2017        0            NaN  \n",
       "\n",
       "[5 rows x 68 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "demo.merge(ela, on=[\"dbn\", \"ay\"]).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f2884da",
   "metadata": {},
   "source": [
    "Astute readers will see a new column called `school_name_x` in our results, and a `school_name_y` at the far right. Both data sets had a `school_name` column, so `pandas` suffixed the one from `demo` with `_x` and the one from `ela` with `_y`.\n",
    "\n",
    "Because both columns _should_ contain the same data (they have the same DBN for the same school year), in this case we can just drop one of the columns before the merge. We'll drop `school_name` from `ela` because the demographics data set should have our canonical school names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "12048ebc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dbn</th>\n",
       "      <th>beds</th>\n",
       "      <th>district</th>\n",
       "      <th>geo_district</th>\n",
       "      <th>boro</th>\n",
       "      <th>school_name</th>\n",
       "      <th>short_name</th>\n",
       "      <th>ay</th>\n",
       "      <th>year</th>\n",
       "      <th>total_enrollment</th>\n",
       "      <th>...</th>\n",
       "      <th>level_2_n</th>\n",
       "      <th>level_2_pct</th>\n",
       "      <th>level_3_n</th>\n",
       "      <th>level_3_pct</th>\n",
       "      <th>level_4_n</th>\n",
       "      <th>level_4_pct</th>\n",
       "      <th>level_3_4_n</th>\n",
       "      <th>level_3_4_pct</th>\n",
       "      <th>test_year</th>\n",
       "      <th>charter</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>01M015</td>\n",
       "      <td>310100010015</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Manhattan</td>\n",
       "      <td>P.S. 015 Roberto Clemente</td>\n",
       "      <td>PS 15</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016-17</td>\n",
       "      <td>178</td>\n",
       "      <td>...</td>\n",
       "      <td>8.0</td>\n",
       "      <td>0.296296</td>\n",
       "      <td>7.0</td>\n",
       "      <td>0.259259</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.074074</td>\n",
       "      <td>9.0</td>\n",
       "      <td>0.333333</td>\n",
       "      <td>2017</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>01M015</td>\n",
       "      <td>310100010015</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Manhattan</td>\n",
       "      <td>P.S. 015 Roberto Clemente</td>\n",
       "      <td>PS 15</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016-17</td>\n",
       "      <td>178</td>\n",
       "      <td>...</td>\n",
       "      <td>12.0</td>\n",
       "      <td>0.521739</td>\n",
       "      <td>9.0</td>\n",
       "      <td>0.391304</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.043478</td>\n",
       "      <td>10.0</td>\n",
       "      <td>0.434783</td>\n",
       "      <td>2017</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>01M015</td>\n",
       "      <td>310100010015</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Manhattan</td>\n",
       "      <td>P.S. 015 Roberto Clemente</td>\n",
       "      <td>PS 15</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016-17</td>\n",
       "      <td>178</td>\n",
       "      <td>...</td>\n",
       "      <td>11.0</td>\n",
       "      <td>0.647059</td>\n",
       "      <td>5.0</td>\n",
       "      <td>0.294118</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>5.0</td>\n",
       "      <td>0.294118</td>\n",
       "      <td>2017</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>01M015</td>\n",
       "      <td>310100010015</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Manhattan</td>\n",
       "      <td>P.S. 015 Roberto Clemente</td>\n",
       "      <td>PS 15</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016-17</td>\n",
       "      <td>178</td>\n",
       "      <td>...</td>\n",
       "      <td>31.0</td>\n",
       "      <td>0.462687</td>\n",
       "      <td>21.0</td>\n",
       "      <td>0.313433</td>\n",
       "      <td>3.0</td>\n",
       "      <td>0.044776</td>\n",
       "      <td>24.0</td>\n",
       "      <td>0.358209</td>\n",
       "      <td>2017</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>01M015</td>\n",
       "      <td>310100010015</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Manhattan</td>\n",
       "      <td>P.S. 015 Roberto Clemente</td>\n",
       "      <td>PS 15</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016-17</td>\n",
       "      <td>178</td>\n",
       "      <td>...</td>\n",
       "      <td>7.0</td>\n",
       "      <td>0.333333</td>\n",
       "      <td>7.0</td>\n",
       "      <td>0.333333</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.095238</td>\n",
       "      <td>9.0</td>\n",
       "      <td>0.428571</td>\n",
       "      <td>2017</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 67 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      dbn          beds  district  geo_district       boro  \\\n",
       "0  01M015  310100010015         1             1  Manhattan   \n",
       "1  01M015  310100010015         1             1  Manhattan   \n",
       "2  01M015  310100010015         1             1  Manhattan   \n",
       "3  01M015  310100010015         1             1  Manhattan   \n",
       "4  01M015  310100010015         1             1  Manhattan   \n",
       "\n",
       "                 school_name short_name    ay     year  total_enrollment  ...  \\\n",
       "0  P.S. 015 Roberto Clemente      PS 15  2016  2016-17               178  ...   \n",
       "1  P.S. 015 Roberto Clemente      PS 15  2016  2016-17               178  ...   \n",
       "2  P.S. 015 Roberto Clemente      PS 15  2016  2016-17               178  ...   \n",
       "3  P.S. 015 Roberto Clemente      PS 15  2016  2016-17               178  ...   \n",
       "4  P.S. 015 Roberto Clemente      PS 15  2016  2016-17               178  ...   \n",
       "\n",
       "   level_2_n  level_2_pct  level_3_n  level_3_pct  level_4_n  level_4_pct  \\\n",
       "0        8.0     0.296296        7.0     0.259259        2.0     0.074074   \n",
       "1       12.0     0.521739        9.0     0.391304        1.0     0.043478   \n",
       "2       11.0     0.647059        5.0     0.294118        0.0     0.000000   \n",
       "3       31.0     0.462687       21.0     0.313433        3.0     0.044776   \n",
       "4        7.0     0.333333        7.0     0.333333        2.0     0.095238   \n",
       "\n",
       "   level_3_4_n  level_3_4_pct  test_year  charter  \n",
       "0          9.0       0.333333       2017        0  \n",
       "1         10.0       0.434783       2017        0  \n",
       "2          5.0       0.294118       2017        0  \n",
       "3         24.0       0.358209       2017        0  \n",
       "4          9.0       0.428571       2017        0  \n",
       "\n",
       "[5 rows x 67 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ela2 = ela.drop(columns=\"school_name\")\n",
    "demo.merge(ela2, on=[\"dbn\", \"ay\"]).head()"
   ]
  },
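  {
   "cell_type": "markdown",
   "id": "c3a1e7b0",
   "metadata": {},
   "source": [
    "To see which rows an inner join drops, one option is a left join with `indicator=True`. The sketch below is only an illustration: `left` and `right` are small made-up frames standing in for `demo` and `ela`, not the real data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up rows standing in for the demographics and exam frames.\n",
    "left = pd.DataFrame({\n",
    "    \"dbn\": [\"01M015\", \"01M019\", \"01M020\"],\n",
    "    \"ay\": [2016, 2016, 2016],\n",
    "    \"total_enrollment\": [178, 250, 500],\n",
    "})\n",
    "right = pd.DataFrame({\n",
    "    \"dbn\": [\"01M015\", \"01M019\"],\n",
    "    \"ay\": [2016, 2016],\n",
    "    \"mean_scale_score\": [300.0, 310.0],\n",
    "})\n",
    "\n",
    "# The default merge is an inner join: 01M020 has no exam row, so it is dropped.\n",
    "inner = left.merge(right, on=[\"dbn\", \"ay\"])\n",
    "\n",
    "# how=\"left\" keeps every row from `left`; indicator=True adds a `_merge`\n",
    "# column recording whether each row matched (\"both\") or not (\"left_only\").\n",
    "checked = left.merge(right, on=[\"dbn\", \"ay\"], how=\"left\", indicator=True)\n",
    "unmatched = checked[checked[\"_merge\"] == \"left_only\"]\n",
    "print(unmatched[[\"dbn\", \"ay\"]])\n",
    "```\n",
    "\n",
    "The same check should work on the real frames by swapping in `demo` and `ela`; an empty `unmatched` would mean every demographics row found at least one exam row."
   ]
  },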
  {
   "cell_type": "markdown",
   "id": "8a804218",
   "metadata": {},
   "source": [
    "Merge with suffixes: wide data\n",
    "--------------------------------------------\n",
    "When we have a data set where we use columns to represent different categories of data, we can call this \"wide data.\" In this example, we're going to make a test results data frame that has math and ela test scores in the same row.\n",
    "\n",
    "At the start of this notebook we loaded both the math and ela exam data. Note that they have exactly the same columns. When we _merge_ them, we're going to **suffix** the ela columns with `_ela` and the math columns with `_math`; `pandas` applies suffixes only to overlapping columns that are not join keys. We will drop `school_name` from the ela data and use `[\"dbn\", \"ay\", \"test_year\", \"grade\", \"category\"]` to join the two.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "38e0d542",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dbn</th>\n",
       "      <th>grade</th>\n",
       "      <th>category</th>\n",
       "      <th>number_tested_ela</th>\n",
       "      <th>mean_scale_score_ela</th>\n",
       "      <th>level_1_n_ela</th>\n",
       "      <th>level_1_pct_ela</th>\n",
       "      <th>level_2_n_ela</th>\n",
       "      <th>level_2_pct_ela</th>\n",
       "      <th>level_3_n_ela</th>\n",
       "      <th>...</th>\n",
       "      <th>level_2_n_math</th>\n",
       "      <th>level_2_pct_math</th>\n",
       "      <th>level_3_n_math</th>\n",
       "      <th>level_3_pct_math</th>\n",
       "      <th>level_4_n_math</th>\n",
       "      <th>level_4_pct_math</th>\n",
       "      <th>level_3_4_n_math</th>\n",
       "      <th>level_3_4_pct_math</th>\n",
       "      <th>charter_math</th>\n",
       "      <th>school_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>01M015</td>\n",
       "      <td>3</td>\n",
       "      <td>All Students</td>\n",
       "      <td>27</td>\n",
       "      <td>289.296295</td>\n",
       "      <td>14.0</td>\n",
       "      <td>0.518519</td>\n",
       "      <td>11.0</td>\n",
       "      <td>0.407407</td>\n",
       "      <td>2.0</td>\n",
       "      <td>...</td>\n",
       "      <td>11.0</td>\n",
       "      <td>0.407407</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>01M015</td>\n",
       "      <td>4</td>\n",
       "      <td>All Students</td>\n",
       "      <td>20</td>\n",
       "      <td>277.649994</td>\n",
       "      <td>8.0</td>\n",
       "      <td>0.400000</td>\n",
       "      <td>11.0</td>\n",
       "      <td>0.550000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>...</td>\n",
       "      <td>6.0</td>\n",
       "      <td>0.300000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.050000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.050000</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.100000</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>01M015</td>\n",
       "      <td>5</td>\n",
       "      <td>All Students</td>\n",
       "      <td>24</td>\n",
       "      <td>283.958344</td>\n",
       "      <td>12.0</td>\n",
       "      <td>0.500000</td>\n",
       "      <td>11.0</td>\n",
       "      <td>0.458333</td>\n",
       "      <td>1.0</td>\n",
       "      <td>...</td>\n",
       "      <td>6.0</td>\n",
       "      <td>0.250000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.041667</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.041667</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>01M015</td>\n",
       "      <td>All Grades</td>\n",
       "      <td>All Students</td>\n",
       "      <td>71</td>\n",
       "      <td>284.211273</td>\n",
       "      <td>34.0</td>\n",
       "      <td>0.478873</td>\n",
       "      <td>33.0</td>\n",
       "      <td>0.464789</td>\n",
       "      <td>4.0</td>\n",
       "      <td>...</td>\n",
       "      <td>23.0</td>\n",
       "      <td>0.323944</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.028169</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.014085</td>\n",
       "      <td>3.0</td>\n",
       "      <td>0.042254</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>01M015</td>\n",
       "      <td>3</td>\n",
       "      <td>Not SWD</td>\n",
       "      <td>19</td>\n",
       "      <td>287.157898</td>\n",
       "      <td>11.0</td>\n",
       "      <td>0.578947</td>\n",
       "      <td>8.0</td>\n",
       "      <td>0.421053</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>8.0</td>\n",
       "      <td>0.421053</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 32 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      dbn       grade      category  number_tested_ela  mean_scale_score_ela  \\\n",
       "0  01M015           3  All Students                 27            289.296295   \n",
       "1  01M015           4  All Students                 20            277.649994   \n",
       "2  01M015           5  All Students                 24            283.958344   \n",
       "3  01M015  All Grades  All Students                 71            284.211273   \n",
       "4  01M015           3       Not SWD                 19            287.157898   \n",
       "\n",
       "   level_1_n_ela  level_1_pct_ela  level_2_n_ela  level_2_pct_ela  \\\n",
       "0           14.0         0.518519           11.0         0.407407   \n",
       "1            8.0         0.400000           11.0         0.550000   \n",
       "2           12.0         0.500000           11.0         0.458333   \n",
       "3           34.0         0.478873           33.0         0.464789   \n",
       "4           11.0         0.578947            8.0         0.421053   \n",
       "\n",
       "   level_3_n_ela  ...  level_2_n_math  level_2_pct_math  level_3_n_math  \\\n",
       "0            2.0  ...            11.0          0.407407             0.0   \n",
       "1            1.0  ...             6.0          0.300000             1.0   \n",
       "2            1.0  ...             6.0          0.250000             1.0   \n",
       "3            4.0  ...            23.0          0.323944             2.0   \n",
       "4            0.0  ...             8.0          0.421053             0.0   \n",
       "\n",
       "   level_3_pct_math  level_4_n_math  level_4_pct_math  level_3_4_n_math  \\\n",
       "0          0.000000             0.0          0.000000               0.0   \n",
       "1          0.050000             1.0          0.050000               2.0   \n",
       "2          0.041667             0.0          0.000000               1.0   \n",
       "3          0.028169             1.0          0.014085               3.0   \n",
       "4          0.000000             0.0          0.000000               0.0   \n",
       "\n",
       "   level_3_4_pct_math  charter_math  school_name  \n",
       "0            0.000000             0          NaN  \n",
       "1            0.100000             0          NaN  \n",
       "2            0.041667             0          NaN  \n",
       "3            0.042254             0          NaN  \n",
       "4            0.000000             0          NaN  \n",
       "\n",
       "[5 rows x 32 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ela2 = ela.drop(columns=\"school_name\")\n",
    "wide = ela2.merge(math, on=[\"dbn\", \"ay\", \"test_year\", \"grade\", \"category\"], suffixes=[\"_ela\", \"_math\"])\n",
    "wide.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48d5ab53",
   "metadata": {},
   "source": [
    "Now that we have the wide data, we can easily compare math and ela results for the same school/year/grade/category. Below we calculate a new column called `test_delta`, the difference between the math and ela mean scale scores for the same cohort of students in the same school, and sort by its absolute value (`test_delta_abs`) so the largest gaps in either direction come first. This will let us see, for example, which schools, if any, are \"unbalanced\" in math and ela. Later, we might correlate this with other demographic data such as high percentages of ELL or SWD students."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "e8285c63",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dbn</th>\n",
       "      <th>grade</th>\n",
       "      <th>ay</th>\n",
       "      <th>category</th>\n",
       "      <th>mean_scale_score_math</th>\n",
       "      <th>mean_scale_score_ela</th>\n",
       "      <th>test_delta</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>28254</th>\n",
       "      <td>03M054</td>\n",
       "      <td>8</td>\n",
       "      <td>2014</td>\n",
       "      <td>Never ELL</td>\n",
       "      <td>243.764710</td>\n",
       "      <td>331.356811</td>\n",
       "      <td>-87.592102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82041</th>\n",
       "      <td>07X500</td>\n",
       "      <td>8</td>\n",
       "      <td>2014</td>\n",
       "      <td>Female</td>\n",
       "      <td>210.500000</td>\n",
       "      <td>296.057129</td>\n",
       "      <td>-85.557129</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28230</th>\n",
       "      <td>03M054</td>\n",
       "      <td>8</td>\n",
       "      <td>2014</td>\n",
       "      <td>Female</td>\n",
       "      <td>244.666672</td>\n",
       "      <td>329.838715</td>\n",
       "      <td>-85.172043</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82020</th>\n",
       "      <td>07X500</td>\n",
       "      <td>8</td>\n",
       "      <td>2014</td>\n",
       "      <td>Not SWD</td>\n",
       "      <td>222.750000</td>\n",
       "      <td>301.250000</td>\n",
       "      <td>-78.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>240736</th>\n",
       "      <td>20K105</td>\n",
       "      <td>5</td>\n",
       "      <td>2015</td>\n",
       "      <td>Current ELL</td>\n",
       "      <td>330.679260</td>\n",
       "      <td>252.888885</td>\n",
       "      <td>77.790375</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           dbn grade    ay     category  mean_scale_score_math  \\\n",
       "28254   03M054     8  2014    Never ELL             243.764710   \n",
       "82041   07X500     8  2014       Female             210.500000   \n",
       "28230   03M054     8  2014       Female             244.666672   \n",
       "82020   07X500     8  2014      Not SWD             222.750000   \n",
       "240736  20K105     5  2015  Current ELL             330.679260   \n",
       "\n",
       "        mean_scale_score_ela  test_delta  \n",
       "28254             331.356811  -87.592102  \n",
       "82041             296.057129  -85.557129  \n",
       "28230             329.838715  -85.172043  \n",
       "82020             301.250000  -78.500000  \n",
       "240736            252.888885   77.790375  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wide[\"test_delta\"] = wide.mean_scale_score_math - wide.mean_scale_score_ela\n",
    "wide[\"test_delta_abs\"] = abs(wide.mean_scale_score_math - wide.mean_scale_score_ela)\n",
    "\n",
    "wide = wide.sort_values(by=\"test_delta_abs\", ascending=False)\n",
    "\n",
    "wide[[\"dbn\",\"grade\",\"ay\",\"category\",\"mean_scale_score_math\", \"mean_scale_score_ela\", \"test_delta\"]].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26f91652",
   "metadata": {},
   "source": [
    "Concatenate DataFrames: long data\n",
    "---------------------------------------------------\n",
    "If two DataFrames have the same columns (like our math and ela results), we can concatenate them, stacking the rows of one onto the other to create a longer DataFrame. This long format is generally more flexible for analysis than the wide format. For example, it lets us easily average the math and ela test scores together, filtering by group, year, etc.\n",
    "\n",
    "In the example below, we first add a new column, `test`, set to \"math\" or \"ela\", so that each row records which exam it reports."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "f25522ca",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dbn</th>\n",
       "      <th>grade</th>\n",
       "      <th>category</th>\n",
       "      <th>number_tested</th>\n",
       "      <th>mean_scale_score</th>\n",
       "      <th>level_1_n</th>\n",
       "      <th>level_1_pct</th>\n",
       "      <th>level_2_n</th>\n",
       "      <th>level_2_pct</th>\n",
       "      <th>level_3_n</th>\n",
       "      <th>level_3_pct</th>\n",
       "      <th>level_4_n</th>\n",
       "      <th>level_4_pct</th>\n",
       "      <th>level_3_4_n</th>\n",
       "      <th>level_3_4_pct</th>\n",
       "      <th>test_year</th>\n",
       "      <th>ay</th>\n",
       "      <th>charter</th>\n",
       "      <th>school_name</th>\n",
       "      <th>test</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>01M015</td>\n",
       "      <td>3</td>\n",
       "      <td>All Students</td>\n",
       "      <td>27</td>\n",
       "      <td>277.777771</td>\n",
       "      <td>16.0</td>\n",
       "      <td>0.592593</td>\n",
       "      <td>11.0</td>\n",
       "      <td>0.407407</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2013</td>\n",
       "      <td>2012</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>01M015</td>\n",
       "      <td>4</td>\n",
       "      <td>All Students</td>\n",
       "      <td>20</td>\n",
       "      <td>277.399994</td>\n",
       "      <td>12.0</td>\n",
       "      <td>0.600000</td>\n",
       "      <td>6.0</td>\n",
       "      <td>0.300000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.050000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.050000</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.100000</td>\n",
       "      <td>2013</td>\n",
       "      <td>2012</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>01M015</td>\n",
       "      <td>5</td>\n",
       "      <td>All Students</td>\n",
       "      <td>24</td>\n",
       "      <td>274.000000</td>\n",
       "      <td>17.0</td>\n",
       "      <td>0.708333</td>\n",
       "      <td>6.0</td>\n",
       "      <td>0.250000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.041667</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.041667</td>\n",
       "      <td>2013</td>\n",
       "      <td>2012</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>01M015</td>\n",
       "      <td>All Grades</td>\n",
       "      <td>All Students</td>\n",
       "      <td>71</td>\n",
       "      <td>276.394379</td>\n",
       "      <td>45.0</td>\n",
       "      <td>0.633803</td>\n",
       "      <td>23.0</td>\n",
       "      <td>0.323944</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.028169</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.014085</td>\n",
       "      <td>3.0</td>\n",
       "      <td>0.042254</td>\n",
       "      <td>2013</td>\n",
       "      <td>2012</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32606</th>\n",
       "      <td>01M015</td>\n",
       "      <td>3</td>\n",
       "      <td>Not SWD</td>\n",
       "      <td>19</td>\n",
       "      <td>275.736847</td>\n",
       "      <td>11.0</td>\n",
       "      <td>0.578947</td>\n",
       "      <td>8.0</td>\n",
       "      <td>0.421053</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2013</td>\n",
       "      <td>2012</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>430002</th>\n",
       "      <td>84X730</td>\n",
       "      <td>3</td>\n",
       "      <td>All Students</td>\n",
       "      <td>51</td>\n",
       "      <td>602.000000</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.039000</td>\n",
       "      <td>22.0</td>\n",
       "      <td>0.431000</td>\n",
       "      <td>23.0</td>\n",
       "      <td>0.451000</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.078000</td>\n",
       "      <td>27.0</td>\n",
       "      <td>0.529000</td>\n",
       "      <td>2019</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>BRONX CHARTER SCHOOL FOR THE ARTS</td>\n",
       "      <td>ela</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>430009</th>\n",
       "      <td>84X730</td>\n",
       "      <td>4</td>\n",
       "      <td>All Students</td>\n",
       "      <td>49</td>\n",
       "      <td>591.000000</td>\n",
       "      <td>18.0</td>\n",
       "      <td>0.367000</td>\n",
       "      <td>14.0</td>\n",
       "      <td>0.286000</td>\n",
       "      <td>13.0</td>\n",
       "      <td>0.265000</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.082000</td>\n",
       "      <td>17.0</td>\n",
       "      <td>0.347000</td>\n",
       "      <td>2019</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>BRONX CHARTER SCHOOL FOR THE ARTS</td>\n",
       "      <td>ela</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>430016</th>\n",
       "      <td>84X730</td>\n",
       "      <td>5</td>\n",
       "      <td>All Students</td>\n",
       "      <td>53</td>\n",
       "      <td>597.000000</td>\n",
       "      <td>18.0</td>\n",
       "      <td>0.340000</td>\n",
       "      <td>20.0</td>\n",
       "      <td>0.377000</td>\n",
       "      <td>8.0</td>\n",
       "      <td>0.151000</td>\n",
       "      <td>7.0</td>\n",
       "      <td>0.132000</td>\n",
       "      <td>15.0</td>\n",
       "      <td>0.283000</td>\n",
       "      <td>2019</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>BRONX CHARTER SCHOOL FOR THE ARTS</td>\n",
       "      <td>ela</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>430017</th>\n",
       "      <td>84X730</td>\n",
       "      <td>6</td>\n",
       "      <td>All Students</td>\n",
       "      <td>113</td>\n",
       "      <td>592.000000</td>\n",
       "      <td>45.0</td>\n",
       "      <td>0.398000</td>\n",
       "      <td>33.0</td>\n",
       "      <td>0.292000</td>\n",
       "      <td>23.0</td>\n",
       "      <td>0.204000</td>\n",
       "      <td>12.0</td>\n",
       "      <td>0.106000</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0.310000</td>\n",
       "      <td>2019</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>BRONX CHARTER SCHOOL FOR THE ARTS</td>\n",
       "      <td>ela</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>430024</th>\n",
       "      <td>84X730</td>\n",
       "      <td>All Grades</td>\n",
       "      <td>All Students</td>\n",
       "      <td>266</td>\n",
       "      <td>594.000000</td>\n",
       "      <td>83.0</td>\n",
       "      <td>0.312000</td>\n",
       "      <td>89.0</td>\n",
       "      <td>0.335000</td>\n",
       "      <td>67.0</td>\n",
       "      <td>0.252000</td>\n",
       "      <td>27.0</td>\n",
       "      <td>0.102000</td>\n",
       "      <td>94.0</td>\n",
       "      <td>0.353000</td>\n",
       "      <td>2019</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>BRONX CHARTER SCHOOL FOR THE ARTS</td>\n",
       "      <td>ela</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>856567 rows × 20 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           dbn       grade      category  number_tested  mean_scale_score  \\\n",
       "0       01M015           3  All Students             27        277.777771   \n",
       "7       01M015           4  All Students             20        277.399994   \n",
       "14      01M015           5  All Students             24        274.000000   \n",
       "21      01M015  All Grades  All Students             71        276.394379   \n",
       "32606   01M015           3       Not SWD             19        275.736847   \n",
       "...        ...         ...           ...            ...               ...   \n",
       "430002  84X730           3  All Students             51        602.000000   \n",
       "430009  84X730           4  All Students             49        591.000000   \n",
       "430016  84X730           5  All Students             53        597.000000   \n",
       "430017  84X730           6  All Students            113        592.000000   \n",
       "430024  84X730  All Grades  All Students            266        594.000000   \n",
       "\n",
       "        level_1_n  level_1_pct  level_2_n  level_2_pct  level_3_n  \\\n",
       "0            16.0     0.592593       11.0     0.407407        0.0   \n",
       "7            12.0     0.600000        6.0     0.300000        1.0   \n",
       "14           17.0     0.708333        6.0     0.250000        1.0   \n",
       "21           45.0     0.633803       23.0     0.323944        2.0   \n",
       "32606        11.0     0.578947        8.0     0.421053        0.0   \n",
       "...           ...          ...        ...          ...        ...   \n",
       "430002        2.0     0.039000       22.0     0.431000       23.0   \n",
       "430009       18.0     0.367000       14.0     0.286000       13.0   \n",
       "430016       18.0     0.340000       20.0     0.377000        8.0   \n",
       "430017       45.0     0.398000       33.0     0.292000       23.0   \n",
       "430024       83.0     0.312000       89.0     0.335000       67.0   \n",
       "\n",
       "        level_3_pct  level_4_n  level_4_pct  level_3_4_n  level_3_4_pct  \\\n",
       "0          0.000000        0.0     0.000000          0.0       0.000000   \n",
       "7          0.050000        1.0     0.050000          2.0       0.100000   \n",
       "14         0.041667        0.0     0.000000          1.0       0.041667   \n",
       "21         0.028169        1.0     0.014085          3.0       0.042254   \n",
       "32606      0.000000        0.0     0.000000          0.0       0.000000   \n",
       "...             ...        ...          ...          ...            ...   \n",
       "430002     0.451000        4.0     0.078000         27.0       0.529000   \n",
       "430009     0.265000        4.0     0.082000         17.0       0.347000   \n",
       "430016     0.151000        7.0     0.132000         15.0       0.283000   \n",
       "430017     0.204000       12.0     0.106000         35.0       0.310000   \n",
       "430024     0.252000       27.0     0.102000         94.0       0.353000   \n",
       "\n",
       "        test_year    ay  charter                        school_name  test  \n",
       "0            2013  2012        0                                NaN  math  \n",
       "7            2013  2012        0                                NaN  math  \n",
       "14           2013  2012        0                                NaN  math  \n",
       "21           2013  2012        0                                NaN  math  \n",
       "32606        2013  2012        0                                NaN  math  \n",
       "...           ...   ...      ...                                ...   ...  \n",
       "430002       2019  2018        1  BRONX CHARTER SCHOOL FOR THE ARTS   ela  \n",
       "430009       2019  2018        1  BRONX CHARTER SCHOOL FOR THE ARTS   ela  \n",
       "430016       2019  2018        1  BRONX CHARTER SCHOOL FOR THE ARTS   ela  \n",
       "430017       2019  2018        1  BRONX CHARTER SCHOOL FOR THE ARTS   ela  \n",
       "430024       2019  2018        1  BRONX CHARTER SCHOOL FOR THE ARTS   ela  \n",
       "\n",
       "[856567 rows x 20 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
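   ],
   "source": [
    "# label each frame so every row records which exam it came from\n",
    "math[\"test\"] = \"math\"\n",
    "ela[\"test\"] = \"ela\"\n",
    "# stack the rows; concat keeps each frame's original index labels\n",
    "long = pd.concat([math, ela])\n",
    "long"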
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.10.6 ('school-data')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "vscode": {
   "interpreter": {
    "hash": "c853444e20c489e5b96d8e1a4533affead1d94f1ba40ff9ef08cffb9c8ee794e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}