
Intro to Census Data with Python

To run the code:

Make sure this notebook is open in Google Colab (if you are starting from the digital textbook, click the icon at the top right and choose Colab). Each block of code is called a cell. To run a cell, hover over it and click the arrow at its top left, or click inside the cell and press Shift + Enter.

Note: The first time you run a cell, Google Colab will display "Warning: This notebook was not authored by Google." Click Run Anyway.

Sign up for a Census API key at http://api.census.gov/data/key_signup.html. The activation email may take some time to arrive, so this tutorial notebook is divided into two sections: 1. Without API Key and 2. With API Key.

Without API Key: pygris

#@title Import modules
!pip install -q pygris
from pygris import validate_state, tracts
from pygris.data import get_census
import matplotlib.pyplot as plt

Get outlines of Census Tracts within Orange County

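import pygris

# Calling pygris.tracts keeps this cell re-runnable after `tracts` is reassigned
tracts = pygris.tracts(state = "FL",
                       county = ['Orange'],
                       year = 2023)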

tracts.plot()
Using FIPS code '12' for input 'FL'
Using FIPS code '095' for input 'Orange'
<Axes: >
[Figure: outlines of Census tracts in Orange County, FL]

Each Census dataset has its own set of variable codes, and there are hundreds of datasets. Go to this list of Census variables and find the codes that match your topic. Add each code, along with the column name you want for it, using the format below:

census_vars = {
    "CENSUS_CODE": "NAME_OF_YOUR_COLUMN",
    "CENSUS_CODE": "NAME_OF_YOUR_COLUMN",
    # etc.
}

If you’re not sure which Census variables to use, here is a starting point:

census_vars = {
    "B01003_001E": 'TOT_POP',
    "C17002_001E": 'INCOME_POVERTY_RATIO',
    "B19058_001E": 'TOT_FOOD_ASSIST',
    "B01002_001E": 'MEDIAN_AGE',
    "B19013_001E": 'MEDIAN_HOUSEHOLD_INCOME',
}

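| Census Variable | Description |
| --- | --- |
| B01003_001E | Total Population |
| C17002_001E | Ratio of Income to Poverty Level in the Past 12 Months |
| B19058_001E | Total Public Assistance Income or Food Stamps/SNAP in the Past 12 Months for Households |
| B01002_001E | Median Age |
| B19013_001E | Median Household Income in the Past 12 Months (in 2023 Inflation-Adjusted Dollars) |

Note: The descriptions for C17002_001E and B19058_001E are table titles. The _001E variable in each of these tables is the table total (people whose poverty status is determined, and all households, respectively), not a ratio or a count of assistance recipients.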

# Enter your code here
census_vars = {
    
}

census_data = get_census(dataset = "acs/acs5",
                           variables = list(census_vars.keys()),
                           year = 2023,
                           params = {
                             "for": "tract:*",
                             "in": f"state:{validate_state('FL')}"},
                           guess_dtypes = True,
                           return_geoid = True).rename(columns=census_vars)

census_data.head()
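For intuition, the dataset, year, variables, and params arguments correspond to pieces of a Census Data API request URL. The sketch below assembles such a URL by hand using only the standard library and sends no request. The helper name build_census_url is hypothetical, and how pygris builds its request internally is an assumption here, not something this notebook verifies.

```python
from urllib.parse import urlencode

def build_census_url(dataset, year, variables, params):
    """Assemble a Census Data API query URL (hypothetical helper; no request is sent)."""
    base = f"https://api.census.gov/data/{year}/{dataset}"
    query = {"get": ",".join(variables), **params}
    # Keep ':', '*', and ',' unescaped so the URL stays readable
    return f"{base}?{urlencode(query, safe=':*,')}"

url = build_census_url(
    dataset="acs/acs5",
    year=2023,
    variables=["B01003_001E", "B19013_001E"],
    params={"for": "tract:*", "in": "state:12"},  # '12' is the FIPS code for Florida
)
print(url)
```

Pasting the printed URL into a browser is one way to inspect the raw response for a small set of variables.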

Example:

census_vars = {
    "B01003_001E": 'TOT_POP',
    "C17002_001E": 'INCOME_POVERTY_RATIO',
    "B19058_001E": 'TOT_FOOD_ASSIST',
    "B01002_001E": 'MEDIAN_AGE',
    "B19013_001E": 'MEDIAN_HOUSEHOLD_INCOME',
}

census_data = get_census(dataset = "acs/acs5",
                           variables = list(census_vars.keys()),
                           year = 2023,
                           params = {
                             "for": "tract:*",
                             "in": f"state:{validate_state('FL')}"},
                           guess_dtypes = True,
                           return_geoid = True).rename(columns=census_vars)

census_data.head()
Using FIPS code '12' for input 'FL'
TOT_POP INCOME_POVERTY_RATIO TOT_FOOD_ASSIST MEDIAN_AGE MEDIAN_HOUSEHOLD_INCOME GEOID
0 5187 5006 2319 21.5 18657.0 12001000201
1 5897 4509 1897 21.4 17609.0 12001000202
2 3703 3703 1855 27.5 47813.0 12001000301
3 2500 2500 1255 47.0 39583.0 12001000302
4 5736 5736 2414 31.8 51266.0 12001000400
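The GEOID column identifies each tract. A tract GEOID has 11 digits: a 2-digit state FIPS code, a 3-digit county FIPS code, and a 6-digit tract code. A minimal sketch of splitting one apart (the helper name split_tract_geoid is hypothetical):

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into its state, county, and tract codes."""
    geoid = str(geoid)
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return {"state": geoid[:2], "county": geoid[2:5], "tract": geoid[5:]}

# An Orange County, FL tract: state 12, county 095
parts = split_tract_geoid("12095012405")
print(parts)
```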

To visualize these variables on a map, we can merge census_data with tracts.

data = tracts[['geometry', 'GEOID']].merge(census_data, on = "GEOID")
data
geometry GEOID TOT_POP INCOME_POVERTY_RATIO TOT_FOOD_ASSIST MEDIAN_AGE MEDIAN_HOUSEHOLD_INCOME
0 POLYGON ((-81.44831 28.59645, -81.44831 28.596... 12095012405 5004 5003 2029 31.9 43411.0
1 POLYGON ((-81.55755 28.7129, -81.55751 28.7129... 12095017812 7675 7675 2823 41.6 106974.0
2 POLYGON ((-81.16011 28.5043, -81.16001 28.5043... 12095016750 12628 12600 4188 38.2 113182.0
3 POLYGON ((-81.27566 28.46595, -81.27561 28.465... 12095016756 2851 2851 953 42.5 150532.0
4 POLYGON ((-81.51742 28.57588, -81.51741 28.576... 12095015005 3061 3015 1085 44.9 102202.0
... ... ... ... ... ... ... ...
262 POLYGON ((-81.24514 28.58933, -81.24514 28.589... 12095016504 5370 5180 1886 32.7 58125.0
263 POLYGON ((-81.245 28.57761, -81.24464 28.57761... 12095016505 2574 2574 1059 29.6 59735.0
264 POLYGON ((-81.43318 28.49325, -81.43289 28.493... 12095014504 5583 5556 2564 31.5 59000.0
265 POLYGON ((-81.21225 28.52589, -81.21224 28.526... 12095016731 7530 5111 1874 32.3 90625.0
266 POLYGON ((-81.52343 28.43885, -81.52282 28.442... 12095017109 5450 5450 2018 51.0 159821.0

267 rows × 7 columns
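Because merge defaults to an inner join, tracts without a matching GEOID on either side are dropped without warning. One way to check for that, sketched here with small stand-in tables, is a left join with indicator=True. The rows are illustrative, and GEOID 12095999999 is made up to force a mismatch.

```python
import pandas as pd

# Stand-in for tracts[['geometry', 'GEOID']]; geometry omitted for brevity
shapes = pd.DataFrame({"GEOID": ["12095012405", "12095017812", "12095999999"]})
# Stand-in for census_data
values = pd.DataFrame({
    "GEOID": ["12095012405", "12095017812"],
    "TOT_POP": [5004, 7675],
})

# indicator=True adds a _merge column recording where each row was found
checked = shapes.merge(values, on="GEOID", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "GEOID"].tolist()
print(unmatched)
```

An empty list means every tract outline found a matching row of Census data.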

# Plot the data
data.plot(
    column = "MEDIAN_HOUSEHOLD_INCOME",
    cmap = "viridis",
    figsize = (8, 6),
    legend = True
)

plt.title("Median Household Income in Orange County, FL (2023)")
Text(0.5, 1.0, 'Median Household Income in Orange County, FL (2023)')
[Figure: choropleth map of median household income by Census tract, Orange County, FL (2023)]

With API Key: pytidycensus

#@title Import modules
!pip install -q pytidycensus
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import pytidycensus as tc
import os
# Set your API key (http://api.census.gov/data/key_signup.html)
tc.set_census_api_key("CENSUS API KEY HERE")
Census API key has been set for this session.
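Hard-coding a key in a shared notebook makes it easy to leak. One alternative, sketched here, reads the key from an environment variable. The variable name CENSUS_API_KEY and the helper load_census_key are assumptions for illustration, not something pytidycensus requires.

```python
import os

def load_census_key(env_var="CENSUS_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Census API key")
    return key

# Usage, once the variable is set:
# tc.set_census_api_key(load_census_key())
```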

Each Census dataset has its own set of variable codes, and there are hundreds of datasets. Go to this list of Census variables and find the codes that match your topic. Add each code, along with the column name you want for it, using the format below:

census_vars = {
    "CENSUS_CODE": "NAME_OF_YOUR_COLUMN",
    "CENSUS_CODE": "NAME_OF_YOUR_COLUMN",
    # etc.
}

If you’re not sure which Census variables to use, here is a starting point:

census_vars = {
    "B01003_001E": 'TOT_POP',
    "C17002_001E": 'INCOME_POVERTY_RATIO',
    "B19058_001E": 'TOT_FOOD_ASSIST',
    "B01002_001E": 'MEDIAN_AGE',
    "B19013_001E": 'MEDIAN_HOUSEHOLD_INCOME',
}

| Census Variable | Description |
| --- | --- |
| B01003_001E | Total Population |
| C17002_001E | Ratio of Income to Poverty Level in the Past 12 Months |
| B19058_001E | Total Public Assistance Income or Food Stamps/SNAP in the Past 12 Months for Households |
| B01002_001E | Median Age |
| B19013_001E | Median Household Income in the Past 12 Months (in 2023 Inflation-Adjusted Dollars) |

census_vars = {
    "B01003_001E": 'TOT_POP',
    "C17002_001E": 'INCOME_POVERTY_RATIO',
    "B19058_001E": 'TOT_FOOD_ASSIST',
    "B01002_001E": 'MEDIAN_AGE',
    "B19013_001E": 'MEDIAN_HOUSEHOLD_INCOME',
}

census_data = tc.get_acs(
    geography="tract",
    variables=list(census_vars.keys()),
    state="FL",
    year=2023,
    output="wide",
    geometry=True
).rename(columns=census_vars)
Getting data from the 2019-2023 5-year ACS
census_data
GEOID geometry TOT_POP INCOME_POVERTY_RATIO TOT_FOOD_ASSIST MEDIAN_AGE MEDIAN_HOUSEHOLD_INCOME state county tract NAME B01003_001_moe C17002_001_moe B19058_001_moe B01002_001_moe B19013_001_moe
0 12031013200 POLYGON ((-81.70785 30.20086, -81.70756 30.202... 2388 1235 363 22.0 60292 12 031 013200 Duval County, Florida 336.0 329.0 87.0 0.8 21878.0
1 12031002901 POLYGON ((-81.68865 30.36574, -81.68582 30.368... 3358 3354 1379 37.2 29125 12 031 002901 Duval County, Florida 510.0 511.0 146.0 5.7 12349.0
2 12031012000 POLYGON ((-81.78369 30.30049, -81.78349 30.306... 5801 5794 1895 32.5 56465 12 031 012000 Duval County, Florida 872.0 872.0 211.0 6.8 16803.0
3 12031012900 POLYGON ((-81.75218 30.27017, -81.74757 30.270... 2665 2665 1024 36.6 52830 12 031 012900 Duval County, Florida 458.0 458.0 169.0 5.7 9047.0
4 12031015200 POLYGON ((-81.60274 30.34624, -81.60262 30.350... 3640 3640 1554 37.7 58932 12 031 015200 Duval County, Florida 558.0 558.0 323.0 2.3 2260.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5117 12009069906 POLYGON ((-80.71489 28.387, -80.70295 28.38705... 2064 1931 757 28.9 29163 12 009 069906 Brevard County, Florida 631.0 630.0 169.0 11.7 5135.0
5118 12111382012 POLYGON ((-80.34961 27.29706, -80.34957 27.301... 6055 6055 2245 38.5 64409 12 111 382012 St. Lucie County, Florida 791.0 791.0 265.0 3.7 12236.0
5119 12095014201 POLYGON ((-81.40092 28.45036, -81.40028 28.452... 6345 6215 1831 33.9 65181 12 095 014201 Orange County, Florida 1673.0 1668.0 417.0 4.3 29441.0
5120 12091022500 POLYGON ((-86.63945 30.42761, -86.63841 30.428... 4316 4316 1675 37.3 81313 12 091 022500 Okaloosa County, Florida 461.0 461.0 178.0 1.7 17276.0
5121 12125960201 POLYGON ((-82.35698 29.96599, -82.35654 29.975... 1651 1614 770 40.1 44403 12 125 960201 Union County, Florida 259.0 256.0 137.0 9.3 12856.0

5122 rows × 16 columns
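The _moe columns hold the margin of error that accompanies each ACS estimate. ACS margins of error are published at the 90 percent confidence level, so dividing by 1.645 gives an approximate standard error. A minimal sketch (the helper name is hypothetical) using the first row above, where median household income is 60292 with a margin of error of 21878:

```python
def coefficient_of_variation(estimate, moe, z=1.645):
    """Approximate CV (in percent) from an ACS estimate and its 90% margin of error."""
    standard_error = moe / z
    return 100 * standard_error / estimate

cv = coefficient_of_variation(60292, 21878)
print(round(cv, 1))
```

Larger values indicate a less precise estimate, which is worth keeping in mind when comparing tracts on a map.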

# Filter Census data to Orange County (GEOID prefix = state FIPS 12 + county FIPS 095)
data = census_data[census_data['GEOID'].str.startswith('12095')]
data
GEOID geometry TOT_POP INCOME_POVERTY_RATIO TOT_FOOD_ASSIST MEDIAN_AGE MEDIAN_HOUSEHOLD_INCOME state county tract NAME B01003_001_moe C17002_001_moe B19058_001_moe B01002_001_moe B19013_001_moe
45 12095014601 POLYGON ((-81.45939 28.53206, -81.45826 28.537... 8356 8356 2599 29.5 45488 12 095 014601 Orange County, Florida 1209.0 1209.0 372.0 2.0 15543.0
46 12095017807 POLYGON ((-81.48867 28.68386, -81.4763 28.6838... 4506 4506 1428 37.5 61115 12 095 017807 Orange County, Florida 723.0 723.0 190.0 5.3 9928.0
47 12095012000 POLYGON ((-81.45219 28.57339, -81.4515 28.5780... 9327 9153 2301 29.8 54163 12 095 012000 Orange County, Florida 1831.0 1649.0 183.0 2.3 7573.0
48 12095014702 POLYGON ((-81.47583 28.5391, -81.46761 28.5382... 5345 5306 2139 34.2 50908 12 095 014702 Orange County, Florida 680.0 680.0 247.0 1.7 5445.0
49 12095014811 POLYGON ((-81.4919 28.48467, -81.49159 28.4886... 4704 4704 1864 46.5 113509 12 095 014811 Orange County, Florida 565.0 565.0 225.0 3.0 24825.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5036 12095012307 POLYGON ((-81.46787 28.59614, -81.46123 28.596... 6867 6821 2313 35.2 47378 12 095 012307 Orange County, Florida 831.0 832.0 206.0 3.4 8875.0
5059 12095016723 POLYGON ((-81.24486 28.56928, -81.23741 28.569... 5787 5750 1670 38.1 82500 12 095 016723 Orange County, Florida 740.0 737.0 126.0 2.6 16686.0
5072 12095016746 POLYGON ((-81.19469 28.54369, -81.18831 28.543... 4486 4486 1473 46.3 99959 12 095 016746 Orange County, Florida 960.0 960.0 262.0 2.6 35486.0
5100 12095015702 POLYGON ((-81.34502 28.62161, -81.34479 28.622... 1520 1520 588 41.3 117237 12 095 015702 Orange County, Florida 195.0 195.0 71.0 3.5 38055.0
5119 12095014201 POLYGON ((-81.40092 28.45036, -81.40028 28.452... 6345 6215 1831 33.9 65181 12 095 014201 Orange County, Florida 1673.0 1668.0 417.0 4.3 29441.0

266 rows × 16 columns
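Filtering by GEOID prefix works because every Orange County tract starts with state code 12 followed by county code 095. The sketch below contrasts that five-digit prefix with a longer one on a few GEOIDs. The first three come from the tables above, while 12095980000 is a hypothetical tract code chosen to begin with a digit other than 0.

```python
import pandas as pd

geoids = pd.Series(["12095014601", "12095017807", "12031013200", "12095980000"])

# State 12 + county 095: matches every Orange County tract
by_county = geoids[geoids.str.startswith("12095")]
# One digit longer: also requires the tract code itself to start with 0
too_narrow = geoids[geoids.str.startswith("120950")]

print(len(by_county), len(too_narrow))
```

Comparing the filtered row count against the number of tract outlines for the county is a quick sanity check on the prefix.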

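# Convert columns to numeric; copy first so the assignment does not
# trigger pandas' SettingWithCopyWarning on the filtered slice
data = data.copy()
for col in census_vars.values():
    data[col] = pd.to_numeric(data[col])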

# Plot the data
data.plot(
    column = "MEDIAN_HOUSEHOLD_INCOME",
    cmap = "viridis",
    figsize = (8, 6),
    legend = True
)

plt.title("Median Household Income in Orange County, FL (2023)")
Text(0.5, 1.0, 'Median Household Income in Orange County, FL (2023)')
[Figure: choropleth map of median household income by Census tract, Orange County, FL (2023)]