... ...

This document presents a very basic example you can access to Census and other data. and use them to provide context for local evaluation data. Specifically, this tutorial uses the R programming language, and includes examples and the R code you need (click the “Code” buttons below) to get you started.

So, let’s start with some sites – fictional in this case! We might have three groups that receive training in ethics. We will make some bogus data by simply “assigning” names and values to variables called SiteName, Enrollment, PctGoals, and then a location with latitude and longitude. You can copy and paste the code directly into R.

SiteName<-c("Amazing Workers","Better Accountants","Cheating Luddites")
Enrollment<-c(344,411,388)
PctGoals<-c(70,80,90)
lon<-c(-76.081,-76.015,-76.0157);lat<-c(40.99,40.93,41.1731)
df<-as.data.frame(cbind(SiteName,Enrollment,PctGoals,lat,lon))
df$lat<-as.numeric(df$lat);df$lon<-as.numeric(df$lon)
df
##             SiteName Enrollment PctGoals     lat      lon
## 1    Amazing Workers        344       70 40.9900 -76.0810
## 2 Better Accountants        411       80 40.9300 -76.0150
## 3  Cheating Luddites        388       90 41.1731 -76.0157

Above, we can see that one of the groups, the “Cheating Luddites” have the highest percentage of training goals completed (and we have some location data too).

Let’s imagine the evaluation team wants to put their sites into greater community context. So, why not use information about poverty and education? Gosh, that makes so much sense!

Get Poverty Data

The tidycensus package enables you to grab ACS or Decennial Census data and download it directly to R. In this example, I grabbed tract-level population data, and the number of individuals under 18 years of age in poverty, and then calculated a percentage. You can see the distribution of the percentages in the histogram below.

More info on the tidycensus package is here. NOTE that you will need to create an API key (easy to do).

library(tidycensus)
#mycensusapikey<-"replace with yours"  # uncomment this line and the next one when you run this code
#census_api_key(mycensusapikey,install = TRUE)

tractPopulation<-get_acs(
  geography = 'tract',     # select tract level data for Pennsylvania in 2020
  state = 'PA',year=2020,
  variables = "S1701_C01_001E") # Est of number people for whom pov is determined

tract18under<-get_acs(
  geography = 'tract',
  state = 'PA',year=2020,
                 variables = "S1701_C01_002E") # Est of those aged under 18 years of age
tract18under<-tract18under %>% select(estimate) %>% rename(estimate18=estimate)

pa_tracts2<-cbind(tractPopulation,tract18under) # put the two variables into one dataframe
pa_tracts2$pctU18<-(pa_tracts2$estimate18/pa_tracts2$estimate)*100

 

So far, we downloaded data and computed a new variable tract18under Now let’s plot these data!

# make a basic histogram
hist(pa_tracts2$pctU18,main="Distribution of Poverty Among Minors in Pennsylvania Tracts")

 

Get Education Data

Using the educationdata package from the Urban Institute, I downloaded NCES enrollment data for Wilkes-Barre Area School District. To identify the LEAs you want, you’ll need to know the NCES codes that are here.

More info on the educationdata package is here.

library(educationdata)
pa <- get_education_data(
  level = 'schools',   # select school level data
  source = 'ccd', 
  topic = 'enrollment',    # download enrollment related data 
  subtopic = list('sex'),   # download enrollment details by sex
  filters = list(
  year = 2019,
  leaid = c('4226300')),    # choose one LEA (or more if you wish); you'll get all schools in the LEA
  add_labels = TRUE)

 

Let’s take a look at what we downloaded! We can use a simple table to display the enrollment categories:

table(pa$grade)
## 
##           Pre-K    Kindergarten               1               2               3 
##               0              20              20              20              20 
##               4               5               6               7               8 
##              20              20              20              12              12 
##               9              10              11              12              13 
##              16              16              16              16               0 
## Adult education        Ungraded           Total   Not specified 
##               0               0              40               0

The very messy output shows there are 20 first grade records, 16 ninth grade records, etc… There are also 40 records that provide a “total” enrollment, and that’s what we want. Below we will “keep” only those records. (see expss package for nicer HTML tables)

pa<-pa[pa$grade=="Total",] # this is subsetting the file to records where grade is equal to "Total"

 

direc <- get_education_data(
  level = 'schools', 
  source = 'ccd', 
  topic = 'directory', 
  filters = list(
  leaid = c('4226300')), # directory of schools for one LEA
  add_labels = TRUE)

direc<-direc%>%
  group_by(ncessch)%>%   # we need to aggregate by the school code
  summarise(school_name=first(school_name))  # compute the first school name, so we get one record per school

pa2<-left_join(pa,direc,by="ncessch")  # merge schools data to directory information (e.g. school names)
  
# create a simple table of mean enrollment (see expss package for more refined tables)
pa2 %>% 
  group_by(school_name) %>% 
  summarise(enrollMEAN=first(enrollment))
## # A tibble: 10 × 2
##    school_name                       enrollMEAN
##    <chr>                                  <int>
##  1 DANIEL J FLOOD EL SCH                    699
##  2 DODSON EL SCH                            568
##  3 DR DAVID W KISTLER EL S                  987
##  4 ELMER L MEYERS JSHS                      890
##  5 G A R MEMORIAL JSHS                      934
##  6 HEIGHTS EL (MURRY COMP)                  879
##  7 JAMES M COUGHLIN JSHS                    898
##  8 SOLOMON EL SCH                           848
##  9 SOLOMON JHS                              486
## 10 Wilkes-Barre Area SD STEM Academy         60

 

Looks like Kistler Elementary School has the highest enrollment. Amazing.

Create Program Area Map

Let’s use the Census and NCES data to place the evaluation sites into context using a map.

Shapefiles are datasets that define physical boundaries or positions of different features (e.g. schools, crime, rainfall). Many shapefiles are available from the Census Bureau. The following steps (1) download the shapefiles, (2) merge our contextual data to them, and (3) create a map.

library(tigris);library(leaflet)

pa_t<-tigris::tracts(                   # this command says you want a tract shapefile
  state = 'PA',county = "079",  # download Luzerne County and assign it as `pa_t`
  progress_bar=TRUE)
pa_t2<-merge(pa_t,pa_tracts2,by='GEOID')  # merge datasets using a common variable

pa_c<-tigris::counties(state = 'PA',progress_bar=FALSE)  # download PA counties

palTract<-colorNumeric(palette = 'Blues',domain = pa_t2$pctU18)  # make a set of colors that align with the map

m<-leaflet(df) %>% 
    addProviderTiles("CartoDB.Positron") %>%    # background tiles
    addPolygons(data=pa_c,fillColor = 'yellow',fillOpacity = .2, weight=.4) %>%    # counties
  addPolygons(    # Census tracts
        data=pa_t2,     
        weight=1,color='black',stroke = T,  # boundary line properties
        fillColor =~palTract(pa_t2$pctU18), fillOpacity = .7,
        label=paste0("Tract #:",pa_t2$GEOID)
        ) %>%
    addCircleMarkers(weight=.5,radius=5,fillColor='black',fillOpacity = 1, label=df$SiteName) %>%
    setView(-76,41.2,zoom=9)
    
m