library(tidyverse)
library(sf)
nc_counties <- st_read("https://drive.google.com/uc?export=download&id=1g9sGIikgOEubqoj97fUVoCYAKlBDVX5a")
temp <- read_csv("https://drive.google.com/uc?export=download&id=16wJAatPKM0cF7VNy8hwNWOeJiiDNvsTt")

12 Creating Analytic Datasets
In this chapter, we will work on manipulating and combining data to create analytic datasets.
12.1 RQ1: What is the distribution of warm summer days in 2025 by North Carolina county?
In this research question, our goal is to create an analytic variable representing the percent of daily average summer temperatures that were above 70 degrees Fahrenheit, by NC county.
12.1.1 Data:
- North Carolina county boundaries from the US Census Bureau
- 2025 summer season daily average temperature values for each weather station in NC, from the North Carolina State Climate Office
12.1.2 Processing Steps:
- In the temp object, each row represents a single day in the 2025 summer season (June - August) at a single weather station in North Carolina. Aggregate the temp object to the station level (this will summarize values at that station over the season). The aggregated dataset should include two calculated columns: total_obs (total number of observations per station) and total_over_70 (total observations per station where the daily temperature value is over 70 degrees). The group_by command must include site, latitude, and longitude.
- Spatialize the station-level data using the st_as_sf() command. The CRS = 4326.
- Reproject the nc_counties and spatialized temp objects into CRS = 2264 (North Carolina State Plane, ft) (i.e. nc_counties <- nc_counties |> st_transform(crs = 2264)).
- Execute a spatial join between your reprojected temperature object and reprojected nc_counties object. Drop geometry and aggregate the temp object to the county level (take the sum of the total_obs and total_over_70 columns per county).
- Calculate a pct_over_70 variable representing the percent of county observations that are over 70 degrees.
- Use a table join to add your county-aggregated data to the nc_counties object.
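The steps above can be sketched on toy data. This is a minimal sketch under stated assumptions, not the chapter's solution: the temperature column name daily_avg_temp is hypothetical (the source does not name it), the three stations and their readings are made up, and the North Carolina shapefile bundled with sf stands in for nc_counties, joined on its NAME column.

```r
library(dplyr)
library(sf)

# Stand-in for nc_counties: the NC county shapefile shipped with sf,
# reprojected to NC State Plane (ft)
toy_counties <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) |>
  select(NAME) |>
  st_transform(crs = 2264)

# Made-up daily records; `daily_avg_temp` is an assumed column name
toy_temp <- tibble(
  site           = rep(c("A", "B", "C"), each = 4),
  latitude       = rep(c(35.78, 35.80, 35.23), each = 4),
  longitude      = rep(c(-78.64, -78.70, -80.84), each = 4),
  daily_avg_temp = c(68, 72, 75, 71, 66, 69, 73, 74, 80, 79, 65, 77)
)

# Step 1: aggregate daily rows to one row per station
stations <- toy_temp |>
  group_by(site, latitude, longitude) |>
  summarize(
    total_obs     = n(),
    total_over_70 = sum(daily_avg_temp > 70),
    .groups = "drop"
  )

# Steps 2-3: spatialize in WGS84 (4326), then reproject to match the counties
stations_sf <- stations |>
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326) |>
  st_transform(crs = 2264)

# Steps 4-5: spatial join, drop geometry, aggregate to county, compute percent
county_temp <- stations_sf |>
  st_join(toy_counties) |>
  st_drop_geometry() |>
  group_by(NAME) |>
  summarize(
    total_obs     = sum(total_obs),
    total_over_70 = sum(total_over_70),
    .groups = "drop"
  ) |>
  mutate(pct_over_70 = 100 * total_over_70 / total_obs)

# Step 6: table join back onto the county polygons
toy_counties_temp <- toy_counties |>
  left_join(county_temp, by = "NAME")
```

Counties with no station keep an NA in pct_over_70 after the left join, which is worth keeping in mind when mapping the result.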
12.2 RQ2: How does tree canopy cover vary within walking distance of bus stops in Chapel Hill, NC?
In this research question, our goal is to create an analytic variable representing the percent tree canopy cover within 0.5 miles of each bus stop in Chapel Hill, NC.
12.2.1 Data:
- Bus stop locations from Chapel Hill Open Data
- Tree canopy cover from NLCD. Each pixel value represents the percent of tree canopy cover in that pixel.
library(terra)
library(exactextractr)
bus_stops <- st_read("https://drive.google.com/uc?export=download&id=1jRINUl-5uBAsBcWnKTRmdUO7ZTs1EV4G")
tree_canopy <- rast("https://drive.google.com/uc?export=download&id=1_SqO-ocyLCg3g1Mv1ZbqOd5Pa1sTddps")

12.2.2 Processing Steps:
- Transform the bus_stops object into EPSG:3857 to match the tree canopy projection. Note that the units in this projection are meters.
- Buffer the bus_stops by 804.672 meters (0.5 miles).
- Use the exact_extract() function to add a field to the buffered bus stop object that represents the average canopy cover.
- Create a simplified dataset that includes only the following fields: STOP_ID, average tree canopy variable.
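The steps above can be sketched with stand-in data. This is a minimal sketch, not the chapter's solution: the two stops and their STOP_ID values are invented, the raster is filled with random values in place of the NLCD canopy layer, and the output column name mean_canopy is an assumption.

```r
library(sf)
library(terra)
library(exactextractr)
library(dplyr)

# Two made-up stops near Chapel Hill, created in WGS84 and
# transformed to EPSG:3857 (units: meters)
toy_stops <- data.frame(
  STOP_ID = c(101, 102),
  lon     = c(-79.055, -79.040),
  lat     = c(35.913, 35.905)
) |>
  st_as_sf(coords = c("lon", "lat"), crs = 4326) |>
  st_transform(crs = 3857)

# Buffer each stop by 0.5 miles, expressed in the CRS units (meters)
stop_buffers <- st_buffer(toy_stops, dist = 804.672)

# Stand-in canopy raster covering the buffers; random percent values
# replace the real NLCD pixels
bb <- st_bbox(stop_buffers)
set.seed(1)
toy_canopy <- rast(
  nrows = 100, ncols = 100,
  xmin = bb[["xmin"]], xmax = bb[["xmax"]],
  ymin = bb[["ymin"]], ymax = bb[["ymax"]],
  crs = "EPSG:3857"
)
values(toy_canopy) <- runif(ncell(toy_canopy), min = 0, max = 100)

# Coverage-weighted mean of the pixels under each buffer
stop_buffers$mean_canopy <- exact_extract(
  toy_canopy, stop_buffers, "mean", progress = FALSE
)

# Simplified dataset; dropping geometry here is a choice, not a requirement
canopy_by_stop <- stop_buffers |>
  st_drop_geometry() |>
  select(STOP_ID, mean_canopy)
```

With a single summary operation such as "mean", exact_extract() returns one value per polygon, so the result can be assigned directly as a new column.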
12.3 RQ3: How accessible are bus stops to Chapel Hill addresses?
In this research question, our goal is to create an analytic variable representing the distance of each address in Chapel Hill to the nearest bus stop.
12.3.1 Data:
- Bus stop locations from Chapel Hill Open Data
- Chapel Hill addresses from Chapel Hill Open Data
ch_addresses <- st_read("https://drive.google.com/uc?export=download&id=1fFXfEbOWjwfsT_JLeLYnaVPkbJCGYA2_")

12.3.2 Processing Steps:
- Transform the ch_addresses object into EPSG:3857 to match the bus stop object.
- Calculate the distance (in meters) from each address to the nearest bus stop.
- Create a simplified dataset that includes only the following fields: OBJECTID, LBCSDesc, distance variable.
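One way to approach the nearest-stop distance is to pair st_nearest_feature() with an element-wise st_distance(). The sketch below is a minimal illustration on made-up points, not the chapter's solution: the coordinates, the LBCSDesc values, and the column name dist_to_stop_m are all assumptions, and the toy points are created directly in EPSG:3857 so the distances are easy to check by hand.

```r
library(sf)
library(dplyr)

# Made-up bus stops, already in EPSG:3857 (units: meters)
toy_stops <- data.frame(STOP_ID = c(10, 20), x = c(0, 1000), y = c(0, 0)) |>
  st_as_sf(coords = c("x", "y"), crs = 3857)

# Made-up addresses; with real data this transform does the reprojection,
# here it only illustrates matching the CRS of the stops
toy_addresses <- data.frame(
  OBJECTID = 1:3,
  LBCSDesc = c("Residential", "Commercial", "Residential"),
  x = c(0, 900, 400),
  y = c(300, 0, 0)
) |>
  st_as_sf(coords = c("x", "y"), crs = 3857) |>
  st_transform(crs = st_crs(toy_stops))

# Index of the nearest stop for each address
nearest_idx <- st_nearest_feature(toy_addresses, toy_stops)

# Distance from each address to its own nearest stop (element-wise),
# converted from a units object to a plain numeric vector
toy_addresses$dist_to_stop_m <- as.numeric(
  st_distance(toy_addresses, toy_stops[nearest_idx, ], by_element = TRUE)
)

# Simplified dataset; dropping geometry here is a choice, not a requirement
address_access <- toy_addresses |>
  st_drop_geometry() |>
  select(OBJECTID, LBCSDesc, dist_to_stop_m)
```

Using by_element = TRUE avoids building the full address-by-stop distance matrix, which matters when there are many addresses. Distances computed in EPSG:3857 are Web Mercator distances, which overstate ground distance away from the equator, so treat them as approximate.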