5  Labs

5.1 General Formatting Guidance (SAMPLE FILE)

All submitted labs must follow these formatting guidelines (unless otherwise noted). Work that does not follow these guidelines will need to be revised and resubmitted:

  • Submission:
    • You should submit two files for each lab: a .Rmd and a .html.
    • You should not submit any other files.
    • The files should be saved using the following convention: LASTNAME_Lab_X.Rmd
  • Code:
    • All R code must appear inside code chunks. Do not place any code in the body text.
    • All commands MUST have a descriptive comment
    • Code must follow the syntax that we have used in class
    • Code chunks should not display messages or warnings in the knitted .html. If messages or warnings are printed, add message = FALSE, warning = FALSE to the chunk header.
    • Each code chunk should perform one logical set of tasks/analyses (e.g. one code chunk could include all of your data manipulation, another could contain all of your data visualizations)
    • Do not print large datasets or unnecessary intermediate outputs
  • Maps
    • Maps must use an appropriate visual variable and classification scheme
    • Formatting should not detract from map meaning
      • If using a basemap, transparency must be added
      • Display variable must be clear (e.g. rename the legend, add a title)
      • Legend must not cover any map components
  • Organization
    • Code chunks should be logically organized under clear, informative section headers
    • Written responses should appear directly below the code chunk that produces the relevant output
  • AI
    • AI use must be documented in the following manner:
      • What specific task(s) in the assignment an AI tool was used to assist with (e.g., brainstorming ideas, clarifying a concept, debugging code, checking organization or clarity)
      • Why AI was used (e.g., confusion about a course concept, uncertainty about assignment expectations or instructions, lack of clear examples in course materials, trouble getting code to run or understanding errors, being stuck and needing a starting point, needing help organizing ideas)
      • Reflection on learning (Briefly note whether using AI helped or hindered your understanding, reasoning, or ability to complete the task independently next time)
    • One short documentation (less than a paragraph) is sufficient to cover all AI use per assignment (you do not need a separate disclosure for each use)
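
If messages or warnings keep appearing in your knitted output, one option (beyond per-chunk options) is to suppress them document-wide from the set-up chunk using knitr's chunk-option API; a minimal sketch:

```r
# in the set-up chunk: suppress messages and warnings for every chunk
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```

Per-chunk options still override this global setting, so you can re-enable warnings for a specific chunk if you want to show them deliberately.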

5.2 Lab 1

5.2.1 Overview

In this lab, you will reflect on the quantitative revolution in geography and the major critiques that emerged in response. The goal is to consider how these critiques can inform a thoughtful, reflexive approach to using quantitative methods throughout the course.

5.2.2 Specifications

This lab is designed to assess the Concept 1 Competencies. You’ll be evaluated on the following specifications:

Specifications
Student response is submitted to Canvas and is approximately 500 words. Minor grammatical/organizational issues don’t prevent the work from being read and understood.
Content Understanding: Student demonstrates a competent understanding of the Quantitative Revolution in geography and the major critiques that emerged in response
Reflection & Application: Student thoughtfully reflects on how these critiques will inform their approach to quantitative methods for the remainder of the course.

5.2.3 Prompt

The quantitative revolution fundamentally reshaped the field of geography by emphasizing formal models, statistical analysis, and claims to objectivity and scientific rigor. In response, humanistic, feminist, Marxist, and critical geographers have raised concerns that quantitative approaches can be reductionist, overly positivist, and falsely neutral, emphasizing abstract space over lived place and obscuring social context, power, and marginalized experiences.

In this response paper, reflect on how these critiques can inform your approach to using quantitative methods as you move forward in a largely quantitative course. Your response should address the following:

  • Briefly explain how the quantitative revolution changed geography, and summarize the core concerns raised by critical approaches we’ve discussed in class
  • Reflect on how these critiques challenge the idea that quantitative methods are neutral, objective, or sufficient on their own
  • Discuss how you can continue to consider these critiques for the rest of the course by explaining how they will influence the questions you ask, how you interpret results, and the claims you make about data/analysis.

Your response should be approximately 500 words

5.3 Lab 2

5.3.1 Overview

In the Describing Data tutorial we worked with spatial datasets describing the rurality of North Carolina census tracts, summer temperatures across the United States, and a demographic variable measured at the census tract level in North Carolina. Spatial data is often aggregated to specific geographic units (for instance, to protect individual privacy or simplify a complex dataset). This aggregation influences the descriptive statistics we calculate.

In this lab, you will examine demographic and climate data at a different level of spatial aggregation and observe how the descriptive statistics change as a result (remember the underlying data values are not changing, only the geographic units used to summarize them). You will also explore an alternative way of defining “rural” and compare how this definition shapes your interpretation of the data.

You will need to complete the Describing Data tutorial to complete this lab!

Remember that you can use the Command Compendium to help you modify R commands.

5.3.2 Specifications

This lab is designed to assess the Concept 2 Competencies. You’ll be evaluated on the following specifications:

Specification
Student submits HTML and RMD versions of the R Markdown file.
The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output). Minor formatting issues don’t prevent the work from being read and understood.
Describing Rurality: Student produces a descriptive statistics table and a map. Written interpretation demonstrates understanding of the patterns and differences between NC Rural Center and USDA RUCA definitions
Describing Climate: Student produces descriptive statistics, two visualizations, and a map. Written interpretations demonstrate understanding of central tendency, dispersion, spatial patterns, and aggregation effects. Response is focused on extracting meaning, not simply summarizing descriptive statistics results.
Describing Demographics: Student produces descriptive statistics, two visualizations, and a map. Written interpretation demonstrates understanding of central tendency, dispersion, spatial patterns, and differences across aggregation levels. Response is focused on extracting meaning, not simply summarizing descriptive statistics results.

5.3.3 Lab Instructions

  1. Create a new .Rmd named “LASTNAME_lab2.Rmd”. Save it into your GEOG391 folder.

  2. Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:

    ## load necessary libraries
    library(tidyverse)
    library(gt)
    library(sf)
    library(e1071)
    library(tmap)
    
    
    #read in data
    rurality <- st_read("https://drive.google.com/uc?export=download&id=1aVDPgiZLUnKGb5k2SQo7ScDO2WNvX3Nd")
    county_climate_normals <- st_read("https://drive.google.com/uc?export=download&id=1sf0cmMaeHGra7AQheYAjTiC3wGCzB4FG") |> filter(!is.na(average_summer_temp))
    acs_county_nc <- st_read("https://drive.google.com/uc?export=download&id=1hXVtKL_33c4QRw6_Z9LlYqjgisdYCRqX")
  3. Add “Describing Rurality” as a third-level header. In this section you will compute descriptive statistics and visualizations for North Carolina census tracts using the North Carolina Rural Center’s definition of rurality (rural = fewer than 250 people per square mile). The NC Rural Center variable is named nc_rural_center in the dataset. Add a code chunk below your header. Your code chunk should:

    • Create a descriptive statistics table of rurality by census tract
    • Create a map of rurality by census tract
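
The two bullets above might be sketched as follows, assuming the rurality object loaded in step 2. The summary columns and the polygon styling are illustrative, not required; adapt them to the commands used in the tutorial:

```r
# sketch: number and share of tracts in each rurality class, as a formatted table
rurality |>
  st_drop_geometry() |>        # drop geometry before summarizing
  count(nc_rural_center) |>    # tracts per NC Rural Center class
  mutate(pct = n / sum(n)) |>
  gt()                         # formatted table

# sketch: map of rurality by census tract
tm_shape(rurality) +
  tm_polygons("nc_rural_center")
```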
  4. Below your chunk, you should write 1-2 paragraphs that do the following:

    • Describe what the descriptive statistics tell us about the data
    • Describe what the map tells us about the spatial pattern of the data
    • Compare the NC Rural Center definition of rural with the USDA RUCA codes, describing how the descriptive statistics differ and how the spatial patterns differ across the state
  5. Add “Describing Climate” as a third-level header. In this section you will compute descriptive statistics and visualizations for average summer temperatures per county in the U.S. Add a code chunk below your header. Your code chunk should:

    • Create a descriptive statistics table of average summer temperature by U.S. county
    • Create two appropriate non-map data visualizations of average summer temperature by U.S. county
    • Create a map of average summer temperature by U.S. county
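
One possible shape for this chunk, assuming the county_climate_normals object and average_summer_temp variable loaded in step 2 (the specific statistics and plot types are illustrative; choose the ones used in class):

```r
# sketch: descriptive statistics for average summer temperature by county
county_climate_normals |>
  st_drop_geometry() |>
  summarise(mean   = mean(average_summer_temp),
            median = median(average_summer_temp),
            sd     = sd(average_summer_temp),
            skew   = skewness(average_summer_temp)) |>  # skewness() from e1071
  gt()

# sketch: two non-map visualizations of the distribution
ggplot(county_climate_normals, aes(average_summer_temp)) + geom_histogram()
ggplot(county_climate_normals, aes(average_summer_temp)) + geom_boxplot()

# sketch: map of average summer temperature
tm_shape(county_climate_normals) +
  tm_polygons("average_summer_temp")
```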
  6. Below your chunk, you should write 1-2 paragraphs that do the following:

    • Describe what the descriptive statistics table and data visualizations tell us about average summer temperature (make sure to discuss central tendency, dispersion, shape)
    • Describe what the map tells us about the spatial pattern of the data
    • Compare the county-level data results to latitude/longitude (point-level) data, noting
      • Differences in descriptive statistics, visualizations, and spatial patterns
      • How these differences may relate to the scale or aggregation of the data
  7. Add “Describing Demographics” as a third-level header. In this section you will compute descriptive statistics and visualizations for your selected variable in Mini Challenge #1. Add a code chunk below your header. Your code chunk should:

    • Create a descriptive statistics table of your variable by NC county
    • Create two appropriate non-map data visualizations of your variable by NC county
    • Create a map of your variable by NC county
  8. Below your chunk, you should write 1-2 paragraphs that do the following:

    • Describe what the descriptive statistics table and data visualizations tell us about your variable (make sure to discuss central tendency, dispersion, shape)
    • Describe what the map tells us about the spatial pattern of the data
    • Compare the county-level data results to the census tract data, noting
      • Differences in descriptive statistics, visualizations, and spatial patterns
      • How these differences may relate to the scale or aggregation of the data
  9. Knit your .Rmd and make sure the formatting and organization are clear in the knitted .html document

5.4 Lab 3

5.4.1 Overview

In the Describing Spatial Data tutorial we learned how to expand our descriptive statistics toolkit to include spatial descriptive statistics.

In this lab, you will practice calculating spatial descriptive statistics on a dataset representing the location of wind turbines in the continental US (variable names are here) and Covid-19 cases and deaths by US county centroid on April 1, 2020 and July 1, 2020.

Remember that you can use the Command Compendium to help you modify R commands.

5.4.2 Specifications

This lab is designed to assess the Concept 3 Competencies. You’ll be evaluated on the following specifications:

Specification
Student submits HTML and RMD versions of the R Markdown file.
The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output). Minor formatting issues don’t prevent the work from being read and understood.
Describing Covid: Student produces descriptive statistics tables, calculates weighted means, and creates maps with mean centers for cases and deaths, including manual legend entries. Written interpretation demonstrates understanding of the distribution, spatial patterns, and why an unweighted mean center isn’t meaningful. Response is focused on extracting meaning, not simply summarizing descriptive statistics results.
Describing Wind Turbines: Student produces a descriptive statistics table and one non-map visualization, calculates spatial descriptive statistics (mean center, standard deviational ellipse, weighted mean center), and creates a map with manual legend entries. Written interpretation demonstrates understanding of the variable’s distribution and spatial patterns. Response is focused on extracting meaning, not simply summarizing descriptive statistics results.

5.4.3 Lab Instructions

  1. Create a new .Rmd named “LASTNAME_lab3.Rmd”. Save it into your GEOG391 folder.

  2. Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:

    ## load necessary libraries
    library(tidyverse)
    library(gt)
    library(sf)
    library(e1071)
    library(tmap)
    library(maptiles)
    library(sfdep)
    
    
    #read in data
    
    ##cumulative covid cases and deaths on April 1, 2020
    covid_deaths_04012020 <- st_read("https://drive.google.com/uc?export=download&id=1spBLqVxa25a7FDo7U7aNDJF2t2N0-uXv")
    
    ##cumulative covid cases and deaths on July 1, 2020
    covid_deaths_07012020 <- st_read("https://drive.google.com/uc?export=download&id=1ZNka0ffMXSVAqVDCj1R-oYh37BH9ziAZ")
    
    ##wind turbines across the continental US
    us_wind_turbine_locs <- st_read("https://drive.google.com/uc?export=download&id=1Cd9Nl1cNkGD_F9L68EqefKtFx7X7-IpE")
  3. Add “Describing Covid” as a third-level header. In each of the covid datasets, there is a variable named cases which represents the cumulative cases in each county on the given date and a variable named deaths which represents the cumulative deaths in each county on the given date. Note that the dataset only includes counties that had at least one case or death. In this section you will compute descriptive statistics for cases and deaths on each date, compute spatial descriptive statistics, and create a well-designed map. Your code chunk should:

    • Create a descriptive statistics table of cases and deaths for each dataset
    • Compute a weighted mean center for cases and deaths on each date
    • Create a map for the April 2020 data that symbolizes cases by county, as well as the mean center for cases AND deaths. Make sure that you add manual legend entries for the mean center of cases and deaths.
    • Create a map for the July 2020 data that symbolizes cases by county, as well as the mean center for cases AND deaths. Make sure that you add manual legend entries for the mean center of cases and deaths.
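
A sketch of the weighted mean center and manual legend steps for one dataset, assuming the objects loaded in step 2. This is one plausible approach, not necessarily the course's exact commands; in particular the tm_add_legend() arguments follow tmap v4 syntax and may differ from what you used in the tutorial:

```r
# sketch: weighted mean center of cases (weights = county case counts)
coords <- st_coordinates(covid_deaths_04012020)
wmc_cases <- st_sfc(
  st_point(c(weighted.mean(coords[, 1], w = covid_deaths_04012020$cases),
             weighted.mean(coords[, 2], w = covid_deaths_04012020$cases))),
  crs = st_crs(covid_deaths_04012020))

# sketch: symbolize cases, overlay the mean center, and add a manual legend entry
tm_shape(covid_deaths_04012020) +
  tm_dots("cases") +
  tm_shape(wmc_cases) +
  tm_dots(fill = "red", size = 1) +
  tm_add_legend(type = "symbols", fill = "red", labels = "Mean center (cases)")
```

Repeat the pattern for deaths and for the July dataset, adding a second manual legend entry so both mean centers are labeled.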
  4. Below your chunk, you should write 1-2 paragraphs that do the following:

    • Summarize what the descriptive statistics reveal about the distribution of Covid cases and deaths
    • Explain why creating a meaningful histogram or boxplot for this data would be challenging
    • Describe what the maps (including the spatial descriptive statistics) reveal about the spatial distribution of Covid cases and deaths.
    • Explain why an unweighted mean center isn’t meaningful in this context (consider the unit of observation)
  5. Add a “Describing Wind Turbines” header. In this section, you will compute descriptive and spatial descriptive statistics for the location and characteristics of wind turbines across the continental US. Your code chunk should:

    • Create a descriptive statistics table for one quantitative variable in the turbine dataset.
    • Create one non-map data visualization for your variable of interest
    • Calculate mean center of wind turbines, standard deviational ellipse of wind turbines, and weighted mean center based on your variable of interest.
      • To calculate the weighted mean center, you may need to drop NA values from your variable of interest. To do this, create a new object turbine_dropped <- us_wind_turbine_locs |> drop_na(VARIABLEOFINTEREST) and use that object to calculate the weighted mean center.
    • Create a map that symbolizes your variable of interest, the mean center of wind turbines, the standard deviational ellipse of wind turbines, and the weighted mean center. Make sure that you add manual legend entries for the mean center and weighted mean center (you do not need to add a legend entry for the standard deviational ellipse).
  6. Below your chunk, you should write 1-2 paragraphs that do the following:

    • Summarize what the descriptive statistics and non-map visualization reveal about the distribution of your variable
    • Describe what the maps (including the spatial descriptive statistics) reveal about the spatial distribution of wind turbines and your variable of interest
  7. Knit your .Rmd and make sure the formatting and organization are clear in the knitted .html document

5.5 Lab 4

5.5.1 Overview

In the Probability tutorial we learned about calculating empirical probability using historical data and applying the binomial and normal probability distributions.

In this lab, you will practice applying the binomial probability distribution using daily water height data from the USGS water gauge in Bolin Creek from 2015-2025. You will also make a probability map of the probability of low January temperatures across North Carolina by applying the normal distribution.

Remember that you can use the Command Compendium to help you modify R commands.

5.5.2 Specifications

This lab is designed to assess the Concept 4 Competencies. You’ll be evaluated on the following specifications:

Specification
HTML and RMD versions of the R Markdown file have been submitted.
The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand.
Flood Risk at Bolin Creek: Student creates a descriptive statistics table for daily water height, calculates empirical probability of a flood risk day, estimates the most likely number of flood risk days over 25 days, and computes cumulative probability of more than 3 flood risk days. Written interpretation demonstrates understanding of the results.
Low Temperature Probabilities Across NC: Student calculates the probability of January days with maximum temperature below 40°F at each ASOS station and produces a probability map for the state. Written interpretation demonstrates understanding of the probabilities and spatial patterns.
  1. Create a new .Rmd named “LASTNAME_lab4.Rmd”. Save it into your GEOG391 folder.

  2. Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:

    #load libraries
    library(tidyverse)
    library(tmap)
    library(sf)
    library(tigris)
    
    #bolin creek daily data from 2015-2025
    bolin_creek_data <- read_csv("https://drive.google.com/uc?export=download&id=1eEzriJ5XzR-aS74nJmyiQFVrmV4T3cel")
    
    #NC asos station data from 2000-2020
    asos_jan_data <- read_csv("https://drive.google.com/uc?export=download&id=1_zE1kcC7YY083R6rv20sqeN1l2dCdmhi")

5.5.3 Lab Instructions

  1. Add a third-level header called “Flood Risk at Bolin Creek” and a new code chunk below the header. In this code chunk you should:

    • Create a new object that represents the daily maximum water height by using the command daily_max_bolin <- bolin_creek_data |> group_by(time) |> summarise(max_water = max(value))
    • Create a descriptive statistics table for daily water height at the Bolin Creek USGS station
    • Calculate the empirical probability of a “flood risk day” at Bolin Creek. A flood risk day is defined as a day where the water height is more than 1 ft higher than the median daily height for the period of record (2015-2025).
    • Use this empirical probability to determine the most likely number of days of flood risk over a 25 day period.
    • Calculate the cumulative probability of having more than 3 flood risk days over a period of 25 days.
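
The binomial steps above might be sketched as follows, assuming p_flood holds the empirical probability of a flood risk day computed in the previous bullet (the object name is illustrative):

```r
# sketch: binomial distribution over a 25-day period
n_days <- 25
probs <- dbinom(0:n_days, size = n_days, prob = p_flood)

# most likely number of flood risk days (the count with the highest probability)
(0:n_days)[which.max(probs)]

# cumulative probability of MORE than 3 flood risk days in 25 days
1 - pbinom(3, size = n_days, prob = p_flood)
```

Note that pbinom(3, ...) gives P(X ≤ 3), so subtracting from 1 gives P(X > 3), i.e. strictly more than 3 flood risk days.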
  2. Below this code chunk, write 1-2 paragraphs that address the following

    • Explain in plain language what the empirical probability of a flood risk day represents
      • What does this probability tell you about how often flood risk occurs at Bolin Creek?
    • Describe the output of the binomial distribution and explain which number of flood risk days is most likely in a 25-day period and why this makes sense given the empirical probability
    • Interpret the cumulative probability of having more than 3 flood risk days in 25 days and explain what this means in terms of real-world flood risk.
  3. Add a third-level header called “Low Temperature Probabilities in NC” and add a new code chunk below the header. In this code chunk you should:

    • Use the normal distribution to calculate the probability (for each ASOS station) of having a January day with a maximum temperature below 40 degrees F.
    • Create a probability map for the full state
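
A sketch of the per-station probability calculation using the normal distribution. The column names station and max_temp are assumptions for illustration; check the actual names in asos_jan_data:

```r
# sketch: P(January daily max temp < 40°F) at each ASOS station,
# using each station's January mean and standard deviation
station_probs <- asos_jan_data |>
  group_by(station) |>
  summarise(p_below_40 = pnorm(40,
                               mean = mean(max_temp, na.rm = TRUE),
                               sd   = sd(max_temp, na.rm = TRUE)))
```

The resulting per-station probabilities can then be joined to station locations and symbolized with tmap to produce the statewide probability map.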
  4. Below this code chunk, write 1-2 paragraphs that address the following:

    • Are there areas of the state with particularly high or low probabilities? What might explain these differences?
    • How could these patterns inform planning or preparedness for cold weather events?

5.6 Lab 5

5.6.1 Overview

In the Estimation with Sampling tutorial we learned about how to make point and interval estimations of means and proportions from samples of a population.

In this lab, you will practice calculating point and interval estimations from a simulated simple random sample generated from the Behavioral Risk Factor Surveillance System 2023 data. This sample represents the population of the United States and explores various health-related behaviors and outcomes. Each row represents an individual’s responses to the survey.

The variables in the dataset are:

Variable Description
physhlth Number of days of poor physical health in the last 30 days
menthlth Number of days of poor mental health in the last 30 days
x.state state FIPS code
exerany2 Any exercise in the last 30 days (1= yes, 0= no)

Remember that you can use the Command Compendium to help you modify R commands.

5.6.2 Specifications

This lab is designed to assess the Concept 5 Competencies. You’ll be evaluated on the following specifications:

Specification
HTML and RMD versions of the R Markdown file have been submitted.
The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand.
Descriptive Statistics: Student creates a descriptive statistics table and non-map graphic for the three variables. Written interpretation describes distribution and demonstrates understanding of the results.
Countrywide Estimates: Student calculates a point estimate and confidence interval for the three variables of interest for the entire country. Written interpretation describes the estimates and relates the estimates to conclusions about the full population.
Statewide Estimates: Student calculates a point estimate and confidence interval for the three variables of interest for each state and maps the results. Written interpretation describes variability between states in terms of estimates and confidence interval sizes, as well as spatial patterns.

5.6.3 Lab Instructions

  1. Create a new .Rmd named “LASTNAME_lab5.Rmd”. Save it into your GEOG391 folder.

  2. Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:

    #load libraries
    library(tidyverse)
    library(tmap)
    library(sf)
    library(tigris)
    
    #BRFSS sample from 2023
    brfss_sample <- read_csv("https://drive.google.com/uc?export=download&id=11rKXmplEk0dcvi7lzEQ0OUrXDrnvK4MR") |> mutate(x.state = str_pad(as.character(x.state), width = 2, side = "left", pad = "0"))
  3. Add a third-level header called “Descriptive Statistics” and a new code chunk below the header. In this code chunk you should:

    • Create a descriptive statistics table for each of the three variables (not including state) in the dataset.
    • Create one non-map graphic for each of the three variables.
  4. Below this code chunk, write a few sentences describing the characteristics of the sample distribution for each of the variables.

  5. Add a third-level header called “Countrywide Estimates” and add a new code chunk below the header. In this code chunk you should:

    • Calculate (at a 95% confidence level) the point estimate and confidence interval for each of the three variables.
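
For the countrywide estimates, t.test() and prop.test() return both the point estimate and a 95% confidence interval by default; a minimal sketch using the brfss_sample object from step 2:

```r
# sketch: mean days of poor mental/physical health, with 95% CIs
t.test(brfss_sample$menthlth)
t.test(brfss_sample$physhlth)

# sketch: proportion reporting any exercise, with 95% CI
# (successes = respondents answering 1, trials = non-missing responses)
prop.test(sum(brfss_sample$exerany2, na.rm = TRUE),
          sum(!is.na(brfss_sample$exerany2)))
```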
  6. Below this code chunk, write a few sentences that describe the values and what this tells us about the values of the population.

  7. Add a third-level header called “Statewide Estimates and Mapping” and add a new code chunk below the header. In this code chunk you should:

    • Add the following code, which calculates (at a 95% confidence level) the point estimate and confidence interval, as well as the size of the confidence interval, for each of the three variables for each state (there will be two missing states).

      aggregated_data <- brfss_sample |>
        group_by(x.state) |>
        summarise(
      
          #sample size
          n = n(),
      
          #mental health (mean + CI)
          ment_mean  = t.test(menthlth)$estimate,
          ment_lower = t.test(menthlth)$conf.int[1],
          ment_upper = t.test(menthlth)$conf.int[2],
          ment_ci_size = ment_upper - ment_lower,
      
          #physical health (mean + CI)
          phys_mean  = t.test(physhlth)$estimate,
          phys_lower = t.test(physhlth)$conf.int[1],
          phys_upper = t.test(physhlth)$conf.int[2],
          phys_ci_size = phys_upper - phys_lower,
      
          #exercise (proportion + CI)
          exer_total = sum(exerany2, na.rm = TRUE),
          exer_prop  = exer_total / n,
          exer_lower = prop.test(exer_total, n)$conf.int[1],
          exer_upper = prop.test(exer_total, n)$conf.int[2],
          exer_ci_size = exer_upper - exer_lower
        )
    • Then add the following code to join your dataset to a spatial boundary file

      #get just continental US for mapping purposes
      state_boundaries <- states() |> filter(!(GEOID %in% c("15", "02","60", "66", "69", "72", "78")))
      
      #add state data to state boundaries
      spatial_data <- state_boundaries |> left_join(YOUR_AGGREGATED_STATE_DATASET, join_by("GEOID" == "x.state"))
    • Create a dual-paneled map that displays the point estimate per state for one of the three variables and the size of the confidence interval (VARIABLEOFINTEREST_ci_size)

  8. Below this code chunk, write a few sentences describing the spatial pattern that you see in the point estimates and the size of the confidence intervals. Explain why some states might have smaller or larger intervals.

5.7 Lab 6

5.7.1 Overview

In the Hypothesis Testing tutorial we learned about applying statistical tests to evaluate whether there is enough evidence to determine if an estimate (calculated from a sample) is statistically different than another estimate, within a certain level of confidence.

In this lab, you will practice applying statistical tests to a sample of North Carolina public school data on chronic absences. Each school in the sample contains a schoolwide mean for absenteeism before Covid (2018-2019) and after Covid (2021-2024), as well as a rural/non-rural designation depending on the county the school is located in.

Remember that you can use the Command Compendium to help you modify R commands.

5.7.2 Specifications

This lab is designed to assess the Concept 6 Competencies. You’ll be evaluated on the following specifications:

Specification
HTML and RMD versions of the R Markdown file have been submitted.
The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand.
Descriptive Statistics: Student creates a map of each variable and discusses the spatial pattern
Scenario 1: Student determines the appropriate test, discusses assumptions, and interpretation demonstrates understanding of the results.
Scenario 2: Student determines the appropriate test, discusses assumptions, and interpretation demonstrates understanding of the results.
Scenario 3: Student determines the appropriate test, discusses assumptions, and interpretation demonstrates understanding of the results.

5.7.3 Lab Instructions

  • Create a new .Rmd named “LASTNAME_lab6.Rmd”. Save it into your GEOG391 folder.

  • Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:

    #load libraries
    library(tidyverse)
    library(tmap)
    library(sf)
    
    school_sample <- st_read("https://drive.google.com/uc?export=download&id=1Yys2-G69_tc9YC-7jlRgsfXn0faLZhUV") 
  • Under a header called “Describing School Absences” make a map for both variables (pre_covid_pct, afer_covid_pct). Make each map use the same legend by using this code as your tm_dots command: tm_dots("REPLACE_WITH_VARIABLE", fill.scale = tm_scale(breaks = c(0, .1, .15, .2, .3, .4, .8))). Then describe spatial patterns and the characteristics of the datasets.

  • Under a header called “Testing Hypotheses” identify an appropriate statistical test, test assumptions, generate a null and alternative hypothesis (including whether it is one-sided or two-sided), and interpret results for the following scenarios:

    • The known countrywide mean for school absences before Covid is 14.5% (.145). Does the North Carolina mean (drawn from a sample of NC schools) differ significantly from the countrywide mean?
    • Is the absentee rate by school significantly higher in the years after Covid-19?
    • Does the absentee rate differ significantly between urban and rural schools before Covid-19? What about after Covid-19?
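
One plausible mapping of the three scenarios to tests, assuming the variable names given in the lab (pre_covid_pct, afer_covid_pct); the rural/non-rural column name here is an assumption, so check the dataset. Identifying and justifying the correct test, sidedness, and assumptions is still your job:

```r
# sketch: scenario 1 -- one-sample t-test against the known countrywide mean
t.test(school_sample$pre_covid_pct, mu = 0.145)

# sketch: scenario 2 -- paired, one-sided t-test (same schools before/after)
t.test(school_sample$afer_covid_pct, school_sample$pre_covid_pct,
       paired = TRUE, alternative = "greater")

# sketch: scenario 3 -- two-sample t-test comparing rural and non-rural schools
# ("rural" is an assumed grouping column name)
t.test(pre_covid_pct ~ rural, data = school_sample)
```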

5.8 Lab 7

5.8.1 Overview

In the Point Pattern Analysis tutorial, we learned how to interpret the spatial structure of unmarked point data.

In this lab, you will practice applying these skills to a dataset representing wildfire locations in Croatan National Forest in North Carolina from 1957-2024 (from US Forest Service) and a dataset representing locations of fire stations along the I-40 corridor in North Carolina.

Remember that you can use the Command Compendium to help you modify R commands.

5.8.2 Specifications

This lab is designed to assess the Concept 7 Competencies. You’ll be evaluated on the following specifications:

Specification
HTML and RMD versions of the R Markdown file have been submitted.
The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand.
Analyzing Global Structure: Student maps the point pattern, computes and visualizes a quadrat count (with Monte Carlo simulations) and tests it against CSR, and creates a kernel density estimate for both datasets.
Analyzing Local Structure: Student computes and visualizes ANN and L-function (both with Monte Carlo simulations). Student appropriately determines whether to test against CSR or an Inhomogeneous Poisson Pattern.
Describing Results: Student integrates global and local results to identify the dominant spatial process (first-order, second-order, or both) and proposes a specific hypothesis informed by background research.

5.8.3 Lab Instructions

  • Create a new .Rmd named “LASTNAME_lab7.Rmd”. Save it into your GEOG391 folder.

  • Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:

    #load libraries
    library(sf)
    library(tidyverse)
    library(spatstat)
    
    wildfire_points <- st_read("https://drive.google.com/uc?export=download&id=16m9HJAn5uzSyiUdhdhoboPRRi9dDDQaT") |> st_transform(crs =2264)
    
    ### This turns the data into a point pattern object with a defined boundary ###
    ### Use this for PPA ###
    combined_wildfire <- st_union(wildfire_points)
    bb_wildfire <- st_convex_hull(combined_wildfire) |> as.owin()
    wildfire.ppp <- as.ppp(st_coordinates(wildfire_points), W = bb_wildfire)
    
    # Read fire station points and project to NC State Plane (EPSG:2264)
    fire_stations <- st_read("https://drive.google.com/uc?export=download&id=1bjAeEaoVZUjf9VLJTKl_anABwznioAoa") |> st_transform(crs = 2264)
    
    ### This turns the data into a point pattern object with a defined boundary ###
    ### Use this for PPA ###
    combined_firestation <- st_union(fire_stations)
    bb_firestation <- st_convex_hull(combined_firestation) |> as.owin()
    firestation.ppp <- as.ppp(st_coordinates(fire_stations), W = bb_firestation)
  • Under a header called “Analyzing Wildfire Points” add a code chunk, and a written analysis below the code chunk that does the following:

    • Make a map (with basemap) of the wildfire_points object
    • Test the quadrat count against CSR using a quadrat density test (use nx = 8, ny = 8). Add Monte Carlo simulations (n = 99) to the statistical testing
    • Create a KDE for wildfire counts
    • Run an ANN analysis that uses Monte Carlo simulations (n = 99) to create an empirical probability. You will need to determine whether your simulations should be homogeneous or inhomogeneous based on your global analysis.
    • Evaluate the L-function and envelope (n = 99). You will need to determine if you should use the homogeneous or inhomogeneous function (and simulations) based on your global analysis. Because our border is not a rectangle, we can’t use the “iso” correction, so use the “border” correction instead.
    • In text below the chunk, identify whether there is evidence for the pattern of wildfires reflecting first-order processes, second-order processes, or a combination of both by clearly interpreting your analytical results. Based on brief background research on wildfires and the study area, propose one specific, plausible hypothesis about an underlying process that could be contributing to the spatial pattern.
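The wildfire steps above could be sketched roughly as follows. This is a minimal outline, not a required solution: it assumes the set-up chunk has already been run, the CSR simulator shown for ANN is only appropriate if your global analysis supports homogeneity, and all plot titles are illustrative.

```r
# Quadrat density test against CSR with Monte Carlo simulations
qt <- quadrat.test(wildfire.ppp, nx = 8, ny = 8,
                   method = "MonteCarlo", nsim = 99)
qt                                   # p-value from the simulated distribution
plot(wildfire.ppp, pch = 20, main = "Wildfires with quadrat counts")
plot(qt, add = TRUE)

# Kernel density estimate
kde_wild <- density(wildfire.ppp)
plot(kde_wild, main = "Wildfire KDE")

# ANN with a Monte Carlo empirical probability under CSR
# (swap in an inhomogeneous simulator if your global results call for it)
ann_obs <- mean(nndist(wildfire.ppp))
ann_sim <- replicate(99, mean(nndist(rpoispp(lambda = intensity(wildfire.ppp),
                                             win = Window(wildfire.ppp)))))
p_ann <- (sum(ann_sim <= ann_obs) + 1) / (99 + 1)

# L-function with envelope; "border" correction because the window
# (a convex hull) is not rectangular
L_env <- envelope(wildfire.ppp, Lest, nsim = 99, correction = "border")
plot(L_env, main = "L-function vs. CSR envelope")
```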
  • Under a header called “Analyzing Fire Stations” add a code chunk, and a written analysis below the code chunk that does the following:

    • Make a map (with basemap) of the fire_stations object
    • Test the quadrat count against CSR using a quadrat density test (use nx = 4, ny = 3). Add Monte Carlo simulations (n = 99) to the statistical testing
    • Create a KDE for fire stations
    • Run an ANN analysis that uses Monte Carlo simulations (n = 99) to create an empirical probability. You will need to determine whether your simulations should be homogeneous or inhomogeneous based on your global analysis.
    • Evaluate the L-function and envelope (n = 99). You will need to determine if you should use the homogeneous or inhomogeneous function (and simulations) based on your global analysis. Because our border is not a rectangle, we can’t use the “iso” correction, so use the “border” correction instead.
    • In text below the chunk, identify whether there is evidence for the pattern of fire stations reflecting first-order processes, second-order processes, or a combination of both by clearly interpreting your analytical results. Propose one specific, plausible hypothesis about an underlying process that could be contributing to the spatial pattern.
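If your global analysis suggests a spatially varying intensity (first-order effects), the local tests should be run against an inhomogeneous Poisson pattern rather than CSR. A hedged sketch of that variant, using the fire station pattern; the KDE bandwidth defaults and plot title are illustrative:

```r
# Estimate the intensity surface, then simulate from it
lambda_hat <- density(firestation.ppp)

# ANN with simulations drawn from the inhomogeneous intensity
ann_obs <- mean(nndist(firestation.ppp))
ann_sim <- replicate(99, mean(nndist(rpoispp(lambda_hat))))
p_ann <- (sum(ann_sim <= ann_obs) + 1) / (99 + 1)

# Inhomogeneous L-function with border correction, with the envelope
# built from inhomogeneous Poisson simulations
L_env <- envelope(firestation.ppp, Linhom, nsim = 99,
                  simulate = expression(rpoispp(lambda_hat)),
                  correction = "border")
plot(L_env, main = "Inhomogeneous L-function envelope")
```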

5.9 Lab 8

5.9.1 Overview

In the Autocorrelation tutorial, we learned how to define spatial neighbors and how to use these neighborhoods to assess whether values at locations are related to values at near locations.

In this lab, you will practice applying these skills to a dataset representing average SAT score at North Carolina public high schools and workplace mobility during the first few months of Covid using the Google mobility dataset.

Remember that you can use the Command Compendium to help you modify R commands.

5.9.2 Specifications

This lab is designed to assess the Concept 8 Competencies. You’ll be evaluated on the following specifications:

Specification
HTML and RMD versions of the R Markdown file have been submitted.
The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand.
Defining Neighborhoods: Student defines neighborhoods and creates spatial weight matrices for both NC schools and NC counties. Written justification explains why the chosen neighborhood definition is appropriate for this analysis.
Analyzing Global Autocorrelation: Student computes Moran’s I for school SAT scores and county-level Covid-19 workplace mobility. Written interpretation explains what the Moran’s I values indicate about spatial structure and strength of autocorrelation.
Analyzing Local Autocorrelation: Student computes and maps local indicators of spatial autocorrelation for both datasets. Written interpretation describes the observed clustering patterns and proposes one reasonable hypothesis for why these patterns exist.

5.9.3 Lab Instructions

  1. Create a new .Rmd named “LASTNAME_lab8.Rmd”. Save it into your GEOG391 folder.

  2. Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:

    #load libraries
    library(sf)
    library(tidyverse)
    library(spatstat)
    library(spdep)
    
    school_sat <- st_read("https://drive.google.com/uc?export=download&id=1i-C2PF3ajbpHTlavh3m1QbWNObpobprZ") 
    
    covid_mobility <- st_read("https://drive.google.com/uc?export=download&id=1YbCNVR5K8jDF5HOKHcoJ25gz7VkRvDHa") 
  3. Under a header called “Defining Neighbors” add a code chunk, and a written analysis below the code chunk that does the following:

    • Defines neighborhoods and creates a neighborhood weight matrix for NC schools and NC counties. You can use any appropriate neighborhood definition
    • Justify your neighborhood definition. Why is it useful for this particular analysis?
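One possible set-up is sketched below, assuming the data chunk above has been run. Queen contiguity for counties and k = 5 nearest neighbors for schools are illustrative choices, not required ones; your written justification should defend whatever definition you pick.

```r
# Counties are polygons: queen contiguity neighborhoods
county_nb <- poly2nb(covid_mobility, queen = TRUE)
county_lw <- nb2listw(county_nb, style = "W")

# Schools are points: k-nearest-neighbor neighborhoods
school_coords <- st_coordinates(school_sat)
school_nb <- knn2nb(knearneigh(school_coords, k = 5))
school_lw <- nb2listw(school_nb, style = "W")
```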
  4. Under a header called “Analyzing Global Autocorrelation” add a code chunk, and a written analysis below the code chunk that does the following:

    • Computes Moran’s I for school SAT scores and county-level Covid-19 workplace mobility using your selected neighborhood definition. For the mobility dataset, use the workplace_change variable, which represents the percent change in workplace mobility from March-May 2020.
    • Describe your results. What does the Moran’s I value tell you about the structure of the data (and the strength of this structure)?
  5. Under a header called “Analyzing Local Autocorrelation” add a code chunk, and a written analysis below the code chunk that does the following:

    • Calculate and map Local Indicators of Spatial Autocorrelation (LISA) for both datasets
    • Describe your results. What is the spatial pattern of clustering? Propose one reasonable hypothesis for why this pattern might exist.
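The local analysis might be sketched as below for the county data (the same pattern applies to the schools). It assumes a county_lw weights object from the earlier step; the quick base-sf plot is a stand-in for whatever mapping approach you use in your final map.

```r
# Local Moran's I for county workplace mobility
lisa <- localmoran(covid_mobility$workplace_change, county_lw)

# Attach the statistic and p-value (column 5 of localmoran output)
covid_mobility$local_i <- lisa[, "Ii"]
covid_mobility$local_p <- lisa[, 5]

# Quick map of the local statistic
plot(covid_mobility["local_i"], main = "Local Moran's I")
```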

5.10 Lab 9

5.10.1 Overview

In the spatial modeling tutorials, we learned that spatial autocorrelation can appear in model residuals when nearby locations share similar values for reasons not fully captured by the observed covariates. We also learned that there are multiple ways to address this issue. Traditional spatial regression models, such as the spatial lag or spatial error model, treat spatial dependence as something to account for in order to improve inference. Bayesian spatial models, by contrast, treat spatial dependence as a latent process and estimate spatial effects directly.

In this lab, you will practice building and comparing these approaches using American Community Survey data for North Carolina census tracts. You will first fit a standard OLS model, then test the residuals for spatial autocorrelation, then fit a traditional spatial regression model, and finally fit a Bayesian spatial model.

Remember that you can use the Command Compendium to help you modify R commands.

5.10.2 Specifications

This lab is designed to assess the Concept 9 Competencies. You’ll be evaluated on the following specifications:

Specification
HTML and RMD versions of the R Markdown file have been submitted.
The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand.
OLS Model: Student fits the specified OLS model and interprets the coefficient estimates in context
Residual Spatial Dependence: Student defines a neighborhood structure, creates a spatial weights matrix, and tests OLS residuals for spatial autocorrelation. Written interpretation explains whether residual spatial structure is present
Traditional Spatial Model: Student fits an appropriate spatial regression model and explains why it was chosen. Written interpretation compares the spatial model to the OLS model
Bayesian Spatial Model: Student fits a Bayesian spatial model using INLA and maps the posterior mean spatial effect. Written interpretation explains the role of the spatial effect and compares the Bayesian model conceptually to the traditional spatial model.

5.10.3 Lab Instructions

  1. Create a new .Rmd named “LASTNAME_lab9.Rmd”. Save it into your GEOG391 folder.

  2. Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:

    library(sf)
    library(tidyverse)
    library(tmap)
    library(spdep)
    library(spatialreg)
    library(INLA)
    
    acs_tract_nc <- st_read("https://drive.google.com/uc?export=download&id=1cnz4xgdDRZvlXzN3IyvCxOETRWfrpw-0")
  3. Under a header called “OLS” add a code chunk, and a written analysis below the code chunk that does the following:

    • Fits an OLS model with a dependent variable of median_hh_inc and covariates pct_under_poverty_level, pct_renter, pop_density_mile, pct_bachelors_or_more
    • Describe the results of the OLS model. Interpret the relationship between median household income and each of the covariates, indicating which variables appear positively or negatively associated with income.
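The OLS fit uses exactly the variables named above; lm() handles the sf data frame directly, ignoring the geometry column:

```r
# Fit OLS: median household income as a function of the four covariates
ols <- lm(median_hh_inc ~ pct_under_poverty_level + pct_renter +
            pop_density_mile + pct_bachelors_or_more,
          data = acs_tract_nc)
summary(ols)
```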
  4. Under a header called “Residual Spatial Dependence” add a code chunk, and a written analysis below the code chunk that does the following:

    • Test the OLS residuals for spatial dependence
    • Interprets the Moran’s I statistic for the OLS residuals and explain what the result suggests about the presence and strength of spatial autocorrelation.
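A sketch of the residual test, assuming the OLS model from the previous step is stored as ols. Queen contiguity is an illustrative neighborhood choice; zero.policy = TRUE guards against any tracts with no neighbors.

```r
# Neighborhood structure and weights for the tracts
tract_nb <- poly2nb(acs_tract_nc, queen = TRUE)
tract_lw <- nb2listw(tract_nb, style = "W", zero.policy = TRUE)

# Moran test on the OLS residuals
lm.morantest(ols, tract_lw, zero.policy = TRUE)
</```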
  5. Under a header called “Traditional Spatial Model” add a code chunk, and a written analysis below the code chunk that does the following:

    • Determine whether an SEM or SLM is better suited for this dataset and run that model
    • Interpret the results of the spatial regression model, including the estimated spatial parameter. Describe how the coefficient estimates compare to the OLS model and explain how this model accounts for spatial dependence in the data.
  6. Under a header called “Bayesian Spatial Model” add a code chunk, and a written analysis below the code chunk that does the following:

    • Runs a BYM2 model
    • Interpret the fixed effects from the Bayesian model and describe how they compare to the previous models. Then explain what the BYM2 spatial effect represents and describe the general spatial pattern observed in the map.
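A compressed BYM2 sketch with INLA, assuming a tract_nb neighbor object built with poly2nb() as in the earlier step. The adjacency filename is arbitrary, and the default priors are left in place for brevity; adjust as shown in the tutorial.

```r
# Write the neighbor structure to an INLA graph file and read it back
nb2INLA("nc_tracts.adj", tract_nb)
g <- inla.read.graph("nc_tracts.adj")
acs_tract_nc$idx <- 1:nrow(acs_tract_nc)

# BYM2 model: fixed effects plus a structured + unstructured spatial term
bym2 <- inla(median_hh_inc ~ pct_under_poverty_level + pct_renter +
               pop_density_mile + pct_bachelors_or_more +
               f(idx, model = "bym2", graph = g, scale.model = TRUE),
             data = st_drop_geometry(acs_tract_nc),
             control.predictor = list(compute = TRUE))

# Map the posterior mean spatial effect (first n rows = combined effect)
n <- nrow(acs_tract_nc)
acs_tract_nc$spatial_effect <- bym2$summary.random$idx$mean[1:n]
tm_shape(acs_tract_nc) + tm_polygons("spatial_effect")
```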

5.11 Lab 10

5.11.1 Overview

In the continuous surfaces tutorial we learned several different approaches to making predictions at unmeasured locations for spatially continuous variables.

In this lab, you will practice these approaches using PM2.5 data for the continental US for 2022.

Remember that you can use the Command Compendium to help you modify R commands.

5.11.2 Specifications

This lab is designed to assess the Concept 10 Competencies. You’ll be evaluated on the following specifications:

Specification
HTML and RMD versions of the R Markdown file have been submitted.
The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand.
IDW: Student selects an appropriate IDP using cross-validation, chooses a reasonable grid resolution, and produces an IDW prediction surface for PM2.5. Written interpretation justifies the IDP and grid choices and describes the predicted surface
Kriging: Student computes an empirical variogram, fits an appropriate theoretical model, and produces a kriging prediction surface and variance map. Written interpretation explains the variogram and parameter choices and compares the kriging surface to IDW
SPDE: Student fits an SPDE-based Bayesian spatial model and maps the predicted surface and its uncertainty. Written interpretation compares predictions and uncertainty representations across the three approaches

5.11.3 Lab Instructions

  1. Create a new .Rmd named “LASTNAME_lab10.Rmd”. Save it into your GEOG391 folder.

  2. Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:

    library(sf)
    library(tidyverse)
    library(gstat)
    library(sp)
    library(terra)
    library(tmap)
    library(INLA)
    
    ## THESE DATASETS HAVE A PCS IN METERS
    us_boundary <- st_read("https://drive.google.com/uc?export=download&id=1nbJ_ND9qp0sPFvH3LqFjXny2MVxu-cuc")
    
    pm <- st_read("https://drive.google.com/uc?export=download&id=1JLm928sALNKSjproYQz74Itin_Hv1VwN")
  3. Under a header called “IDW” add a code chunk, and a written analysis below the code chunk that does the following:

    • Select an appropriate IDP using cross-validation, choose a reasonable number of grid cells for the prediction surface, and run an IDW interpolation for PM2.5 across the continental United States
    • Explain how the IDP was selected based on the cross-validation results and justify the grid resolution used for the prediction surface. Describe the spatial pattern of the predicted PM2.5 surface, noting any major regional trends or hotspots that appear in the map
  4. Under a header called “Kriging” add a code chunk, and a written analysis below the code chunk that does the following:

    • Calculate an empirical variogram, select an appropriate theoretical variogram model and parameters (sill, nugget, and range), and fit a kriging model.
    • In the written section, explain how the variogram was interpreted and how the model parameters were chosen. Describe the spatial pattern of the kriging surface and interpret the kriging variance map, including where predictions appear more or less certain. Compare the kriging surface to the IDW surface and discuss any noticeable differences in spatial pattern or smoothness
  5. Under a header called “SPDE” add a code chunk, and a written analysis below the code chunk that does the following:

    • Run an SPDE-based Bayesian spatial model on the dataset

    • In the written section, interpret the predicted spatial surface and describe the uncertainty estimates produced by the model (credible intervals). Compare the Bayesian predictions to the IDW and kriging surfaces and discuss how the patterns and uncertainty representations differ across the three approaches
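The SPDE model can be compressed to the sketch below. Everything here is illustrative: the mesh edge lengths, the PC priors, and the pm25 column name are assumptions, and a real mesh needs the care described in the tutorial.

```r
coords <- st_coordinates(pm)

# Mesh over the observation locations (edge lengths in meters; illustrative)
mesh <- inla.mesh.2d(loc = coords, max.edge = c(100000, 500000))

# SPDE model with PC priors on range and marginal standard deviation
spde <- inla.spde2.pcmatern(mesh,
                            prior.range = c(500000, 0.5),
                            prior.sigma = c(5, 0.01))

# Projector matrix linking observations to mesh nodes
A <- inla.spde.make.A(mesh, loc = coords)

# Stack observations, spatial field, and intercept
stk <- inla.stack(data = list(y = pm$pm25),          # pm25 is a placeholder
                  A = list(A, 1),
                  effects = list(field = 1:spde$n.spde,
                                 intercept = rep(1, nrow(pm))),
                  tag = "est")

# Fit the model; posterior summaries give predictions and credible intervals
fit <- inla(y ~ -1 + intercept + f(field, model = spde),
            data = inla.stack.data(stk),
            control.predictor = list(A = inla.stack.A(stk)))
summary(fit)
```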