```r
## load necessary libraries
library(tidyverse)
library(gt)
library(sf)
library(e1071)
library(tmap)
#read in data
rurality <- st_read("https://drive.google.com/uc?export=download&id=1aVDPgiZLUnKGb5k2SQo7ScDO2WNvX3Nd")
county_climate_normals <- st_read("https://drive.google.com/uc?export=download&id=1sf0cmMaeHGra7AQheYAjTiC3wGCzB4FG") |>
  filter(!is.na(average_summer_temp))
acs_county_nc <- st_read("https://drive.google.com/uc?export=download&id=1hXVtKL_33c4QRw6_Z9LlYqjgisdYCRqX")
```

5 Labs
5.1 General Formatting Guidance (SAMPLE FILE)
All submitted labs should follow these formatting guidelines (unless otherwise noted). If your work does not follow them, it will need to be revised and resubmitted:
- Submission:
- You should submit two files for each lab: a .Rmd and a .html.
- You should not submit any other files.
- The files should be saved using the following convention: LASTNAME_Lab_X.Rmd
- Code:
- All R code must appear inside code chunks. Do not place any code in the body text.
- All commands MUST have a descriptive comment
- Code must follow the syntax that we have used in class
- Code chunks should not display messages or warnings in the knitted .html. If messages or warnings are printed, you need to include `message = F, warning = F` in the chunk header.
- Each code chunk should perform one logical set of tasks/analyses (i.e. one code chunk could include all of your data manipulation, another code chunk could contain all of your data visualizations)
- Do not print large datasets or unnecessary intermediate outputs
- Maps
- Maps must use an appropriate visual variable and classification scheme
- Formatting should not detract from map meaning
- If using a basemap, transparency must be added
- Display variable must be clear (i.e. renaming legend, adding title)
- Legend must not cover any map components
- Organization
- Code chunks should be logically organized under clear, informative section headers
- Written responses should appear directly below the code chunk that produces the relevant output
- AI
- AI use must be documented in the following manner:
- What specific task(s) in the assignment an AI tool was used to assist with (e.g., brainstorming ideas, clarifying a concept, debugging code, checking organization or clarity)
- Why AI was used (e.g., confusion about a course concept, uncertainty about assignment expectations or instructions, lack of clear examples in course materials, trouble getting code to run or understanding errors, being stuck and needing a starting point, needing help organizing ideas)
- Reflection on learning (Briefly note whether using AI helped or hindered your understanding, reasoning, or ability to complete the task independently next time)
- One short documentation (less than a paragraph) is sufficient to cover all AI use per assignment (you do not need a separate disclosure for each use)
5.2 Lab 1
5.2.1 Overview
In this lab, you will reflect on the quantitative revolution in geography and the major critiques that emerged in response. The goal is to consider how these critiques can inform a thoughtful, reflexive approach to using quantitative methods throughout the course.
5.2.2 Specifications
This lab is designed to assess the Concept 1 Competencies. You’ll be evaluated on the following specifications:
| Specifications |
|---|
| Student response is submitted to Canvas and is approximately 500 words. Minor grammatical/organizational issues don’t prevent the work from being read and understood. |
| Content Understanding: Student demonstrates a competent understanding of the Quantitative Revolution in geography and the major critiques that emerged in response |
| Reflection & Application: Student thoughtfully reflects on how these critiques will inform their approach to quantitative methods for the remainder of the course. |
5.2.3 Prompt
The quantitative revolution fundamentally reshaped the field of geography by emphasizing formal models, statistical analysis, and claims to objectivity and scientific rigor. In response, humanistic, feminist, Marxist, and critical geographers have raised concerns that quantitative approaches can be reductionist, overly positivist, and falsely neutral, emphasizing abstract space over lived place and obscuring social context, power, and marginalized experiences.
In this response paper, reflect on how these critiques can inform your approach to using quantitative methods as you move forward in a largely quantitative course. Your response should address the following:
- Briefly explain how the quantitative revolution changed geography, and summarize the core concerns raised by critical approaches we’ve discussed in class
- Reflect on how these critiques challenge the idea that quantitative methods are neutral, objective, or sufficient on their own
- Discuss how you can continue to consider these critiques for the rest of the course by explaining how they will influence the questions you ask, how you interpret results, and the claims you make about data/analysis.
Your response should be approximately 500 words
5.3 Lab 2
5.3.1 Overview
In the Describing Data tutorial we worked with spatial datasets describing the rurality of North Carolina census tracts, summer temperatures across the United States, and a demographic variable measured at the census tract level in North Carolina. Spatial data is often aggregated to specific geographic units (for instance, to protect individual privacy or simplify a complex dataset). This aggregation influences the descriptive statistics we calculate.
In this lab, you will examine demographic and climate data at a different level of spatial aggregation and observe how the descriptive statistics change as a result (remember the underlying data values are not changing, only the geographic units used to summarize them). You will also explore an alternative way of defining “rural” and compare how this definition shapes your interpretation of the data.
You will need to complete the Describing Data tutorial to complete this lab!
Remember that you can use the Command Compendium to help you modify R commands
5.3.2 Specifications
This lab is designed to assess the Concept 2 Competencies. You’ll be evaluated on the following specifications:
| Specification |
|---|
| Student submits HTML and RMD versions of the R Markdown file. |
| The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output). Minor formatting issues don’t prevent the work from being read and understood. |
| Describing Rurality: Student produces a descriptive statistics table and a map. Written interpretation demonstrates understanding of the patterns and differences between NC Rural Center and USDA RUCA definitions |
| Describing Climate: Student produces descriptive statistics, two visualizations, and a map. Written interpretations demonstrate understanding of central tendency, dispersion, spatial patterns, and aggregation effects. Response is focused on extracting meaning, not simply summarizing descriptive statistics results. |
| Describing Demographics: Student produces descriptive statistics, two visualizations, and a map. Written interpretation demonstrates understanding of central tendency, dispersion, spatial patterns, and differences across aggregation levels. Response is focused on extracting meaning, not simply summarizing descriptive statistics results. |
5.3.3 Lab Instructions
Create a new .Rmd named “LASTNAME_lab2.Rmd”. Save it into your GEOG391 folder.
Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data, using the code shown at the start of this chapter (it loads tidyverse, gt, sf, e1071, and tmap and reads in the rurality, county_climate_normals, and acs_county_nc datasets).
Add “Describing Rurality” as a third-level header. In this section you will compute descriptive statistics and visualizations for North Carolina census tracts using the North Carolina Rural Center’s definition of rurality (rural = fewer than 250 people per square mile). The NC Rural Center variable is named nc_rural_center in the dataset. Add a code chunk below your header. Your code chunk should:
- Create a descriptive statistics table of rurality by census tract
- Create a map of rurality by census tract
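If you are unsure what a descriptive-statistics table involves, here is a minimal base-R sketch on an invented vector. The variable name and values below are made up for illustration; in the lab you would use the real column from the rurality dataset and the table-building approach from the tutorial (e.g., formatting the result with `gt()`, which is loaded in the chapter's chunk).

```r
# Invented stand-in for a tract-level variable (NOT the real data)
pop_density <- c(45, 120, 300, 80, 950, 60, 210, 35, 500, 150)

desc_stats <- data.frame(
  n      = length(pop_density),
  mean   = mean(pop_density),
  median = median(pop_density),
  sd     = sd(pop_density),
  min    = min(pop_density),
  max    = max(pop_density),
  # one common sample-skewness formula; e1071::skewness() is the course tool
  skew   = mean((pop_density - mean(pop_density))^3) / sd(pop_density)^3
)
desc_stats
```

A table like this (mean vs. median, spread, skew) is what your written interpretation should draw on.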
Below your chunk, you should write 1-2 paragraphs that do the following:
- Describe what the descriptive statistics tell us about the data
- Describe what the map tells us about the spatial pattern of the data
- Compare the NC Rural Center definition of rural with the USDA RUCA codes, describing how the descriptive statistics differ and how the spatial patterns differ across the state
Add “Describing Climate” as a third-level header. In this section you will compute descriptive statistics and visualizations for average summer temperatures per county in the U.S. Add a code chunk below your header. Your code chunk should:
- Create a descriptive statistics table of average summer temperature by U.S. county
- Create two appropriate non-map data visualizations of average summer temperature by U.S. county
- Create a map of average summer temperature by U.S. county
Below your chunk, you should write 1-2 paragraphs that do the following:
- Describe what the descriptive statistics table and data visualizations tell us about average summer temperature (make sure to discuss central tendency, dispersion, shape)
- Describe what the map tells us about the spatial pattern of the data
- Compare the county-level data results to latitude/longitude (point-level) data, noting
- Differences in descriptive statistics, visualizations, and spatial patterns
- How these differences may relate to the scale or aggregation of the data
Add “Describing Demographics” as a third-level header. In this section you will compute descriptive statistics and visualizations for your selected variable in Mini Challenge #1. Add a code chunk below your header. Your code chunk should:
- Create a descriptive statistics table of your variable by NC county
- Create two appropriate non-map data visualizations of your variable by NC county
- Create a map of your variable by NC county
Below your chunk, you should write 1-2 paragraphs that do the following:
- Describe what the descriptive statistics table and data visualizations tell us about your variable (make sure to discuss central tendency, dispersion, shape)
- Describe what the map tells us about the spatial pattern of the data
- Compare the county-level data results to the census tract data, noting
- Differences in descriptive statistics, visualizations, and spatial patterns
- How these differences may relate to the scale or aggregation of the data
Knit your .Rmd and make sure the formatting and organization are clear in the knitted .html document
5.4 Lab 3
5.4.1 Overview
In the Describing Spatial Data tutorial we learned how to expand our descriptive statistics toolkit to include spatial descriptive statistics.
In this lab, you will practice calculating spatial descriptive statistics on a dataset representing the location of wind turbines in the continental US (variable names are here) and Covid-19 cases and deaths by US county centroid on April 1, 2020 and July 1, 2020.
Remember that you can use the Command Compendium to help you modify R commands
5.4.2 Specifications
This lab is designed to assess the Concept 3 Competencies. You’ll be evaluated on the following specifications:
| Specification |
|---|
| Student submits HTML and RMD versions of the R Markdown file. |
| The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output). Minor formatting issues don’t prevent the work from being read and understood. |
| Describing Covid: Student produces descriptive statistics tables, calculates weighted means, and creates maps with mean centers for cases and deaths, including manual legend entries. Written interpretation demonstrates understanding of the distribution, spatial patterns, and why an unweighted mean center isn’t meaningful. Response is focused on extracting meaning, not simply summarizing descriptive statistics results. |
| Describing Wind Turbines: Student produces a descriptive statistics table and one non-map visualization, calculates spatial descriptive statistics (mean center, standard deviational ellipse, weighted mean center), and creates a map with manual legend entries. Written interpretation demonstrates understanding of the variable’s distribution and spatial patterns. Response is focused on extracting meaning, not simply summarizing descriptive statistics results. |
5.4.3 Lab Instructions
Create a new .Rmd named “LASTNAME_lab3.Rmd”. Save it into your GEOG391 folder.
Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:
```r
## load necessary libraries
library(tidyverse)
library(gt)
library(sf)
library(e1071)
library(tmap)
library(maptiles)
library(sfdep)
#read in data
##cumulative covid cases and deaths on April 1, 2020
covid_deaths_04012020 <- st_read("https://drive.google.com/uc?export=download&id=1spBLqVxa25a7FDo7U7aNDJF2t2N0-uXv")
##cumulative covid cases and deaths on July 1, 2020
covid_deaths_07012020 <- st_read("https://drive.google.com/uc?export=download&id=1ZNka0ffMXSVAqVDCj1R-oYh37BH9ziAZ")
##wind turbines across the continental US
us_wind_turbine_locs <- st_read("https://drive.google.com/uc?export=download&id=1Cd9Nl1cNkGD_F9L68EqefKtFx7X7-IpE")
```

Add “Describing Covid” as a third-level header. In each of the covid datasets, there is a variable named `cases`, which represents the cumulative cases in each county on the given date, and a variable named `deaths`, which represents the cumulative deaths in each county on the given date. Note that the dataset only includes counties that had at least one case or death. In this section you will compute descriptive statistics for cases and deaths on each date, compute spatial descriptive statistics, and create a well-designed map. Your code chunk should:
- Create a descriptive statistics table of cases and deaths for each dataset
- Compute a weighted mean center for cases and deaths on each date
- Create a map for the April 2020 data that symbolizes `cases` by county, as well as the mean center for `cases` AND `deaths`. Make sure that you add manual legend entries for the mean center of cases and deaths.
- Create a map for the July 2020 data that symbolizes `cases` by county, as well as the mean center for `cases` AND `deaths`. Make sure that you add manual legend entries for the mean center of cases and deaths.
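Conceptually, the weighted mean center is just the case-weighted average of the point coordinates, which is also why the unweighted version ignores how many cases each county had. A minimal sketch with invented centroid coordinates and counts (the tutorial's sf-based workflow does this on the real data):

```r
# Invented county centroids and case counts (NOT the real data)
cx    <- c(-80, -79, -78, -77)  # hypothetical centroid longitudes
cy    <- c( 35,  36,  35,  34)  # hypothetical centroid latitudes
cases <- c( 10,  40,  40,  10)  # hypothetical case counts used as weights

# weighted mean center: weight each coordinate by its case count
wmc_x <- sum(cases * cx) / sum(cases)  # same as weighted.mean(cx, cases)
wmc_y <- sum(cases * cy) / sum(cases)
c(wmc_x, wmc_y)
```

The result is pulled toward the high-count counties, whereas the unweighted mean center would treat every county equally.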
Below your chunk, you should write 1-2 paragraphs that do the following:
- Summarize what the descriptive statistics reveal about the distribution of Covid cases and deaths
- Explain why creating a meaningful histogram or boxplot for this data would be challenging
- Describe what the maps (including the spatial descriptive statistics) reveal about the spatial distribution of Covid cases and deaths.
- Explain why an unweighted mean center isn’t meaningful in this context (consider the unit of observation)
Add a “Describing Wind Turbines” header. In this section, you will compute descriptive and spatial descriptive statistics for the location and characteristics of wind turbines across the continental US. Your code chunk should:
- Create a descriptive statistics table for one quantitative variable in the turbine dataset.
- Create one non-map data visualization for your variable of interest
- Calculate mean center of wind turbines, standard deviational ellipse of wind turbines, and weighted mean center based on your variable of interest.
  - To calculate the weighted mean center, you may need to drop NA values from your variable of interest. To do this, create a new object with `turbine_dropped <- us_wind_turbine_locs |> drop_na(VARIABLEOFINTEREST)` and use that object to calculate the weighted mean center.
- Create a map that symbolizes your variable of interest, the mean center of wind turbines, the standard deviational ellipse of wind turbines, and the weighted mean center. Make sure that you add manual legend entries for the mean center and weighted mean center (you do not need to add a legend entry for the standard deviational ellipse).
Below your chunk, you should write 1-2 paragraphs that do the following:
- Summarize what the descriptive statistics and non-map visualization reveal about the distribution of your variable
- Describe what the maps (including the spatial descriptive statistics) reveal about the spatial distribution of wind turbines and your variable of interest
Knit your .Rmd and make sure the formatting and organization are clear in the knitted .html document
5.5 Lab 4
5.5.1 Overview
In the Probability tutorial we learned about calculating empirical probability using historical data and applying the binomial and normal probability distributions.
In this lab, you will practice applying the binomial probability distribution using daily water height data from the USGS water gauge in Bolin Creek from 2015-2025. You will also make a probability map of the probability of low January temperatures across North Carolina by applying the normal distribution.
Remember that you can use the Command Compendium to help you modify R commands.
5.5.2 Specifications
This lab is designed to assess the Concept 4 Competencies. You’ll be evaluated on the following specifications:
| Specification |
|---|
| HTML and RMD versions of the R Markdown file have been submitted. |
| The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand. |
| Flood Risk at Bolin Creek: Student creates a descriptive statistics table for daily water height, calculates empirical probability of a flood risk day, estimates the most likely number of flood risk days over 25 days, and computes cumulative probability of more than 3 flood risk days. Written interpretation demonstrates understanding of the results. |
| Low Temperature Probabilities Across NC: Student calculates the probability of January days with maximum temperature below 30°F at each ASOS station and produces a probability map for the state. Written interpretation demonstrates understanding of the probabilities and spatial patterns. |
Create a new .Rmd named “LASTNAME_lab4.Rmd”. Save it into your GEOG391 folder.
Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:
```r
#load libraries
library(tidyverse)
library(tmap)
library(sf)
library(tigris)
#bolin creek daily data from 2015-2025
bolin_creek_data <- read_csv("https://drive.google.com/uc?export=download&id=1eEzriJ5XzR-aS74nJmyiQFVrmV4T3cel")
#NC asos station data from 2000-2020
asos_jan_data <- read_csv("https://drive.google.com/uc?export=download&id=1_zE1kcC7YY083R6rv20sqeN1l2dCdmhi")
```
5.5.3 Lab Instructions
Add a third-level header called “Flood Risk at Bolin Creek” and a new code chunk below the header. In this code chunk you should:
- Create a new object that represents the daily maximum water height by using the command `daily_max_bolin <- bolin_creek_data |> group_by(time) |> summarise(max_water = max(value))`
- Create a descriptive statistics table for daily water height at the Bolin Creek USGS station
- Calculate the empirical probability of a “flood risk day” at Bolin Creek. A flood risk day is defined as a day where the water height is more than 1 ft higher than the median daily height for the period of record (2015-2025).
- Use this empirical probability to determine the most likely number of flood risk days over a 25-day period.
- Calculate the cumulative probability of having more than 3 flood risk days over a period of 25 days.
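If the binomial steps feel abstract, here is a base-R sketch with a made-up empirical probability; your p comes from the Bolin Creek record, not the number below.

```r
# Hypothetical empirical probability of a flood risk day (NOT from the data)
p_flood <- 0.1
n_days  <- 25

# probability of each possible count of flood risk days in 25 days
probs <- dbinom(0:n_days, size = n_days, prob = p_flood)
# most likely number of flood risk days
most_likely <- (0:n_days)[which.max(probs)]

# P(more than 3 flood risk days in 25 days)
p_more_than_3 <- pbinom(3, size = n_days, prob = p_flood, lower.tail = FALSE)

most_likely
p_more_than_3
```

Note how the most likely count sits near n × p, which is the connection your written interpretation should make.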
Below this code chunk, write 1-2 paragraphs that address the following:
- Explain in plain language what the empirical probability of a flood risk day represents. What does this probability tell you about how often flood risk occurs at Bolin Creek?
- Describe the output of the binomial calculation and explain which number of flood risk days is most likely in a 25-day period and why this makes sense given the empirical probability
- Interpret the cumulative probability of having more than 3 flood risk days in 25 days and explain what this means in terms of real-world flood risk.
Add a third-level header called “Low Temperature Probabilities in NC” and add a new code chunk below the header. In this code chunk you should:
- Use the normal distribution to calculate the probability (for each ASOS station) of having a January day with a maximum temperature below 40 degrees F.
- Create a probability map for the full state
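As a sketch of the normal-distribution step for a single station: given a station's mean and standard deviation of January daily maximum temperature, `pnorm()` gives the probability of falling below a threshold. The numbers below are invented; in the lab each ASOS station supplies its own mean and sd.

```r
# Hypothetical January daily-max temperature distribution for one station
jan_mean <- 48  # invented mean (deg F)
jan_sd   <- 10  # invented standard deviation (deg F)

# probability of a January day with max temperature below 40 F at this station
p_below_40 <- pnorm(40, mean = jan_mean, sd = jan_sd)
p_below_40
```

Repeating this per station (e.g., inside a grouped `summarise()`) produces the values you would then map.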
Below this code chunk, write 1-2 paragraphs that address the following:
- Are there areas of the state with particularly high or low probabilities? What might explain these differences?
- How could these patterns inform planning or preparedness for cold weather events?
5.6 Lab 5
5.6.1 Overview
In the Estimation with Sampling tutorial we learned about how to make point and interval estimations of means and proportions from samples of a population.
In this lab, you will practice calculating point and interval estimations from a simulated simple random sample generated from the Behavioral Risk Factor Surveillance System 2023 data. This sample represents the population of the United States and explores various health-related behaviors and outcomes. Each row represents an individual’s responses to the survey.
The variables in the dataset are:
| Variable | Description |
|---|---|
| physhlth | Number of days of poor physical health in the last 30 days |
| menthlth | Number of days of poor mental health in the last 30 days |
| x.state | state FIPS code |
| exerany2 | Any exercise in the last 30 days (1= yes, 0= no) |
Remember that you can use the Command Compendium to help you modify R commands.
5.6.2 Specifications
This lab is designed to assess the Concept 5 Competencies. You’ll be evaluated on the following specifications:
| Specification |
|---|
| HTML and RMD versions of the R Markdown file have been submitted. |
| The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand. |
| Descriptive Statistics: Student creates a descriptive statistics table and non-map graphic for the three variables. Written interpretation describes distribution and demonstrates understanding of the results. |
| Countrywide Estimates: Student calculates a point estimate and confidence interval for the three variables of interest for the entire country. Written interpretation describes the estimates and relates the estimates to conclusions about the full population. |
| Statewide Estimates: Student calculates a point estimate and confidence interval for the three variables of interest for each state and maps the results. Written interpretation describes variability between states in terms of estimates and confidence interval sizes, as well as spatial patterns. |
5.6.3 Lab Instructions
Create a new .Rmd named “LASTNAME_lab5.Rmd”. Save it into your GEOG391 folder.
Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:
```r
#load libraries
library(tidyverse)
library(tmap)
library(sf)
library(tigris)
#BRFSS sample from 2023
brfss_sample <- read_csv("https://drive.google.com/uc?export=download&id=11rKXmplEk0dcvi7lzEQ0OUrXDrnvK4MR") |>
  mutate(x.state = str_pad(as.character(x.state), width = 2, side = "left", pad = "0"))
```

Add a third-level header called “Descriptive Statistics” and a new code chunk below the header. In this code chunk you should:
- Create a descriptive statistics table for each of the three variables (not including state) in the dataset.
- Create one non-map graphic for each of the three variables.
Below this code chunk, write a few sentences describing the characteristics of the sample distribution for each of the variables.
Add a third-level header called “Countrywide Estimates” and add a new code chunk below the header. In this code chunk you should:
- Calculate (at a 95% confidence level) the point estimate and confidence interval for each of the three variables.
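For reference, the interval for a mean uses a t multiplier, while the interval for a proportion (like exerany2) uses a normal approximation. A base-R sketch on simulated stand-in data (the real values come from brfss_sample and the tutorial's commands):

```r
# Simulated stand-ins for two BRFSS-style variables (NOT the real sample)
set.seed(391)
physhlth <- rpois(200, lambda = 4)             # invented days of poor health
exerany2 <- rbinom(200, size = 1, prob = 0.7)  # invented 0/1 exercise indicator

# 95% CI for a mean: point estimate +/- t * standard error
n    <- length(physhlth)
xbar <- mean(physhlth)
moe  <- qt(0.975, df = n - 1) * sd(physhlth) / sqrt(n)
mean_ci <- c(lower = xbar - moe, estimate = xbar, upper = xbar + moe)

# 95% CI for a proportion: normal approximation
phat  <- mean(exerany2)
moe_p <- qnorm(0.975) * sqrt(phat * (1 - phat) / n)
prop_ci <- c(lower = phat - moe_p, estimate = phat, upper = phat + moe_p)
```

The same logic, applied within a `group_by(x.state)`, yields the statewide estimates in the next section.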
Below this code chunk, write a few sentences that describe the estimates and what they tell us about the population.
Add a third-level header called “Statewide Estimates and Mapping” and add a new code chunk below the header. In this code chunk you should:
- Calculate (at a 95% confidence level) the point estimate and confidence interval for each of the three variables for each state (there will be two missing states). Also calculate the size of the confidence interval for each state.

Add the following code to join your dataset to a spatial boundary file:

```r
#get just continental US for mapping purposes
state_boundaries <- states() |>
  filter(!(GEOID %in% c("15", "02", "60", "66", "69", "72", "78")))
#add state data to state boundaries
spatial_data <- state_boundaries |>
  left_join(YOUR_AGGREGATED_STATE_DATASET, join_by("GEOID" == "x.state"))
```

Create a map of one of the variables. Also create a map of the size of the confidence interval per state.
Below this code chunk, write a few sentences that compare the state estimates and the sizes of their confidence intervals. Explain why some states might have smaller or larger intervals. Also describe any spatial patterns that you see in the means or the sizes of the confidence intervals.
5.7 Lab 6
5.7.1 Overview
In the Hypothesis Testing tutorial we learned about applying statistical tests to evaluate whether there is enough evidence to determine if an estimate (calculated from a sample) is statistically different than another estimate, within a certain level of confidence.
In this lab, you will practice applying statistical tests to a sample of North Carolina public school data on chronic absences. Each school in the sample contains a schoolwide mean for absenteeism before Covid (2018-2019) and after Covid (2021-2024), as well as a rural/non-rural designation depending on the county the school is located in.
Remember that you can use the Command Compendium to help you modify R commands.
5.7.2 Specifications
This lab is designed to assess the Concept 6 Competencies. You’ll be evaluated on the following specifications:
| Specification |
|---|
| HTML and RMD versions of the R Markdown file have been submitted. |
| The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand. |
| Descriptive Statistics: Student creates a descriptive statistics table, map, and non-map graphic for the two Covid-19 variables. Written interpretation describes distribution and demonstrates understanding of the results. |
| Scenario 1: Student determines the appropriate test, discusses assumptions, and interpretation demonstrates understanding of the results. |
| Scenario 2: Student determines the appropriate test, discusses assumptions, and interpretation demonstrates understanding of the results. |
| Scenario 3: Student determines the appropriate test, discusses assumptions, and interpretation demonstrates understanding of the results. |
5.7.3 Lab Instructions
Create a new .Rmd named “LASTNAME_lab6.Rmd”. Save it into your GEOG391 folder.
Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:
```r
#load libraries
library(tidyverse)
library(tmap)
library(sf)
school_sample <- st_read("https://drive.google.com/uc?export=download&id=1Yys2-G69_tc9YC-7jlRgsfXn0faLZhUV")
```

Under a header called “Describing School Absences” make a map, a descriptive statistics table, and one non-map graphic for both variables (`pre_covid_pct`, `after_covid_pct`). Describe spatial patterns and the characteristics of the datasets.
Under a header called “Testing Hypotheses” identify an appropriate statistical test, test assumptions, generate a null and alternative hypothesis, and interpret results for the following scenarios:
The known countrywide mean for school absences is 14.5%. Does the North Carolina mean (drawn from a sample of NC schools) differ significantly from the countrywide mean?
Is the absentee rate significantly higher in the years after Covid-19?
Does the absentee rate differ significantly between urban and rural schools before Covid-19? What about after Covid-19?
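One plausible mapping of the three scenarios is a one-sample test, a paired test, and a two-sample test, though you should determine the appropriate test and check its assumptions yourself. A base-R sketch on simulated stand-ins (the real analysis uses the school_sample columns):

```r
# Simulated stand-ins for school absence rates (NOT the real sample)
set.seed(391)
pre_covid   <- rnorm(100, mean = 13, sd = 4)             # invented pre-Covid %
after_covid <- pre_covid + rnorm(100, mean = 5, sd = 2)  # invented post-Covid %
rural       <- rep(c(TRUE, FALSE), 50)                   # invented designation

# Scenario 1: one-sample t-test against the known countrywide mean
t1 <- t.test(pre_covid, mu = 14.5)

# Scenario 2: paired, one-sided t-test (same schools measured before and after)
t2 <- t.test(after_covid, pre_covid, paired = TRUE, alternative = "greater")

# Scenario 3: two-sample t-test comparing rural vs non-rural schools
t3 <- t.test(pre_covid[rural], pre_covid[!rural])
```

In each case, state the null and alternative hypotheses before running the test and interpret the p-value relative to your chosen confidence level.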
5.8 Lab 7
5.8.1 Overview
In the Point Pattern Analysis tutorial, we learned how to interpret the spatial distribution of events (represented as points) in a defined study area/region based on comparing a spatial distribution to CSR (complete spatial randomness).
In this lab, you will practice applying these skills to a dataset representing wildfire locations in Croatan National Forest in North Carolina from 1957-2024 (from US Forest Service).
Remember that you can use the Command Compendium to help you modify R commands.
5.8.2 Specifications
This lab is designed to assess the Concept 7 Competencies. You’ll be evaluated on the following specifications:
| Specification |
|---|
| HTML and RMD versions of the R Markdown file have been submitted. |
| The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand. |
| Analyzing Global Structure: Student computes and visualizes a quadrat count and tests it against CSR, and creates a kernel density estimate using an appropriate bandwidth. Written interpretation demonstrates understanding of first-order processes. |
| Analyzing Local Structure: Student computes and visualizes the G-function and F-function. Written interpretation demonstrates understanding of second-order processes. |
| Describing Wildfires: Student integrates global and local results to identify the dominant spatial process (first-order, second-order, or both) and proposes a specific hypothesis informed by background research. |
5.8.3 Lab Instructions
Create a new .Rmd named “LASTNAME_lab7.Rmd”. Save it into your GEOG391 folder.
Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:
```r
#load libraries
library(sf)
library(tidyverse)
library(spatstat)
forest_boundary <- st_read("https://drive.google.com/uc?export=download&id=1Ku6Hv_RX4WSABaHh9Pj_x_HEBXOW0HfS")
wildfire_points <- st_read("https://drive.google.com/uc?export=download&id=16m9HJAn5uzSyiUdhdhoboPRRi9dDDQaT")
```

Under a header called “Analyzing Global Structure” add a code chunk, and a written analysis below the code chunk that does the following:
- Compute a quadrat count using 9 quadrats and display the plot
- Test the quadrat count against CSR using a quadrat test
- Create a KDE for wildfire counts (this will require experimenting with bandwidths)
- Interpret your results. What do they tell you about the global structure of the dataset?
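At heart, the quadrat test is a chi-squared comparison of observed quadrat counts against the equal counts expected under CSR; spatstat's `quadratcount()` and `quadrat.test()` wrap this logic for you. A base-R sketch of the idea on simulated points (NOT the wildfire data):

```r
# Simulate 200 points in a unit square (complete spatial randomness)
set.seed(391)
x <- runif(200)
y <- runif(200)

# 3 x 3 = 9 equal-area quadrats, as in the lab
qx <- cut(x, breaks = seq(0, 1, length.out = 4))
qy <- cut(y, breaks = seq(0, 1, length.out = 4))
counts <- as.vector(table(qx, qy))  # points per quadrat

# chi-squared test of counts against the equal expected count under CSR
quad_test <- chisq.test(counts)
quad_test$p.value
```

A small p-value would suggest the intensity varies across the study area (a first-order effect), which is what your interpretation should weigh.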
Under a header called “Analyzing Local Structure” add a code chunk, and a written analysis below the code chunk that does the following:
- Compute and plot the G-Function
- Compute and plot the F-function
- Interpret your results. What do they tell you about the local structure of the dataset?
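For intuition: the G-function is essentially the empirical CDF of nearest-neighbour distances (spatstat's `Gest()` computes an edge-corrected version). A base-R sketch of the uncorrected idea on simulated points (NOT the wildfire data):

```r
# Simulate 100 points in a unit square
set.seed(391)
x <- runif(100)
y <- runif(100)

d <- as.matrix(dist(cbind(x, y)))  # pairwise distances between all points
diag(d) <- Inf                     # exclude each point's distance to itself
nn_dist <- apply(d, 1, min)        # nearest-neighbour distance per point

G_hat <- ecdf(nn_dist)             # G(r): share of points with a neighbour within r
G_hat(0.05)                        # e.g., proportion with a neighbour within 0.05
```

A G curve that climbs faster than the CSR expectation indicates clustering; one that climbs slower indicates regularity. The F-function applies the same idea to distances from empty-space locations to the nearest point.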
Under a header called “Describing Wildfires” consider your full set of results (global and local structure). Identify whether there is evidence for the pattern of wildfires reflecting first-order processes, second-order processes, or a combination of both. Based on brief background research on wildfires and the study area, propose one specific, plausible hypothesis about an underlying process that could be contributing to the spatial pattern.
5.9 Lab 8
5.9.1 Overview
In the Autocorrelation tutorial, we learned how to define spatial neighbors and how to use these neighborhoods to assess whether values at locations are related to values at near locations.
In this lab, you will practice applying these skills to a dataset representing average SAT score at North Carolina public high schools and workplace mobility during the first few months of Covid using the Google mobility dataset.
Remember that you can use the Command Compendium to help you modify R commands.
5.9.2 Specifications
This lab is designed to assess the Concept 8 Competencies. You’ll be evaluated on the following specifications:
| Specification |
|---|
| HTML and RMD versions of the R Markdown file have been submitted. |
| The RMD is clearly organized (appropriate headings, code chunk formatting, and clean output) so that the analysis is easy to read and understand. |
| Defining Neighborhoods: Student defines neighborhoods and creates spatial weight matrices for both NC schools and NC counties. Written justification explains why the chosen neighborhood definition is appropriate for this analysis. |
| Analyzing Global Autocorrelation: Student computes Moran’s I for school SAT scores and county-level Covid-19 workplace mobility. Written interpretation explains what the Moran’s I values indicate about spatial structure and strength of autocorrelation. |
| Analyzing Local Autocorrelation: Student computes and maps local indicators of spatial autocorrelation for both datasets. Written interpretation describes the observed clustering patterns and proposes one reasonable hypothesis for why these patterns exist. |
5.9.3 Lab Instructions
Create a new .Rmd named “LASTNAME_lab8.Rmd”. Save it into your GEOG391 folder.
Remove sample text (leaving the header and set-up chunk). Add a chunk for loading libraries and reading in data. Add the following code into that chunk:
```r
#load libraries
library(sf)
library(tidyverse)
library(spatstat)
school_sat <- st_read("https://drive.google.com/uc?export=download&id=1i-C2PF3ajbpHTlavh3m1QbWNObpobprZ")
covid_mobility <- st_read("https://drive.google.com/uc?export=download&id=1YbCNVR5K8jDF5HOKHcoJ25gz7VkRvDHa")
```

Under a header called “Defining Neighbors” add a code chunk, and a written analysis below the code chunk that does the following:
- Defines neighborhoods and creates a neighborhood weight matrix for NC schools and NC counties. You can use any appropriate neighborhood definition
- Justify your neighborhood definition. Why is it useful for this particular analysis?
Under a header called “Analyzing Global Autocorrelation” add a code chunk, and a written analysis below the code chunk that does the following:
- Computes Moran’s I for school SAT scores and NC county covid mobility scores using your selected neighborhood definition. For the covid mobility dataset, you should use the `workplace_change` variable, which represents the percent change in workplace mobility from March-May 2020.
- Describe your results. What does the Moran’s I value tell you about the structure of the data (and the strength of this structure)?
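For intuition: Moran's I compares each unit's value with its neighbours' values via a spatial weight matrix. A hand-rolled sketch on a toy six-unit chain (invented values; in the lab a dedicated package such as sfdep computes this from your real neighbour definitions):

```r
# Toy example: 6 units in a chain, with values that increase along it,
# so neighbours have similar values (positive spatial autocorrelation)
vals <- 1:6

# binary adjacency: each unit neighbours the ones directly before/after it
W <- matrix(0, 6, 6)
for (i in 1:5) { W[i, i + 1] <- 1; W[i + 1, i] <- 1 }

# Moran's I = (n / sum of weights) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)
z <- vals - mean(vals)
I <- (length(vals) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
I  # positive here, because neighbouring units have similar values
```

Values near +1 indicate strong clustering of similar values, near 0 no spatial structure, and negative values a checkerboard-like pattern.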
Under a header called “Analyzing Local Autocorrelation” add a code chunk, and a written analysis below the code chunk that does the following:
- Calculate and map Local Indicators of Autocorrelation for both datasets
- Describe your results. What is the spatial pattern of clustering? Propose one reasonable hypothesis for why this pattern might exist.