4 Labs

4.1 General Formatting Guidance (SAMPLE FILE)

All submitted labs should follow the following formatting guidelines (unless otherwise noted). If your work does not follow these guidelines, it will need to be revised and resubmitted:

Submission:
- You should submit two files for each lab– a .Rmd and a .html.
- You should not submit any other files.
- The files should be saved using the following convention: LASTNAME_Lab_X.Rmd
Code:
- All R code must appear inside code chunks. Do not place any code in the body text.
- All commands MUST have a descriptive comment
- Code must follow the syntax that we have used in class
- Code chunks should not display messages or warnings in the knitted .html. If messages or warnings are printed, you need to include message = F, warning = F in the r header.
- Each code chunk should perform one logical set of tasks/analyses (i.e. one code chunk could include all of your data manipulation, another code chunk could contain all of your data visualizations)
- Do not print large datasets or unnecessary intermediate outputs
Maps
- Maps must use an appropriate visual variable and classification scheme
- Formatting should not detract from map meaning
  - If using a basemap, transparency must be added
  - Display variable must be clear (i.e. renaming legend, adding title)
  - Legend must not cover any map components
Organization
- Code chunks should be logically organized under clear, informative section headers
- Written responses should appear directly below the code chunk that produces the the relevant output
AI
- AI tools may be utilized as a support resource; however, they should not serve as the primary approach to completing work. Students are expected to engage first with course materials, critical thinking, and independent reasoning. Any use of AI must meaningfully contribute to the student’s learning process, and students must be able to justify the role AI played in their work.
- AI use must be documented using the following structure:
  1. A description of the problem(s) you encountered
  2. How (or if) the AI use supported solving this problem
  3. How (or if) this use of AI will help you approach similar problems independently in the future.
- AI documentation should be included as text at the bottom of the document

4.2 Lab 1

4.2.1 Overview

In this lab, you will examine several spatial datasets to understand their structure. You will identify the variables in each dataset, including their types and geographic components, and determine the unit of analysis. You will also identify where each dataset comes from and how the data was collected. Finally, you will assess basic aspects of data quality and representativeness in order to decide whether each dataset is appropriate for a particular use or research question.

4.2.2 Specifications

Specification
Student downloaded three appropriate data files from each of the sources in Part 1.
Student answered all Part 2 questions for each dataset in complete sentences and demonstrated competent understanding (no substantial errors).
Student submitted one Word document with responses and all three data files to Canvas

4.2.3 Lab Instructions

Part 1: Downloading Spatial Data

For this lab, you will download three spatial datasets from different data providers. All datasets should be downloaded in a tabular format (either .csv or .xlsx) so they can be opened in Excel, Numbers, or Google Sheets.

iNaturalist (you will need to create an account to download data):
- Use the “Explore” page to filter observations by a specific location, or a specific species and a specific location.
- Once filters are applied, download data from the “Filters” tab
US Census Bureau:
- Use the search bar and a query (for instance, “poverty in all counties in north carolina in 2020”) to search for a variable of interest for all North Carolina counties
- Select a table and use the Download option
NCOneMap:
- Browse or search for a dataset of interest (e.g., public facilities, infrastructure, environmental features).
- Open the dataset’s information page and download as .csv.

Part 2: Exploring Data

In this part of the lab, you will examine each dataset you downloaded to understand its structure, variables, and source. You are not expected to perform any analysis. Complete the questions below for each dataset. Your responses should be written in complete sentences.

Who collected or provided the data?
How was the data collected?
Would this data be considered “old-school”, “new-school”, or volunteered data?
What does one row in the dataset represent?
What does one column in the dataset represent?
What is the unit of analysis?
Identify at least one numeric variable
Identify at least one categorical variable
Identify any fields that describe location or geography
Are there missing values or incomplete fields?
Identify one limitation or potential source of bias in the data
Describe one research question this dataset could help answer
Describe one research question that this dataset would not be useful for because of the unit of analysis or the representation decisions

4.3 Lab 2

4.3.1 Overview

In this lab, you will work with a population projection file for North Carolina provided by the North Carolina Office of State Budget and Management. You will read in a file, practice some basic data manipulation using base R and tidyverse, and make a graph.

4.3.2 Specifications

Specification
Lab is submitted on Canvas and follows the general formatting guidelines. Minor formatting issues don’t prevent the work from being read and understood.
Reading Data: All required datasets are read into R. Responses correctly identify basic properties of the data. Minor inaccuracies are acceptable if overall understanding is clear.
Manipulating Data: Code produces the required variables and transformed datasets. Minor coding errors or inefficiencies are acceptable if the results are usable.
Describing Data: Required tables and non-map graphics are present. Written responses correctly describe patterns or distributions shown in the outputs. Response is focused on extracting meaning, not simply summarizing descriptive statistics results.

4.3.3 Lab Instructions

Create a new .Rmd document (and save it to your GEOG215 folder).

Then complete the following tasks. Your .Rmd file should be organized so that each task has a text header, a code chunk, comments for each command, and any written components directly under the chunk. Remember to follow the Formatting Guide.

Reading Data

Load the tidyverse, gt , and e1071 libraries and read in the data using the following command: nc_pop <- read_csv("https://drive.google.com/uc?export=download&id=1ogC0lRjEMaXLmRrZkwLVI9MbmJdk4NIy")
Answer the following questions:
- What does each row in the dataset represent?
- What does each column in the dataset represent?
- What is the data type of each column? (either use the class() command or explore the environment tab)

Manipulating Data

Use base R to create an object called high_pop_counties that is just counties that have a a population over 100,000 in 2020
Use tidyverse to create an object called low_pop_counties that is just counties that have a population below 20,000 in 2020
Use base R to add a variable called change to the nc_pop object that is the population difference between 2010 and 2050
Use tidyverse to add a variable called growth to the nc_pop object that assigns a value of “Growing” to counties that are projected to gain population between 2010 and 2050 and a value of “Shrinking” to populations that are projected to lose population between 2010 and 2050
Write a command in base R that calculates the mean of the change variable
Write a command in tidyverse that calculates the max of the change variable
Write a tidyverse command that creates a new object called simp_pop that selects just the change variable and renames it to pop_change

Describing Data

Create a descriptive statistics table of the change variable
Create two graphics using ggplot that help describe the data.
Answer the following questions:
- Describe what the descriptive statistics table tells you about the distribution of the variable
- Describe what the graphics tell you about the variable, focusing on what you couldn’t already learn from the descriptive statistics table

4.4 Lab 3

4.4.1 Overview

In this lab, you will practice working with spatial data in R by examining spatial and non-spatial attributes, mapping variables, and calculating descriptive and spatial descriptive statistics.

You will work with several spatial datasets:

American Community Survey data on average commute for North Carolina counties and census tracts (vector data)
Modeled Wet Bulb Globe Temperature on July 17, 2025 across Orange County, NC provided by Andrew Robinson (Geography PhD student) (raster data). WBGT is a measure of heat stress.
Locations of wind turbines in the continental US from The U.S. Wind Turbine Database

4.4.2 Specifications

Specification
Lab is submitted on Canvas and follows general formatting guidelines. Minor formatting issues may be present but do not interfere with readability or interpretation.
Reading Data: All required datasets are read into R. Responses correctly identify basic properties of the data. Minor inaccuracies are acceptable if overall understanding is clear.
Manipulating Data: Code produces the required variables and transformed datasets. Minor coding errors or inefficiencies are acceptable if the results are usable.
Describing Data: Required maps, tables, and non-map graphics are present. Written responses correctly describe patterns or distributions shown in the outputs. Response is focused on extracting meaning, not simply summarizing descriptive statistics results.

4.4.3 Lab Instructions

Create a new .Rmd document (and save it to your GEOG215 folder).

Reading Data

Load the tidyverse , tmap, spdep, gt, terra, sfdep, and sf libraries

Read in the data using the following commands:

turbines <- st_read("https://drive.google.com/uc?export=download&id=1LLl871Mv3BY7hI56kWPiaMvvk5SxFcQX")

county_wfh <- st_read("https://drive.google.com/uc?export=download&id=1kV-HKXvlrhfSKOHli4QuRfNoW6I76ihN")

tract_wfh <- st_read("https://drive.google.com/uc?export=download&id=1H-MgCZmca_0YLHg3P6zxqhVrldOTddyD")

wbgt_raster <- rast("https://drive.google.com/uc?export=download&id=1l4kZvyCf9ySy7CQlKlzl5y9z6aYHHlpa")

Answer the following questions:
- What is the CRS of each dataset? Is it a geographic coordinate system or a projected coordinate system?
- What is the geometry type of turbines, county_wfh, and tract_wfh?
- What is the resolution of the WBGT data?

Manipulating Data

Use a tidyverse command to calculate a new field perc_wfh in the county_wfh and tract_wfh objects. (wfeE is the number of people working from home and totalE is the total population)

Describing Work From Home

Create a map of perc_wfh for both the county_wfh and tract_wfh objects.
Create a descriptive statistics table of perc_wfh for both the county_wfh and tract_wfh objects
- To make a descriptive statistics table when you are using spatial data you must add the command st_drop_geometry() before creating the table (i.e. DATA |> st_drop_geometry() |> select(VARIABLE)
Create one non-map graphic of perc_wfh for both the county_wfh and tract_wfh objects
Answer the following questions (Remember that the underlying data for the ACS is collected at the individual/household level, the only difference is the level of spatial aggregation):
- How does the distribution of perc_wfh differ between the county-level and tract level data? Include discussion of central tendency, spread, shape, and frequency.
- How does changing the scale of aggregation (counties vs. tracts) affect the spatial pattern you observe?
- What does this example demonstrate about the scale effect of MAUP and why it matters for interpreting spatial data?

Describing Heat Stress in Orange County NC

Create a map of wbgt_raster
Summarize the cell values of wbgt_raster

Describing Wind Turbines

Calculate mean center of wind turbines, standard deviational ellipse of wind turbines, and weighted mean center based on a variable of interest selected from the dataset.
- To calculate the weighted mean center, you may need to drop NA values from your variable of interest. To do this, create a new object turbine_dropped <- turbines |> drop_na(VARIABLEOFINTEREST) and use that object to calculate the weighted mean center.
Create a map that symbolizes the mean center of wind turbines, the standard deviational ellipse of wind turbines, and the weighted mean center (make sure that you add manual legend entries for the mean center and weighted mean center)
Answer the following questions:
- What does your map reveal about the spatial distribution of wind turbines?

4.5 Lab 4

4.5.1 Overview

In this lab, you will practice formulating spatial research questions and finding, downloading, and loading spatial data that is relevant to those questions.

4.5.2 Specifications

Specification
Lab is submitted on Canvas and follows the general formatting guidelines. Minor formatting issues don’t prevent the work from being read and understood. YOU MUST ALSO SUBMIT YOUR DATA FILES
RQ1: Student successfully reads in data using relative file paths and creates a basic map of the datasets. Written interpretation demonstrates competent understanding of the datasets.
RQ2: Student successfully reads in data using relative file paths and creates a basic map of the datasets. Written interpretation demonstrates competent understanding of the datasets.

4.5.3 Lab Instructions

Create a new .Rmd document (and save it to your GEOG215 folder).

RQ1

Formulate a research question that would involve spatial feature engineering (using space to create new variables in a dataset that can be used for spatial or non-spatial modeling)
Find and download two spatial datasets that would help you answer that research question. You should be looking for spatial files (.shp, .geojson, .tif) or a tabular file (.csv) that has coordinates in it.
Read your spatial files into R using relative file paths.
Make a map of each of your datasets. If you found a .csv, you will need to use the st_as_sf() command to make the data spatial.
Answer the following questions for each of your datasets:
- What is the unit of analysis?
- What attributes does the data have that would contribute to answering your research question?
- What types of transformations would be required to be able to combine your datasets into a “tidy” object?

RQ2

Formulate a research question that would involve spatial feature engineering (using space to create new variables in a dataset that can be used for spatial or non-spatial modeling)
Find and download two spatial datasets that would help you answer that research question. You should be looking for spatial files (.shp, .geojson, .tif) or a tabular file (.csv) that has coordinates in it.
Read your spatial files into R using relative file paths.
Make a map of each of your datasets. If you found a .csv, you will need to use the st_as_sf() command to make the data spatial.
Answer the following questions for each of your datasets:
- What is the unit of analysis?
- What attributes does the data have that would contribute to answering your research question?
- What types of transformations would be required to be able to combine your datasets into a “tidy” object?

4.6 Lab 5

4.6.1 Overview

In this lab, you will practice creating analytic datasets by using spatial feature engineering.

4.6.2 Specifications

Specification
Lab is submitted on Canvas and follows the general formatting guidelines. Minor formatting issues don’t prevent the work from being read and understood.
The required transformations are performed to create the analytic variable for each research question. The overall workflow reflects the intended steps, even if minor errors are present
Each analytical dataset includes the required visualization (scatterplot and map). Written responses accurately describe what the analytic variable represents and what the distribution shows.

4.6.3 Lab Instructions

Create a new .Rmd document (and save it to your GEOG215 folder).

Reading Data

Load the tidyverse , tmap, terra, exactextractr, and sf libraries

Read in the data using the following commands:

#all ems stations across the triangle region
triangle_ems <- st_read("https://drive.google.com/uc?export=download&id=1JYZxoM3GB43AnSY2mKW1p_ixeBBh1CBJ")

#all census blocks across the triangle region
triangle_blocks <- st_read("https://drive.google.com/uc?export=download&id=1Q6f9wPXrN0NMZJZkAR9OFkneCfxIBiOl")

#all census blocks across chapel hill
ch_blocks <- st_read("https://drive.google.com/uc?export=download&id=14Jhu9ZQDRL14lQGiTUKcqqrsQcooH6BU")

#raster of average summer temp in ch (in celcius)
ch_summer_heat <- rast("https://drive.google.com/uc?export=download&id=1qvJepSdhFTiKZIAyn76VA9xvlIM8f5s8")

#raster of canopy cover in ch (% per pixel)
ch_canopy_cover <- rast("https://drive.google.com/uc?export=download&id=1a5MibyiyAJgxIqTxzSBOoMNqklErlIVR")

Analytic Dataset #1

For this analytic dataset you should create a new object that adds two columns to the ch_blocks object. One column, av_heat should be the average summer heat across each block. The second column, av_canopy should be the average tree canopy across each block.

After creating the analytic dataset, create a scatterplot dataset |> ggplot(aes(x = VARIABLE1, y= VARIABLE2)) + geom_point() that shows the relationship between summer heat and canopy coverage across Chapel Hill.

Below the code chunk, write a brief interpretation of the relationship shown in the scatterplot.

Analytic Dataset #2

For this analytic dataset you should create a new object that adds a column to the triangle_blocks object. The new column (count_ems) should include a count of each EMS station that is within 5 miles of the block.

After creating the analytic dataset, create a basic map that displays the count_ems variable across the triangle.

Below the code chunk, write a brief interpretation of the spatial pattern of the map.

4.7 Lab 6

4.7.1 Overview

In this lab, you will create well-designed maps for the analytic datasets created in the Analytic Dataset Practicum. Remember to use this valuable resource:

4.7.2 Specifications

Specification
Lab is submitted on Canvas and follows the general formatting guidelines. Minor formatting issues don’t prevent the work from being read and understood.
Maps follow the core design instructions. Minor design or formatting issues are acceptable
All maps use ONLY tmap v4 code

4.7.3 Lab Instructions

Map 1: Warm Summer Days for North Carolina Counties

This map should adhere to the following design guidelines:

County values for percent warm summer days should be represented by dots (hint: you can use the tm_dots() command even with polygons)
The dots should visualized using an appropriate color palette and classification scheme
The map should include a basemap and a legend within the map frame
The map title should be centered above the map

Map 2: Bus Stop Canopy Coverage

This map should adhere to the following design guidelines:

Bus stop points (NOT buffers) should be visualized. You should use the st_drop_geometry() command and a left_join() to join your buffered values back to the original bus stop object
Bus stops should visualized using an appropriate color palette and classification scheme
The map should include a satellite basemap and an appropriately placed legend
The map title should be centered above the map

Map 3: Distance to Bus Stops in Chapel Hill

This map should adhere to the following design guidelines:

Only addresses above a user-selected distance should be visualized (i.e. more than .25 miles from a bus stop)
Chapel Hill boundaries should be included (ch_boundaries <-st_read("https://drive.google.com/uc?export=download&id=1ievfdMpmrZBO1qBI_uILYpb1XbVXmC0b”))
Bus stop locations should be included
The map should include an unlabeled basemap and an appropriately placed legend
The map title should be within the map frame