10 Describing Spatial Data

In this chapter, we will build on the descriptive tools from the Describing Data chapter. Spatial descriptive statistics summarize where the features in a dataset are located and how they are arranged, rather than summarizing their attribute values alone. Combining traditional descriptive statistics with spatial descriptive statistics gives us a fuller picture of a dataset.

We will use the following spatial datasets:

To follow along with this tutorial, download the .Rmd template here. The template already includes a code chunk for loading libraries and reading in the data, along with labeled empty chunks for each section of the tutorial. As you work through the code examples below, add each set of commands to the chunk with the matching section heading in your template.

10.1 Describing Tornadoes

10.1.1 Traditional Descriptive Statistics

Open the tornado object (torn) by double-clicking it in the Environment pane. Important variables in the dataset are:

Field Name   Description
yr           Year of occurrence
st           State
mag          Rating on the Enhanced Fujita Scale
inj          Number of injuries
fat          Number of fatalities
len          Length in miles
wid          Width in yards

Q1. What does each observation (each row) represent? How do you know?

Q2. Create a descriptive statistics table for a categorical variable in the dataset. Create one data visualization to complement the statistics table.

Q3. Create a descriptive statistics table for a quantitative variable in the dataset. Create one data visualization to complement the statistics table.

10.1.2 Spatial Descriptive Statistics

10.1.2.1 Mean Center

The mean center of a spatial dataset represents the average location of a set of points and is calculated by taking the average of all the x-coordinates and all the y-coordinates in the dataset.

#calculate the mean center of all tornadoes in the dataset
torn_mean_center <- center_mean(torn)

#map the tornadoes and their mean center
tm_shape(torn) +
  tm_dots() +
  tm_shape(torn_mean_center) +
  tm_dots(fill = "blue") +
  tm_add_legend(
    type = "symbols",
    labels = "Mean Center",
    fill = "blue"
  )
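To see what center_mean() is doing, the mean center can be worked out by hand in base R. This is a minimal sketch on a small set of hypothetical projected coordinates (the x and y values below are made up for illustration and are not taken from the tornado data); for an sf point object, sf::st_coordinates() returns a similar two-column matrix of coordinates.

```r
# Hypothetical projected coordinates for five points (illustration only)
coords <- cbind(
  x = c(2, 4, 4, 6, 9),
  y = c(1, 3, 5, 2, 4)
)

# Mean center: the average of the x-coordinates and the average of the y-coordinates
mean_center <- colMeans(coords)
mean_center
# x = 5, y = 3
```

Averaging raw longitude and latitude values is only an approximation, which is one reason projected coordinates are often preferred for this kind of calculation.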

Q4. What does the location of our mean center tell us about the spatial distribution of tornadoes in the United States?

10.1.2.2 Weighted Mean Center

The weighted mean center also calculates an average location, but it lets you choose a variable to weight by, so features with larger values of that variable have more influence on the result. For instance, we can weight each tornado by its number of fatalities: deadlier tornadoes pull the center toward their locations, and tornadoes with zero fatalities do not contribute at all.

## calculate the weighted mean center using the "fat" (fatalities) variable
torn_w_mean_center <- center_mean(torn, weights = torn$fat)

## map the mean center (blue) and the weighted mean center (red)
tm_shape(torn) +
  tm_dots() +
  tm_shape(torn_mean_center) +
  tm_dots(fill = "blue") +
  tm_shape(torn_w_mean_center) +
  tm_dots(fill = "red") +
  tm_add_legend(
    type = "symbols",
    labels = c("Mean Center", "Weighted Mean Center"),
    fill = c("blue", "red")
  )
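The weighted version replaces the plain average with sum(w * x) / sum(w) for each coordinate. The sketch below reuses hypothetical coordinates and hypothetical weights (imagine them as fatality counts, with most tornadoes having zero); none of these numbers come from the tornado dataset.

```r
# Hypothetical projected coordinates for five points (illustration only)
coords <- cbind(
  x = c(2, 4, 4, 6, 9),
  y = c(1, 3, 5, 2, 4)
)
# Hypothetical weights, e.g. fatalities per tornado; most are zero
w <- c(0, 0, 3, 0, 1)

# Weighted mean center: sum(w * x) / sum(w) and sum(w * y) / sum(w)
# (multiplying the matrix by w scales each row by that point's weight)
weighted_center <- colSums(coords * w) / sum(w)
weighted_center
# x = 5.25, y = 4.75
```

Only the two points with nonzero weight affect the result, so the weighted center sits between them, three quarters of the way toward the point with weight 3. In this sketch, if every weight were zero the division would be 0/0 and the result would be NaN.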

Q5. What does the difference between the mean center and the weighted mean center (weighted by fatalities) tell us about the spatial distribution of fatal tornadoes?

Q6. Using the code below, calculate and map the weighted mean centers of tornadoes before 2000 and of tornadoes from 2000 onward, then compare them. What does the difference tell us about how the spatial pattern of tornadoes has changed over time?

#tornadoes before 2000
torn_b_2000 <- torn |> filter(yr < 2000)

#tornadoes in 2000 and later
torn_a_2000 <- torn |> filter(yr >= 2000)

10.1.2.3 Standard Deviational Ellipse

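The standard deviational ellipse summarizes spatial central tendency, dispersion, and directional trend in a single shape. It shows how spread out the points are and the direction in which they are most spread out. The ellipse is centered on the mean center and rotated to line up with the main direction of the points; its axes represent the standard deviations of the coordinates measured along the ellipse's long (major) and short (minor) axes, not along the raw x and y directions.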

#calculate the standard deviational ellipse parameters (sx, sy, theta)
std_ellip_torn <- std_dev_ellipse(torn)

#build an ellipse polygon from those parameters
std_ellip_poly <- st_ellipse(geometry = std_ellip_torn,
                             sx = std_ellip_torn$sx,
                             sy = std_ellip_torn$sy,
                             rotation = -std_ellip_torn$theta)

#map the tornadoes and the ellipse with a semi-transparent fill
tm_shape(torn) +
  tm_dots() +
  tm_shape(std_ellip_poly) +
  tm_polygons(fill_alpha = 0.5)
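One common way to derive the ellipse's size and orientation is the principal-axis formulation: center the coordinates on the mean center, compute their covariance matrix, and take its eigen decomposition. This is a minimal base-R sketch of that formulation on hypothetical coordinates; it is not necessarily how std_dev_ellipse() computes its sx, sy and theta values.

```r
# Hypothetical projected coordinates (illustration only)
coords <- cbind(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(1.5, 1.8, 3.2, 3.9, 5.1, 5.6)
)

# 1. Center the coordinates on the mean center
centered <- sweep(coords, 2, colMeans(coords))

# 2. Covariance of the centered coordinates (population form, dividing by n)
n <- nrow(centered)
cov_xy <- crossprod(centered) / n

# 3. Eigenvectors give the directions of the ellipse axes;
#    square roots of the eigenvalues give the standard deviations along them
eig <- eigen(cov_xy, symmetric = TRUE)
sd_major <- sqrt(eig$values[1])
sd_minor <- sqrt(eig$values[2])

# 4. Angle of the major axis, counterclockwise from the x-axis, in degrees.
#    The sign of an eigenvector is arbitrary, so fold the angle into [0, 180)
major_dir <- eig$vectors[, 1]
angle_deg <- (atan2(major_dir[2], major_dir[1]) * 180 / pi) %% 180

c(sd_major = sd_major, sd_minor = sd_minor, angle_deg = angle_deg)
```

Because these points trend up and to the right, the major axis points into the first quadrant. Software packages differ in details such as dividing by n or n - 1, measuring the angle from the x-axis or from north, and applying extra scaling factors, so numbers from this sketch are not guaranteed to match the columns returned by std_dev_ellipse().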

Q7. What does the standard deviational ellipse tell us about the direction and spread of the tornado dataset?

10.2 Mini-Challenge

This challenge asks you to calculate weighted mean centers based on different variables in a dataset of North Carolina hospitals. We will focus on a few variables:

Field Name   Description
hgenlic      Total hospital general beds
rehabhlic    Total hospital rehab beds
psylic       Total hospital psych beds
nfgenlic     Total nursing facility general beds

  1. Open the nc_hosp dataset. What does each row represent?
  2. Calculate the mean center of all hospitals, plus a weighted mean center for each of the four variables of interest.
  3. Create a map that shows how these mean centers differ from one another. Make sure to include a legend.