18 Bayesian Modeling

In this chapter, we will explore Bayesian modeling as an alternative to traditional spatial models. In frequentist spatial models like OLS with spatial lag (SLM) or spatial error models (SEM), spatial dependence is typically represented by a single parameter (e.g., ρ in SLM or λ in SEM). This parameter is applied uniformly across the study area, meaning the spatial effect is global and assumes a homogeneous influence of neighboring observations. The focus is on adjusting coefficient estimates to account for spatial autocorrelation, not explicitly modeling spatial heterogeneity.

In frequentist spatial models, spatial dependence is represented either through relationships among observed outcomes (spatial lag models) or through spatially correlated error terms (spatial error models), whereas in Bayesian spatial models it is represented as a latent spatial process that enters the mean structure and can vary across locations.

In Bayesian spatial models, spatial dependence can be treated as a latent (unobserved) process that is estimated at the level of each spatial unit, which allows for local heterogeneity rather than assuming a uniform effect across the entire study area. Bayesian models produce posterior distributions for both the regression coefficients and the latent spatial effects.

We will use data from the 5-year estimates from the American Community Survey for North Carolina Census tracts.

To follow along with this tutorial, make a new .Rmd document. As you move through the tutorial add chunks, headers, and relevant text to your document.

library(tidyverse)
library(tmap)
library(sf)
library(spatialreg)
library(spdep)
library(INLA)
#install INLA using this command:  install.packages("INLA",repos = "https://inla.r-inla-download.org/R/stable", dep = TRUE)

#acs data for north carolina census tractws
acs_tract_nc <- st_read("https://drive.google.com/uc?export=download&id=1cnz4xgdDRZvlXzN3IyvCxOETRWfrpw-0") |>  mutate(log_home_val = log(med_home_value))

18.1 Traditional Spatial Modeling

Suppose we are interested in modeling median home value (log normalized) as a response to several covariates:

Accessibility (percent of people who walk to work)
Median household income
Median rent (for renter occupied units)
Median year built of homes

We can start with running an OLS (assume all assumptions are met except for independence).

#ols
home_value_model <- lm(log_home_val ~ pct_walk_to_work + median_hh_inc + med_rent + med_year_structure_built, data = acs_tract_nc)

summary(home_value_model)


Call:
lm(formula = log_home_val ~ pct_walk_to_work + median_hh_inc + 
    med_rent + med_year_structure_built, data = acs_tract_nc)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.42892 -0.19976 -0.00305  0.20347  1.83045 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)              7.251e+00  1.095e+00   6.625 4.25e-11 ***
pct_walk_to_work         1.111e-03  1.227e-04   9.056  < 2e-16 ***
median_hh_inc            9.308e-06  2.827e-07  32.921  < 2e-16 ***
med_rent                 2.802e-04  2.086e-05  13.429  < 2e-16 ***
med_year_structure_built 2.086e-03  5.556e-04   3.754 0.000178 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3391 on 2493 degrees of freedom
Multiple R-squared:  0.6048,    Adjusted R-squared:  0.6042 
F-statistic: 953.8 on 4 and 2493 DF,  p-value: < 2.2e-16

We could then check our residual Moran’s I

#add residuals as a variable
acs_tract_nc <-acs_tract_nc |> mutate(olsresid = resid(home_value_model)) 

#map residuals
tm_shape(acs_tract_nc) + tm_polygons(fill = "olsresid",fill.scale = tm_scale_intervals(values = "bu_pu", style = "quantile", n = 5), col_alpha = 0)

#calculate neighborhoods
nb <- poly2nb(acs_tract_nc, queen = TRUE)

#set neighborhood weight matrix
#zero policy allows 0 neighbors (subgraph)
nbw <- nb2listw(nb, style = "W", , zero.policy = TRUE)

#calculate Morans I
gmoran <- moran.test(acs_tract_nc$olsresid, nbw)
gmoran


    Moran I test under randomisation

data:  acs_tract_nc$olsresid  
weights: nbw  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 36.169, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     0.4394378449     -0.0004011231      0.0001478792

Because home value is a classic lag effect, we might choose to just run a lag model

#lag model
fit.lag<-lagsarlm(log_home_val ~ pct_walk_to_work + median_hh_inc + med_rent + med_year_structure_built, 
                  data = acs_tract_nc,
                  listw = nbw,
                  zero.policy = TRUE) 
#see results
summary(fit.lag)


Call:lagsarlm(formula = log_home_val ~ pct_walk_to_work + median_hh_inc + 
    med_rent + med_year_structure_built, data = acs_tract_nc, 
    listw = nbw, zero.policy = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max 
-1.287564 -0.181041 -0.005093  0.173196  2.814662 

Type: lag 
Regions with no neighbours included:
 1321 1814 1986 2315 
Coefficients: (asymptotic standard errors) 
                           Estimate Std. Error z value  Pr(>|z|)
(Intercept)              5.7312e+00 1.0234e+00  5.5999 2.144e-08
pct_walk_to_work         1.0116e-03 1.1459e-04  8.8278 < 2.2e-16
median_hh_inc            8.1798e-06 2.7014e-07 30.2795 < 2.2e-16
med_rent                 2.0435e-04 1.9733e-05 10.3559 < 2.2e-16
med_year_structure_built 1.7200e-03 5.2061e-04  3.3038 0.0009538

Rho: 0.19497, LR test value: 318.55, p-value: < 2.22e-16
Asymptotic standard error: 0.011145
    z-value: 17.494, p-value: < 2.22e-16
Wald statistic: 306.05, p-value: < 2.22e-16

Log likelihood: -681.1066 for lag model
ML residual variance (sigma squared): 0.1003, (sigma: 0.31671)
Number of observations: 2498 
Number of parameters estimated: 7 
AIC: 1376.2, (AIC for lm: 1692.8)
LM test for residual autocorrelation
test value: 720.89, p-value: < 2.22e-16

In the spatial lag model, spatial dependence is represented by a single global parameter which means the strength of the spatial relationship between neighboring observations is assumed to be the same across the entire study area.

Q1. Using what you learned in the Regression tutorial, interpret the output of the OLS and SLM.

18.2 BYM Model

Bayesian modeling differs from the frequentist spatial models used earlier because it treats spatial dependence as part of the data-generating process rather than as a single adjustment parameter. In a Bayesian spatial model, the unexplained variation can instead be decomposed into tract-specific latent effects.

Another advantage of the Bayesian framework is that all unknown quantities in the model are treated as random variables with probability distributions. This includes the regression coefficients and the spatial random effects. Bayesian inference combines the information from the data with prior assumptions about these parameters to produce posterior distributions, which represent updated beliefs about the parameters after observing the data.

In this section, we introduce the Besag–York–Mollié (BYM) model, a Bayesian hierarchical model for areal spatial data.

Let $Y_i$ be the logarithm of median home value in census tract $i$, where $i = 1, \ldots, n$. Because the outcome is continuous, we model it using a Gaussian probability distribution:

\[ Y_i \sim N(\mu_i, \sigma^2), \qquad i = 1, \ldots, n \]

This distribution describes the likelihood, meaning how the observed data are generated given the parameters of the model. The mean of the distribution is modeled as

\[ \mu_i = \beta_0 + \beta_1 \, pct\_walk\_to\_work_i + \beta_2 \, median\_hh\_inc_i + \beta_3 \, med\_rent_i + \beta_4 \, med\_year\_structure\_built_i + b_i \]

$\beta_0$ is the intercept
$\beta_1$ through $\beta_4$ are the coefficients for the observed covariates
$b_i$ is the spatial random effect for tract $i$

The spatial random effect captures the portion of housing values that remains spatially patterned after accounting for the observed covariates. In the original BYM model, the spatial random effect is split into two components:

\[ b_i = u_i + v_i \]

These two terms represent two different sources of unexplained variation. The term $u_i$ is the spatially structured effect. This captures the idea that nearby census tracts tend to be similar to one another. If one tract has a higher-than-expected home value after accounting for the covariates, neighboring tracts may also tend to have higher values. This structured effect is modeled using a Conditional Autoregressive (CAR) prior. The CAR prior defines the value of $u_i$ relative to its neighboring tracts:

\[ u_i \mid u_{-i} \sim N\left( \frac{1}{n_i}\sum_{j \in \delta_i} u_j,\; \frac{\sigma_u^2}{n_i} \right) \]

$\delta_i$ is the set of neighboring tracts for tract $i$
$n_i$ is the number of neighbors
the mean of $u_i$ is the average spatial effect of neighboring tracts

Conceptually, this prior smooths values across space. Each tract’s spatial effect is encouraged to be similar to the average of its neighbors. The term $v_i$ represents unstructured variation, meaning tract-level randomness that is not spatially patterned. This component captures variation that cannot be explained by either the covariates or the neighborhood structure. This effect is modeled using an independent and identically distributed (IID) prior:

\[ v_i \sim N(0, \sigma_v^2) \]

The IID prior assumes that each tract’s random noise is independent from the others and centered around zero. Together, these two components allow the model to separate spatial patterns from random local noise.

For this tutorial, we use the BYM2 version of the model. BYM2 is a reparameterization of the original BYM model that combines the structured and unstructured components into a single spatial effect:

\[ b_i = \frac{1}{\sqrt{\tau_b}} \left( \sqrt{1-\phi} \, v_i^* + \sqrt{\phi} \, u_i^* \right) \]

In this formulation:

$u_i^*$ is the scaled spatial (CAR) component
$v_i^*$ is the scaled IID component
$\tau_b$ controls the overall magnitude of the spatial variation
$\phi$ controls how much variation is spatially structured versus unstructured

When $\phi$ is close to 1, most unexplained variation follows spatial neighborhood patterns. When $\phi$ is close to 0, most unexplained variation behaves like independent noise across tracts. One advantage of BYM2 is that it uses Penalized Complexity (PC) priors, which are weakly informative priors designed to avoid overfitting. These priors encourage the model to prefer simpler explanations unless the data provide strong evidence of spatial structure.

To fit the model, we first create a neighborhood structure for the census tracts, similar to what we’ve done before, but creating it as a graph, not a neighborhood weight list.

nb_acs <- poly2nb(acs_tract_nc)

nb2INLA("map.adj", nb_acs)
g <- inla.read.graph(filename = "map.adj")

Next, we create an index variable so that each census tract has its own spatial random effect:

acs_tract_nc$re_u <- 1:nrow(acs_tract_nc)

We then specify the BYM2 model:

formula <- log_home_val ~ pct_walk_to_work + median_hh_inc + med_rent + med_year_structure_built +
  f(re_u, model = "bym2", graph = g)

This formula says that logged median home value is modeled as a function of the observed covariates plus a tract-level Bayesian spatial effect.

We fit the model using inla():

res <- inla(
  formula,
  family = "gaussian",
  data = acs_tract_nc,
  control.predictor = list(compute = TRUE),
  control.compute = list(return.marginals.predictor = TRUE)
)

We can look at the results for our fixed effects here:

res$summary.fixed

                                 mean           sd   0.025quant     0.5quant
(Intercept)              5.097923e+00 9.600268e-01 3.215117e+00 5.097970e+00
pct_walk_to_work         4.762694e-04 9.812044e-05 2.838482e-04 4.762699e-04
median_hh_inc            7.403241e-06 2.424027e-07 6.927831e-06 7.403259e-06
med_rent                 4.887840e-05 1.876974e-05 1.207287e-05 4.887733e-05
med_year_structure_built 3.391024e-03 4.859743e-04 2.438071e-03 3.390999e-03
                           0.975quant         mode          kld
(Intercept)              6.980461e+00 5.097971e+00 1.132826e-11
pct_walk_to_work         6.686880e-04 4.762698e-04 1.143596e-11
median_hh_inc            7.878552e-06 7.403259e-06 1.730068e-11
med_rent                 8.569003e-05 4.887732e-05 1.185277e-11
med_year_structure_built 4.344122e-03 3.390999e-03 1.147945e-11

Because this is a Bayesian model, the results are interpreted using posterior means and 95% credible intervals rather than p-values.

The intercept has a posterior mean of 5.10, with a 95% credible interval from 3.21 to 6.98. Since the response variable is the logarithm of median home value, this represents the expected log home value when all covariates equal zero, after accounting for the spatial random effect.

The coefficient for percent walking to work has a posterior mean of 0.000476, with a 95% credible interval from 0.000283 to 0.000669. Because the entire credible interval is positive, the model suggests that higher walk-to-work rates are associated with higher median home values, holding other variables constant.

The coefficient for median household income has a posterior mean of 7.40 × 10⁻⁶, with a 95% credible interval from 6.93 × 10⁻⁶ to 7.88 × 10⁻⁶. This interval is entirely positive, indicating a strong positive relationship between household income and home value.

The coefficient for median rent has a posterior mean of 4.88 × 10⁻⁵, with a 95% credible interval from 1.20 × 10⁻⁵ to 8.57 × 10⁻⁵. This suggests that tracts with higher rents also tend to have higher home values, which is consistent with overall housing market conditions.

The coefficient for median year structure built has a posterior mean of 0.00339, with a 95% credible interval from 0.00244 to 0.00435. This indicates that tracts with newer housing stock tend to have higher home values.

The fitted object contains posterior summaries for the model quantities. We first extract the posterior mean and 95% credible intervals for the fitted values:

acs_tract_nc$PM <- res$summary.fitted.values[, "mean"]
acs_tract_nc$LL <- res$summary.fitted.values[, "0.025quant"]
acs_tract_nc$UL <- res$summary.fitted.values[, "0.975quant"]

For the BYM2 random effect, the first $n$ rows of res$summary.random$re_u correspond to the tract-level spatial effect $b_i$ that enters the model. We extract it as follows:

n <- nrow(acs_tract_nc)

acs_tract_nc$bym2_effect <- res$summary.random$re_u$mean[1:n]

Finally, we can make the predictions. Remember, this is logged home value.

tm_shape(acs_tract_nc) +
  tm_polygons("PM", fill.scale = tm_scale_intervals(values = "bu_pu", style = "quantile"), col_alpha = 0)

We can then map the posterior mean for the spatial random effect

tm_shape(acs_tract_nc) +
  tm_polygons("bym2_effect", col_alpha = 0)

res$summary.hyperpar

                                                mean           sd  0.025quant
Precision for the Gaussian observations 2.033344e+04 1.799219e+04 1300.838633
Precision for re_u                      8.416736e+00 3.676848e-01    7.711577
Phi for re_u                            7.179693e-01 2.478955e-02    0.667605
                                            0.5quant   0.975quant         mode
Precision for the Gaussian observations 1.492438e+04 6.676419e+04 3535.4755483
Precision for re_u                      8.410266e+00 9.158965e+00    8.4006882
Phi for re_u                            7.185354e-01 7.650927e-01    0.7194705

This effect represents the portion of housing values that remains spatially patterned after accounting for the observed covariates in the model.

Because the response variable is the log of median home value, the spatial effect shifts the predicted value on the log scale.

Positive values of $b_i$ (green areas) indicate tracts where home values are higher than expected given the covariates.
Negative values of $b_i$ (purple areas) indicate tracts where home values are lower than expected given the covariates.

The model also estimates hyperparameters that describe the strength of the spatial structure. The parameter $\phi$ controls how much of the remaining variation is spatially structured versus random noise. In this model, the posterior mean of $\phi$ is approximately 0.72, meaning that about 72% of the remaining unexplained variation follows a spatial pattern, while the rest behaves like independent tract-level noise.

Overall, this map highlights localized spatial patterns in housing prices that remain after accounting for the socioeconomic and housing variables included in the model.