With inverse distance weighting and simple kriging, we assumed that the underlying spatial trend was essentially constant, or at least wasn’t determined by external information.
This is generally a good thing to do when you have no strong external measures giving rise to the measure being sampled.
However, in many situations the underlying spatial trend may be known to have a direction (e.g. it increases as \(x\) or \(y\) increases) or may be known to correlate with some other measure, and we have that other measure available to us at both the sampled locations and unobserved locations where we wish to interpolate.
In this case, we can model the spatial trend in the outcome measure of interest using this covariate information at the sampled locations, and then predict the spatial trend to unobserved locations. The left-overs or residuals from this trend can then be kriged with simple kriging so that the resulting predictions incorporate the spatial correlation.
As an example dataset, we’ll be using a famous dataset of heavy metal
concentrations in topsoil in a flood plain of the river Meuse near the
village of Stein in the Netherlands. It is available within the
sp
package and you can find more details about it using
?meuse
. We’ll be concentrating on the metal
zinc
, and look at it on a log scale:
The goal will be to interpolate this measure so we can estimate the concentration of zinc across the entire flood plain. To do this we’ll make use of a grid for interpolation:
Universal Kriging is used where we have some covariate information that can be used to predict the spatial trend. The process is then:
We notice that zinc concentrations are highest closest to the river, and we have that as a covariate both at our sampled and unsampled locations. Thus, we could use distance to the river to model the spatial trend. Plotting log zinc concentrations versus distance gives:
and we can linearise this relationship by using the square root of distance instead, allowing us to fit a standard linear regression:
This gives an equation relating log zinc concentration to distance from the river,
\[ \log(zinc) = 7 - 2.55 \sqrt{dist} \]
that we can use to predict the log zinc concentration across the flood plain:
We can then subtract the predicted log zinc concentration, given the distance to the river from the observed log zinc concentrations and produce a map of the residuals (observed minus predicted):
We then perform simple kriging on this, by fitting a variogram:
and then kriging to interpolate the ‘residual’ component across the region:
Combining this with the prediction from the linear regression we get our final predictions:
We can then combine the uncertainties from the kriging process and from the linear model predictions to give an uncertainty surface for our predictions:
We notice that the variance in our predictions has now reduced significantly compared to the prediction done using simple kriging.
In practice using R, we add a spatial trend in the kriging process we
change the formula in the variogram
and krige
functions:
dist.var <- variogram(log(zinc) ~ sqrt(dist), locations=meuse.sf)
dist.mod <- fit.variogram(dist.var, model=vgm(0.2, "Sph", 800, 0.05))
krig.dist <- krige(log(zinc) ~ sqrt(dist), meuse.sf, meuse.grid.sf, model=dist.mod)