Introduction

With inverse distance weighting we interpolated to new, unobserved locations by using a weighted average of the values at sampled locations, with the weights inversely proportional to the distance to the sampled locations, raised to some power.

This means the amount of influence that points at a distance \(d\) from our unobserved location have is the same, regardless of the distribution of points across the region. Essentially it means that there is a common, global amount of smoothing being applied: the weights for a distance \(d\) are the same regardless of where we are interpolating.

An alternate approach is to allow the amount of smoothing to adapt to changes in the intensity of sampled points as well as changes in the values measured at those points. Essentially we utilise the covariance between the interpoint distances and the interpoint values to tune the weights to what is happening locally around the unobserved point: weights for a distance \(d\) will differ depending on where we are interpolating.

As an example dataset, we’ll be using a famous dataset of heavy metal concentrations in topsoil in a flood plain of the river Meuse near the village of Stein in the Netherlands. It is available within the sp package and you can find more details about it using ?meuse. We’ll be concentrating on the metal zinc, and look at it on a log scale:

The goal will be to interpolate this measure so we can estimate the concentration of zinc across the entire flood plain. To do this we’ll make use of a grid for interpolation:

Simple Kriging

An alternate, and perhaps more statistical way of performing interpolation is to use a statistical model to define how pairs of measurements are expected to be related, given how far they are apart. We do this by creating a variogram which takes half the squared difference in the measure \(z\) between pairs of observations \(i\), \(j\): \[ \gamma_{ij} = \frac{1}{2}(z_i - z_j)^2 \] and plots this against the distance \(d_{ij}\) between observations \(i\) and \(j\), for all pairs of observations up to some given distance (e.g. up to say 1/3 of the area covered by the points).

This is a bit overwhelming to interpret, so we typically divide the distance into bands and then estimate the average \(\gamma\) within each band:

This can be done using the variogram function:

const.var <- variogram(log(zinc) ~ 1, data=meuse.sf)

We then fit a variogram model \(\gamma(d)\) to this. There’s a whole heap of variogram models available:

We usually pick one of these models, then control it’s fit to the data by modifying three main parameters: the nugget, the partial sill and the range. The nugget is the y-intercept (i.e. \(\gamma(0)\)), while the range and partial sill are the point (x and y respectively) where the variogram starts flattening out.

This fitting is done in the fit.variogram function. Here we use the spherical variogram model, and we give it a guess at the partial sill (0.6), the range (1000) and the nugget (0.1). It then optimises these parameters from there and gives:

const.mod <- fit.variogram(const.var, model=vgm(0.6, "Sph", 1000, 0.1))
const.mod
##   model      psill    range
## 1   Nug 0.05066089   0.0000
## 2   Sph 0.59060629 897.0097

We were pretty close with our guesses. We can plot the result on top of our data variogram:

Now that we have a model for how pairs of points will be correlated, we can now do the interpolation. Essentially we use the variogram model to provide localised weighting parameters.

Recall that with inverse distance weighting the interpolated value at an unsampled site was determined by the weighted average of neighbouring points where the weighting parameter (the power \(p\) in the formula) was the same across the entire study extent. Kriging uses the variogram model to compute the weights of neighboring points based on the distribution of those values, so it allows the local pattern of points in the neighbourhood of the unsampled point to define the weights. The weights \(w_1, w_2, \ldots, w_n\) are determined by solving the system of \(n+1\) linear equations \[ \begin{aligned} \sum_j w_j \gamma(\lVert\mathbf{x}_i - \mathbf{x}_j\rVert) + \mu &= \gamma(\lVert\mathbf{x}_i - \mathbf{x}\rVert)\quad i = 1,\ldots,n,\\ \sum_j w_j &= 1. \end{aligned} \] where \(\gamma\) is the variogram function, \(w_j\) are the weights, \(\mathbf{x}\) is the unobserved location for which we want to interpolate, \(\mathbf{x}_i\) are our sampled locations, and \(\mu\) is an additional constant (a Lagrange multiplier) that contributes to the variance (uncertainty) of the kriging estimate.

Once we have the weights \(w_j\), we then compute the interpolated measure \(\hat{z}\) using the same equation as was used for IDW, though we note that \(\sum_j w_j = 1\) so it simplifies to: \[ \hat{z} = \sum_{j=1}^n w_jz_j \] We do this in R with the krige function:

krig.const <- krige(log(zinc) ~ 1, meuse.sf, meuse.grid.sf, model=const.mod)

which gives us an sf object with the same columns (var1.pred and var1.var) as the idw function did. The difference is that var1.var is now useful as a measure of uncertainty. You should see low uncertainty where we have observations, and higher uncertainty where we don’t.