Spatial interpolation is the process of taking measurement information from a set of points and extending or interpolating that measurement across a spatial region.
The key idea is that measurements over a spatial region tend to be correlated - points that are close to each other typically give similar measurements because they are close.
So when we model the measurement spatially, we need to account for that correlation.
There are a wide range of methods for spatial interpolation, and we’ll cover inverse distance weighting, simple kriging and universal kriging.
As an example dataset, we’ll be using a famous dataset of heavy metal
concentrations in topsoil in a flood plain of the river Meuse near the
village of Stein in the Netherlands. It is available within the
sp
package and you can find more details about it using
?meuse
. We’ll be concentrating on the metal
zinc
, and look at it on a log scale:
The goal will be to interpolate this measure so we can estimate the concentration of zinc across the entire flood plain. To do this we’ll make use of a grid for interpolation:
With inverse distance weighting, we estimate the measurement at an unsampled location using values from nearby locations, weighted in proportion to the proximity of the sampled points to the unsampled location.
Given measurements \(z_i\) at location \(\mathbf{x}_i\) for \(i=1, \ldots n\), we estimate the value \(\hat{z}\) at a new point \(\mathbf{x}\) using:
\[ \hat{z} = \frac{\sum_{i=1}^n w_iz_i}{\sum_{i=1}^n w_i} \] where the weights \(w_i\) are given by \[ w_i = \frac{1}{\lVert \mathbf{x} - \mathbf{x}_i\rVert^p} \]
where \(p\) is the inverse distance power, which we have freedom to vary. The weights are thus the inverse distance between the new point \(\mathbf{x}\) and our sampled points \(\mathbf{x}_i\): sampled points that are close will have a smaller distance and thus a larger weight.
The inverse distance power \(p\) controls how strongly neighbouring points have priority over points far away. If \(p\) is small (e.g. \(p < 1\)) then points far away can have influence on the estimated measure, whereas if \(p\) is larger (e.g. \(p > 2\)) then points further away have very little influence on the estimated measure.
A typical value is \(p=2\) but this
can be experimented with. We can do the estimation with the
idw
function from gstat
which can take either
an sf
object (from sf
) or a
Spatial
object (from sp
). We then supply the
variable we want and the new data locations we want to interpolate to,
along with the power to use:
meuse.idw <- idw(log(zinc) ~ 1, locations=meuse.sf, newdata=meuse.grid.sf, idp = 2)
The idw
then gives us a new object containing the
predictions in the var1.pred
column, which we can then
map:
Using a smaller value of idp
will result in more weight
applied to observations further away, which will thus downweight the
closer observations:
meuse.idw.1 <- idw(log(zinc) ~ 1, locations=meuse.sf, newdata=meuse.grid.sf, idp = 1)