In this exercise we’re going to experiment with different geometry layers, and look at how to tidy up axis labels.
Start by downloading labA02.R and loading it into RStudio.
https://www.massey.ac.nz/~jcmarsha/161122/labs/labA02.R
This script is the same as the last - you should get the same plot if you run the code.
Copy and paste the code that does the plot, and change the geom_point to geom_bin2d. This divides the
latitude and longitude up into small chunks (bins) and then counts the
number of observations in each bin and uses that as the colour.
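As a rough sketch of what the modified code might look like (this assumes the script's plot uses the built-in quakes data with long and lat on the axes, as in the later examples in this lab; the name p_bins is just an illustrative choice):

```r
library(ggplot2)

# quakes ships with base R (datasets package), so no extra loading is needed.
# geom_bin2d counts the observations falling in each rectangular bin
# and maps that count to the fill colour.
p_bins <- ggplot(data = quakes) +
  geom_bin2d(mapping = aes(x = long, y = lat))

p_bins
```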
Repeat the above, but instead of using geom_bin2d, try geom_density2d. This will smooth the data and give a
contour plot instead, like the contour lines for elevation on a map. Along
each line, the density of points is the same.
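A minimal sketch of the contour version, under the same assumption that the plot uses quakes with long and lat (p_contours is an illustrative name only):

```r
library(ggplot2)

# geom_density2d estimates a smooth 2D density of the points
# and draws it as contour lines.
p_contours <- ggplot(data = quakes) +
  geom_density2d(mapping = aes(x = long, y = lat))

p_contours
```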
Instead of swapping one geometry for another, we can add them together, for example:
ggplot(data=quakes) +
geom_point(mapping=aes(x=long, y=lat)) +
geom_density2d(mapping=aes(x=long, y=lat))
Try altering the colour of the contour lines and points. You can use linetype to change the contour lines, e.g. geom_density2d(mapping=aes(x=long, y=lat), linetype='dashed').
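One possible way to set these, as a sketch only: the particular colours and the alpha (transparency) value are arbitrary choices for illustration, not values from the lab script. Note that they are set outside aes(), because they are fixed values rather than mappings from the data.

```r
library(ggplot2)

p_styled <- ggplot(data = quakes) +
  # Fixed colour and transparency for the points (illustrative values)
  geom_point(mapping = aes(x = long, y = lat),
             colour = "grey40", alpha = 0.5) +
  # Fixed colour and dashed line type for the contours (illustrative values)
  geom_density2d(mapping = aes(x = long, y = lat),
                 colour = "darkred", linetype = "dashed")

p_styled
```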
For the density layer, you might also want to experiment with the adjust parameter, which governs the amount of smoothing.
Setting it to something below 1 (e.g. 0.5) will result in less
smoothing, so the contours will look a bit noisier, but be closer to the
data. Setting it to something above 1 (e.g. 2) will result in more
smoothing, so the contours will be less noisy, but also not so close to
the data. A good value trades off these two measures (noisiness versus
closeness to the data). This is an example of the Bias-Variance
tradeoff, a key concept in statistics.
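To see the effect, it may help to build one plot with each of the two example values and compare them. This is a sketch; the names p_rough and p_smooth are hypothetical.

```r
library(ggplot2)

# adjust below 1: less smoothing, noisier contours that follow the data closely
p_rough <- ggplot(data = quakes) +
  geom_density2d(mapping = aes(x = long, y = lat), adjust = 0.5)

# adjust above 1: more smoothing, cleaner contours that follow the data less closely
p_smooth <- ggplot(data = quakes) +
  geom_density2d(mapping = aes(x = long, y = lat), adjust = 2)

p_rough
p_smooth
```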
Let’s clean up the axis labels. Using the complete words would be
better than the abbreviations. We can do this with labs(x="Longitude"). We just add this on in the same way we
added the additional geom_density2d() layer.
We can add a title as well using labs(title="my title goes here"). Have a think about what
would make a good title for this plot.
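Both labels and the title can go in a single labs() call. The sketch below assumes the combined point and contour plot from earlier; the y label "Latitude" is the natural counterpart of the x label, and the title is left as the placeholder for you to replace.

```r
library(ggplot2)

p_labelled <- ggplot(data = quakes) +
  geom_point(mapping = aes(x = long, y = lat)) +
  geom_density2d(mapping = aes(x = long, y = lat)) +
  # labs() is added with + just like a geometry layer
  labs(x = "Longitude",
       y = "Latitude",
       title = "my title goes here")  # placeholder: choose your own title

p_labelled
```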
Consider the plot below:
ggplot(data=quakes) +
geom_point(mapping=aes(x=mag, y=stations))
Two things stand out in this plot. First, there’s an obvious positive correlation. Second, the magnitude variable has clearly been rounded to the nearest tenth, as all the points form vertical strips.
Try to improve this plot as follows:
Investigate geom_jitter to jitter the points horizontally a bit. The width and height parameters might be useful for altering the amount of jitter.
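A possible starting point, as a sketch: the width of 0.05 is an assumption chosen because magnitudes are rounded to the nearest 0.1 (jitter is applied in both directions, so each strip spreads across roughly 0.1 in total), and height = 0 leaves the station counts untouched. Experiment with other values.

```r
library(ggplot2)

p_jitter <- ggplot(data = quakes) +
  # width: horizontal jitter (applied in both directions);
  # height = 0 means no vertical jitter at all
  geom_jitter(mapping = aes(x = mag, y = stations),
              width = 0.05, height = 0)

p_jitter
```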
Try adding some smoothing via geom_smooth or geom_density2d. Which smoother do you think is more
appropriate in this example and why?
You can modify the amount of smoothing in geom_smooth via span (e.g. span=0.75) after first
switching to a LOESS (local regression) smoother using code like this:
geom_smooth(mapping=aes(x=mag, y=stations), method='loess', span=0.75)
Smaller values will smooth less, values closer to 1 will smooth more.
LOESS is a local regression smoother, and span here
refers to the proportion of the data used to estimate the trend at each
point (i.e. the trend at the left of the plot uses the left-most 75% of
the data, the trend at the right uses the right-most 75% of the data).
Smaller values of span adapt to the data faster, but will give a noisier
result, while larger values lead to less noise, but don’t adapt quickly
to the data. This is another example of the bias-variance tradeoff.
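To compare the two extremes side by side, something like the following could be tried. It is a sketch: the span values 0.3 and 0.9, the jitter settings, the alpha value and the object names are all illustrative assumptions, and formula = y ~ x is written out only to avoid the message geom_smooth otherwise prints.

```r
library(ggplot2)

# Shared base layer: lightly jittered points (illustrative settings)
p_base <- ggplot(data = quakes) +
  geom_jitter(mapping = aes(x = mag, y = stations),
              width = 0.05, height = 0, alpha = 0.3)

# Small span: the trend adapts quickly to the data but is noisier
p_small_span <- p_base +
  geom_smooth(mapping = aes(x = mag, y = stations),
              method = "loess", formula = y ~ x, span = 0.3)

# Large span: a smoother trend that adapts more slowly
p_large_span <- p_base +
  geom_smooth(mapping = aes(x = mag, y = stations),
              method = "loess", formula = y ~ x, span = 0.9)

p_small_span
p_large_span
```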
Tidy up the axis labels, and choose a good title.