In this exercise we’ll look at plots of a single numeric variable. In that case we’re interested in the distribution of the variable, which can usually be described in terms of its center, spread and shape.
Start by downloading labA03.R and loading it into RStudio:
https://www.massey.ac.nz/~jcmarsha/161122/labs/labA03.R
Next, run the first block of code in labA03.R to produce a histogram.
You should notice a message noting that the number of bins has been set to the default value of 30. You can change this by adding a bins parameter to geom_histogram() like this:
ggplot(data=quakes) +
geom_histogram(aes(x=mag), bins=10)
Try a range of values, including 20 and 25. You should notice that you get dramatically different shapes from the histogram. Why do you think this is?
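To make the comparison easier, the sketch below builds one histogram per bin count and prints them in turn. The bin counts other than 20 and 25 are arbitrary choices, and the names bin_counts, plots and mag_values are illustrative rather than taken from labA03.R. The last two lines print the distinct values that mag takes, which may help when thinking about why the shape changes.

```r
library(ggplot2)

# Bin counts to compare: 20 and 25 come from the exercise, the others are arbitrary.
bin_counts <- c(10, 20, 25, 30)

# Build one histogram per bin count and keep them in a named list.
plots <- lapply(bin_counts, function(b) {
  ggplot(data = quakes) +
    geom_histogram(aes(x = mag), bins = b) +
    ggtitle(paste("bins =", b))
})
names(plots) <- paste0("bins_", bin_counts)

# Print each plot in turn (flip back through them in the RStudio Plots pane).
for (p in plots) print(p)

# A hint for the "why": look at the distinct values that mag takes.
mag_values <- sort(unique(quakes$mag))
print(mag_values)
```

Compare how the bin edges fall relative to the printed values as the number of bins changes.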
Alternatives to the histogram are geom_density or geom_boxplot. Try those out and see what you prefer for these data. You can control the smoothness of the density plot using adjust: values smaller than 1 will result in less smoothing, and values larger than 1 in more smoothing.
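As a minimal sketch of these alternatives, the code below draws two density plots with different adjust values and one boxplot. The values 0.5 and 2 are arbitrary examples, and the object names are illustrative rather than taken from labA03.R. Mapping only x in geom_boxplot() assumes a reasonably recent ggplot2 (3.3.0 or later), which draws the box horizontally.

```r
library(ggplot2)

# Less smoothing than the default: adjust < 1 gives a wigglier curve.
p_rough <- ggplot(data = quakes) +
  geom_density(aes(x = mag), adjust = 0.5)

# More smoothing than the default: adjust > 1 gives a smoother, flatter curve.
p_smooth <- ggplot(data = quakes) +
  geom_density(aes(x = mag), adjust = 2)

# A boxplot of the same variable, drawn along the x axis.
p_box <- ggplot(data = quakes) +
  geom_boxplot(aes(x = mag))

print(p_rough)
print(p_smooth)
print(p_box)
```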
How would you describe this distribution? Think about center, spread and shape.
Try playing with the col and fill aesthetics for the density or boxplot to see what they do.
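One way to experiment, assuming the intent is to set these to fixed values rather than map them to a variable, is to pass col and fill directly to the geom, outside aes(). The colour names below are arbitrary choices from R's built-in colours.

```r
library(ggplot2)

# col sets the outline colour, fill sets the colour of the area under the curve.
p_density <- ggplot(data = quakes) +
  geom_density(aes(x = mag), col = "darkblue", fill = "lightblue")

# For a boxplot, col affects the lines and outlier points, fill the box itself.
p_box <- ggplot(data = quakes) +
  geom_boxplot(aes(x = mag), col = "darkred", fill = "pink")

print(p_density)
print(p_box)
```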
Produce similar plots for the depth and stations variables as well. How would you describe those distributions? Add some notes to your R script about this.
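A possible starting point is sketched below, reusing the same geoms with x mapped to depth or stations. The choice of geom for each variable, the bin count of 30 and the object names are illustrative only. The comment lines show one way to keep notes alongside the code.

```r
library(ggplot2)

# depth: depth of each earthquake (km)
p_depth_hist <- ggplot(data = quakes) +
  geom_histogram(aes(x = depth), bins = 30)
p_depth_dens <- ggplot(data = quakes) +
  geom_density(aes(x = depth))
# Notes on depth: (describe center, spread and shape here)

# stations: number of stations reporting each earthquake
p_stations_hist <- ggplot(data = quakes) +
  geom_histogram(aes(x = stations), bins = 30)
p_stations_box <- ggplot(data = quakes) +
  geom_boxplot(aes(x = stations))
# Notes on stations: (describe center, spread and shape here)

print(p_depth_hist)
print(p_depth_dens)
print(p_stations_hist)
print(p_stations_box)
```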