In this exercise we’ll look at plots of a single numeric variable. In that case we’re interested in the distribution of the variable, which can usually be described in terms of its center, spread and shape.

Start by downloading labA03.R and loading it into RStudio.

https://www.massey.ac.nz/~jcmarsha/161122/labs/labA03.R

  1. Start by running the first block of code in labA03.R to produce a histogram.

  2. You should notice a message that the number of bins has been set to the default value of 30. You can change this by adding a bins parameter to geom_histogram() like this:

    ggplot(data=quakes) +
      geom_histogram(aes(x=mag), bins=10)

    Try a range of values, including 20 and 25. You should notice that you get dramatically different shapes from the histogram. Why do you think this is?

  3. Alternatives to the histogram are geom_density() or geom_boxplot(). Try those out and see which you prefer for these data. You can control the smoothness of the density plot using adjust: values smaller than 1 will result in less smoothing, and values larger than 1 in more smoothing.

  4. How would you describe this distribution? Think about center, spread and shape.

  5. Try playing with the col and fill aesthetics for the density or boxplot to see what they do.

  6. Produce similar plots for the depth and stations variables as well. How would you describe those distributions? Add some notes to your R script about this.
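
If you get stuck on the syntax for steps 3 and 5, the following is a minimal sketch of one way to set things up. The adjust value of 0.5 and the colour names are arbitrary choices for illustration and are not taken from labA03.R, and here col and fill are set to fixed values outside aes(). Treat it as a starting point to modify rather than the expected answer.

```r
library(ggplot2)

# Density plot of magnitude. adjust=0.5 is an arbitrary illustrative value:
# values below 1 smooth less, values above 1 smooth more.
p_density <- ggplot(data=quakes) +
  geom_density(aes(x=mag), adjust=0.5, col="darkblue", fill="lightblue")

# Boxplot of the same variable, with the same (arbitrary) outline and fill colours.
p_box <- ggplot(data=quakes) +
  geom_boxplot(aes(x=mag), col="darkblue", fill="lightblue")

# Draw each plot in turn.
print(p_density)
print(p_box)
```

For step 6, swapping mag for depth or stations inside aes() should give the corresponding plots for those variables.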