The Pearson correlation coefficient measures the strength and direction of the linear relationship between two variables
Case a: Positive linear relationship
Case b: Negative linear relationship
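As a minimal sketch of the two cases, the code below simulates one positive and one negative linear relationship and computes the Pearson coefficient with base R's `cor()`. The data and the names `x`, `y_pos` and `y_neg` are made up for illustration; they are not the data behind the plots.

```r
# Simulated data (illustrative only, not from the course material)
set.seed(1)
x <- 1:20
y_pos <- 2 * x + rnorm(20)    # case a: y tends to increase with x
y_neg <- -2 * x + rnorm(20)   # case b: y tends to decrease with x

r_pos <- cor(x, y_pos)        # Pearson is the default method of cor()
r_neg <- cor(x, y_neg)

# The same coefficient computed directly from its definition
r_manual <- sum((x - mean(x)) * (y_pos - mean(y_pos))) /
  sqrt(sum((x - mean(x))^2) * sum((y_pos - mean(y_pos))^2))

print(c(r_pos = r_pos, r_neg = r_neg, r_manual = r_manual))
```

A coefficient near +1 corresponds to case a and one near -1 to case b; a value near 0 indicates no linear relationship, although a non-linear one may still exist.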
Correlation Matrix
To show all pairwise correlation coefficients
Useful to explore the inter-relationship between variables
Call: corr.test(x = pinetree[, -1])
Correlation matrix
        Top  Third  Second  First
Top    1.00   0.92    0.96   0.97
Third  0.92   1.00    0.95   0.91
Second 0.96   0.95    1.00   0.97
First  0.97   0.91    0.97   1.00
Sample Size
[1] 60
Probability values (Entries above the diagonal are adjusted for multiple tests.)
        Top  Third  Second  First
Top       0      0       0      0
Third     0      0       0      0
Second    0      0       0      0
First     0      0       0      0
To see confidence intervals of the correlations, print with the short=FALSE option
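The `pinetree` data are not reproduced here, so the following is only a base R stand-in for the output above: `cor()` gives the matrix of pairwise coefficients and `cor.test()` gives a 95% confidence interval for a single pair. The built-in `iris` measurements are an assumed substitute dataset.

```r
# Stand-in data: the four numeric columns of the built-in iris dataset
dat <- iris[, 1:4]

# All pairwise Pearson correlation coefficients, rounded as in the output above
cmat <- cor(dat)
print(round(cmat, 2))

# Confidence interval for one pair of variables
ct <- cor.test(dat$Sepal.Length, dat$Petal.Length)
ci <- as.numeric(ct$conf.int)     # lower and upper 95% limits
est <- unname(ct$estimate)        # the sample correlation itself
print(c(lower = ci[1], estimate = est, upper = ci[2]))
```

The matrix is symmetric with ones on the diagonal, so only the entries on one side of the diagonal need to be read.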
Correlation Plots
3-D Plots
A bubble plot shows a third variable as point size and, optionally, a fourth variable as point colour.
3-D plots are far more useful if you can rotate them
Package plot3D
Package plotly
Contour plots
3-D plots are generally more difficult to interpret than 2-D plots
Contour plots are another way of looking at three variables in two dimensions
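A contour plot can be sketched with base R's `contour()`. The example assumes the built-in `volcano` matrix (heights on a regular grid) as illustrative data; the axis labels are placeholders.

```r
# Illustrative data: built-in matrix of heights on a regular grid
z <- volcano
x <- seq_len(nrow(z))   # grid index along the rows
y <- seq_len(ncol(z))   # grid index along the columns

# Each contour line joins grid positions with the same value of z
contour(x, y, z, nlevels = 10,
        xlab = "Row index", ylab = "Column index",
        main = "Contour plot of volcano heights")

# The same lines as data, in case they are needed for further work
cl <- contourLines(x, y, z, nlevels = 10)
print(length(cl))
```

Closely spaced contour lines indicate a steep change in the third variable; widely spaced lines indicate a flat region.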
Conditioning plots
Conditioning plots (coplots) show the relationship between two variables over different ranges of a third variable
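A coplot can be sketched with base R's `coplot()`. The example assumes the built-in `quakes` data as a stand-in; the choice of four depth ranges with a small overlap is arbitrary.

```r
# Illustrative data: built-in earthquake locations with depth

# The depth ranges that define the panels
ranges <- co.intervals(quakes$depth, number = 4, overlap = 0.1)
print(ranges)

# Latitude against longitude, one panel per depth range
coplot(lat ~ long | depth, data = quakes, number = 4, overlap = 0.1)
```

If the scatter looks different from panel to panel, the relationship between the two plotted variables depends on the conditioning variable.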
More R graphs
Build plots in a single layout. R packages patchwork or gridExtra can be used.
Time series data
A time series is an ordered sequence of observations of one or more variables, usually made at equally spaced time points.
Time series Components of variation
Trend - representing long term positive (upward) or negative (downward) movement
Seasonal - periodic behaviour that occurs within a block of a given time period (say, Christmas time within a calendar year) and repeats fairly regularly over time (say, year after year)
Error (Residual)
Time Series Example
Autocorrelation function (ACF)
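The \(k^\text{th}\) order ACF, i.e. the autocorrelation between \(x_t\) and \(x_{t-k}\), is
\[ r_k = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2} \]
where \(\bar{x}\) is the mean of the \(n\) observations.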
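The significance of autocorrelations may be judged from the 95% confidence band (approximately \(\pm 1.96/\sqrt{n}\) for a series of length \(n\)); autocorrelations outside the band differ significantly from zero at the 5% level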
Autocorrelations decay to zero (the value of $20 notes depends positively on the values held in the immediate past rather than in the distant past)
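A minimal sketch of these ideas in base R follows. The $20 notes series is not reproduced here, so the built-in `LakeHuron` annual series is assumed as a stand-in. The lag-1 value is also computed by hand using the usual sample definition, which is the one `acf()` uses.

```r
# Stand-in data: built-in annual series of lake levels
x <- as.numeric(LakeHuron)
n <- length(x)

# Sample autocorrelations up to lag 20, without drawing the plot
a <- acf(x, lag.max = 20, plot = FALSE)

# Lag-1 autocorrelation from the definition:
# cross-products of deviations one step apart over the total sum of squares
xbar <- mean(x)
r1_manual <- sum((x[2:n] - xbar) * (x[1:(n - 1)] - xbar)) / sum((x - xbar)^2)

# Approximate 95% band for judging significance
band <- qnorm(0.975) / sqrt(n)

print(c(r1_acf = a$acf[2], r1_manual = r1_manual, band = band))

# acf(x) with the default plot = TRUE draws the correlogram with this band
```

Note that `a$acf[1]` is the lag-0 value, which is always 1, so the lag-1 value is the second element.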
PACF (Partial Autocorrelation Function)
The partial autocorrelation at lag \(k\) is the correlation between \(x_t\) and \(x_{t-k}\) after removing the effect of the intervening (shorter) lags
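The sketch below illustrates the idea with base R's `pacf()`, again assuming the built-in `LakeHuron` series as stand-in data. At lag 1 there are no shorter lags to remove, so the partial and ordinary autocorrelations coincide; at lag 2 the effect of lag 1 is removed.

```r
# Stand-in data: built-in annual series of lake levels
x <- as.numeric(LakeHuron)

a <- acf(x, lag.max = 10, plot = FALSE)    # a$acf[1] is lag 0
p <- pacf(x, lag.max = 10, plot = FALSE)   # p$acf[1] is lag 1

r1 <- a$acf[2]   # ordinary autocorrelation at lag 1
r2 <- a$acf[3]   # ordinary autocorrelation at lag 2

# Lag-2 partial autocorrelation: lag-2 correlation adjusted for lag 1
p2_manual <- (r2 - r1^2) / (1 - r1^2)

print(c(acf_lag1 = r1, pacf_lag1 = p$acf[1],
        pacf_lag2 = p$acf[2], pacf_lag2_manual = p2_manual))
```

A large ordinary autocorrelation at lag 2 alongside a small partial one suggests that the lag-2 dependence is mostly inherited through lag 1.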
Time series trend types
Requires a (parametric) model to fit the trend (covered later)
Non-parametric fits can also be made
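As one possible non-parametric fit, the sketch below smooths a series with base R's `lowess()`. The built-in `co2` series and the smoothing span `f = 0.25` are assumptions chosen for illustration; other smoothers would serve equally well.

```r
# Illustrative data: built-in monthly CO2 concentration series
y <- as.numeric(co2)
tt <- as.numeric(time(co2))   # time index in decimal years

# Locally weighted smoother; f is the fraction of points used in each local fit
fit <- lowess(tt, y, f = 0.25)

plot(tt, y, type = "l", xlab = "Year", ylab = "CO2 concentration",
     main = "Series with a lowess trend")
lines(fit$x, fit$y, col = "red", lwd = 2)
```

A larger `f` gives a smoother trend line; a smaller `f` lets the fitted line follow the data more closely.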
Seasonality
Simple scatter plot of the response variable against time may reveal seasonality directly
Sub-series plots
Seasonality is easily seen graphically when the observations are grouped by a seasonal variable (e.g. month or quarter)
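A minimal sketch of grouping by season, assuming the built-in monthly `AirPassengers` series as example data: `cycle()` extracts the month of each observation, which then serves as the grouping variable.

```r
# Illustrative data: built-in monthly airline passenger totals
y <- as.numeric(AirPassengers)
month <- as.integer(cycle(AirPassengers))   # 1 = January, ..., 12 = December

# Mean level for each calendar month
monthly_mean <- tapply(y, month, mean)
print(round(monthly_mean, 1))

# One boxplot per month
boxplot(y ~ month, xlab = "Month", ylab = "Passengers")

# Sub-series plot: one small time plot per month with its mean marked
monthplot(AirPassengers, ylab = "Passengers")
```

Months whose boxes or sub-series sit consistently higher or lower than the rest point to a seasonal pattern.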
ACF plot showing seasonality
White noise errors
Example using random normal data
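A sketch of such an example follows, with an arbitrary seed and a series length of 200. For white noise, almost all sample autocorrelations at non-zero lags should fall inside the approximate 95% band.

```r
# Simulated white noise: independent standard normal values
set.seed(42)
n <- 200
e <- rnorm(n)

a <- acf(e, lag.max = 20, plot = FALSE)
r <- a$acf[-1]                  # drop lag 0, which is always 1

band <- qnorm(0.975) / sqrt(n)  # approximate 95% limits
prop_inside <- mean(abs(r) < band)

print(c(band = band, prop_inside = prop_inside))

# acf(e) with the default plot = TRUE draws the correlogram
```

About one lag in twenty is expected to fall outside the band by chance alone, so an occasional spike does not contradict white noise.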
Time series decomposition
Additive model \(X_t\) = Trend + Seasonal + Error
(where \(X_t\) is an observation at time \(t\))
Multiplicative model \(X_t\) = Trend \(\times\) Seasonal + Error
(trend and seasonal components are not independent)
Detrending means removing the trend from the series, making it easier to see the seasonality.
Deseasonalising (seasonal adjustment) means removing the seasonality from the series, making it easier to see the trend.
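The additive model can be sketched with base R's `decompose()`, which estimates the trend by a moving average. The built-in `co2` series is assumed as example data, and the names `detrended` and `deseasonalised` are illustrative.

```r
# Illustrative data: built-in monthly CO2 concentration series
d <- decompose(co2, type = "additive")

# Removing one component makes the other easier to see
detrended <- co2 - d$trend          # seasonal pattern plus error
deseasonalised <- co2 - d$seasonal  # trend plus error

# The moving-average trend is undefined at the two ends of the series
ok <- !is.na(d$trend)

# Under the additive model the three components add back to the data
rebuilt <- d$trend + d$seasonal + d$random

plot(d)   # panels: observed, trend, seasonal, random
```

The seasonal estimates in `d$figure` are centred, so for the additive model they sum to approximately zero over one year.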
Learning EDA
The best way to learn EDA is to try many approaches and find which are informative and which are not.
Chatfield (1995) on tackling statistical problems:
Do not attempt to analyse the data until you understand what is being measured and why. Find out whether there is any prior information, such as likely effects.
Find out how the data were collected.
Look at the structure of the data.
The data then need to be carefully examined in an exploratory way before attempting a more sophisticated analysis.
Use common sense, and be honest!
Summary
Size
For small datasets, we cannot draw firm conclusions.
Some displays are affected by sample size (e.g. stem-and-leaf plot); others may not be (e.g. smoothed density)
Shape
We are concerned with the overall shape of the distribution.
Are there gaps and/or many peaks (modes)?
Is the distribution symmetrical? Is the distribution normal?
Outliers
Outliers are often more important than points in the middle
Boxplots and scatter plots show them
Graphs should be simple and informative; certainly not misleading!