161250 Data Analysis
In a census, every element of the population is contacted and counted, and other information is collected or evaluated.
In a survey, a sample is selected from the population (using a sampling scheme) and information is collected from it.
Experiments are more active than surveys because a treatment is applied. The aim of an experiment is to study the effects of induced conditions.
Categorical (or Qualitative)
Nominal data: a label is assigned to each element
Mathematical manipulation of nominal data makes no sense
Ordinal data: ordered categories, e.g. rating scales
Quantitative Data
Ratio & Interval scales
Measuring Devices or Instruments
Measurement Error
Indirect measures
Non-response is a non-sampling error
Selection stage: an element may be selected but not found
Collection stage: it may not be possible to take a measurement
Documentation stage
Call-backs reduce non-response
TARGET POPULATION is the population under study.
FRAME operationalises data collection from a target population, e.g. a listing of the elements in the population.
ACTUAL POPULATION is the resulting set of elements on which usable data have been collected.
A sample is a subset of the population
Random refers to the selection process, not the outcome
To apply the simple random sampling (SRS) method, the population needs to be homogeneous
SRS is easy to handle and is suitable even with a poor sampling frame
SRS can be costlier, and it may lead to a politically incorrect sample!
SRS estimates are more variable than those from stratified sampling
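The idea that "random" describes the selection process can be illustrated with a minimal Python sketch. The frame of 100 element IDs and the function name `simple_random_sample` are assumptions made for illustration only; they are not part of the course material.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(list(frame), n)  # sampling without replacement

# Hypothetical sampling frame: element IDs 1..100
frame = range(1, 101)
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Two runs with different seeds give different samples, yet both are equally valid outcomes of the same random process.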
Stratified random sampling (STRS) is suitable for heterogeneous populations
The population is divided into homogeneous groups called strata, and a sample is selected from each stratum
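A stratified selection can be sketched as follows. This is only an illustration under stated assumptions: the frame of (id, region) pairs, the proportional allocation rule (the same sampling fraction in every stratum), and the name `stratified_sample` are all hypothetical choices, not something the notes prescribe.

```python
import random

def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Split the frame into strata, then draw an SRS of the same fraction from each."""
    rng = random.Random(seed)
    strata = {}
    for element in frame:
        strata.setdefault(stratum_of(element), []).append(element)
    sample = {}
    for label, members in strata.items():
        n_h = max(1, round(fraction * len(members)))  # at least one per stratum
        sample[label] = rng.sample(members, n_h)
    return sample

# Hypothetical frame: 150 urban and 50 rural elements
frame = [(i, "urban" if i % 4 else "rural") for i in range(200)]
chosen = stratified_sample(frame, lambda e: e[1], 0.1, seed=42)
print({label: len(members) for label, members in chosen.items()})
```

Because every stratum contributes to the sample, no group can be missed by chance, which is one reason stratified estimates tend to vary less than SRS estimates.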
Sampling Approaches
Advantages of STRS
Cluster sampling is a convenient method of sampling
The population is composed of clusters (groups)
Select certain clusters at random, then collect measurements from a random selection of the elements within the chosen clusters
Larger variance than SRS!
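The two-stage selection described above (random clusters first, then random elements within them) might look like this in Python. The school and student names, the cluster sizes, and the function name `two_stage_cluster_sample` are made up for illustration.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick elements within each chosen cluster."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    return {
        cid: rng.sample(clusters[cid], min(n_per_cluster, len(clusters[cid])))
        for cid in chosen_ids
    }

# Hypothetical population: 12 schools (clusters) with 30 students each
clusters = {f"school_{k}": [f"s{k}_{j}" for j in range(30)] for k in range(12)}
picked = two_stage_cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=7)
for school, students in picked.items():
    print(school, students)
```

Only the chosen clusters need to be visited, which is where the convenience comes from; elements within a cluster tend to resemble one another, which is where the larger variance comes from.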
Systematic random sampling (SyRS): select every \(k^{th}\) element
Random start within the first block of elements.
Convenient, and the sample will generally be representative of the population
The variance of estimates is generally greater than that of SRS
Inefficient or inappropriate if a cycle or trend is present in the frame
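Selecting every \(k^{th}\) element after a random start takes only a few lines of Python. The frame of 100 IDs and the name `systematic_sample` are assumptions for this sketch.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start within the first block of k elements, then every k-th element."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # index 0..k-1, i.e. somewhere in the first block
    return frame[start::k]

# Hypothetical ordered frame of 100 element IDs
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, seed=3)
print(sample)
```

If the frame order carries a cycle whose period matches `k` (for example, every tenth record is a Monday), each selected element falls at the same point of the cycle, which is why the method is inappropriate in that case.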
Probability proportional to size (PPS) sampling: each element has its own probability of selection
Requires an associated variable whose value (size) is known for every element in the population (e.g. from a previous census)
e.g. “Dollar-Unit Sampling”
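One common way to give each element a selection probability proportional to its size is a systematic pass over the cumulative sizes, which is roughly how dollar-unit sampling is often described. The sketch below assumes that variant. The invoice names, amounts, and the function name `pps_systematic_sample` are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def pps_systematic_sample(sizes, n, seed=None):
    """Select n units systematically along the cumulative size scale.

    An element covering more of the scale (a larger size) is more likely to be
    hit; an element larger than the sampling interval can be hit more than once.
    """
    rng = random.Random(seed)
    labels = list(sizes)
    cumulative = list(accumulate(sizes[label] for label in labels))
    interval = cumulative[-1] / n          # total size divided by sample size
    start = rng.random() * interval        # random start within the first interval
    picks = []
    for i in range(n):
        point = start + i * interval
        # First element whose cumulative size exceeds the point (guarded for rounding)
        index = min(bisect_right(cumulative, point), len(labels) - 1)
        picks.append(labels[index])
    return picks

# Hypothetical invoices with dollar amounts as the size variable
invoices = {"A": 100, "B": 300, "C": 600}
print(pps_systematic_sample(invoices, n=2, seed=5))
```

Here invoice C holds 60% of the total dollars, so it is far more likely to be selected than invoice A, which holds 10%.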
| Sample Design | Design Effect (\(d\)) | Effective Sample Size (\(\frac{n}{d}\)) |
|---|---|---|
| SRS | 1.00 | \(n\) |
| STRS | 0.80 to 0.90 | \(\frac{n}{0.9}\) to \(\frac{n}{0.8}\) |
| Cluster | 1.02 to 1.26 | \(\frac{n}{1.26}\) to \(\frac{n}{1.02}\) |
| SyRS | 1.05 | \(\frac{n}{1.05}\) |
| Quota | 2 | \(\frac{n}{2}\) |
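The effective sample size column is simply \(\frac{n}{d}\), so it can be computed for any nominal sample size. In the sketch below, the nominal size of 1000 is an arbitrary illustration, and only the designs with a single design effect value in the table are used.

```python
def effective_sample_size(n, design_effect):
    """Size of the SRS that would give the same precision as n units under this design."""
    return n / design_effect

# Design effects taken from the table above (single-valued rows only)
design_effects = {"SRS": 1.00, "SyRS": 1.05, "Quota": 2.0}

n = 1000  # hypothetical nominal sample size
for design, d in design_effects.items():
    print(f"{design}: effective n = {effective_sample_size(n, d):.0f}")
```

A design effect above 1 shrinks the effective sample size, while a design effect below 1 (as for STRS) makes the sample worth more than its nominal size.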
Issues to address
Bias occurs due to
A sample may have the same biases as a census along with sampling errors