Chapter 1: Data Collection

Census, Survey & Experiments

In a census, every element of the population is contacted and counted, and other information is collected or evaluated.
In a survey, a sample is selected from the population (using a sampling scheme) and information is collected from it.
Experiments are more active than surveys because a treatment is applied. The aim of an experiment is to study the effects of induced conditions.

Types of data

Categorical (or Qualitative)
- Nominal data: a label is assigned to each category
  Mathematical manipulation of nominal data makes no sense
- Ordinal data: ordered categories, e.g. rating scales
Quantitative Data
- Discrete data
- Continuous data
Ratio & Interval scales
- interval scale - differences (subtraction) are meaningful, but ratios (division & multiplication) may not be, because the zero point is arbitrary
- ratio scale - all arithmetic manipulation can be done
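A small sketch of why ratios are not meaningful on an interval scale, using temperature as the example. The two temperatures are made-up values and `celsius_to_kelvin` is a helper introduced only for illustration. Celsius is an interval scale (arbitrary zero), while Kelvin is a ratio scale (true zero).

```python
def celsius_to_kelvin(c):
    """Convert a Celsius temperature to Kelvin."""
    return c + 273.15

# Two hypothetical temperatures, in degrees Celsius.
a_c, b_c = 20.0, 10.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences agree on both scales, so subtraction is meaningful.
diff_c = a_c - b_c
diff_k = a_k - b_k

# Ratios disagree: "20 C is twice as hot as 10 C" depends on the arbitrary zero.
ratio_c = a_c / b_c
ratio_k = a_k / b_k

print(diff_c, diff_k, ratio_c, ratio_k)
```

The difference is 10 degrees on both scales, but the ratio is 2 in Celsius and only about 1.035 in Kelvin.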

Measurement issues

Measuring Devices or Instruments
- a physical device - measuring rule to gauge the heights of plants
- a counting device - a Geiger counter for measuring radioactivity
- a questionnaire - requires a more subjective response.
Measurement Error
- arises if the instrument tends to be faulty
Indirect measures
- e.g. measure of fitness - using BMI?
- temperature - gauged by expansion of mercury!

Non-response

Non-response is a non-sampling error
Selection stage: an element may be selected but not found
- e.g. sheep in a flock may be tagged with individual identification numbers, but one may not be found at the time of the survey.
Collection stage: it may not be possible to take a measurement
- some respondents may forget, or refuse, to answer the questionnaire
Documentation stage
- Incorrect record of measurement
Call-backs reduce non-response

Census related concepts

TARGET POPULATION is the population under study.

FRAME operationalises data collection from a target population, e.g. a listing of the elements in the population.

ACTUAL POPULATION is the resulting set of elements on which usable data have been collected.

Sampling related concepts

A sample is a subset of the population
- Sampling conserves resources - money, human-power, time etc
- A well collected sample is more useful than a shoddily taken census
- Testing may be destructive, in which case a census is not an option

Principle of randomisation

to avoid bias
to select a sample with properties similar to those of the population
to estimate how closely a sample reflects the population
epsem (equal probability of selection) sampling methods follow the randomisation principle

Bias vs variance

Sampling variation (i.e. sample-to-sample variation) is different from bias: bias is a systematic error that does not average out over repeated samples
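The distinction can be illustrated with a small simulation. This is only a sketch: the population (the values 1 to 100), the sample size of 10, and the "only the 50 largest units are reachable" selection rule are all made-up assumptions.

```python
import random
import statistics

rng = random.Random(42)                  # fixed seed so the sketch is repeatable
population = list(range(1, 101))         # hypothetical population: values 1..100
true_mean = statistics.mean(population)  # 50.5

def mean_of_sample_means(draw, repeats=2000):
    """Average the sample mean over many repeated samples."""
    return statistics.mean(statistics.mean(draw()) for _ in range(repeats))

# Random selection: individual sample means vary, but there is no systematic error.
srs_average = mean_of_sample_means(lambda: rng.sample(population, 10))

# Biased selection: only the 50 largest units can ever be chosen.
reachable = population[50:]
biased_average = mean_of_sample_means(lambda: rng.sample(reachable, 10))

print(true_mean, srs_average, biased_average)
```

Under random selection the sample means scatter around the true mean (variance only). Under the biased scheme they scatter around a different value, and taking more samples does not remove the offset.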

Simple Random Sampling (SRS)

Random - refers to process not outcome
- Each (sampling) unit has same chance of being selected
- Units can be selected with & without replacement

SRS works best when the population is reasonably homogeneous

SRS is easy to handle; it can be used even with a poor sampling frame
SRS can be costlier; it may lead to a politically incorrect sample!
SRS estimates are usually more variable than those from stratified sampling
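Selection with and without replacement can be sketched with Python's standard `random` module. The frame of 100 numbered units, the sample size and the function names are illustrative assumptions.

```python
import random

def srs_without_replacement(units, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(units, n)

def srs_with_replacement(units, n, seed=None):
    """Draw n units independently; the same unit may appear more than once."""
    rng = random.Random(seed)
    return rng.choices(units, k=n)

population = list(range(1, 101))  # hypothetical frame: unit labels 1..100
sample_wor = srs_without_replacement(population, 10, seed=1)
sample_wr = srs_with_replacement(population, 10, seed=1)
print(sample_wor)
print(sample_wr)
```

The randomness lies in the process: any particular outcome, even one that looks unbalanced, is a valid simple random sample.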

Stratified Random Sampling (STRS)

Suitable for heterogeneous populations
Population is divided into homogeneous groups called strata and samples are selected from each stratum
Sampling Approaches
- Sample the larger strata more heavily (suitable when all the strata are equally variable)
- Sample the more variable strata more heavily
Advantages of STRS
- leads to efficient estimation; that is, the variance (of an estimate) is usually less than that of SRS
- sample is spread throughout population
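The two sampling approaches above correspond to what are commonly called proportional allocation (sample size proportional to stratum size) and Neyman allocation (proportional to stratum size times stratum standard deviation). Those names are not used in these notes. The stratum sizes, the standard deviations and the function names below are made up for illustration.

```python
import random

def proportional_allocation(sizes, n):
    """Allocate n across strata in proportion to stratum size."""
    total = sum(sizes.values())
    # Rounding may make the allocations sum to slightly more or less than n.
    return {h: round(n * sizes[h] / total) for h in sizes}

def neyman_allocation(sizes, sds, n):
    """Allocate n in proportion to (stratum size) x (stratum standard deviation)."""
    weights = {h: sizes[h] * sds[h] for h in sizes}
    total = sum(weights.values())
    return {h: round(n * weights[h] / total) for h in sizes}

def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample (without replacement) within each stratum."""
    rng = random.Random(seed)
    return {h: rng.sample(units, allocation[h]) for h, units in strata.items()}

# Hypothetical strata: sizes and within-stratum standard deviations.
sizes = {"small": 200, "medium": 300, "large": 500}
sds = {"small": 40.0, "medium": 20.0, "large": 10.0}

prop = proportional_allocation(sizes, 100)
neyman = neyman_allocation(sizes, sds, 100)

strata = {h: [f"{h}-{i}" for i in range(size)] for h, size in sizes.items()}
sample = stratified_sample(strata, prop, seed=7)
print(prop, neyman)
```

With these numbers the small but highly variable stratum receives 20 units under proportional allocation and 42 under the variability-weighted allocation.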

Cluster sampling

A convenient method of sampling
population is composed of clusters (groups)
Select certain clusters (randomly) and collect measurements from a random selection of the elements within the chosen clusters
Usually a larger variance than SRS for the same sample size!
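A minimal sketch of the two steps described above: choose clusters at random, then choose elements at random within each chosen cluster. The cluster names, their contents and the function name are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Select clusters at random, then elements at random within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }

# Hypothetical population: 5 villages (clusters) of 8 households each.
clusters = {
    f"village_{v}": [f"v{v}_house_{h}" for h in range(8)]
    for v in range(5)
}
sample = two_stage_cluster_sample(clusters, n_clusters=2, n_per_cluster=3, seed=3)
print(sample)
```

Only the chosen clusters need to be visited, which is where the convenience comes from. Elements within a cluster tend to resemble each other, which is one reason the variance is usually larger.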

Systematic Random sampling (SyRS)

Select every \(k^{th}\) element!
Random start within the first block of elements.
- Convenient; the sample will usually be representative of the population
- Variance of estimates - generally greater than those of SRS
- Inefficient or inappropriate if a cycle or trend is present in the ordering of the elements
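The selection rule, and the problem with cycles, can be sketched as follows. The function name, the list of 100 units and the 7-period "weekly" series are illustrative assumptions.

```python
import random

def systematic_sample(units, k, seed=None):
    """Take every k-th unit after a random start within the first k units."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return units[start::k]

# Hypothetical ordered frame of 100 units, sampling interval k = 10.
frame = list(range(100))
sample = systematic_sample(frame, k=10, seed=5)

# A cyclic series with period 7 (e.g. a weekly pattern), sampled with k = 7:
# every selected value falls at the same point of the cycle.
weekly = [day % 7 for day in range(70)]
cyclic_sample = systematic_sample(weekly, k=7, seed=5)
print(sample, cyclic_sample)
```

When the sampling interval matches the period of the cycle, every sampled value is identical, so the sample says nothing about the variation within a cycle.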

PPS (probability proportional to size) sampling

each element has its own probability of selection
requires a known auxiliary variable (e.g. from a previous census) that gives each element in the population a value (size)
e.g. “Dollar-Unit Sampling”
- audit sampling in which selection depends on the size of each account.
- That is, the chance of selecting an account is proportional to its value.
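A sketch of selection with probability proportional to size, using `random.choices` with weights. This draws with replacement, which is one simple way to implement PPS and not necessarily the scheme an auditor would use. The account names and values are made up.

```python
import random
from collections import Counter

def selection_probabilities(sizes):
    """Probability of selecting each element on a single draw."""
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}

def pps_sample(sizes, n, seed=None):
    """Draw n elements with replacement, with chance proportional to size."""
    rng = random.Random(seed)
    names = list(sizes)
    return rng.choices(names, weights=[sizes[name] for name in names], k=n)

# Hypothetical account balances in dollars.
accounts = {"A": 100, "B": 900, "C": 0}
probs = selection_probabilities(accounts)
draws = Counter(pps_sample(accounts, 1000, seed=11))
print(probs, draws)
```

The account worth 900 dollars is nine times as likely to be drawn as the one worth 100 dollars, and an account with zero value is never selected.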

Other Sampling methods

Multistage
- e.g. 1st stage - cluster; 2nd stage - SRS
Volunteers
- e.g. Blood donors…
- randomisation?
Snowball / opportunity / Purposive
- e.g. Study of HIV/AIDS patients…
- randomisation?
Capture-Recapture Methods
- e.g. estimating wild life populations
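For capture-recapture, one widely used estimator is the Lincoln-Petersen estimator, which these notes do not name. It is shown here only as an assumed example. If \(M\) animals are marked on the first visit, \(C\) are caught on the second visit and \(R\) of those carry a mark, the population size is estimated by \(MC/R\). The counts below are made up.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Estimate population size as M * C / R."""
    if recaptured == 0:
        raise ValueError("no marked animals recaptured; estimate is undefined")
    return marked_first * caught_second / recaptured

# Hypothetical survey: 50 animals marked, 40 caught later, 10 of them marked.
estimate = lincoln_petersen(marked_first=50, caught_second=40, recaptured=10)
print(estimate)
```

The reasoning is that the marked proportion in the second catch (10 out of 40) should roughly match the marked proportion in the whole population (50 out of the unknown total).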

Example

Effective sample size (rule of thumb)

Sample Design	Design Effect (\(d\))	Effective Sample Size (\(\frac{n}{d}\))
SRS	1.00	\(n\)
STRS	0.80 to 0.90	\(\frac{n}{0.9}\) to \(\frac{n}{0.8}\)
Cluster	1.02 to 1.26	\(\frac{n}{1.26}\) to \(\frac{n}{1.02}\)
SyRS	1.05	\(\frac{n}{1.05}\)
Quota	2	\(\frac{n}{2}\)
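The table can be applied to a concrete sample size as follows. The design effects are copied from the table. The sample size of 1000 and the function names are illustrative assumptions.

```python
# Design effects (low, high) taken from the rule-of-thumb table above.
DESIGN_EFFECTS = {
    "SRS": (1.00, 1.00),
    "STRS": (0.80, 0.90),
    "Cluster": (1.02, 1.26),
    "SyRS": (1.05, 1.05),
    "Quota": (2.00, 2.00),
}

def effective_sample_size(n, d):
    """Effective sample size n / d for a design effect d."""
    return n / d

def effective_range(n, design):
    """Smallest and largest effective sample size for a given design."""
    low_d, high_d = DESIGN_EFFECTS[design]
    return effective_sample_size(n, high_d), effective_sample_size(n, low_d)

n = 1000  # hypothetical actual sample size
for design in DESIGN_EFFECTS:
    low, high = effective_range(n, design)
    print(f"{design}: {low:.0f} to {high:.0f}")
```

A design effect below 1 (STRS) makes the sample worth more than an SRS of the same size. A design effect of 2 (quota) makes 1000 responses worth only about 500.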

Summary

Issues to address
- WHAT data are collected?
- WHO does the data collection?
- HOW are the data collected?
Bias occurs due to
- SELECTION
- COLLECTION
- NON-RESPONSE (the single largest cause of bias!)
A sample may have the same biases as a census, along with sampling error
