PG Curriculum

Sampling in Research Methodology — Teaching Framework

Core Concepts in Sampling

Before teaching any sampling method, students must master the foundational vocabulary. These are the building blocks upon which all sampling theory rests.

💡

The Central Problem of Sampling

We can almost never study an entire population. Sampling is the science of selecting a subset (sample) from the population in a way that allows us to make valid inferences about the whole. The quality of your research is only as good as the quality of your sample.

🔑 Master Mnemonic — Elements of Sampling

Pop → Study → Sampling → Sample → Statistic

Population (Target) → Study Population (Accessible) → Sampling Frame (List) → Sample (Selected units) → Statistic (Measurement on sample, estimates Parameter of population)

Target Population

The entire group about which you want to draw conclusions.

Example: All adults with hypertension in India

Often too large to study directly → need a sample

Study Population

The accessible subset of the target population from which you will actually sample.

Example: Adults with hypertension in Cuttack district

Must be clearly defined with inclusion/exclusion criteria

Sampling Frame

A complete list of all sampling units in the study population. The foundation of probability sampling.

Example: Voter list, hospital register, school roll

If the frame is flawed, the sample is flawed

Sampling Unit

The element selected at each stage of sampling. May be individual, household, village, clinic, etc.

Example: Individual (simple), Household (cluster), School (two-stage)

Must match your research question

Parameter vs Statistic

Parameter: True value in the population (unknown; what we want)
Statistic: Value calculated from the sample (known; our estimate)

Example: True prevalence of TB (parameter) vs. prevalence in our sample (statistic)

Sampling Error vs Bias

Sampling Error: Random variation; unavoidable; decreases with larger sample size

Sampling Bias: Systematic error; does NOT decrease with larger sample; caused by flawed method

A large biased sample is WORSE than a small unbiased one
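A small simulation can make this contrast concrete. The sketch below assumes a hypothetical population of 100,000 adults with a true prevalence of 20%, and a flawed recruitment method under which cases are three times as likely to be picked as non-cases. Both figures are invented for illustration.

```python
import random

random.seed(42)  # fixed seed so a classroom run is repeatable

N = 100_000
N_CASES = 20_000  # assumed true prevalence of 20% (illustrative only)
population = [1] * N_CASES + [0] * (N - N_CASES)

def srs_estimate(n):
    """Prevalence estimated from a simple random sample of size n."""
    return sum(random.sample(population, n)) / n

def biased_estimate(n):
    """Prevalence from a flawed method: cases are 3x as likely to be picked."""
    weights = [3 if person == 1 else 1 for person in population]
    return sum(random.choices(population, weights=weights, k=n)) / n

# Columns: sample size, SRS estimate, biased estimate
for n in (100, 1_000, 10_000):
    print(n, round(srs_estimate(n), 3), round(biased_estimate(n), 3))
```

As n grows, the SRS estimate tightens around 0.20. The biased estimate tightens too, but around 3×0.2 / (3×0.2 + 0.8) ≈ 0.43, so a larger n only makes the wrong answer look more certain.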

Key Terminology Reference

Term	Definition	Symbol	Example in Health Research	Common Confusion
Population (N)	All individuals about whom inference is to be made	N	All TB patients in India	Confused with study population (accessible group)
Sample (n)	Subset of population actually studied	n	200 TB patients in Odisha	Students often confuse sample size with sample method
Sampling Fraction	n/N — proportion of population included	f = n/N	200/20,000 = 1% of TB patients studied	Larger fraction ≠ better sample if method is biased
Representativeness	Degree to which sample reflects population characteristics	—	Sample has same age/sex distribution as source	Large sample ≠ representative sample
Precision	Narrowness of confidence interval; repeatability	1/SE	Prevalence: 12% ± 2% vs 12% ± 8%	Precision ≠ accuracy (can be precise but biased)
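Confidence Interval	Range computed from the sample by a procedure that captures the true parameter in a stated proportion (e.g., 95%) of repeated samples	CI	95% CI: 10%–14% for TB prevalence	NOT "95% chance the true value is in this range"; the 95% describes the method, not one computed interval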
Design Effect (DEFF)	Ratio of variance with complex design vs SRS	DEFF	DEFF=2 means cluster sampling needs 2× sample size	Students forget to account for DEFF in cluster studies
Intraclass Correlation	Similarity of units within a cluster	ICC/ρ	Children in same school more similar in vaccination status	Higher ICC → bigger DEFF → need more clusters
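The precision example in the table (12% ± 2% vs 12% ± 8%) can be reproduced with the simple Wald approximation for a proportion. This is a minimal sketch; the sample sizes 1014 and 64 are hypothetical values chosen to give those two margins, and other interval methods (such as Wilson) would give slightly different limits.

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """Approximate (Wald) confidence interval for a sample proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return p_hat - z * se, p_hat + z * se

for n in (1014, 64):  # hypothetical sample sizes
    low, high = wald_ci(0.12, n)
    print(f"n={n}: {low:.3f} to {high:.3f}")
```

The same point estimate of 12% is far more precise with the larger sample, which is what the 1/SE symbol in the table expresses.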

Sampling Taxonomy — The Complete Tree

The entire universe of sampling methods, organised hierarchically.

🗂️ Sampling Methods in Research

⬇

🎲 Probability Sampling
Every unit has a known, non-zero chance of selection

Simple Random Sampling (SRS)

Equal probability for each unit; lottery or random number table

Systematic Random Sampling

Every kth unit from a list; k = N/n (sampling interval)

Stratified Random Sampling

Divide into strata, then SRS within each stratum

Cluster Sampling

Randomly select groups (clusters); study all units in selected clusters

Multi-stage Sampling

Two or more levels of random selection; used in national surveys

PPS Sampling

Probability Proportional to Size; clusters selected by size weight

🔍 Non-Probability Sampling
Selection based on judgement, availability, or convenience

Convenience Sampling

Whoever is available and willing; quickest but most biased

Purposive (Judgement) Sampling

Deliberate selection based on researcher's judgement of typicality

Quota Sampling

Pre-set quotas for subgroups; no random selection within quotas

Snowball Sampling

Existing participants recruit future participants; hidden populations

Volunteer / Self-selection

Participants come forward on their own; extreme volunteer bias

Consecutive Sampling

All eligible patients in sequence over time; common in clinical studies

🔀 Mixed / Special Methods
Specialised designs for specific research contexts

Lot Quality Assurance Sampling (LQAS)

WHO method for monitoring health program coverage; accept/reject lots

Respondent-Driven Sampling (RDS)

Advanced snowball with mathematical weights; drug users, sex workers

Adaptive Cluster Sampling

Add neighbours when rare event found; wildlife, rare disease research

Theoretical Sampling

Grounded theory; sample until theoretical saturation is reached

EPI 30×7 Cluster Sampling

WHO's immunisation coverage method; 30 clusters × 7 children

🎓 Teaching Tip: After showing this tree, ask students: "Which method would you use to study TB prevalence in a district of 500 villages?" — Let them debate before teaching. The correct answer involves multi-stage cluster sampling. The debate reveals their conceptual gaps.

Probability Sampling — Methods in Depth

Every unit has a known, non-zero probability of being selected. This is the gold standard for quantitative health research as it allows generalisability.

✅

Why Probability Sampling is Preferred

Allows statistical inference — you can calculate confidence intervals, test hypotheses, and generalise to the population. The mathematical theory of sampling is built on probability. Without probability sampling, you cannot legitimately extrapolate beyond your sample.

Method	Mechanism	Sampling Frame Needed?	Best Used When	Advantages	Disadvantages	Real-World Example
Simple Random SRS	Lottery / Random number table / Computer random; each unit has equal probability = n/N	Yes (complete)	Small, homogeneous, well-listed populations	Unbiased; easy to understand; forms basis of statistical theory	Requires complete sampling frame; not practical for large dispersed populations; may miss minorities	Selecting 200 patients from hospital register of 2000 using random numbers
Systematic Every kth unit	k = N/n; random start between 1 and k; select every kth unit thereafter	Yes (ordered list)	Large populations with sequential lists (registers, records)	Easy to execute; spread across list; no need to number all units if list exists	Periodicity bias if list has periodic pattern (e.g., every 7th patient is a Monday case)	Antenatal clinic: k=10; randomly start at 4, then select 4, 14, 24, 34...
Stratified SRS within strata	Divide population into homogeneous subgroups (strata); SRS within each stratum; proportional or disproportionate allocation	Yes (per stratum)	Heterogeneous populations; need subgroup estimates; want to ensure minority representation	Ensures representation; reduces variance vs SRS; permits subgroup analysis	Complex; must know stratum sizes; disproportionate allocation requires weighting	NFHS surveys: stratify by urban/rural, then state, then household type
Cluster Whole clusters selected	Divide population into clusters; randomly select clusters; study ALL units in selected clusters	No (only cluster list)	Geographically dispersed; no individual sampling frame; field surveys	Practical; cost-efficient; no complete frame needed; feasible in field	Less precise than SRS (clustering effect); DEFF >1; need more subjects	District nutritional survey: randomly select 30 villages; study all children in selected villages
Multi-stage Nested random	Two or more stages of random selection; each stage uses a sampling frame for that level	Yes (at each stage)	Large national studies; hierarchical populations (districts→blocks→villages→households)	Practical for national surveys; economical; flexible design	Complex; errors compound across stages; needs lists at each level	NFHS: State → District → PSU (village/ward) → Household → Individual
PPS Sampling Size-weighted	Probability of selecting a cluster is proportional to its size; when a fixed number of units is then sampled in each selected cluster, every individual has approximately the same overall probability of selection	Yes (with size data)	Clusters of unequal size; want equal individual probability	Approximately equal probability for all individuals; design is close to self-weighting, so analysis weights are often unnecessary	Need size information for all clusters; complex to implement	EPI cluster sampling: villages selected proportional to their population size
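The first two rows of the table can be sketched in a few lines. The register below is a hypothetical list of serial numbers 1 to 2000, and the function names are illustrative rather than taken from any survey package.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS without replacement: every unit has selection probability n/N."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng=None, start=None):
    """Random start between 1 and k, then every kth unit (positions are 1-based)."""
    k = len(frame) // n                 # sampling interval k = N / n
    if start is None:
        start = rng.randint(1, k)       # random start in 1..k
    return [frame[i - 1] for i in range(start, start + k * n, k)]

rng = random.Random(2024)
register = list(range(1, 2001))         # hypothetical hospital register

srs = simple_random_sample(register, 200, rng)
systematic = systematic_sample(register, 200, rng)

# The antenatal clinic example: k = 10 and a start of 4
print(systematic_sample(list(range(1, 101)), 10, start=4)[:4])  # [4, 14, 24, 34]
```

Only the starting point is random in the systematic version; every later selection is fixed by the interval, which is why the list order matters.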

⚙️ Key Formulas — Probability Sampling

Sampling Interval (k) = N ÷ n

N = Population size · n = Required sample size · k = Sampling interval (Systematic sampling)
Proportional Allocation: nᵢ = n × (Nᵢ / N) where Nᵢ = stratum size
Design Effect (DEFF) = 1 + (m−1) × ICC · m = cluster size, ICC = intraclass correlation
Effective sample size = n ÷ DEFF · Required n (cluster) = SRS sample × DEFF
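The formulas above translate directly into code. In this sketch the stratum sizes, cluster size, and ICC are made-up inputs for illustration; the function names are likewise hypothetical.

```python
def proportional_allocation(stratum_sizes, n):
    """n_i = n * (N_i / N) for each stratum, rounded to whole units."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

def design_effect(m, icc):
    """DEFF = 1 + (m - 1) * ICC, where m is the cluster size."""
    return 1 + (m - 1) * icc

def effective_sample_size(n, deff):
    """Number of independent observations the clustered sample is worth."""
    return n / deff

allocation = proportional_allocation({"urban": 6000, "rural": 14000}, n=400)
deff = design_effect(m=20, icc=0.05)
print(allocation)
print(round(deff, 2), round(effective_sample_size(400, deff), 1))
```

With rounding, the allocated stratum sizes can differ from n by a unit or two when the shares are not whole numbers, so the total should be checked after allocation.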

Non-Probability Sampling — Methods in Depth

Selection is not based on random chance. Not all units have a known probability of selection. Used in qualitative research, exploratory studies, and when probability sampling is impossible.

⚠️

When is Non-Probability Sampling Justified?

Non-probability sampling is not inherently wrong — it is appropriate for qualitative research, pilot studies, exploratory work, hidden populations, and situations where probability sampling is logistically impossible. The error is using it when probability sampling was feasible and then claiming generalisability.

Method	Mechanism	Bias Risk	Best Used For	Strengths	Limitations	Health Research Example
Convenience Accidental	Select whoever is readily available; patients in OPD, students in class, mall visitors	HIGH	Pilot studies; feasibility testing; quick surveys	Cheapest; fastest; easy to execute; good for hypothesis generation	Highly biased; not representative; cannot generalise; selection entirely determined by convenience	Exit interviews with OPD patients to pilot a questionnaire on patient satisfaction
Purposive Judgement	Researcher deliberately selects "information-rich" cases based on specific characteristics	MODERATE	Qualitative research; case studies; key informant interviews	Focused on relevant cases; efficient for specific objectives; expert knowledge used	Researcher bias in selection; not generalisable; dependent on researcher's judgement	Selecting ASHA workers with ≥5 years experience for in-depth interviews on community health
Quota Non-random strata	Set fixed quotas for subgroups (age, sex, etc.); fill quotas by convenience within each subgroup	MODERATE	Market research; large surveys where frame unavailable; needs subgroup balance	Ensures proportional representation of subgroups; faster than stratified random; no frame needed	Selection within quota is non-random; interviewer bias; cannot calculate sampling error	Community survey: quota of 50 males and 50 females in each age group, recruited at convenience
Snowball Chain referral	Initial seeds recruited; each participant refers others; chain grows like a snowball	MODERATE	Hidden or hard-to-reach populations; stigmatised groups	Only practical method for some populations; builds trust networks; reaches hidden groups	Selection bias towards well-connected individuals; network clustering; non-representative	Studying risk behaviour in IV drug users; FSW health surveys; undocumented migrants
Volunteer Self-selection	Individuals volunteer in response to advertisement or invitation	VERY HIGH	Clinical trials (with random allocation after recruitment); experimental studies	Motivated participants; good compliance; ethical (consent built in)	Healthy volunteer effect; volunteers are atypical; extreme self-selection bias	Vaccine trial: volunteers respond to ad; then randomised to vaccine vs placebo
Consecutive Sequential	Every eligible patient presenting over a defined time period is recruited; no random selection	LOW–MOD	Hospital-based clinical studies; OPD-based research	Simple; minimises selection bias within available patients; complete capture of eligible cases	Limited to patients attending that facility; selection bias due to healthcare-seeking behaviour	All newly diagnosed diabetic patients in medicine OPD over 6 months included in study

Quota vs Stratified — The Classic Confusion

Quota: Same subgroups; NO random selection within groups; interviewer picks convenient subjects to fill quota; cannot calculate sampling error

Stratified: Same subgroups; YES random selection within strata; unbiased within stratum; can calculate sampling error

They look similar in design but differ fundamentally in how units within groups are selected.
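The difference is easiest to see when both procedures are run on the same list. The sketch below uses a hypothetical list of 1,000 people ordered by how close they are to the interviewer; the names and data are illustrative only.

```python
import random

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Random selection within each stratum (probability method)."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {s: rng.sample(units, n_per_stratum[s]) for s, units in strata.items()}

def quota_sample(arrivals, stratum_of, quota):
    """Fill each quota with whoever turns up first (non-probability method)."""
    filled = {s: [] for s in quota}
    for unit in arrivals:
        s = stratum_of(unit)
        if len(filled[s]) < quota[s]:
            filled[s].append(unit)
    return filled

rng = random.Random(7)
people = [(i, "F" if i % 2 == 0 else "M") for i in range(1, 1001)]  # (id, sex)
sex = lambda person: person[1]
targets = {"M": 50, "F": 50}

stratified = stratified_sample(people, sex, targets, rng)
quota = quota_sample(people, sex, targets)

# Highest id reached by each procedure among the males selected
print(max(p[0] for p in quota["M"]), max(p[0] for p in stratified["M"]))
```

Both results contain 50 males and 50 females, so they look identical in a summary table. The quota sample, however, never reaches beyond the first 100 people on the list, while the stratified sample draws from all 1,000.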

Snowball vs RDS — Important Distinction

Snowball: Simple chain referral; no mathematical weights; biased towards well-connected; qualitative research

RDS: Advanced snowball with mathematical corrections for network effects; gives population estimates; used in HIV research with FSW, MSM

RDS can produce valid prevalence estimates where Snowball cannot.

Comparison — Side by Side

The master comparison table — use this for teaching contrasts, exam preparation, and decision-making in research design.

Criterion	SRS	Systematic	Stratified	Cluster	Multi-stage	Convenience	Purposive	Snowball
Type	Probability	Probability	Probability	Probability	Probability	Non-Prob	Non-Prob	Non-Prob
Sampling Frame Required	Yes (complete)	Yes (ordered)	Yes (per strata)	No (cluster list only)	Partial (at each level)	No	No	No
Representativeness	High	High (if no periodicity)	Very High	Moderate	Moderate–High	Low	Low	Low
Statistical Inference	Yes	Yes	Yes	Yes (with DEFF)	Yes (with weights)	No	No	No (RDS: limited)
Cost / Complexity	Low–Moderate	Low	Moderate	Low–Moderate	High	Very Low	Low	Low
Variance / Precision	Benchmark	Equal or better than SRS	Better than SRS	Worse than SRS (DEFF >1)	Variable	Cannot estimate	Cannot estimate	Cannot estimate
Bias Risk	Very Low	Low (periodicity risk)	Very Low	Moderate (homogeneity)	Low–Moderate	High	Moderate	Moderate–High
Best Research Type	Prevalence studies; RCTs	Hospital-based; sequential lists	Comparative studies; surveys	Field surveys; national studies	NFHS; DHS; large national surveys	Pilot; qualitative	Qualitative; key informants	Hidden populations
Indian Health Example	PHC patient study using OPD register	Every 5th antenatal visit to clinic	Urban/rural stratified TB survey	Village-based NCD survey	NFHS-5; DLHS; AHS	Questionnaire pilot in OPD	ASHA worker interviews	IDU risk behaviour survey

Interactive Decision Guide — Which Sampling Method?

🧭 Choose Your Sampling Method — Answer These Questions

Q1. Do you need to generalise findings to a larger population? (Quantitative study?)

Yes → Need probability sampling →

No → Qualitative/exploratory → Non-probability is acceptable →

→ Use Probability Sampling

Now consider: Do you have a complete sampling frame (list of all units)?
Yes + Small population: Use SRS or Systematic Random
Yes + Heterogeneous population: Use Stratified Random
No complete frame + Field survey: Use Cluster or Multi-stage
National scale survey: Multi-stage with PPS (like NFHS)

→ Non-Probability Sampling — Choose Type

Pilot/quick survey: Convenience sampling
In-depth qualitative: Purposive sampling
Need subgroup balance (no frame): Quota sampling
Hidden population (IDU, FSW): Snowball or RDS
Hospital-based clinical study: Consecutive sampling

Q2. Is your population homogeneous or heterogeneous in the outcome of interest?

Homogeneous → SRS or Systematic gives good results →

Heterogeneous → Stratified sampling reduces variance significantly →

→ SRS or Systematic Sampling

When population is homogeneous, SRS gives efficient, unbiased results. Systematic is simpler operationally if a sequential list exists. Watch for periodicity in systematic sampling.

→ Stratified Random Sampling

Divide into strata that are internally homogeneous but differ from each other. Use proportional allocation for overall estimates. Use disproportionate allocation if you need equal precision for each stratum (e.g., rare ethnic minority).

Q3. Is your target population geographically dispersed and costly to reach individually?

Yes → Cluster or Multi-stage is the practical choice →

No → Concentrated, accessible → Use SRS or Systematic →

→ Cluster or Multi-stage Sampling

Cluster: Randomly select groups (villages/schools/wards), study all units within. Improves cost and logistical efficiency but reduces statistical precision (DEFF > 1). Multi-stage: Add another layer of random selection at each level; best for national surveys like NFHS.

→ SRS or Systematic Sampling

When population is accessible and a sampling frame exists, simple approaches work best. Systematic is easiest operationally. SRS gives the theoretical gold standard.
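For students who think in flowcharts, the Q1 branch of the guide can be written as a small function. This is only a sketch of the rules listed above; the function name, argument names, and purpose labels are hypothetical, and real design choices involve more judgement than a lookup.

```python
def recommend_sampling(generalise, *, national=False, complete_frame=False,
                       heterogeneous=False, purpose=None):
    """Return the method family suggested by the decision guide."""
    if generalise:                        # Q1 = yes: probability sampling
        if national:
            return "Multi-stage with PPS"
        if not complete_frame:
            return "Cluster or Multi-stage"
        if heterogeneous:
            return "Stratified Random"
        return "SRS or Systematic Random"
    non_probability = {                   # Q1 = no: choose by study purpose
        "pilot": "Convenience",
        "qualitative": "Purposive",
        "subgroup_balance": "Quota",
        "hidden_population": "Snowball or RDS",
        "clinical": "Consecutive",
    }
    if purpose not in non_probability:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return non_probability[purpose]

print(recommend_sampling(True, complete_frame=True, heterogeneous=True))
print(recommend_sampling(False, purpose="hidden_population"))
```

Asking students to find a scenario where the function gives a poor recommendation is a useful way to surface the trade-offs it ignores.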

Sample Size — The Science of How Many

One of the most commonly asked questions in research: "How many subjects do I need?" Sample size is determined by statistical requirements, not budget or convenience.

📐

The 4 Determinants of Sample Size

Sample size is determined by: (1) Expected prevalence/effect size: the closer the expected prevalence is to 50%, the more subjects needed; in analytical studies, the smaller the expected effect, the more subjects needed. (2) Precision desired: a narrower CI needs more subjects. (3) Confidence level: a 99% CI needs more subjects than a 95% CI. (4) Power: analytical studies need adequate power to detect differences.

📐 Core Sample Size Formulas

Cross-sectional (Prevalence): n = Z² × p × q / d²

Z = Z-value for confidence level (1.96 for 95% CI; 2.576 for 99%) · p = expected prevalence · q = 1−p · d = allowable error (absolute precision)

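Analytical (Comparison): n = (Zα/2 + Zβ)² × 2pq / d² (each group), where p = average of the two expected proportions, q = 1−p, d = difference to be detected, and Zβ = 0.84 for 80% power; or use Kelsey formula for OR/RR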

Cluster adjustment: n_cluster = n_SRS × DEFF · DEFF = 1 + (m−1) × ICC

Finite population correction: n_final = n / [1 + (n−1)/N] (when sampling fraction >5%)
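The prevalence formula and the finite population correction can be checked with a short calculation. The inputs below (p = 0.20, d = 0.05, and a population of 2,000) are illustrative; results are rounded up because a fraction of a participant cannot be recruited.

```python
import math

def n_prevalence(p, d, z=1.96):
    """n = Z^2 * p * q / d^2, rounded up to a whole participant."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

def finite_population_correction(n, N):
    """n_final = n / (1 + (n - 1) / N), rounded up."""
    return math.ceil(n / (1 + (n - 1) / N))

n = n_prevalence(p=0.20, d=0.05)
print(n)                                      # 246
print(finite_population_correction(n, 2000))  # 220
```

Here the sampling fraction would be 246/2000, about 12%, which is above the 5% threshold, so the correction makes a visible difference.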

Study Type	Formula	Key Inputs	Example Calculation	Software Tool
Cross-sectional Prevalence estimation	n = Z²pq/d²	p = expected prevalence; d = allowable error; confidence level	p=0.20, d=0.05, 95% CI: n = (1.96)²×0.20×0.80/(0.05)² = 246	OpenEpi; EpiInfo; G*Power
Case-Control Odds Ratio	Kelsey / Schlesselman formula; based on OR and p₁	Expected OR; exposure prevalence in controls; α; power (1−β)	OR=2.0, p₂=0.30 (so p₁≈0.46), α=0.05, power=80%: n≈140 per group	OpenEpi; EpiInfo; PASS
Cohort / RCT Risk difference or RR	n = (Zα/2 + Zβ)²(p₁q₁+p₂q₂)/(p₁−p₂)²	Incidence in exposed/unexposed; α; power; dropout rate	p₁=0.15, p₂=0.30, α=0.05, power=80%: n≈118/group by this formula, about 130 with a continuity correction (add 10–20% attrition)	OpenEpi; G*Power; Stata
Cluster Sampling Design effect adjustment	n_cluster = n_SRS × DEFF	n_SRS; DEFF (assume 1.5–2.0 if ICC unknown); cluster size m	n_SRS=246, DEFF=1.5: n_cluster = 246×1.5 = 369; if 30 clusters: 369/30 = 13/cluster	EPI cluster; WHO LQAS tables
Qualitative Research Saturation-based	No formula; theoretical saturation	Research question complexity; homogeneity of group; data richness	Typical: 15–30 in-depth interviews; 3–5 focus group discussions of 6–10 participants	Not applicable; literature guidance
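Two rows of the table lend themselves to a quick check. The sketch below uses a commonly taught two-proportion formula that includes the power term, (Zα/2 + Zβ)², without a continuity correction, so dedicated software may report somewhat larger numbers. Function names are illustrative.

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n = (Z_alpha/2 + Z_beta)^2 * (p1*q1 + p2*q2) / (p1 - p2)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

def cluster_adjusted_n(n_srs, deff):
    """n_cluster = n_SRS * DEFF, rounded up."""
    return math.ceil(n_srs * deff)

print(n_two_proportions(0.15, 0.30))       # 118 per group, before attrition
total = cluster_adjusted_n(246, 1.5)
print(total, math.ceil(total / 30))        # 369 in total, 13 per cluster
```

Raising the power to 90% or shrinking the difference between p₁ and p₂ increases the result quickly, which is worth letting students try for themselves.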

Factors That Increase Sample Size — Teaching Matrix

Factor	Change	Effect on Sample Size	Reason	Teaching Analogy
Prevalence (p)	p → 50%	↑ Increases	Maximum variance at p=0.5 (pq is maximised)	If you don't know heads vs tails, you need more tosses
Confidence Level	95% → 99%	↑ Increases (Z: 1.96 → 2.576)	More certainty requires wider margin coverage	More certain = more evidence needed
Allowable Error (d)	5% → 2%	↑ Greatly Increases (×6.25)	d is squared in denominator; halving d quadruples n	Smaller target needs more shots to hit it reliably
Desired Power	80% → 90%	↑ Increases	Higher power reduces Type II error; needs more data	Better detection = larger radar screen
Dropout/Non-response	Add 10–20%	↑ Adds buffer	Some subjects will drop out; need reserves	Order extra food in case guests bring friends
Cluster Effect (DEFF)	DEFF = 2	↑ Doubles n	Clustering reduces effective information per subject	Asking 30 people in one village is not the same as asking 30 people from 30 villages
Population Size (N)	N ↑ greatly	↔ Little effect above N≈10,000	Large populations: FPC correction becomes negligible	A teaspoon from the ocean gives same info as from a pool
Effect Size	Small effect	↑ Greatly Increases	Harder to detect smaller differences	Need more tests to find a faint signal in noise
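Several rows of this matrix can be verified numerically with the prevalence formula. The sketch below compares each change against a baseline of p = 0.20, d = 0.05, and 95% confidence; these baseline values are chosen only for illustration.

```python
def n_raw(p, d, z=1.96):
    """Unrounded n = Z^2 * p * (1 - p) / d^2."""
    return z**2 * p * (1 - p) / d**2

def fpc(n, N):
    """Unrounded finite population correction."""
    return n / (1 + (n - 1) / N)

baseline = n_raw(p=0.20, d=0.05)
print("d 5% -> 2%     : x", round(n_raw(0.20, 0.02) / baseline, 2))
print("CI 95% -> 99%  : x", round(n_raw(0.20, 0.05, z=2.576) / baseline, 2))
print("p 20% -> 50%   : x", round(n_raw(0.50, 0.05) / baseline, 2))

# Population size: required n for n0 = 385 at different N
for N in (1_000, 10_000, 1_000_000):
    print(N, round(fpc(385, N), 1))
```

The allowable error dominates: tightening d from 5% to 2% multiplies n by 6.25, while moving from a 95% to a 99% CI multiplies it by about 1.73. Beyond N of roughly 10,000 the corrected n barely moves.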

⚠️ Common Exam Trap: Increasing sample size does NOT reduce sampling bias — it only reduces sampling error. A biased large sample is NOT better than a small unbiased sample. Students frequently confuse precision (sample size dependent) with accuracy (method dependent).

Faculty Teaching Guide

Proven strategies to make sampling genuinely understood — not just memorised. Based on active learning and conceptual contrast teaching.

🎓

The Core Teaching Challenge

Students memorise sampling method names and definitions but cannot choose the right method when faced with a real research scenario. The solution: teach decision-making, not definitions. Every sampling method should be taught by asking "WHEN would you use this — and when would it fail?"

6-Step Teaching Sequence for Sampling

Anchor to the Problem

Start with: "We want to know the prevalence of hypertension in Odisha. How would you study 4.5 crore people?" Let students struggle — their guesses reveal gaps you need to fill.

Teach the Framework First

Introduce the Population → Frame → Sample → Statistic pathway before any method. Students who skip this confuse "who I want to study" with "who I actually studied."

Contrast Probability vs Non-Probability

Use one vivid example each: NFHS methodology (probability) vs OPD convenience sample (non-probability). Ask: "Which one can you use to claim Odisha's hypertension prevalence? Why?"

Teach Each Method with a FAIL case

For every method, show when it breaks. Systematic: the ANC register with every 7th patient being a high-risk referral. Cluster: studying one ward and claiming it represents Delhi.

Apply the Decision Matrix

Give 3–4 research scenarios; students work in pairs to choose and defend their method. No single right answer for some — the debate IS the learning.

Exam-Proof with Common Traps

End each class by showing the 3 most common exam mistakes: Quota≠Stratified; Large n ≠ no bias; Cluster needs DEFF correction. Repetition of these traps is essential.

Exam-Focused: High-Yield Contrast Pairs

Pair	Similarity (Why Students Confuse Them)	Key Difference	Exam Tip
Stratified vs Quota	Both divide population into subgroups before sampling	Stratified: RANDOM selection within strata → probability method. Quota: CONVENIENCE selection within quotas → non-probability	If random selection within groups → Stratified. If researcher fills quotas by convenience → Quota
Cluster vs Stratified	Both use groups/subpopulations as part of design	Cluster: Randomly SELECT clusters, study ALL units within. Stratified: Create strata, randomly select INDIVIDUALS within each	Cluster = select whole groups then study everything inside. Stratified = study a random sample from each group
Sampling Error vs Sampling Bias	Both affect accuracy of estimates from samples	Error: Random; decreases with n; unavoidable. Bias: Systematic; does NOT decrease with n; caused by poor method	A biased large sample is worse than a small unbiased one. Bias can only be fixed by changing the method, not by adding subjects
Multi-stage vs Cluster	Both involve selecting groups (clusters) at some point	Cluster: One-stage — select clusters, study all. Multi-stage: Two or more stages of selection (e.g., districts → villages → households → individuals)	NFHS uses multi-stage (PSUs → households → individuals). A simple village survey using whole villages is cluster
Systematic vs SRS	Both are probability methods; both give unbiased samples from lists	SRS: Every unit is drawn at random. Systematic: Only the start is random and the interval is fixed, so it is vulnerable to periodicity bias if the list has a cyclical pattern	If a register has one entry per day and Mondays carry the most severe cases, systematic sampling at interval=7 from a Monday start will select only Monday cases → biased
Snowball vs Purposive	Both are non-probability; both used in qualitative research	Snowball: Participants recruit others (chain referral); grows from initial seeds. Purposive: Researcher actively selects information-rich cases based on criteria	Snowball = participants drive recruitment. Purposive = researcher drives recruitment based on judgement
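The periodicity trap in the Systematic vs SRS row can be shown without any randomness. The sketch assumes a hypothetical register with exactly one entry per day for 52 weeks, where every Monday entry is a severe case and all others are mild.

```python
# Day 0 is a Monday; every 7th entry after it is also a Monday
register = ["severe" if day % 7 == 0 else "mild" for day in range(364)]

def systematic_share_severe(register, k, start):
    """Share of severe cases when taking every kth entry from a 0-based start."""
    picked = register[start::k]
    return picked.count("severe") / len(picked)

true_share = register.count("severe") / len(register)
print(round(true_share, 3))                                  # 1/7 of all entries
print(systematic_share_severe(register, k=7, start=0))       # only Mondays
print(systematic_share_severe(register, k=7, start=1))       # never a Monday
print(round(systematic_share_severe(register, k=5, start=0), 3))
```

With an interval of 7 the estimate is either 100% or 0% depending on the start, whereas an interval of 5, which does not match the weekly cycle, lands close to the true share.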

Real Indian Health Research Scenarios — Apply Your Method

Research Question	Recommended Method	Justification	Sampling Frame	Practical Challenge
Prevalence of anaemia among adolescent girls in Odisha	Multi-stage + Stratified	Large state; heterogeneous urban/rural; hierarchical population structure	School/PHC registers at each stage; voter rolls for households	Non-school-going girls missed; consent from parents
Immunisation coverage in a district — WHO survey	EPI 30×7 Cluster (PPS)	WHO's validated method; no individual frame available; clusters selected by population size	Village list with population for PPS; no individual-level frame needed	Random walk method for household selection within cluster
Risk behaviour among truck drivers on national highway	Snowball / RDS	Hidden population; no sampling frame exists; trust-based recruitment needed	None available	Chain-referral bias; need multiple seeds at different highway stops
Comparing treatment outcomes: DOTS vs self-administered therapy in TB patients	Systematic from RNTCP register	Sequential patient list available; need unbiased selection of patients from each treatment group (sampling selects patients; it does not allocate treatment)	District RNTCP patient register	Ensure register is complete; check for periodicity in registration patterns
Qualitative study on barriers to institutional delivery among tribal women	Purposive Sampling	Qualitative; need information-rich cases; tribal women with experience of home delivery	No frame; ASHA worker referrals for identification	Language barriers; trust building; purposive selection criteria must be explicit
Monitoring vaccine coverage in 30 PHC areas after campaign	LQAS (Lot Quality Assurance)	Need accept/reject decision for each PHC area; small sample per area; operational monitoring	PHC area population list; community health workers' records	Threshold and sample size based on LQAS tables; decision rule must be pre-specified
Blood pressure survey in a medical college OPD (pilot study)	Consecutive Sampling	Pilot only; all eligible patients in OPD over 2 weeks; simple and complete within available setting	OPD attendance register as guide; not a frame	Healthcare-seeking bias; generalisation limited to OPD-attending population only
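For the immunisation coverage scenario, the PPS step is usually taught with the cumulative population method: list the villages with running population totals, compute a sampling interval, and step through the totals from a random start. The sketch below assumes 500 hypothetical villages with invented populations; it illustrates the idea and is not the official WHO procedure.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(sizes, n_clusters, rng):
    """Pick cluster indices with probability proportional to size."""
    cumulative = list(accumulate(sizes))           # running population totals
    interval = cumulative[-1] / n_clusters         # sampling interval
    start = rng.random() * interval                # random start in the first interval
    chosen = []
    for j in range(n_clusters):
        point = start + j * interval
        index = bisect_left(cumulative, point)     # first village whose running total reaches the point
        chosen.append(min(index, len(sizes) - 1))  # guard against floating-point overshoot
    return chosen

rng = random.Random(30)
village_sizes = [rng.randint(200, 3000) for _ in range(500)]  # hypothetical populations
clusters = pps_systematic(village_sizes, 30, rng)
print(len(clusters), "clusters,", len(clusters) * 7, "children in a 30x7 design")
```

A village larger than the sampling interval could be selected more than once; that cannot happen with these sizes, but real surveys need a rule for it.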

📌 Final Teaching Principle: Every sampling decision is a compromise between scientific rigour, feasibility, cost, and ethics. There is rarely one "correct" answer. Teach students to justify their choice — explaining what they gain, what they lose, and what limitations they must acknowledge in their discussion section. That skill is what separates a good researcher from one who merely follows a textbook formula.

Sampling & Its Types