Bias in Epidemiologic Studies
🔍 What Is Bias?
● Bias = Systematic error in estimating the association between exposure and outcome
● Results in an incorrect risk ratio, odds ratio, or rate ratio
● Introduced at either the design or analysis stage
🔁 Bias vs. Chance
| Feature | Random Error (Chance) | Bias (Systematic Error) |
|---------|-----------------------|-------------------------|
| Cause   | Natural variation     | Design flaw or measurement issue |
| Effect  | Random distortion     | Predictable distortion of the association |
🧠 Types of Systematic Error
1. Selection Bias
2. Information Bias
3. Confounding (covered separately)
📌 1. Selection Bias: Error due to procedures used to select participants or factors that influence participation or follow-up
🔁 Common Sources:
● Loss to follow-up: Differential dropout by exposure or outcome status
● Volunteer/non-response bias: Participants differ from the target population
● Hospital patient bias (Berkson’s bias): Hospital-based controls may have other
conditions related to the exposure
● Healthy Worker Effect: Workers are generally healthier than the general population
🧪 Case-Control Example
● Exposure: Coffee
● Cases: Pancreatic cancer patients
● Controls: Hospital patients with GI issues (who avoid coffee)
● ➤ Problem: Controls have abnormally low exposure → false positive association
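A minimal sketch of this scenario, using made-up counts (not data from any real study), shows how hospital controls who avoid coffee can manufacture an association that disappears with population-based controls:

```python
# Hypothetical 2x2 counts illustrating Berkson-type selection bias.
# All numbers are invented for illustration only.

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio = (a*d) / (b*c) for a standard case-control 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Cases: pancreatic cancer patients, assumed 60% coffee drinkers
cases_exposed, cases_unexposed = 60, 40

# Population-based controls: assumed same coffee use as the general population (60%)
pop_exposed, pop_unexposed = 60, 40

# Hospital controls with GI conditions who avoid coffee: assumed only 40% drinkers
gi_exposed, gi_unexposed = 40, 60

print("OR with population controls:", odds_ratio(cases_exposed, cases_unexposed,
                                                 pop_exposed, pop_unexposed))   # 1.00 (no association)
print("OR with GI hospital controls:", odds_ratio(cases_exposed, cases_unexposed,
                                                  gi_exposed, gi_unexposed))    # 2.25 (false positive)
```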
👥 Cohort Example
● Exposure: Smoking
● Outcome: Dementia
● Problem: Older participants are the survivors, so the bias may suggest smoking is protective because vulnerable smokers died earlier
📝 2. Information Bias: Distortion due to inaccurate measurement of exposure, outcome, or covariates
🧩 Sources of Information Bias:
| Type | Description |
|------|-------------|
| Misclassification | Errors in classifying exposure or disease |
| Interviewer bias | Different probing/measurement between groups |
| Recall/reporting bias | Unequal accuracy of self-reported information |
📊 Misclassification
🔹 Non-differential Misclassification
● Misclassification occurs equally across groups
● Usually biases toward the null
● Example: Overreporting physical activity in both groups
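The pull toward the null can be shown with simple arithmetic. This sketch assumes a true risk ratio of 2.0 and applies the same made-up sensitivity and specificity of exposure measurement to both outcome groups (which is what makes the misclassification non-differential):

```python
# Non-differential exposure misclassification pulling a risk ratio toward the null.
# All counts, risks, sensitivity, and specificity are assumed values for illustration.

true_exposed_n, true_exposed_cases = 1000, 200      # true risk 0.20
true_unexposed_n, true_unexposed_cases = 1000, 100  # true risk 0.10
true_rr = (true_exposed_cases / true_exposed_n) / (true_unexposed_cases / true_unexposed_n)

# Exposure measured with the same error rates in everyone, regardless of outcome
sensitivity, specificity = 0.8, 0.9

obs_exposed_n = sensitivity * true_exposed_n + (1 - specificity) * true_unexposed_n
obs_exposed_cases = sensitivity * true_exposed_cases + (1 - specificity) * true_unexposed_cases
obs_unexposed_n = (1 - sensitivity) * true_exposed_n + specificity * true_unexposed_n
obs_unexposed_cases = (1 - sensitivity) * true_exposed_cases + specificity * true_unexposed_cases

observed_rr = (obs_exposed_cases / obs_exposed_n) / (obs_unexposed_cases / obs_unexposed_n)
print(f"True RR: {true_rr:.2f}")         # 2.00
print(f"Observed RR: {observed_rr:.2f}") # ~1.60, closer to 1 (the null)
```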
🔹 Differential Misclassification
● Misclassification differs between cases/controls or exposed/unexposed
● Can bias toward or away from the null
● Example: Exposed individuals more likely to report symptoms
🎙️ Interviewer Bias
Occurs when:
● Interviewers probe more in one group
● Reviewers interpret data differently by group
✅ Prevention Strategies
1. Blind data collectors to exposure/outcome status
2. Use standardized protocols
3. Train interviewers thoroughly
4. Use multiple data sources when possible
🧠 Recall/Reporting Bias
● Occurs when recall differs between groups
● Cases may over-report exposures due to concern
● Exposed may report symptoms more accurately due to fear
✅ Prevention Strategies
● Use well-designed questionnaires
● Allow anonymous or self-administered responses
● Confirm with records or biomarkers, if possible
✅ Best Practices to Reduce Bias
● Plan to minimize errors at the design stage
● Pilot test instruments
● Use blinding, objective measurements, and clear definitions
● Ask:
○ Could bias have occurred?
○ Would it cause distortion toward or away from the null?
○ Is the bias likely meaningful?
🧾 Summary Table
| Type of Bias | Definition | Example |
|--------------|------------|---------|
| Selection Bias | Error from how participants are selected/followed | Dropout differs by exposure/outcome |
| Information Bias | Error from inaccurate measurement | Misclassification, recall error |
| Interviewer Bias | Different methods for different groups | Probing the exposed group more |
| Recall Bias | Differential recall between groups | Cases over-report past exposures |
Confounding & Random Error
🧠 Confounding
🔑 Definition
A confounder is a third variable that distorts the true association between an exposure and an
outcome.
Confounding is not caused by a flaw in the study design and can occur in any type of epidemiologic study.
⚠️ Confounding Effects
● Can overestimate or underestimate true associations
● Can make a harmful exposure appear protective (and vice versa)
● Example:
○ True RR = 0.5 → Confounded RR = 0.8 (bias toward null)
○ True RR = 2.5 → Confounded RR = 3.1 (bias away from null)
✅ A Variable Is a Confounder If:
1. It is an independent predictor of the outcome
2. It is associated with the exposure
3. It is not on the causal pathway between exposure and disease
🧪 Example: Alcohol & Lung Cancer
● Alcohol initially appears associated with lung cancer
● BUT: Alcohol drinkers are more likely to be smokers
● When separated by smoking status → no alcohol effect
● ➤ Smoking is the confounder
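A numerical sketch of what "separated by smoking status" looks like, using invented counts chosen so the alcohol effect vanishes within strata:

```python
# Illustrative crude vs. smoking-stratified risk ratios for a hypothetical
# alcohol-lung cancer analysis. All counts are invented for illustration.

def risk_ratio(exposed_cases, exposed_n, unexposed_cases, unexposed_n):
    return (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)

# (cases, total) for drinkers and non-drinkers within each smoking stratum
strata = {
    "smokers":     {"drinkers": (160, 800), "non_drinkers": (40, 200)},
    "non_smokers": {"drinkers": (4, 200),   "non_drinkers": (16, 800)},
}

# Crude analysis (smoking ignored): drinkers appear to have ~3x the risk
drink_cases = sum(s["drinkers"][0] for s in strata.values())
drink_n = sum(s["drinkers"][1] for s in strata.values())
nodrink_cases = sum(s["non_drinkers"][0] for s in strata.values())
nodrink_n = sum(s["non_drinkers"][1] for s in strata.values())
print(f"Crude RR: {risk_ratio(drink_cases, drink_n, nodrink_cases, nodrink_n):.2f}")  # ~2.93

# Stratum-specific RRs: no alcohol effect once smoking is held constant
for name, s in strata.items():
    rr = risk_ratio(*s["drinkers"], *s["non_drinkers"])
    print(f"RR among {name}: {rr:.2f}")  # 1.00 in both strata
```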
🔄 Confounder vs. Intermediate
● Confounder: A third variable that distorts the relationship
● Intermediate: Lies on the causal pathway
○ Example: Running → ↑ Lung Capacity → Faster Speed
○ Lung capacity is not a confounder, but an intermediate
📚 How to Identify Potential Confounders
● Know your subject area
● Do a literature review
● Use lists of historical confounders (e.g., age, sex, race/ethnicity)
🛠️ Controlling Confounding
📐 Design Phase
1. Randomization
○ Equal chance of exposure assignment
○ Balances known and unknown confounders
○ Used in randomized controlled trials (RCTs)
2. Restriction
○ Limit study to one level of the confounder (e.g., only men, only smokers)
○ Reduces generalizability
3. Matching
○ Select comparison groups with similar levels of the confounder
○ Used in cohort and case-control studies
🧮 Analysis Phase
1. Standardization: Adjust rates to a standard population
2. Stratified Analysis: Analyze within strata of the confounder
3. Multivariate Analysis: Adjust for multiple confounders simultaneously
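As one concrete illustration of stratified analysis, the sketch below pools the stratum-specific estimates from the hypothetical alcohol/smoking counts above into a Mantel-Haenszel summary risk ratio, one common (though not the only) pooling method:

```python
# Mantel-Haenszel summary risk ratio across strata of a confounder.
# Each stratum is (exposed_cases, exposed_n, unexposed_cases, unexposed_n);
# counts reuse the hypothetical alcohol/smoking example above.

strata = [
    (160, 800, 40, 200),   # smokers
    (4, 200, 16, 800),     # non-smokers
]

num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
print(f"Mantel-Haenszel RR: {num / den:.2f}")  # 1.00: no alcohol effect after adjustment
```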
📊 Assessing Confounding
● Compare the crude and adjusted measures of association:
○ % change = |crude estimate − adjusted estimate| / adjusted estimate × 100
● If the change is <10%, there is likely little confounding
● If the change is >10%, confounding may be present (see the sketch below)
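A short sketch of this change-in-estimate check, using assumed crude and adjusted values:

```python
# 10% change-in-estimate check. The example values are assumed, not from any real study.

def percent_change(crude, adjusted):
    """Percent difference between crude and adjusted measures of association."""
    return abs(crude - adjusted) / adjusted * 100

crude_rr, adjusted_rr = 2.93, 1.00   # e.g., the alcohol example: large change
print(f"{percent_change(crude_rr, adjusted_rr):.0f}% change")  # 193% -> confounding present

crude_rr, adjusted_rr = 1.52, 1.48   # small change
print(f"{percent_change(crude_rr, adjusted_rr):.0f}% change")  # ~3% -> likely little confounding
```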
⚠️ Residual Confounding
● Confounding left over after adjustment
● Possible reasons:
○ Confounder not measured
○ Measurement error
○ Broad categories (e.g., wide age groups)
● ➤ Acknowledge in the discussion section of your paper
🎲 Random Error: Errors due to chance that result in inaccurate estimates of the association
🧬 Sources
● Measurement error
● Sampling variability
● Occurs in all studies
🔄 Effects of Random Error
● May suggest an association that doesn’t exist
● May mask a real association
● Described using probability-based statistics
🧪 Sampling Variability
● We use samples to estimate the truth about a population
● Variability can occur because:
○ Samples differ from the population
○ Small samples = more error
● Example: Anecdotal evidence (e.g., older men gamble more) may be misleading
📉 Evaluating Random Error
🧮 Two Statistical Tools:
1. Confidence Intervals (CIs)
○ Range of values within which the true value likely falls
2. Hypothesis Testing / p-values
○ p < 0.05 often considered “statistically significant”
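A small sketch (hypothetical cohort counts) showing how both tools are computed for a risk ratio, using the usual log-scale standard error and a Wald test:

```python
import math

# 95% confidence interval and two-sided p-value for a risk ratio from a
# cohort-style 2x2 table. Counts are hypothetical.

a, b = 30, 970    # exposed: cases, non-cases
c, d = 15, 985    # unexposed: cases, non-cases

rr = (a / (a + b)) / (c / (c + d))
se_log_rr = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))

lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
upper = math.exp(math.log(rr) + 1.96 * se_log_rr)

# Two-sided p-value for the null hypothesis RR = 1 (Wald test on the log scale)
z = math.log(rr) / se_log_rr
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f}), p = {p_value:.3f}")
# RR = 2.00, 95% CI (1.08, 3.69), p ≈ 0.027
```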
💊 Example: Clinical Significance ≠ Statistical Significance
● Finasteride study showed a 0.3 point improvement in symptoms
● Statistically significant
● But not clinically meaningful (need ≥ 3 point change to feel a difference)
✅ Summary: Error Types in Epidemiology
| Type | Description | Source |
|------|-------------|--------|
| Confounding | Third variable distorts the exposure-outcome association | Systematic |
| Selection Bias | Error in participant selection | Systematic |
| Information Bias | Error in measurement of exposure/outcome | Systematic |
| Random Error | Chance-related distortion | Random |
Causal Inference in Epidemiology
🔎 What Is a Cause?
📖 Definitions
● Merriam-Webster: Something that brings about a result
● Kenneth Rothman: An event or characteristic without which disease would not
occur
● Mervyn Susser: Something that makes a difference
⚙️ Characteristics of Causes
● Can be host or environmental factors
● Can be positive (e.g., smoking) or negative (e.g., lack of exercise)
● Causative exposures: air pollution, toxins
● Preventive exposures: exercise, healthy diet, vitamins
🔁 Three Essential Attributes of a Cause
1. Association: Exposure and outcome must co-occur
2. Time Order: Cause must occur before the effect
3. Direction: One-way relationship (A → B, not B → A)
○ E.g., smoking causes low birth weight, but low birth weight doesn’t cause
smoking
⚠️ Risk Factors ≠ Causes
| Characteristic | High-Risk Group | Low-Risk Group |
|----------------|-----------------|----------------|
| Place of Birth | North America/Europe | Asia, Africa |
| Socioeconomic Status | High | Low |
| Marital Status | Never Married | Ever Married |
Not all risk factors are direct causes.
🧬 Historical Models of Causation
| Era | Theory |
|-----|--------|
| Ancient | Divine punishment or bodily humors |
| Pre-modern | Miasma theory (bad air) |
| Modern | Germ theory (microorganisms) |
| Contemporary | Web of causation & sufficient cause model (SCM) |
🕸️ Web of Causation
● Many interconnected factors contribute to disease
● Used especially for chronic diseases
● Example (MI): Includes stress, diet, smoking, inactivity, genetics
🥧 Sufficient Cause Model (SCM)
🔑 Key Concepts
● Sufficient cause = a full “pie” of component causes that inevitably leads to disease
● Component causes = individual “pie pieces”
● Necessary cause = appears in every sufficient cause (e.g., HIV for AIDS)
✅ Application
● Prevent disease by removing just one pie piece
● SCM explains how different combinations can cause the same disease
🔍 Association vs. Causation
Just because two things are associated doesn’t mean one causes the other.
Use Hill’s guidelines as a framework to evaluate whether an observed association is likely to
be causal.
📋 Hill’s Guidelines for Assessing Causality
| Guideline | Description |
|-----------|-------------|
| Strength | Stronger associations are more likely causal |
| Consistency | Repeated in different studies/populations |
| Specificity | One cause → one effect (rare in chronic diseases) |
| Temporality | Cause must precede effect (required) |
| Biological Gradient | Dose-response relationship |
| Plausibility | Fits with known biology |
| Coherence | Aligns with existing knowledge |
| Experiment | Evidence from intervention studies |
| Analogy | Similar exposures cause similar outcomes |
⚠️ These are guidelines, not strict rules. They support, but do not prove,
causation.
🧠 Quote from Sir Austin Bradford Hill (1965)
“All scientific work is incomplete... That does not confer upon us a freedom to
ignore the knowledge we already have, or to postpone the action that it appears to
demand.”
📌 Summary
● Causality is complex and often cannot be proven definitively
● Use SCM and Hill’s Guidelines to evaluate evidence
● Public health decisions often rely on best available evidence—not perfect certainty
Reading & Critiquing Epidemiologic Studies
🧭 Overview: Critical Review Framework
A structured critique of epidemiologic studies includes three main phases:
A. Collection of Data
B. Analysis of Data
C. Interpretation of Data
🧪 A. Collection of Data
A.1: Study Context
● What prompted the study?
● Was it informed by previous literature or research gaps?
A.2: Study Objectives
● What hypotheses are being tested?
A.3: Primary Exposure
● What is the main exposure being evaluated?
● Was it accurately measured?
○ Distinguish conceptual (theoretical) vs. operational (measured) definitions
○ Be aware of misclassification bias
A.4: Primary Outcome
● What is the primary health outcome of interest?
● Was it accurately measured?
○ Again, consider conceptual vs. operational definitions
○ Consider possible misclassification
A.5: Study Design
● Identify the type of study:
○ Experimental
○ Cohort (prospective or retrospective)
○ Case-control
○ Cross-sectional
○ Ecological
A.6: Study Base & Participants
● What population was studied over what period?
● How were participants selected?
● What was the sample size and the exposed:unexposed ratio?
○ Affects statistical power, comparability, and generalizability
A.7: Selection Bias
● Did refusal, non-response, or loss to follow-up differ by exposure and disease?
● How likely is bias from selection issues?
A.8: Information Bias
● Was there potential for:
○ Recall bias
○ Interviewer bias
○ Misclassification?
A.9: Confounding (Pre-analysis)
● How was confounding minimized before analysis?
○ Methods:
■ Randomization
■ Restriction
■ Matching
■ Careful data collection on known confounders
■ Use of comparable groups from the same source population
📊 B. Analysis of Data
B.1: Control of Confounding (During Analysis)
● Were appropriate methods used?
○ Stratified analysis
○ Multivariate analysis
○ Standardization
B.2: Measures of Association
● What was reported?
○ Risk/rate ratio
○ Risk/rate difference
○ Odds ratio
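For reference, a brief sketch (hypothetical counts) computing all three measures from a single cohort-style 2x2 table:

```python
# Three common measures of association from one hypothetical 2x2 table
# (a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases).

a, b = 40, 160    # exposed
c, d = 20, 180    # unexposed

risk_exposed = a / (a + b)        # 0.20
risk_unexposed = c / (c + d)      # 0.10

risk_ratio = risk_exposed / risk_unexposed        # 2.00
risk_difference = risk_exposed - risk_unexposed   # 0.10 (10 extra cases per 100 exposed)
odds_ratio = (a * d) / (b * c)                    # 2.25

print(f"Risk ratio: {risk_ratio:.2f}")
print(f"Risk difference: {risk_difference:.2f}")
print(f"Odds ratio: {odds_ratio:.2f}")
```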
B.3: Measures of Statistical Stability
● Were p-values or confidence intervals reported?
○ P-values → hypothesis testing
○ Confidence intervals → precision & estimation
🧠 C. Interpretation of Data
C.1: Major Results
● What were the key numbers and findings?
C.2: Bias & Confounding
● Were results affected by:
○ Selection bias
○ Information bias
○ Confounding?
● Consider magnitude and direction of potential bias
C.3: Misclassification
● Did non-differential misclassification affect results?
○ How large was the potential impact?
C.4: Study Limitations
● Did the authors acknowledge major limitations?
● Did they discuss how those limitations might influence the findings?
C.5: Conclusions
● Were they justified by the data?
● Were the implications consistent with the study results?
C.6: Generalizability
● Can results be applied to a broader population?
○ To whom do these findings apply—everyone or only a specific group?
📌 Summary: What Makes a Good Epidemiologic Study?
✔ Clearly defined and relevant research question
✔ Appropriate study design
✔ Valid and reliable measurement of exposure and outcome
✔ Proper control of bias and confounding
✔ Correct and transparent analysis
✔ Reasonable interpretation and conclusion based on evidence