Causal Inference
Introduction
1. Confusion Over Causality
● Spurious Correlation
Causally unrelated variables might happen to be highly correlated with each other over
some period of time.
● Anecdotes & Science Reporting
People have beliefs about causal effects in their own lives
Headlines often do not use forms of the word “cause”, but are still interpreted causally
● Reverse Causality
Even if there is a causal relationship, sometimes the direction is unclear
2. Causal Inference
● Formal Definitions of causal effects
● Assumptions necessary to identify causal effects from data (untestable)
● Rules about what variables need to be controlled for
● Sensitivity analyses to determine the impact of violations of assumptions on conclusions
“Observational studies are an interesting and challenging field which demands a good deal of
humility, since we can claim only to be groping towards the truth.” (Cochran, 1972)
3.1 Treatment and Outcomes
Suppose we are interested in the causal effect of some treatment A on some outcome Y.
● Treatment: binary or categorical
● Potential Outcomes:
Outcome we would see under each possible treatment option
Ya is the outcome that would be observed if treatment was set to A = a
● Counterfactuals:
Outcomes that would have been observed had the treatment been different
E.g. If treatment was A = 1, then counterfactual outcome is Y0
Suppose Treatment A is binary (0, 1):
Before the treatment decision is made, any outcome is a potential outcome: Y0, Y1
After the study, there is an observed outcome Y = YA and counterfactual outcome Y1-A
3.2 Hypothetical Intervention
We will primarily focus on treatments that could be thought of as interventions → Well Defined in
“Potential Outcomes Framework”
● Intervention: can imagine being randomized / manipulated in a hypothetical trial
● Not Immutable Variables: immutable variables cannot be manipulated
● Has one version, i.e. no hidden versions of treatment
● Potentially actionable afterwards
3.3 Causal Effect
(Figure: hypothetical world vs. real world)
● Treatment A had a causal effect on outcome Y if Y1 differs from Y0
● Average Causal Effect: E(Y1 - Y0), the average value of Y if everyone was treated with A = 1
minus the average value of Y if everyone was treated with A = 0
● E(Y1-Y0) ≠ E(Y|A=1) - E(Y|A=0) in general, because E(Y|A=1) is the mean outcome in the
subpopulation that actually received A = 1, which can differ systematically from the whole
population (see the simulation sketch at the end of this subsection)
● Other Causal Effect
○ E(Y1/Y0): causal relative risk
○ E(Y1-Y0|A=1): causal effect of treatment on the treated
○ E(Y1-Y0|V=v): average causal effect in the subpopulation with covariate V=v
● Challenge - Fundamental Problem of Causal Inference: we only observe one treatment
and one outcome for each unit (so we consider population causal effect) → How do we
use observed data to link observed outcomes to potential outcomes?
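A minimal simulation sketch of this point (hypothetical data and variable names, not from these notes): a confounder X drives both treatment and outcome, so the naive contrast E(Y|A=1) - E(Y|A=0) differs from the true average causal effect E(Y1 - Y0).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounder X affects both treatment assignment and the outcome.
x = rng.normal(size=n)
a = rng.binomial(1, 1 / (1 + np.exp(-2 * x)))   # P(A=1|X) increases with X
y0 = x + rng.normal(size=n)                     # potential outcome under A = 0
y1 = y0 + 1.0                                   # true individual causal effect = 1
y = np.where(a == 1, y1, y0)                    # observed outcome Y = Y^A (consistency)

print("True ACE   E(Y1 - Y0)        :", (y1 - y0).mean())                      # ~1.0
print("Naive diff E(Y|A=1) - E(Y|A=0):", y[a == 1].mean() - y[a == 0].mean())  # biased, > 1
```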
3.4 Causal Assumptions
Identifiability of causal effects requires making some untestable assumptions about observed
data: Y, A and a set of pre-treatment covariates X.
1. Stable Unit Treatment Value Assumption (SUTVA): Yia depends only on unit i’s own treatment Ai = a
a. No interference: units do not interfere with each other
b. One version of treatment
→ Write potential outcome for the i-th person in terms of only that person’s treatments
2. Consistency Assumption: Y = Ya if A=a, for all a
→ The potential outcome under treatment A=a, Ya is equal to the observed outcome if
actual treatment received is A=a
3. Ignorability Assumption (No Unmeasured Confounders): Y0, Y1 ⊥ A | X
→ Within levels of X, treatment is randomly assigned
4. Positivity Assumption
For every set of values of X, treatment assignment was not deterministic, i.e.
P(A=a|X=x)>0 for all a and x
E(Y|A=a, X=x) (observed data) = E(Ya|A=a,X=x) (consistency) = E(Ya|X=x) (ignorability)
3.5 Causal Design
Cross-sectional user design: a snapshot of users at one point in time; it ignores treatment history
and compares against users with no treatment. Better alternatives include:
● Incident User Design (new user design): restrict the treated population to those newly
initiating treatment → a cleaner problem, not confounded by prior treatment experience
● Active Comparator: the control group also receives a treatment (of a similar type as the one
studied) → fewer confounders
3.6 Confounding
Confounders are often defined as variables that affect both the treatment and the outcome
→ For Ignorability: within levels of confounders, treatment and outcome are independent
Eg1. Assign the color of the onboarding card (treatment) in Zephyr based on a coin flip, and collect
onboarding members’ weekly macrosessions (outcome). → The coin flip is not a confounder since
it does not affect the outcome (except through the treatment).
Eg2. People with a family history of cancer are more likely to develop cancer (the outcome), but
family history does not influence the treatment decision → family history is a risk factor for the
outcome, not a confounder.
Eg3. If older people are at higher risk of cardiovascular disease (the outcome) and are also more
likely to receive statins (the treatment) → age is a confounder.
Confounding Control:
1. Identifying a set of variables X that make the ignorability assumption hold
2. Using statistical methods to control for these variables and estimate the causal effect
3.7 Causal Graphs
A causal graph is a Directed Acyclic Graph (DAG) that helps identify confounding variables
needed to achieve ignorability, by telling us
● which variables are independent from each other
● which variables are conditionally independent from each other
→ ways that we can factor and simplify the joint distribution
DAG: all edges are directed and there are no cycles
Terminology:
● nodes / vertices, edges, paths (a way to get from one vertex to another traveling along edges)
● Parents and children (for adjacent nodes), ancestors and descendants
DAG & Probabilities
Decomposition of the joint distribution (the factorization the DAG is compatible with): start from
the root nodes and condition each node on its parents, i.e. P(X1, …, Xn) = Π i P(Xi | parents(Xi))
Paths & Associations & Blocking
1. Chains: A → X → Y
● A and Y are associated, since information from A flows through X to Y
● Conditioning on X (the node in the middle of the chain) blocks the path from A to Y.
Eg. A: temperature, X: whether or not sidewalks are icy, Y: whether or not someone falls
2. Forks: A ← X→ Y
● A and Y are associated, since information from X flows to both of them
● Conditioning on X (the node in the middle of the fork) blocks the path from A to Y.
3. Inverted forks: A → X ← Y
● A and Y are independent since info from A and Y collide at X (collider)
● Conditioning on X induces an association between A and Y
Eg. A: state of an on/off switch (coin flip), Y: state of another, independent switch (another coin),
X: whether the lightbulb is lit (it lights up only if both A and Y are in the on state)
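A small simulation sketch of the collider example above (hypothetical variable names): A and Y are independent coin flips and X is the bulb; marginally A tells us nothing about Y, but conditioning on the collider X induces an association.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

a = rng.binomial(1, 0.5, n)   # switch A (coin flip)
y = rng.binomial(1, 0.5, n)   # switch Y (independent coin flip)
x = a & y                     # collider: bulb lit only if both switches are on

# Marginally, A carries no information about Y.
print("P(Y=1|A=1) =", y[a == 1].mean(), " P(Y=1|A=0) =", y[a == 0].mean())   # both ~0.5

# Conditioning on the collider (bulb off) induces a strong negative association.
off = x == 0
print("P(Y=1|A=1,X=0) =", y[(a == 1) & off].mean(),   # = 0
      " P(Y=1|A=0,X=0) =", y[(a == 0) & off].mean())  # ~0.5
```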
Rules for d-separation
A path is blocked (d-separated) by a set of nodes X if it contains a chain whose middle node is in
X, or a fork whose middle node is in X, or an inverted fork whose middle node (the collider) is not
in X and has no descendant in X.
Two nodes A and Y are d-separated by a set of nodes X if X blocks every path from A to Y, i.e.
Y ⊥ A | X. → Recall the ignorability assumption: Y0, Y1 ⊥ A | X
3.8 How to control confounding variables (X)?
Pre: Frontdoor v.s. Backdoor path
● A frontdoor path from A to Y is one that begins with an arrow emanating out of A
(we do not need to worry about these, since they capture effects of the treatment on the outcome;
do not control for nodes on them, e.g. mediators; frontdoor paths mainly matter in causal
mediation analysis)
● A backdoor path from A to Y is a path from A to Y that travels through an arrow going into A
Backdoor paths confound the relationship between A and Y! → They need to be blocked!
Backdoor Path Criterion
A set of variables X is sufficient to control for confounding if 1. it blocks all backdoor paths from
treatment to outcome and 2. it does not include any descendants of the treatment
● Requires knowing the causal DAG (domain expertise & assumptions)
● There are usually many choices of control sets → pick one likely to be sufficient and run a
sensitivity analysis
Disjunctive Cause Criterion
Control for all (observed) causes of the exposure, the outcome, or both
(Property: if there is a set of observed variables that satisfies the backdoor path criterion, then the
variables selected by the disjunctive cause criterion will also be sufficient to control confounding)
● Does not always select the smallest set of variables
● Is conceptually simpler (no need to know the full causal DAG)
● Works if 1) such a set exists and 2) we correctly identify all observed causes of A and Y
3.9 Sensitivity Analysis
Overt bias: there is imbalance on observed covariates
Hidden bias: there are unobserved variables that are confounders
Sensitivity analysis: if there is hidden bias, determine how severe it would have to be to change
the conclusions (statistical significance or direction of the effect)
Hypothesis: no hidden bias corresponds to Γ = 1
→ Increase Γ until the evidence of a treatment effect goes away (no longer statistically significant)
→ If this happens at Γ = 1.1, the result is very sensitive to hidden bias; if at Γ = 5, it is not very sensitive
(R packages: sensitivity2x2xk, sensitivityfull)
Observational Study
1. Randomized Trial Revisit
In a randomized trial, treatment assignment A would be determined randomly → erasing the
arrow from X to A → there are no backdoor paths from treatment A to outcome Y.
(DAG comparison: randomized trial with no arrow from X into A vs. observational study with X → A)
→ The distribution of pre-treatment variables X that affect Y will be the same in both treatment
groups (covariate balance) → if outcome distribution ends up differing, it will not be because of
X → X is dealt with at the design phase
Why not always randomize?
● Randomized experiments are expensive
● Sometimes randomizing is unethical / impractical
● Trials take time, since we need to wait for outcome data
2. Observational Study
Type I: Planned, prospective, observational studies with active data collection:
● Like trials: data collected on a common set of variables at planned times, outcomes
carefully measured, study protocols
● Unlike trials: regulations are much weaker since we are not intervening, and a broader
population is eligible for the study
Type II: Databases, retrospective, passive data collection
● Large sample sizes, inexpensive, potential for rapid analysis
● Data quality typically lower, no uniform standard of collection
3. Approaches
Analysis process: define metric & population → select confounders & instruments → select model →
tune & calculate → validate
How to choose?
● One-time or multiple treatments? Fixed Effect for multiple treatments & short-term effects
● Small sample size? Time Series, Matching
● Need a quick calculation? Regression, Stratification
● Not enough covariates? Doubly Robust, Time Series, Fixed Effect, Propensity Score
Stratification
1. Methodology
- E(Y|A=a, X=x) = E(Ya|X=x): the causal mean within stratum X = x
- P(X=x): the probability / size of each stratum
- Combine by standardization: E(Ya) = Σx E(Y|A=a, X=x) P(X=x) (a sketch follows at the end of this subsection)
- The overall (marginal) effect direction may differ from the stratum-specific directions (Simpson’s Paradox)
2. Challenges
- The ignorability assumption may be violated if X does not include all confounders
- May lead to many empty cells as the dimension or number of values of X increases
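A minimal sketch of the standardization estimator described above, for a single discrete confounder; the column names (y, a, x) are illustrative assumptions, not from the notes.

```python
import pandas as pd

def standardized_mean(df: pd.DataFrame, a: int) -> float:
    """E(Y^a) = sum_x E(Y | A=a, X=x) * P(X=x) for a discrete confounder x."""
    p_x = df["x"].value_counts(normalize=True)                 # P(X = x), whole population
    strata_means = df[df["a"] == a].groupby("x")["y"].mean()   # E(Y | A=a, X=x)
    # Strata with no subjects at treatment level a are empty cells: a positivity problem.
    return (strata_means * p_x).dropna().sum()

# ace_hat = standardized_mean(df, 1) - standardized_mean(df, 0)
```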
Matching
1. Definition
Matching is to match individuals in the treated group to individuals in the control group on the
covariates, attempting to make an observational study more like a randomized trial
Characteristics:
- Controlling for confounders is achieved at the design phase, without looking at the outcome
- Matching reveals lack of overlap in the covariate distributions (the positivity assumption needs to hold)
- Once the data are matched, the outcome can be analyzed as in a randomized trial
- Why match on the treated group? The treated group is usually smaller, and we make
inference about the treated population (other target populations are possible)
2. Methodology
Step 1: For every individual in the treatment group, find the matched individuals in the control
group based on covariates.
Step 1.1 Calculate the distance score based on distance measurements
1) Exact matching: distance is infinite if the covariates are not all equal
2) Mahalanobis distance: D(Xi, Xj) = sqrt( (Xi − Xj)ᵀ S⁻¹ (Xi − Xj) )
*S is the covariance matrix of the covariates, i.e. S = Cov(X)
*The square root of the sum of squared covariate differences, scaled by the covariance
matrix (covariates with higher variance get lower weight in the distance)
*Robust MD: replace each covariate value with its rank before computing the distance → not affected by outliers
3) Propensity Score
Step 1.2 Select Matches based on distance scores
1) Greedy (Nearest neighbor) Matching
a) Randomly order list of treated subjects and control subjects
b) Start with first treated subject. Match to the control with smallest distance and
remove the matched control from the list
c) Repeat b) until all treated are matched
d) For k:1 matching, go through the list again to find 2nd matches; repeat until each
treated subject has k matches
2) Optimal Matching
a) Consider all M treated × N control pairings
b) Select the M pairs that minimize the total distance
3) Sparse Optimal Matching
a) Match within blocks
b) Mismatches can be tolerated if fine balance can still be achieved
Caliper: a bad match can be defined using a caliper - the maximum acceptable distance
- If a treated subject has no match within the caliper, the positivity assumption is likely violated
- If we exclude bad matches, positivity holds but the target population becomes harder to define
- In PSM, a common threshold is 0.2 × std(logit(PS)) (see the sketch after the comparison table below)
Greedy v.s. Optimal
                                Greedy Matching     Optimal Matching
Is total distance minimized?    No                  Yes
Invariant to initial order?     No                  Yes
Computation                     Fast                Demanding / can be infeasible
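A rough sketch of greedy 1:1 nearest-neighbor matching on the logit propensity score with a caliper, assuming a fitted propensity score is already available; this follows the greedy algorithm above (not optimal matching), and the array names are illustrative.

```python
import numpy as np

def greedy_match(logit_ps, treated, caliper):
    """1:1 greedy nearest-neighbor matching on logit(PS), without replacement."""
    treated_idx = np.flatnonzero(treated == 1)
    control_idx = list(np.flatnonzero(treated == 0))
    np.random.default_rng(0).shuffle(treated_idx)          # random order of treated subjects
    pairs = []
    for t in treated_idx:
        if not control_idx:                                # no controls left to match
            break
        dists = np.abs(logit_ps[control_idx] - logit_ps[t])
        j = int(np.argmin(dists))
        if dists[j] <= caliper:                            # reject matches outside the caliper
            pairs.append((t, control_idx.pop(j)))          # remove matched control from the pool
    return pairs

# caliper = 0.2 * np.std(logit_ps)   # rule-of-thumb caliper from the notes
```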
Step 1.3 Assessing Balance
The purpose of matching is to achieve stochastic balance or, less ideally, fine balance (the
marginal distribution of each covariate is balanced across groups)
Method 1: Test for a difference in means between treated and controls for each covariate using a
two-sample t-test.
→ Drawback: p-values depend on sample size; we probably do not care much if the mean
differences are small, yet large samples will flag them as significant
Method 2: Create a “Table 1” comparing pre-matching and post-matching balance using the
Standardized Mean Difference (SMD) for each covariate (a sketch of the computation follows Table 1).
(Table 1: standardized mean differences before and after matching)
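A sketch of the standardized mean difference used in such a Table 1; the pooled-SD form and the |SMD| > 0.1 flag are common conventions, and the function name is illustrative.

```python
import numpy as np

def smd(x_treated, x_control):
    """Standardized mean difference for one covariate (pooled SD in the denominator)."""
    pooled_sd = np.sqrt((np.var(x_treated, ddof=1) + np.var(x_control, ddof=1)) / 2)
    return (np.mean(x_treated) - np.mean(x_control)) / pooled_sd

# Compute once on the full sample (pre-matching) and once on the matched sample;
# |SMD| > 0.1 is a common flag for meaningful imbalance.
```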
Step 2: After successfully matching and achieving adequate balance, proceed to the outcome
analysis, using randomization-style tests on the matched (dependent) samples
1) Paired-Samples T Test
H0: μd = 0 v.s. H1: μd ≠ 0 (μd: difference between mean of paired treatment and control)
t = d̄ / (sd / √n), where d̄ and sd are the mean and standard deviation of the within-pair differences and n is the number of pairs
2) Exact / Permutation Test
a) Compute test statistics T from observed data
b) Assume H0: no treatment effect
c) Randomly permute treatment assignment within pairs and recompute T
d) Repeat many times and see how unusual observed T is (by distribution p-value)
Eg. observed T = 6 compared to the distribution of T over 1,000 permutations (see the sketch after this list)
3) McNemar’s Chi-squared Test (for binary outcome)
4) Conditional Logistic Regression (Matched binary outcome data)
5) Stratified Cox Model (Time-to-event / survival outcome data)
6) Generalized Estimating Equations
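A minimal sketch of the paired permutation test from item 2), randomly flipping treatment labels within matched pairs; the test statistic here is the mean within-pair difference, and the argument names are illustrative.

```python
import numpy as np

def paired_permutation_test(y_treated, y_control, n_perm=1000, seed=0):
    """Permutation p-value for H0: no treatment effect, permuting labels within pairs."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(y_treated) - np.asarray(y_control)   # within-pair differences
    t_obs = diffs.mean()                                     # observed test statistic
    t_perm = np.empty(n_perm)
    for b in range(n_perm):
        signs = rng.choice([-1, 1], size=diffs.size)         # flip treatment/control within pairs
        t_perm[b] = (signs * diffs).mean()
    return float(np.mean(np.abs(t_perm) >= abs(t_obs)))      # two-sided p-value

# p = paired_permutation_test(y_matched_treated, y_matched_control)
```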
Weighting (IPTW)
1. Intuition
1) Use all of the data, but down-weight some observations and up-weight others
(matching vs. weighting, for a stratum with 1 treated and n control subjects)
Matching: 1 : n → 1 : 1 (select 1 of the n controls)
Weighting: 1 : n → (n+1) : (n+1) (the 1 treated subject is up-weighted to n+1; the n controls
together also represent n+1)
2. Methodology
2.1 Estimate Propensity Score
2.2 Create a pseudo-population by applying inverse propensity score weights, to obtain unconfounded groups
→ In the pseudo-population, treatment assignment no longer depends on X → everyone is equally
likely to be treated, under ignorability (the PS model π is correct) and positivity (π is not 0 or 1)
- Might need to trim the tails or truncate the weights.
2.3 Assessing Balance
Covariate balance can be checked on the weighted sample using standardized differences in a
Table 1 (or a plot) → stratify on treatment & compute weighted means and variances
→ If imbalanced: refine the propensity score model (add interactions or non-linear terms)
2.4 Estimate the causal effect (a sketch of the full IPTW pipeline follows below)
1) Linear regression model (linear marginal structural model)
E.g. for binary A ∈ {0, 1}: E(Ya) = ψ0 + ψ1 a, so the average causal effect E(Y1) − E(Y0) = ψ1
2) Generalized Marginal Structural Model
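A sketch of the IPTW pipeline in steps 2.1-2.4: fit a propensity score by logistic regression, build inverse-probability weights (with clipping as a simple stand-in for trimming/truncation), and take the weighted difference in means, which is the linear MSM contrast. scikit-learn is assumed; names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iptw_ate(X, a, y, clip=(0.01, 0.99)):
    """IPTW estimate of E(Y1) - E(Y0); clip keeps the PS away from 0/1 (weight truncation)."""
    ps = LogisticRegression(max_iter=1000).fit(X, a).predict_proba(X)[:, 1]   # 2.1 propensity score
    ps = np.clip(ps, *clip)                                                   # guard against positivity violations
    w = a / ps + (1 - a) / (1 - ps)                                           # 2.2 inverse-probability weights
    ey1 = np.sum(w * a * y) / np.sum(w * a)                                   # weighted mean among treated
    ey0 = np.sum(w * (1 - a) * y) / np.sum(w * (1 - a))                       # weighted mean among controls
    return ey1 - ey0                                                          # 2.4 causal contrast (psi_1)
```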
Propensity Score
1. Balancing Score
A balancing score π(X) is a function of the covariates such that, conditional on it, the distribution
of X is the same in treated and control groups. If we match on a balancing score, we should achieve balance on X.
Why? Under ignorability, subjects with the same X have the same P(A=1); if two groups share the
same value of π, their X distributions are balanced, so conditioning on the balancing score works
like conditioning on the allocation probability
2. Propensity Score Definition
The PS is the probability of receiving treatment given covariates X, π(X) = P(A=1|X) (a balancing score)
The propensity score is a scalar - each subject has exactly one value of the propensity score
- In a randomized trial, P(A=1|X) = P(A) (e.g. 0.5) is known
- In an observational study, we need to estimate P(A=1|X), e.g. via logistic regression
Overlap - the positivity assumption needs to hold
If there is a lack of overlap, trimming the tails is an option,
e.g. remove controls with PS < min(PS among treated) & treated with PS > max(PS among controls)
In practice, the logit (log-odds) of the PS is often used, since it is unbounded and stretches out the distribution
3. Trimming the data
Problem: if the PS is close to 0 / 1, the positivity assumption may be violated; a PS close to 0 for a
treated subject makes its IPTW weight very large → can distort the result
Trimming the tails → remove subjects who have extreme values of the PS (close to 0/1)
- Rule of thumb: cut off the 2% tails (above the 98th percentile and below the 2nd)
Trimming the tails changes the population!
Marginal Structural Model (MSM)
1. Definition
A model for the mean of the population potential outcomes as a function of treatment:
g(E(Ya)) = ψ0 + ψ1 a, where g() is a link function
Marginal: model that is not conditional on the confounders (population average)
Structural: model for potential outcomes, not observed outcomes
2. Linear MSM (continuous outcome): E(Ya) = ψ0 + ψ1 a v.s. Logistic MSM (binary outcome): logit(E(Ya)) = ψ0 + ψ1 a
3. MSM With Effect Modification
Suppose V is a subset of confounders that modifies the effect of A.
A linear MSM with effect modification:
E(Ya|V) = ψ0 + ψ1 a + ψ2 V + ψ3 aV
More generally, g(E(Ya|V)) = h(a, V; ψ), where
h() is a function specifying the parametric form in a and V (typically additive, linear).
4. Compare with Generalized Linear Model
MSM and GLM are not equivalent when there is confounding: the MSM models E(Ya), the mean if
treatment were set to a (potential outcomes), while the GLM models E(Y|A=a), the conditional
mean among those observed with A = a:
g(E(Ya)) = ψ0 + ψ1 a (MSM) v.s. g(E(Y|A=a)) = β0 + β1 a (GLM)
However, the pseudo-population created by IPTW is free from confounding!
→ Estimate the MSM by fitting a model to the observed data of the IPTW pseudo-population
→ i.e. fit a weighted generalized linear model to estimate the coefficients
→ those coefficients are the MSM parameters ψ
Doubly Robust (DR)
1. Definition
- Propensity Score Model + Outcome Regression Model
- IPTW: Ê(Y1) = (1/n) Σi Ai Yi / π(Xi), where π(Xi) = P(A=1|Xi) is the propensity score
- Regression: Ê(Y1) = (1/n) Σi m1(Xi), where m1(Xi) = E(Y|A=1, Xi) is the outcome regression
- Doubly robust (augmented IPTW): Ê(Y1) = (1/n) Σi [ Ai Yi / π(Xi) − (Ai − π(Xi)) m1(Xi) / π(Xi) ]
- Unbiased if either one is correctly specified
2. Justification
- If the propensity score model is correct, i.e. E(Ai|Xi) = π(Xi), then the augmentation term
(Ai − π(Xi)) m1(Xi)/π(Xi) has mean 0, so the estimator behaves like (unbiased) IPTW
- If the outcome regression model is correct, i.e. E(Yi|Ai=1, Xi) = m1(Xi), then rewriting the
estimator as (1/n) Σi [ m1(Xi) + Ai (Yi − m1(Xi))/π(Xi) ] shows the residual term has mean 0, so it
behaves like the (unbiased) regression estimator (a code sketch follows below)
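A minimal sketch of the doubly robust (AIPW) estimate of E(Y1) using the two models above; scikit-learn models stand in for whatever specifications are used in practice, and the names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def dr_mean_y1(X, a, y):
    """Doubly robust (AIPW) estimate of E(Y1)."""
    pi = LogisticRegression(max_iter=1000).fit(X, a).predict_proba(X)[:, 1]   # propensity score model
    m1 = LinearRegression().fit(X[a == 1], y[a == 1]).predict(X)              # outcome model E(Y|A=1, X)
    # Outcome-model prediction plus an inverse-weighted residual correction.
    return np.mean(m1 + a * (y - m1) / pi)
```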
Instrument Variable (IV)
1. Variables affecting A and Y
2. What is IV?
- Definition:
An IV is a variable that affects the treatment but does not directly affect the outcome.
IV analysis is an alternative causal inference method that does not rely on the ignorability assumption.
- Example:
A: smoking during pregnancy; Y: birth weight; X: mother’s age, weight, etc.
Z: randomize to either receive encouragement to stop smoking (Z=1) or receive usual care (0)
- Types:
1) randomly assigned as part of the study 2) believed to be randomized in nature (a natural experiment)
3. IV method
3.1 Assumption
1) Z is associated with the treatment (e.g. the encouragement changes uptake) [can be checked with data]
2) Exclusion restriction: Z affects the outcome only through A, i.e. no Z → Y path and no Z → U/X → Y path
3.2 Measurement
With an IV, we can estimate the complier average causal effect, under a monotonicity assumption
*Compliance classes: compliers (A = Z), always-takers (A = 1 regardless of Z), never-takers (A = 0
regardless of Z), defiers (A = 1 − Z)
*Monotonicity assumption: there are no defiers → encouragement can only increase the probability of treatment
3.3 Method
1) Complier Average Causal Effect (CACE)
E(Y|Z=1) − E(Y|Z=0)
= [E(Y|Z=1, always-takers)·P(always-takers) + E(Y|Z=1, never-takers)·P(never-takers)
 + E(Y|Z=1, compliers)·P(compliers)] − (the same expression for Z=0)
= E(Y|Z=1, compliers)·P(compliers) − E(Y|Z=0, compliers)·P(compliers)
  ← for always-/never-takers Z does not change A, so E(Y|Z, class) = E(Y|class) and those terms cancel
= [E(Yz=1|compliers) − E(Yz=0|compliers)]·P(compliers) ← Z is randomized, so conditioning on Z equals setting Z
= [E(Ya=1|compliers) − E(Ya=0|compliers)]·P(compliers) ← by definition, compliers have A = Z
where, with no defiers (monotonicity), P(compliers) = P(A=1|Z=1) − P(A=1|Z=0) = E(A|Z=1) − E(A|Z=0).
So CACE = E(Ya=1|compliers) − E(Ya=0|compliers) = [E(Y|Z=1) − E(Y|Z=0)] / [E(A|Z=1) − E(A|Z=0)] = ITT / P(compliers)
● If there is perfect compliance, CACE = ITT (the intention-to-treat effect, E(Y|Z=1) − E(Y|Z=0))
● Otherwise the ITT underestimates the CACE, since always-takers / never-takers dilute it (ITT = CACE × P(compliers))
2) Two-Stage Least Squares (2SLS)
Stage 1: regress A on Z: Ai = α0 + α1 Zi + ei → obtain fitted values Âi
Stage 2: regress Y on Â: Yi = β0 + β1 Âi + εi
Then β1 is the estimate of the causal effect.
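A minimal numeric sketch of the two stages above using plain least squares (packages such as linearmodels provide full 2SLS with valid standard errors; the naive second-stage errors here are not valid). Argument names are illustrative.

```python
import numpy as np

def two_stage_least_squares(z, a, y):
    """2SLS estimate of the causal effect of A on Y using instrument Z."""
    Z1 = np.column_stack([np.ones(len(z)), z])
    alpha, *_ = np.linalg.lstsq(Z1, a, rcond=None)     # Stage 1: A = a0 + a1*Z + e
    a_hat = Z1 @ alpha                                  # fitted treatment values
    A1 = np.column_stack([np.ones(len(a_hat)), a_hat])
    beta, *_ = np.linalg.lstsq(A1, y, rcond=None)       # Stage 2: Y = b0 + b1*A_hat + e
    return beta[1]                                      # b1: estimated causal effect (CACE)
```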
Notes:
- In an ordinary least squares (OLS) fit of Yi = b0 + b1*Ai + ei, we assume the error term e and the
covariate A are independent; confounding makes them correlated. Z is randomized, so it is
independent of the error, and Â (the projection of A onto the space spanned by Z) inherits that
independence, which makes the second-stage regression valid
- In the binary scenario, Yi = b0 + b1*Âi + ei = b0 + b1*(â0 + â1*Zi) + ei
So â1*b1 = E(Yi|Zi=1) − E(Yi|Zi=0), and â1 = E(Ai|Zi=1) − E(Ai|Zi=0)
So b1 = [E(Y|Z=1) − E(Y|Z=0)] / [E(A|Z=1) − E(A|Z=0)] is a consistent estimator of the CACE!
I.e. Z increases by 1 → Â increases by â1 → Y increases by â1*b1
- Consider covariates: regress A on Z and X, then regress Y on A_est and X
- Sensitivity Analysis:
- Exclusion restriction: if Z did directly affect Y by some amount p, would the conclusions change?
- Monotonicity: if there were a proportion π of defiers, would the conclusions change?
- The strength of an IV is the proportion of compliers, i.e. E(A|Z=1) − E(A|Z=0)
For a weak instrument, the complier population (to which inference applies) is small and the
estimator has large variance
→ can use methods for strengthening the IV, e.g. near/far matching
Fixed Effect Model
1. Panel Data Model
Yit = λi + γt + Σk βk Xk,it + uit
λi - individual intercept, γt - time intercept, βk - explanatory variable slopes, uit - error term
● Fixed Effect Model: the effects λ/γ are fixed parameters and may be correlated with x
○ Individual fixed effects: only λ; time fixed effects: only γ; or both (two-way)
○ To estimate: 1) a dummy for each individual/time, or 2) the mean-deviation (within) transformation
● Random Effect Model: the effects λ/γ are independent of x and follow some distribution
● FEM or REM? Use the Hausman test; the REM independence assumption is often hard to justify,
though REM supports inference about the broader population
2. Causal Practice
● Assumptions:
○ Unobserved confounders are time-invariant during the analysis window
○ Units can switch between treatment and control → each unit serves as its own control
○ Markov assumption: past treatments do not affect the current outcome → select the data window accordingly
● Method:
○ FEM with individual fixed effects (a demeaning sketch follows below)
○ Can add observed confounders X alongside the treatment in the regression
○ How to choose the window T? Too small → insignificant; too large → confounders become
time-varying → rule of thumb: 4 weeks
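A minimal sketch of the individual fixed-effect ("within") estimator via mean deviation, assuming a panel DataFrame with columns unit, a (treatment), and y (outcome); these column names are illustrative.

```python
import numpy as np
import pandas as pd

def within_estimator(df: pd.DataFrame) -> float:
    """Individual fixed effects by demeaning: regress demeaned y on demeaned a."""
    demeaned = df[["a", "y"]] - df.groupby("unit")[["a", "y"]].transform("mean")
    a_tilde = demeaned["a"].to_numpy()
    y_tilde = demeaned["y"].to_numpy()
    # OLS slope on demeaned data; the unit intercepts (and their confounding) are removed.
    return float(np.sum(a_tilde * y_tilde) / np.sum(a_tilde ** 2))
```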
Time Series Model
1. Method
● Step 1: Using pre-period time series data, build a model that predicts the treated series Y(t)
from control-group series Y(c) and covariates X
Step 2: Effect = actual − predicted over the post-treatment period (a simplified sketch follows at the end of this subsection)
● Bayesian structural time-series (BSTS) models are commonly used for the prediction step
2. Notes
● Assumptions
○ Control groups are not affected by the treatment
○ The relationship between control and treated series is unchanged from the pre- to the
post-period ← can be sanity-checked by predicting control series from other controls
● The effect can be accumulated by date (cumulative effect)
● Useful when there are limited covariates or few treated units
● How to define the control group? Start from the whole population or a cohort; use covariates/PS
to synthesize a control if needed
● Validation
○ Run an A/A check to make sure there is no effect (fit and predict on pre-period data)
○ Check the model fit (e.g. MAPE): if the fit is poor, segment the treated units or add a seasonal component
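A rough sketch of the regression-based counterfactual prediction in Steps 1-2, standing in for a full Bayesian structural time-series model: fit the treated series on control series over the pre-period, predict the post-period, and take actual minus predicted. scikit-learn is assumed and the names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ts_counterfactual_effect(y_treated, X_controls, pre_mask):
    """Pointwise and cumulative effect: actual minus predicted counterfactual in the post-period."""
    model = LinearRegression().fit(X_controls[pre_mask], y_treated[pre_mask])  # fit on pre-period only
    post = ~pre_mask
    predicted = model.predict(X_controls[post])        # counterfactual for the post-period
    effect = y_treated[post] - predicted               # effect by date
    return effect, np.cumsum(effect)                   # also the cumulative effect
```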