
Matching Methods

Presentation by Mr Wasim Ahmad on the topic of matching methods

Introduction to Matching Methods

M Rahul
Institute of Economic Growth, Delhi

25 September 2025
Matching
Fundamental problem of causal inference

▶ Causal (treatment) effect for unit i: $Y_i(1) - Y_i(0)$.


▶ However, we do not observe both potential outcomes for an individual unit.
Motivation: The Counterfactual Problem

▶ We often want the effect of a treatment/program on an outcome.


▶ We cannot observe both potential outcomes for the same unit (the “fundamental
problem of causal inference”).
Neyman–Rubin Causal Model

▶ Potential outcomes: $Y_i(1)$ (treated), $Y_i(0)$ (control).
▶ Observed outcome: $Y_i^{obs} = T_i Y_i(1) + (1 - T_i) Y_i(0)$.
▶ Average treatment effect: $\tau = E[Y_i(1) - Y_i(0)]$.
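
The observed-outcome identity above can be illustrated with a small simulation. This is only a sketch under assumed values: the effect size of 2.0, the outcome distribution, and the 50/50 random assignment are all hypothetical, chosen so that both potential outcomes are visible to us (which never happens with real data).

```python
import random

def simulate(n=1000, effect=2.0, seed=0):
    """Simulate both potential outcomes, then reveal only one per unit."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(10.0, 1.0)            # potential outcome without treatment
        y1 = y0 + effect                     # potential outcome with treatment
        t = 1 if rng.random() < 0.5 else 0   # randomised assignment (assumption)
        y_obs = t * y1 + (1 - t) * y0        # observed outcome identity
        units.append((y0, y1, t, y_obs))
    return units

units = simulate()

# The true ATE needs both potential outcomes, which real data never provide.
true_ate = sum(y1 - y0 for y0, y1, _, _ in units) / len(units)

# With randomised assignment, the difference in observed means estimates it.
treated = [y for _, _, t, y in units if t == 1]
control = [y for _, _, t, y in units if t == 0]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)

print(round(true_ate, 3), round(diff_in_means, 3))
```

Because assignment is random here, the difference in means lands close to the true effect; in observational data that guarantee is lost, which is what motivates matching.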
Is smoking pipes/cigars more dangerous than
cigarettes?

Table: Death rates per 1,000 person-years (Cochran, 1968)

Smoking group   Canada   UK     US
Non-smokers     20.2     11.3   13.5
Cigarettes      20.5     14.1   13.5
Cigars/pipes    35.5     20.7   17.4
Table: Mean age in years

Smoking group   Canada   UK     US
Non-smokers     54.9     49.1   57.0
Cigarettes      50.5     49.8   53.2
Cigars/pipes    65.9     55.7   59.7

▶ We were not comparing “apples to apples”!


▶ We need to compare cigarette smokers with others in similar age groups.
▶ In real settings there may be many background variables, so sub-classification
may not be feasible. → “curse of dimensionality”
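
A minimal sketch of sub-classification on age follows. The stratum-level rates and counts are made-up numbers for illustration (they are not Cochran's data); the point is only that weighting each group's stratum rates by a common age distribution removes the part of the gap that is due to age.

```python
# Hypothetical death rates per 1,000 person-years within each age stratum.
rates = {
    "non_smokers": {"young": 10.0, "middle": 20.0, "old": 40.0},
    "cigar_pipe":  {"young": 11.0, "middle": 21.0, "old": 41.0},
}
# Hypothetical number of people per stratum (cigar/pipe smokers are older).
counts = {
    "non_smokers": {"young": 500, "middle": 300, "old": 200},
    "cigar_pipe":  {"young": 100, "middle": 300, "old": 600},
}

def crude_rate(group):
    """Overall rate using the group's own age distribution."""
    total = sum(counts[group].values())
    return sum(rates[group][s] * counts[group][s] for s in rates[group]) / total

def adjusted_rate(group, reference="non_smokers"):
    """Rate re-weighted to the reference group's age distribution."""
    total = sum(counts[reference].values())
    return sum(rates[group][s] * counts[reference][s] for s in rates[group]) / total

crude_gap = crude_rate("cigar_pipe") - crude_rate("non_smokers")
adjusted_gap = adjusted_rate("cigar_pipe") - adjusted_rate("non_smokers")
print(crude_gap, adjusted_gap)
```

With one covariate and three strata this is easy. With k binary covariates there are 2^k strata, so many cells quickly become empty, which is the curse of dimensionality mentioned above.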
Causal Graphs and Backdoor Paths

[Figure: causal graph of treatment T and outcome Y]

▶ Covariate X confounds effect of T on Y .


▶ Randomised experiment: the treated and control groups are guaranteed to be
similar in terms of all covariates, both observed and unobserved.
▶ Matching: tries to replicate this for observed covariates, in observational data.

“any method that aims to equate (or “balance”) the distribution of covariates
in the treated and control groups.” (Stuart, 2010)

▶ Matching on X should close the backdoor path from T to Y through X.


Matching reduces model dependence
Matching Basics

Four steps in matching (Stuart, 2010):


▶ Define a distance measure to be used, once the variables to be used in the
matching procedure are determined based on our causal assumptions.
▶ Identify matches based on the distance measure defined.
▶ See if the matched sample obtained is satisfactory. If not, repeat the above steps
till a satisfactorily matched sample is achieved.
▶ Estimate the treatment effect using the matched sample obtained from the above
steps.
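
For the third step, one widely used balance diagnostic is the standardized mean difference (SMD) of each covariate. The sketch below is an assumption about how one might check balance, not a procedure taken from the slides; the example ages are hypothetical, and the 0.1 threshold in the comment is a common rule of thumb rather than a formal test.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd

# Hypothetical ages in a treated group and an unmatched control group.
age_treated = [25, 26, 27, 24, 28]
age_control = [33, 35, 31, 34, 32]

smd_age = standardized_mean_difference(age_treated, age_control)
# An absolute SMD above roughly 0.1 is often read as a sign of imbalance.
print(round(smd_age, 3))
```

The same function can be re-run on the matched sample to see whether the second step actually improved balance.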
Key Assumptions for Identification

1. Unconfoundedness: $(Y_1, Y_0) \perp T \mid X$.
2. Overlap: $0 < \Pr(T = 1 \mid X) < 1$.
3. No interference between units.
Exact Matching

Exact Matching:
▶ Two observations are matched only if the value of each covariate is the same in
both the observations. A simple matching estimator:
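$\hat{\delta}_{ATT} = \frac{1}{N_T} \sum_{D_i = 1} (Y_i - Y_{j(i)})$
where $N_T$ is the number of treated units ($D_i = 1$) and $j(i)$ is the control unit matched to treated unit $i$.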
▶ Simple but rarely feasible with many covariates.
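
The exact matching estimator can be sketched as follows. The data layout and values are hypothetical. One variation from the formula above is labelled in the code: when several controls share a treated unit's covariate values, the sketch uses their mean outcome rather than a single matched control.

```python
from collections import defaultdict

def exact_matching_att(rows):
    """rows: list of (treated_flag, covariate_tuple, outcome).

    Each treated unit is compared with controls that have exactly the same
    covariate values. Variation (assumption): the mean outcome of all such
    controls stands in for Y_j(i). Unmatched treated units are dropped.
    """
    controls = defaultdict(list)
    for d, x, y in rows:
        if d == 0:
            controls[x].append(y)
    diffs = []
    for d, x, y in rows:
        if d == 1 and x in controls:
            diffs.append(y - sum(controls[x]) / len(controls[x]))
    if not diffs:
        raise ValueError("no treated unit has an exact match")
    return sum(diffs) / len(diffs), len(diffs)

# Hypothetical data: covariates are (group, married).
rows = [
    (1, ("a", 1), 10.0),
    (0, ("a", 1), 7.0),
    (0, ("a", 1), 9.0),
    (1, ("b", 0), 5.0),
    (0, ("b", 0), 4.0),
    (1, ("b", 1), 6.0),   # no control with the same covariates
]
att, n_matched = exact_matching_att(rows)
print(att, n_matched)
```

The third treated unit has no exact match and is dropped, which illustrates why exact matching becomes infeasible as the number of covariates grows.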
Idea of Propensity Score Matching

▶ Propensity score: conditional probability of treatment given covariates.


▶ $e_i(X_i) = \Pr(T_i = 1 \mid X_i)$.
▶ Rosenbaum & Rubin (1983): the propensity score is a balancing score.
▶ Instead of matching on many X, match on one score $e_i$.
Estimating Propensity Scores

▶ Estimate using logit or probit regression.


▶ Choose pre-treatment covariates based on causal reasoning.
▶ Check overlap of scores between treated and control.
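
A minimal sketch of estimating propensity scores with a logit model, written in plain Python so it runs without extra packages. The gradient-ascent fitting routine, the learning rate, and the toy data are all assumptions for illustration; in practice one would use a standard statistics package.

```python
from math import exp

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def linear_index(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def fit_logit(X, t, lr=0.1, n_iter=5000):
    """Fit Pr(T=1|X) = sigmoid(b0 + b'x) by gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            err = ti - sigmoid(linear_index(beta, xi))
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical data: one standardised pre-treatment covariate.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.0], [0.5],
     [1.0], [1.5], [2.0], [-1.0], [1.0], [0.0]]
t = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1]

beta = fit_logit(X, t)
scores = [sigmoid(linear_index(beta, xi)) for xi in X]

# Overlap check: the range of scores shared by treated and control units.
e_t = [e for e, ti in zip(scores, t) if ti == 1]
e_c = [e for e, ti in zip(scores, t) if ti == 0]
common_support = (max(min(e_t), min(e_c)), min(max(e_t), max(e_c)))
print(common_support)
```

Units whose scores fall outside the common support have no comparable counterparts in the other group, so they are natural candidates for trimming.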
Matching on Propensity Scores

▶ Common algorithms: nearest neighbour, caliper.


▶ After matching, check covariate balance again.
▶ Discard poor matches if needed.
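
Nearest-neighbour matching with a caliper might look like the sketch below. It assumes greedy 1:1 matching without replacement, processes treated units in descending score order, and measures the caliper directly on the propensity-score scale; these are illustrative choices, and the scores are hypothetical.

```python
def nearest_neighbour_match(treated, control, caliper=0.1):
    """treated, control: dicts mapping unit id -> propensity score.

    Greedy 1:1 matching without replacement. A treated unit whose nearest
    available control is farther than the caliper is discarded.
    """
    available = dict(control)
    pairs = []
    for i, e_i in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        j = min(available, key=lambda c: abs(available[c] - e_i))
        if abs(available[j] - e_i) <= caliper:
            pairs.append((i, j))
            del available[j]   # without replacement
    return pairs

# Hypothetical propensity scores.
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
control = {"c1": 0.78, "c2": 0.50, "c3": 0.05, "c4": 0.62}

pairs = nearest_neighbour_match(treated, control)
print(pairs)
```

Here "t3" is dropped as a poor match: after the closer controls are used, its nearest remaining control is 0.25 away, outside the caliper. Dropping treated units changes the population the ATT refers to, so it should be reported.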
Pitfalls of PSM

▶ King & Nielsen (2019): PSM can increase imbalance.


▶ Must iteratively refine the model until balance improves.
▶ Not a substitute for good covariate selection.
Genetic Matching

▶ Uses an evolutionary algorithm to maximise covariate balance.


▶ Reduces researcher discretion and model dependence.
▶ Implemented in R’s Matching package (Sekhon 2011).
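
For reference, the distance that genetic matching searches over is usually written as a generalised Mahalanobis distance. The notation below is my reading of the standard formulation, not something stated on the slides: $S$ is the sample covariance matrix of the covariates, $S^{-1/2}$ a matrix square root of its inverse, and $W$ a diagonal matrix of non-negative weights that the evolutionary algorithm tunes to improve balance.

```latex
d(X_i, X_j) = \sqrt{ (X_i - X_j)^{\top} \left(S^{-1/2}\right)^{\top} W \, S^{-1/2} (X_i - X_j) }
```

With $W$ equal to the identity matrix this reduces to the ordinary Mahalanobis distance; the propensity score can be included as one of the columns of $X$.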
Empirical Example: NSW Training Program

▶ National Supported Work Program: randomised job training.


▶ LaLonde (1986) compared experimental and observational estimates (by using
controls from survey data) → econometric estimates did not always replicate
experimental estimates.
▶ Dehejia & Wahba (1999) used PSM on observational data. → found PSM
estimates to be close to experimental estimates.
Covariate Balance in Experimental data

Table: Means of variables by Treatment in the experimental data

treat  black  hispanic  married  nodegree  age   re74
0      0.8    0.1       0.2      0.8       25.1  2107.0
1      0.8    0.1       0.2      0.7       25.8  2095.6

Source: Author’s calculations based on NSW data


Experimental benchmark ATT: 1794.3 dollars
Covariate Balance in observational data before matching

Table: Means of variables by Treatment before matching in CPS based data

treat  black  hispanic  married  nodegree  age   re74
0      0.1    0.1       0.7      0.3       33.2  14016.8
1      0.8    0.1       0.2      0.7       25.8  2095.6

Source: Author’s calculations based on CPS data

▶ Large imbalance in key covariates (black, married, nodegree).


▶ ATT estimate: -8497.52 dollars
Covariate Balance After PSM

Table: Means of variables by Treatment after propensity score matching.

treat  black  hispanic  married  nodegree  age  re74
0      0.8    0.1       0.2      0.7       24   2273
1      0.8    0.0       0.2      0.7       26   2096

Source: Author’s calculations based on CPS data

▶ Balance improved but still imperfect (e.g. age distribution).


▶ ATT estimate: 1440 dollars.
Covariate Balance After Genetic Matching

Table: Means of variables by Treatment after genetic matching.

treat  black  hispanic  married  nodegree  age  re74
0      0.8    0.1       0.2      0.7       26   2054
1      0.8    0.1       0.2      0.7       26   2096

Source: Author’s calculations based on CPS data

▶ Better covariate balance than PSM.


▶ ATT estimate: 1970 dollars.
Key Takeaways

▶ Clarify assumptions before matching.


▶ Check and report covariate balance.
▶ Genetic Matching can outperform PSM.
▶ Matching reduces model dependence but does not cure unobserved confounding.
Synthetic Controls
How to estimate the effects of aggregate interventions?

▶ How can we estimate the effects of aggregate interventions that affect a small
number of large units?
▶ Traditional regression analysis – not well suited for infrequent policy interventions
on a small number of units.
▶ Comparative case studies – to infer the effect of an intervention by comparing the
evolution of the outcomes between the treated and a similar group not affected by
treatment.
▶ Possible when the evolution of the outcome is driven by some common factors in
both the treated and comparison group.
▶ But, how to identify units for comparison?
Synthetic controls

The synthetic control method (Abadie and Gardeazabal, 2003) formalises the selection
of comparison units using a data-driven procedure. It is based on the idea that
▶ A combination of unaffected units usually provides a more appropriate
comparison than a single unit alone.
Setup

Suppose,
▶ J + 1 units: j = 1, 2, ..., J + 1.
▶ Unit 1 is treated.
▶ Donor pool: j = 2, 3, ..., J + 1.
Suppose,
▶ Data span T periods.
▶ The first $T_0$ periods are pre-intervention.
▶ Outcome of interest: $Y_{jt}$.
▶ Set of k predictors: $X_{1j}, X_{2j}, \ldots, X_{kj}$.
▶ For each unit j, we define the potential response with intervention, $Y^1_{jt}$, and
the potential response without intervention, $Y^N_{jt}$.
Again the fundamental problem

The effect of intervention for the affected unit (unit 1):

$\tau_{1t} = Y^1_{1t} - Y^N_{1t}$, where $t > T_0$.
▶ However, we observe only the potential outcome under treatment in the
post-intervention period.

Challenge: Knowing the evolution of the treated unit in the absence of the
intervention!
Synthetic controls

Synthetic control: A weighted average of the units in the donor pool.

A J × 1 vector of weights, $W = (w_2, \ldots, w_{J+1})'$.

Synthetic control estimators of $Y^N_{1t}$ and $\tau_{1t}$ are, respectively:

$\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$

and

$\hat{\tau}_{1t} = Y_{1t} - \hat{Y}^N_{1t}$
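
Given a set of weights, the two estimators are simple to compute. The sketch below uses hypothetical outcomes for two donor units and assumes the weights have already been chosen; unit names and numbers are illustrative only.

```python
def synthetic_control(donor_outcomes, weights):
    """Weighted average of donor outcomes in every period.

    donor_outcomes: dict unit -> list of outcomes Y_jt over time.
    weights: dict unit -> w_j, non-negative and summing to one.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    n_periods = len(next(iter(donor_outcomes.values())))
    return [sum(weights[j] * donor_outcomes[j][t] for j in weights)
            for t in range(n_periods)]

def gaps(treated_outcomes, synthetic):
    """Estimated effect in each period: observed minus synthetic."""
    return [y - s for y, s in zip(treated_outcomes, synthetic)]

# Hypothetical data: four periods, intervention after the second (T0 = 2).
donors = {"unit2": [1.0, 2.0, 3.0, 4.0], "unit3": [3.0, 4.0, 5.0, 6.0]}
weights = {"unit2": 0.5, "unit3": 0.5}
y_treated = [2.0, 3.0, 3.5, 4.0]

y_synth = synthetic_control(donors, weights)
tau_hat = gaps(y_treated, y_synth)
print(y_synth, tau_hat)
```

In this toy case the gaps are zero before the intervention and negative afterwards. Only the post-intervention gaps are interpreted as effects; the pre-intervention gaps indicate how well the synthetic unit tracks the treated one.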


But, how are the weights chosen?

Choose the weights such that the synthetic control best resembles the pre-intervention
values, for the treated unit, of the predictors of the outcome variable. That is, choose
$W^* = (w_2, \ldots, w_{J+1})'$ to minimise

$\|X_1 - X_0 W\| = \left( \sum_{h=1}^{k} v_h (X_{h1} - w_2 X_{h2} - \cdots - w_{J+1} X_{hJ+1})^2 \right)^{1/2}$

subject to $w_j \geq 0$ and $w_2 + \cdots + w_{J+1} = 1$,

where $X_0$ is a k × J matrix, $X_0 = [X_2, \ldots, X_{J+1}]$, and
$v_h$ is a weight that reflects the relative importance that we assign to the $h$th variable.
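
The constrained minimisation can be sketched with projected gradient descent on the squared objective. This is a generic illustration, not the optimiser used by Abadie and Gardeazabal or by any particular package; the step size, iteration count, and toy predictor values are assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum of w_j = 1}."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        candidate = (cumsum - 1.0) / i
        if ui - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]

def fit_weights(x1, X0, v, lr=0.01, n_iter=5000):
    """Minimise sum_h v_h (x1_h - sum_j w_j X0_hj)^2 over the simplex.

    x1: k predictor values of the treated unit.
    X0: k x J nested list (rows = predictors, columns = donor units).
    v:  k non-negative predictor-importance weights.
    """
    k, J = len(X0), len(X0[0])
    w = [1.0 / J] * J   # start from equal weights
    for _ in range(n_iter):
        resid = [x1[h] - sum(w[j] * X0[h][j] for j in range(J)) for h in range(k)]
        grad = [-2.0 * sum(v[h] * resid[h] * X0[h][j] for h in range(k))
                for j in range(J)]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(J)])
    return w

# Hypothetical predictors: the treated unit equals 0.3 * donor 2 + 0.7 * donor 3.
X0 = [[1.0, 0.0, 2.0],
      [0.0, 1.0, 2.0]]
x1 = [0.3, 0.7]
w_star = fit_weights(x1, X0, v=[1.0, 1.0])
print([round(w, 4) for w in w_star])
```

Minimising the squared norm gives the same weights as minimising the norm itself. The non-negativity and adding-up constraints typically drive many weights to exactly zero, as happens to the third donor here.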
▶ The choice of $W^*$ depends on the choice of $V = (v_1, \ldots, v_k)$.
▶ Then, how to choose V?
Usually, V is chosen such that it minimizes the mean squared prediction error (MSPE)
in the pre-intervention period:

$\sum_{t=1}^{T_0} \left( Y_{1t} - \sum_{j=2}^{J+1} w_j^*(V) Y_{jt} \right)^2$
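
The pre-intervention fit criterion can be computed as in the sketch below. It divides the sum of squares by $T_0$ to make it a mean, which does not change the minimising V. The two candidate weight vectors stand in for $W^*(V)$ under two hypothetical choices of V; the outcome data are made up.

```python
def pre_period_mspe(y1, donors, w, t0):
    """Mean squared gap between treated and synthetic outcomes over the
    first t0 periods.

    y1: outcomes of the treated unit; donors: list of J outcome series;
    w: list of J weights.
    """
    total = 0.0
    for t in range(t0):
        synth = sum(w_j * y_j[t] for w_j, y_j in zip(w, donors))
        total += (y1[t] - synth) ** 2
    return total / t0

# Hypothetical data: four periods, the first two are pre-intervention.
y1 = [2.0, 3.0, 3.5, 4.0]
donors = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]

# Candidate weights, each imagined as W*(V) for a different V.
candidates = {"V_a": [0.5, 0.5], "V_b": [1.0, 0.0]}
mspe = {name: pre_period_mspe(y1, donors, w, t0=2) for name, w in candidates.items()}
best_v = min(mspe, key=mspe.get)
print(mspe, best_v)
```

A full implementation would nest the weight optimisation inside a search over V; this sketch only shows the outer comparison.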
The Economic Costs of Conflict: A Case Study of the
Basque Country (Abadie and Gardeazabal, 2003)
▶ One of the richest regions in Spain in the 1970s (3rd highest per capita GDP).
▶ By the end of the conflict in 1990 – 6th position in per capita GDP.
▶ Analysis is not straightforward due to contamination by the economic downturn in
the second half of the 1970s and the first half of the 1980s.
▶ This period also coincided with the peak of terrorist activity.
▶ Simple comparison with other regions is also difficult because of pre-existing
differences in the characteristics driving the economy.

Use a combination of other Spanish regions to construct a “synthetic” control
region which resembles relevant economic characteristics of the Basque Country
before the outset of Basque political terrorism in the late 1960s.
Real and synthetic Basque

[Figure: real per-capita GDP (1986 USD, thousand) by year, 1960 to 1990, for the Basque Country and the synthetic Basque Country]
Real per capita GDP gaps

[Figure: gaps (treated − synthetic) in real per-capita GDP (1986 USD, thousand) by year, 1960 to 1990]
Thank you!
