Introduction to Matching Methods
M Rahul
Institute of Economic Growth, Delhi
25 September 2025
Matching
Fundamental problem of causal inference
▶ Causal (treatment) effect for unit $i$: $Y_i(1) - Y_i(0)$.
▶ However, we do not observe both potential outcomes for an individual unit.
Motivation: The Counterfactual Problem
▶ We often want the effect of a treatment/program on an outcome.
▶ We cannot observe both potential outcomes for the same unit (the “fundamental problem of causal inference”).
Neyman–Rubin Causal Model
▶ Potential outcomes: $Y_i(1)$ (treated), $Y_i(0)$ (control).
▶ Observed outcome: $Y_i^{obs} = T_i Y_i(1) + (1 - T_i) Y_i(0)$.
▶ Average treatment effect: $\tau = E[Y_i(1) - Y_i(0)]$.
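A small simulation can make this notation concrete. It is a sketch on invented data: only in a simulation can we generate both potential outcomes for every unit, reveal one of them through $Y_i^{obs} = T_i Y_i(1) + (1 - T_i) Y_i(0)$, and compare the true $\tau$ with the difference in means under randomised assignment. The effect size, noise levels and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes: in a simulation we can see both of them.
y0 = rng.normal(10.0, 2.0, size=n)           # Y_i(0)
y1 = y0 + 1.5 + rng.normal(0.0, 1.0, n)      # Y_i(1); unit effects average 1.5

# Randomised treatment assignment.
t = rng.integers(0, 2, size=n)               # T_i in {0, 1}

# Observed outcome: only one potential outcome is revealed per unit.
y_obs = t * y1 + (1 - t) * y0

true_ate = np.mean(y1 - y0)                  # tau = E[Y(1) - Y(0)]; infeasible with real data
diff_in_means = y_obs[t == 1].mean() - y_obs[t == 0].mean()

print(f"true ATE: {true_ate:.3f}, difference in means: {diff_in_means:.3f}")
```

Because assignment is random, the difference in observed means recovers $\tau$ up to sampling noise, even though no unit's individual effect is ever observed.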
Is smoking pipes/cigars more dangerous than cigarettes?
Table: Death rates (Cochran, 1968)
Smoking group   Canada   UK     US
Non-smokers     20.2     11.3   13.5
Cigarettes      20.5     14.1   13.5
Cigars/pipes    35.5     20.7   17.4

Table: Mean age in years
Smoking group   Canada   UK     US
Non-smokers     54.9     49.1   57.0
Cigarettes      50.5     49.8   53.2
Cigars/pipes    65.9     55.7   59.7
▶ We were not comparing “apples to apples”!
▶ We need to compare cigarette smokers with others in similar age groups.
▶ In real settings there may be many background variables, and the number of subclasses grows rapidly with each one, so subclassification soon becomes infeasible: the “curse of dimensionality”.
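Subclassification on a single variable can be sketched as below. The person-years and death rates are invented for illustration (they are not Cochran's data): death rates are computed within age strata and then averaged using the age distribution of a reference group, assumed here to be the non-smokers. In this made-up example the death rate depends only on age.

```python
import numpy as np

# Hypothetical data (NOT Cochran's): person-years in three age strata (young, middle, old).
person_years = {
    "non-smokers":  np.array([40_000, 40_000, 20_000]),
    "cigars/pipes": np.array([10_000, 30_000, 60_000]),
}
# Invented death rates per 1,000 person-years that depend only on age, not on smoking.
rate_by_age = np.array([5.0, 15.0, 40.0])
deaths = {g: py * rate_by_age / 1000 for g, py in person_years.items()}

def crude_rate(group):
    """Deaths per 1,000 person-years, ignoring age."""
    return 1000 * deaths[group].sum() / person_years[group].sum()

def age_adjusted_rate(group, reference="non-smokers"):
    """Subclassification: stratum-specific rates weighted by the reference group's age mix."""
    stratum_rates = 1000 * deaths[group] / person_years[group]
    weights = person_years[reference] / person_years[reference].sum()
    return float(np.sum(weights * stratum_rates))

for g in person_years:
    print(f"{g:13s} crude: {crude_rate(g):5.1f}  age-adjusted: {age_adjusted_rate(g):5.1f}")
```

The crude rates differ only because cigar/pipe smokers are older here; after reweighting to a common age distribution the gap disappears. With several covariates each one multiplies the number of strata, which is where subclassification breaks down.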
Causal Graphs and Backdoor Paths
[Figure: causal graph in which X affects both T and Y, and T affects Y]
▶ Covariate X confounds the effect of T on Y: it opens the backdoor path T ← X → Y.
▶ Randomised experiment: randomisation makes the treated and control groups similar, in expectation, in terms of all covariates, both observed and unobserved.
▶ Matching: tries to replicate this for observed covariates in observational data.
“any method that aims to equate (or ‘balance’) the distribution of covariates
in the treated and control groups.” (Stuart, 2010)
▶ If X includes all confounders, matching on X closes the backdoor path.
Matching reduces model dependence
Matching Basics
Four steps in matching (Stuart, 2010):
▶ Having chosen the matching variables on the basis of our causal assumptions, define a distance measure.
▶ Identify matches using that distance measure.
▶ Assess the quality of the matched sample (e.g. covariate balance). If it is unsatisfactory, repeat the steps above until a satisfactory matched sample is obtained.
▶ Estimate the treatment effect using the final matched sample.
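The four steps can be sketched on simulated data. This is an illustrative sketch, not Stuart's (2010) code: the distance is the Mahalanobis distance, matching is 1:1 nearest neighbour with replacement, balance is summarised by standardised mean differences (SMDs), and the ATT is the mean treated-minus-matched-control outcome difference. The data-generating process and all function names are assumptions, and the "repeat until satisfactory" loop is left to the analyst.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated observational data (assumed): X drives both treatment and outcome.
n = 2000
X = rng.normal(size=(n, 2))
p_treat = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] - 1)))
T = rng.random(n) < p_treat
Y = 2.0 * T + X[:, 0] + X[:, 1] + rng.normal(size=n)  # true effect = 2

def mahalanobis_nn(X, T):
    """Steps 1-2: Mahalanobis distance, then 1:1 nearest-neighbour matching with replacement."""
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X[T][:, None, :] - X[~T][None, :, :]           # (n_treated, n_control, k)
    d2 = np.einsum("tck,kl,tcl->tc", diff, S_inv, diff)  # squared distances
    return np.flatnonzero(~T)[d2.argmin(axis=1)]         # row index of each matched control

def smd(X, treated_idx, control_idx):
    """Step 3: standardised mean differences, scaled by the treated-group SD."""
    xt, xc = X[treated_idx], X[control_idx]
    return (xt.mean(axis=0) - xc.mean(axis=0)) / xt.std(axis=0, ddof=1)

treated_idx = np.flatnonzero(T)
matched_idx = mahalanobis_nn(X, T)
smd_before = smd(X, treated_idx, np.flatnonzero(~T))
smd_after = smd(X, treated_idx, matched_idx)
print("SMD before:", smd_before.round(3), "after:", smd_after.round(3))

# Step 4: ATT as the mean treated-minus-matched-control outcome difference.
att = np.mean(Y[treated_idx] - Y[matched_idx])
print(f"ATT estimate: {att:.2f} (true effect 2.0)")
```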
Key Assumptions for Identification
1. Unconfoundedness: $(Y(1), Y(0)) \perp T \mid X$.
2. Overlap: $0 < \Pr(T = 1 \mid X) < 1$.
3. No interference between units.
Exact Matching
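▶ Two observations are matched only if they have the same value of every covariate. A simple matching estimator of the ATT:
$$\hat{\delta}_{ATT} = \frac{1}{N_T} \sum_{i:\,T_i = 1} \big( Y_i - Y_{j(i)} \big)$$
where $N_T$ is the number of treated units and $j(i)$ is the control unit matched to treated unit $i$.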
▶ Simple but rarely feasible with many covariates.
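A minimal sketch of exact matching and the ATT estimator above, assuming discrete covariates. When a treated unit has several exact matches, their outcomes are averaged to form $Y_{j(i)}$ (one common convention, assumed here); treated units with no exact match are dropped, which changes the estimand to the ATT for the matched treated units. The toy records are invented.

```python
from collections import defaultdict
from statistics import mean

# Invented toy data: (treated, covariates, outcome); covariates must be discrete.
data = [
    (1, ("female", "degree"), 12.0),
    (1, ("male", "no degree"), 8.0),
    (1, ("male", "degree"), 11.0),   # no control shares these covariates
    (0, ("female", "degree"), 10.0),
    (0, ("female", "degree"), 11.0),
    (0, ("male", "no degree"), 7.0),
    (0, ("female", "no degree"), 6.0),
]

def exact_match_att(rows):
    """Mean over matched treated units of Y_i minus the mean outcome of its exact-match controls."""
    controls = defaultdict(list)
    for treated, x, y in rows:
        if not treated:
            controls[x].append(y)
    diffs, unmatched = [], 0
    for treated, x, y in rows:
        if treated:
            if controls[x]:
                diffs.append(y - mean(controls[x]))
            else:
                unmatched += 1
    return mean(diffs), unmatched

att, n_dropped = exact_match_att(data)
print(att, n_dropped)  # → 1.25 1
```

With only two binary covariates one treated unit already lacks an exact match; each additional covariate makes such gaps more common.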
Idea of Propensity Score Matching
▶ Propensity score: conditional probability of treatment given covariates.
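▶ $e(X_i) = \Pr(T_i = 1 \mid X_i)$.
▶ Rosenbaum & Rubin (1983): the propensity score is a balancing score, i.e. conditional on $e(X)$, the covariates $X$ have the same distribution in the treated and control groups.
▶ Instead of matching on many covariates $X$, match on the single score $e(X_i)$.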
Estimating Propensity Scores
▶ Estimate using logit or probit regression.
▶ Choose pre-treatment covariates based on causal reasoning.
▶ Check overlap of scores between treated and control.
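A minimal sketch of these steps: a logit model fitted by Newton-Raphson in NumPy, followed by an overlap check. The simulated covariates and coefficients are assumptions, and in practice one would use a statistics package for the regression. The common-support rule at the end (keep units whose score lies inside both groups' ranges) is one simple convention, assumed here.

```python
import numpy as np

def fit_logit(X, t, n_iter=25):
    """Logistic regression of t on X (intercept added), fitted by Newton-Raphson."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-Z @ beta))
        grad = Z.T @ (t - p)                          # score of the log-likelihood
        hess = Z.T @ (Z * (p * (1 - p))[:, None])     # Z' W Z
        beta += np.linalg.solve(hess, grad)
    return beta, 1 / (1 + np.exp(-Z @ beta))          # coefficients, fitted scores

rng = np.random.default_rng(2)
n = 5000
X = rng.normal(size=(n, 2))                           # assumed pre-treatment covariates
true_beta = np.array([-1.0, 1.0, 0.5])
t = rng.random(n) < 1 / (1 + np.exp(-(true_beta[0] + X @ true_beta[1:])))

beta_hat, e_hat = fit_logit(X, t.astype(float))
print("estimated coefficients:", beta_hat.round(2))

# Overlap check: compare the range of estimated scores in each group.
for label, mask in [("treated", t), ("control", ~t)]:
    print(f"{label}: min {e_hat[mask].min():.3f}, max {e_hat[mask].max():.3f}")

lo = max(e_hat[t].min(), e_hat[~t].min())
hi = min(e_hat[t].max(), e_hat[~t].max())
support = (e_hat >= lo) & (e_hat <= hi)
print(f"units on common support: {support.sum()} of {n}")
```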
Matching on Propensity Scores
▶ Common algorithms: nearest neighbour, caliper.
▶ After matching, check covariate balance again.
▶ Discard poor matches if needed.
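A sketch of greedy 1:1 nearest-neighbour matching on the logit of the propensity score, without replacement and with a caliper. The caliper of 0.2 standard deviations of the logit score is a rule of thumb often used in applied work, assumed here rather than taken from these slides; treated units with no control inside the caliper are discarded. The score is a simulated stand-in for the output of a fitted logit model, and the function name is hypothetical.

```python
import numpy as np

def caliper_nn_match(score, treated, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement on a scalar score.

    Returns (treated_index, control_index) pairs. Treated units whose nearest
    still-available control lies outside the caliper are left unmatched.
    """
    caliper = caliper_sd * score.std(ddof=1)
    available = set(np.flatnonzero(~treated).tolist())
    pairs = []
    # Match the highest-score treated units first, since they tend to be hardest to match.
    for i in sorted(np.flatnonzero(treated).tolist(), key=lambda k: -score[k]):
        if not available:
            break
        cands = np.fromiter(available, dtype=int)
        j = int(cands[np.argmin(np.abs(score[cands] - score[i]))])
        if abs(score[j] - score[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
treated = rng.random(n) < 1 / (1 + np.exp(-(x - 1)))   # assumed selection on x
logit_score = x - 1                                     # stand-in for a fitted logit score
y = 1.0 * treated + 2.0 * x + rng.normal(size=n)        # assumed true effect = 1

pairs = caliper_nn_match(logit_score, treated)
ti, ci = (np.array(idx) for idx in zip(*pairs))
att = np.mean(y[ti] - y[ci])
print(f"matched {len(pairs)} of {treated.sum()} treated units; ATT estimate {att:.2f}")
```

Matching without replacement keeps each control in at most one pair; the price is that the order in which treated units are processed affects the result.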
Pitfalls of PSM
▶ King & Nielsen (2019): PSM can increase imbalance.
▶ The propensity score model must be refined iteratively until balance is acceptable.
▶ Not a substitute for good covariate selection.
Genetic Matching
▶ Uses an evolutionary algorithm to maximise covariate balance.
▶ Reduces researcher discretion and model dependence.
▶ Implemented in R’s Matching package (Sekhon, 2011).
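The idea behind genetic matching can be illustrated with a toy evolutionary search. This is a simplified sketch, not Sekhon's GenMatch algorithm: it searches over diagonal covariate weights in a weighted Euclidean distance on standardised covariates, matches 1:1 nearest neighbour with replacement, and scores each candidate by its largest absolute standardised mean difference. GenMatch instead uses a generalised Mahalanobis distance and balance tests such as t-tests and Kolmogorov-Smirnov tests. The population size, mutation scale and simulated data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data (assumed): treatment depends on the first two of three covariates.
n = 1000
X = rng.normal(size=(n, 3))
T = rng.random(n) < 1 / (1 + np.exp(-(1.5 * X[:, 0] - X[:, 1] - 1)))
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise so weights are comparable
Zt, Zc = Z[T], Z[~T]

def match(weights):
    """1:1 nearest-neighbour matching (with replacement) under a weighted Euclidean distance."""
    d2 = (((Zt[:, None, :] - Zc[None, :, :]) ** 2) * weights).sum(axis=2)
    return d2.argmin(axis=1)

def imbalance(weights):
    """Loss to minimise: the largest absolute standardised mean difference after matching."""
    matched = Zc[match(weights)]
    smd = (Zt.mean(axis=0) - matched.mean(axis=0)) / Zt.std(axis=0, ddof=1)
    return np.abs(smd).max()

# Toy evolutionary loop: keep the 5 best weight vectors, mutate copies of them, repeat.
pop = rng.uniform(0.1, 10.0, size=(20, 3))
pop[0] = 1.0                               # equal weights as a baseline candidate
baseline = imbalance(pop[0])
for _ in range(15):
    losses = np.array([imbalance(w) for w in pop])
    elite = pop[np.argsort(losses)[:5]]
    parents = elite[rng.integers(0, 5, size=15)]
    children = parents * np.exp(rng.normal(0.0, 0.3, size=(15, 3)))  # multiplicative mutation
    pop = np.vstack([elite, children])

losses = np.array([imbalance(w) for w in pop])
best = pop[losses.argmin()]
print(f"max |SMD|: equal weights {baseline:.3f}, evolved weights {losses.min():.3f}")
print("evolved covariate weights:", best.round(2))
```

Because the best candidates are always carried forward, the final imbalance can never be worse than that of the equal-weight baseline; the search replaces the researcher's manual "re-specify and re-check" loop.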
Empirical Example: NSW Training Program
▶ National Supported Work Program: randomised job training.
▶ LaLonde (1986) replaced the experimental controls with comparison groups drawn from survey data and compared the resulting estimates with the experimental ones → the econometric estimates often failed to replicate the experimental benchmark.
▶ Dehejia & Wahba (1999) applied PSM to these observational data → they found PSM estimates close to the experimental benchmark.
Covariate Balance in Experimental data
Table: Means of variables by treatment status in the experimental data
treat  black  hispanic  married  nodegree  age   re74
0      0.8    0.1       0.2      0.8       25.1  2107.0
1      0.8    0.1       0.2      0.7       25.8  2095.6
Source: Author’s calculations based on NSW data
Note: treat = 1 for treated and 0 for control units; re74 = earnings in 1974 (dollars).
Experimental benchmark ATT: 1794.3 dollars
Covariate Balance in observational data before matching
Table: Means of variables by treatment status before matching in CPS-based data
treat  black  hispanic  married  nodegree  age   re74
0      0.1    0.1       0.7      0.3       33.2  14016.8
1      0.8    0.1       0.2      0.7       25.8  2095.6
Source: Author’s calculations based on CPS data
▶ Large imbalance in key covariates (black, married, nodegree).
▶ Unadjusted ATT estimate (difference in means): −8497.52 dollars.
Covariate Balance After PSM
Table: Means of variables by treatment status after propensity score matching
treat  black  hispanic  married  nodegree  age   re74
0      0.8    0.1       0.2      0.7       24    2273
1      0.8    0.0       0.2      0.7       26    2096
Source: Author’s calculations based on CPS data
▶ Balance improved but still imperfect (e.g. age distribution).
▶ ATT estimate: 1440 dollars.
Covariate Balance After Genetic Matching
Table: Means of variables by treatment status after genetic matching
treat  black  hispanic  married  nodegree  age   re74
0      0.8    0.1       0.2      0.7       26    2054
1      0.8    0.1       0.2      0.7       26    2096
Source: Author’s calculations based on CPS data
▶ Better covariate balance than PSM.
▶ ATT estimate: 1970 dollars.
Key Takeaways
▶ Clarify assumptions before matching.
▶ Check and report covariate balance.
▶ Genetic Matching can outperform PSM.
▶ Matching reduces model dependence but does not cure unobserved confounding.
Synthetic Controls
How to estimate the effects of aggregate interventions?
▶ How can we estimate the effects of aggregate interventions, i.e. interventions affecting a small number of large units?
▶ Traditional regression analysis is not well suited to infrequent policy interventions on a small number of units.
▶ Comparative case studies infer the effect of an intervention by comparing the evolution of the outcome for the treated unit with that of a similar group of units not exposed to the treatment.
▶ This works when the outcomes of both the treated and the comparison units are driven by common factors.
▶ But how should the comparison units be chosen?
Synthetic controls
The synthetic control method (Abadie and Gardeazabal, 2003) formalises the selection of comparison units using a data-driven procedure. It is based on the idea that
▶ a combination of unaffected units usually provides a more appropriate comparison than a single unit alone.
Setup
Suppose,
▶ J + 1 units: j = 1, 2, ..., J + 1.
▶ Unit 1 is treated.
▶ Donor pool: j = 2, 3, ..., J + 1.
▶ Data spans T periods.
▶ The first $T_0$ periods are pre-intervention.
▶ Outcome of interest: $Y_{jt}$.
▶ Set of $k$ predictors: $X_{1j}, X_{2j}, \dots, X_{kj}$.
▶ For each unit $j$, define the potential response with intervention, $Y_{jt}^{1}$, and the potential response without intervention, $Y_{jt}^{N}$.
Again the fundamental problem
The effect of the intervention on the affected unit (unit 1) is
$$\tau_{1t} = Y_{1t}^{1} - Y_{1t}^{N}, \quad t > T_0.$$
▶ However, for $t > T_0$ we observe only the potential outcome under treatment, $Y_{1t} = Y_{1t}^{1}$.
Challenge: estimating how the treated unit would have evolved in the absence of the intervention!
Synthetic controls
Synthetic control: a weighted average of the units in the donor pool, with a $J \times 1$ vector of weights $W = (w_2, \dots, w_{J+1})'$.
The synthetic control estimators of $Y_{1t}^{N}$ and $\tau_{1t}$ are, respectively,
$$\hat{Y}_{1t}^{N} = \sum_{j=2}^{J+1} w_j Y_{jt} \quad \text{and} \quad \hat{\tau}_{1t} = Y_{1t} - \hat{Y}_{1t}^{N}.$$
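Given a weight vector, the two estimators are a matrix-vector product and a subtraction. The outcome paths and the weights below are invented numbers, chosen so that the synthetic control reproduces the treated unit exactly before the intervention.

```python
import numpy as np

# Invented example: 3 donor units (j = 2, 3, 4), 6 periods, intervention after T0 = 4.
T0 = 4
Y_treated = np.array([5.0, 5.2, 5.4, 5.6, 5.3, 5.1])   # Y_1t
Y_donors = np.array([                                   # rows: periods t; columns: donors j
    [4.0, 6.0, 5.5],
    [4.2, 6.2, 5.7],
    [4.4, 6.4, 5.9],
    [4.6, 6.6, 6.1],
    [4.8, 6.8, 6.3],
    [5.0, 7.0, 6.5],
])
W = np.array([0.5, 0.5, 0.0])   # assumed weights: non-negative and summing to one

Y_synth = Y_donors @ W          # hat{Y}^N_{1t} = sum_j w_j Y_jt
tau_hat = Y_treated - Y_synth   # hat{tau}_{1t} = Y_1t - hat{Y}^N_{1t}
print("pre-intervention gaps: ", tau_hat[:T0].round(3))
print("post-intervention effects:", tau_hat[T0:].round(3))
```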
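But how are the weights chosen?
Choose the weights so that the synthetic control best reproduces the treated unit's pre-intervention values of the predictors of the outcome. That is, choose $W^* = (w_2^*, \dots, w_{J+1}^*)'$ that minimises
$$\lVert X_1 - X_0 W \rVert = \Big( \sum_{h=1}^{k} v_h \big( X_{h1} - w_2 X_{h2} - \dots - w_{J+1} X_{h,J+1} \big)^2 \Big)^{1/2}$$
subject to $w_j \ge 0$ and $w_2 + \dots + w_{J+1} = 1$,
where $X_1 = (X_{11}, \dots, X_{k1})'$ is the $k \times 1$ vector of predictor values for the treated unit, $X_0 = [X_2, \dots, X_{J+1}]$ is the $k \times J$ matrix whose columns hold the predictor values of the donor units, and $v_h$ is a weight reflecting the relative importance we assign to the $h$th predictor.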
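For a fixed $V$ this is a small constrained least-squares problem. The sketch below solves it with SciPy's SLSQP optimiser (an assumed choice; dedicated synthetic control packages use their own solvers) and minimises the squared norm, which has the same minimiser as the norm itself. The function name and the predictor matrix are hypothetical; the example is built so that the treated unit is exactly an equal mix of the first two donors.

```python
import numpy as np
from scipy.optimize import minimize

def synth_weights(X1, X0, v):
    """Minimise sum_h v_h (X_h1 - sum_j w_j X_hj)^2 subject to w_j >= 0, sum_j w_j = 1.

    X1: (k,) predictors of the treated unit; X0: (k, J) predictors of the donors;
    v: (k,) non-negative predictor weights v_1, ..., v_k.
    """
    J = X0.shape[1]
    loss = lambda w: float(np.sum(v * (X1 - X0 @ w) ** 2))
    res = minimize(
        loss,
        x0=np.full(J, 1.0 / J),                     # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,                    # w_j >= 0 (and hence <= 1)
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        options={"ftol": 1e-12},
    )
    return res.x

# Invented predictors: k = 2 predictors (rows), J = 3 donors (columns).
X1 = np.array([3.0, 1.0])
X0 = np.array([[2.0, 4.0, 10.0],
               [0.0, 2.0, 5.0]])
w = synth_weights(X1, X0, v=np.array([1.0, 1.0]))
print(w.round(3))
```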
▶ The choice of $W^*$ depends on the choice of $V = (v_1, \dots, v_k)$, so we write $W^*(V)$.
▶ How, then, should $V$ be chosen?
Usually, $V$ is chosen to minimise the mean squared prediction error (MSPE) of the synthetic control in the pre-intervention period:
$$\sum_{t=1}^{T_0} \Big( Y_{1t} - \sum_{j=2}^{J+1} w_j^*(V) \, Y_{jt} \Big)^2$$
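The nested problem can be sketched by searching over $V$ and, for each candidate, solving the inner problem for $W^*(V)$. This sketch restricts the $v_h$ to be non-negative and sum to one and uses a simple random search over such vectors (an assumption; implementations typically use numerical optimisers for this outer step). The inner solver is repeated so that the block runs on its own, and all data are simulated.

```python
import numpy as np
from scipy.optimize import minimize

def synth_weights(X1, X0, v):
    """Inner problem: W*(V) on the simplex for predictor weights v."""
    J = X0.shape[1]
    res = minimize(
        lambda w: float(np.sum(v * (X1 - X0 @ w) ** 2)),
        x0=np.full(J, 1.0 / J),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        options={"ftol": 1e-12},
    )
    return res.x

def pre_mspe(v, X1, X0, Y1_pre, Yd_pre):
    """Outer loss: mean squared pre-intervention gap between unit 1 and its synthetic control."""
    w = synth_weights(X1, X0, v)
    return float(np.mean((Y1_pre - Yd_pre @ w) ** 2)), w

rng = np.random.default_rng(5)
J, k, T0 = 8, 3, 10
X0 = rng.normal(size=(k, J))                       # donor predictors (simulated)
w_true = np.zeros(J)
w_true[:3] = [0.5, 0.3, 0.2]
X1 = X0 @ w_true + rng.normal(0, 0.05, size=k)     # treated unit close to a mix of 3 donors
Yd_pre = rng.normal(size=(T0, J)).cumsum(axis=0)   # donor outcomes Y_jt, t = 1..T0
Y1_pre = Yd_pre @ w_true + rng.normal(0, 0.1, size=T0)

best_loss, best_v, best_w = np.inf, None, None
for v in rng.dirichlet(np.ones(k), size=50):       # candidate predictor weights
    loss, w = pre_mspe(v, X1, X0, Y1_pre, Yd_pre)
    if loss < best_loss:
        best_loss, best_v, best_w = loss, v, w

print("chosen v:", best_v.round(3))
print("W*(V):   ", best_w.round(3))
print(f"pre-intervention MSPE: {best_loss:.4f}")
```

Dividing by $T_0$ (a true mean) does not change which $V$ is chosen, so this loss has the same minimiser as the sum above.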
The Economic Costs of Conflict: A Case Study of the Basque Country (Abadie and Gardeazabal, 2003)
▶ One of the richest regions in Spain in the 1970s (3rd highest per capita GDP).
▶ By 1990 it had fallen to 6th position in per capita GDP.
▶ The analysis is complicated by the economic downturn of the second half of the 1970s and the first half of the 1980s, which contaminates simple before/after comparisons.
▶ This period also coincided with the peak of terrorist activity.
▶ A simple comparison with other regions is also difficult because of pre-existing differences in the characteristics that drive the economy.
Use a combination of other Spanish regions to construct a “synthetic” control region that resembles the relevant economic characteristics of the Basque Country before the outset of Basque political terrorism in the late 1960s.
Real and synthetic Basque
[Figure: real per-capita GDP (1986 USD, thousand) by year, 1960 to 1990; series: Basque country, Synthetic Basque country]
Real per capita GDP gaps
[Figure: Gaps: Treated − Synthetic; real per-capita GDP (1986 USD, thousand) by year, 1960 to 1990]
Thank you!