Section 4: Matching Estimators
• Key Idea:
The matching method compares the outcomes of
program participants with those of matched
nonparticipants, where matches are chosen on the basis
of similarity in observed characteristics.
• Main advantage of matching estimators:
they typically do not require specifying a functional form
of the outcome equation and are therefore not
susceptible to misspecification bias along that
dimension.
Assumptions of Matching Approach
• Assume you have access to data on treated and
untreated individuals (D=1 and D=0)
• Assume you also have access to a set of Z variables
whose distribution is not affected by D:
F(Z|D,Y1,Y0) = F(Z|Y1,Y0)
(why this is necessary will be explained in a few slides)
Assumptions of Matching Approach
1. Selection on Observables (Unconfoundedness Assumption)
There exists a set of observed characteristics Z such that
outcomes are independent of program participation conditional on Z:
(Y0, Y1) ⊥ D | Z
(i.e. treatment assignment is “strictly ignorable” given Z,
Rosenbaum/Rubin (1983)).
2. Common Support Assumption
0 < P(D=1|Z) < 1
Assumption 2 is required so that matches for D=0 and D=1
observations can be found.
Implication of Assumptions
• If Assumptions 1 and 2 are satisfied, then the problem of determining
mean program impact can be solved by substituting the Y0 distribution
observed for “matched-on-Z non-participants” for the missing Y0
distribution of participants.
• To justify assumption 1, individuals cannot select into the program
based on anticipated treatment impact
• Assumption 1 implies:
F(Y0|Z,D=1) = F(Y0|Z,D=0) and F(Y1|Z,D=1) = F(Y1|Z,D=0)
• Under these assumptions, one can estimate the ATE, TTE and UTE
(the average treatment effect, the effect on the treated and the effect
on the untreated)
Weaker assumptions for TTE
• If interest centers on the TTE, assumptions 1 and 2 can be
slightly relaxed:
1. The following weaker conditional mean independence
assumption on Y0 suffices:
E(Y0|Z,D=1) = E(Y0|Z,D=0)
2. Only the following support condition is necessary:
P(D=1|Z) < 1
(P(D=1|Z) > 0 is not required, because it is only needed to
guarantee a participant analogue for each non-participant)
The weaker assumptions for the TTE allow selection
into the program to depend on Y1, but not on Y0.
Estimation of the TTE using the Matching
Approach
• Under these assumptions, the mean impact of the program on
program participants can be written as:
TTE = E(Y1 − Y0 | D=1)
= E(Y1 | D=1) − E(Y0 | D=1)
= E(Y1 | D=1) − E[ E(Y0 | Z, D=1) | D=1 ]
= E(Y1 | D=1) − E[ E(Y0 | Z, D=0) | D=1 ]
(using the Law of Iterated Expectations in the third line and the
conditional mean independence assumption stated before in the fourth)
• Here we can illustrate why the assumption is needed that the
distribution of the matching variables, Z, is not affected by whether
the treatment is received.
Assumption about the distribution of
matching variables Z
• Assumption: The distribution of the matching variables, Z, is not
affected by whether the treatment is received (see slide 3).
• In the derivation of treatment effects, e.g. of the TTE (see the slide
before), we make use of this assumption as follows:
E(Y0 | D=1) = ∫ E(Y0 | Z=z, D=1) f(z|D=1) dz = ∫ E(Y0 | Z=z, D=0) f(z|D=1) dz
• This expression uses the conditional density f(z|D=1) to
represent the density that would also have been observed in the no
treatment (D=0) state, which rules out the possibility that receipt of
treatment changes the density of Z.
• Examples: age, gender and race would generally be valid matching
variables, but marital status may not be if it were directly affected by
the receipt of the program.
Matching Estimator
• A prototypical matching estimator for the TTE takes the form (n1 is
the number of observations in the treatment group):
Δ̂_TTE = (1/n1) Σ_{i∈I1} [ Y1i − Ê(Y0i | Di=1, Zi) ]
• where Ê(Y0i | Di=1, Zi) is an estimator for the matched “no
treatment” outcome
• Recall that Assumption 1 implies:
E(Y0 | Z, D=1) = E(Y0 | Z, D=0)
How does matching compare to a
randomized experiment?
• The distribution of observables among the matched controls will be the
same as in the treatment group
• However, the distribution of unobservables is not necessarily balanced
across groups
• An experiment has full support, but with matching there can be a failure of
the common support condition (assumption 2):
if there are regions where the support of Z does not overlap for the D=0
and D=1 groups, then matching is only justified when performed over the
region of common support, i.e. the estimated treatment effect must be
defined conditionally on the region of overlap
Implementing Matching Estimators
• Problems:
– How to construct a match when Z is of high dimension
– What to do if P(D=1|Z)=1 for some Z (violation of common
support assumption (A2))
– How to choose set of Z variables
Propensity Score Matching
• Matching estimators difficult to implement when set of conditioning
variables Z is large (small cell problems) or Z continuous (“curse of
dimensionality”)
Rosenbaum and Rubin theorem (1983):
Show that
(Y0, Y1) ⊥ D | Z
implies
(Y0, Y1) ⊥ D | P(Z), where P(Z) = P(D=1|Z)
This reduces the matching problem to a univariate problem, provided
P(D=1|Z) (the “propensity score”) can be parametrically estimated
Proof of Rosenbaum/Rubin Theorem
• Show that
E(D|Y,Z)=E(D|Z) implies E{D|Y,P(Z)}= E{D|P(Z)}
• Let P(Z)=P(D=1|Z) and note that P(D=1|Z)=E(D|Z)
• E{D|Y,P(Z)}= E{ E(D|Y,Z) |Y, P(Z)} [Law of Iterated Expectations]
= E{ E(D|Z) |Y, P(Z)} [assumption 1 of matching est.]
= E{ P(Z) |Y, P(Z)}
= P(Z)
= E{ D | P(Z)}
Implementation of the
Propensity Score Matching Estimator
Step 1: Estimate a model of program participation, i.e. estimate the
propensity score P(Z) for each person
Step 2: Select matches based on the estimated propensity score and
average the resulting impact estimates over the n1 observations in the
treatment group (see the prototypical estimator below)
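A minimal sketch of these two steps in Python. The DataFrame df with outcome column y, treatment indicator d and covariates z1, z2 is hypothetical, and the logit specification and single-nearest-neighbor matching rule are illustrative choices, not the only valid ones.

import numpy as np
import statsmodels.api as sm

def psm_tte(df, outcome="y", treat="d", covars=("z1", "z2")):
    # Step 1: estimate the propensity score P(D=1|Z) with a logit model
    X = sm.add_constant(df[list(covars)])
    pscore = np.asarray(sm.Logit(df[treat], X).fit(disp=0).predict(X))

    d = df[treat].to_numpy() == 1
    y = df[outcome].to_numpy()
    p1, p0 = pscore[d], pscore[~d]

    # Step 2: match each treated unit to the nearest control on the
    # estimated propensity score (with replacement), then average Y1 - Y0hat
    j = np.abs(p1[:, None] - p0[None, :]).argmin(axis=1)
    return np.mean(y[d] - y[~d][j])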
Propensity Score Matching Methods
• For notational simplicity, let P=P(Z)
• A prototypical propensity score matching estimator for the TTE
takes the form:
Δ̂_TTE = (1/n1) Σ_{i∈I1∩Sp} [ Y1i − Ê(Y0i | Di=1, Pi) ]
with
Ê(Y0i | Di=1, Pi) = Σ_{j∈I0} W(i,j) Y0j
where I1 denotes the set of program participants, I0 the set of non-
participants, Sp the region of common support (defined on next slide),
and n1 is the number of persons in the set I1∩Sp
The match for each participant is constructed as a
weighted average over the outcomes of non-participants, where the
weights W(i,j) depend on the distance between Pi and Pj
Common Support Condition
• The common support region can be estimated by
Ŝp = { P : f̂(P|D=1) > 0 and f̂(P|D=0) > 0 }
where f̂(P|D=1) and f̂(P|D=0) are standard nonparametric
density estimators.
• To ensure that the densities are strictly greater than zero, it is in
addition required that they exceed zero by a certain amount,
determined using a “trimming level” q.
• The common support condition ensures that matches for D=1 and
D=0 can be found.
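A rough sketch of this trimming step, assuming NumPy arrays p (estimated propensity scores) and d (treatment indicator). Dropping the q lowest-density points is one plausible reading of the trimming rule, not the exact procedure of Heckman, Ichimura and Todd.

import numpy as np
from scipy.stats import gaussian_kde

def common_support(p, d, q=0.02):
    # Nonparametric estimates of f(P|D=1) and f(P|D=0)
    f1 = gaussian_kde(p[d == 1])
    f0 = gaussian_kde(p[d == 0])
    dens = np.minimum(f1(p), f0(p))
    # Keep only points whose smaller density is above the q-quantile cutoff
    return dens > np.quantile(dens, q)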
Cross-sectional matching methods:
Alternative ways of constructing matched outcomes
• Define a neighborhood C(Pi) for each i in the participant sample.
• Neighbors for i are non-participants j ∈ I0 for whom Pj ∈ C(Pi).
• The persons matched to i are those people in the set
Ai = { j ∈ I0 : Pj ∈ C(Pi) }
Alternative matching estimators differ
– in how the neighborhood is defined and
– in how the weights W(i,j) are constructed:
1. Nearest Neighbor Matching
2. Stratification or Interval Matching
3. Kernel and Local Linear Matching
Cross-sectional Method 1:
Nearest Neighbor Matching
• Traditional pairwise matching, also called nearest-neighbor
matching, sets
C(Pi) = min_{j∈I0} |Pi − Pj|
• That is, the non-participant with the value of Pj closest to Pi is
selected as the match, and Ai is a singleton set.
• The estimator can be implemented by matching either with or without
replacement:
– With replacement: same comparison group observation can be used
repeatedly as a match
– Drawback of matching without replacement: final estimate will usually
depend on the initial ordering of the treated observations for which the
matches were selected
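A small illustration of the replacement choice, using hypothetical arrays p1, p0 (treated and control propensity scores) and y0 (control outcomes). Without replacement, each matched control leaves the pool, which is why the result can depend on the ordering of the treated observations.

import numpy as np

def match_without_replacement(p1, p0, y0):
    # Assumes at least as many controls as treated units
    available = list(range(len(p0)))
    matched = []
    for pi in p1:  # the order of the treated units matters here
        j = min(available, key=lambda k: abs(p0[k] - pi))
        matched.append(y0[j])
        available.remove(j)  # each control can be used only once
    return np.array(matched)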
Cross-sectional Method 1:
Nearest Neighbor Matching
• Variation of nearest-neighbor matching: caliper matching
(Cochran and Rubin (1973))
• Attempts to avoid “bad” matches (those for which Pj is far from Pi) by
imposing a tolerance on the maximum distance allowed, i.e. a match
for person i is selected only if
|Pi − Pj| < ε, j ∈ I0,
where ε is a prespecified tolerance.
• Treated persons for whom no matches can be found within the
caliper are excluded from the analysis (one way of imposing the
common support condition)
• Drawback of caliper matching: it is difficult to know a priori what
choice for the tolerance level is reasonable.
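A minimal caliper variant of the nearest-neighbor step, again with hypothetical inputs; treated units whose best match lies farther away than eps are dropped, which is one way of imposing the common support condition.

import numpy as np

def caliper_match(p1, p0, eps=0.05):
    # Distance of every treated unit to every control on the propensity score
    dist = np.abs(p1[:, None] - p0[None, :])
    j = dist.argmin(axis=1)
    ok = dist[np.arange(len(p1)), j] < eps  # no match inside the caliper -> excluded
    return j[ok], ok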
Cross-sectional Method 2:
Stratification or Interval Matching
• Method:
1. In this variant of matching, the common support of P is partitioned into
a set of intervals.
2. Average treatment impacts are calculated through simple averaging
within each interval.
3. Overall average impact estimate:
• a weighted average of the interval impact estimates, using the
fraction of the D=1 population in each interval for the weights.
• Requires decision on how wide the intervals should be:
– Dehejia and Wahba (1999) use intervals that are selected such that the
mean values of the estimated Pi’s and Pj’s are not statistically different
from each other within intervals.
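A compact sketch of interval matching with equal-width intervals over hypothetical arrays y, d, p. Dehejia and Wahba choose the interval boundaries data-dependently instead, so the fixed grid here is only illustrative.

import numpy as np

def interval_matching_tte(y, d, p, n_bins=10):
    edges = np.linspace(p.min(), p.max(), n_bins + 1)
    bin_idx = np.digitize(p, edges[1:-1])  # interval index 0..n_bins-1
    impacts, weights = [], []
    for b in range(n_bins):
        t = (bin_idx == b) & (d == 1)
        c = (bin_idx == b) & (d == 0)
        if t.any() and c.any():
            impacts.append(y[t].mean() - y[c].mean())  # within-interval impact
            weights.append(t.sum())  # weight: number of D=1 observations in the interval
    return np.average(impacts, weights=weights)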
Cross-sectional Method 3:
Kernel and Local Linear Matching
• Kernel Method:
– Uses a weighted average of all observations within the common
support region: the farther away the comparison unit is from the
treated unit the lower the weight.
• Local linear matching:
– Similar to the kernel estimator but includes a linear term in the
weighting function, which helps to avoid bias.
Kernel and Local Linear Matching
A kernel estimator for the matched outcome E(Y0i | Di=1, Pi)
is given by
Ê(Y0i | Di=1, Pi) = Σ_{j∈I0} W(i,j) Y0j
with weights
W(i,j) = K((Pj − Pi)/h) / Σ_{k∈I0} K((Pk − Pi)/h)
where K is a kernel function and h is a bandwidth (or smoothing parameter)
(for the choice of kernel function and bandwidth, see the discussion below)
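A sketch of these weights in Python, with hypothetical arrays p1, p0, y0 and the Epanechnikov kernel as an illustrative choice; the bandwidth h is left free, since its choice is discussed below.

import numpy as np

def kernel_matched_outcomes(p1, p0, y0, h=0.06):
    # Epanechnikov kernel: positive for |u| <= 1, zero outside
    def K(u):
        return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

    # W(i,j) numerators; assumes every treated unit has controls within
    # the bandwidth (common support), otherwise a row sums to zero
    w = K((p0[None, :] - p1[:, None]) / h)
    w = w / w.sum(axis=1, keepdims=True)  # normalize over j in I0
    return w @ y0  # matched Y0-hat for each treated unit i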
Intro to Nonparametric Estimation
• Reference: Angus Deaton “The Analysis of Household
Surveys” TO READ
– ch. 3.2 Nonparametric methods for estimating density functions
(p. 169-175), Nonparametric regression analysis (p. 191-199)
• Kernel density estimation
• Kernel regression
• Choice of kernel and choice of bandwidth (trade-off
between bias and variance)
• Local linear estimation: when and why better?
Estimating Univariate Densities:
Histograms versus Kernel Estimators
• Application: when visual impression of the position and spread of
the data is needed (important for example for evaluating the
distribution of welfare and effects of policies on whole distribution)
• Histograms have the following disadvantages:
– Degree of arbitrariness that comes from the choice of the number of
“bins” and of their width
– Problems arise when trying to represent continuously differentiable densities
of variables that are inherently continuous: a histogram can obscure the
genuine shape of the empirical distribution and is unsuited to providing
information about the derivatives of density functions
• Alternatives: fit a parametric density to the data, or use nonparametric
techniques (which allow a more direct inspection of the data)
Nonparametric density estimation
• Idea: get away from the “bins” of the histogram by estimating the density at
every point along the x-axis.
• Problem: with a finite sample, there will only be empirical mass at a finite
number of points.
• Solution: use mass at nearby points as well as the point itself.
• Illustration: think of sliding a band (or window) along the x-axis, calculating
the fraction of the sample per unit interval within it, and plotting the result as
an estimate of the density at the mid-point of the band
• Naïve estimator:
f̂(x) = (1/2hn) Σ_{i=1..n} 1(x − h < xi < x + h)
but there will be steps in f̂(x) each time a data point enters or exits the
band
• Modification: instead of giving all the points inside the band equal
weight, give more weight to those near x and less to those far away,
so that points have a weight of zero both just outside and just inside the
edge of the band: replace the indicator function by a “kernel” function K(·),
which yields the kernel density estimator
f̂(x) = (1/hn) Σ_{i=1..n} K((x − xi)/h)
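A from-scratch sketch of this estimator over a grid of evaluation points, with the Gaussian kernel as an illustrative choice:

import numpy as np

def kde(x_grid, data, h):
    # Gaussian kernel weight for every (evaluation point, observation) pair
    u = (x_grid[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    # f-hat(x) = (1/hn) * sum_i K((x - xi)/h)
    return K.mean(axis=1) / h

For example, kde(np.linspace(0, 1, 200), np.random.beta(2, 5, 500), h=0.05) evaluates the estimated density of a beta sample at 200 grid points.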
Choice of Kernel and Bandwidth
• Choice of kernel K(.):
1. Because it is a weighting function, it should be positive and integrate
to unity over the band.
2. It should be symmetric around zero, so that points below x get the
same weight as those an equal distance above.
3. It should be decreasing in the absolute value of its argument.
• Alternative kernel functions:
– Epanechnikov kernel, K(z) = (3/4)(1 − z²) for |z| ≤ 1 and 0 otherwise
– Gaussian kernel (normal density, giving some weight to all
observations)
– “biweight” kernel
The choice of kernel will influence the shape of the estimated density
(especially when there are few points), but the choice is not a critical one
Choice of Kernel and Bandwidth
• Choice of bandwidth:
– Results often very sensitive to choice of bandwidth.
– Estimating densities by kernel methods is an exercise in
“smoothing” the raw observations into an estimated density and
the bandwidth controls how much smoothing is done.
– Bandwidth controls trade-off between bias and variance:
• A large bandwidth will provide a smooth and not very variable
estimate, but risks bias by bringing in observations from other parts
of the density.
• A small bandwidth helps to pick up genuine features of the underlying
density, but risks producing an unnecessarily variable plot.
Oversmoothed estimates are biased and undersmoothed estimates
are too variable.
Choice of Kernel and Bandwidth
• Choice of bandwidth (ctnd):
– Consistency of the nonparametric estimator requires that the
bandwidth shrinks to zero as the sample size gets large, but not at
“too fast a rate” (can be made formal).
– In practice: consider a number of different bandwidths, plot the
associated density estimates and examine the sensitivity of the
estimates with respect to the bandwidth choice.
– Formal theory of the trade-off:
• In standard parametric inference, optimal estimation is based on
minimizing the mean-squared error between the estimated and true
parameters.
• In the nonparametric case, we estimate a function, not a parameter, so
there is a mean-squared error at each point on the estimated density:
the approach is therefore to minimize the mean integrated squared error (MISE)
This way an optimal bandwidth can be estimated (after the kernel is
chosen).
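To make this concrete (a standard result, not derived on these slides): the criterion is
MISE(h) = E ∫ [ f̂(x) − f(x) ]² dx,
and minimizing its leading terms for the Gaussian kernel with approximately normal data gives Silverman's rule-of-thumb bandwidth h* ≈ 1.06 σ̂ n^(−1/5), which shrinks to zero as n grows, but slowly, exactly as consistency requires.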
Nonparametric Regression Analysis
• Conditional expectation of y conditional on x:
m(x) = E(y | x)
• Link between a conditional expectation and the underlying
distributions:
E(y | x) = ∫ y f(y|x) dy = ( ∫ y f(x,y) dy ) / f(x)
• Intuitively: calculate the average of all y-values corresponding to
each x or vector of x. This is not feasible with finite samples and
continuous x, the same problem as in density estimation, so adopt the same
solution: average over points “near” x
• Kernel regression (Nadaraya–Watson) estimator:
m̂(x) = Σ_i yi K((xi − x)/h) / Σ_i K((xi − x)/h)
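A short sketch of this estimator, again with the Gaussian kernel as an illustrative choice; x and y are the data, x_grid the evaluation points:

import numpy as np

def kernel_regression(x_grid, x, y, h):
    # Kernel weight of each observation for each evaluation point
    K = np.exp(-0.5 * ((x_grid[:, None] - x[None, :]) / h) ** 2)
    # Nadaraya-Watson: weighted average of the y's near each grid point
    return (K * y[None, :]).sum(axis=1) / K.sum(axis=1)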
Nonparametric Regression Analysis
• Important: it is not possible to calculate a conditional expectation for
values of x where the density is zero; in practice, problems arise
whenever the estimated density is small or zero (this makes the
regression function imprecise)
• Main strength of nonparametric over parametric regression: it assumes
no functional form for the relationship, allowing the data to choose
not only the parameter estimates but the shape of the curve itself
• Weaknesses:
– the price of this flexibility is the much greater data requirement of
nonparametric methods and the difficulty of handling
high-dimensional problems (alternatives: polynomial regressions and
semiparametric estimation)
– nonparametric methods lack the menu of options that is available for
parametric methods when dealing with simultaneity, measurement error,
selectivity and so forth
Locally Linear Regression
• Read Angus Deaton “The Analysis of Household
Surveys” (p. 197-199)
(important: this will be used again later on)
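For reference, a standard definition consistent with Deaton's treatment: the local linear estimator of m(x) solves, at each point x, the kernel-weighted least squares problem
min_{a,b} Σ_i ( yi − a − b (xi − x) )² K((xi − x)/h)
and sets m̂(x) = â. The linear term corrects the bias of the simple kernel estimator near boundaries and where the xi are unevenly spaced, which is why local linear matching can help to avoid bias.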
Difference-in-Difference Matching Estimators
• Assumption of cross-sectional matching estimators:
– After conditioning on a set of observable characteristics, outcomes are
conditionally mean independent of program participation.
• BUT: there may be systematic differences between participant and
nonparticipant outcomes that could lead to a violation of the
identification conditions required for matching
– e.g. due to program selectivity on unmeasured characteristics
• Solution in the case of temporally invariant differences in outcomes
between participants and nonparticipants:
difference-in-difference matching strategy
(see Heckman, Ichimura and Todd (1997))
Cross-sectional versus Diff-in-Diff Matching
Estimators
A) Cross-sectional Matching Estimator
This estimator assumes:
(CS1) E(Y0 | P, D=1) = E(Y0 | P, D=0)
(CS2) P(D=1 | P) < 1
Under these conditions, the TTE can be estimated by
Δ̂_TTE = (1/n1) Σ_{i∈I1∩Sp} [ Y1i − Σ_{j∈I0} W(i,j) Y0j ]
where n1 is the number of treated individuals for whom CS2 is satisfied.
Cross-sectional versus Diff-in-Diff Matching
Estimators
B) Difference-in-Difference Matching Estimator
This estimator requires repeated cross-section or panel data. Let t and t′ be
the two time periods, one before the program start date and one after.
Conditions needed to justify the application of the estimator are:
(DID1) E(Y0t − Y0t′ | P, D=1) = E(Y0t − Y0t′ | P, D=0)
(DID2) P(D=1 | P) < 1
Under these conditions, the TTE can be estimated by
Δ̂_DID = (1/n1) Σ_{i∈I1∩Sp} [ (Y1ti − Y0t′i) − Σ_{j∈I0} W(i,j) (Y0tj − Y0t′j) ]
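A sketch of the difference-in-differences variant, assuming hypothetical before/after outcome arrays y_pre, y_post, a treatment indicator d and estimated propensity scores p, with nearest-neighbor weights standing in for the generic W(i,j):

import numpy as np

def did_matching_tte(y_pre, y_post, d, p):
    dy = y_post - y_pre  # before-after outcome change for every unit
    dy1, dy0 = dy[d == 1], dy[d == 0]
    p1, p0 = p[d == 1], p[d == 0]
    # Nearest-neighbor version of W(i,j): full weight on the closest control
    j = np.abs(p1[:, None] - p0[None, :]).argmin(axis=1)
    return np.mean(dy1 - dy0[j])  # mean difference of matched changes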
Assessing the Variability of Matching
Estimators
• Distribution theory for cross-sectional and DID kernel and local
linear matching estimators: see Heckman, Ichimura and Todd
(1998)
• But implementing the asymptotic standard error formulae can be
cumbersome, so standard errors for matching estimators are often
generated using bootstrap resampling methods
• This is valid for kernel or local linear matching estimators, but not for
nearest neighbor matching estimators (see Abadie and Imbens
(2004), also for alternatives in that case)