Econometrics 2
1. Repeated Cross Section: Difference in differences
Laurent Davezies & Elia Lapenta
ENSAE, 2024/2025
Outline
Introduction
The difference in differences method
Generalizations
Longitudinal data
▶ Repeated cross sections: data measuring the same variables at different
periods, on different units (units=individuals, households, firms, areas...).
▶ Examples:
▶ Traditional national surveys: Consumer Expenditure Survey, Health
Survey, Housing survey, Time-use survey...
▶ Administrative data without individual identifier.
▶ Panels (next lectures): data measuring the same variables at different
periods on the same units:
▶ The US PSID and NLSY, administrative data including an identifier...
▶ Rotating panels such as the Insee Labor Force Survey.
▶ We consider here methods applying to both types of data.
Motivation
▶ Measuring the evolution over time of an outcome of interest:
▶ Wages, income;
▶ Consumption (share of certain expenditures, durable goods);
▶ Fertility...
▶ Separating the explained and unexplained part of this evolution.
▶ Measuring the evolution of the effect of treatments/particular covariates:
▶ gender wage gap;
▶ returns to schooling...
Evolution of returns to schooling and gender wage gap
▶ Interact the covariate of interest with time.
▶ The example below is obtained on US data between 1978 and 1985:
Difference in Differences
▶ Evaluate treatment effects: alternative to the instrumental variable
approach.
▶ One of the main empirical methods used in policy evaluation:
Difference in Differences
▶ A large number of scientific papers and policy analyses use DiD or
some extension/generalization of DiD.
▶ Basic idea: compare the evolution of a group entering a treatment
with the evolution of a group not entering the treatment.
▶ Design of the data: repeated cross section
▶ With panel data (see next lectures): same identification assumption,
but more precise estimates.
Outline
Introduction
The difference in differences method
Generalizations
Motivation
▶ Two repeated cross sections, at $t = 0$ and $t = 1$.
▶ We consider a binary treatment $D_t \in \{0, 1\}$ at period $t$.
▶ $Y_t(0)$ (resp. $Y_t(1)$) is the potential outcome associated with no treatment
(resp. treatment) at period $t$. We only observe $Y_t := Y_t(D_t)$.
▶ Two groups (stable on the two periods): the “control group” (G = 0) and
the “treatment group” (G = 1).
▶ The control group remains untreated in both periods, whereas the
treatment group receives the treatment in the second period. Then
$D_t = G \times t$.
▶ We seek to identify the average treatment effect on the treated, viz.
$$\delta^T = E(Y_1(1) - Y_1(0) \mid D_1 = 1) = E(Y_1(1) - Y_1(0) \mid G = 1).$$
Basic set-up
▶ Often, the assumption of no selection is unrealistic (i.e.,
$\mathrm{Cov}(D_1, Y_1(0)) \neq 0$), even if we include regressors (i.e.,
$\mathrm{Cov}(D_1, Y_1(0) \mid X) \neq 0$).
▶ It is not always possible to find a valid instrument.
▶ Other idea: exploit spatial and temporal variations in the treatment.
▶ Examples:
▶ Effect of minimum wage on employment: use variations between US
states and temporal variations in the minimum wage.
▶ Effect of taxes on consumption.
▶ Effect of the presence of Seveso plants or green parks on housing
prices.
Example
▶ Effect of minimum wage on employment (Card and Krueger, 1994)?
▶ In April 1992, New Jersey increases its minimum wage, from $4.25 to
$5.05. Pennsylvania keeps its minimum wage at $4.25.
▶ Card and Krueger focus on fast-food restaurants.
▶ They gather data on around 400 such restaurants in the two states, before
and after the reform.
First strategy: control-treated comparison
▶ A first idea would be to simply compare the control and treatment group,
after the introduction of the treatment:
$$\beta_{CS} = E(Y_1 \mid G = 1) - E(Y_1 \mid G = 0).$$
▶ But to obtain $\beta_{CS} = \delta^T$, one would require:
$$E(Y_1(0) \mid G = 1) = E(Y_1(0) \mid G = 0).$$
▶ This condition is often unrealistic. We can check it informally by looking
at the 1st period:
$$E(Y_0(0) \mid G = 1) = E(Y_0(0) \mid G = 0) \iff E(Y_0 \mid G = 1) = E(Y_0 \mid G = 0). \quad (1)$$
▶ In C & K: $\hat{E}(Y_0 \mid G = 1) \simeq 20.4$, $\hat{E}(Y_0 \mid G = 0) \simeq 23.3$. We reject (1) at the
5% level.
Second strategy: before-after comparison
▶ A second idea would be to measure the evolution of Y in the treatment
group:
$$\beta_{BA} = E(Y_1 \mid G = 1) - E(Y_0 \mid G = 1).$$
▶ But for $\beta_{BA} = \delta^T$ to hold, one would need
$$E(Y_1(0) \mid G = 1) = E(Y_0(0) \mid G = 1).$$
▶ This condition is often unrealistic. We can test it informally by checking
whether in the control group,
$$E(Y_1(0) \mid G = 0) = E(Y_0(0) \mid G = 0) \iff E(Y_1 \mid G = 0) = E(Y_0 \mid G = 0). \quad (2)$$
▶ In C & K: $\hat{E}(Y_1 \mid G = 0) \simeq 21.2$ and $\hat{E}(Y_0 \mid G = 0) \simeq 23.3$. We reject (2) at
the 10% level.
Third strategy: difference in differences
▶ We now combine the two previous ideas by considering the difference in
differences:
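$$\beta_{DID} = [E(Y_1 \mid G = 1) - E(Y_0 \mid G = 1)] - [E(Y_1 \mid G = 0) - E(Y_0 \mid G = 0)].$$
Theorem 1
Let us suppose that the following common trends condition holds:
$$E(Y_1(0) \mid G = 1) - E(Y_0(0) \mid G = 1) = E(Y_1(0) \mid G = 0) - E(Y_0(0) \mid G = 0).$$
Then $\beta_{DID} = \delta^T$.
Proof: since $D_t = G \times t$, we have $Y_1 = Y_1(1)$ in the treatment group and $Y_t = Y_t(0)$ in all other cases, so
$$\begin{aligned}
\beta_{DID} &= E(Y_1(1) - Y_1(0) \mid G = 1) \\
&\quad + [E(Y_1(0) \mid G = 1) - E(Y_0(0) \mid G = 1)] \\
&\quad - [E(Y_1(0) \mid G = 0) - E(Y_0(0) \mid G = 0)] \\
&= \delta^T,
\end{aligned}$$
where the last two bracketed terms cancel under the common trends condition. $\square$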
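The three strategies can be compared on simulated data. The sketch below is only an illustration: the data-generating process (a group gap of 3, a common time trend of 1.5, a treatment effect of 2) and the function names are assumptions, not taken from Card and Krueger. Common trends holds by construction, so only the DID should recover the effect.

```python
import random
from statistics import mean

def simulate_cross_sections(n=20000, effect=2.0, seed=0):
    """Draw two independent cross sections (t = 0, 1) with D_t = G * t."""
    rng = random.Random(seed)
    rows = []
    for t in (0, 1):
        for _ in range(n):
            g = rng.randint(0, 1)
            # Untreated outcome: group gap of 3 and common time trend of 1.5 (assumed values).
            y0 = 10.0 + 3.0 * g + 1.5 * t + rng.gauss(0.0, 1.0)
            d = g * t
            rows.append((g, t, y0 + effect * d))
    return rows

def cell_mean(rows, g, t):
    """Empirical counterpart of E(Y_t | G = g)."""
    return mean(y for (gi, ti, y) in rows if gi == g and ti == t)

def did(rows):
    """Difference in differences of the four cell means."""
    return (cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)) - (
        cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0))

rows = simulate_cross_sections()
beta_cs = cell_mean(rows, 1, 1) - cell_mean(rows, 0, 1)  # effect + group gap
beta_ba = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)  # effect + time trend
beta_did = did(rows)                                     # effect only
print(round(beta_cs, 2), round(beta_ba, 2), round(beta_did, 2))
```

In this design the cross-sectional comparison is contaminated by the permanent group gap and the before-after comparison by the time trend; the DID removes both.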
Graphical interpretation
(taken from “Mostly Harmless Econometrics” by J. Angrist and J.-S. Pischke)
Example: Card and Krueger (1994)
▶ C & K get the results below (partially reproduced from their Table 3).
⇒ Positive and significant effect! The sign is the opposite of the classical
microeconomic prediction.
▶ Explanation of C & K: this can be the case if restaurants are local monopsonies
on the labor market.
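As a small worked illustration, using only the control-group means quoted in the before-after slide (about 23.3 before and 21.2 after), the DID estimate adds the fall observed in Pennsylvania back to the before-after change in New Jersey. The New Jersey post-reform mean is not restated in these slides, so $\hat{\beta}_{BA}$ is left symbolic:

```latex
\begin{aligned}
\hat{E}(Y_1 \mid G=0) - \hat{E}(Y_0 \mid G=0) &\simeq 21.2 - 23.3 = -2.1, \\
\hat{\beta}_{DID} &= \hat{\beta}_{BA} - (-2.1) = \hat{\beta}_{BA} + 2.1 .
\end{aligned}
```

Any before-after change in New Jersey larger than about $-2.1$ therefore yields a positive DID estimate.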
The common trends condition
▶ Key condition, which is not testable.
▶ However, if we observe the outcome at several periods ($0, -1, -2$, etc.)
before the treatment, we can test very similar conditions:
$$E[Y_t(0) \mid G = 1] - E[Y_{t-1}(0) \mid G = 1] = E[Y_t(0) \mid G = 0] - E[Y_{t-1}(0) \mid G = 0], \quad t \le 0.$$
▶ These conditions are testable because $Y_t = Y_t(0)$ when $t \le 0$.
▶ We simply test that $Y$ follows a parallel trend in the two groups before
the introduction of the policy.
▶ Further, assume that $t \mapsto E(Y_t(1) - Y_t(0) \mid G = 1)$ is constant.
▶ Then we can also “test” the common trends condition by testing that
$t \mapsto E(Y_t \mid G = 1) - E(Y_t \mid G = 0)$ is constant for $t \ge 1$.
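The pre-trend check can be sketched as a series of "placebo" differences in differences computed on pre-treatment periods only. The simulated design below (periods $-2$ to $1$, parallel trends by construction) and the function names are illustrative assumptions; a formal test would also require standard errors for each placebo estimate.

```python
import random
from statistics import mean

def group_gaps(rows):
    """Return {t: mean(Y | G=1, t) - mean(Y | G=0, t)} from (g, t, y) rows."""
    periods = sorted({t for (_, t, _) in rows})
    gaps = {}
    for t in periods:
        treated = mean(y for (g, s, y) in rows if g == 1 and s == t)
        control = mean(y for (g, s, y) in rows if g == 0 and s == t)
        gaps[t] = treated - control
    return gaps

def placebo_dids(gaps):
    """Pre-period differences in differences: gap_t - gap_{t-1} for t <= 0."""
    return {t: gaps[t] - gaps[t - 1] for t in gaps if t <= 0 and (t - 1) in gaps}

rng = random.Random(1)
rows = []
for t in (-2, -1, 0, 1):
    for _ in range(5000):
        g = rng.randint(0, 1)
        # Parallel trends by construction; the treatment (effect 1.0) starts at t = 1.
        y = 5.0 + 2.0 * g + 0.7 * t + 1.0 * g * (t >= 1) + rng.gauss(0.0, 1.0)
        rows.append((g, t, y))

gaps = group_gaps(rows)
placebos = placebo_dids(gaps)  # should be close to 0 for t = -1 and t = 0
print({t: round(v, 3) for t, v in placebos.items()})
```

Placebo estimates far from zero, relative to their sampling variability, would cast doubt on the common trends condition.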
Example of Card & Krueger
▶ The graph below is taken from Card and Krueger (2000), who obtained, after
their 1994 paper, administrative data over a longer period.
▶ Conclusion?
Example of Pischke (2007)
▶ What is the effect of the length of a school year on students’ achievement?
▶ To answer this, Pischke uses the fact that in 1967, West Germany except
Bavaria moved the start of the school year from Spring to Fall.
⇒ the school year was shortened in 1967 and 1968, from 37 to 24 weeks.
(graph taken from “Mostly Harmless Econometrics” by J. Angrist and J.-S. Pischke)
Outline
Introduction
The difference in differences method
Generalizations
Difference-in-differences and regressions
▶ We can compute the DID by a regression, see below.
▶ 1st benefit: we can easily compute the standard errors of $\hat{\beta}_{DID}$.
▶ 2nd benefit of the regression view: including control variables.
▶ Pooling the two periods in a dataset: define time $T$ as a random variable
on the pooled sample, and let
$$D := D_T = G \times T, \qquad Y := Y_T = Y_T(D_T).$$
Difference-in-differences and regressions
▶ 1st benefit: we can easily compute the standard errors of $\hat{\beta}_{DID}$.
Proposition 1
$\beta_{DID}$ (resp. $\hat{\beta}_{DID}$) can be obtained as the coefficient of $D = G \times T$ in the
theoretical regression (resp. regression on the data) of $Y$ on $G$, $T$ and $D$.
Proof: let $\beta$ denote the coefficient of the theoretical regression of $Y$ on $X$. Recall that:
$$\beta = \arg\min_b E\left[\left(E(Y \mid X) - X'b\right)^2\right].$$
Here $X = (1, G, T, G \times T)'$. We have 4 coefficients ($\beta = (\beta_1, \beta_2, \beta_3, \beta_4)'$) for 4
values ($E(Y \mid G = g, T = t)$ with $(g, t) \in \{0, 1\}^2$). Thus $E(Y \mid X) = X'\beta$ and:
$$\begin{aligned}
\beta_{DID} &= E(Y \mid G = 1, T = 1) - E(Y \mid G = 1, T = 0) \\
&\quad - [E(Y \mid G = 0, T = 1) - E(Y \mid G = 0, T = 0)] \\
&= (\beta_1 + \beta_2 + \beta_3 + \beta_4) - (\beta_1 + \beta_2) - [(\beta_1 + \beta_3) - \beta_1] = \beta_4.
\end{aligned}$$
We reason similarly for $\hat{\beta}_{DID}$, just replacing $E(\cdot)$ by $\hat{E}(\cdot)$. $\square$
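Proposition 1 can be checked numerically. The sketch below fits the saturated regression of $Y$ on $(1, G, T, G \times T)$ by solving the normal equations in plain Python and compares the interaction coefficient with the DID of the four cell means. The simulated coefficients and the helper names `solve` and `ols` are illustrative assumptions; in practice one would use a regression routine from a statistics library.

```python
import random
from statistics import mean

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(xs, ys):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(0)
xs, ys, cells = [], [], {}
for _ in range(4000):
    g, t = rng.randint(0, 1), rng.randint(0, 1)
    y = 1.0 + 0.5 * g + 0.3 * t + 2.0 * g * t + rng.gauss(0.0, 1.0)
    xs.append([1.0, g, t, g * t])      # X = (1, G, T, G x T)
    ys.append(y)
    cells.setdefault((g, t), []).append(y)

beta = ols(xs, ys)
cell_means = {cell: mean(v) for cell, v in cells.items()}
did_means = (cell_means[1, 1] - cell_means[1, 0]) - (cell_means[0, 1] - cell_means[0, 0])
print(beta[3], did_means)  # the two numbers agree up to rounding error
```

The equality is algebraic, not approximate: the regression is saturated, so its fitted values are exactly the four cell means.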
Adding control variables
▶ 2nd benefit of the regression view: including control variables.
▶ The common trends condition corresponds to:
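$$Y_T(d) = \beta_{01} + G\beta_{02} + T\beta_{03} + d\,\delta^T + \varepsilon, \quad E(\varepsilon \mid G, T) = 0.$$
▶ We can extend this model by assuming:
$$Y_T(d) = \beta_{01} + G\beta_{02} + T\beta_{03} + W'\gamma + d\,\delta^T + \varepsilon, \quad E(\varepsilon \mid G, T) = 0, \quad E(W\varepsilon) = 0,$$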
where W corresponds to a vector of control variables.
▶ Remark: $(\beta_1, \beta_2, \beta_3, \beta_4)$ in the previous slide are coefficients of a linear
projection, but $(\beta_{01}, \beta_{02}, \beta_{03}, \delta^T)$ are causal parameters.
▶ Then, we can identify and estimate $\delta^T$ by a regression of $Y$ on $G$, $T$, $W$ and $D$.
▶ Motivation: common trends are assumed to hold for $Y(0) - W'\gamma$ rather
than $Y(0)$, which may be more plausible.
▶ Example: effect of a Seveso plant on housing prices.
▶ Different geographical areas may have different housing price trends absent
the plant because of, e.g., different trends in average income.
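The role of the control variables can be illustrated on simulated data. In the sketch below, a hypothetical scalar control $W$ (think of local average income) has a mean that rises only in the treatment group in the second period, so common trends fails for $Y(0)$ but holds for $Y(0) - W\gamma$. All numerical values and the helpers `solve` and `ols` are assumptions made for the illustration.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(xs, ys):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(3)
with_w, without_w, ys = [], [], []
for _ in range(20000):
    g, t = rng.randint(0, 1), rng.randint(0, 1)
    d = g * t
    # Hypothetical control W: its mean rises by 1 only in group 1 at t = 1.
    w = 1.0 * g * t + rng.gauss(0.0, 1.0)
    # True treatment effect delta^T = 1.0, coefficient on W gamma = 2.0 (assumed values).
    y = 1.0 + 0.5 * g + 0.3 * t + 2.0 * w + 1.0 * d + rng.gauss(0.0, 1.0)
    with_w.append([1.0, g, t, w, d])
    without_w.append([1.0, g, t, d])
    ys.append(y)

delta_controls = ols(with_w, ys)[4]   # regression of Y on G, T, W and D
delta_naive = ols(without_w, ys)[3]   # plain DID, omitting W
print(round(delta_controls, 2), round(delta_naive, 2))
```

In this design the plain DID picks up the differential evolution of $W$ (its probability limit is $1 + 2 \times 1 = 3$), whereas the regression including $W$ is centered on the true effect of 1.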
Multiple groups and periods, non-binary treatment
▶ We can also consider the case with several groups (g), multiple periods
(t) and a non-binary treatment.
▶ Example 1: D = minimum wage, G = US state, T = year.
▶ Example 2: D = cigarette tax rate, G = US state, T = year.
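▶ In such cases, we still assume:
$$Y(d) = \alpha + \sum_{g=1}^{\bar{g}} 1\{G = g\}\beta_g + \sum_{t=1}^{\bar{t}} 1\{T = t\}\gamma_t + d\,\delta^T + \varepsilon,$$
to which we can also add covariates $X$.
▶ Equivalently, for a unit $i$ in group $g$ and date $t$:
$$Y_{i,g,t} = \alpha + \beta_g + \gamma_t + D_{g,t}\delta^T + \varepsilon_{i,g,t}. \quad (3)$$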
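As a consistency check, specification (3) reduces to the two-by-two case. The derivation below is a sketch under two added assumptions: $E(\varepsilon_{i,g,t} \mid g, t) = 0$, and groups and periods relabelled as $g, t \in \{0, 1\}$ with $D_{g,t} = g \times t$. The group effects cancel within each before-after difference, and the time effects cancel in the second difference:

```latex
\begin{aligned}
E(Y_{i,1,1}) - E(Y_{i,1,0}) &= (\gamma_1 - \gamma_0) + \delta^T, \\
E(Y_{i,0,1}) - E(Y_{i,0,0}) &= \gamma_1 - \gamma_0, \\
\beta_{DID} &= \big[(\gamma_1 - \gamma_0) + \delta^T\big] - (\gamma_1 - \gamma_0) = \delta^T .
\end{aligned}
```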
Dynamic specifications
▶ A policy may take some time to be effective. Sometimes, it is also
anticipated by agents.
▶ We can estimate such dynamic effects by specifications generalizing (3).
▶ For simplicity, assume that $D_{g,t} = 1\{t \ge F_g\}$: treatment is binary and
monotonic over time. $F_g$ is the time period in which group $g$ starts to be
treated, with $F_g = +\infty$ if $g$ remains untreated.
▶ Then, fix $\tau_1, \tau_2 > 0$ and define
$$\begin{aligned}
D^{-\tau_1}_{g,t} &= 1\{t \le F_g - \tau_1\}, \\
D^{k}_{g,t} &= 1\{t = F_g + k\}, \quad k \in \{-\tau_1 + 1, \dots, \tau_2 - 1\}, \\
D^{\tau_2}_{g,t} &= 1\{t \ge F_g + \tau_2\}.
\end{aligned}$$
▶ We consider the following specification:
$$Y_{i,g,t} = \alpha + \beta_g + \gamma_t + \sum_{k=-\tau_1}^{\tau_2} D^{k}_{g,t}\delta_k + \varepsilon_{i,g,t}.$$
Dynamic specifications
▶ Since $D_{g,t} = 0$ when $t < F_g$, we can test the common trends condition by
testing $\delta_k = 0$ for $k < 0$.
▶ A violation of $\delta_k = 0$ for $k < 0$ can also be due to the anticipation of the
treatment/policy.
▶ If $k \mapsto \delta_k$ is monotonic on $\{0, \dots, \tau_2\}$, the treatment takes some time to
reach its full (positive or negative) effect.
▶ To identify the $\delta_k$, $F_g$ should not be constant, namely, the treatment
should not be introduced simultaneously in all groups.
▶ Even so, we need additional restrictions on the $(\delta_k)_k$. Often: $\delta_{-1} = 0$.
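The event-time dummies can be built directly from their definitions. The function below is a minimal sketch (the name `event_dummies` and the example dates are hypothetical); it applies the definitions literally, so a never-treated group with $F_g = +\infty$ always has $D^{-\tau_1}_{g,t} = 1$. It also shows one way to see why a normalization such as $\delta_{-1} = 0$ is needed: for every $(g, t)$ exactly one dummy equals 1, so the dummies sum to one and are collinear with the constant.

```python
import math

def event_dummies(t, first_treated, tau1, tau2):
    """Return {k: D^k_{g,t}} for k = -tau1, ..., tau2.

    first_treated is F_g; use math.inf for a never-treated group.
    """
    dummies = {}
    for k in range(-tau1, tau2 + 1):
        if k == -tau1:
            dummies[k] = int(t <= first_treated - tau1)   # all distant leads, binned
        elif k == tau2:
            dummies[k] = int(t >= first_treated + tau2)   # all distant lags, binned
        else:
            dummies[k] = int(t == first_treated + k)      # exactly k periods from F_g
    return dummies

# Hypothetical group first treated in 1975, with tau1 = 2 and tau2 = 3.
one_before = event_dummies(1974, 1975, 2, 3)
long_after = event_dummies(1980, 1975, 2, 3)
never = event_dummies(1974, math.inf, 2, 3)
print(one_before, long_after, never)
```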
Example: unilateral divorce laws and divorce rates
▶ In the 1970s, several US states introduced unilateral divorce laws. Were
these laws responsible for the rise in the divorce rate?
▶ Wolfers exploits variations in the timing of the introduction of these laws
to answer this question.
▶ Conclusion from the plot of $t \mapsto \hat{\delta}_t$ (obtained using Wolfers’ data)?
Computation of standard errors
▶ Possible aggregated shocks at the (g, t) level: for instance, economic
shocks specific to state g and year t.
▶ In such a case, we must compute standard errors clustered at the
$g \times t$ level.
▶ But Bertrand et al. (2004) also underline the importance of temporal
dependence.
▶ Using real data, they show that ignoring such dependence
leads to a severe underestimation of the true standard errors.
▶ Idea: they create fictitious laws and estimate their effects. They find that
≃ 45% of the effects are significant, instead of ≃ 5%!
▶ Possible solution: clustering at the group level. This requires
“many” groups (≥ 50).
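The effect of clustering can be illustrated in the simplest case of one regressor and a constant. The sketch below is an assumption-laden toy example, not the procedure of Bertrand et al.: a fictitious treatment is assigned at the group level, has no true effect, and each group is hit by a common shock. Both variances are plain sandwich formulas without finite-sample corrections, and the function name `slope_and_ses` is hypothetical.

```python
import math
import random
from statistics import mean

def slope_and_ses(x, y, cluster):
    """OLS slope of y on (1, x) with heteroskedasticity-robust and cluster-robust SEs."""
    xbar, ybar = mean(x), mean(y)
    xd = [xi - xbar for xi in x]                 # demeaning partials out the constant
    sxx = sum(v * v for v in xd)
    slope = sum(v * (yi - ybar) for v, yi in zip(xd, y)) / sxx
    resid = [(yi - ybar) - slope * v for v, yi in zip(xd, y)]
    # Robust "meat": squared scores summed observation by observation.
    meat_robust = sum((v * e) ** 2 for v, e in zip(xd, resid))
    # Clustered "meat": scores are first summed within each cluster, then squared.
    scores = {}
    for v, e, c in zip(xd, resid, cluster):
        scores[c] = scores.get(c, 0.0) + v * e
    meat_cluster = sum(s * s for s in scores.values())
    return slope, math.sqrt(meat_robust) / sxx, math.sqrt(meat_cluster) / sxx

rng = random.Random(2)
x, y, cluster = [], [], []
for g in range(50):                              # 50 groups of 40 units each
    shock = rng.gauss(0.0, 1.0)                  # common shock shared by all units of group g
    for _ in range(40):
        x.append(float(g % 2))                   # fictitious group-level treatment, no true effect
        y.append(1.0 + shock + rng.gauss(0.0, 1.0))
        cluster.append(g)

slope, se_robust, se_cluster = slope_and_ses(x, y, cluster)
print(round(slope, 3), round(se_robust, 3), round(se_cluster, 3))
```

Because the shocks are shared within groups, the observation-level formula treats correlated units as independent and understates the uncertainty; the clustered standard error is several times larger in this design.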
Summary
▶ Idea behind difference in differences:
1. A treatment group becomes treated whereas a control group remains
untreated.
2. We then compare the evolutions of their average outcomes.
▶ Works under the common trends condition.
▶ “Test” of this condition using the trends prior to the policy introduction.
▶ Link with regressions.
▶ Generalizations: inclusion of control variables, multiple groups and periods
of time, dynamic specifications...
▶ Computation of standard errors.