Part A: Regression and causality
A1: Key facts about regression
Kirill Borusyak
ARE 213 Applied Econometrics
UC Berkeley, Fall 2024
Acknowledgments
These lecture slides draw on the materials by Michael Anderson, Peter Hull, Paul
Goldsmith-Pinkham, and Michal Kolesar
All errors are mine — please let me know if you spot them!
What is this course about (1)
Goal: help you do rigorous empirical (micro)economic research
Focus on causal inference / program evaluation / treatment effects
What is shared by [the causal] literature is [...] an explicit emphasis on credibly
estimating causal effects, a recognition of the heterogeneity in these effects, clarity in the
identifying assumptions, and a concern about endogeneity of choices and the role study
design plays. (Imbens, 2010, “Better LATE Than Nothing”)
What is this course about (2)
Focus on most common research designs / identification strategies
The econometrics literature has developed a small number of canonical settings where
researchers view the specific causal models and associated statistical methods as well
established and understood. [They are] referred to as identification strategies. [These]
include unconfoundedness, IV, DiD, RDD, and synthetic control methods and are
familiar to most empirical researchers in economics. The [associated] methods are
commonly used in empirical work and are constantly being refined, and new
identification strategies are occasionally added to the canon. Empirical strategies not
currently in this canon, rightly or wrongly, are viewed with much more suspicion until
they reach the critical momentum to be added. (Imbens, 2020)
We will study target estimands, assumptions, tests, estimators, statistical inference
Introduce multi-purpose econometric tools: e.g. randomization inference
Course outline (1)
A. Introduction: regression and causality (~4 lectures)
▶ Key facts about regression; potential outcomes and RCTs
B. Selection on observables (~4 lectures)
▶ Covariate adjustment via regression, via propensity scores, doubly-robust methods,
double machine learning
C. Panel data methods (~7 lectures)
▶ Diff-in-diffs and event studies; synthetic controls and factor models
Course outline (2)
D. Instrumental variables (IVs) (~7 lectures)
▶ Linear IV; IV with treatment effect heterogeneity
▶ Formula instruments, recentering, shift-share IV, spillovers
▶ Examiner designs (“judge IVs”)
E. Regression discontinuity (RD) designs (~3 lectures)
▶ Sharp and fuzzy RD designs and various extensions
F. Miscellaneous topics (~3 lectures)
▶ Nonlinear models: Poisson regression, quantile regression
▶ Statistical inference: clustering, bootstrap
▶ Topics of your interest (email me in advance!)
Course outline (3)
Currently not covered
Descriptive statistics, data visualization
Structural estimation
Time series data
Experimental design
Textbooks
MHE Angrist, Joshua and Jörn-Steffen Pischke (2009). Mostly Harmless Econometrics.
Princeton University Press.
CT Cameron, A. Colin and Pravin Trivedi (2005). Microeconometrics: Methods and
Applications. Cambridge University Press.
IW Imbens, Guido and Jeffrey Wooldridge (2009). New developments in econometrics:
Lecture notes.
https://www.cemmap.ac.uk/resource/new-developments-in-econometrics/
JW Wooldridge, Jeffrey (2002). Econometric Analysis of Cross Section and Panel
Data. MIT Press. (Or second edition from 2010)
Some econometric vocabulary
OLS estimator: β̂ = (X′X)−1 X′Y ≡ ((1/N) ∑_{i=1}^N Xi X′i)−1 ((1/N) ∑_{i=1}^N Xi Yi)
▶ Random variable, function of the observed sample
OLS estimand: βOLS = E [XX′ ]−1 E [XY] ≡ E [Xi X′i ]−1 E [Xi Yi ] (assuming a
random sample)
▶ A non-stochastic population parameter
▶ β̂ →ᵖ βOLS with a random sample under weak regularity conditions
▶ This does not involve assuming a model, exogeneity conditions etc.
β̂ and βOLS correspond to a linear specification Yi = β ′ Xi + error
▶ Just notational convention for reg Y X, not necessarily a model
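A minimal NumPy sketch of the estimator/estimand distinction (my own illustration, not part of the slides; the DGP and seed are made up): β̂ is recomputed from samples of growing size and settles down at βOLS.

# Compute beta_hat = (X'X)^{-1} X'Y and watch it approach the estimand as N grows
import numpy as np

rng = np.random.default_rng(0)

def ols(X, Y):
    """OLS estimator (X'X)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

for N in [100, 10_000, 1_000_000]:
    x = rng.normal(size=N)
    Y = 1 + 2 * x + rng.normal(size=N)      # arbitrary DGP
    X = np.column_stack([np.ones(N), x])    # include an intercept
    print(N, ols(X, Y))                     # converges to beta_OLS = (1, 2)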
Some econometric vocabulary (2)
An economic or statistical model is needed to interpret βOLS and other estimands
▶ A model involves parameters (with economic meaning) and assumptions
(restricting the DGP)
▶ Assumptions hopefully make some parameters identified, i.e. possible to uniquely
determine from everything the data contain — here, the distribution of (X, Y)
Some econometric vocabulary (3)
Example 1: demand and supply
Qi = −βd Pi + εd , Qi = βs Pi + εs , Cov [εd , εs ] = 0
▶ Regressing Qi on Pi and a constant yields (prove this!)
βOLS = (Var[εs] / (Var[εd] + Var[εs])) · (−βd) + (Var[εd] / (Var[εd] + Var[εs])) · βs
Example 2: heterogeneous effects
Yi = βi Xi + εi ,  Xi ⊥⊥ (βi , εi )
▶ Regressing Yi on Xi and a constant yields (prove this!)
βOLS = E [βi ]
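A quick simulation sketch of Example 2 (my own illustration; the uniform distribution for βi and the rest of the DGP are arbitrary choices): with slopes independent of Xi, the OLS slope recovers E[βi].

# Heterogeneous slopes beta_i independent of X_i: the OLS slope is E[beta_i]
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
beta_i = rng.uniform(0, 2, size=N)          # E[beta_i] = 1
eps = rng.normal(size=N)
x = rng.normal(size=N)                      # independent of (beta_i, eps)
Y = beta_i * x + eps

X = np.column_stack([np.ones(N), x])
slope = np.linalg.solve(X.T @ X, X.T @ Y)[1]
print(slope)                                # close to E[beta_i] = 1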
Outline
1 Course intro
2 What is regression and why do we use it?
3 Linear regression and its mechanics
Regression and its uses
Regression of Y on X ≡ conditional expectation function (CEF):
h(·) : x ↦ h(x) ≡ E [Yi | Xi = x]
Conditional expectation E [Yi | Xi ] = h(Xi ) is a random variable because Xi is
Uses of regression:
Descriptive: how Y on average covaries with X — by definition
Prediction: if we know Xi , our best guess for Yi is h(Xi ) — prove next
Causal inference: what happens to Yi if we manipulate Xi — sometimes
Regression as optimal prediction (1)
What counts as the best guess is defined by a loss function
Proposition: CEF is the best predictor with quadratic loss:
h(·) = arg min_{g(·)} E[(Yi − g(Xi))²]
Lemma: the CEF residual Yi − E [Yi | Xi ] is mean-zero and uncorrelated with any
g(Xi ).
▶ Proof by the law of iterated expectations (LIE)
▶ E [Yi − E [Yi | Xi ]] = E [E [Yi − E [Yi | Xi ] | Xi ]] = 0
▶ E [(Yi − h(Xi )) g(Xi )] = E [E [(Yi − h(Xi )) g(Xi ) | Xi ]] =
E [E [Yi − h(Xi ) | Xi ] · g(Xi )] = 0
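A small numerical check of the Lemma (my own sketch, with an arbitrary discrete DGP so the CEF can be computed exactly by group means): the CEF residual has mean zero and zero covariance with an arbitrary g(Xi).

# CEF residual is mean-zero and uncorrelated with any function of X
import numpy as np

rng = np.random.default_rng(2)
N = 500_000
x = rng.integers(0, 5, size=N)                 # discrete X with 5 values
Y = np.exp(x) + rng.normal(size=N) * (1 + x)   # arbitrary nonlinear DGP

h = np.array([Y[x == v].mean() for v in range(5)])  # h(x) = E[Y | X = x] by group means
resid = Y - h[x]                                    # CEF residual

g = np.sin(x)                                  # any function of X
print(resid.mean())                            # ~ 0
print(np.cov(resid, g)[0, 1])                  # ~ 0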
Regression as optimal prediction (2)
Proposition: CEF is the best predictor with quadratic loss:
h(·) = arg min_{g(·)} E[(Yi − g(Xi))²]
Lemma: the CEF residual Yi − E [Yi | Xi ] is mean-zero and uncorrelated with any
g(Xi ).
Proposition proof:
E[(Yi − g(Xi))²] = E[{(Yi − h(Xi)) + (h(Xi) − g(Xi))}²]
= E[(Yi − h(Xi))²] + 2E[(Yi − h(Xi))(h(Xi) − g(Xi))] + E[(h(Xi) − g(Xi))²]
= E[(Yi − h(Xi))²] + E[(h(Xi) − g(Xi))²] ≥ E[(Yi − h(Xi))²]
where the cross term vanishes by the Lemma, since h(Xi) − g(Xi) is a function of Xi
Regression as optimal prediction: Exercise
What is the best predictor with loss |Yi − g(Xi)|, i.e. arg min_{g(·)} E[|Yi − g(Xi)|]?
Or with the “check” loss function (slope q ∈ (0, 1) on the right, q − 1 on the left)?
Hint: solve it first assuming Xi takes only one value
Note: this exercise is linked to quantile regression
Outline
1 Course intro
2 What is regression and why do we use it?
3 Linear regression and its mechanics
Five reasons for linear regression
What does CEF have to do with least squares estimand βOLS = E [XX′ ]−1 E [XY]? And
why do we use it instead of E [Y | X]?
1. Curse of dimensionality: E [Y | X] is hard to estimate when X is high-dimensional
[but machine learning methods make it easier]
2. OLS and CEF solve similar problems: X′ βOLS is the best linear predictor of Y, i.e.
βOLS = arg min_b E[(Y − X′b)²]
3. OLS is also the best linear approximation to the CEF:
βOLS = arg min_b E[(E[Y | X] − X′b)²]
Five reasons for linear regression (cont.)
1. Curse of dimensionality: E [Y | X] is hard to estimate when X is high-dimensional
2. OLS and CEF solve similar problems: X′ βOLS is the best linear predictor of Y, i.e.
βOLS = arg min_b E[(Y − X′b)²]
3. OLS is also the best linear approximation to the CEF:
βOLS = arg min_b E[(E[Y | X] − X′b)²]
▶ Proof by FOC: E[X (E[Y | X] − X′b)] = 0 ⟹
b = E[XX′]−1 E[X E[Y | X]] = E[XX′]−1 E[XY] = βOLS (last step by LIE)
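A quick check of point 3 (my own sketch; the quadratic CEF is an arbitrary example): regressing Y on X and regressing E[Y | X] on X give the same coefficients, because the CEF residual is orthogonal to any function of X.

# OLS on Y vs. OLS on the CEF E[Y | X]: same coefficients
import numpy as np

rng = np.random.default_rng(3)
N = 400_000
x = rng.integers(0, 10, size=N).astype(float)
Y = (x - 4) ** 2 + rng.normal(size=N)        # nonlinear CEF plus noise

h = np.zeros(N)                              # E[Y | X], estimated by group means
for v in np.unique(x):
    h[x == v] = Y[x == v].mean()

X = np.column_stack([np.ones(N), x])
b_y = np.linalg.solve(X.T @ X, X.T @ Y)      # regress Y on X
b_h = np.linalg.solve(X.T @ X, X.T @ h)      # regress E[Y | X] on X
print(b_y, b_h)                              # identical (up to rounding)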
Five reasons for linear regression (cont.)
1. Curse of dimensionality: E [Y | X] is hard to estimate when X is high-dimensional
2. OLS and CEF solve similar problems: X′ βOLS is the best linear predictor of Y
3. OLS is also the best linear approximation to the CEF
4. With scalar X, βOLS is a convexly-weighted average of dE [Y | X = x] /dx (or its
discrete analog)
Proof of #4: Discrete X (with values x0 < · · · < xK)
Rewrite E[Y | X = x] ≡ h(x) = h(x0) + ∑_{k=1}^K (h(xk) − h(xk−1)) 1[x ≥ xk]
Thus Cov[Y, X] = Cov[E[Y | X], X] = ∑_{k=1}^K (h(xk) − h(xk−1)) Cov[1[X ≥ xk], X], and
βOLS = Cov[Y, X] / Var[X] = ∑_{k=1}^K ωk · (h(xk) − h(xk−1)) / (xk − xk−1),  where ωk = (xk − xk−1) Cov[1[X ≥ xk], X] / Var[X]
Here ωk ≥ 0 because 1 [X ≥ xk ] is monotone. Specifically (prove it!):
Cov [1 [X ≥ xk ] , X] = (E [X | X ≥ xk ] − E [X | X < xk ]) P (X ≥ xk ) P (X < xk )
And ∑_{k=1}^K ωk = 1 because X = x0 + ∑_{k=1}^K (xk − xk−1) 1[X ≥ xk]
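A numerical check of the discrete-X weights (my own sketch; the support points and probabilities are arbitrary): the weights ωk are nonnegative, sum to one, and combine the CEF slopes into the OLS slope Cov[X, Y]/Var[X].

# Check: OLS slope = convex combination of CEF slopes with weights omega_k
import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000
support = np.array([0.0, 1.0, 3.0, 4.0, 7.0])
x = rng.choice(support, size=N, p=[0.2, 0.3, 0.2, 0.2, 0.1])
Y = np.log(1 + x) + rng.normal(size=N)

h = np.array([Y[x == v].mean() for v in support])    # CEF by group means
slopes = np.diff(h) / np.diff(support)               # (h(x_k)-h(x_{k-1}))/(x_k-x_{k-1})
omega = np.array([
    (support[k] - support[k - 1])
    * np.cov((x >= support[k]).astype(float), x)[0, 1] / x.var(ddof=1)
    for k in range(1, len(support))
])
print(omega.sum())                            # ~ 1, and all omega_k >= 0
print(omega @ slopes)                         # matches the OLS slope below
print(np.cov(x, Y)[0, 1] / x.var(ddof=1))     # OLS slope Cov[X,Y]/Var[X]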
Proof of #4: Continuous X
Similarly for continuous X:
βOLS = ∫_{−∞}^{∞} ω(x) h′(x) dx,  where ω(x) = Cov[1[X ≥ x], X] / Var[X]
with ω(x) ≥ 0 and ∫_{−∞}^{∞} ω(x) dx = 1
Exercise: if X is Gaussian, βOLS = E [h′ (X)] (prove it!)
▶ Hint: use E[Z | Z ≥ a] = φ(a) / (1 − Φ(a)) for Z ∼ N(0, 1)
Five reasons for linear regression (cont.)
1. Curse of dimensionality: E [Y | X] is hard to estimate when X is high-dimensional
2. OLS and CEF solve similar problems: X′ βOLS is the best linear predictor of Y
3. OLS is also the best linear approximation to the CEF
4. With scalar X, βOLS is a convexly-weighted average of ∂E [Y | X = x] /∂x
5. If E [Y | X] happens to be linear, E [Y | X] = X′ βOLS
▶ Linearity is guaranteed when (X, Y) are jointly normally distributed
▶ or when X is “saturated”: dummies for all values of a discrete variable. E.g. for
binary D and X = (1, D),
E[Y | X] = E[Y | D] = E[Y | D = 0] · 1 + (E[Y | D = 1] − E[Y | D = 0]) · D
(intercept: E[Y | D = 0];  slope: E[Y | D = 1] − E[Y | D = 0])
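A check of the saturated binary case (my own sketch with an arbitrary DGP): regressing Y on a constant and D reproduces the two conditional means exactly.

# Regression of Y on (1, D): intercept = E[Y|D=0], slope = difference in means
import numpy as np

rng = np.random.default_rng(5)
N = 100_000
D = rng.integers(0, 2, size=N)
Y = rng.normal(size=N) + 3 * D               # any DGP

X = np.column_stack([np.ones(N), D])
intercept, slope = np.linalg.solve(X.T @ X, X.T @ Y)
print(intercept, Y[D == 0].mean())                  # equal
print(slope, Y[D == 1].mean() - Y[D == 0].mean())   # equal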
(Linear) regression mechanics: Key results
1. When an intercept is included, residuals are mean-zero and uncorrelated with
regressors
2. Regressing Y = Xk on X1 , . . . , XK produces coefficients (0, . . . , 0, 1, 0, . . . 0)
3. β̂ is a linear estimator
4. Frisch-Waugh-Lovell (FWL) theorem
5. Omitted variable bias (OVB) formula
6. Asymptotic distribution and robust standard errors for OLS estimator
Linear regression results (cont.)
When an intercept is included, population residuals are mean-zero and uncorrelated
with regressors: E[X (Y − β′OLS X)] = 0
▶ A simple result, not an assumption (prove it!)
▶ The sample analog also holds: (1/N) ∑_i Xi (Yi − β̂′Xi) = 0
▶ Since residuals are mean-zero, the average fitted value equals the average outcome:
(1/N) ∑_i β̂′Xi = (1/N) ∑_i Yi
Regressing Y = Xk on X1 , . . . , XK produces coefficients (0, . . . , 0, 1, 0, . . . 0)
▶ Prove it!
OLS is a linear estimator
Given the regressors X, each β̂k is linear in the outcomes, i.e. ∃ weights {ωki}_{i=1}^N, with ωki ≡ ωki(X), such that
β̂k = ∑_i ωki Yi   (prove it!)
▶ Weights ωki are mean-zero (for Xk ≠ intercept), orthogonal to non-Xk regressors,
and satisfy ∑_i ωki Xki = 1 (prove it!)
Implication: Regression coefficients can be decomposed
▶ If Yi = Y1i + · · · + YPi , regressing each Ypi on Xi and adding up the coefficient
estimates is numerically the same as regressing Yi on Xi
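A check of the decomposition implication (my own sketch with arbitrary outcomes): coefficients from regressing Y1 and Y2 separately add up to the coefficients from regressing Y1 + Y2, because β̂ is linear in the outcome vector.

# Coefficient decomposition: ols(Y1) + ols(Y2) = ols(Y1 + Y2)
import numpy as np

rng = np.random.default_rng(6)
N, K = 1_000, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K))])
Y1 = rng.normal(size=N)
Y2 = rng.normal(size=N)

ols = lambda Y: np.linalg.solve(X.T @ X, X.T @ Y)
print(ols(Y1) + ols(Y2))
print(ols(Y1 + Y2))                          # identical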
Partialling out: Frisch-Waugh-Lovell (FWL) theorem
Theorem: The k’th element of βOLS can be obtained as βk = Cov[X̃k, Y] / Var[X̃k] or βk = Cov[X̃k, Ỹ] / Var[X̃k],
where X̃k is the residual from regressing Xk on all other regressors (and same for Ỹ)
Proof:
Define ε = Y − β′OLS X. Plug Y = β′OLS X + ε into Cov[X̃k, Y] / Var[X̃k]
Note that X̃k is uncorrelated with ε; with the other regressors; and with Y − Ỹ
Implication: Explicit characterization of the weights ωki :
β̂k = (∑_i X̃ki Yi) / (∑_i X̃ki²) = ∑_i ωki Yi,  for ωki = X̃ki / ∑_{j=1}^N X̃kj²
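FWL in a short NumPy sketch (my own illustration; the DGP is made up): the coefficient on Xk from the full regression equals the slope from regressing Y, or the residualized Y, on the residualized Xk.

# Partialling out: full-regression coefficient on xk = slope on residualized xk
import numpy as np

rng = np.random.default_rng(7)
N = 10_000
W = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])   # other regressors
xk = rng.normal(size=N) + W[:, 1]                            # correlated with W
Y = 2 * xk + W @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=N)

ols = lambda A, b: np.linalg.solve(A.T @ A, A.T @ b)

beta_full = ols(np.column_stack([xk, W]), Y)[0]     # coefficient on xk
xk_tilde = xk - W @ ols(W, xk)                      # residualize xk on W
y_tilde = Y - W @ ols(W, Y)                         # residualize Y on W
print(beta_full)
print(xk_tilde @ Y / (xk_tilde @ xk_tilde))         # same (uses Y)
print(xk_tilde @ y_tilde / (xk_tilde @ xk_tilde))   # same (uses residualized Y)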
Omitted variable “bias”
OVB formula is a mechanical relationship between βOLS from a “long” specification
Y = β0 + β1 X1 + β2 X2 + ε
and δOLS from a “short” specification
Y = δ0 + δ1 X1 + error
Claim: δ1 = β1 + β2 ρ, where ρ = Cov [X1 , X2 ] /Var [X1 ] is the regression slope of X2
(“omitted”) on X1 (“included”)
Proof: δ1 = Cov[X1, Y] / Var[X1] = Cov[X1, β0 + β1X1 + β2X2 + ε] / Var[X1] = β1 + β2 · Cov[X1, X2] / Var[X1], using Cov[X1, ε] = 0.
When included X1 is uncorrelated with omitted X2 , OVB = 0
Generalizes to multiple omitted variables (with OVB = β2′ ρ)
Applies with extra controls X3 included in long, short, and auxiliary regression
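A numerical check of the OVB formula (my own sketch; the coefficients 2, 3, and 0.7 are arbitrary): the short-regression slope equals β1 + β2·ρ exactly, since the formula is a mechanical identity that also holds in sample moments.

# OVB formula: delta_1 = beta_1 + beta_2 * rho
import numpy as np

rng = np.random.default_rng(8)
N = 200_000
X1 = rng.normal(size=N)
X2 = 0.7 * X1 + rng.normal(size=N)            # omitted variable, correlated with X1
Y = 1 + 2 * X1 + 3 * X2 + rng.normal(size=N)

ols = lambda A, b: np.linalg.solve(A.T @ A, A.T @ b)
c = lambda *cols: np.column_stack([np.ones(N), *cols])   # add an intercept

beta = ols(c(X1, X2), Y)                      # long regression: (beta0, beta1, beta2)
delta1 = ols(c(X1), Y)[1]                     # short regression slope
rho = ols(c(X1), X2)[1]                       # slope of X2 on X1
print(delta1, beta[1] + beta[2] * rho)        # equal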
Asymptotic distribution of the OLS estimator
β̂ = ((1/N) ∑_i Xi X′i)−1 ((1/N) ∑_i Xi Yi) = βOLS + ((1/N) ∑_i Xi X′i)−1 ((1/N) ∑_i Xi εi)
where by definition ε = Y − β′OLS X. Thus,
√N (β̂ − βOLS) = ((1/N) ∑_i Xi X′i)−1 ((1/√N) ∑_i Xi εi)
By the LLN, (1/N) ∑_i Xi X′i →ᵖ E[XX′] (assumed non-singular)
In a random sample, by the CLT (using E[Xε] = 0), (1/√N) ∑_i Xi εi →ᵈ N(0, Var[Xε])
By the continuous mapping theorem,
√N (β̂ − βOLS) →ᵈ N(0, V),  V = E[XX′]−1 Var[Xε] E[XX′]−1
Robust standard errors
We estimate V by its sample analog (“sandwich formula”), up to a
degree-of-freedom correction:
V̂ = ((1/N) ∑_i Xi X′i)−1 · ((1/(N − dim(X))) ∑_i Xi X′i ε̂i²) · ((1/N) ∑_i Xi X′i)−1
Heteroskedasticity-robust (Eicker-Huber-White) standard error is
SE(β̂k) = √(V̂kk / N)
Never use homoskedastic standard errors!
For later: standard errors outside iid samples, e.g. clustered SE in panels
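A sketch of the sandwich formula with the stated degrees-of-freedom correction (my own NumPy implementation of the formula above, with a made-up heteroskedastic DGP):

# Robust (sandwich) variance estimator and standard errors
import numpy as np

rng = np.random.default_rng(9)
N = 5_000
x = rng.normal(size=N)
Y = 1 + 2 * x + rng.normal(size=N) * (1 + np.abs(x))   # heteroskedastic errors
X = np.column_stack([np.ones(N), x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e_hat = Y - X @ beta_hat                               # residuals

Sxx = X.T @ X / N                                      # (1/N) sum X_i X_i'
meat = (X * e_hat[:, None] ** 2).T @ X / (N - X.shape[1])   # df-corrected middle term
V_hat = np.linalg.inv(Sxx) @ meat @ np.linalg.inv(Sxx)
se = np.sqrt(np.diag(V_hat) / N)                       # SE(beta_hat_k) = sqrt(V_kk / N)
print(beta_hat, se)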