Econometrics I
Chapter 3
Linear Regression with One Regressor
Prof. Miguel Ángel Borrella Mas
School of Economics and Business Administration
Universidad de Navarra
Academic year 2022-23
Outline
1 Introduction
2 The (simple) Linear Regression Model
3 The Ordinary Least Squares estimator
4 Measures of fit
5 The Least Squares Assumptions
6 Sampling distribution of the OLS Estimators
1 Introduction
Econometrics - Ch. 3 Miguel Ángel Borrella (UNAV)
Learning objectives
• Ask a question [simple - with one independent variable]. We
want to study the causal effect of “A” on “B”
• Set up a simple linear model to answer this question
• Answer the question using data and a statistical package
(Stata)
Example
Question from previous chapter: Does class-size affect student
performance?
• What are our priors? −→ Smaller class sizes are better for
learning outcomes (?)
• We are interested in
$$\beta_1 = \frac{\text{Change in Test Score}}{\text{Change in Class Size}} = \frac{\Delta \text{Test Score}}{\Delta \text{Class Size}}$$
• In words: β1 measures the change in Test Score due to a unit
change in Class Size
• Mathematically:
• β1 = slope of a straight line relating test scores and class size
$$\text{Test Score} = \beta_0 + \beta_1 \cdot \text{Class Size}$$
• β0 = intercept of the straight line
Example (2)
• But: The average test score in district i does not only depend
on the average class size
• It also depends on other factors such as:
• Quality of the teachers
• Student background
• Quality of text books
• ...
• The equation describing the linear relation between Test score
and Class size is better written as:
$$\text{Test Score}_i = \beta_0 + \beta_1 \, \text{Class Size}_i + u_i$$
where ui lumps together all other district characteristics that
affect average test scores
Statistical inference for linear regression
Statistical (or econometric) inference about the slope entails:
1 Estimation:
• How should we draw a line through the data to estimate the
population slope? −→ Ordinary Least Squares (OLS!)
• What are advantages and disadvantages of OLS?
2 Hypothesis testing:
• How to test whether the slope is zero?
3 Confidence intervals:
• How to construct a confidence interval for the slope?
2 The (simple) Linear Regression Model
Population regression line
The population regression line is the expected value of Y given X
E(Y | X)
• The slope (or marginal effect) is the difference in the expected
values of Y, for two values of X that differ by one unit
• The estimated regression can be used either for:
• causal inference (learning about the causal effect on Y of a
change in X)
• prediction (predicting the value of Y given X, for an
observation not in the data set)
• Causal inference and prediction place different requirements
on the data – but both use the same regression toolkit
General form
General form of the population regression line
$$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \dots, n$$
where:
• Subscript i = observational level [n-paired obs. (Xi , Yi )]
• Yi = Dependent variable
• Xi = Independent variable or regressor
• β0 = Population intercept (unknown!)
• β1 = Population slope (unknown!)
• ui = Regression error term −→ Omitted factors that influence
Y , other than the variable X. Also includes error in the
measurement of Y
Interpretation
$$\frac{\Delta Y}{\Delta X} = \beta_1 \quad \text{as long as} \quad \frac{\Delta u}{\Delta X} = 0$$
• By how much does Y change if X is increased by 1 unit?
• It is only correct if all other things remain equal when X is
increased by 1 unit
• Conditional mean independence: E(u | X) = 0
• (Explanatory variable must not contain information about the
mean of the unobserved factors)
• Can we test this?
• Condition unlikely to hold
• Simple linear regression model is rarely applicable in practice
• But its discussion is useful for pedagogical reasons
Example
A simple linear wage equation:
$$\text{Wage}_i = \beta_0 + \beta_1 \, \text{Educ}_i + u_i$$
• β1 = Measures the change in hourly wage associated with an
additional year of education
• ui = Includes factors such as:
• Labor force experience
• Tenure with current employer
• Work ethic
• Ability
• What about the conditional mean independence?
−→ Again unlikely to hold:
• Individuals with more education will also be more intelligent
(more able) on average!
3 The Ordinary Least Squares estimator
Intuition
In general we do not know β0 and β1 −→ We have to estimate them
using a random sample of data
Question: How to find the line that fits the data best?
OLS estimators: choose the regression coefficients such that the
estimated regression line is as close as possible to the observed data
Graphically
Regression model
Method to estimate β0 and β1 :
LEAST SQUARES PRINCIPLE
Mathematical procedure that uses the data to position a line with
the objective of minimizing the sum of the squares of the vertical
distances between the actual Y values and the predicted values of Y
$$\min_{\beta_0, \beta_1} S(\beta_0, \beta_1) = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
where ûi are called the residuals:
• Difference between the observed y-value and the predicted
y-value for a given x-value on the line
• $\hat{u}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i$
Regression equation
An equation that expresses the linear relationship between two
variables
$$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$$
where:
• $\hat{Y}_i$ = Estimated value of Y for a selected X value
• $\hat\beta_0$ = Y-intercept. Estimated value of Y when X = 0
• $\hat\beta_1$ = Slope. Average change in the dependent variable Y for
each change of one unit (increase or decrease) in the
independent variable X
• Xi = Value of the independent variable that is selected
OLS estimators of β1 and β0
OLS estimator of β1:
$$\hat\beta_1 = r_{xy} \frac{s_y}{s_x} = \frac{s_{xy}}{s_x^2} = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}$$
where:
• rxy = Correlation coefficient
• sy = Standard deviation of Y
• sx = Standard deviation of X
• sxy = Covariance between X and Y
$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$$
where:
• Ȳ = Sample mean of Y
• X̄ = Sample mean of X
• $\hat\beta_1$ = Estimated slope of the regression line
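The two slope formulas above can be checked numerically. The following is a minimal Python sketch, assuming a small made-up data set; the numbers and the helper name `ols_fit` are illustrative and not part of the course material. It computes the slope as covariance over variance and then again through the correlation form.

```python
from math import sqrt

def ols_fit(x, y):
    """Return (b0_hat, b1_hat) for the fitted line y = b0_hat + b1_hat * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and variance; the divisor n - 1 cancels in the ratio.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1_hat = s_xy / s_xx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0_hat, b1_hat = ols_fit(x, y)

# Same slope through the correlation form: b1_hat = r_xy * s_y / s_x.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r_xy = s_xy / (s_x * s_y)
b1_from_r = r_xy * s_y / s_x

print(b0_hat, b1_hat, b1_from_r)
```

Both routes give the same slope (up to floating-point rounding), which is the equality stated in the formula.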
Why use OLS estimators?
• Like the sample average, OLS is the estimator that finds the
line that best “fits” the scatterplot:
• Notice: if the “line” is just an intercept (Y does not depend
on X), then the OLS estimator is just the sample average of
Y1 , . . . , Yn −→ (Ȳ )
• Like Ȳ , the OLS estimator has some desirable properties:
• Under certain assumptions, it is unbiased −→ $E(\hat\beta_1) = \beta_1$
• Under additional assumptions (such as homoskedasticity), its
sampling distribution has lower variance than that of other
linear unbiased estimators of β1
• Under certain assumptions, it is consistent −→ $\hat\beta_1 \xrightarrow{p} \beta_1$
Example
Application to the California Test Score – Class Size data
• The sample mean of district average test scores Ȳ = 654.16
• It can also be obtained by OLS:
Example
• Estimated slope = $\hat\beta_1 = -2.28 = -0.2264 \times \frac{19.053}{1.8918}$
• Estimated intercept = $\hat\beta_0 = 698.9 = 654.1565 + 2.28 \times 19.64$
• Estimated regression line: $\widehat{\text{Test Score}} = 698.9 - 2.28 \times STR$
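The arithmetic on this slide can be reproduced from the summary statistics alone. A short Python sketch, assuming (as the formula for the slope suggests) that −0.2264 is the sample correlation, 19.053 and 1.8918 are the standard deviations of test scores and STR, and 19.64 is the mean STR; the variable names are illustrative.

```python
# Summary statistics quoted on the slide (California test score data).
r_xy = -0.2264      # correlation between STR and test score
s_y = 19.053        # standard deviation of test scores
s_x = 1.8918        # standard deviation of STR
y_bar = 654.1565    # sample mean of test scores
x_bar = 19.64       # sample mean of STR

slope = r_xy * s_y / s_x            # b1_hat = r_xy * s_y / s_x
intercept = y_bar - slope * x_bar   # b0_hat = y_bar - b1_hat * x_bar

print(round(slope, 2), round(intercept, 1))  # prints -2.28 698.9
```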
Example
• Interpretation of the estimated slope and intercept:
• The slope: Districts with one more student per teacher have,
on average, test scores that are 2.28 points lower
• That is, we estimate $\frac{\Delta \text{Test Score}}{\Delta STR} = -2.28$
• The intercept: Taken literally, it means that, according to this
estimated line, districts with zero students per teacher would
have a (predicted) test score of 698.9
• BUT: This interpretation of the intercept makes no sense
here!
1 It extrapolates the line outside the range of the data (the
intercept is not itself economically meaningful)
2 What does it mean for class size to be zero?
Example (2)
In Stata:
One of the districts in the dataset is Antelope, CA, for which:
ST R = 19.33 and Test Score = 657.8
• Predicted value: $\hat{Y}_{\text{Antelope}} = 698.9 - 2.28 \times 19.33 = 654.8$
• Residual: $\hat{u}_{\text{Antelope}} = 657.8 - 654.8 = 3.0$
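The same two numbers can be obtained in a few lines of Python. This is only a sketch that plugs the rounded coefficients from the slide into the fitted line; the variable names are illustrative.

```python
b0_hat, b1_hat = 698.9, -2.28   # estimated intercept and slope (rounded, from the slide)
str_antelope = 19.33            # student-teacher ratio in Antelope, CA
score_antelope = 657.8          # observed test score in Antelope, CA

predicted = b0_hat + b1_hat * str_antelope   # value on the fitted line
residual = score_antelope - predicted        # observed minus predicted

print(round(predicted, 1), round(residual, 1))  # prints 654.8 3.0
```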
4 Measures of fit
Introduction
How well does the estimated regression line “fit” or explain the data?
1 Does the regressor X account for much or for little variation
in Y ? −→ The R² measures the fraction of the variance of Y
that is explained by X
• It is unitless
• Ranges between 0 (no fit) and 1 (perfect fit)
2 Are the observations in the scatter plot clustered closely
around the regression line?
−→ The standard error of the regression (SER) measures
how far Yi typically is from its predicted value
The R²
The R² is the fraction of the sample variance of Yi “explained” by
the regression
Yi = Ŷi + ûi = OLS prediction + OLS residual
• Sample Var(Y) = Sample Var(Ŷ ) + Sample Var(û)
• Total sum of squares = “Explained” SS + SS “Residuals”
• $R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{\hat{Y}})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = 1 - \frac{SSR}{TSS}$
• If R² = 0, Xi explains none of the variation in Yi
• If R² = 1, Xi explains all of the variation in Yi (Yi = Ŷi )
• In practice, 0 < R² < 1
• With one regressor: R² = the square of the correlation
coefficient between X and Y
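The decomposition and the link to the correlation coefficient can be verified on a toy example. The Python sketch below uses made-up data (an assumption, not course data) and computes the R² three ways: ESS/TSS, 1 − SSR/TSS, and the squared sample correlation.

```python
# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.5, 5.0, 8.5, 9.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# OLS fit and fitted values.
b1_hat = s_xy / s_xx
b0_hat = y_bar - b1_hat * x_bar
y_hat = [b0_hat + b1_hat * xi for xi in x]

tss = s_yy                                              # total sum of squares
ess = sum((yh - y_bar) ** 2 for yh in y_hat)            # mean of y_hat equals y_bar
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # sum of squared residuals

r2 = ess / tss
r2_alt = 1 - ssr / tss
r_xy_squared = s_xy ** 2 / (s_xx * s_yy)

print(tss, ess + ssr)
print(r2, r2_alt, r_xy_squared)
```

All three values agree up to rounding, and TSS equals ESS + SSR.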
SE of the regression
The SER is an estimator of the standard deviation of the regression
error ui
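$$SER = s_{\hat{u}} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$$
• The second equality holds because $\bar{\hat{u}} = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0$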
• The divisor n − 2 is used because 2 degrees of freedom were
lost in estimating the two regression coefficients β0 and β1
• It measures the spread of the observations around the
regression line in the units of the dependent variable
• In other words: It measures the average “size” of the OLS
residual (the average “mistake” made by the OLS regression)
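As a rough illustration of these points, the Python sketch below simulates a data set (the data-generating numbers are assumptions chosen only to resemble the class-size example), fits the OLS line, and compares the SER (divisor n − 2) with the root mean squared error that uses divisor n.

```python
import random
from math import sqrt

random.seed(0)
n = 200
# Hypothetical data-generating process; all numbers are made up.
x = [random.uniform(14, 26) for _ in range(n)]
y = [700.0 - 2.0 * xi + random.gauss(0, 18) for xi in x]

x_bar, y_bar = sum(x) / n, sum(y) / n
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_hat = y_bar - b1_hat * x_bar

residuals = [yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in residuals)

ser = sqrt(ssr / (n - 2))   # divisor n - 2: two coefficients were estimated
rmse = sqrt(ssr / n)        # divisor n

print(sum(residuals) / n)   # mean residual, numerically zero
print(ser, rmse)            # close to each other when n is large
```

The ratio of the two measures is exactly sqrt(n / (n − 2)), so the gap shrinks as n grows.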
Example
The slope is statistically significant & large in a policy sense, but:
• STR explains only a small fraction of the variation in test
scores −→ R² = 5.12%
• Large spread −→ SER = 18.6 (in test-score points)
• In Stata: $SER = RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2}$
• The distinction between dividing by n and by n − 2 is
negligible if n is large enough
• Does this make sense? Does this mean the STR is
unimportant in a policy sense?
5 The Least Squares Assumptions
Introduction
So far: OLS is a way to draw a straight line through the data on
Y and X. But:
1 Under what conditions does the slope of this line have a
causal interpretation? That is, when will the OLS estimator
be unbiased for the causal effect on Y of X?
2 What is the variance of the OLS estimator over repeated
samples?
To answer these questions, we need to make some assumptions about
how Y and X are related to each other, and about how they are
collected (the sampling scheme)
• These assumptions are known as the Least Squares
Assumptions for Causal Inference
Reminder!
The causal effect on Y of a unit change in X is the expected
difference in Y as measured in a randomized controlled experiment
• With a binary treatment:
• The causal effect is the expected difference in means between
the treatment and control groups (remember chapter 2b!)
• It requires random assignment or as-if random assignment
• Random assignment ensures that the treatment (X) is
uncorrelated with all other determinants of Y, so that there are
no confounding variables
• The least squares assumptions for causal inference generalize
the binary treatment case to regression
General assumptions
General assumptions for the linear regression model:
1 Assumption SLR.1 (Linear in parameters) −→ In the
population, the relationship between Y and X is linear
$$Y = \beta_0 + \beta_1 X + U$$
2 Assumption SLR.2 (Sample variation in the regressor) −→
The values of the regressor are not all the same (otherwise it
would be impossible to study how different values of X lead to
different values of Y)
Specific assumptions
Specific assumptions for the linear regression model:
3 Assumption SLR.3 (Zero conditional mean) −→ The value
of the regressor must contain no information about the mean
of the unobserved factors
$$E(u_i \mid X_i) = 0$$
4 Assumption SLR.4 (Simple random sampling) −→
(Xi , Yi ), i = 1, . . . , n are i.i.d., so each data point
follows the population equation
5 Assumption SLR.5 (Outliers unlikely) −→ X and Y have
finite fourth moments
6 Assumption SLR.6 (Homoskedasticity) −→ The value of the
regressor must contain no information about the variability of
the unobserved factors
$$\operatorname{Var}(u_i \mid X_i) = \sigma^2$$
SLR.3
For any given value of X, the mean of u is zero: E (ui | Xi = xi ) = 0
• This implies that β̂1 is unbiased for the causal effect β1
$$E(Y_i \mid X_i) = E(\beta_0 + \beta_1 X_i + u_i \mid X_i) = \beta_0 + \beta_1 E(X_i \mid X_i) + E(u_i \mid X_i) = \beta_0 + \beta_1 X_i$$
• Our example: $\text{Test Score}_i = \beta_0 + \beta_1 \, \text{Class Size}_i + u_i$ −→
What might those other factors included in ui be?
• Parental involvement
• Outside learning opportunities (extra math classes...)
• Home environment conducive to reading
• Family income is a useful proxy for many such factors
• SLR.3 then requires E(family income | STR) = constant, which
implies that family income and STR are uncorrelated. . .
• Is E (ui | Xi = xi ) = 0 plausible for these other factors?
SLR.3 (2)
The benchmark for understanding this assumption is to consider an
ideal RCT:
• X is randomly assigned to people
• Students randomly assigned to different size classes
• Randomization is done by computer, using no information
about the individual
• Because X is assigned randomly, all other individual
characteristics (things included in u) are distributed
independently of X −→ u and X are independent
• Then, in an ideal RCT: E (ui | Xi = xi ) = 0 (SLR.3 holds)
SLR.3 (3)
With observational data, we need to think hard about whether
E (ui | Xi = xi ) = 0 holds. Example:
• Suppose that:
• Districts with wealthy inhabitants have small classes and
good teachers
• These districts have a lot of money which they can use to hire
more and better teachers
• Districts with poor inhabitants have large classes and bad
teachers
• These districts have little money and can hire only a few,
not very good, teachers
• In this case, class size is related to teacher quality
• Teacher quality likely affects test scores −→ Within ui
• This implies a violation of SLR.3
$$E(u_i \mid \text{Class size}_i = \text{small}) \neq E(u_i \mid \text{Class size}_i = \text{large})$$
so E (ui | Class sizei ) cannot be zero for every class size
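A small simulation can make the consequence of this violation concrete. The Python sketch below is only an illustration under made-up assumptions: a hypothetical true slope of −2, a randomized design in which class size is independent of u, and an observational design in which district wealth both lowers class size and raises teacher quality (which sits in u). In the second design the large-sample OLS slope is β1 + Cov(X, u)/Var(X) = −2 + (−10)/5 = −4.

```python
import random

def ols_slope(x, y):
    """OLS slope: sum of cross-deviations over sum of squared X deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))

rng = random.Random(1)
n = 5000
beta0, beta1 = 700.0, -2.0   # hypothetical "true" coefficients

# Case 1: class size assigned at random, so u is independent of X.
x_rct = [rng.uniform(15, 25) for _ in range(n)]
y_rct = [beta0 + beta1 * xi + rng.gauss(0, 5) for xi in x_rct]

# Case 2: wealth lowers class size AND raises teacher quality (part of u),
# so u is correlated with X and SLR.3 fails.
wealth = [rng.gauss(0, 1) for _ in range(n)]
x_obs = [20 - 2 * w + rng.gauss(0, 1) for w in wealth]
u_obs = [5 * w + rng.gauss(0, 5) for w in wealth]
y_obs = [beta0 + beta1 * xi + ui for xi, ui in zip(x_obs, u_obs)]

slope_rct = ols_slope(x_rct, y_rct)
slope_obs = ols_slope(x_obs, y_obs)
print(slope_rct, slope_obs)
```

The randomized design recovers a slope close to −2, while the confounded design overstates the effect of class size.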
SLR.4
The condition that (Xi , Yi ), i = 1, . . . , n are i.i.d. holds automatically if the entity
(individual, district) is sampled by simple random sampling:
• The entities are selected from the same population, so
(Xi , Yi ) are identically distributed for all i = 1, . . . , n
• The entities are selected at random, so the values of (X, Y )
for different entities are independently distributed
Examples of a violation of simple random sampling:
• Panel data and time series data (Data recorded over time)
• Observations on children from the same mother (not
independent)
SLR.5
Large outliers are rare −→ $E(X^4) < \infty$ and $E(Y^4) < \infty$
• Outliers are observations that have values far outside the
usual range of the data
• Another way to state the assumption is that X and Y have finite
kurtosis
• Large outliers can make OLS regression results misleading
• Look at your data! If you have a large outlier, is it a typo?
• Does it belong in your data set? Why is it an outlier?
• Assumption is necessary to justify the large sample
approximation to the sampling distribution of the OLS
estimators
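To see how a single large outlier can mislead OLS, consider the following Python sketch. The data are made up for illustration: eight points lying close to a line with slope 1, plus one wildly mis-recorded observation of the kind a typo could produce.

```python
def ols_slope(x, y):
    """OLS slope: sum of cross-deviations over sum of squared X deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))

# Hypothetical clean data: roughly y = x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.9, 8.1]
slope_clean = ols_slope(x, y)

# Same data plus one extreme observation (for example, a data-entry error).
x_out = x + [9.0]
y_out = y + [-50.0]
slope_out = ols_slope(x_out, y_out)

print(slope_clean, slope_out)
```

One bad observation is enough to flip the sign of the estimated slope here, which is why inspecting the data before running the regression matters.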
SLR.5 (2)
[Figure: a large outlier can strongly influence the results]
SLR.6
[Figure: homoskedasticity, shown graphically]
6 Sampling distribution of the OLS Estimators
Introduction
The OLS estimator is computed from a sample of data
−→ A different sample yields a different value of β̂1 (the source of
the “sampling uncertainty” of β̂1 ). We then want to:
• Quantify the sampling uncertainty associated with β̂1
• Use β̂1 to test hypotheses such as H0 : β1 = 0
• Construct a confidence interval for β1
• Goal: To study the sampling distribution of β̂1
1 Probability framework for linear regression
2 Distribution of OLS estimator
Probability Framework for Linear Regression
The probability framework is summarized by the OLS assumptions:
1 Population → The group of interest (ex: all possible school
districts)
2 Random variables → X, Y (ex: STR, Test Score) (SLR.2)
3 Joint distribution of X and Y → We assume:
• Population regression function is linear (SLR.1)
• E (ui | Xi = xi ) = 0 (SLR.3)
• X, Y have nonzero finite fourth moments (SLR.5)
4 Simple random sampling → Data collection by this method
implies (Xi , Yi ), i = 1, . . . , n are i.i.d. (SLR.4)
Reminder
Recall the summary of the sampling distribution of Ȳ :
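• For (Y1 , . . . , Yn ) i.i.d. with $0 < \sigma_Y^2 < \infty$, Ȳ is the
Best: $\operatorname{Var}(\bar{Y}) = \frac{\sigma_Y^2}{n} \le \operatorname{Var}(\hat\mu_Y)$ for every linear unbiased estimator $\hat\mu_Y$
Linear: $\hat\mu_Y = \frac{1}{n} \sum_{i=1}^{n} Y_i$
Unbiased: $E(\bar{Y}) = \mu_Y$
Estimator of $\mu_Y$
• Moreover, by the CLT:
$$\frac{\bar{Y} - E(\bar{Y})}{\sqrt{\operatorname{Var}(\bar{Y})}} \simeq N(0, 1)$$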
Sampling distribution of β̂1
Like Ȳ , β̂1 (remember: it is a function of sample averages!) has a
sampling distribution:
1 What is E(β̂1 )?
−→ If E(β̂1 ) = β1 , then OLS is unbiased (good thing!)
2 What is V ar(β̂1 )? (measure of sampling uncertainty)
−→ We need to derive a formula in order to compute the SE
of β̂1
3 What is the distribution of β̂1 in small samples?
−→ It is very complicated in general
4 What is the distribution of β̂1 in large samples?
−→ By the CLT, β̂1 is (approx) normally distributed
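The idea of a sampling distribution can be made tangible with a Monte Carlo experiment: draw many samples from a known population, compute the OLS slope in each, and look at the spread of the estimates. The Python sketch below assumes a made-up population (true slope −2, uniform regressor, normal errors); the function names are illustrative.

```python
import random
from statistics import mean, pvariance

def ols_slope(x, y):
    """OLS slope: sum of cross-deviations over sum of squared X deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))

def simulate_slopes(n, reps, beta0=700.0, beta1=-2.0, seed=0):
    """Draw `reps` samples of size n and return the OLS slope from each."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(15, 25) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0, 10) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

small = simulate_slopes(n=25, reps=2000)
large = simulate_slopes(n=100, reps=2000)

# Both averages should be near the true slope; the variance shrinks with n.
print(mean(small), pvariance(small))
print(mean(large), pvariance(large))
```

In this setup the average estimate is close to −2 for both sample sizes, and the variance of the estimates is markedly smaller with n = 100 than with n = 25.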
Preliminary algebra
Some (needed!) preliminary algebra:
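$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
$$\bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{u}$$
Hence: $Y_i - \bar{Y} = \beta_1 (X_i - \bar{X}) + (u_i - \bar{u})$
Thus:
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) \left[ \beta_1 (X_i - \bar{X}) + (u_i - \bar{u}) \right]}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
$$\hat\beta_1 = \beta_1 \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} + \frac{\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$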
Preliminary algebra (2)
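$$\hat\beta_1 = \beta_1 + \frac{\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
It can be shown that:
$$\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n} (X_i - \bar{X}) u_i$$
because $\bar{u} \sum_{i=1}^{n} (X_i - \bar{X}) = 0$
Finally:
$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$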
What is E(β̂1 )?
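$$E(\hat\beta_1 - \beta_1) = E\left( \frac{\sum_{i=1}^{n} (X_i - \bar{X}) u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right)$$
Using the law of iterated expectations (LIE):
$$E(\hat\beta_1 - \beta_1) = E\left( \frac{\sum_{i=1}^{n} (X_i - \bar{X}) E(u_i \mid X_i)}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right)$$
$E(\hat\beta_1 - \beta_1) = 0$, because of SLR.3: $E(u_i \mid X_i = x_i) = 0$
• Thus SLR.3 implies that $E(\hat\beta_1) = \beta_1$ (just like Ȳ !)
• That is, β̂1 is an unbiased estimator of β1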
What is V ar(β̂1 )?
Rewrite:
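$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\frac{n-1}{n} S_X^2}$$
where $v_i = (X_i - \bar{X}) u_i$
If n is large enough: $S_X^2 \approx \sigma_X^2$ and $\frac{n-1}{n} \approx 1$. Then:
$$\hat\beta_1 - \beta_1 \approx \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\sigma_X^2}$$
$$\operatorname{Var}(\hat\beta_1 - \beta_1) = \operatorname{Var}\left( \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\sigma_X^2} \right)$$
$$\operatorname{Var}(\hat\beta_1) = \frac{1}{n} \frac{\operatorname{Var}\left[ (X_i - \bar{X}) u_i \right]}{(\sigma_X^2)^2}$$
• $\operatorname{Var}(\hat\beta_1)$ is inversely proportional to n (just like $\operatorname{Var}(\bar{Y})$ !)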
What is the distribution of β̂1 ?
The exact sampling distribution is complicated: it depends on the
population distribution of (Y, X). But when n is large, we get
some simple (and good) approximations
• Since $\operatorname{Var}(\hat\beta_1) < \infty$ and $\hat\beta_1 \xrightarrow{p} \beta_1$
• We can use the CLT to obtain the (approx) distribution
• Remember previous slide:
$$\hat\beta_1 - \beta_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\frac{n-1}{n} S_X^2} \approx \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\sigma_X^2}$$
where $v_i = (X_i - \bar{X}) u_i$
What is the distribution of β̂1 ? (2)
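• When n is large: $v_i = (X_i - \bar{X}) u_i \approx (X_i - \mu_X) u_i$
• vi is i.i.d. (why?)
• $\operatorname{Var}(v_i) < \infty$ (why?)
• By the CLT: $\frac{1}{n} \sum_{i=1}^{n} v_i \simeq N\left(0, \frac{\sigma_v^2}{n}\right)$
Then, β̂1 is approximately distributed:
$$\hat\beta_1 \simeq N\left( \beta_1, \frac{\sigma_v^2}{n (\sigma_X^2)^2} \right)$$
where $v_i \approx (X_i - \mu_X) u_i$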
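The large-sample approximation can be compared with a Monte Carlo experiment. The Python sketch below assumes a made-up population in which u is independent of X, so that Var(v) = Var(X)·Var(u); it then checks whether the simulated variance of the slope is close to the formula and whether roughly 95% of the standardized estimates fall within ±1.96. All numbers and names are illustrative.

```python
import random
from math import sqrt

def ols_slope(x, y):
    """OLS slope: sum of cross-deviations over sum of squared X deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))

rng = random.Random(42)
n, reps = 100, 2000
beta0, beta1 = 700.0, -2.0       # hypothetical population coefficients
sigma_u = 10.0                   # standard deviation of the error term
var_x = (25 - 15) ** 2 / 12      # variance of a Uniform(15, 25) regressor

# With u independent of X: Var(v) = Var((X - mu_X) * u) = Var(X) * Var(u).
var_v = var_x * sigma_u ** 2
var_formula = var_v / (n * var_x ** 2)

slopes = []
for _ in range(reps):
    x = [rng.uniform(15, 25) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma_u) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / reps
var_mc = sum((b - mean_slope) ** 2 for b in slopes) / reps

# Share of standardized estimates inside +/- 1.96 (about 0.95 under normality).
z = [(b - beta1) / sqrt(var_formula) for b in slopes]
coverage = sum(abs(zi) <= 1.96 for zi in z) / reps

print(var_formula, var_mc, coverage)
```

Here the formula gives 0.12, and the simulated variance and coverage should land close to 0.12 and 0.95 respectively.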
Extra (1): Proof of consistency
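Consistency means $\hat\beta_1 \xrightarrow{p} \beta_1$ or $\operatorname{plim} \hat\beta_1 = \beta_1$
$$\operatorname{plim} \hat\beta_1 = \operatorname{plim} \left( \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right)$$
$$\operatorname{plim} \hat\beta_1 = \beta_1 + \operatorname{plim} \left( \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) u_i}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} \right)$$
where the numerator $\xrightarrow{p} 0$ and the denominator $\xrightarrow{p} \operatorname{Var}(X)$
• Then, $\operatorname{plim} \hat\beta_1 = \beta_1$ if E (ui | Xi = xi ) = 0
• Unbiasedness & consistency both rely on SLR.3
• But consistency implies that the sampling distribution
becomes more and more tightly distributed around β1 as the
sample size n becomes larger and larger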
Extra (2): Variance of X vs Variance of β̂1
$$\operatorname{Var}(\hat\beta_1) = \frac{1}{n} \frac{\operatorname{Var}\left[ (X_i - \mu_X) u_i \right]}{(\sigma_X^2)^2}$$
where $\sigma_X^2 = \operatorname{Var}(X_i)$
• The variance of X appears (squared) in the denominator −→
Increasing the spread of X decreases the variance of β̂1
• Intuition: More variation in X implies more info in the data
that can be used to fit the regression line
Extra (3): (What about SLR.6?)
Under SLR.6:
$$\operatorname{Var}(\hat\beta_1) = \frac{\sigma_u^2}{n \, \sigma_X^2} \approx \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
• Same notion: Larger sampling variability of β̂1 if variability
of unobserved factors is higher; lower if variation in the
regressor is larger
• As usual, $\sigma_u^2$ is unknown −→ Use $\hat\sigma^2 = s_u^2 = SER^2 = \frac{SSR}{n-2}$
• Homoskedastic SE of β̂1 is then:
$$SE(\hat\beta_1) = \frac{s_u}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
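The homoskedastic standard error is easy to compute by hand. The Python sketch below uses a tiny made-up data set (an assumption for illustration) and follows the steps above: fit the line, compute the SSR, estimate the error standard deviation with divisor n − 2, and divide by the square root of the sum of squared deviations of X.

```python
from math import sqrt

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.5, 5.0, 8.5, 9.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared deviations of X
b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0_hat = y_bar - b1_hat * x_bar

ssr = sum((yi - b0_hat - b1_hat * xi) ** 2 for xi, yi in zip(x, y))
s_u = sqrt(ssr / (n - 2))       # SER: estimate of the error standard deviation
se_b1 = s_u / sqrt(s_xx)        # homoskedastic standard error of the slope

print(b1_hat, s_u, se_b1)
```

This standard error is only appropriate under SLR.6; with heteroskedastic errors a robust formula is needed instead.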
Extra (3): (What about SLR.6?)
Difference between homoskedastic and heteroskedastic (robust) SE
Summary
Parallel conclusions hold for the OLS estimator β̂1 (and also for β̂0 ):
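• Under SLR.1-SLR.5 (SLR.6 is also needed for “Best”), β̂1 is the
Best: $\operatorname{Var}(\hat\beta_1) \le \operatorname{Var}(\tilde\beta_1) \;\; \forall \, \tilde\beta_1$ −→ Efficient!
Linear: (weighted sum of Y1 , . . . , Yn , with weights that depend on X1 , . . . , Xn )
Unbiased: $E(\hat\beta_1) = \beta_1$
Estimator of β1
• Moreover, by the CLT:
$$\frac{\hat\beta_1 - E(\hat\beta_1)}{\sqrt{\operatorname{Var}(\hat\beta_1)}} \simeq N(0, 1)$$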
Summary (2)
If SLR.1-SLR.5 hold, then in large samples β̂1 and β̂0 have a jointly
normal sampling distribution:
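1 The large-sample normal distribution of β̂1 is $N(\beta_1, \sigma^2_{\hat\beta_1})$,
where the variance of this distribution is:
$$\operatorname{Var}(\hat\beta_1) = \frac{1}{n} \frac{\operatorname{Var}\left[ (X_i - \mu_X) u_i \right]}{(\sigma_X^2)^2}$$
2 The large-sample normal distribution of β̂0 is $N(\beta_0, \sigma^2_{\hat\beta_0})$,
where the variance of this distribution is:
$$\operatorname{Var}(\hat\beta_0) = \frac{1}{n} \frac{\operatorname{Var}(H_i u_i)}{\left[ E(H_i^2) \right]^2}, \quad \text{where } H_i = 1 - \frac{\mu_X}{E(X_i^2)} X_i$$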
Ready to turn to hypothesis tests & confidence intervals!
Econometrics I
End chapter 3