Simple Regression Model Explained
Model:
• DGP: Yi = β0 + β1 Xi + ui, where the conditional distribution of the random error ui given Xi is iid(0, σ²) or iid N(0, σ²).
• Y is the dependent or left-hand-side variable.
• Y is linearly related to X, the regressor, independent variable, explanatory variable or right-hand-side variable.
• The sample or data is a collection of N independent observations on
X and Y, which is a reasonable assumption with cross section data.
• The data are non-experimental so X and Y are random variables,
variables whose values are subject to chance.
• Many textbooks treat the X’s as “fixed”, but this is not very realistic
in economics since experimental data are very rare.
• However, treating the X’s as fixed is a lot easier since there is no
need to distinguish between conditional and unconditional
expectations!
• Assumptions and results in the fixed X case can be interpreted as
statements conditional on the X’s, allowing us to dispense with “|X”
terms etc.
• When deriving the properties of the OLS estimators we will
consider the classical linear regression model – fixed X’s and normal
random error terms – and then sketch out the extension to the case
where the X’s are random and the random errors are not necessarily
normally distributed.
The Data Generation Process
• DGP: Yi = β0 + β1 Xi + ui, ui | Xi ~ iid(0, σ²), i = 1…N
• In some cases, we assume that ui | Xi ~ iid N(0, σ²). In large
samples, the OLS estimators will be normally distributed in any case
by the CLT (central limit theorem).
• The β's are unknown parameters to be estimated and the ui's are unobserved random error terms with certain properties set out below.
• The part of the r.h.s. involving X, β0 + β1 Xi, is the regression or regression function, and the β's are often called the regression coefficients.
• The linearity assumption is not very restrictive. Y and X could be
transformations of the variables in question.
• Interpretation: E(Yi | Xi) = β0 + β1 Xi and dE(Yi | Xi)/dXi = β1, etc.
Yi = β0 + β1 Xi + ui = Conditional Mean + Random Error
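A minimal simulation sketch of this DGP in Python/numpy (illustrative only; the parameter values β0 = 1, β1 = 2 match the figure below):

import numpy as np

rng = np.random.default_rng(0)
N = 200
beta0, beta1 = 1.0, 2.0                # true parameters (as in the figure below)
X = rng.normal(size=N)                 # X_i ~ N(0, 1)
u = rng.normal(size=N)                 # u_i ~ N(0, 1), independent of X
Y = beta0 + beta1 * X + u              # DGP: Y_i = beta0 + beta1*X_i + u_i
# E(Y | X) = beta0 + beta1*X is the conditional mean; u_i is the random error.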
[Figure: a simulated sample from the DGP Yi = β0 + β1 Xi + ui with Xi, ui ~ iid N(0,1) and β0 = 1, β1 = 2. The line E(Y|X) = β0 + β1 X is the conditional mean E(Yi | Xi); the vertical distance from each point (Xi, Yi) to the line is the random error ui.]
Yi = Ŷi + ûi, so the residual is the actual value minus the fitted value: ûi = Yi − Ŷi.
[Figure: the simulated data with the fitted regression line; the residuals ûi are the vertical distances between the actual values Yi and the fitted values Ŷi.]
• The least squares principle and the OLS estimators of β0 and β1: choose β̂0 and β̂1 to minimise the residual sum of squares.
• OLS first order conditions:
Σi ûi = 0 and Σi Xi ûi = 0
or, dividing by N,
(1/N) Σi ûi = 0 and (1/N) Σi Xi ûi = 0
which are the sample analogs of the population moment conditions
E(ui) = 0 and E(Xi ui) = 0
• The OLS estimators are:
β̂1 = Σi xi yi / Σi xi² and β̂0 = Ȳ − β̂1 X̄
⇒ β̂1 | X ~ AN(β1, σ² / Σi xi²)
This is the distribution conditional on X.
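A short numerical sketch of these formulas, continuing the simulated data above (names are illustrative):

x = X - X.mean()                        # deviations from the sample means
y = Y - Y.mean()
b1 = (x * y).sum() / (x ** 2).sum()     # beta1_hat = sum_i x_i y_i / sum_i x_i^2
b0 = Y.mean() - b1 * X.mean()           # beta0_hat = Ybar - beta1_hat * Xbar
uhat = Y - b0 - b1 * X                  # residuals
print(uhat.sum(), (X * uhat).sum())     # both first order conditions hold (~ 0)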
• The Gauss-Markov theorem – the OLS estimators of β 0 and β1 are
BLUE i.e. best or minimum variance within the class of linear,
unbiased estimators.
• If the random errors (the ui 's ) are assumed to be normally
distributed, then the OLS estimators are best within the class of
unbiased estimators.
• This stronger result is just an application of the Cramer-Rao
theorem, which says that the inverse of the information matrix is a
lower bound for the variance of an unbiased estimator.
• An unbiased estimator of σ² is σ̂² = (1/(N − 2)) Σi ûi². The ML estimator, which does not adjust for degrees of freedom, is consistent.
• Inference using β̂1 and σ̂² – confidence intervals and hypothesis tests – is standard.
• R² and all that. The decomposition TSS = ESS + RSS and the measure of goodness of fit R² = ESS/TSS = 1 − RSS/TSS, which ranges between 0 and 1. This is just the fraction of the variation in Y explained by X.
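A sketch of the decomposition and of R², continuing the snippet above:

Yfit = b0 + b1 * X                        # fitted values
TSS = ((Y - Y.mean()) ** 2).sum()         # total sum of squares
ESS = ((Yfit - Y.mean()) ** 2).sum()      # explained sum of squares
RSS = (uhat ** 2).sum()                   # residual sum of squares
R2 = ESS / TSS                            # equals 1 - RSS/TSS
print(np.isclose(TSS, ESS + RSS), R2)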
The OLS Estimators
ûi = Yi − Ŷi is the residual. RSS(b0, b1) = Σi (Yi − b0 − b1 Xi)².
[Figure: the residual sum of squares RSS(b0, b1) plotted as a surface over a grid of values of b0 and b1, for data generated by the DGP Yi = 1 + 2 Xi + ui. The surface is minimised at the OLS estimates (β̂0, β̂1).]
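A sketch of the same surface, evaluating RSS(b0, b1) on a grid for the simulated data above (grid limits are illustrative):

b0_grid = np.linspace(-10, 10, 101)
b1_grid = np.linspace(-10, 10, 101)
B0, B1 = np.meshgrid(b0_grid, b1_grid)
# RSS(b0, b1) = sum_i (Y_i - b0 - b1*X_i)^2 at every grid point
RSS_grid = ((Y[:, None, None] - B0 - B1 * X[:, None, None]) ** 2).sum(axis=0)
i, j = np.unravel_index(RSS_grid.argmin(), RSS_grid.shape)
print(B0[i, j], B1[i, j])                 # close to the OLS estimates (b0, b1)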
The OLS first order conditions are Σi ûi = Σi (Yi − β̂0 − β̂1 Xi) = 0 and Σi Xi ûi = Σi Xi (Yi − β̂0 − β̂1 Xi) = 0. The second equation may be written as:
Σi Xi Yi − β̂0 Σi Xi − β̂1 Σi Xi² = 0
The first equation implies that:
β̂0 = Ȳ − β̂1 X̄
Substitute this into the second equation:
Σi Xi Yi − (Ȳ − β̂1 X̄) Σi Xi − β̂1 Σi Xi² = 0
Now rewrite Σi Xi as N X̄ to obtain:
Σi Xi Yi − (Ȳ − β̂1 X̄) N X̄ − β̂1 Σi Xi² = 0
This equation may be rearranged as:
Σi Xi Yi − N X̄ Ȳ − β̂1 (Σi Xi² − N X̄²) = 0
Thus:
β̂1 = (Σi Xi Yi − N X̄ Ȳ) / (Σi Xi² − N X̄²) ≡ Σi xi yi / Σi xi²
using the identities
Σi xi² = Σi (Xi − X̄)² = Σi Xi² − N X̄²
Σi xi yi = Σi (Xi − X̄)(Yi − Ȳ) = Σi Xi Yi − N X̄ Ȳ
Hence:
β̂1 = Σi xi yi / Σi xi² and β̂0 = Ȳ − β̂1 X̄
where yi = Yi − Ȳ and xi = Xi − X̄ are deviations from the sample means. The formula for the OLS estimator of β0 says that the OLS regression line goes through the sample means of X and Y.
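A quick numerical check of the two identities used above, continuing the snippet from before:

# sum x_i^2 = sum X_i^2 - N*Xbar^2 and sum x_i*y_i = sum X_i*Y_i - N*Xbar*Ybar
print(np.isclose((x ** 2).sum(), (X ** 2).sum() - N * X.mean() ** 2))
print(np.isclose((x * y).sum(), (X * Y).sum() - N * X.mean() * Y.mean()))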
Analogy between Population and Sample Moments*
The formulae for the β̂'s are also sample analogs of the population moments. For simplicity assume that the X's are iid so EXi = EX. Since E(ui | Xi) = 0 implies that Eui = 0, it follows that EY = β0 + β1 EX. This means that β0 = EY − β1 EX, with sample analog β̂0 = Ȳ − β̂1 X̄.
Similarly, multiplying Yi − EY = β1 (Xi − EX) + ui by (Xi − EX) and taking expectations gives
E(Yi − EY)(Xi − EX) = β1 E(Xi − EX)²   ∵ Eui = EXi ui = 0
This says that Cov(X, Y) = β1 Var(X), so β1 = Cov(X, Y) / Var(X). The sample analog of this is
β̂1 = [(1/(N−1)) Σi xi yi] / [(1/(N−1)) Σi xi²] = Σi xi yi / Σi xi²
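A sketch of the sample analog in terms of sample moments, continuing the snippet above (np.cov uses the 1/(N−1) convention, which cancels in the ratio):

S = np.cov(X, Y)                           # 2x2 sample covariance matrix
print(np.isclose(S[0, 1] / S[0, 0], b1))   # Cov(X,Y)/Var(X) equals beta1_hat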
The Distribution of β̂1 in the Classical Regression Model
We showed that β̂1 = Σi xi yi / Σi xi². Now substitute β1 xi + ui − ū for yi, using
yi = Yi − Ȳ = β1 (Xi − X̄) + ui − ū ⇒ yi = β1 xi + ui − ū:
β̂1 = Σi xi yi / Σi xi² = Σi xi (β1 xi + ui − ū) / Σi xi² = (β1 Σi xi² + Σi xi ui − ū Σi xi) / Σi xi²
⇒ β̂1 = β1 + Σi xi ui / Σi xi²
using the result that ū is a constant and Σi xi = Σi (Xi − X̄) = 0.
Defining the weights wi = xi / Σj xj², this can be written as
β̂1 = β1 + Σi wi ui
which says that the OLS estimator β̂1 equals the true value β1 plus a weighted sum of the random errors. It is easy to check that Σi wi = 0 and Σi wi² = 1 / Σi xi². Using these results, it is easy to find the distribution of β̂1 since the “weights” are fixed in the classical regression model (fixed X’s and normally distributed u’s).
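A quick check of the properties of the weights, continuing the snippet above:

w = x / (x ** 2).sum()                                   # w_i = x_i / sum_j x_j^2
print(np.isclose(w.sum(), 0.0))                          # sum_i w_i = 0
print(np.isclose((w ** 2).sum(), 1 / (x ** 2).sum()))    # sum_i w_i^2 = 1 / sum_i x_i^2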
First, β̂1 is unbiased since the mean E β̂1 = β1:
E β̂1 = β1 + E Σi wi ui = β1 + Σi wi E(ui) = β1 + Σi wi · 0 = β1
The variance of β̂1 is σ² / Σi xi²:
V(β̂1) = V(β1 + Σi wi ui) = V(Σi wi ui) = Σi wi² V(ui) = Σi wi² σ² = σ² Σi wi² = σ² / Σi xi²
Note that Σi xi² = N (Σi xi² / N) ≈ N σX² → ∞ as N → ∞, where σX² is the variance of X, so the variance of β̂1 goes to zero as the sample size grows.
β̂1 ~ N(β1, σ² / Σi xi²)
σ̂² = Σi ûi² / (N − 2) = RSS / d.o.f.
where d.o.f. stands for the degrees of freedom of the regression.
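A Monte Carlo sketch of these results, holding the simulated X's above fixed and redrawing normal errors with σ² = 1 (the number of replications is arbitrary):

R = 5000
b1_draws = np.empty(R)
s2_draws = np.empty(R)
for r in range(R):
    u_r = rng.normal(size=N)                    # fresh errors, same X's
    Y_r = beta0 + beta1 * X + u_r
    b1_r = (x * (Y_r - Y_r.mean())).sum() / (x ** 2).sum()
    b0_r = Y_r.mean() - b1_r * X.mean()
    res = Y_r - b0_r - b1_r * X
    b1_draws[r] = b1_r
    s2_draws[r] = (res ** 2).sum() / (N - 2)    # sigma2_hat = RSS/(N-2)
print(b1_draws.mean(), beta1)                   # unbiased: mean ~ beta1
print(b1_draws.var(), 1.0 / (x ** 2).sum())     # variance ~ sigma^2 / sum x_i^2
print(s2_draws.mean())                          # ~ sigma^2 = 1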
Statistical Inference
The Gauss-Markov theorem says that the OLS estimator is BLUE = best (i.e. minimum variance) within the class of linear, unbiased estimators. We shall prove this for the OLS slope estimator in the OLS model in deviations form:
yi = β1 xi + ui − ū,  i = 1…N
(In fact, nothing essential is lost by considering the simpler model yi = β1 xi + ui.)
Without loss of generality, any other linear estimator β̃1 may be written as the OLS estimator plus a sum of terms which are linear in the yi's:
β̃1 = β̂1 + Σi di yi
where the di's are arbitrary. If β̃1 is unbiased, E β̃1 = β1, so let's check this:
E β̃1 = E β̂1 + Σi di E yi = β1 + Σi di (β1 xi)   ∵ E yi = β1 xi
= β1 + β1 Σi di xi = β1   iff   Σi di xi = 0
Thus, the di's are not completely arbitrary. For unbiasedness, Σi di xi must equal zero. Thus, the estimator equals the true parameter plus a mean zero term involving the random errors:
β̃1 = β̂1 + Σi di yi = β1 + Σi wi ui + Σi di (β1 xi + ui − ū)
= β1 + Σi wi ui + Σi di (ui − ū)   ∵ Σi di xi = 0
Using the general result Σi (ai − ā)(bi − b̄) = Σi (ai − ā) bi = Σi ai (bi − b̄), which holds for both random and non-random variables, the last term may be written as Σi di (ui − ū) = Σi (di − d̄) ui. The two sums are uncorrelated because Σi di xi = 0 ⇒ Σi di wi = 0 and Σi xi = 0 ⇒ Σi wi = 0, so:
V(β̃1) = V(Σi wi ui + Σi (di − d̄) ui) = Σi V(wi ui) + Σi V((di − d̄) ui)
Since V(ui) = σ², we deduce that:
V(β̃1) = σ² Σi wi² + σ² Σi (di − d̄)²
= V(β̂1) + σ² Σi (di − d̄)²
≥ V(β̂1)
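A Monte Carlo illustration (continuing the one above): any alternative linear unbiased estimator β̃1 = β̂1 + Σi di yi with Σi di xi = 0 should have a larger variance. The particular di used here are one arbitrary choice satisfying the constraint:

d = 0.01 * rng.normal(size=N)
d = d - (d @ x) / (x @ x) * x                   # enforce sum_i d_i x_i = 0
b1_alt = np.empty(R)
for r in range(R):
    u_r = rng.normal(size=N)
    Y_r = beta0 + beta1 * X + u_r
    y_r = Y_r - Y_r.mean()
    b1_alt[r] = (x * y_r).sum() / (x ** 2).sum() + (d * y_r).sum()
print(b1_alt.mean())                            # still ~ beta1 = 2 (unbiased)
print(b1_alt.var(), b1_draws.var())             # larger than the OLS variance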
Write Yi = Ŷi + ûi, the fitted value plus the residual. The sample means of the fitted values and the residuals turn out to be equal to Ȳ and 0 respectively. Moreover,
Σi Ŷi ûi = Σi (β̂0 + β̂1 Xi) ûi = β̂0 Σi ûi + β̂1 Σi Xi ûi = 0
so Σi Yi² = Σi Ŷi² + Σi ûi².
• Since the residuals sum to zero, the mean residual is zero, and the actual and fitted values of Y have the same mean, Ȳ.
• Hence:
Σi Yi² − N Ȳ² = Σi Ŷi² − N Ȳ² + Σi ûi²
i.e. TSS = ESS + RSS
• We showed that β̂1 = Σi xi yi / Σi xi², which implied that:
β̂1 = β1 + Σi xi ui / Σi xi² = β1 + Σi wi ui
• β̂1 is unbiased since E β̂1 = E[E(β̂1 | X)] = E[β1] = β1:
E(β̂1 | X) = β1 + E(Σi wi ui | X) = β1 + Σi wi E(ui | X) = β1 + Σi wi · 0 = β1
The conditional variance is:
V(β̂1 | X) = V(Σi wi ui | X) = Σi wi² V(ui | X) = Σi wi² σ²
⇒ V(β̂1 | X) = σ² Σi wi² = σ² / Σi xi²
which obviously depends on the X’s.
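A small sketch of this dependence, continuing the snippets above: redrawing the X sample changes σ² / Σi xi² (σ² = 1 here, values are illustrative):

for _ in range(3):
    X_new = rng.normal(size=N)
    x_new = X_new - X_new.mean()
    print(1.0 / (x_new ** 2).sum())    # V(beta1_hat | X) for this draw of the X's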
By the CLT,
(1/N) Σi xi ui ~ approx N(0, (σ² / N²) Σi xi²)
These two results (the approximate normality above and the convergence of (1/N) Σi xi² by the law of large numbers) may be combined (using Slutsky's theorem) to show that:
((1/N) Σi xi²)⁻¹ (1/N) Σi xi ui ~ approx ((1/N) Σi xi²)⁻¹ N(0, (σ² / N²) Σi xi²)
⇒ ((1/N) Σi xi²)⁻¹ (1/N) Σi xi ui ~ AN(0, σ² / Σi xi²)
Since β̂1 − β1 = ((1/N) Σi xi²)⁻¹ (1/N) Σi xi ui, this gives β̂1 ~ AN(β1, σ² / Σi xi²), as stated earlier.
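A final Monte Carlo sketch of the asymptotic normality, continuing the code above, with random X's and non-normal (uniform) errors; values are illustrative:

N_big, reps = 2000, 2000
b1_big = np.empty(reps)
for r in range(reps):
    X_r = rng.normal(size=N_big)
    u_r = rng.uniform(-np.sqrt(3), np.sqrt(3), size=N_big)   # mean 0, variance 1
    Y_r = beta0 + beta1 * X_r + u_r
    x_r = X_r - X_r.mean()
    b1_big[r] = (x_r * (Y_r - Y_r.mean())).sum() / (x_r ** 2).sum()
z = (b1_big - beta1) / b1_big.std()
print(np.mean(np.abs(z) < 1.96))       # ~ 0.95 if beta1_hat is approximately normal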