SLR
[Figure: Smoking consumption and lung cancer]
SLR I
The simple linear regression model is
$$y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, \dots, n \tag{3}$$
where
the errors $e_i$ are mutually independent,
$E(e_i) = 0$, $\operatorname{var}(e_i) = \sigma^2$.
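As a quick illustration, data from model (3) can be simulated in R; this is a minimal sketch, and the parameter values below are arbitrary choices, not taken from the notes.

set.seed(1)                       # for reproducibility
n <- 44                           # hypothetical sample size
x <- runif(n, 10, 45)             # hypothetical predictor values
e <- rnorm(n, mean = 0, sd = 3)   # errors: independent, mean 0, variance sigma^2
y <- 6.5 + 0.53 * x + e           # y_i = beta0 + beta1 * x_i + e_i
plot(x, y)                        # scatterplot of the simulated data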
SLR III
1 Linearity
2 Independence
3 Normality
4 Equal variance
LINE $\Rightarrow$ LIE (drop the Normality assumption and LINE reduces to LIE)
Q1: Which assumptions are necessary?
Q2: How to verify these assumptions? (see the diagnostic sketch below)
Q3: What are the remedial measures?
–HL
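A standard starting point for Q2 in R is the built-in residual diagnostics for a fitted lm object; a minimal sketch, using the fitlung fit from the smoking analysis below:

par(mfrow = c(2, 2))   # arrange the four diagnostic panels in a 2 x 2 grid
plot(fitlung)          # residuals vs fitted, normal Q-Q, scale-location, leverage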
The observed data are $(y_i, x_i)$, $i = 1, 2, \dots, n$. The unknown parameters, to be estimated, are $\beta_0, \beta_1, \sigma^2$.
SLR IV
The mean parameters $\beta_0, \beta_1$ are the parameters of main interest.
Based on the data, we estimate $\beta_0, \beta_1$ by $\hat\beta_0, \hat\beta_1$. Our main tool is the Method of Least Squares.
Definition: The least squares estimators $\hat\beta_0, \hat\beta_1$ of $\beta_0, \beta_1$ are the values that minimize the residual sum of squares
$$\sum_{i=1}^{n} [y_i - (\hat\beta_0 + \hat\beta_1 x_i)]^2$$
Put $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. We will show that
$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$
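These formulas translate directly into R; a minimal sketch, where x and y are any paired numeric vectors (for instance the simulated data above):

beta1.hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # Sxy / Sxx
beta0.hat <- mean(y) - beta1.hat * mean(x)                              # ybar - beta1.hat * xbar
c(beta0.hat, beta1.hat)   # should agree with coef(lm(y ~ x))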
SLR V
The regression line is $y = \hat\beta_0 + \hat\beta_1 x$.
The fitted values are $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$, $i = 1, 2, \dots, n$.
The residuals are $\hat{e}_i = y_i - \hat{y}_i$, $i = 1, 2, \dots, n$.
The residual sum of squares (RSS) is
$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{e}_i^2$$
> plot(smoke$CIG, smoke$LUNG)
> fitlung <- lm(smoke$LUNG ~ smoke$CIG)
> summary(fitlung, cor = F)
SLR VI
Call:
lm(formula = smoke$LUNG ~ smoke$CIG)

Residuals:
   Min     1Q Median     3Q    Max
-6.943 -1.656  0.382  1.614  7.561

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.4717     2.1407    3.02   0.0043
smoke$CIG     0.5291     0.0839    6.31  1.4e-07

Residual standard error: 3.07 on 42 degrees of freedom
Multiple R-squared: 0.486, Adjusted R-squared: 0.474
F-statistic: 39.8 on 1 and 42 DF, p-value: 1.44e-07
> abline(fitlung, lty = 2)
> points(smoke$CIG, fitlung$fitted, col = 3, pch = 14)
> names(fitlung)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"
> round(fitlung$fitted[1:8], 3)
SLR VII
    1     2     3     4     5     6     7     8
 16.1  20.1  16.1  21.6  22.9  24.2  27.9  21.4
> round(fitlung$resid[1:8], 3)
     1      2      3      4      5      6      7      8
 0.949 -0.332 -0.142  0.467 -0.096  0.301 -0.608  2.141
SLR IX
Theorem 11 (LS Estimation for SLR)
The least squares estimators for the SLR model (3) are
$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$
Proof:
We want to find $(\hat\beta_0, \hat\beta_1)$, the values that minimize the function
$$\mathrm{RSS}(b_0, b_1) = \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_i)]^2$$
SLR X
$(\hat\beta_0, \hat\beta_1)$ are the solution of the system of equations
$$\frac{\partial \mathrm{RSS}(b_0, b_1)}{\partial b_0} = 0, \qquad \frac{\partial \mathrm{RSS}(b_0, b_1)}{\partial b_1} = 0$$
For the first equation,
$$\begin{aligned}
\frac{\partial \mathrm{RSS}(b_0, b_1)}{\partial b_0} &= \sum_{i=1}^{n} -2[y_i - (b_0 + b_1 x_i)] \\
&= -2\Big[\sum y_i - \Big(n b_0 + b_1 \sum x_i\Big)\Big] \\
&= -2n[\bar{y} - (b_0 + b_1 \bar{x})]
\end{aligned}$$
SLR XI
Similarly,
$$\begin{aligned}
\frac{\partial \mathrm{RSS}(b_0, b_1)}{\partial b_1} &= \sum_{i=1}^{n} -2 x_i [y_i - (b_0 + b_1 x_i)] \\
&= -2\Big[\sum x_i y_i - \Big(b_0 \sum x_i + b_1 \sum x_i^2\Big)\Big]
\end{aligned}$$
So the solution $(\hat\beta_0, \hat\beta_1)$ has to satisfy:
$$\bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}, \qquad \sum x_i y_i = \hat\beta_0 \sum x_i + \hat\beta_1 \sum x_i^2$$
SLR XII
We replace $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$ in the second equation, to get:
$$\begin{aligned}
\sum x_i y_i &= \hat\beta_0 \sum x_i + \hat\beta_1 \sum x_i^2 \\
&= (\bar{y} - \hat\beta_1 \bar{x}) \sum x_i + \hat\beta_1 \sum x_i^2 \\
&= \bar{y} \sum x_i + \hat\beta_1 \Big(\sum x_i^2 - n \bar{x}^2\Big) \\
&= \bar{y} \sum x_i + \hat\beta_1 \sum (x_i - \bar{x})^2
\end{aligned}$$
It follows that
$$\hat\beta_1 = \frac{\sum x_i (y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
Subtract $\sum \bar{x}(y_i - \bar{y}) = 0$ from the numerator to get
$$\hat\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}$$
SLR XIII
Example 12 (Compute β̂0 , β̂1 for the Fuel example in R)
[1] 25.3
[1] 25.3
[1] 216
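The slide shows only the output; the commands are not reproduced. A plausible reconstruction of the computation, assuming a data frame named fuel with a predictor column x and a response column y (hypothetical names), would be:

xbar <- mean(fuel$x); ybar <- mean(fuel$y)
Sxx <- sum((fuel$x - xbar)^2)                  # sum of squares of x about its mean
Sxy <- sum((fuel$x - xbar) * (fuel$y - ybar))  # cross-product sum
Sxy / Sxx                                      # beta1.hat
ybar - (Sxy / Sxx) * xbar                      # beta0.hat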
Matrix notation
Put
$$\hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix} = \begin{pmatrix} \bar{y} - \hat\beta_1 \bar{x} \\[6pt] \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \end{pmatrix}$$
We can show that
$$\hat\beta = (X^\top X)^{-1} X^\top y$$
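The matrix formula is easy to check numerically in R; a minimal sketch, reusing the fitlung object from the smoking analysis:

X <- model.matrix(fitlung)        # n x 2 design matrix with columns (1, x_i)
y <- smoke$LUNG
solve(t(X) %*% X) %*% t(X) %*% y  # (X'X)^{-1} X'y
coef(fitlung)                     # should give the same two numbers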
Here
$$X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad X^\top = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix}$$
so that
$$X^\top X = \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}, \qquad X^\top Y = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix}$$
Using the inverse of a $2 \times 2$ matrix,
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
it follows that
$$\begin{aligned}
(X^\top X)^{-1} &= \frac{1}{n \sum_{i=1}^{n} x_i^2 - \big(\sum_{i=1}^{n} x_i\big)^2} \begin{pmatrix} \sum_{i=1}^{n} x_i^2 & -\sum_{i=1}^{n} x_i \\ -\sum_{i=1}^{n} x_i & n \end{pmatrix} \\
&= \frac{1}{n S_{xx}} \begin{pmatrix} \sum_{i=1}^{n} x_i^2 & -\sum_{i=1}^{n} x_i \\ -\sum_{i=1}^{n} x_i & n \end{pmatrix} \\
&= \begin{pmatrix} \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} & -\frac{\bar{x}}{S_{xx}} \\ -\frac{\bar{x}}{S_{xx}} & \frac{1}{S_{xx}} \end{pmatrix}
\end{aligned}$$
$$(X^\top X)^{-1}(X^\top Y) = \frac{1}{n S_{xx}} \begin{pmatrix} \sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i \\ -\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i + n \sum_{i=1}^{n} x_i y_i \end{pmatrix}$$
Recall the identities
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n} (x_i - \bar{x}) x_i \neq \sum_{i=1}^{n} x_i^2$$
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} = \sum_{i=1}^{n} (x_i - \bar{x}) y_i = \sum_{i=1}^{n} (y_i - \bar{y}) x_i \neq \sum_{i=1}^{n} x_i y_i$$
–HL
From the second component of $(X^\top X)^{-1}(X^\top Y)$,
$$\hat\beta_1 = \frac{-\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i + n \sum_{i=1}^{n} x_i y_i}{n S_{xx}} = \frac{n\big(\sum x_i y_i - n\bar{x}\bar{y}\big)}{n S_{xx}} = \frac{S_{xy}}{S_{xx}}$$
For the first component, the numerator satisfies
$$\begin{aligned}
\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i
&= \sum x_i^2 \sum y_i - \frac{1}{n}\Big(\sum x_i\Big)^2 \sum y_i + \frac{1}{n}\Big(\sum x_i\Big)^2 \sum y_i - \sum x_i \sum x_i y_i \\
&= \sum y_i \Big\{\sum x_i^2 - \frac{1}{n}\Big(\sum x_i\Big)^2\Big\} + \sum x_i \Big\{\frac{1}{n} \sum x_i \sum y_i - \sum x_i y_i\Big\} \\
&= \sum y_i \Big\{\sum x_i^2 - n\bar{x}^2\Big\} - \sum x_i\, S_{xy} \\
&= n\bar{y}\, S_{xx} - n\bar{x}\, S_{xy} = n S_{xx}\Big(\bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}\Big) = n S_{xx}\, \hat\beta_0
\end{aligned}$$
Dividing by $n S_{xx}$ shows the first component equals $\hat\beta_0$, as claimed.
SLR I
The Simple Linear Regression model is
$$y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, \dots, n$$
where the errors $e_1, \dots, e_n$ are independent (actually, we may assume that $e_1, \dots, e_n$ are just uncorrelated), $E(e_i) = 0$, and $\operatorname{var}(e_i) = \sigma^2$.
We showed that the Least Squares estimators are
$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \tag{4}$$
How good are β̂0 , β̂1 as estimators of the unknown β0 , β1 ?
SLR II
Theorem 13 (Properties of the Least Squares Estimator)
In the simple linear regression model, the LS estimators (4) have the
following properties:
1. $\hat\beta_0, \hat\beta_1$ are unbiased; i.e., $E(\hat\beta_0) = \beta_0$, $E(\hat\beta_1) = \beta_1$.
2. The variances are
$$\operatorname{var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sigma^2}{S_{xx}}, \qquad \operatorname{var}(\hat\beta_0) = \Big(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\Big)\sigma^2$$
3. $\operatorname{cov}(\hat\beta_0, \hat\beta_1) = -(\bar{x}/S_{xx})\,\sigma^2$
4. If the errors $e_i$ are normal, then $\hat\beta_1, \hat\beta_0$ have normal distributions, i.e.
$$\hat\beta_1 \sim N\Big(\beta_1, \frac{\sigma^2}{S_{xx}}\Big), \qquad \hat\beta_0 \sim N\Big(\beta_0, \sigma^2\Big(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\Big)\Big)$$
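These formulas can be checked against the standard errors R reports; a minimal sketch, reusing fitlung and the summary output shown earlier (0.0839 and 2.1407 in the Std. Error column):

x <- smoke$CIG
Sxx <- sum((x - mean(x))^2)
s <- summary(fitlung)$sigma              # residual standard error, about 3.07
s / sqrt(Sxx)                            # se(beta1.hat); compare with 0.0839
s * sqrt(1/length(x) + mean(x)^2 / Sxx)  # se(beta0.hat); compare with 2.1407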
SLR-Proof I
1. Show that $\hat\beta_1$ is unbiased. Note first that $E(\bar{y}) = \beta_0 + \beta_1 \bar{x}$. Then
$$\begin{aligned}
E(\hat\beta_1) &= E\left(\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{S_{xx}}\right) \\
&= \frac{1}{S_{xx}} E\Big(\sum (x_i - \bar{x})(y_i - \bar{y})\Big) \\
&= \frac{1}{S_{xx}} \sum (x_i - \bar{x})\big(E(y_i) - E(\bar{y})\big) \\
&= \frac{1}{S_{xx}} \sum (x_i - \bar{x})\big(\beta_0 + \beta_1 x_i - (\beta_0 + \beta_1 \bar{x})\big) \\
&= \frac{1}{S_{xx}} \sum (x_i - \bar{x})(x_i - \bar{x})\beta_1 \\
&= \beta_1.
\end{aligned}$$
SLR-Proof II
It follows that $E(\hat\beta_0) = E(\bar{y} - \hat\beta_1 \bar{x}) = \beta_0 + \beta_1\bar{x} - \beta_1\bar{x} = \beta_0$.
2. Show now that $\operatorname{var}(\hat\beta_1) = \frac{\sigma^2}{S_{xx}}$. First,
$$\hat\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{S_{xx}} = \frac{\sum (x_i - \bar{x}) y_i - \sum (x_i - \bar{x}) \bar{y}}{S_{xx}} = \frac{\sum (x_i - \bar{x}) y_i}{S_{xx}},$$
since $\sum (x_i - \bar{x}) = 0$.
SLR-Proof III
Then, using the independence of the $y_i$,
$$\begin{aligned}
\operatorname{var}(\hat\beta_1) &= \operatorname{var}\left(\frac{\sum (x_i - \bar{x}) y_i}{S_{xx}}\right) \\
&= \frac{1}{S_{xx}^2} \operatorname{var}\Big(\sum (x_i - \bar{x}) y_i\Big) \\
&= \frac{1}{S_{xx}^2} \sum (x_i - \bar{x})^2 \operatorname{var}(y_i) \\
&= \frac{1}{S_{xx}^2} \sum (x_i - \bar{x})^2 \sigma^2 \\
&= \frac{1}{S_{xx}^2}\, S_{xx}\, \sigma^2 = \frac{\sigma^2}{S_{xx}}.
\end{aligned}$$
Show that $\operatorname{var}(\hat\beta_0) = \big(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\big)\sigma^2$. Note that $\operatorname{var}(\bar{y}) = \frac{\sigma^2}{n}$ and, moreover,
SLR-Proof IV
$$\operatorname{cov}(\bar{y}, y_i) = \frac{\sigma^2}{n}, \qquad \operatorname{cov}(\bar{y}, \hat\beta_1) = 0,$$
so that
$$\operatorname{var}(\hat\beta_0) = \operatorname{var}(\bar{y} - \hat\beta_1 \bar{x}) = \operatorname{var}(\bar{y}) + \bar{x}^2 \operatorname{var}(\hat\beta_1) - 2\bar{x}\operatorname{cov}(\bar{y}, \hat\beta_1) = \Big(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\Big)\sigma^2.$$
3. Assume now that $e_i \sim N(0, \sigma^2)$. Show that $\hat\beta_0, \hat\beta_1$ have normal distributions.
$\hat\beta_1, \hat\beta_0$ are linear functions of $y_1, \dots, y_n$. Since $y_1, \dots, y_n$ are normally distributed, $\hat\beta_0, \hat\beta_1$, being linear functions of the $y_i$'s, are also normally distributed.
Analysis of Lung Cancer data, continued
The estimated variance-covariance matrix of $(\hat\beta_0, \hat\beta_1)$:
            (Intercept) smoke$CIG
(Intercept)       4.582  -0.17536
smoke$CIG        -0.175   0.00704
[1] 0.0851
[1] 53
SLR-Proof V
[1] -2.12
In vector notation, the theorem states the following:
$$E(\hat\beta) = \beta, \quad \text{i.e.} \quad E\begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$$
$$\operatorname{var}(\hat\beta) = \sigma^2 \begin{pmatrix} \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} & -\frac{\bar{x}}{S_{xx}} \\ -\frac{\bar{x}}{S_{xx}} & \frac{1}{S_{xx}} \end{pmatrix}$$
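In R, this matrix is estimated by vcov(); a minimal sketch, reusing fitlung (compare with the variance-covariance output shown above):

vcov(fitlung)             # estimate of sigma^2 * (X'X)^{-1}
sqrt(diag(vcov(fitlung))) # standard errors of beta0.hat and beta1.hat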
SLR-Proof VI
Definition:
An estimator of a parameter (β0 or β1 ) is a function that can be
computed from the observed values y1 , . . . , yn , i.e. it does not depend
on any unknown quantities.
A linear estimator of $\beta_0$ or $\beta_1$ is any linear function of the observed values $y_1, y_2, \dots, y_n$; it has the general form $\sum_{i=1}^{n} a_i y_i$ for some numbers $a_1, a_2, \dots, a_n$ (see the sketch below).
An unbiased estimator of $\beta_0$ (or $\beta_1$) has expectation equal to $\beta_0$ (or $\beta_1$).
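For instance, $\hat\beta_1 = \sum (x_i - \bar{x}) y_i / S_{xx}$ is a linear estimator with weights $a_i = (x_i - \bar{x})/S_{xx}$, which a short R sketch makes concrete (x and y being any paired vectors, as in the earlier sketches):

a <- (x - mean(x)) / sum((x - mean(x))^2)  # weights a_i of the linear estimator
sum(a * y)                                 # equals beta1.hat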
SLR-Proof VII
Theorem 14 (Gauss–Markov)
For the Simple Linear Regression model, the LS estimators are the Best
Linear Unbiased Estimators (BLUE).
Example 15
Consider the SLR model yi = β0 + β1 xi + ei , i = 1, 2, 3, 4, with
x1 = 1, x2 = −1, x3 = 2, x4 = 2.
We have $\bar{x} = 1$, $S_{xx} = 6$. The LSE is
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{1}{6}(-2y_2 + y_3 + y_4), \qquad \hat\beta_0 = \bar{y} - \bar{x}\hat\beta_1 = \frac{1}{12}(3y_1 + 7y_2 + y_3 + y_4)$$
Put now $\tilde\beta_1 = \frac{y_1 - y_2}{2}$, $\tilde\beta_0 = \frac{y_1 + y_2}{2}$.
SLR-Proof VIII
Check that E (β̃1 ) = β1 , E (β̃0 ) = β0 .
So (β̃0 , β̃1 ) is another linear unbiased estimator of (β0 , β1 ).
However, $\hat\beta_1$ is a better estimator of $\beta_1$, since
$$\operatorname{var}(\tilde\beta_1) = \operatorname{var}\Big(\frac{y_1 - y_2}{2}\Big) = \frac{\sigma^2}{2}, \qquad \text{while} \qquad \operatorname{var}(\hat\beta_1) = \frac{\sigma^2}{S_{xx}} = \frac{\sigma^2}{6} < \operatorname{var}(\tilde\beta_1).$$
Moreover, by the Gauss–Markov theorem, $\operatorname{var}(A\tilde\beta_0 + B\tilde\beta_1) \ge \operatorname{var}(A\hat\beta_0 + B\hat\beta_1)$ for any numbers $A, B$.
For example:
var(β̂0 ) ≤ var(β̃0 )
var(β̂1 ) ≤ var(β̃1 )
the fitted values ŷi = β̂0 + β̂1 xi have a smaller variance than
ỹi = β̃0 + β̃1 xi .
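A quick Monte Carlo check of the variance comparison in Example 15; a minimal sketch in which the true beta0, beta1, and sigma are arbitrary choices, not from the notes:

set.seed(2)
x <- c(1, -1, 2, 2); beta0 <- 1; beta1 <- 2; sigma <- 1  # hypothetical truth
B <- 1e5
b.hat <- b.tilde <- numeric(B)
for (r in 1:B) {
  y <- beta0 + beta1 * x + rnorm(4, sd = sigma)
  b.hat[r]   <- (-2 * y[2] + y[3] + y[4]) / 6  # LSE beta1.hat from Example 15
  b.tilde[r] <- (y[1] - y[2]) / 2              # competing linear estimator
}
c(mean(b.hat), mean(b.tilde))  # both approximately beta1 = 2 (unbiased)
c(var(b.hat), var(b.tilde))    # approximately sigma^2/6 vs sigma^2/2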
SLR-Proof IX
Theorem 16 (Gauss–Markov)
The LSE $\hat\beta = (\hat\beta_0, \hat\beta_1)^\top$ is the Best Linear Unbiased Estimator of $\beta = (\beta_0, \beta_1)^\top$; i.e., if $(\tilde\beta_0, \tilde\beta_1)$ is another linear, unbiased estimator of $\beta$, then
$$\operatorname{var}(a\hat\beta_0 + b\hat\beta_1) \le \operatorname{var}(a\tilde\beta_0 + b\tilde\beta_1)$$
for any numbers $a, b$.