Stock/Watson, Introduction to Econometrics, 3rd Updated Edition: Exercise Solutions, Chapter 17 (Instructors)
17.1. (a) Suppose there are n observations. Let b1 be an arbitrary estimator of β1. Given
the estimator b1, the sum of squared errors for the given regression model is
$$\sum_{i=1}^{n} (Y_i - b_1 X_i)^2.$$
$\hat\beta_1^{RLS}$, the restricted least squares estimator of $\beta_1$, minimizes this sum of squared errors. That is, $\hat\beta_1^{RLS}$ satisfies the first order condition for the minimization, which requires that the derivative of the sum of squared errors with respect to $b_1$ equal zero:

$$\sum_{i=1}^{n} 2(Y_i - b_1 X_i)(-X_i) = 0.$$
Solving for b1 from the first order condition leads to the restricted least squares
estimator
$$\hat\beta_1^{RLS} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}.$$
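As an added illustration (not part of the original solution), here is a minimal numerical sketch, with an arbitrary assumed data-generating process, checking that the closed-form estimator coincides with the minimizer of the sum of squared errors:

```python
import numpy as np

# Minimal sketch (assumed DGP: Y = 2*X + u) checking that the closed-form
# restricted least squares estimator sum(X*Y)/sum(X^2) minimizes the SSR.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(1.0, 1.0, n)
u = rng.normal(0.0, 1.0, n)
Y = 2.0 * X + u

beta_rls = np.sum(X * Y) / np.sum(X**2)          # closed-form estimator

# Grid search over candidate slopes b1 as a crude check of the first order condition.
grid = np.linspace(beta_rls - 1, beta_rls + 1, 2001)
ssr = [np.sum((Y - b * X) ** 2) for b in grid]
b_grid = grid[int(np.argmin(ssr))]

print(beta_rls, b_grid)   # the two values should agree to grid precision
```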
(b) We show first that $\hat\beta_1^{RLS}$ is unbiased. We can represent the restricted least squares estimator $\hat\beta_1^{RLS}$ in terms of the regressors and errors:

$$\hat\beta_1^{RLS} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2} = \frac{\sum_{i=1}^{n} X_i(\beta_1 X_i + u_i)}{\sum_{i=1}^{n} X_i^2} = \beta_1 + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}.$$

Thus

$$E(\hat\beta_1^{RLS}) = \beta_1 + E\!\left(\frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}\right) = \beta_1 + E\!\left[\frac{\sum_{i=1}^{n} X_i E(u_i \mid X_1,\ldots,X_n)}{\sum_{i=1}^{n} X_i^2}\right] = \beta_1,$$
where the second equality follows by using the law of iterated expectations, and
the third equality follows from
$$\frac{\sum_{i=1}^{n} X_i E(u_i \mid X_1,\ldots,X_n)}{\sum_{i=1}^{n} X_i^2} = 0$$
because the observations are i.i.d. and $E(u_i \mid X_i) = 0$. (Note that $E(u_i \mid X_1,\ldots,X_n) = E(u_i \mid X_i)$ because the observations are i.i.d.)
For the asymptotic distribution, write

$$\hat\beta_1^{RLS} - \beta_1 = \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2} = \frac{\frac{1}{n}\sum_{i=1}^{n} X_i u_i}{\frac{1}{n}\sum_{i=1}^{n} X_i^2}.$$

Let $v_i = X_i u_i$. Because the observations are i.i.d. with $E(u_i \mid X_i) = 0$ and finite fourth moments, $v_i$ is i.i.d. with mean zero and variance $\sigma_v^2 = \mathrm{var}(X_i u_i) < \infty$. By the central limit theorem,

$$\frac{\sqrt{n}\,\bar v}{\sigma_v} = \frac{1}{\sigma_v\sqrt{n}}\sum_{i=1}^{n} v_i \xrightarrow{d} N(0,1),$$

or

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i \xrightarrow{d} N(0, \sigma_v^2).$$
For the denominator, $X_i^2$ is i.i.d. with finite variance (because $X_i$ has a finite fourth moment), so that by the law of large numbers

$$\frac{1}{n}\sum_{i=1}^{n} X_i^2 \xrightarrow{p} E(X^2).$$
Combining the results on the numerator and the denominator and applying
Slutsky’s theorem lead to
$$\sqrt{n}\,(\hat\beta_1^{RLS} - \beta_1) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i}{\frac{1}{n}\sum_{i=1}^{n} X_i^2} \xrightarrow{d} N\!\left(0,\ \frac{\mathrm{var}(X_i u_i)}{[E(X^2)]^2}\right).$$
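A small added Monte Carlo sketch (assumed design; not from the text) comparing the finite-sample variance of $\sqrt{n}(\hat\beta_1^{RLS}-\beta_1)$ with the asymptotic variance $\mathrm{var}(X_iu_i)/[E(X_i^2)]^2$:

```python
import numpy as np

# Sketch: compare the Monte Carlo variance of sqrt(n)*(beta_hat_RLS - beta_1)
# with the asymptotic variance var(X*u) / [E(X^2)]^2. The DGP below is an
# assumption for illustration; heteroskedastic errors make the check non-trivial.
rng = np.random.default_rng(1)
beta1, n, reps = 2.0, 500, 5000
draws = np.empty(reps)
for r in range(reps):
    X = rng.normal(1.0, 1.0, n)
    u = rng.normal(0.0, 1.0, n) * np.sqrt(0.5 + X**2)   # var(u|X) depends on X
    Y = beta1 * X + u
    draws[r] = np.sqrt(n) * (np.sum(X * Y) / np.sum(X**2) - beta1)

# Population moments for this assumed design, approximated by simulation.
Xp = rng.normal(1.0, 1.0, 10**6)
up = rng.normal(0.0, 1.0, 10**6) * np.sqrt(0.5 + Xp**2)
asy_var = np.var(Xp * up) / np.mean(Xp**2) ** 2
print(draws.var(), asy_var)    # the two numbers should be close
```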
(c) Write

$$\hat\beta_1^{RLS} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2} = \sum_{i=1}^{n} a_i Y_i, \qquad \text{where } a_i = \frac{X_i}{\sum_{j=1}^{n} X_j^2}.$$
Thus

$$\hat\beta_1^{RLS} = \beta_1 + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}.$$
$$\begin{aligned}
E(\hat\beta_1^{RLS} \mid X_1,\ldots,X_n) &= E\!\left(\beta_1 + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}\,\Big|\,X_1,\ldots,X_n\right)\\
&= \beta_1 + E\!\left(\frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}\,\Big|\,X_1,\ldots,X_n\right)\\
&= \beta_1.
\end{aligned}$$
The final equality follows because

$$E\!\left(\frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}\,\Big|\,X_1,\ldots,X_n\right) = \frac{\sum_{i=1}^{n} X_i E(u_i \mid X_1,\ldots,X_n)}{\sum_{i=1}^{n} X_i^2} = 0.$$
(d)

$$\begin{aligned}
\mathrm{var}(\hat\beta_1^{RLS} \mid X_1,\ldots,X_n) &= \mathrm{var}\!\left(\beta_1 + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}\,\Big|\,X_1,\ldots,X_n\right)\\
&= \frac{\sum_{i=1}^{n} X_i^2\,\mathrm{var}(u_i \mid X_1,\ldots,X_n)}{\left(\sum_{i=1}^{n} X_i^2\right)^2}\\
&= \frac{\sum_{i=1}^{n} X_i^2\,\sigma_u^2}{\left(\sum_{i=1}^{n} X_i^2\right)^2}\\
&= \frac{\sigma_u^2}{\sum_{i=1}^{n} X_i^2}.
\end{aligned}$$
(e) The conditional variance of the OLS estimator $\hat\beta_1$ is

$$\mathrm{var}(\hat\beta_1 \mid X_1,\ldots,X_n) = \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar X)^2}.$$

Since

$$\sum_{i=1}^{n} (X_i - \bar X)^2 = \sum_{i=1}^{n} X_i^2 - 2\bar X\sum_{i=1}^{n} X_i + n\bar X^2 = \sum_{i=1}^{n} X_i^2 - n\bar X^2 < \sum_{i=1}^{n} X_i^2,$$

the conditional variance of $\hat\beta_1^{RLS}$ derived in part (d) is smaller than the conditional variance of the OLS estimator $\hat\beta_1$.
(f) Under assumption 5 of Key Concept 17.1, conditional on X1,…, Xn, βˆ1RLS is
normally distributed since it is a weighted average of normally distributed
variables ui:
$$\hat\beta_1^{RLS} = \beta_1 + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}.$$
Using the conditional mean and conditional variance of βˆ1RLS derived in parts (c)
and (d) respectively, the sampling distribution of βˆ1RLS , conditional on X1,…, Xn,
is
$$\hat\beta_1^{RLS} \sim N\!\left(\beta_1,\ \frac{\sigma_u^2}{\sum_{i=1}^{n} X_i^2}\right).$$
(g) The estimator $\tilde\beta_1$ satisfies

$$\tilde\beta_1 = \frac{\sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} X_i} = \frac{\sum_{i=1}^{n} (\beta_1 X_i + u_i)}{\sum_{i=1}^{n} X_i} = \beta_1 + \frac{\sum_{i=1}^{n} u_i}{\sum_{i=1}^{n} X_i},$$

so its conditional variance is

$$\begin{aligned}
\mathrm{var}(\tilde\beta_1 \mid X_1,\ldots,X_n) &= \mathrm{var}\!\left(\beta_1 + \frac{\sum_{i=1}^{n} u_i}{\sum_{i=1}^{n} X_i}\,\Big|\,X_1,\ldots,X_n\right)\\
&= \frac{\sum_{i=1}^{n} \mathrm{var}(u_i \mid X_1,\ldots,X_n)}{\left(\sum_{i=1}^{n} X_i\right)^2}\\
&= \frac{n\sigma_u^2}{\left(\sum_{i=1}^{n} X_i\right)^2}.
\end{aligned}$$
The difference in the conditional variances is

$$\mathrm{var}(\tilde\beta_1 \mid X_1,\ldots,X_n) - \mathrm{var}(\hat\beta_1^{RLS} \mid X_1,\ldots,X_n) = \frac{n\sigma_u^2}{\left(\sum_{i=1}^{n} X_i\right)^2} - \frac{\sigma_u^2}{\sum_{i=1}^{n} X_i^2}.$$
In order to prove $\mathrm{var}(\tilde\beta_1 \mid X_1,\ldots,X_n) \ge \mathrm{var}(\hat\beta_1^{RLS} \mid X_1,\ldots,X_n)$, we need to show that

$$\frac{n}{\left(\sum_{i=1}^{n} X_i\right)^2} \ge \frac{1}{\sum_{i=1}^{n} X_i^2},$$

or equivalently
$$n\sum_{i=1}^{n} X_i^2 \ge \left(\sum_{i=1}^{n} X_i\right)^2.$$
The Cauchy–Schwarz inequality states that

$$\left[\sum_{i=1}^{n} (a_i b_i)\right]^2 \le \sum_{i=1}^{n} a_i^2 \cdot \sum_{i=1}^{n} b_i^2,$$

which implies

$$\left(\sum_{i=1}^{n} X_i\right)^2 = \left(\sum_{i=1}^{n} 1\cdot X_i\right)^2 \le \sum_{i=1}^{n} 1^2 \cdot \sum_{i=1}^{n} X_i^2 = n\sum_{i=1}^{n} X_i^2.$$
That is, $n\sum_{i=1}^{n} X_i^2 \ge \left(\sum_{i=1}^{n} X_i\right)^2$, or equivalently $\mathrm{var}(\tilde\beta_1 \mid X_1,\ldots,X_n) \ge \mathrm{var}(\hat\beta_1^{RLS} \mid X_1,\ldots,X_n)$.
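As an added numerical check (arbitrary assumed draws), the Cauchy–Schwarz step $n\sum_i X_i^2 \ge (\sum_i X_i)^2$ can be verified directly:

```python
import numpy as np

# Sketch: verify n * sum(X_i^2) >= (sum(X_i))^2 on arbitrary draws,
# i.e. the Cauchy-Schwarz step used to compare the two conditional variances.
rng = np.random.default_rng(2)
for _ in range(5):
    X = rng.normal(3.0, 2.0, size=100)
    lhs = len(X) * np.sum(X**2)
    rhs = np.sum(X) ** 2
    print(lhs >= rhs, lhs - rhs)   # always True; difference equals n*sum((X_i - Xbar)^2)
```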
17.2. The sample covariance is

$$\begin{aligned}
s_{XY} &= \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)\\
&= \frac{1}{n-1}\sum_{i=1}^{n} \left[(X_i - \mu_X) - (\bar X - \mu_X)\right]\left[(Y_i - \mu_Y) - (\bar Y - \mu_Y)\right]\\
&= \frac{1}{n-1}\Bigg\{\sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y) - \sum_{i=1}^{n} (\bar X - \mu_X)(Y_i - \mu_Y)\\
&\qquad\qquad - \sum_{i=1}^{n} (X_i - \mu_X)(\bar Y - \mu_Y) + \sum_{i=1}^{n} (\bar X - \mu_X)(\bar Y - \mu_Y)\Bigg\}\\
&= \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y)\right] - \frac{n}{n-1}(\bar X - \mu_X)(\bar Y - \mu_Y),
\end{aligned}$$
where the final equality follows from the definition of $\bar X$ and $\bar Y$, which implies that $\sum_{i=1}^{n} (X_i - \mu_X) = n(\bar X - \mu_X)$ and $\sum_{i=1}^{n} (Y_i - \mu_Y) = n(\bar Y - \mu_Y)$, and by collecting terms.
We apply the law of large numbers to $s_{XY}$ to check its convergence in probability. It is easy to see that the second term converges in probability to zero because $\bar X \xrightarrow{p} \mu_X$ and $\bar Y \xrightarrow{p} \mu_Y$, so $(\bar X - \mu_X)(\bar Y - \mu_Y) \xrightarrow{p} 0$ by Slutsky's theorem. Let's look at the first term. Since $(X_i, Y_i)$ are i.i.d., the random sequence $(X_i - \mu_X)(Y_i - \mu_Y)$ is i.i.d. By the definition of covariance, we have $E[(X_i - \mu_X)(Y_i - \mu_Y)] = \sigma_{XY}$. To apply the law of large numbers to the first term, we need its variance to be finite:

$$\mathrm{var}[(X_i - \mu_X)(Y_i - \mu_Y)] \le E\{[(X_i - \mu_X)(Y_i - \mu_Y)]^2\} \le \left\{E[(X_i - \mu_X)^4]\right\}^{1/2}\left\{E[(Y_i - \mu_Y)^4]\right\}^{1/2} < \infty.$$
The second inequality follows by applying the Cauchy–Schwarz inequality, and the
third inequality follows because of the finite fourth moments for (Xi, Yi). Applying
the law of large numbers, we have
$$\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y) \xrightarrow{p} E[(X_i - \mu_X)(Y_i - \mu_Y)] = \sigma_{XY}.$$
Also, $\frac{n}{n-1} \to 1$, so the first term for $s_{XY}$ converges in probability to $\sigma_{XY}$. Combining the results on the two terms for $s_{XY}$, we have $s_{XY} \xrightarrow{p} \sigma_{XY}$.
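An added simulation sketch (assumed bivariate normal data) illustrating the consistency of $s_{XY}$:

```python
import numpy as np

# Sketch: the sample covariance s_XY approaches sigma_XY as n grows.
# Assumed joint distribution: (X, Y) bivariate normal with covariance 0.7.
rng = np.random.default_rng(3)
cov = np.array([[1.0, 0.7], [0.7, 2.0]])
for n in (100, 10_000, 1_000_000):
    X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    s_xy = np.sum((X - X.mean()) * (Y - Y.mean())) / (n - 1)
    print(n, s_xy)   # drifts toward the population covariance 0.7
```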
17.3. (a) Write

$$\begin{aligned}
\sqrt{n}(\hat\beta_1 - \beta_1) &= \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} (X_i - \bar X)u_i}{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2}\\
&= \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \left[(X_i - \mu_X) - (\bar X - \mu_X)\right]u_i}{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2}\\
&= \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} (X_i - \mu_X)u_i}{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2} - \frac{(\bar X - \mu_X)\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n} u_i}{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2}\\
&= \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} v_i}{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2} - \frac{(\bar X - \mu_X)\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n} u_i}{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2},
\end{aligned}$$

where $v_i = (X_i - \mu_X)u_i$.
(b) The random variables $u_1,\ldots,u_n$ are i.i.d. with mean $\mu_u = 0$ and variance $0 < \sigma_u^2 < \infty$. By the central limit theorem,

$$\frac{\sqrt{n}(\bar u - \mu_u)}{\sigma_u} = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} u_i}{\sigma_u} \xrightarrow{d} N(0,1).$$

Because $\bar X - \mu_X \xrightarrow{p} 0$ by the law of large numbers and $\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2 \xrightarrow{p} \sigma_X^2 > 0$, Slutsky's theorem implies that the second term in the decomposition in part (a) converges in probability to zero.
Turning to the first term, the random variables $v_i = (X_i - \mu_X)u_i$ are i.i.d. with mean zero and

$$\mathrm{var}(v_i) = \mathrm{var}[(X_i - \mu_X)u_i] \le E[(X_i - \mu_X)^2 u_i^2] \le \left\{E[(X_i - \mu_X)^4]\right\}^{1/2}\left\{E[u_i^4]\right\}^{1/2} < \infty.$$
By the central limit theorem,

$$\frac{\sqrt{n}\,\bar v}{\sigma_v} = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} v_i}{\sigma_v} \xrightarrow{d} N(0,1).$$
Also, because the sample variance is a consistent estimator of $\mathrm{var}(X_i)$,

$$\frac{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2}{\mathrm{var}(X_i)} \xrightarrow{p} 1.$$
1
n ∑in=1 vt
σv
→
d
N (0, 1),
1
n ∑ ( X t − X )2
n
i =1
σ X2
or equivalently
1
∑in=1 vi ⎛ var(vi ) ⎞
n
→
d
N ⎜ 0, 2 ⎟
.
n ∑ i =1 ( X i − X )
n 2
⎝ [var( X i )] ⎠
1
Thus

$$\sqrt{n}(\hat\beta_1 - \beta_1) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} v_i}{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2} - \frac{(\bar X - \mu_X)\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n} u_i}{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2} \xrightarrow{d} N\!\left(0,\ \frac{\mathrm{var}(v_i)}{[\mathrm{var}(X_i)]^2}\right),$$
since the second term for $\sqrt{n}(\hat\beta_1 - \beta_1)$ converges in probability to zero, as shown in part (b).
17.4. (a) … is consistent.
(b) We have (i) $s_u^2/\sigma_u^2 \xrightarrow{p} 1$ and (ii) $g(x) = \sqrt{x}$ is a continuous function; thus, by the continuous mapping theorem,

$$\frac{s_u}{\sigma_u} = \sqrt{\frac{s_u^2}{\sigma_u^2}} \xrightarrow{p} 1.$$
17.5. Because $E(W^4) = [E(W^2)]^2 + \mathrm{var}(W^2)$ and $\mathrm{var}(W^2) \ge 0$, it follows that $[E(W^2)]^2 \le E(W^4) < \infty$. Thus $E(W^2) < \infty$.
17.7. (a) The joint probability distribution function of ui, uj, Xi, Xj is f (ui, uj, Xi, Xj). The
conditional probability distribution function of ui and Xi given uj and Xj is f (ui,
Xi |uj, Xj). Since ui, Xi, i = 1,…, n are i.i.d., f (ui, Xi |uj, Xj) = f (ui, Xi). By definition
of the conditional probability distribution function, we have
$$f(u_i, u_j, X_i, X_j) = f(u_i, X_i \mid u_j, X_j)\, f(u_j, X_j) = f(u_i, X_i)\, f(u_j, X_j).$$
(b) The conditional density of $u_i$ and $u_j$ given $X_i$ and $X_j$ is

$$f(u_i, u_j \mid X_i, X_j) = \frac{f(u_i, u_j, X_i, X_j)}{f(X_i, X_j)} = \frac{f(u_i, X_i)\, f(u_j, X_j)}{f(X_i)\, f(X_j)} = f(u_i \mid X_i)\, f(u_j \mid X_j).$$
The first and third equalities used the definition of the conditional probability distribution function. The second equality used the conclusion from part (a) and the independence between $X_i$ and $X_j$. Substituting $f(u_i, u_j \mid X_i, X_j) = f(u_i \mid X_i) f(u_j \mid X_j)$ into the definition of the conditional expectation gives

$$\begin{aligned}
E(u_i u_j \mid X_i, X_j) &= \int\!\!\int u_i u_j\, f(u_i \mid X_i)\, f(u_j \mid X_j)\, du_i\, du_j\\
&= \int u_i f(u_i \mid X_i)\, du_i \int u_j f(u_j \mid X_j)\, du_j\\
&= E(u_i \mid X_i)\, E(u_j \mid X_j).
\end{aligned}$$
(c) Let Q = (X1, X2,…, Xi – 1, Xi + 1,…, Xn), so that f (ui|X1,…, Xn) = f (ui |Xi, Q). Write
17.7 (continued)
$$\begin{aligned}
f(u_i \mid X_i, Q) &= \frac{f(u_i, X_i, Q)}{f(X_i, Q)}\\
&= \frac{f(u_i, X_i)\, f(Q)}{f(X_i)\, f(Q)}\\
&= \frac{f(u_i, X_i)}{f(X_i)}\\
&= f(u_i \mid X_i),
\end{aligned}$$
where the first equality uses the definition of the conditional density, the second
uses the fact that (ui, Xi) and Q are independent, and the final equality uses the
definition of the conditional density. The result then follows directly.
$$f(u_i, u_j \mid X_1, \ldots, X_n) = f(u_i, u_j \mid X_i, X_j).$$
17.8. (a) Because the errors are heteroskedastic, the Gauss-Markov theorem does not
apply. The OLS estimator of β1 is not BLUE.
(b) Dividing the regression by $\sqrt{\theta_0 + \theta_1 |X_i|}$, the conditional standard deviation of $u_i$ given $X_i$, yields the weighted regression

$$\tilde Y_i = \beta_0 \tilde X_{0i} + \beta_1 \tilde X_{1i} + \tilde u_i,$$

where

$$\tilde Y_i = \frac{Y_i}{\sqrt{\theta_0 + \theta_1 |X_i|}}, \quad \tilde X_{0i} = \frac{1}{\sqrt{\theta_0 + \theta_1 |X_i|}}, \quad \tilde X_{1i} = \frac{X_i}{\sqrt{\theta_0 + \theta_1 |X_i|}}, \quad \text{and} \quad \tilde u_i = \frac{u_i}{\sqrt{\theta_0 + \theta_1 |X_i|}},$$

so that $\mathrm{var}(\tilde u_i \mid X_i) = 1$.
(c) Using equations (17.2) and (17.19), the OLS estimator $\hat\beta_1$ satisfies

$$\hat\beta_1 = \beta_1 + \frac{\sum_{i=1}^{n} (X_i - \bar X)u_i}{\sum_{i=1}^{n} (X_i - \bar X)^2},$$

so that

$$\begin{aligned}
\mathrm{var}(\hat\beta_1 \mid X_1,\ldots,X_n) &= \mathrm{var}\!\left(\beta_1 + \frac{\sum_{i=1}^{n} (X_i - \bar X)u_i}{\sum_{i=1}^{n} (X_i - \bar X)^2}\,\Big|\,X_1,\ldots,X_n\right)\\
&= \frac{\sum_{i=1}^{n} (X_i - \bar X)^2\,\mathrm{var}(u_i \mid X_1,\ldots,X_n)}{\left[\sum_{i=1}^{n} (X_i - \bar X)^2\right]^2}\\
&= \frac{\sum_{i=1}^{n} (X_i - \bar X)^2\,\mathrm{var}(u_i \mid X_i)}{\left[\sum_{i=1}^{n} (X_i - \bar X)^2\right]^2}\\
&= \frac{\sum_{i=1}^{n} (X_i - \bar X)^2\,(\theta_0 + \theta_1|X_i|)}{\left[\sum_{i=1}^{n} (X_i - \bar X)^2\right]^2}.
\end{aligned}$$
Thus the exact sampling distribution of the OLS estimator $\hat\beta_1$, conditional on $X_1,\ldots,X_n$, is

$$\hat\beta_1 \mid X_1,\ldots,X_n \sim N\!\left(\beta_1,\ \frac{\sum_{i=1}^{n} (X_i - \bar X)^2(\theta_0 + \theta_1|X_i|)}{\left[\sum_{i=1}^{n} (X_i - \bar X)^2\right]^2}\right).$$
(d) The weighted least squares (WLS) estimators, βˆ0WLS and βˆ1WLS , are solutions to
$$\min_{b_0,\,b_1} \sum_{i=1}^{n} (\tilde Y_i - b_0\tilde X_{0i} - b_1\tilde X_{1i})^2,$$

the minimization of the sum of squared errors of the weighted regression. The first order conditions of the minimization with respect to $b_0$ and $b_1$ are

$$\sum_{i=1}^{n} 2(\tilde Y_i - b_0\tilde X_{0i} - b_1\tilde X_{1i})(-\tilde X_{0i}) = 0, \qquad \sum_{i=1}^{n} 2(\tilde Y_i - b_0\tilde X_{0i} - b_1\tilde X_{1i})(-\tilde X_{1i}) = 0.$$
Solving the two first order conditions for $b_1$ yields

$$\hat\beta_1^{WLS} = \frac{-Q_{01}S_0 + Q_{00}S_1}{Q_{00}Q_{11} - Q_{01}^2},$$

where

$$Q_{00} = \sum_{i=1}^{n} \tilde X_{0i}\tilde X_{0i}, \quad Q_{01} = \sum_{i=1}^{n} \tilde X_{0i}\tilde X_{1i}, \quad Q_{11} = \sum_{i=1}^{n} \tilde X_{1i}\tilde X_{1i}, \quad S_0 = \sum_{i=1}^{n} \tilde X_{0i}\tilde Y_i, \quad \text{and} \quad S_1 = \sum_{i=1}^{n} \tilde X_{1i}\tilde Y_i.$$
Substituting $\tilde Y_i = \beta_0\tilde X_{0i} + \beta_1\tilde X_{1i} + \tilde u_i$ yields

$$\hat\beta_1^{WLS} = \beta_1 + \frac{-Q_{01}Z_0 + Q_{00}Z_1}{Q_{00}Q_{11} - Q_{01}^2},$$

where $Z_0 = \sum_{i=1}^{n} \tilde X_{0i}\tilde u_i$ and $Z_1 = \sum_{i=1}^{n} \tilde X_{1i}\tilde u_i$, so that

$$\hat\beta_1^{WLS} - \beta_1 = \frac{\sum_{i=1}^{n} (Q_{00}\tilde X_{1i} - Q_{01}\tilde X_{0i})\tilde u_i}{Q_{00}Q_{11} - Q_{01}^2}.$$
From this we see that the distribution of $\hat\beta_1^{WLS} \mid X_1,\ldots,X_n$ is $N(\beta_1, \sigma^2_{\hat\beta_1^{WLS}})$, where
$$\begin{aligned}
\sigma^2_{\hat\beta_1^{WLS}} &= \frac{\sigma_{\tilde u}^2 \sum_{i=1}^{n} (Q_{00}\tilde X_{1i} - Q_{01}\tilde X_{0i})^2}{(Q_{00}Q_{11} - Q_{01}^2)^2}\\
&= \frac{Q_{00}^2 Q_{11} + Q_{01}^2 Q_{00} - 2Q_{00}Q_{01}^2}{(Q_{00}Q_{11} - Q_{01}^2)^2}\\
&= \frac{Q_{00}}{Q_{00}Q_{11} - Q_{01}^2},
\end{aligned}$$

where the first equality uses the fact that the observations are independent, the second uses $\sigma_{\tilde u}^2 = 1$ and the definitions of $Q_{00}$, $Q_{11}$, and $Q_{01}$, and the third is an algebraic simplification.
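An added numerical check (assumed design) that the formula $Q_{00}/(Q_{00}Q_{11}-Q_{01}^2)$ agrees with the usual matrix expression for the variance of the slope in the transformed regression:

```python
import numpy as np

# Sketch: check that Q00 / (Q00*Q11 - Q01^2) equals the (2,2) element of
# inv(Z'Z) for the transformed regressors Z = [X0_tilde, X1_tilde],
# which is the conditional variance of the WLS slope when var(u_tilde) = 1.
rng = np.random.default_rng(5)
theta0, theta1, n = 1.0, 0.5, 200
X = rng.normal(0.0, 1.0, n)
w = np.sqrt(theta0 + theta1 * np.abs(X))
X0_t, X1_t = 1.0 / w, X / w

Q00, Q01, Q11 = np.sum(X0_t**2), np.sum(X0_t * X1_t), np.sum(X1_t**2)
var_formula = Q00 / (Q00 * Q11 - Q01**2)

Z = np.column_stack([X0_t, X1_t])
var_matrix = np.linalg.inv(Z.T @ Z)[1, 1]
print(var_formula, var_matrix)   # identical up to rounding
```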
17.9. We need to show that

$$\frac{1}{n}\sum_{i=1}^{n} \left[(X_i - \bar X)^2\hat u_i^2 - (X_i - \mu_X)^2 u_i^2\right] \xrightarrow{p} 0.$$

Write

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} \left[(X_i - \bar X)^2\hat u_i^2 - (X_i - \mu_X)^2 u_i^2\right]
&= (\bar X - \mu_X)^2\,\frac{1}{n}\sum_{i=1}^{n} \hat u_i^2\\
&\quad - 2(\bar X - \mu_X)\,\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_X)\hat u_i^2\\
&\quad + \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_X)^2(\hat u_i^2 - u_i^2).
\end{aligned}$$
and this term is finite if r and s are less than 2. Inspection of the terms shows that
this is true. In the second case, either r = 0 or s = 0. In this case the result follows
directly if the non-zero exponent (r or s) is less than 4. Inspection of the terms shows
that this is true.
17.10. By Chebyshev's inequality,

$$\Pr\!\left(|\hat\theta - \theta| \ge \delta\right) \le \frac{E[(\hat\theta - \theta)^2]}{\delta^2}.$$
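An added empirical sketch of this bound, with an assumed estimator (the sample mean of exponential draws):

```python
import numpy as np

# Sketch: empirical check of the bound
# Pr(|theta_hat - theta| >= delta) <= E[(theta_hat - theta)^2] / delta^2,
# using the sample mean of n exponential draws as an assumed example of theta_hat.
rng = np.random.default_rng(6)
theta, n, reps, delta = 1.0, 50, 100_000, 0.2
theta_hat = rng.exponential(theta, size=(reps, n)).mean(axis=1)
lhs = np.mean(np.abs(theta_hat - theta) >= delta)
rhs = np.mean((theta_hat - theta) ** 2) / delta**2
print(lhs, rhs)    # lhs should not exceed rhs
```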
17.11. Note: in early printings of the third edition there was a typographical error in the expression for $\mu_{Y|X}$. The correct expression is $\mu_{Y|X} = \mu_Y + (\sigma_{XY}/\sigma_X^2)(x - \mu_X)$.
(a) The conditional density of $Y$ given $X = x$ is

$$f_{Y|X=x}(y) = \frac{1}{\sigma_Y\sqrt{2\pi(1-\rho_{XY}^2)}} \times \exp\!\left( \frac{1}{-2(1-\rho_{XY}^2)}\left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho_{XY}\left(\frac{x-\mu_X}{\sigma_X}\right)\!\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right] + \frac{1}{2}\left(\frac{x-\mu_X}{\sigma_X}\right)^2 \right).$$
(b) The result follows by noting that $f_{Y|X=x}(y)$ is a normal density (see equation (17.36)) with $\mu = \mu_{Y|X}$ and $\sigma^2 = \sigma^2_{Y|X}$.
17.12. (a)
$$\begin{aligned}
E(e^u) &= \int_{-\infty}^{\infty} \frac{1}{\sigma_u\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2\sigma_u^2}+u\right)du
= \exp\!\left(\frac{\sigma_u^2}{2}\right)\int_{-\infty}^{\infty} \frac{1}{\sigma_u\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2\sigma_u^2}+u-\frac{\sigma_u^2}{2}\right)du\\
&= \exp\!\left(\frac{\sigma_u^2}{2}\right)\int_{-\infty}^{\infty} \frac{1}{\sigma_u\sqrt{2\pi}}\exp\!\left(-\frac{1}{2\sigma_u^2}\left(u-\sigma_u^2\right)^2\right)du
= \exp\!\left(\frac{\sigma_u^2}{2}\right),
\end{aligned}$$
where the final equality follows because the integrand is the density of a normal
random variable with mean and variance equal to σ u2 . Because the integrand is a
density, it integrates to 1.
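An added Monte Carlo check of $E(e^u) = \exp(\sigma_u^2/2)$, with an assumed value of $\sigma_u$:

```python
import numpy as np

# Sketch: Monte Carlo check of E(exp(u)) = exp(sigma_u^2 / 2) for u ~ N(0, sigma_u^2).
rng = np.random.default_rng(7)
sigma_u = 0.8                        # assumed value for illustration
u = rng.normal(0.0, sigma_u, 2_000_000)
print(np.mean(np.exp(u)), np.exp(sigma_u**2 / 2))   # the two should be close
```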
17.13 (a) The answer is provided by equation (13.10) and the discussion following the
equation. The result was also shown in Exercise 13.10, and the approach used
in the exercise is discussed in part (b).
(b) Write the regression model as $Y_i = \beta_0 + \beta_1 X_i + v_i$, where $\beta_0 = E(\beta_{0i})$, $\beta_1 = E(\beta_{1i})$, and $v_i = u_i + (\beta_{0i} - \beta_0) + (\beta_{1i} - \beta_1)X_i$. Notice that $E(v_i \mid X_i) = E(u_i \mid X_i) + E(\beta_{0i} - \beta_0 \mid X_i) + X_i E(\beta_{1i} - \beta_1 \mid X_i) = 0$ because $\beta_{0i}$ and $\beta_{1i}$ are independent of $X_i$. Because $E(v_i \mid X_i) = 0$, the OLS regression of $Y_i$ on $X_i$ will provide consistent estimates of $\beta_0 = E(\beta_{0i})$ and $\beta_1 = E(\beta_{1i})$.

Recall that the weighted least squares estimator is the OLS estimator of $Y_i/\sigma_i$ onto $1/\sigma_i$ and $X_i/\sigma_i$, where $\sigma_i = \sqrt{\theta_0 + \theta_1 X_i^2}$. Write this regression as $Y_i/\sigma_i = \beta_0(1/\sigma_i) + \beta_1(X_i/\sigma_i) + v_i/\sigma_i$. This regression has two regressors, $1/\sigma_i$ and $X_i/\sigma_i$. Because these regressors depend only on $X_i$, $E(v_i \mid X_i) = 0$ implies that $E(v_i/\sigma_i \mid (1/\sigma_i), X_i/\sigma_i) = 0$. Thus, weighted least squares provides a consistent estimator of $\beta_0 = E(\beta_{0i})$ and $\beta_1 = E(\beta_{1i})$.
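An added simulation sketch of the random-coefficients argument (all distributional choices are assumptions for illustration):

```python
import numpy as np

# Sketch: with beta_0i and beta_1i independent of X_i, OLS of Y on X
# estimates E(beta_0i) and E(beta_1i).
rng = np.random.default_rng(8)
n = 200_000
X = rng.normal(0.0, 1.0, n)
beta0i = 1.0 + rng.normal(0.0, 0.5, n)     # E(beta_0i) = 1.0
beta1i = 2.0 + rng.normal(0.0, 0.5, n)     # E(beta_1i) = 2.0
u = rng.normal(0.0, 1.0, n)
Y = beta0i + beta1i * X + u

Z = np.column_stack([np.ones(n), X])
b_ols, *_ = np.linalg.lstsq(Z, Y, rcond=None)
print(b_ols)    # approximately (1.0, 2.0)
```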
17.14. (a) $Y_i = (Y_i - \mu) + \mu$, so that $Y_i^2 = (Y_i - \mu)^2 + \mu^2 + 2(Y_i - \mu)\mu$. The result follows after taking expectations of both sides and using $E(Y_i - \mu) = 0$.
(b) This follows from the law of large numbers because $Y_i$ is i.i.d. with mean $E(Y_i) = \mu$ and finite variance.

(c) This follows from the law of large numbers because $Y_i^2$ is i.i.d. with mean $E(Y_i^2) = \mu^2 + \sigma^2$ (from (a)) and finite variance (because $Y_i$ has a finite fourth moment, $Y_i^2$ has a finite second moment).
(d)

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar Y)^2 &= \frac{1}{n}\sum_{i=1}^{n} \left(Y_i^2 + \bar Y^2 - 2\bar Y Y_i\right)\\
&= \frac{1}{n}\sum_{i=1}^{n} Y_i^2 + \bar Y^2 - 2\bar Y\,\frac{1}{n}\sum_{i=1}^{n} Y_i\\
&= \frac{1}{n}\sum_{i=1}^{n} Y_i^2 - \bar Y^2.
\end{aligned}$$
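An added numerical check of the identity in part (d), using arbitrary draws:

```python
import numpy as np

# Sketch: numerical check of the identity (1/n)*sum((Y_i - Ybar)^2)
#         = (1/n)*sum(Y_i^2) - Ybar^2 on arbitrary draws.
rng = np.random.default_rng(9)
Y = rng.normal(3.0, 2.0, 1000)
lhs = np.mean((Y - Y.mean()) ** 2)
rhs = np.mean(Y**2) - Y.mean() ** 2
print(lhs, rhs)   # equal up to floating-point rounding
```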
(e) This follows from (a)-(d) and $\bar Y^2 \xrightarrow{p} \mu^2$.
17.15. (a) Write $W = \sum_{i=1}^{n} Z_i^2$, where the $Z_i$ are i.i.d. $N(0,1)$. From the law of large numbers, $W/n \xrightarrow{p} E(Z_i^2) = 1$.
(b) The numerator is N(0,1) and the denominator converges in probability to 1. The
result follows from Slutsky’s theorem (equation (17.9)).
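An added simulation sketch for both parts (assumed sample sizes):

```python
import numpy as np

# Sketch for 17.15: W/n is close to 1 for W chi-squared with n degrees of freedom,
# and the ratio Z0 / sqrt(W/n) is approximately N(0,1) for large n (Slutsky's theorem).
rng = np.random.default_rng(10)
n, reps = 200, 50_000
Z = rng.normal(size=(reps, n + 1))
W = np.sum(Z[:, 1:] ** 2, axis=1)          # chi-squared(n), independent of Z[:, 0]
ratio = Z[:, 0] / np.sqrt(W / n)           # the t-like ratio in part (b)
print(np.mean(W / n), ratio.std())         # close to 1 and 1, respectively
```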