Econometrics I: TA Session 5
Giovanna Úbida
INSPER
Greene 3.3
Linear Transformations of the data
Consider the least squares regression of y on K variables (with a constant) X.
Consider an alternative set of regressors Z = XP, where P is a nonsingular
matrix. Thus, each column of Z is a mixture of some of the columns of X.
Prove that the residual vectors in the regressions of y on X and y on Z are
identical.
What relevance does this have to the question of changing the fit of a
regression by changing the units of measurement of the independent variables?
Greene 3.3
- From the regression of y on X, the residual vector is e = Mx y.
- From the regression of y on Z, the residual vector is ẽ = Mz y.
ẽ = Mz y = (I − Z(Z'Z)⁻¹Z')y
         = (I − XP(P'X'XP)⁻¹P'X')y
         = (I − XP P⁻¹(X'X)⁻¹(P')⁻¹ P'X')y      (since P is nonsingular)
         = (I − X(X'X)⁻¹X')y
         = Mx y
         = e
Thus, if Z is a linear transformation of X such that Z = XP with P nonsingular, the residual vectors are identical. Hence, the linear transformation does not add any information.
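As a quick numerical illustration (not part of the original exercise; the sample size, coefficients, and variable names below are arbitrary assumptions), a small numpy sketch verifying that the two residual vectors coincide:

```python
import numpy as np

# Minimal numerical check that residuals are invariant to Z = XP with P nonsingular.
rng = np.random.default_rng(0)
n, K = 100, 4

X = np.column_stack([np.ones(n), rng.standard_normal((n, K - 1))])  # constant + regressors
P = rng.standard_normal((K, K))                                      # almost surely nonsingular
Z = X @ P
y = X @ np.array([1.0, 0.5, -2.0, 3.0]) + rng.standard_normal(n)

bX, *_ = np.linalg.lstsq(X, y, rcond=None)
bZ, *_ = np.linalg.lstsq(Z, y, rcond=None)
e  = y - X @ bX   # residuals from y on X
et = y - Z @ bZ   # residuals from y on Z

print(np.allclose(e, et))  # True: identical residual vectors
```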
Greene 3.3
Measure of the fit of a regression:
R² = 1 − SSR/SST = 1 − e'e/(y'M⁰y) = 1 − ẽ'ẽ/(y'M⁰y) = R̃²
where, following Greene's notation¹, M⁰ is the n × n idempotent matrix that transforms observations into deviations from sample means. Since the dependent variable y does not change (only the regressors are transformed), y'M⁰y is unchanged, and since ẽ = e, the R² is the same in both cases. That is, the fit of the regression is not affected by a nonsingular linear transformation of the independent variables.
¹ See Section A.2.8.
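A similar illustrative sketch (again with arbitrary simulated data) verifying that R² is unchanged when the regressors are transformed by a nonsingular P:

```python
import numpy as np

# Self-contained check that R² is unchanged when regressors are transformed by Z = XP.
rng = np.random.default_rng(1)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, K - 1))])
Z = X @ rng.standard_normal((K, K))            # nonsingular transformation (a.s.)
y = X @ np.array([1.0, -0.5, 2.0]) + rng.standard_normal(n)

def r2(W, y):
    resid = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
    dev = y - y.mean()                         # M⁰y: deviations from the sample mean
    return 1 - resid @ resid / (dev @ dev)

print(np.isclose(r2(X, y), r2(Z, y)))          # True
```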
PS02-Q3-Past exams
True or false: In the classical regression model with stochastic X, E[U'U|X] < E[Û'Û|X], since U ≡ Y − Xβ is based on the true parameter vector β, while Û ≡ Y − Xβ̂ uses the estimated value β̂.
PS02-Q3-Past exams
E[Û'Û|X] = [(n − k)/(n − k)] E[Û'Û|X]
         = E[ Û'Û/(n − k) | X ] (n − k)
         = E[s²|X] (n − k)
         = (n − k)σ²          (by PS01-Q9)
PS02-Q3-Past exams
Moreover, since U'U is a scalar,

E[U'[1×n] U[n×1] |X] = tr(E[U'U|X])
                     = E[tr(U'U)|X]
                     = E[tr(UU')|X]
                     = tr(E[UU'|X])
                     = tr(σ²In)
                     = nσ²
Since (n − k)σ² < nσ², we have E[U'[1×n] U[n×1] |X] > E[Û'Û|X], so the statement is false.
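A Monte Carlo sketch (with an arbitrary fixed design, sample size, and σ² chosen purely for illustration) of the two conditional expectations:

```python
import numpy as np

# Monte Carlo check (illustrative setup): E[U'U|X] ≈ nσ² while E[Û'Û|X] ≈ (n−k)σ².
rng = np.random.default_rng(0)
n, k, sigma2, reps = 50, 3, 2.0, 20_000
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])   # fixed design
beta = np.array([1.0, -0.5, 2.0])
H = X @ np.linalg.inv(X.T @ X) @ X.T                                  # projection matrix Px

uu, ee = 0.0, 0.0
for _ in range(reps):
    U = np.sqrt(sigma2) * rng.standard_normal(n)   # true errors
    y = X @ beta + U
    Uhat = y - H @ y                               # OLS residuals, Û = Mx y
    uu += U @ U
    ee += Uhat @ Uhat

print(uu / reps, n * sigma2)           # ≈ 100 = nσ²
print(ee / reps, (n - k) * sigma2)     # ≈ 94 = (n−k)σ²
```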
Q4 - past exams
True or False
Determine if each of the statements below is correct. If so, show why; if not,
propose a modification to correct it:
I. (Short vs. Long Regression) Let X be the matrix of regressors and
assume that it is partitioned in the usual way, X = [X1 , X2 ], where X1 and
X2 are respectively N × k1 and N × k2 with the corresponding least
squares estimator partitioned into β̂1 and β̂2 .
a. Suppose that instead of regressing Y on X, you run Y against X1* and X2* together, where X1* = (I − X2(X2'X2)⁻¹X2')X1 and X2* = (I − X1(X1'X1)⁻¹X1')X2. Although the least squares estimator resulting from this last regression, β̂*, is such that β̂* ≠ β̂, its conditional average will be the same, E[β̂*|X] = E[β̂|X].
b. Suppose now that instead of regressing Y on X, you regress Y
on (X1 − X∗1 ) and X∗2 together. In this case, the OLS estimator
of β2 in this regression will coincide with the estimator of the
short regression of Y on X2 . However, this will not be true for
the estimator of β1 , which will be different from the estimator
of the short regression of Y on X1 .
Q4 - past exams
c. Suppose now that instead of regressing Y on X, you regress Y on X1 and
X∗2 together. In this case, the OLS estimator of β2 in this regression will
coincide with the estimator of the short regression of Y on X2 . However,
this will not be true for the estimator of β1 , which will be different from
the estimator of the short regression of Y on X1 .
d. Suppose now that instead of regressing Y on X, you regress Y on X1 and
X∗2 together. In this case, the OLS estimator of β1 in this regression will
coincide with the estimator of the short regression of Y on X1 . However,
this will not be true for the estimator of β2 , which will be different from
the estimator of the short regression of Y on X2 .
e. Suppose now that instead of regressing Y on X, you regress Y on X2 and
X∗1 together. In this case, the OLS estimator of β1 in this regression will
coincide with the estimator of the long regression of Y on X1 . However,
this will not be true for the estimator of β2 , which will be different from
the estimator of the long regression of Y on X2 .
Q4 - past exams
II. (Regression deviation and R 2 ) Suppose you run a least squares
regression of Y on X (which includes the summation vector 1, i.e., the vector of ones) and
compute the residual vector Û and the R 2 associated with that regression.
a. If you regress Y − y · 1 on Û, where y is the sample mean of y,
the estimated coefficient of this regression will be R 2 .
b. If you regress Y − y · 1 on Û, where y is the sample mean of y,
the estimated coefficient of this regression will be 1 − R 2 .
III. (Short vs. Long Regression) A researcher wanting to check the
possibilities of non-linearities in a classic simple regression model (i.e.
E [yi |xi ] = β1 + β2 · xi and xi scalar, Var [yi |xi ] = σ 2 ; X full column rank)
regresses the residual ûi of the simple regression on (xi)², the square of
xi . If the regression function is in fact a quadratic function (i.e.
E [yi |xi ] = β1 + β2 · xi + β3 · xi2 ), this second regression (ûi on (xi )2 ) gives
us an unbiased estimate of β3 . However, the OLS estimators of the
simple regression (yi on 1 and xi ) of β1 and β2 are biased.
Q4 - past exams
Ia) Regressing y on X1* and X2* together

It is like a 2-step regression:
- X1* on X2*: coefficient (X2*'X2*)⁻¹X2*'X1*, where X1** = M2* X1* is the residual from the regression of X1* on X2*
- y on X1**: (X1**'X1**)⁻¹X1**'y = (X1*'M2*X1*)⁻¹X1*'M2*y

Thus,

β̂1* = (X1*'M2*X1*)⁻¹X1*'M2*y
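As a side check (purely illustrative dimensions and coefficients; not part of the original solution), the numpy sketch below verifies that this two-step Frisch-Waugh-Lovell computation reproduces the coefficient on X1* from the joint regression of y on X1* and X2*:

```python
import numpy as np

# Illustrative check of the 2-step (FWL) computation of the coefficient on X1* in the
# regression of y on [X1*, X2*].
rng = np.random.default_rng(0)
n, k1, k2 = 200, 2, 2
X1 = rng.standard_normal((n, k1))
X2 = rng.standard_normal((n, k2))
y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([0.5, 2.0]) + rng.standard_normal(n)

def M(A):  # residual-maker (annihilator) matrix of A
    return np.eye(n) - A @ np.linalg.inv(A.T @ A) @ A.T

X1s, X2s = M(X2) @ X1, M(X1) @ X2          # X1* = M2 X1, X2* = M1 X2

# Joint regression of y on [X1*, X2*]: the first k1 coefficients are β̂1*.
b_joint = np.linalg.lstsq(np.hstack([X1s, X2s]), y, rcond=None)[0][:k1]

# Two-step version: residualize X1* on X2*, then regress y on the residual X1**.
X1ss = M(X2s) @ X1s
b_twostep = np.linalg.lstsq(X1ss, y, rcond=None)[0]

print(np.allclose(b_joint, b_twostep))     # True
```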
Q4 - past exams
Let's work a little more on it:

β̂1* = (X1*'M2*X1*)⁻¹X1*'M2*y
     = (X1'M2'M2*M2X1)⁻¹X1'M2'M2*y
     = (X1'M2'M2X1)⁻¹X1'M2'y          (@)
     = (X1'M2X1)⁻¹X1'M2y
     = β̂1

By symmetry, β̂2* = β̂2. Thus β̂* = β̂ and E[β̂*|X] = E[β̂|X].

False.

(@) M2 is orthogonal to X2. X2* = M1X2 is the part of X2 that is orthogonal to X1 (X2 purged of X1); thus X2* spans a smaller space than X2, and M2* annihilates less than M2 (M2* is "bigger" than M2). Finally, M2*M2 = M2.
Q4 - past exams
Ib) Regressing Y on (X1 − X1*) and X2* together

If we call Z = (X1 − X1*), we can proceed exactly as before, so that:

β̂2* = (X2*'Mz X2*)⁻¹X2*'Mz y
     = (X2'M1'Mz M1X2)⁻¹X2'M1'Mz y
     = (X2'M1X2)⁻¹X2'M1y              (!)
     = β̂2, the long regression estimator.

False.

(!) Z = (X1 − X1*) = X1 − M2X1 = (I − M2)X1 = P2X1. P2X1 is the part of X1 that is related to X2; Z = P2X1 spans a smaller space than X1, so Mz annihilates less than M1 (Mz is "bigger" than M1). Finally, M1Mz = M1.
Q4 - past exams
IIa) and IIb) Regressing Y − ȳ·1 on Û

First recall that ȳ = (1'1)⁻¹1'y is a scalar, thus

Y − ȳ·1 = Y − 1ȳ
        = Y − 1(1'1)⁻¹1'y
        = Y − P⁰y
        = (I − P⁰)y
        = M⁰y

Since Û = Mx y, our goal is to regress M⁰y on Mx y:

β̂ = (y'Mx Mx y)⁻¹ y'Mx M⁰y
  = (y'Mx y)⁻¹ y'Mx y          (Mx Mx = Mx, and Mx M⁰ = Mx since 1 is a column of X)
  = 1 ≠ R²

False, for both IIa and IIb: the coefficient is 1, not R² or 1 − R².
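A brief numerical illustration (arbitrary simulated data) that the slope from regressing the demeaned Y on the OLS residuals is 1:

```python
import numpy as np

# Illustrative check: regressing the demeaned y on the OLS residuals gives a coefficient of 1.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])   # includes the vector of ones
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ b                       # Û = Mx y
y_dev = y - y.mean()                    # M⁰y

coef = (u_hat @ y_dev) / (u_hat @ u_hat)   # slope of y_dev on u_hat (no intercept)
print(coef)                                 # ≈ 1.0, not R²
```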
Q8 - past exams
RLS

Define β̂R, the restricted least squares estimator, as follows:

β̂R = argmin_c ||Y − Xc||²   subject to Rc = θ0,

where R is a non-stochastic p × k matrix with full (row) rank p. Show that β̂R has a smaller conditional variance than the OLS estimator β̂ under the Gauss-Markov assumptions and under Rβ = θ0. Argue whether this result contradicts the Gauss-Markov theorem.
Q8 - past exams
L = (y − Xc)'(y − Xc) − λ'(Rc − θ0)
  = y'y − c'X'y − y'Xc + c'X'Xc − λ'Rc + λ'θ0

By the FOC with respect to c:

∂L/∂c = −2X'y + 2X'X β̂R − R'λ = 0
2X'X β̂R = 2X'y + R'λ
β̂R = (X'X)⁻¹X'y + (X'X)⁻¹R'λ / 2
β̂R = β̂ + (X'X)⁻¹R'λ / 2
Q8 - past exams
By the FOC with respect to λ:

∂L/∂λ = 0  ⟹  R β̂R = θ0
R β̂ + R(X'X)⁻¹R'λ / 2 = θ0
λ = 2[R(X'X)⁻¹R']⁻¹(θ0 − R β̂)

Thus, substituting λ into the first FOC:

β̂R = β̂ + (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹ 2(θ0 − R β̂) / 2
β̂R = β̂ − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(R β̂ − θ0)
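A minimal numerical sketch (data, restriction matrix, and θ0 are illustrative assumptions) of this closed form, checking that β̂R satisfies the restriction exactly:

```python
import numpy as np

# Illustrative check of the closed-form RLS estimator and that it satisfies the restriction.
rng = np.random.default_rng(0)
n, k = 100, 4
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.standard_normal(n)

R = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])          # p x k restriction matrix, full row rank
theta0 = np.array([1.0, 0.0])

XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y
A = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T)
b_rls = b_ols - A @ (R @ b_ols - theta0)      # β̂R = β̂ − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(Rβ̂ − θ0)

print(np.allclose(R @ b_rls, theta0))         # True: the restriction holds exactly
print(np.allclose(b_rls, b_ols))              # False in general (unless Rβ̂ = θ0 already)
```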
Q8 - past exams
Note that if the restriction is true, i.e. Rβ = θ0, then β̂R is unbiased: E[β̂R|X] = β. In general, however, it is not. Consider the general case:

E[β̂R|X] = E[ β̂ − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(R β̂ − θ0) | X ]
         = β − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(Rβ − θ0)

Thus, in the general case β̂R is biased.
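An illustrative Monte Carlo sketch (arbitrary design and parameter values) of this bias behaviour:

```python
import numpy as np

# Illustrative Monte Carlo: β̂R is (approximately) unbiased when Rβ = θ0 and biased otherwise.
rng = np.random.default_rng(0)
n, reps = 100, 10_000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
R = np.array([[0.0, 1.0, 1.0]])               # single restriction: β2 + β3 = θ0
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T)

def mean_rls(beta, theta0):
    est = np.zeros(3)
    for _ in range(reps):
        y = X @ beta + rng.standard_normal(n)
        b = XtX_inv @ X.T @ y
        est += b - A @ (R @ b - theta0)
    return est / reps

print(mean_rls(np.array([1.0, 2.0, -1.0]), np.array([1.0])))   # Rβ = 1 = θ0: ≈ (1, 2, −1)
print(mean_rls(np.array([1.0, 2.0, -1.0]), np.array([0.0])))   # Rβ = 1 ≠ θ0: visibly biased
```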
Q8 - past exams
For the variance we claim that Var(β̂R|X) ≤ Var(β̂|X). Let Z = (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹. Then

Var(β̂R|X) = Var[ β̂ − Z(R β̂ − θ0) | X ]
           = Var(β̂|X) + Var(Z(R β̂ − θ0)|X) − 2 Cov(β̂, Z(R β̂ − θ0)|X)
           = σ²(X'X)⁻¹ + Var(ZR β̂ − Zθ0 |X) − 2 Cov(β̂, ZR β̂ − Zθ0 |X)
           = σ²(X'X)⁻¹ + ZR Var(β̂|X) R'Z' − 2 ZR Var(β̂|X)
           = σ²(X'X)⁻¹ + ZR σ²(X'X)⁻¹ R'Z' − 2 ZR σ²(X'X)⁻¹
           = σ²(X'X)⁻¹ + σ² ZR(X'X)⁻¹R'Z' − 2σ² ZR(X'X)⁻¹,

where we label the second term (I) and the third term (II).
Q8 - past exams
(I)

ZR(X'X)⁻¹R'Z' = (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹ R(X'X)⁻¹R' [R(X'X)⁻¹R']⁻¹ R(X'X)⁻¹
              = (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹ R(X'X)⁻¹,

since [R(X'X)⁻¹R']⁻¹ R(X'X)⁻¹R' = I, so the middle factors cancel.

(II)

ZR(X'X)⁻¹ = (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹ R(X'X)⁻¹

Thus,

Var(β̂R|X) = σ²(X'X)⁻¹ + σ² ZR(X'X)⁻¹R'Z' − 2σ² ZR(X'X)⁻¹
           = σ²(X'X)⁻¹ + σ²(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹ − 2σ²(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹
           = σ²(X'X)⁻¹ − σ²(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹,

where the subtracted term is positive semidefinite (Greene, 5th ed., Ch. 6, p. 100).
Finally, Var(β̂R|X) ≤ Var(β̂|X). This does not contradict the Gauss-Markov theorem: β̂R is, in general, biased, so it lies outside the class of linear unbiased estimators to which the theorem applies.
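As a final illustrative check (arbitrary design, restriction matrix, and assumed σ²), the difference Var(β̂|X) − Var(β̂R|X) is numerically positive semidefinite:

```python
import numpy as np

# Illustrative check that Var(β̂|X) − Var(β̂R|X) = σ²(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹ is psd.
rng = np.random.default_rng(0)
n, k, sigma2 = 100, 4, 1.5
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
R = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])          # p x k, full row rank

XtX_inv = np.linalg.inv(X.T @ X)
D = sigma2 * XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T) @ R @ XtX_inv

eigs = np.linalg.eigvalsh((D + D.T) / 2)      # symmetrize for numerical safety
print(np.all(eigs >= -1e-12))                 # True: the difference matrix is psd
```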