Class Exercises Topic 2
Solutions∗
Jordi Blanes i Vidal
Econometrics: Theory and Applications

∗ Department of Management, London School of Economics, Houghton Street, WC2A 2AE, London, UK. Email: [email protected]
Exercise 2C.1
Prove that if the simple correlation between y and x is negative, the slope of regressing y on x will also be negative.
Solution
From the OLS formula β̂1 = cov(y, x)/var(x), we have that β̂1 has the sign of cov(y, x). This is because var(x) > 0 always. Similarly, from the formula for the correlation,

ρ = corr(y, x) = cov(y, x) / (√var(x) · √var(y)),

we have that ρ also has the sign of cov(y, x), since √var(x) > 0 and √var(y) > 0 always. So, if the simple correlation between y and x is negative, the slope of the OLS estimate also has to be negative.
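As a quick numerical illustration (a sketch only, assuming Python with numpy; the simulated data and variable names are not part of the exercise), the OLS slope and the simple correlation always share a sign:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 - 1.5 * x + rng.normal(size=200)   # negative population slope

cov_xy = np.cov(x, y)[0, 1]
beta1_hat = cov_xy / np.var(x, ddof=1)      # OLS slope: cov(y, x)/var(x)
rho = np.corrcoef(x, y)[0, 1]               # simple correlation

# Both carry the sign of cov(y, x), so they always agree in sign.
print(beta1_hat, rho, np.sign(beta1_hat) == np.sign(rho))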
Exercise 2C.2
While the error terms and the OLS residuals are clearly not the same thing, it is always the case that

∑_{i=1}^n ε̂_i = ∑_{i=1}^n ε_i = 0

Discuss.
Solution
The sum of the OLS residuals is indeed 0. This can be easily seen from the first order condition of OLS: ∑_{i=1}^n ε̂_i = 0.
The expectation of the error terms is 0 on average (Assumption 4) and for every value of x (Assumption 5). However, the specific realisations of the error (ε_i) can be very different from 0. Indeed they will usually be different from 0, as otherwise it would be superfluous to have an error term in our population model. As a result, the sum (in our sample) of these specific realisations of the error term will also usually be different from 0: ∑_{i=1}^n ε_i ≠ 0.
In other words, it is very unlikely that the sum of the error terms is exactly 0. Of course, we can never check whether this is the case, as we do not know the values that these specific realisations take in our sample.
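A small simulation sketch of this distinction (numpy assumed; the data-generating values are illustrative): the OLS residuals sum to zero by construction, while the drawn errors almost never do.

import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
eps = rng.normal(size=n)                 # realised error terms
y = 1.0 + 0.5 * x + eps

beta1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
beta0 = y.mean() - beta1 * x.mean()
resid = y - beta0 - beta1 * x            # OLS residuals

print(resid.sum())   # ~0 up to floating-point error (first order condition)
print(eps.sum())     # generally not 0 in any given sample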
Exercise 2C.3
We believe that a strong negative relation holds in real life between wages and turnover. However, we will increase wages in our firm only if we are quite convinced that this is indeed the case. To understand this relation a bit better, we have taken a random sample of companies and have measured their wages and their worker turnover. In the first graph below we plot our sample, together with our hypothesis about the relation between both variables.
Which of the following two statements do you agree with, and why?
a. The existence of this sample categorically proves that our hypothesis on the population relation is wrong
b. The existence of this sample suggests (but does not prove) that our hypothesis on the population relation is wrong
We gave our sample to a trainee and asked him to run an OLS regression, which he plotted in the second graph below, together with the sample data points.
Which of the following two statements do you agree with, and why?
a. Without any doubt, the trainee has made a mistake when handling the data
b. It is quite likely (but not certain) that the trainee has made a mistake
Solution
For the first graph, the second statement is the correct one: `The existence of this sample suggests (but does not prove) that our hypothesis on the population relation is wrong'. The difference between the population relation between y and x and the sample values is given by the population errors. It is quite unlikely that all the error terms except two would take a negative value, as the graph shows would be the case if our hypothesis were correct. It is, however, not impossible. Remember that ε is a random variable that can, with positive probability, repeatedly take negative values. As a result, there is some probability that the assumed population model indeed generated our sample, so the first statement is incorrect. However, the likelihood of that occurring is quite small, so the second statement is correct.
For the second graph, the first statement is the correct one: `Without any doubt, the trainee has made a mistake when handling the data'. Note that the vertical distance between the OLS regression line and the sample values is equal to the residuals of that OLS regression. One of the conditions that the OLS estimators must meet, by definition, is:

n⁻¹ ∑_{i=1}^n (y_i − β̂0 − β̂1 x_i) = n⁻¹ ∑_{i=1}^n ε̂_i = 0

This is clearly not the case in the second graph. Since all the residuals are positive, their average must be positive and not equal to zero. Hence the trainee must, without any doubt, have made a mistake.
Exercise 2C.4
Derive the OLS estimators.
Solution
The OLS estimators minimise the sum of the squared residuals. In other words, the OLS estimators β̂0 and β̂1 minimise ∑_{i=1}^n ε̂_i².
The first part is to set out the minimisation problem. We need to choose b0 and b1 to minimise:

∑_{i=1}^n ε̂_i² = ∑_{i=1}^n (y_i − b0 − b1 x_i)²

The first order conditions (FOC) of this problem are (note that we have substituted b0 and b1 with β̂0 and β̂1 respectively, to signify that these are the values of b0 and b1 that make the FOC hold):

−2 ∑_{i=1}^n (y_i − β̂0 − β̂1 x_i) = 0          differentiating wrt b0

−2 ∑_{i=1}^n x_i (y_i − β̂0 − β̂1 x_i) = 0       differentiating wrt b1
We now need to derive the formulas for β̂0 and β̂1. Using one of the first order conditions, we first solve for β̂0:

−2 ∑_{i=1}^n (y_i − β̂0 − β̂1 x_i) = 0                         1st FOC

∑_{i=1}^n (y_i − β̂0 − β̂1 x_i) = 0                            eliminate the constant in front of the summation

∑_{i=1}^n y_i − ∑_{i=1}^n β̂0 − β̂1 ∑_{i=1}^n x_i = 0          separate the summation operator

n ȳ − n β̂0 − n β̂1 x̄ = 0                                      property of the summation operator

ȳ − β̂0 − β̂1 x̄ = 0                                            divide by n

β̂0 = ȳ − β̂1 x̄                                                rearrange for β̂0
We now use the 2nd first order condition to solve for β̂1:

−2 ∑_{i=1}^n x_i (y_i − β̂0 − β̂1 x_i) = 0                     2nd FOC

∑_{i=1}^n x_i (y_i − β̂0 − β̂1 x_i) = 0                        eliminate the constant in front of the summation

∑_{i=1}^n x_i (y_i − (ȳ − β̂1 x̄) − β̂1 x_i) = 0                substitute in the identity β̂0 = ȳ − β̂1 x̄

∑_{i=1}^n x_i (y_i − ȳ + β̂1 x̄ − β̂1 x_i) = 0

∑_{i=1}^n x_i (y_i − ȳ) = ∑_{i=1}^n x_i β̂1 (x_i − x̄)          rearrange the equation

∑_{i=1}^n x_i (y_i − ȳ) = β̂1 ∑_{i=1}^n x_i (x_i − x̄)          take the constant β̂1 out of the 2nd summation

β̂1 = ∑_{i=1}^n x_i (y_i − ȳ) / ∑_{i=1}^n x_i (x_i − x̄)        rearrange for β̂1 (we are not quite done just yet)

By now we should know that ∑_{i=1}^n x_i (y_i − ȳ) = ∑_{i=1}^n (x_i − x̄)(y_i − ȳ) and, similarly, ∑_{i=1}^n x_i (x_i − x̄) = ∑_{i=1}^n (x_i − x̄)². Therefore:

β̂1 = ∑_{i=1}^n (x_i − x̄)(y_i − ȳ) / ∑_{i=1}^n (x_i − x̄)² = cov(x, y) / var(x)
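As a numerical check (a sketch only, assuming numpy; the data is simulated for illustration), the derived formulas reproduce what a standard least-squares routine returns:

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 0.8 * x + rng.normal(size=50)

# Derived formulas: β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²,  β̂0 = ȳ − β̂1 x̄
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# Compare with numpy's least-squares fit (highest-degree coefficient first)
slope, intercept = np.polyfit(x, y, deg=1)
print(beta0, beta1)
print(intercept, slope)   # should match up to floating-point error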
Exercise 2C.5:
Write down the formula for the simple correlation between y and ŷ. Show that this simple correlation is equal to the square root of R².
Solution:
We want to show that corr(y, ŷ) = √R²:

r_{y,ŷ} = cov(y, ŷ) / (√var(y) · √var(ŷ))          definition of the correlation

        = ∑_{i=1}^n (y_i − ȳ)(ŷ_i − ȳ) / (√∑_{i=1}^n (y_i − ȳ)² · √∑_{i=1}^n (ŷ_i − ȳ)²)          using the definition of covariance and MP4: the mean of ŷ equals ȳ
We also have that:

∑_{i=1}^n (y_i − ȳ)(ŷ_i − ȳ) = ∑_{i=1}^n ((ŷ_i − ȳ) + ε̂_i)(ŷ_i − ȳ)

                             = ∑_{i=1}^n (ŷ_i − ȳ)² + ∑_{i=1}^n ŷ_i ε̂_i − ȳ ∑_{i=1}^n ε̂_i

                             = ∑_{i=1}^n (ŷ_i − ȳ)²

where the last two terms vanish because, by the mechanical properties of OLS, ∑_{i=1}^n ŷ_i ε̂_i = 0 and ∑_{i=1}^n ε̂_i = 0.
Continuing on:

r_{y,ŷ} = ∑_{i=1}^n (ŷ_i − ȳ)² / (√∑_{i=1}^n (y_i − ȳ)² · √∑_{i=1}^n (ŷ_i − ȳ)²)          substituting ∑_{i=1}^n (y_i − ȳ)(ŷ_i − ȳ) = ∑_{i=1}^n (ŷ_i − ȳ)²

        = (√∑_{i=1}^n (ŷ_i − ȳ)²)² / (√∑_{i=1}^n (y_i − ȳ)² · √∑_{i=1}^n (ŷ_i − ȳ)²)       rewriting for algebraic clarity

        = √∑_{i=1}^n (ŷ_i − ȳ)² / √∑_{i=1}^n (y_i − ȳ)²                                    simplify the fraction by cancelling

        = √( ∑_{i=1}^n (ŷ_i − ȳ)² / ∑_{i=1}^n (y_i − ȳ)² )          recall: R² = Explained Sum of Squares / Total Sum of Squares

        = √R²
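A quick numerical sketch of this result (numpy assumed; the data is simulated for illustration): corr(y, ŷ) coincides with √R² in a simple regression.

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=80)
y = 1.0 + 2.0 * x + rng.normal(size=80)

beta1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)   # ESS / TSS
corr_y_yhat = np.corrcoef(y, y_hat)[0, 1]

print(np.sqrt(r2), corr_y_yhat)   # the two numbers should coincide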
Exercise 2C.6:
What is an estimator? Why is unbiasedness a desirable property in an estimator? Is it the only
desirable property?
Solution:
• An estimator is a method of converting the data that we obtain from our sample into a
prediction about what a population parameter is.
• Unbiasedness is a good property as it means that we are getting the value of the parameter right on average. Another way to think about this is the following: if we could take an infinite number of samples, the average of the estimates produced by our estimator would be equal to the true population parameter.
• Efficiency is also a desirable property in an estimator. Efficiency means that the variance of the estimator is as low as possible. If the estimator is unbiased, efficiency means that we are likely to obtain an estimate that is quite close to the true population parameter most of the time.
Exercise 2C.7:
Explain why higher sample variation in the independent variable of a regression will lead to a lower
variance of the estimator.
Solution:
The formula for the sample variance of x is:

var(x) = n⁻¹ ∑_{i=1}^n (x_i − x̄)²

The formula for the variance of the estimator is:

var(β̂1) = σ² / ∑_{i=1}^n (x_i − x̄)² = σ² / (n · var(x))

We can see that higher variation in the variable x leads to a lower variance of the estimator. The intuition is that the more x varies, the easier it is to see whether changes in x lead to changes in y.
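A minimal Monte Carlo sketch of this point (numpy assumed; parameter values are illustrative): when x is drawn with a larger spread, the slope estimates cluster much more tightly around the true value.

import numpy as np

rng = np.random.default_rng(4)
n, reps, beta1 = 100, 2000, 0.5

def slope_estimates(x_sd):
    # Repeatedly draw samples and record the OLS slope estimate
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(scale=x_sd, size=n)
        y = 1.0 + beta1 * x + rng.normal(size=n)
        est[r] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return est

print(slope_estimates(x_sd=0.5).var())   # low variation in x: noisier estimates
print(slope_estimates(x_sd=5.0).var())   # high variation in x: much smaller variance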
Exercise 2C.8
We have the following population model:

y = β0 + β1 x + ε

We have to decide between two alternative estimators. The density functions of the two estimators are displayed below. Discuss briefly which estimator you prefer.
Solution
We would prefer Estimator 2, since it is unbiased. Unbiasedness is more important than a small
variance.
If an estimator is unbiased, then on average it gets β0 and β1 right. That is, if we extract an infinite number of samples (x_i, y_i), i = 1, ..., n, then the average values of β̂0 and β̂1 will be equal to β0 and β1.
Exercise 2C.9
The OLS estimator minimises the sum of the squared residuals, ∑_{i=1}^n ε̂_i². Instead of that, we could think of an estimator that minimises the sum of the absolute values of the residuals, ∑_{i=1}^n |ε̂_i|. Is OLS better? Why?
Solution
OLS is associated with `good properties'. In particular, it is unbiased and it has the smallest possible variance among the set of linear unbiased estimators (this is the Gauss-Markov theorem from Topic 3). That makes it superior to any other alternative estimator in that class.
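A sketch of what the alternative estimator would look like in practice (numpy and scipy assumed; the data and starting values are purely illustrative): the sum of absolute residuals has no closed-form minimiser, so it is computed numerically and compared with OLS.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = 1.0 + 0.7 * x + rng.normal(size=200)

# OLS: minimises the sum of squared residuals (closed form)
b1_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b0_ols = y.mean() - b1_ols * x.mean()

# Alternative: minimise the sum of absolute residuals with a generic numerical optimiser
def sum_abs_resid(b):
    return np.sum(np.abs(y - b[0] - b[1] * x))

b0_abs, b1_abs = minimize(sum_abs_resid, x0=[0.0, 0.0], method="Nelder-Mead").x

print(b0_ols, b1_ols)
print(b0_abs, b1_abs)   # both land near (1.0, 0.7) here; their sampling properties differ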
Exercise 2C.10
Imagine that you want to estimate the following simple model:

y_i = β0 + β1 x_i + ε_i     (1)

Imagine that you have just proved that β̂1 is an unbiased estimator of β1. On the basis of that finding, prove that β̂0 is an unbiased estimator of β0 (hint: treat x̄ as a fixed value).
Solution
We start from the identity for β̂0 derived from the first First Order Condition of OLS and then take the expectation:

β̂0 = ȳ − β̂1 x̄                 FOC of OLS

E[β̂0] = E[ȳ − β̂1 x̄]            take the expectation of both sides
We then go back to the simple model to get an identity for ȳ:

y_i = β0 + β1 x_i + ε_i                                      simple model from the question

∑_{i=1}^n y_i = ∑_{i=1}^n (β0 + β1 x_i + ε_i)                sum over i on both sides

n⁻¹ ∑_{i=1}^n y_i = n⁻¹ ∑_{i=1}^n (β0 + β1 x_i + ε_i)        divide by n on both sides

ȳ = β0 + β1 x̄ + ε̄                                            using properties of the summation operator
Note that ε̄ is not necessarily equal to zero. While the expectation of an error term is zero, E[ε] = 0 (by Assumption 4), the realised error terms in our sample will usually be different from zero. As a result, the sum (and mean) of the errors in our sample will generally be different from zero: ∑_{i=1}^n ε_i ≠ 0. Do not confuse this with Mechanical Property 1, ∑_{i=1}^n ε̂_i = 0, which concerns the OLS residuals (note the hat on ε̂).
Continuing on:

E[β̂0] = E[ȳ − β̂1 x̄]

      = E[β0 + β1 x̄ + ε̄ − β̂1 x̄]               substitute in ȳ = β0 + β1 x̄ + ε̄

      = E[β0] + E[β1 x̄] + E[ε̄] − E[β̂1 x̄]       properties of the expectation operator

      = β0 + x̄ β1 − x̄ E[β̂1] + E[ε̄]             take constants out of the expectation operator

      = β0 + x̄ β1 − x̄ β1 + E[ε̄]                the question says β̂1 is an unbiased estimator of β1: E[β̂1] = β1

      = β0 + x̄ β1 − x̄ β1 + 0                   from AS4, E[ε̄] = 0: each error term has zero expectation, so their sum and mean also have zero expectation

      = β0
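A small simulation sketch of this unbiasedness result (numpy assumed; the true parameter values are chosen only for illustration): averaging β̂0 over many samples recovers β0.

import numpy as np

rng = np.random.default_rng(6)
beta0_true, beta1_true, n, reps = 2.0, -1.0, 60, 5000

est_beta0 = np.empty(reps)
for r in range(reps):
    x = rng.normal(loc=3.0, scale=2.0, size=n)
    y = beta0_true + beta1_true * x + rng.normal(size=n)
    b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    est_beta0[r] = y.mean() - b1 * x.mean()   # β̂0 = ȳ − β̂1 x̄

print(est_beta0.mean())   # close to 2.0: β̂0 is unbiased for β0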
Exercise 2C.11
Show that var(β̂0) = σ² ∑_{i=1}^n x_i² / (n ∑_{i=1}^n (x_i − x̄)²), starting from the condition that var(β̂0) = var(ε̄) + var(β̂1 x̄) and taking x̄ as a fixed value.
Solution

var(β̂0) = var(ε̄) + var(β̂1 x̄)                                                     from the question

        = var( (∑_{i=1}^n ε_i)/n ) + var(β̂1 x̄)                                    summation notation: ε̄ = (∑_{i=1}^n ε_i)/n

        = (1/n²) ∑_{i=1}^n var(ε_i) + x̄² var(β̂1)                                  take the constants out of the variance operator (recall var[aX + b] = a² var[X]), using that the error terms are uncorrelated across observations

        = σ²/n + x̄² σ² / ∑_{i=1}^n (x_i − x̄)²                                     using the identities ∑_{i=1}^n var(ε_i) = nσ² and var(β̂1) = σ² / ∑_{i=1}^n (x_i − x̄)²

        = σ² ∑_{i=1}^n (x_i − x̄)² / (n ∑_{i=1}^n (x_i − x̄)²) + n x̄² σ² / (n ∑_{i=1}^n (x_i − x̄)²)     put both terms over a common denominator

        = σ² ( ∑_{i=1}^n (x_i − x̄)² + n x̄² ) / ( n ∑_{i=1}^n (x_i − x̄)² )          algebraic manipulation

        = σ² ( ∑_{i=1}^n (x_i² − 2 x_i x̄ + x̄²) + n x̄² ) / ( n ∑_{i=1}^n (x_i − x̄)² )       expand the bracket in the numerator

        = σ² ( ∑_{i=1}^n x_i² − ∑_{i=1}^n 2 x_i x̄ + ∑_{i=1}^n x̄² + n x̄² ) / ( n ∑_{i=1}^n (x_i − x̄)² )     separate the summation operator

        = σ² ( ∑_{i=1}^n x_i² − 2 x̄ ∑_{i=1}^n x_i + n x̄² + n x̄² ) / ( n ∑_{i=1}^n (x_i − x̄)² )       take constants out of the summation operators

        = σ² ( ∑_{i=1}^n x_i² − 2n x̄² + n x̄² + n x̄² ) / ( n ∑_{i=1}^n (x_i − x̄)² )          property of the summation operator: ∑_{i=1}^n x_i = n x̄

        = σ² ∑_{i=1}^n x_i² / ( n ∑_{i=1}^n (x_i − x̄)² )                            cancel down
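A simulation sketch of this formula (numpy assumed; x is held fixed across replications so the expression applies directly, and all numbers are illustrative): the empirical variance of β̂0 matches the derived expression.

import numpy as np

rng = np.random.default_rng(7)
n, reps, sigma = 40, 10000, 1.5
x = rng.uniform(0, 10, size=n)            # fixed regressor values
beta0_true, beta1_true = 1.0, 0.3

est_beta0 = np.empty(reps)
for r in range(reps):
    y = beta0_true + beta1_true * x + rng.normal(scale=sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    est_beta0[r] = y.mean() - b1 * x.mean()

formula = sigma**2 * np.sum(x**2) / (n * np.sum((x - x.mean()) ** 2))
print(est_beta0.var(), formula)   # the two numbers should be close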
Exercise 2C.12
We wanted to estimate the returns to education separately for every state in the US. Our population model is:

lwage = β0 + β1 education + ε

For every state we used a sample size equal to 1/100,000 of the state population. The table below displays the estimated coefficients for ten states.
Can we conclude from the table above that the returns to education are more extreme (i.e. very large or very small) for smaller states, while they are of intermediate size for larger states? Discuss.
Solution
Our findings are the result of having different sample sizes in the different estimations. We can see from the formula

Var(β̂1) = σ² / ∑_{i=1}^n (x_i − x̄)²

that the variance is decreasing in the number of observations. It is therefore not surprising that when the number of observations is lower our estimates are more extreme, that is, they have higher variance.
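A Monte Carlo sketch of this pattern (numpy assumed; the wage and education process is entirely made up for illustration and is not the data in the table): slope estimates from small samples are far more dispersed, and hence more often extreme, than those from large samples.

import numpy as np

rng = np.random.default_rng(8)
beta1_true, reps = 0.08, 3000   # illustrative "return to education"

def estimates(n):
    est = np.empty(reps)
    for r in range(reps):
        educ = rng.integers(8, 21, size=n)                        # years of education
        lwage = 1.5 + beta1_true * educ + rng.normal(scale=0.5, size=n)
        est[r] = np.cov(educ, lwage)[0, 1] / np.var(educ, ddof=1)
    return est

small, large = estimates(n=30), estimates(n=3000)
print(small.std(), large.std())     # small samples: much more dispersed ("extreme") estimates
print(small.min(), small.max())
print(large.min(), large.max())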
Exercise 2C.13:
Imagine that you want to estimate the following simple model:

y_i = β0 + β1 x_i + ε_i

where the subscript i denotes an observation in our sample. After running an OLS regression and obtaining the estimates β̂0 and β̂1, we can obtain the fitted values (also called predictions), ŷ_i. Imagine that we have a particular sample observation x∗_i (i.e. x∗_i is fixed).
• Write down the prediction ŷ∗_i in terms of the OLS estimates and x∗_i
• Show that the variance of the prediction ŷ∗_i is greater when x∗_i is further away from its mean (hint: treat ȳ, x̄ and x∗_i as constants and remember that β̂0 = ȳ − β̂1 x̄)
Solution:
Write down the prediction ŷ∗_i in terms of the OLS estimates and x∗_i:
• The prediction or fitted value for x∗_i is obtained simply by substituting x∗_i into the OLS regression line:

ŷ∗_i = β̂0 + β̂1 x∗_i

Show that the variance of the prediction ŷ∗_i is greater when x∗_i is further away from its mean (hint: treat ȳ, x̄ and x∗_i as constants and remember that β̂0 = ȳ − β̂1 x̄):
• We have that ŷ∗_i = β̂0 + β̂1 x∗_i. Taking var(.) on both sides, we have that:

var(ŷ∗_i) = var(β̂0 + β̂1 x∗_i)

ȳ, x̄ and x∗_i are constants, but β̂0 and β̂1 are random variables. Using the hint, we can substitute β̂0 by ȳ − β̂1 x̄:

var(ŷ∗_i) = var(β̂0 + β̂1 x∗_i)

          = var(ȳ − β̂1 x̄ + β̂1 x∗_i)           using the identity β̂0 = ȳ − β̂1 x̄

          = var( ȳ + β̂1 (x∗_i − x̄) )

          = (x∗_i − x̄)² var(β̂1)                take the constants out of the variance operator: var[aX + b] = a² var[X]

This shows that when the distance |x∗_i − x̄| is greater, the variance of ŷ∗_i is also greater.
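A simulation sketch of this result (numpy assumed; the regressor values are held fixed across replications and the evaluation points are illustrative): predictions made far from x̄ vary much more across samples than predictions made near x̄.

import numpy as np

rng = np.random.default_rng(9)
n, reps = 50, 5000
x = rng.uniform(0, 10, size=n)            # regressor values held fixed across replications
x_star_near, x_star_far = x.mean() + 0.5, x.mean() + 8.0

pred_near, pred_far = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = 2.0 + 1.0 * x + rng.normal(size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    pred_near[r] = b0 + b1 * x_star_near   # prediction close to x̄
    pred_far[r] = b0 + b1 * x_star_far     # prediction far from x̄

print(pred_near.var(), pred_far.var())     # the far prediction has a much larger variance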