
EC501 Econometric Methods

3. Linear Regression: Inference

Marcus Chambers

Department of Economics
University of Essex

26 October 2023

Outline

Review

Goodness-of-fit

Tests for single parameters (t-tests)

Tests of linear restrictions (F-tests)

Reference: Verbeek, chapter 2.

Review

Model: y = Xβ + ϵ.
Ordinary least squares (OLS) estimator: b = (X′X)⁻¹X′y.
E{b} = β, and hence b is an unbiased estimator of β.
V{b|X} = σ²(X′X)⁻¹; V̂{b} = s²(X′X)⁻¹.
b is minimum variance (Gauss-Markov Theorem), i.e. OLS is BLUE (Best Linear Unbiased Estimator).
How ‘good’ is the estimated model? → goodness of fit.
How can we use the estimate b to make inferences about the unknown parameters of interest in β?
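These formulas are easy to compute directly; a minimal sketch in R, assuming the wage1 data frame (with columns lwage and educ) used in the examples that follow:

X <- cbind(1, wage1$educ)      # N x K regressor matrix, with intercept
y <- wage1$lwage               # response vector
XtX_inv <- solve(t(X) %*% X)   # (X'X)^(-1)
b <- XtX_inv %*% t(X) %*% y    # OLS estimator b = (X'X)^(-1) X'y
e <- y - X %*% b               # residuals
N <- nrow(X); K <- ncol(X)
s2 <- sum(e^2) / (N - K)       # s^2, the unbiased estimator of sigma^2
Vb <- s2 * XtX_inv             # estimated covariance matrix of b
sqrt(diag(Vb))                 # standard errors, as reported by summary(lm(...))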

Goodness-of-fit
How well does the estimated model fit the data?
We attempt to measure this in terms of the proportion of the
variation in y explained by the model.
We use the R² (R-squared) statistic,

    R² = V̂{ŷ} / V̂{y},

where V̂{·} denotes the sample variance.
But V̂{y} = V̂{ŷ} + V̂{e} and hence V̂{ŷ} = V̂{y} − V̂{e}, so

    R² = 1 − V̂{e}/V̂{y} = 1 − [Σᵢ eᵢ²/(N − 1)] / [Σᵢ (yᵢ − ȳ)²/(N − 1)],

where ȳ = Σᵢ yᵢ/N is the sample mean and the sums run over i = 1, …, N.
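In R this is one line, reusing e and y from the by-hand sketch above:

R2 <- 1 - sum(e^2) / sum((y - mean(y))^2)   # matches 'Multiple R-squared' from summary(lm(...))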
Properties of R2
Note that 0 ≤ R² ≤ 1.
If R² = 0 this implies that

    Σᵢ eᵢ² = Σᵢ (yᵢ − ȳ)²,

so the model explains none of the variation in y!
On the other hand, if R² = 1 then

    Σᵢ eᵢ² = 0, implying eᵢ = 0 for all i = 1, …, N,

and so there is a perfect fit, i.e. yᵢ = xᵢ′b for all i = 1, …, N.
Caution: do not use R² in models that don’t contain an intercept – here R² can be negative.
Adjusted R2
Because Σᵢ eᵢ² will never rise (and will typically fall) when more regressors are added, it is possible to make R² artificially large by using ‘irrelevant’ regressors.
As an alternative we can use the adjusted R², or R̄²:

    R̄² = 1 − [Σᵢ eᵢ²/(N − K)] / [Σᵢ (yᵢ − ȳ)²/(N − 1)] = 1 − (1 − R²)(N − 1)/(N − K).

R̄² incurs a penalty if adding more regressors (increasing K) does not significantly reduce Σᵢ eᵢ².
Too much emphasis is often placed on R² – other aspects of the fitted model are equally (if not more) important!
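Continuing the sketch above, the adjusted version needs one more line:

R2_adj <- 1 - (1 - R2) * (N - 1) / (N - K)   # penalises extra regressors via N - K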

Example
Returning to the R output for a regression of individuals’ wages
on years of education from last week:
> fit1 <- lm(lwage~educ, data=wage1)
> summary(fit1)

Call:
lm(formula = lwage ~ educ, data = wage1)

Residuals:
Min 1Q Median 3Q Max
-2.21158 -0.36393 -0.07263 0.29712 1.52339

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.583773 0.097336 5.998 3.74e-09 ***
educ 0.082744 0.007567 10.935 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4801 on 524 degrees of freedom


Multiple R-squared: 0.1858, Adjusted R-squared: 0.1843
F-statistic: 119.6 on 1 and 524 DF, p-value: < 2.2e-16

Here, R² = 0.1858 and R̄² = 0.1843, i.e. around 18.5% of the variation in (log) wages can be attributed to education.

Normality of b
Recall that, under the Gauss-Markov conditions (A1)–(A4) and normality (A5), we have (conditional on X)

    b ∼ N(β, σ²(X′X)⁻¹).

For each element bₖ of b we have

    bₖ ∼ N(βₖ, σ²cₖₖ), k = 1, …, K,    (1)

where cₖₖ is the (k, k) diagonal element of (X′X)⁻¹.
If σ² were known then

    zₖ = (bₖ − βₖ)/(σ√cₖₖ) ∼ N(0, 1),

which could be used to test hypotheses because the standard normal, or N(0, 1), distribution is well tabulated.

Hypothesis test
So to test

    H0 : βₖ = βₖ⁰ (null hypothesis) against H1 : βₖ ≠ βₖ⁰ (alternative hypothesis)

we would use the standardised statistic

    zₖ⁰ = (bₖ − βₖ⁰)/(σ√cₖₖ) ∼ N(0, 1) under H0.

Let z̄ denote the critical value (more on this shortly) for the N(0, 1).
Decision rule: if |zₖ⁰| ≥ z̄, reject H0 in favour of H1; if |zₖ⁰| < z̄, do not reject H0.

Using s2 for σ 2
But σ² is not known!
So we use s² in its place, but this changes the distribution.
It can be shown that s² is independent of b and that

    ξ = (N − K)s²/σ² ∼ χ²_{N−K}.

Using (S21) we find that

    tₖ = zₖ⁰/√(ξ/(N − K)) = (bₖ − βₖ⁰)/(s√cₖₖ) ∼ t_{N−K} under H0.

The t_{N−K} distribution has fatter tails than the N(0, 1).
This is due to the imprecision of using s² rather than σ².
As N − K → ∞, we have t_{N−K} → N(0, 1) in distribution.

Significance level
We need to choose a significance level for the test – let’s conduct the test at the 5% level of significance.
We can find a number, t_{N−K;0.025} (= tc), such that 5% of the t_{N−K} distribution lies outside the interval [−tc, tc].

[Figure: the t-distribution density, with the lower 2.5% tail below −tc and the upper 2.5% tail above +tc marking the rejection regions.]

Decision rule

If H0 is really true, there is only a 5% chance of obtaining a tₖ value outside the range [−tc, tc].
If this occurs, we regard it as evidence against H0.
The decision rule is:
• if |tₖ| ≥ tc, reject H0 in favour of H1;
• if |tₖ| < tc, do not reject H0.
The interval [−tc, tc] is, therefore, the non-rejection region.
The area outside this interval is the rejection region.

Tests of significance

A common hypothesis to test is

    H0 : βₖ = 0 against H1 : βₖ ≠ 0.

Under H0 the variable xₖ does not affect y in the regression model, so this is a test of the significance of xₖ in the regression determining y.
The test statistic is

    tₖ = bₖ/(s√cₖₖ) = estimate / standard error,

which is routinely computed in regression software (e.g. R, Gretl, Stata etc.).
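A sketch of this computation, using the objects b and Vb from the by-hand example earlier (k = 2 picks out the educ slope):

k <- 2
t_k <- b[k] / sqrt(Vb[k, k])   # t-statistic = estimate / standard error
t_k                            # matches the 't value' column of summary(lm(...))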

p-values

The critical values for the t-test, −tc and tc, can be obtained
from statistical tables.
But most econometric software enables us to sidestep this
process by reporting probability values, or p-values.
The p-value for a two-sided t-test is given by

    p = P{t_{N−K} < −|tₖ|} + P{t_{N−K} > |tₖ|}.

This is the proportion of the t_{N−K} distribution in the two tails below −|tₖ| and above |tₖ|.
A p-value less than 0.05 implies significance at the 5% level, a
p-value less than 0.01 implies significance at the 1% level etc.
Often, p-values are reported for other test statistics too.
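For instance, the 5% critical value and the two-sided p-value can be computed in R as follows (t_k, N and K as in the sketch above):

tc <- qt(0.975, df = N - K)           # two-sided 5% critical value
p  <- 2 * pt(-abs(t_k), df = N - K)   # two-sided p-value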

Example
Returning again to the R output for a regression of individuals’ wages on years of education:
> fit1 <- lm(lwage~educ, data=wage1)
> summary(fit1)

Call:
lm(formula = lwage ~ educ, data = wage1)

Residuals:
Min 1Q Median 3Q Max
-2.21158 -0.36393 -0.07263 0.29712 1.52339

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.583773 0.097336 5.998 3.74e-09 ***
educ 0.082744 0.007567 10.935 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4801 on 524 degrees of freedom


Multiple R-squared: 0.1858, Adjusted R-squared: 0.1843
F-statistic: 119.6 on 1 and 524 DF, p-value: < 2.2e-16

Here, t₂ = 10.935 with, effectively, a p-value of 0 (in the column Pr(>|t|) it is given as < 2e-16, i.e. < 2 × 10⁻¹⁶), suggesting that education is a highly significant determinant of wages.
Comparing models
The optimal properties of OLS rest on the model being correctly
specified.
It is therefore important to test:
(a) whether important regressors have been omitted;
(b) whether unimportant regressors have been included.
Consider the two models

    y = X₁β₁ + ϵ,    (2)

    y = X₁β₁ + X₂β₂ + ϵ,    (3)

with dimensions y : N × 1, ϵ : N × 1, X₁ : N × (K − J), β₁ : (K − J) × 1, X₂ : N × J and β₂ : J × 1.
Model (2) is obtained from (3) by setting the J elements of the vector β₂ equal to zero.

Zero restrictions
This suggests testing the hypothesis:

    H0 : β₂ = 0 (J restrictions) against H1 : β₂ ≠ 0.

This involves a test of more than one restriction, so we can’t use the simple t-test.
We can test the restrictions individually, but this says nothing about their joint significance.
We can write (3) as

    y = [X₁ : X₂](β₁′, β₂′)′ + ϵ = Xβ + ϵ    (4)

and so the restrictions in H0 are:

    [0 : I_J](β₁′, β₂′)′ = 0, or Rβ = 0.

General linear restrictions
More generally, consider Model (4), y = Xβ + ϵ, and suppose we want to test the set of J linear restrictions

    H0 : Rβ = q against H1 : Rβ ≠ q,

where R is J × K (with full row rank J) and q is J × 1.
For example, suppose K = 3 and we wish to test whether β₁ + β₂ = 0 and β₁ − 2β₂ + β₃ = 1.
Here there are 2 restrictions (J = 2) and we have

    R = [ 1   1   0 ]      q = [ 0 ]
        [ 1  −2   1 ],         [ 1 ],

so that Rβ = q stacks the two restrictions on β = (β₁, β₂, β₃)′.

There are two main methods of testing H0:
Method 1

Method 1 uses the unrestricted estimator b = (X′X)⁻¹X′y.
We know that (under normality, and conditioning on X)

    b ∼ N(β, σ²(X′X)⁻¹)
    ⇒ Rb ∼ N(Rβ, σ²R(X′X)⁻¹R′)    (by S18)
    ⇒ Rb − q ∼ N(Rβ − q, σ²R(X′X)⁻¹R′).

Under H0 we know that Rβ − q = 0, and hence

    Rb − q ∼ N(0, σ²R(X′X)⁻¹R′) under H0.

We will test whether Rb − q is significantly different from 0.

Two χ2 distributions
We know, from (S20), that if Y ∼ N(µ, Σ) (J × 1), then

    (Y − µ)′Σ⁻¹(Y − µ) ∼ χ²_J.

We can apply this result with Y = Rb − q, µ = 0 and Σ = σ²R(X′X)⁻¹R′.
We find that

    ξ₁ = (Rb − q)′[σ²R(X′X)⁻¹R′]⁻¹(Rb − q) ∼ χ²_J.

But σ² is unknown and so this distribution can’t be used.
However, we know from our derivation of t_{N−K} that

    ξ = (N − K)s²/σ² ∼ χ²_{N−K}.

We therefore have two random variables, ξ₁ and ξ, each with a χ² distribution.
An F-statistic
We also know, from (S22), that if ξ₁ ∼ χ²_{J₁} and ξ₂ ∼ χ²_{J₂}, then

    (ξ₁/J₁)/(ξ₂/J₂) ∼ F_{J₁,J₂}.

Using this result we find that

    F = (ξ₁/J)/(ξ/(N − K)) ∼ F_{J,N−K} under H0.

Written more fully,

    F = [(Rb − q)′[σ²R(X′X)⁻¹R′]⁻¹(Rb − q)/J] / [(N − K)s²/(σ²(N − K))]
      = (Rb − q)′[R(X′X)⁻¹R′]⁻¹(Rb − q)/(J s²),

since the unknown σ² cancels.
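A sketch of this statistic in R for the J = 2, K = 3 example above; here b, s2 and XtX_inv are assumed to come from an unrestricted fit with three coefficients (the by-hand example earlier has K = 2, so this is purely illustrative):

R <- rbind(c(1,  1, 0),    # beta1 + beta2 = 0
           c(1, -2, 1))    # beta1 - 2*beta2 + beta3 = 1
q <- c(0, 1)
J <- nrow(R)
d <- R %*% b - q           # discrepancy Rb - q
Fstat <- drop(t(d) %*% solve(R %*% XtX_inv %*% t(R)) %*% d) / (J * s2)
# compare Fstat with the 5% critical value qf(0.95, J, N - K)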

Decision rule
Let F_{J,N−K;0.05} (= Fc) denote the 5% critical value from the F_{J,N−K} distribution.
The decision rule is:
• if F ≥ Fc, reject H0 in favour of H1;
• if F < Fc, do not reject H0.

[Figure: the F-distribution density, with the upper 5% tail above Fc marking the rejection region.]

Method 2

Method 2 proceeds in 4 steps and involves estimation with and without the restrictions imposed:
1. Estimate the unrestricted model, y = Xβ + ϵ, and obtain S₁ = e′e (the sum of squared residuals).
2. Impose the restrictions, estimate the restricted model, and obtain S₀ = e₀′e₀, where e₀ is the N × 1 vector of residuals from the restricted model.
3. Compute

       F = [(S₀ − S₁)/S₁] × [(N − K)/J] ∼ F_{J,N−K} under H0.

4. Apply the decision rule as in Method 1.

Points to note
Some points to note:
1. F > 0.
2. J = number of restrictions: degrees of freedom for the numerator;
   N − K = degrees of freedom for the denominator;
   K = number of regressors in the unrestricted model.
3. Example of imposing restrictions: consider the model

       yᵢ = β₁ + β₂xᵢ₂ + β₃xᵢ₃ + ϵᵢ,    H0 : β₂ + β₃ = 1
       ⇒ β₃ = 1 − β₂
       ⇒ yᵢ = β₁ + β₂xᵢ₂ + (1 − β₂)xᵢ₃ + ϵᵢ
       ⇒ (yᵢ − xᵢ₃) = β₁ + β₂(xᵢ₂ − xᵢ₃) + ϵᵢ,

   i.e. regress yᵢ⁰ = yᵢ − xᵢ₃ on an intercept and xᵢ₂⁰ = xᵢ₂ − xᵢ₃ (a code sketch follows below).
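In R the transformed regression can be run directly with I(); a sketch assuming a hypothetical data frame dat with columns y, x2 and x3:

fit_restr <- lm(I(y - x3) ~ I(x2 - x3), data = dat)   # restricted model imposing beta2 + beta3 = 1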

Example

Continuing with the wage regression, consider the model

    log(wage) = β₁ + β₂ educ + β₃ exper + β₄ exper² + ϵ,

where wage denotes the wage in dollars, educ years of education, and exper years of experience.
The regression we have seen so far corresponds to the two restrictions β₃ = β₄ = 0 (J = 2), so that the restricted regression is

    log(wage) = β₁ + β₂ educ + ϵ.

Let’s use Method 2 to test the two restrictions.

Unrestricted regression
The unrestricted regression is:
> fitu <- lm(lwage~educ+exper+expersq, data=WAGE1)
> summary(fitu)

Call:
lm(formula = lwage ~ educ + exper + expersq, data = WAGE1)

Residuals:
Min 1Q Median 3Q Max
-1.96387 -0.29375 -0.04009 0.29497 1.30216

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1279975 0.1059323 1.208 0.227
educ 0.0903658 0.0074680 12.100 < 2e-16 ***
exper 0.0410089 0.0051965 7.892 1.77e-14 ***
expersq -0.0007136 0.0001158 -6.164 1.42e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4459 on 522 degrees of freedom


Multiple R-squared: 0.3003, Adjusted R-squared: 0.2963
F-statistic: 74.67 on 3 and 522 DF, p-value: < 2.2e-16

Here we see that both exper and exper² are individually significant, but we are not provided with the sum of squared residuals that we need.
The sum of squared residuals

There are two ways in which we can obtain S₁ in this example.
The first is:
> deviance(fitu)
[1] 103.7904

The second is:


> sum(resid(fitu)^2)
[1] 103.7904

Fortunately(!), both give the same answer: S₁ = 103.7904.
We have already seen the restricted regression; we obtain S₀ using:
> deviance(fitr)
[1] 120.7691

Hence S₀ = 120.7691.

F-statistic

We therefore have:

    S₀ = 120.7691, S₁ = 103.7904, N = 526 and K = 4.

The statistic of interest is

    F = [(120.7691 − 103.7904)/103.7904] × [(526 − 4)/2] = 42.6961.

As F_{2,522;0.05} ≈ 3, we clearly reject the null that β₃ = β₄ = 0 at the 5% level of significance.
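A sketch reproducing this calculation, and the critical value, in R:

S0 <- 120.7691; S1 <- 103.7904
N <- 526; K <- 4; J <- 2
Fstat <- ((S0 - S1) / S1) * ((N - K) / J)   # 42.696
qf(0.95, J, N - K)                          # 5% critical value, approx. 3.01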

An easier approach
Note that we could, after the unrestricted regression, use the
command:
> library(car)
Loading required package: carData
> linearHypothesis(fitu, c("exper = 0", "expersq = 0"))
Linear hypothesis test

Hypothesis:
exper = 0
expersq = 0

Model 1: restricted model
Model 2: lwage ~ educ + exper + expersq

  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    524 120.77
2    522 103.79  2    16.979 42.696 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This gives the same statistic and is much easier!
It also provides the p-value, which can be useful in less clear-cut cases.

Summary

• t-tests for tests involving a single parameter
• F-tests for tests of sets of linear restrictions involving more than one parameter
• Next week:
  • comparing regression models
  • limited dependent variables
