EC501 Econometric Methods
3. Linear Regression: Inference
Marcus Chambers
Department of Economics
University of Essex
26 October 2023
Outline
Review
Goodness-of-fit
Tests for single parameters (t-tests)
Tests of linear restrictions (F-tests)
Reference: Verbeek, chapter 2.
Review
Model: $y = X\beta + \epsilon$.
Ordinary least squares (OLS) estimator: $b = (X'X)^{-1}X'y$.
$E\{b\} = \beta$ and hence $b$ is an unbiased estimator of $\beta$.
$V\{b|X\} = \sigma^2(X'X)^{-1}$; $\hat{V}\{b\} = s^2(X'X)^{-1}$.
$b$ has minimum variance (Gauss-Markov Theorem), i.e. OLS is
BLUE (Best Linear Unbiased Estimator).
How ‘good’ is the estimated model? → goodness of fit.
How can we use the estimate b to make inferences about the
unknown parameters of interest in β?
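As a quick illustration of the formulae reviewed above, here is a minimal R sketch using simulated data (all names and numbers are purely illustrative):
set.seed(42)
N <- 100
X <- cbind(1, rnorm(N), rnorm(N))                  # N x K regressor matrix (K = 3, incl. intercept)
y <- X %*% c(1, 2, -0.5) + rnorm(N)                # simulated y = X*beta + eps
b  <- solve(t(X) %*% X, t(X) %*% y)                # OLS: b = (X'X)^{-1} X'y
s2 <- sum((y - X %*% b)^2) / (N - ncol(X))         # s^2 = e'e / (N - K)
Vb <- s2 * solve(t(X) %*% X)                       # estimated variance matrix of b
cbind(estimate = as.vector(b), std.error = sqrt(diag(Vb)))   # should match coef(lm(y ~ X - 1))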
Goodness-of-fit
How well does the estimated model fit the data?
We attempt to measure this in terms of the proportion of the
variation in y explained by the model.
We use the $R^2$ (R-squared) statistic,
$$R^2 = \frac{\hat{V}\{\hat{y}\}}{\hat{V}\{y\}},$$
where $\hat{V}\{\cdot\}$ denotes the sample variance.
But $\hat{V}\{y\} = \hat{V}\{\hat{y}\} + \hat{V}\{e\}$ and hence $\hat{V}\{\hat{y}\} = \hat{V}\{y\} - \hat{V}\{e\}$, so
$$R^2 = 1 - \frac{\hat{V}\{e\}}{\hat{V}\{y\}} = 1 - \frac{\frac{1}{N-1}\sum_{i=1}^{N} e_i^2}{\frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{y})^2},$$
where $\bar{y} = \sum_{i=1}^{N} y_i / N$ is the sample mean.
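As a minimal R sketch (simulated data, for illustration only), $R^2$ can be computed directly from the residuals of a fitted model:
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)                 # model with an intercept
fit <- lm(y ~ x)
e <- resid(fit)                             # OLS residuals
1 - sum(e^2) / sum((y - mean(y))^2)         # equals summary(fit)$r.squared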
Properties of $R^2$
Note that $0 \leq R^2 \leq 1$.
If $R^2 = 0$ this implies that
$$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - \bar{y})^2,$$
so the model explains none of the variation in $y$!
On the other hand, if $R^2 = 1$ then
$$\sum_{i=1}^{N} e_i^2 = 0, \quad \text{implying } e_i = 0 \text{ for all } i = 1,\ldots,N,$$
and so there is a perfect fit, i.e. $y_i = x_i'b$ for all $i = 1,\ldots,N$.
Caution: do not use $R^2$ in models that don't contain an intercept
– here $R^2$ can be negative.
Adjusted $R^2$
Because $\sum_{i=1}^{N} e_i^2$ will never rise (and will typically fall) when more regressors are added, it is possible to make $R^2$ artificially large by using 'irrelevant' regressors.
As an alternative we can use the adjusted $R^2$, or $\bar{R}^2$:
$$\bar{R}^2 = 1 - \frac{\frac{1}{N-K}\sum_{i=1}^{N} e_i^2}{\frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{y})^2} = 1 - (1 - R^2)\,\frac{N-1}{N-K}.$$
$\bar{R}^2$ incurs a penalty if adding more regressors (increasing $K$) does not significantly reduce $\sum_{i=1}^{N} e_i^2$.
Too much emphasis is often placed on $R^2$ – other aspects of the
fitted model are equally (if not more) important!
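A one-line sketch of the shortcut formula in R; the inputs are the values from the wage regression on the next slide ($N = 526$, $K = 2$, $R^2 = 0.1858$):
adj_r2 <- function(r2, N, K) 1 - (1 - r2) * (N - 1) / (N - K)
adj_r2(0.1858, 526, 2)    # approximately 0.184, in line with the output below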
Example
Returning to the R output for a regression of individuals’ wages
on years of education from last week:
> fit1 <- lm(lwage~educ, data=wage1)
> summary(fit1)
Call:
lm(formula = lwage ~ educ, data = wage1)
Residuals:
Min 1Q Median 3Q Max
-2.21158 -0.36393 -0.07263 0.29712 1.52339
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.583773 0.097336 5.998 3.74e-09 ***
educ 0.082744 0.007567 10.935 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4801 on 524 degrees of freedom
Multiple R-squared: 0.1858,Adjusted R-squared: 0.1843
F-statistic: 119.6 on 1 and 524 DF, p-value: < 2.2e-16
Here, $R^2 = 0.1858$ and $\bar{R}^2 = 0.1843$, i.e. around 18.5% of the
variation in (log) wages can be attributed to education.
Normality of b
Recall that, under the Gauss-Markov conditions (A1)–(A4) and normality (A5), we have (conditional on $X$)
$$b \sim N[\beta, \sigma^2(X'X)^{-1}].$$
For each element of $b$, $b_k$, we have
$$b_k \sim N[\beta_k, \sigma^2 c_{kk}], \qquad k = 1,\ldots,K, \qquad\qquad (1)$$
where $c_{kk}$ is the $(k,k)$ diagonal element of $(X'X)^{-1}$.
If $\sigma^2$ were known then
$$z_k = \frac{b_k - \beta_k}{\sigma\sqrt{c_{kk}}} \sim N(0,1),$$
which could be used to test hypotheses because the standard normal, or $N(0,1)$, distribution is well tabulated.
Hypothesis test
So to test
$$H_0: \beta_k = \beta_k^0 \ \text{(null hypothesis)} \quad \text{against} \quad H_1: \beta_k \neq \beta_k^0 \ \text{(alternative hypothesis)}$$
we would use the standardised statistic
$$z_k^0 = \frac{b_k - \beta_k^0}{\sigma\sqrt{c_{kk}}} \sim N(0,1) \ \text{under } H_0.$$
Let $\bar{z}$ denote the critical value (more on this shortly) for the $N(0,1)$.
Decision rule: if $|z_k^0| \geq \bar{z}$, reject $H_0$ in favour of $H_1$;
if $|z_k^0| < \bar{z}$, do not reject $H_0$.
Using $s^2$ for $\sigma^2$
But $\sigma^2$ is not known!
So we use $s^2$ in its place, but this changes the distribution.
It can be shown that $s^2$ is independent of $b$ and that
$$\xi = (N-K)\,\frac{s^2}{\sigma^2} \sim \chi^2_{N-K}.$$
Using (S21) we find that
$$t_k = \frac{z_k^0}{\sqrt{\xi/(N-K)}} = \frac{b_k - \beta_k^0}{s\sqrt{c_{kk}}} \sim t_{N-K} \ \text{under } H_0.$$
The $t_{N-K}$ distribution has fatter tails than the $N(0,1)$.
This is due to the imprecision of using $s^2$ rather than $\sigma^2$.
As $N - K \to \infty$, we have $t_{N-K} \to N(0,1)$ in distribution.
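This convergence can be checked numerically in R by comparing quantiles:
qt(0.975, df = c(5, 30, 120, 1000))   # upper 2.5% points of t distributions
qnorm(0.975)                          # upper 2.5% point of N(0,1), approx 1.96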
Significance level
We need to choose a significance level for the test – let's conduct the test at the 5% level of significance.
We can find a number, $t_{N-K;0.025}$ ($= t_c$), such that 5% of the $t_{N-K}$ distribution lies outside the interval $[-t_c, t_c]$.
[Figure: t-distribution density; the lower 2.5% and upper 2.5% tails lie beyond $-t_c$ and $+t_c$.]
Decision rule
If $H_0$ is really true, there is only a 5% chance of obtaining a $t_k$ value outside the range $[-t_c, t_c]$.
If this occurs, we regard it as evidence against $H_0$.
The decision rule is:
• if $|t_k| \geq t_c$, reject $H_0$ in favour of $H_1$;
• if $|t_k| < t_c$, do not reject $H_0$.
The interval $[-t_c, t_c]$ is, therefore, the non-rejection region.
The area outside this interval is the rejection region.
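In R the critical value $t_c$ is a single quantile call; for example, with $N - K = 524$ degrees of freedom (as in the wage regression):
tc <- qt(0.975, df = 524)   # two-sided 5% critical value, approx 1.96
tc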
Tests of significance
A common hypothesis to test is
$$H_0: \beta_k = 0 \quad \text{against} \quad H_1: \beta_k \neq 0.$$
Under $H_0$ the variable $x_k$ does not affect $y$ in the regression model, so this is a test of the significance of $x_k$ in the regression determining $y$.
The test statistic is
$$t_k = \frac{b_k}{s\sqrt{c_{kk}}} = \frac{\text{estimate}}{\text{standard error}},$$
which is routinely computed in regression software (e.g. R, Gretl, Stata etc.).
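These t-ratios can be reproduced by hand in R; a sketch assuming fit1 is the wage regression estimated earlier:
ctab <- coef(summary(fit1))                  # columns: Estimate, Std. Error, t value, Pr(>|t|)
ctab[, "Estimate"] / ctab[, "Std. Error"]    # reproduces the "t value" column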
p-values
The critical values for the t-test, −tc and tc, can be obtained
from statistical tables.
But most econometric software enables us to sidestep this
process by reporting probability values, or p-values.
The p-value for a two-sided t-test is given by
$$p = P\{t_{N-K} < -|t_k|\} + P\{t_{N-K} > |t_k|\}.$$
This is the proportion of the $t_{N-K}$ distribution in the two tails, below $-|t_k|$ and above $|t_k|$.
A p-value less than 0.05 implies significance at the 5% level, a
p-value less than 0.01 implies significance at the 1% level etc.
Often, p-values are reported for other test statistics too.
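A sketch of the p-value calculation in R, again assuming fit1 is the earlier wage regression:
tk <- coef(summary(fit1))["educ", "t value"]    # t statistic for educ
2 * pt(-abs(tk), df = df.residual(fit1))        # two-sided p-value, matches Pr(>|t|)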
Example
Returning again to the R output for a regression of
individuals' wages on years of education:
> fit1 <- lm(lwage~educ, data=wage1)
> summary(fit1)
Call:
lm(formula = lwage ~ educ, data = wage1)
Residuals:
Min 1Q Median 3Q Max
-2.21158 -0.36393 -0.07263 0.29712 1.52339
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.583773 0.097336 5.998 3.74e-09 ***
educ 0.082744 0.007567 10.935 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4801 on 524 degrees of freedom
Multiple R-squared: 0.1858,Adjusted R-squared: 0.1843
F-statistic: 119.6 on 1 and 524 DF, p-value: < 2.2e-16
Here, $t_2 = 10.935$ with, effectively, a p-value of 0 (in the column
Pr(>|t|) it is given as < 2e-16, i.e. $< 2 \times 10^{-16}$), suggesting
that education is a highly significant determinant of wages.
Comparing models
The optimal properties of OLS rest on the model being correctly
specified.
It is therefore important to test:
(a) whether important regressors have been omitted;
(b) whether unimportant regressors have been included.
Consider the two models
$$y = X_1\beta_1 + \epsilon, \qquad\qquad (2)$$
$$y = X_1\beta_1 + X_2\beta_2 + \epsilon, \qquad\qquad (3)$$
where $y$ and $\epsilon$ are $N \times 1$, $X_1$ is $N \times (K-J)$, $X_2$ is $N \times J$, $\beta_1$ is $(K-J) \times 1$ and $\beta_2$ is $J \times 1$.
Model (2) is obtained from (3) by setting the $J$ elements of the vector $\beta_2$ equal to zero.
Zero restrictions
This suggests testing the hypothesis:
$H_0: \beta_2 = 0$ ($J$ restrictions) against $H_1: \beta_2 \neq 0$.
This involves a test of more than one restriction so we can’t use
the simple t-test.
We can test the restrictions individually but this says nothing
about their joint significance.
We can write (3) as
$$y = [X_1 : X_2]\begin{pmatrix}\beta_1 \\ \beta_2\end{pmatrix} + \epsilon = X\beta + \epsilon \qquad\qquad (4)$$
and so the restrictions in $H_0$ are:
$$[0 : I_J]\begin{pmatrix}\beta_1 \\ \beta_2\end{pmatrix} = 0 \quad \text{or} \quad R\beta = 0.$$
General linear restrictions
More generally consider Model (4), y = Xβ + ϵ, and suppose we
want to test the set of J linear restrictions
$H_0: R\beta = q$ against $H_1: R\beta \neq q$,
where $R$ is $J \times K$ (with full row rank $J$) and $q$ is $J \times 1$.
For example, suppose $K = 3$ and we wish to test whether
$\beta_1 + \beta_2 = 0$ and $\beta_1 - 2\beta_2 + \beta_3 = 1$.
Here there are 2 restrictions ($J = 2$) and we have
$$\underbrace{\begin{pmatrix}1 & 1 & 0 \\ 1 & -2 & 1\end{pmatrix}}_{R}\,\underbrace{\begin{pmatrix}\beta_1 \\ \beta_2 \\ \beta_3\end{pmatrix}}_{\beta} = \underbrace{\begin{pmatrix}0 \\ 1\end{pmatrix}}_{q}.$$
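In R these restriction matrices could be set up as follows (a sketch):
R <- rbind(c(1,  1, 0),    # beta1 + beta2 = 0
           c(1, -2, 1))    # beta1 - 2*beta2 + beta3 = 1
q <- c(0, 1)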
There are two main methods of testing H0 :
Method 1
Method 1 uses the unrestricted estimator $b = (X'X)^{-1}X'y$.
We know that (under normality, and conditioning on $X$)
$$b \sim N[\beta, \sigma^2(X'X)^{-1}]$$
$$\Rightarrow\ Rb \sim N[R\beta, \sigma^2 R(X'X)^{-1}R'] \qquad \text{(S18)}$$
$$\Rightarrow\ Rb - q \sim N[R\beta - q, \sigma^2 R(X'X)^{-1}R'].$$
Under $H_0$ we know that $R\beta - q = 0$ and hence
$$Rb - q \sim N[0, \sigma^2 R(X'X)^{-1}R'] \ \text{under } H_0.$$
We will test whether Rb − q is significantly different from 0.
Two $\chi^2$ distributions
We know, from (S20), that if $Y \sim N(\mu, \Sigma)$ ($J \times 1$), then
$$(Y - \mu)'\Sigma^{-1}(Y - \mu) \sim \chi^2_J.$$
We can apply this result with $Y = Rb - q$, $\mu = 0$ and $\Sigma = \sigma^2 R(X'X)^{-1}R'$.
We find that
$$\xi_1 = (Rb - q)'[\sigma^2 R(X'X)^{-1}R']^{-1}(Rb - q) \sim \chi^2_J.$$
But $\sigma^2$ is unknown and so this distribution can't be used.
However, we know from our derivation of $t_{N-K}$ that
$$\xi = (N-K)\,\frac{s^2}{\sigma^2} \sim \chi^2_{N-K}.$$
We therefore have two random variables, $\xi_1$ and $\xi$, each with a $\chi^2$ distribution.
An F-statistic
We also know, from (S22), that if $\xi_1 \sim \chi^2_{J_1}$ and $\xi_2 \sim \chi^2_{J_2}$ are independent, then
$$\frac{\xi_1/J_1}{\xi_2/J_2} \sim F_{J_1,J_2}.$$
Using this result we find that
$$F = \frac{\xi_1/J}{\xi/(N-K)} \sim F_{J,N-K} \ \text{under } H_0.$$
Written more fully,
$$F = \frac{(Rb - q)'[\sigma^2 R(X'X)^{-1}R']^{-1}(Rb - q)/J}{(N-K)s^2/\left(\sigma^2(N-K)\right)} = \frac{(Rb - q)'[R(X'X)^{-1}R']^{-1}(Rb - q)}{s^2}\cdot\frac{1}{J}.$$
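A minimal sketch of Method 1 in R, using simulated data and the two restrictions from the earlier example purely to illustrate the formula:
set.seed(7)
N <- 200
X <- cbind(1, rnorm(N), rnorm(N))                    # K = 3: intercept plus two regressors
y <- X %*% c(0.5, -0.5, 1) + rnorm(N)
R <- rbind(c(1, 1, 0), c(1, -2, 1)); q <- c(0, 1)    # J = 2 restrictions, as above
K <- ncol(X); J <- nrow(R)
b  <- solve(t(X) %*% X, t(X) %*% y)                  # unrestricted OLS estimator
s2 <- sum((y - X %*% b)^2) / (N - K)                 # s^2
d  <- R %*% b - q                                    # Rb - q
Fstat <- drop(t(d) %*% solve(R %*% solve(t(X) %*% X) %*% t(R), d)) / (J * s2)
c(F = Fstat, p.value = pf(Fstat, J, N - K, lower.tail = FALSE))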
Decision rule
Let $F^{J}_{N-K;0.05}$ ($= F_c$) denote the 5% critical value from the $F_{J,N-K}$ distribution.
The decision rule is:
• if $F \geq F_c$, reject $H_0$ in favour of $H_1$;
• if $F < F_c$, do not reject $H_0$.
[Figure: F-distribution density; the upper 5% tail beyond $F_c$ is the rejection region.]
Method 2
Method 2 proceeds in 4 steps and involves estimation with and
without the restrictions imposed:
1. Estimate the unrestricted model, $y = X\beta + \epsilon$, and obtain $S_1 = e'e$ (the sum of squared residuals).
2. Impose the restrictions, estimate the restricted model, and obtain $S_0 = e_0'e_0$, where $e_0$ is the $N \times 1$ vector of residuals from the restricted model.
3. Compute
$$F = \frac{S_0 - S_1}{S_1}\cdot\frac{N-K}{J} \sim F_{J,N-K} \ \text{under } H_0.$$
4. Apply the decision rule as in Method 1.
Points to note
Some points to note:
1. F > 0.
2. J = number of restrictions: degrees of freedom for
numerator;
N − K = degrees of freedom for denominator;
K = number of regressors in unrestricted model.
3. Example of imposing restrictions: consider the model
$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i, \qquad H_0: \beta_2 + \beta_3 = 1$$
$$\Rightarrow\ \beta_3 = 1 - \beta_2$$
$$\Rightarrow\ y_i = \beta_1 + \beta_2 x_{i2} + (1 - \beta_2) x_{i3} + \epsilon_i$$
$$\Rightarrow\ (y_i - x_{i3}) = \beta_1 + \beta_2 (x_{i2} - x_{i3}) + \epsilon_i,$$
i.e. regress $y_i^0 = y_i - x_{i3}$ on an intercept and $x_{i2}^0 = x_{i2} - x_{i3}$.
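In R this restricted model can be estimated directly by regressing the transformed variables; a sketch with hypothetical variable names y, x2, x3 in a data frame dat:
fit_r <- lm(I(y - x3) ~ I(x2 - x3), data = dat)   # imposes beta2 + beta3 = 1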
Example
Continuing with the wage regression, consider the model
$$\log(wage) = \beta_1 + \beta_2\, educ + \beta_3\, exper + \beta_4\, exper^2 + \epsilon,$$
where $wage$ denotes the wage in dollars, $educ$ years of education, and $exper$ years of experience.
The regression we have seen so far corresponds to the two restrictions $\beta_3 = \beta_4 = 0$ ($J = 2$) so that the restricted regression is
$$\log(wage) = \beta_1 + \beta_2\, educ + \epsilon.$$
Let’s use method 2 to test the two restrictions.
Unrestricted regression
The unrestricted regression is:
> fitu <- lm(lwage~educ+exper+expersq, data=WAGE1)
> summary(fitu)
Call:
lm(formula = lwage ~ educ + exper + expersq, data = WAGE1)
Residuals:
Min 1Q Median 3Q Max
-1.96387 -0.29375 -0.04009 0.29497 1.30216
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1279975 0.1059323 1.208 0.227
educ 0.0903658 0.0074680 12.100 < 2e-16 ***
exper 0.0410089 0.0051965 7.892 1.77e-14 ***
expersq -0.0007136 0.0001158 -6.164 1.42e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4459 on 522 degrees of freedom
Multiple R-squared: 0.3003,Adjusted R-squared: 0.2963
F-statistic: 74.67 on 3 and 522 DF, p-value: < 2.2e-16
Here we see that both $exper$ and $exper^2$ are individually
significant, but we are not provided with the sum of squared
residuals that we need.
The sum of squared residuals
There are two ways in which we can obtain S1 in this example.
The first is:
> deviance(fitu)
[1] 103.7904
The second is:
> sum(resid(fitu)^2)
[1] 103.7904
Fortunately(!), both give the same answer: S1 = 103.7904.
We have already seen the restricted regression (lwage on educ
alone); storing it as fitr, we obtain S0 using
> deviance(fitr)
[1] 120.7691
Hence S0 = 120.7691.
F-statistic
We therefore have:
$S_0 = 120.7691$, $S_1 = 103.7904$, $N = 526$ and $K = 4$.
The statistic of interest is
$$F = \frac{120.7691 - 103.7904}{103.7904}\times\frac{526 - 4}{2} = 42.6961.$$
As $F^{2}_{522;0.05} \approx 3$ we clearly reject the null that $\beta_3 = \beta_4 = 0$ at the 5% level of significance.
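The same arithmetic, together with the 5% critical value and p-value, in R:
S0 <- 120.7691; S1 <- 103.7904; N <- 526; K <- 4; J <- 2
Fstat <- (S0 - S1) / S1 * (N - K) / J        # approx 42.70
qf(0.95, df1 = J, df2 = N - K)               # 5% critical value, approx 3.01
pf(Fstat, J, N - K, lower.tail = FALSE)      # p-value, essentially zero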
An easier approach
Note that we could, after the unrestricted regression, use the
command:
> library(car)
Loading required package: carData
> linearHypothesis(fitu, c("exper = 0", "expersq = 0"))
Linear hypothesis test
Hypothesis:
exper = 0
expersq = 0
Model 1: restricted model
Model 2: lwage ~ educ + exper + expersq
Res.Df RSS Df Sum of Sq F Pr(>F)
1 524 120.77
2 522 103.79 2 16.979 42.696 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This gives the same statistic and is much easier!
It also provides the p-value which can be useful in less clear-cut
cases.
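Since the restricted and unrestricted models are nested, the same F test can also be obtained from R's built-in comparison of the two fitted models (assuming fitr and fitu are the fits above):
anova(fitr, fitu)    # reports the same F statistic and p-value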
Summary
• t-tests for tests involving a single parameter
• F-tests for tests of sets of linear restrictions involving more
than one parameter
• Next week:
• comparing regression models
• limited dependent variables