
ECONOMETRICS

MULTIPLE REGRESSION
WEEK3

FALL 2024

Prof. Dr. Burç Ülengin


MULTIPLE REGRESSION
y = b0 + b1x1 + b2x2 + . . . bkxk + u

ESTIMATION
While sea waves might look like an almost
random movement, in every moment and location
the basic laws of hydrodynamics and gravity
hold without change. 3
PARALLELS WITH SIMPLE
REGRESSION
y = b0 + b1x1 + b2x2 + . . . bkxk + u
➢ b0 is still the intercept
➢ b1 to bk are all called slope parameters
➢ u is still the error term (or disturbance)
➢ Still need to make a zero conditional mean
assumption, so now assume that
E(u|x1,x2, …,xk) = 0
➢ Still minimizing the sum of squared
residuals. 4
INTERPRETING MULTIPLE
REGRESSION COEFFICIENTS

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \ldots + \hat{\beta}_k x_k$, so

$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2 + \ldots + \hat{\beta}_k \Delta x_k$,

so holding $x_2, \ldots, x_k$ fixed implies that $\Delta\hat{y} = \hat{\beta}_1 \Delta x_1$; that is, each $\hat{\beta}_j$ has
a ceteris paribus interpretation.

If $x_j$ increases by 1 unit, $\hat{y}$ changes by $\hat{\beta}_j$ units, holding the other regressors fixed.


5
MULTIPLE REGRESSION WITH TWO EXPLANATORY
VARIABLES: DERIVING COEFFICIENTS

$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$

$\hat{y}_i = a + b_1 x_{1i} + b_2 x_{2i}$

$e_i = y_i - \hat{y}_i = y_i - a - b_1 x_{1i} - b_2 x_{2i}$

The regression coefficients are derived using the same least


squares principle used in simple regression analysis.
The fitted value of y in observation i depends on our choice of
a, b1 and b2.
The residual ei in observation i is the difference between the
actual and fitted values of y.
6
MULTIPLE REGRESSION WITH TWO EXPLANATORY
VARIABLES: DERIVING COEFFICIENTS
We define S, the sum of the squares of the residuals, and choose a, b1, and b2
so as to minimize it.
$S = \sum e_i^2 = \sum (y_i - a - b_1 x_{1i} - b_2 x_{2i})^2$

$= \sum \left( y_i^2 + a^2 + b_1^2 x_{1i}^2 + b_2^2 x_{2i}^2 - 2 a y_i - 2 b_1 x_{1i} y_i - 2 b_2 x_{2i} y_i + 2 a b_1 x_{1i} + 2 a b_2 x_{2i} + 2 b_1 b_2 x_{1i} x_{2i} \right)$

$= \sum y_i^2 + n a^2 + b_1^2 \sum x_{1i}^2 + b_2^2 \sum x_{2i}^2 - 2 a \sum y_i - 2 b_1 \sum x_{1i} y_i - 2 b_2 \sum x_{2i} y_i + 2 a b_1 \sum x_{1i} + 2 a b_2 \sum x_{2i} + 2 b_1 b_2 \sum x_{1i} x_{2i}$

$\dfrac{\partial S}{\partial a} = 0 \qquad \dfrac{\partial S}{\partial b_1} = 0 \qquad \dfrac{\partial S}{\partial b_2} = 0$
First we expand S as shown, and then we use the first order conditions for
minimizing it. 7
MULTIPLE REGRESSION WITH TWO EXPLANATORY
VARIABLES: DERIVING COEFFICIENTS

$a = \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2$

$b_1 = \dfrac{\mathrm{Cov}(x_1, y)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_2, y)\,\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_1, x_2)^2}$

$b_2 = \dfrac{\mathrm{Cov}(x_2, y)\,\mathrm{Var}(x_1) - \mathrm{Cov}(x_1, y)\,\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_1, x_2)^2}$

We thus obtain three equations in three unknowns. Solving for a, b1, and b2,
we obtain the expressions shown above.
The expression for a is a straightforward extension of the expression for it in
simple regression analysis.
However, the expressions for the slope coefficients are considerably more
complex than that for the slope coefficient in simple regression analysis. 8
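As a numerical check, here is a minimal sketch of these formulas in Python/NumPy (not the Stata used later in these slides), with made-up data; the names and values are purely illustrative.

import numpy as np

# Minimal sketch with made-up data: compute a, b1, b2 from the
# covariance formulas above and check against a direct least-squares fit.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

cov = lambda u, v: np.cov(u, v)[0, 1]       # sample covariance
var = lambda u: np.var(u, ddof=1)           # sample variance

den = var(x1) * var(x2) - cov(x1, x2) ** 2
b1 = (cov(x1, y) * var(x2) - cov(x2, y) * cov(x1, x2)) / den
b2 = (cov(x2, y) * var(x1) - cov(x1, y) * cov(x1, x2)) / den
a = y.mean() - b1 * x1.mean() - b2 * x2.mean()

# Check: the same coefficients from a direct OLS fit of y on [1, x1, x2]
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(a, b1, b2)       # formula-based estimates
print(coef)            # [a, b1, b2] from lstsq -- should match

The two sets of estimates coincide because the covariance expressions are exactly the least squares solution for the two-regressor case.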
MULTIPLE REGRESSION WITH TWO EXPLANATORY
VARIABLES: DERIVING COEFFICIENTS

$a = \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2$

$b_1 = \dfrac{\mathrm{Cov}(x_1, y)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_2, y)\,\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_1, x_2)^2}$

$b_2 = \dfrac{\mathrm{Cov}(x_2, y)\,\mathrm{Var}(x_1) - \mathrm{Cov}(x_1, y)\,\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_1, x_2)^2}$

For the general case when there are many explanatory


variables, ordinary algebra is inadequate.
It is necessary to switch to matrix algebra.
9
OLS ESTIMATION
MATRIX APPROACH
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + u$

$Y = X\beta + u$

$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \quad
X = \begin{bmatrix} 1 & x_{11} & x_{21} & \ldots & x_{k1} \\ 1 & x_{12} & x_{22} & \ldots & x_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \ldots & x_{kn} \end{bmatrix} \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} \quad
u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$

OLS ESTIMATES

$\hat{\beta} = (X'X)^{-1} X'Y \qquad \mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$

10
OLS ESTIMATION
MATRIX APPROACH
$X'X = \begin{bmatrix}
n & \sum x_1 & \sum x_2 & \ldots & \sum x_k \\
\sum x_1 & \sum x_1^2 & \sum x_1 x_2 & \ldots & \sum x_1 x_k \\
\sum x_2 & \sum x_1 x_2 & \sum x_2^2 & \ldots & \sum x_2 x_k \\
\vdots & \vdots & \vdots & & \vdots \\
\sum x_k & \sum x_k x_1 & \sum x_2 x_k & \ldots & \sum x_k^2
\end{bmatrix} \qquad
X'Y = \begin{bmatrix} \sum y \\ \sum x_1 y \\ \sum x_2 y \\ \vdots \\ \sum x_k y \end{bmatrix}$

OLS ESTIMATES

$\hat{\beta} = (X'X)^{-1} X'Y \qquad \mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$

11
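A minimal sketch of the matrix formulas in Python/NumPy (illustrative only, made-up data; not part of the original slides):

import numpy as np

# Sketch: OLS via the matrix formulas above, using made-up data.
rng = np.random.default_rng(1)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # [1, x1, x2]
beta_true = np.array([1.0, 0.5, -0.3])
y = X @ beta_true + rng.normal(scale=2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                  # (X'X)^{-1} X'Y

resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k - 1)      # unbiased estimate of sigma^2
var_beta_hat = sigma2_hat * XtX_inv           # estimated Var(beta_hat)
se = np.sqrt(np.diag(var_beta_hat))           # standard errors

print(beta_hat)
print(se)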
MULTIPLE REGRESSION WITH TWO
EXPLANATORY VARIABLES: EXAMPLE
Hourly earnings, EARNINGS, depend on highest grade completed,
HGC, and a measure of ability, ASVABC.
Here is the regression output for the earnings function using the EAEF data set.
. reg earnings hgc asvabc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 39.98
Model | 4745.74965 2 2372.87483 Prob > F = 0.0000
Residual | 33651.2874 567 59.3497133 R-squared = 0.1236
---------+------------------------------ Adj R-squared = 0.1205
Total | 38397.0371 569 67.4816117 Root MSE = 7.7039

------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .7390366 .1606216 4.601 0.000 .4235506 1.054523
asvabc | .1545341 .0429486 3.598 0.000 .0701764 .2388918
_cons | -4.624749 2.0132 -2.297 0.022 -8.578989 -.6705095
------------------------------------------------------------------------------

$\widehat{EARNINGS} = -4.62 + 0.74\,HGC + 0.15\,ASVABC$
It indicates that earnings increase by $0.74 for every extra year of
schooling and by $0.15 for every extra point increase in ASVABC. 12
MULTIPLE REGRESSION WITH TWO
EXPLANATORY VARIABLES: EXAMPLE
. reg earnings hgc asvabc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 39.98
Model | 4745.74965 2 2372.87483 Prob > F = 0.0000
Residual | 33651.2874 567 59.3497133 R-squared = 0.1236
---------+------------------------------ Adj R-squared = 0.1205
Total | 38397.0371 569 67.4816117 Root MSE = 7.7039

------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .7390366 .1606216 4.601 0.000 .4235506 1.054523
asvabc | .1545341 .0429486 3.598 0.000 .0701764 .2388918
_cons | -4.624749 2.0132 -2.297 0.022 -8.578989 -.6705095
------------------------------------------------------------------------------

$\widehat{EARNINGS} = -4.62 + 0.74\,HGC + 0.15\,ASVABC$
Literally, the intercept indicates that an individual who had no schooling and an
ASVABC score of zero would have hourly earnings of -$4.62.
Obviously, this is impossible. The lowest value of HGC in the sample was 6, and
the lowest ASVABC score was 22. We have obtained a nonsense estimate
because we have extrapolated too far from the data range. 13
A “PARTIALLING OUT”
INTERPRETATION
Consider the case where 𝑘 = 2, i.e.

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$, then

$\hat{\beta}_1 = \sum \hat{r}_{i1} y_i \Big/ \sum \hat{r}_{i1}^2$, where the $\hat{r}_{i1}$ are

the residuals from the estimated

regression $\hat{x}_1 = \hat{\gamma}_0 + \hat{\gamma}_2 x_2$


14
“PARTIALLING OUT” continued

➢ Previous equation implies that regressing y


on x1 and x2 gives same effect of x1 as
regressing y on residuals from a regression
of x1 on x2
➢ This means only the part of xi1 that is
uncorrelated with xi2 is being related to yi
so we’re estimating the effect of x1 on y
after x2 has been “partialled out”

15
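The claim can be verified numerically. Here is a minimal Python/NumPy sketch (made-up data, rather than the EAEF sample used in the Stata illustration that follows):

import numpy as np

# Sketch: the coefficient on x1 in a multiple regression of y on (1, x1, x2)
# equals the coefficient from regressing y on the residuals of x1 after x2.
rng = np.random.default_rng(2)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)

def ols(X, y):
    """Return OLS coefficients for y on X (X already includes a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Full multiple regression: y on (1, x1, x2)
b_full = ols(np.column_stack([ones, x1, x2]), y)

# Partialling out: residuals of x1 from a regression of x1 on (1, x2)
g = ols(np.column_stack([ones, x2]), x1)
r1 = x1 - (g[0] + g[1] * x2)

# Simple regression of y on the residuals r1 (no intercept needed: r1 has mean 0)
b_partial = (r1 @ y) / (r1 @ r1)

print(b_full[1], b_partial)   # the two estimates of the x1 effect coincide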
GRAPHING A RELATIONSHIP IN A MULTIPLE
REGRESSION MODEL
[Scatter plot: Hourly earnings ($) against Highest grade completed, EAEF sample of 570 observations]

. cor hgc asvabc
(obs=570)

| hgc asvabc
--------+------------------
hgc| 1.0000
asvabc| 0.5779 1.0000

Suppose that you were particularly interested in the relationship between EARNINGS
and HGC and wished to represent it graphically, using the sample data.
A simple plot, like the one above, would be misleading.
There appears to be a strong positive relationship, but it is distorted by the fact that
HGC is positively correlated with ASVABC, which also has a positive effect on
EARNINGS. 16
GRAPHING A RELATIONSHIP IN A
MULTIPLE REGRESSION MODEL
[Scatter plot: Hourly earnings ($) against Highest grade completed, EAEF sample of 570 observations]

. cor hgc asvabc
(obs=570)

| hgc asvabc
--------+------------------
hgc| 1.0000
asvabc| 0.5779 1.0000

We will investigate the distortion mathematically when we come


to omitted variable bias.
17
GRAPHING A RELATIONSHIP IN A MULTIPLE
REGRESSION MODEL
. reg earnings asvabc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 56.78
Model | 3489.30726 1 3489.30726 Prob > F = 0.0000
Residual | 34907.7298 568 61.4572708 R-squared = 0.0909
---------+------------------------------ Adj R-squared = 0.0893
Total | 38397.0371 569 67.4816117 Root MSE = 7.8395

------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .2687432 .035666 7.535 0.000 .1986898 .3387966
_cons | -.359883 1.818571 -0.198 0.843 -3.931829 3.212063
------------------------------------------------------------------------------

. predict eearn, resid

To eliminate the distortion, you purge both EARNINGS and HGC of their
components related to ASVABC and then draw a scatter diagram using the
purged variables.
We start by regressing EARNINGS on ASVABC, as shown above. The residuals
are the part of EARNINGS which is not related to ASVABC. The "predict"
command is the Stata command for saving the residuals from the most recent
regression. We name them EEARN. 18
GRAPHING A RELATIONSHIP IN A MULTIPLE
REGRESSION MODEL
. reg hgc asvabc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 284.89
Model | 1153.80864 1 1153.80864 Prob > F = 0.0000
Residual | 2300.43873 568 4.05006818 R-squared = 0.3340
---------+------------------------------ Adj R-squared = 0.3329
Total | 3454.24737 569 6.07073351 Root MSE = 2.0125

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1545378 .0091559 16.879 0.000 .1365543 .1725213
_cons | 5.770845 .4668473 12.361 0.000 4.853888 6.687803
------------------------------------------------------------------------------

. predict ehgc, resid

We do the same with HGC. We regress it on ASVABC and save


the residuals as EHGC.
19
GRAPHING A RELATIONSHIP IN A MULTIPLE
REGRESSION MODEL
[Scatter plot: EEARN (EARNINGS residuals) against EHGC (HGC residuals), with the fitted trend line shown in black]

Now we plot EEARN on EHGC and the scatter is a faithful


representation of the relationship, both in terms of the slope of the
trend line (the black line) and in terms of the variation about that line. 20
GRAPHING A RELATIONSHIP IN A MULTIPLE
REGRESSION MODEL
[Scatter plot: EEARN (EARNINGS residuals) against EHGC (HGC residuals), showing the new trend line (black) together with the trend line from the uncontrolled scatter (gray)]

As you would expect, the trend line is flatter than that in the scatter diagram
which did not control for ASVABC (reproduced here as the gray line). 21
GRAPHING A RELATIONSHIP IN A MULTIPLE
REGRESSION MODEL
Here is the regression of EEARN on EHGC.
. reg eearn ehgc
Source | SS df MS Number of obs = 570
---------+------------------------------ F( 1, 568) = 21.21
Model | 1256.44239 1 1256.44239 Prob > F = 0.0000
Residual | 33651.2873 568 59.2452241 R-squared = 0.0360
---------+------------------------------ Adj R-squared = 0.0343
Total | 34907.7297 569 61.3492613 Root MSE = 7.6971
------------------------------------------------------------------------------
eearn | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
ehgc | .7390366 .1604802 4.605 0.000 .4238296 1.054244
_cons | -5.99e-09 .3223957 0.000 1.000 -.6332333 .6332333
------------------------------------------------------------------------------
From multiple regression:
------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .7390366 .1606216 4.601 0.000 .4235506 1.054523
asvabc | .1545341 .0429486 3.598 0.000 .0701764 .2388918
_cons | -4.624749 2.0132 -2.297 0.022 -8.578989 -.6705095
------------------------------------------------------------------------------

A mathematical proof that the technique works requires matrix algebra. We will content
ourselves by verifying that the estimate of the slope coefficient, and equally importantly,
its standard error and t statistic, are the same as in the multiple regression 22
SIMPLE VS MULTIPLE
REGRESSION ESTIMATE
Compare the simple regression
$\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1$
with the multiple regression
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$

Generally, $\tilde{\beta}_1 \neq \hat{\beta}_1$ unless:

$\hat{\beta}_2 = 0$ (i.e. no partial effect of $x_2$), OR

$x_1$ and $x_2$ are uncorrelated in the sample


23
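The gap between the two estimates has a simple form in the sample: the simple-regression slope equals the multiple-regression slope on x1 plus the coefficient on x2 times the slope from regressing x2 on x1. A minimal Python/NumPy sketch with made-up data (illustrative, not from the slides):

import numpy as np

# Sketch: the simple-regression slope equals the multiple-regression slope
# plus beta2_hat times the slope from regressing x2 on x1 (made-up data).
rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
b_simple = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)[0]      # y on x1
b_mult   = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)[0]  # y on x1, x2
delta    = np.linalg.lstsq(np.column_stack([ones, x1]), x2, rcond=None)[0]     # x2 on x1

print(b_simple[1])                       # tilde-beta1
print(b_mult[1] + b_mult[2] * delta[1])  # hat-beta1 + hat-beta2 * delta1 -- identical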
Multiple Regression Analysis
y = b0 + b1x1 + b2x2 + . . . bkxk + u

GOODNESS OF FIT
GOODNESS-OF-FIT
We can think of each observation as being made up of an
explained part and an unexplained part, $y_i = \hat{y}_i + \hat{u}_i$. We then define the following:

$\sum (y_i - \bar{y})^2$ is the total sum of squares (SST)

$\sum (\hat{y}_i - \bar{y})^2$ is the explained sum of squares (SSE)

$\sum \hat{u}_i^2$ is the residual sum of squares (SSR)

Then SST = SSE + SSR

25
GOODNESS-OF-FIT
How do we think about how well our
sample regression line fits our sample data?

Can compute the fraction of the total sum


of squares (SST) that is explained by the
model, call this the R-squared of regression

R2 = SSE/SST = 1 – SSR/SST

26
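A minimal Python/NumPy sketch of these quantities (made-up data, illustrative only):

import numpy as np

# Sketch: compute SST, SSE, SSR and R-squared for an OLS fit (made-up data).
rng = np.random.default_rng(4)
n = 250
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.0 * x1 - 2.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)
SSR = np.sum(u_hat ** 2)

print(SST, SSE + SSR)                    # SST = SSE + SSR (up to rounding)
print(SSE / SST, 1 - SSR / SST)          # two equivalent expressions for R-squared
print(np.corrcoef(y, y_hat)[0, 1] ** 2)  # also the squared correlation of y and y_hat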
GOODNESS-OF-FIT
We can also think of $R^2$ as being equal to
the squared correlation coefficient between
the actual $y_i$ and the fitted values $\hat{y}_i$:

$R^2 = \dfrac{\left[ \sum (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}}) \right]^2}{\sum (y_i - \bar{y})^2 \, \sum (\hat{y}_i - \bar{\hat{y}})^2}$

27
GOODNESS-OF-FIT: EXAMPLE
. reg earnings hgc asvabc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 39.98
Model | 4745.74965 2 2372.87483 Prob > F = 0.0000
Residual | 33651.2874 567 59.3497133 R-squared = 0.1236
---------+------------------------------ Adj R-squared = 0.1205
Total | 38397.0371 569 67.4816117 Root MSE = 7.7039

------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .7390366 .1606216 4.601 0.000 .4235506 1.054523
asvabc | .1545341 .0429486 3.598 0.000 .0701764 .2388918
_cons | -4.624749 2.0132 -2.297 0.022 -8.578989 -.6705095
------------------------------------------------------------------------------

The two explanatory variables, HGC and ASVABC,

explain 12.4% of the variation of EARNINGS.
28
MORE ABOUT R2

➢ R2 can never decrease when another


independent variable is added to a regression
and usually will increase

➢ Because R2 will usually increase with the
number of independent variables, it is not a
good way to compare models.

29
R2 COMPARISON OF DIFFERENT
REGRESSION MODELS
1. y = b0 + b1x1 + u                  R²₁
2. y = b0 + b1x1 + b2x2 + u           R²₂
3. y = b0 + b2x2 + u                  R²₃
4. ln(y) = b0 + b2x2 + u              R²₄
▪ For R² comparisons
  • the number of explanatory variables of the models and
  • the dependent variables of the models must be the same.
▪ Only R²₁ and R²₃ are comparable.
▪ Adjusted R², AIC, SC, or other criteria may be used to compare
  models 1, 2, and 3.
30
Multiple Regression Analysis
y = b0 + b1x1 + b2x2 + . . . bkxk + u

MODEL MISSPECIFICATION
ASSUMPTIONS FOR
UNBIASEDNESS
Population model is linear in parameters:
y = b0 + b1x1 + b2x2 +…+ bkxk + u
We can use a random sample of size n, {(xi1,
xi2,…, xik, yi): i=1, 2, …, n}, from the
population model, so that the sample model
is yi = b0 + b1xi1 + b2xi2 +…+ bkxik + ui
E(u|x1, x2,… xk) = 0, implying that all of the
explanatory variables are exogenous
None of the x’s is constant, and there are no
exact linear relationships among them. 32
TOO MANY OR TOO FEW VARIABLES

➢ What if we exclude a variable from our


specification that does belong?
➢ OLS will usually be biased.

➢ What happens if we include variables in


our specification that don’t belong?
➢ Our parameter estimates remain unbiased,
but OLS is no longer efficient.
33
UNBIASEDNESS AND EFFICIENCY

[Four scatter diagrams of repeated estimates around the true value, illustrating the
combinations: unbiased & efficient, unbiased & inefficient, biased & efficient,
biased & inefficient]
34
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE
Consequences of Variable Misspecification

                          True model:
Fitted model              y = α + β1x1 + u            y = α + β1x1 + β2x2 + u
ŷ = a + b1x1
ŷ = a + b1x1 + b2x2

To keep the analysis simple, we will assume that there are only two
possibilities. Either y depends only on x1, or it depends on both x1 and x2. 35
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE

Consequences of Variable Misspecification

                          True model:
Fitted model              y = α + β1x1 + u            y = α + β1x1 + β2x2 + u
ŷ = a + b1x1             Correct specification,      Coefficients are biased (in
                          no problems                 general). Standard errors
                                                      are invalid.
ŷ = a + b1x1 + b2x2      Coefficients are unbiased   Correct specification,
                          but inefficient             no problems
36
MULTIPLE REGRESSION
ANALYSIS
y = b0 + b1x1 + b2x2 + . . . bkxk + u

MODEL MISSPECIFICATION I:
OMITTED VARIABLE BIAS
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + u$

$\hat{y} = a + b_1 x_1$

$E(b_1) = \beta_1 + \beta_2 \dfrac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)}$

[Path diagram: the direct effect of x1 on y, holding x2 constant (b1); the effect of x2 on y (b2); and the apparent effect of x1 on y, acting as a mimic for x2]
In the present case, the omission of x2 causes b1 to be biased by an amount
b2 Cov(x1, x2)/Var(x1). We will demonstrate this first intuitively and then mathematically.
The intuitive reason is that, in addition to its direct effect b1, x1 has an apparent indirect effect
as a consequence of acting as a proxy for the missing x2. 38
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + u$

$\hat{y} = a + b_1 x_1$

$E(b_1) = \beta_1 + \beta_2 \dfrac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)}$

[Path diagram: the direct effect of x1 on y, holding x2 constant (b1); the effect of x2 on y (b2); and the apparent effect of x1 on y, acting as a mimic for x2]

The strength of the proxy effect depends on two factors: the strength of the effect of
x2 on y, which is given by b2, and the ability of x1 to mimic x2. 39
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + u$

$\hat{y} = a + b_1 x_1$

$E(b_1) = \beta_1 + \beta_2 \dfrac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)}$

[Path diagram: the direct effect of x1 on y, holding x2 constant (b1); the effect of x2 on y (b2); and the apparent effect of x1 on y, acting as a mimic for x2]

The ability of x1 to mimic x2 is determined by the slope coefficient obtained


when x2 is regressed on x1, which of course is Cov(x1, x2)/Var(x1). 40
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + u$

$\hat{y} = a + b_1 x_1$

$b_1 = \dfrac{\mathrm{Cov}(x_1, y)}{\mathrm{Var}(x_1)} = \beta_1 + \beta_2 \dfrac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)} + \dfrac{\mathrm{Cov}(x_1, u)}{\mathrm{Var}(x_1)}$

$E(b_1) = E\!\left[ \beta_1 + \beta_2 \dfrac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)} + \dfrac{\mathrm{Cov}(x_1, u)}{\mathrm{Var}(x_1)} \right]
= E(\beta_1) + E\!\left[ \beta_2 \dfrac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)} \right]
= \beta_1 + \beta_2 \dfrac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)}$

Thus b1 is biased by an amount b2 Cov(x1, x2)/Var(x1). As a consequence of


the misspecification, the standard errors, t tests and F test are invalid. 41
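A minimal Monte Carlo sketch of this bias in Python/NumPy (made-up design; the coefficient values are purely illustrative):

import numpy as np

# Sketch: omitted variable bias by simulation.
# True model: y = 1 + 2*x1 + 3*x2 + u, with Cov(x1, x2) > 0.
# Fitting y on x1 alone biases b1 upward by roughly beta2*Cov(x1, x2)/Var(x1).
rng = np.random.default_rng(5)
n, reps = 200, 2000
beta1, beta2 = 2.0, 3.0
b1_short = []

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)        # Cov(x1, x2) = 0.5 * Var(x1)
    y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])     # omit x2
    b1_short.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

print(np.mean(b1_short))            # close to 2 + 3*0.5 = 3.5, not 2
print(beta1 + beta2 * 0.5)          # the theoretical biased expectation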
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE : An Example

. reg hgc asvabc hgcm

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 156.81
Model | 1230.2039 2 615.101949 Prob > F = 0.0000
Residual | 2224.04347 567 3.92247526 R-squared = 0.3561
---------+------------------------------ Adj R-squared = 0.3539
Total | 3454.24737 569 6.07073351 Root MSE = 1.9805

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1381062 .0097494 14.166 0.000 .1189567 .1572556
hgcm | .154783 .0350728 4.413 0.000 .0858946 .2236715
_cons | 4.791277 .5102431 9.390 0.000 3.78908 5.793475
------------------------------------------------------------------------------

We will illustrate the bias using an educational attainment model. We


will assume that HGC depends on ASVABC and HGCM to keep the
analysis simple. The output above shows the corresponding
regression using the EAEF Data Set.
42
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE: An Example
. reg hgc asvabc hgcm

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 156.81
Model | 1230.2039 2 615.101949 Prob > F = 0.0000
Residual | 2224.04347 567 3.92247526 R-squared = 0.3561
---------+------------------------------ Adj R-squared = 0.3539
Total | 3454.24737 569 6.07073351 Root MSE = 1.9805

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1381062 .0097494 14.166 0.000 .1189567 .1572556
hgcm | .154783 .0350728 4.413 0.000 .0858946 .2236715
_cons | 4.791277 .5102431 9.390 0.000 3.78908 5.793475
------------------------------------------------------------------------------

HGC = α + β1 ASVABC + β2 HGCM + u

$E(b_1) = \beta_1 + \beta_2 \dfrac{\mathrm{Cov}(ASVABC, HGCM)}{\mathrm{Var}(ASVABC)}$

We will run the regression a second time, omitting HGCM. Before we


do this, we will predict the bias's direction in the ASVABC coefficient.
43
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE: An Example
. reg hgc asvabc hgcm

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 156.81
Model | 1230.2039 2 615.101949 Prob > F = 0.0000
Residual | 2224.04347 567 3.92247526 R-squared = 0.3561
---------+------------------------------ Adj R-squared = 0.3539
Total | 3454.24737 569 6.07073351 Root MSE = 1.9805

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1381062 .0097494 14.166 0.000 .1189567 .1572556
hgcm | .154783 .0350728 4.413 0.000 .0858946 .2236715
_cons | 4.791277 .5102431 9.390 0.000 3.78908 5.793475
------------------------------------------------------------------------------

HGC = α + β1 ASVABC + β2 HGCM + u

$E(b_1) = \beta_1 + \beta_2 \dfrac{\mathrm{Cov}(ASVABC, HGCM)}{\mathrm{Var}(ASVABC)}$

It is reasonable to suppose that b2 is positive. This assumption is strongly


supported by the fact that its estimate in the multiple regression is positive
and highly significant. 44
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE: An Example
. reg hgc asvabc hgcm

Source | SS df MS Number of obs = 570
---------+------------------------------ F( 2, 567) = 156.81
Model | 1230.2039 2 615.101949 Prob > F = 0.0000
Residual | 2224.04347 567 3.92247526 R-squared = 0.3561
---------+------------------------------ Adj R-squared = 0.3539
Total | 3454.24737 569 6.07073351 Root MSE = 1.9805

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1381062 .0097494 14.166 0.000 .1189567 .1572556
hgcm | .154783 .0350728 4.413 0.000 .0858946 .2236715
_cons | 4.791277 .5102431 9.390 0.000 3.78908 5.793475
------------------------------------------------------------------------------

. cor hgcm asvabc
(obs=570)

| hgcm asvabc
--------+------------------
hgcm| 1.0000
asvabc| 0.3819 1.0000

HGC = α + β1 ASVABC + β2 HGCM + u

$E(b_1) = \beta_1 + \beta_2 \dfrac{\mathrm{Cov}(ASVABC, HGCM)}{\mathrm{Var}(ASVABC)}$
The correlation between ASVABC and HGCM is positive, so their
covariance must be positive. Var(ASVABC) is automatically positive.
Hence, the bias should be positive. 45
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE: An Example
. reg hgc asvabc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 284.89
Model | 1153.80864 1 1153.80864 Prob > F = 0.0000
Residual | 2300.43873 568 4.05006818 R-squared = 0.3340
---------+------------------------------ Adj R-squared = 0.3329
Total | 3454.24737 569 6.07073351 Root MSE = 2.0125

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1545378 .0091559 16.879 0.000 .1365543 .1725213
_cons | 5.770845 .4668473 12.361 0.000 4.853888 6.687803
------------------------------------------------------------------------------

Here is the regression omitting HGCM.

46
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE: An Example
. reg hgc asvabc hgcm

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1381062 .0097494 14.166 0.000 .1189567 .1572556
hgcm | .154783 .0350728 4.413 0.000 .0858946 .2236715
_cons | 4.791277 .5102431 9.390 0.000 3.78908 5.793475
------------------------------------------------------------------------------

. reg hgc asvabc

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1545378 .0091559 16.879 0.000 .1365543 .1725213
_cons | 5.770845 .4668473 12.361 0.000 4.853888 6.687803
------------------------------------------------------------------------------

As you can see, the coefficient of ASVABC is indeed higher when


HGCM is omitted.
Part of the difference may be due to pure chance, but part is
attributable to the bias.
47
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE: An Example
. reg hgc hgcm

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 83.59
Model | 443.110436 1 443.110436 Prob > F = 0.0000
Residual | 3011.13693 568 5.30129742 R-squared = 0.1283
---------+------------------------------ Adj R-squared = 0.1267
Total | 3454.24737 569 6.07073351 Root MSE = 2.3025

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgcm | .3445198 .0376833 9.142 0.000 .2705041 .4185354
_cons | 9.506491 .4495754 21.145 0.000 8.623458 10.38952
------------------------------------------------------------------------------

HGC = α + β1 ASVABC + β2 HGCM + u

$E(b_2) = \beta_2 + \beta_1 \dfrac{\mathrm{Cov}(ASVABC, HGCM)}{\mathrm{Var}(HGCM)}$
Here is the regression omitting ASVABC instead of HGCM.

We would expect b2 to be upward biased. We anticipate that b1 is positive


and that both the covariance and variance terms in the bias expression are
positive. 48
VARIABLE MISSPECIFICATION I: OMISSION OF A
RELEVANT VARIABLE: An Example
. reg hgc asvabc hgcm

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1381062 .0097494 14.166 0.000 .1189567 .1572556
hgcm | .154783 .0350728 4.413 0.000 .0858946 .2236715
_cons | 4.791277 .5102431 9.390 0.000 3.78908 5.793475
------------------------------------------------------------------------------

. reg hgc hgcm

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgcm | .3445198 .0376833 9.142 0.000 .2705041 .4185354
_cons | 9.506491 .4495754 21.145 0.000 8.623458 10.38952
------------------------------------------------------------------------------

In this case, the bias is quite dramatic. The coefficient of HGCM has
more than doubled. (The reason for the bigger effect is that Var(HGCM)
is much smaller than Var(ASVABC), while b1 and b2 are similar in size,
judging by their estimates.)
49
SUMMARY OF DIRECTION OF BIAS

Corr(x1, x2) > 0 Corr(x1, x2) < 0

b2 > 0 Positive bias Negative bias

b2 < 0 Negative bias Positive bias

True model: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i$

Estimated model: $y_i = \tilde{\beta}_0 + \tilde{\beta}_1 x_{i1} + \tilde{u}_i$
50
OMITTED VARIABLE BIAS
SUMMARY
➢ Two cases where bias is equal to zero
◼ b2 = 0, that is x2 doesn’t really belong in
model
◼ x1 and x2 are uncorrelated in the sample

➢ If the correlations between x2 and x1 and between x2 and y have
the same sign, the bias will be positive
➢ If the correlations between x2 and x1 and between x2 and y have
opposite signs, the bias will be negative
51
MULTIPLE REGRESSION
ANALYSIS
y = b0 + b1x1 + b2x2 + . . . bkxk + u

MODEL MISSPECIFICATION II:


INCLUSION OF AN IRRELEVANT
VARIABLE
VARIABLE MISSPECIFICATION II: INCLUSION OF AN
IRRELEVANT VARIABLE
Consequences of Variable Misspecification

                          True model:
Fitted model              y = α + β1x1 + u            y = α + β1x1 + β2x2 + u
ŷ = a + b1x1             Correct specification,      Coefficients are biased (in
                          no problems                 general). Standard errors
                                                      are invalid.
ŷ = a + b1x1 + b2x2      Coefficients are unbiased   Correct specification,
                          (in general), but           no problems
                          inefficient. Standard
                          errors are valid (in
                          general).

The effects are different from those of omitted variable misspecification. In


this case the coefficients in general remain unbiased, but they are inefficient.
The standard errors remain valid, but are needlessly large. 53
VARIABLE MISSPECIFICATION II: INCLUSION OF AN
IRRELEVANT VARIABLE
$y = \alpha + \beta_1 x_1 + u$

$\hat{y} = a + b_1 x_1 + b_2 x_2$

$y = \alpha + \beta_1 x_1 + 0 \cdot x_2 + u$

$\sigma_{b_1}^2 = \dfrac{\sigma_u^2}{n\,\mathrm{Var}(x_1)} \times \dfrac{1}{1 - r_{x_1, x_2}^2}$

These results can be demonstrated quickly.
Rewrite the actual model, adding x2 as an explanatory variable with a coefficient
of 0. Now the actual model and the fitted model coincide. Hence b1 will be an
unbiased estimator of β1, and b2 will be an unbiased estimator of 0.
However, the population variance of b1 will be larger than it would have been if
the correct simple regression had been run, because it includes the factor
$1/(1 - r_{x_1 x_2}^2)$.

Therefore, the estimator of b1 using the multiple regression model will be less
efficient than the alternative using the simple regression model. 54
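A minimal Monte Carlo sketch of this efficiency loss in Python/NumPy (made-up data; the numbers are illustrative only):

import numpy as np

# Sketch: including an irrelevant regressor x2 (true coefficient 0) leaves b1
# unbiased but inflates its variance by roughly 1/(1 - r^2_{x1,x2}) (made-up data).
rng = np.random.default_rng(6)
n, reps = 100, 3000
b1_simple, b1_multi = [], []

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # correlated with x1, irrelevant for y
    y = 1.0 + 2.0 * x1 + rng.normal(size=n)
    ones = np.ones(n)
    b1_simple.append(np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)[0][1])
    b1_multi.append(np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)[0][1])

print(np.mean(b1_simple), np.mean(b1_multi))   # both close to 2: no bias
print(np.var(b1_multi) / np.var(b1_simple))    # variance ratio, roughly 1/(1 - r^2)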
VARIABLE MISSPECIFICATION II: INCLUSION OF AN
IRRELEVANT VARIABLE

$y = \alpha + \beta_1 x_1 + u$

$\hat{y} = a + b_1 x_1 + b_2 x_2$

$y = \alpha + \beta_1 x_1 + 0 \cdot x_2 + u$

$\sigma_{b_1}^2 = \dfrac{\sigma_u^2}{n\,\mathrm{Var}(x_1)} \times \dfrac{1}{1 - r_{x_1, x_2}^2}$

The intuitive reason for this is that the simple regression model exploits the
information that x2 should not be in the regression. In contrast, with the
multiple regression model, you find this out from the regression results.
The standard errors remain valid because the model is formally correctly
specified. Still, they will tend to be larger than those obtained in a simple
regression, reflecting the loss of efficiency.
These are the results in general. Note that if x1 and x2 are uncorrelated, there
will be no loss of efficiency after all.
55
VARIABLE MISSPECIFICATION II: INCLUSION OF AN
IRRELEVANT VARIABLE

. reg lgearn hgc asvabc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 57.45
Model | 25.9166749 2 12.9583374 Prob > F = 0.0000
Residual | 127.885218 567 .225547121 R-squared = 0.1685
---------+------------------------------ Adj R-squared = 0.1656
Total | 153.801893 569 .270302096 Root MSE = .47492

------------------------------------------------------------------------------
lgearn | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .0544266 .0099018 5.497 0.000 .034978 .0738753
asvabc | .0114733 .0026476 4.333 0.000 .0062729 .0166736
_cons | 1.118832 .124107 9.015 0.000 .8750665 1.362598
------------------------------------------------------------------------------

The analysis will be illustrated using a basic semilogarithmic


earnings function. The result of regressing LGEARN on HGC and
ASVABC is shown above.

56
VARIABLE MISSPECIFICATION II: INCLUSION OF AN
IRRELEVANT VARIABLE
. reg lgearn hgc asvabc hgcm hgcf

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 4, 565) = 29.22
Model | 26.3617806 4 6.59044515 Prob > F = 0.0000
Residual | 127.440112 565 .22555772 R-squared = 0.1714
---------+------------------------------ Adj R-squared = 0.1655
Total | 153.801893 569 .270302096 Root MSE = .47493

------------------------------------------------------------------------------
lgearn | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .0511811 .0101812 5.027 0.000 .0311835 .0711788
asvabc | .010444 .0027481 3.800 0.000 .0050463 .0158417
hgcm | .0071835 .0102695 0.699 0.485 -.0129876 .0273547
hgcf | .004794 .0076389 0.628 0.531 -.0102101 .0197981
_cons | 1.073972 .1324621 8.108 0.000 .8137933 1.33415
------------------------------------------------------------------------------
Now add the parental education variables, HGCM and HGCF. These
variables are determinants of educational attainment and indirectly
affect earnings, but there is no evidence that they have any
additional direct effect on earnings.
The fact that the t statistics of both variables are low is evidence that
they are probably irrelevant. 57
VARIABLE MISSPECIFICATION II: INCLUSION OF AN
IRRELEVANT VARIABLE
. reg lgearn hgc asvabc

------------------------------------------------------------------------------
lgearn | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .0544266 .0099018 5.497 0.000 .034978 .0738753
asvabc | .0114733 .0026476 4.333 0.000 .0062729 .0166736
_cons | 1.118832 .124107 9.015 0.000 .8750665 1.362598
------------------------------------------------------------------------------

. reg lgearn hgc asvabc hgcm hgcf

------------------------------------------------------------------------------
lgearn | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .0511811 .0101812 5.027 0.000 .0311835 .0711788
asvabc | .010444 .0027481 3.800 0.000 .0050463 .0158417
hgcm | .0071835 .0102695 0.699 0.485 -.0129876 .0273547
hgcf | .004794 .0076389 0.628 0.531 -.0102101 .0197981
_cons | 1.073972 .1324621 8.108 0.000 .8137933 1.33415
------------------------------------------------------------------------------

There is no evidence that including the parental education variables


has caused the other coefficients to be biased. The other coefficients
have changed, but the changes are small relative to the standard errors
and appear to be chance movements.
58
VARIABLE MISSPECIFICATION II: INCLUSION OF AN
IRRELEVANT VARIABLE
. reg lgearn hgc asvabc

------------------------------------------------------------------------------
lgearn | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .0544266 .0099018 5.497 0.000 .034978 .0738753
asvabc | .0114733 .0026476 4.333 0.000 .0062729 .0166736
_cons | 1.118832 .124107 9.015 0.000 .8750665 1.362598
------------------------------------------------------------------------------

. reg lgearn hgc asvabc hgcm hgcf

------------------------------------------------------------------------------
lgearn | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .0511811 .0101812 5.027 0.000 .0311835 .0711788
asvabc | .010444 .0027481 3.800 0.000 .0050463 .0158417
hgcm | .0071835 .0102695 0.699 0.485 -.0129876 .0273547
hgcf | .004794 .0076389 0.628 0.531 -.0102101 .0197981
_cons | 1.073972 .1324621 8.108 0.000 .8137933 1.33415
------------------------------------------------------------------------------

The standard errors are larger in the misspecified model, reflecting


the loss of efficiency.

59
VARIABLE MISSPECIFICATION II: INCLUSION OF AN
IRRELEVANT VARIABLE
. reg lgearn hgc asvabc

------------------------------------------------------------------------------
lgearn | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .0544266 .0099018 5.497 0.000 .034978 .0738753
asvabc | .0114733 .0026476 4.333 0.000 .0062729 .0166736
_cons | 1.118832 .124107 9.015 0.000 .8750665 1.362598
------------------------------------------------------------------------------

. cor hgc asvabc hgcm hgcf
(obs=570)

| hgc asvabc hgcm hgcf
--------+------------------------------------
hgc| 1.0000
asvabc| 0.5779 1.0000
hgcm| 0.3582 0.3819 1.0000
hgcf| 0.4066 0.4179 0.6391 1.0000

. reg lgearn hgc asvabc hgcm hgcf

------------------------------------------------------------------------------
lgearn | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .0511811 .0101812 5.027 0.000 .0311835 .0711788
asvabc | .010444 .0027481 3.800 0.000 .0050463 .0158417
hgcm | .0071835 .0102695 0.699 0.485 -.0129876 .0273547
hgcf | .004794 .0076389 0.628 0.531 -.0102101 .0197981
_cons | 1.073972 .1324621 8.108 0.000 .8137933 1.33415
------------------------------------------------------------------------------

However, the loss of efficiency is not serious here. The parental


education variables do correlate with both HGC and ASVABC, but
with a sample as large as the present one the correlations would have to
be much higher for the loss of efficiency to become a severe problem.
60
Multiple Regression Analysis
y = b0 + b1x1 + b2x2 + . . . bkxk + u

MODEL MISSPECIFICATION:
PROXY VARIABLES
PROXY VARIABLES
SOLVING THE OMITTED VARIABLE BIAS PROBLEM
𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 +. . . +𝛽𝑘 𝑥𝑘 + 𝑢

𝑥1 = 𝜆 + 𝜇𝑧
Suppose that a variable y is hypothesized to depend on a set of explanatory
variables x1, ..., xk as shown above, and suppose there are no data on x1
for some reason.

As we have seen, a regression of y on x2, ..., xk would yield biased


estimates of the coefficients and invalid standard errors and tests.
Sometimes, however, these problems can be reduced or eliminated by using
a proxy variable in the place of x1.
A proxy variable is hypothesized to be linearly related to the missing
variable. In the present example, z could act as a proxy for x1.
The validity of the proxy relationship must be justified based on theory,
common sense, or experience. It cannot be checked directly because there
are no data on x1. 62
PROXY VARIABLES

𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 +. . . +𝛽𝑘 𝑥𝑘 + 𝑢

𝑥1 = 𝜆 + 𝜇𝑧

$y = \alpha + \beta_1(\lambda + \mu z) + \beta_2 x_2 + \ldots + \beta_k x_k + u$
$\;\; = (\alpha + \beta_1 \lambda) + \beta_1 \mu z + \beta_2 x_2 + \ldots + \beta_k x_k + u$

The regression model can be rewritten as shown if a suitable proxy


has been identified.
We thus obtain a model with all variables observable.
If the proxy relationship is exact, and we fit this relationship, most
of the regression results will be rescued.

63
PROXY VARIABLES
𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 +. . . +𝛽𝑘 𝑥𝑘 + 𝑢 𝑥1 = 𝜆 + 𝜇𝑧

𝑦 = (𝛼 + 𝛽1 𝜆) + 𝛽1 𝜇𝑧 + 𝛽2 𝑥2 +. . . +𝛽𝑘 𝑥𝑘 + 𝑢

1. The estimates of the coefficients of x2, ..., xk will be the same as those
   that would have been obtained if it had been possible to regress y on
   x1, ..., xk.
2. The standard errors and t statistics of the coefficients of x2, ..., xk will
   be the same as those that would have been obtained if it had been
   possible to regress y on x1, ..., xk.
3. R2 will be the same as it would have been if it had been possible to
   regress y on x1, ..., xk.
4. The coefficient of z will be an estimate of β1μ, so it will not be possible to
   obtain an estimate of β1 unless you can guess the value of μ.
5. However, the t statistic for z will be the same as that which would
   have been obtained for x1 if it had been possible to regress y on x1, ..., xk,
   and so you can assess the significance of x1, even if you are not able to
   estimate its coefficient. (These points are illustrated numerically in the
   sketch after this list.) 64
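A minimal Python/NumPy sketch of points 1–5 under an exact proxy relationship (made-up data; λ and μ are chosen arbitrarily for illustration):

import numpy as np

# Sketch: with an exact proxy z for x1 (x1 = lambda + mu*z), regressing y on
# (z, x2) reproduces the coefficient of x2, R-squared, and the t statistic of x1,
# while the coefficient of z estimates beta1*mu (made-up data).
rng = np.random.default_rng(7)
n = 400
z = rng.normal(size=n)
lam, mu = 3.0, 2.0
x1 = lam + mu * z                       # exact proxy relationship
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=n)

def fit(X, y):
    """OLS coefficients, standard errors and R-squared (X includes a constant)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    s2 = u @ u / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    r2 = 1 - (u @ u) / np.sum((y - y.mean()) ** 2)
    return b, se, r2

ones = np.ones(n)
b_x, se_x, r2_x = fit(np.column_stack([ones, x1, x2]), y)   # "infeasible" regression on x1
b_z, se_z, r2_z = fit(np.column_stack([ones, z, x2]), y)    # feasible regression on the proxy

print(b_x[2], b_z[2])                        # coefficient of x2: identical
print(r2_x, r2_z)                            # R-squared: identical
print(b_x[1] / se_x[1], b_z[1] / se_z[1])    # t statistic of x1 and of z: identical
print(b_z[1], 0.5 * mu)                      # coefficient of z estimates beta1*mu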
PROXY VARIABLES
𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 +. . . +𝛽𝑘 𝑥𝑘 + 𝑢
𝑥1 = 𝜆 + 𝜇𝑧

𝑦 = 𝛼 + 𝛽1 (𝜆 + 𝜇𝑧) + 𝛽2 𝑥2 +. . . +𝛽𝑘 𝑥𝑘 + 𝑢
= (𝛼 + 𝛽1 𝜆) + 𝛽1 𝜇𝑧 + 𝛽2 𝑥2 +. . . +𝛽𝑘 𝑥𝑘 + 𝑢

It will not be possible to estimate α, since the intercept in the revised


model is (α + β1λ), but usually α is of relatively little interest anyway.
It is generally more realistic to hypothesize that the relationship
between x1 and z is approximate rather than exact. In that case, the
results listed above will hold only approximately.
However, if z is a poor proxy for x1, then it is possible that some of the
other x variables will try to act as proxies for it, and there will still be a
problem of omitted variable bias.
65
PROXY VARIABLES

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐼𝑁𝐷𝐸𝑋 + 𝑢

▪ A proxy variable will be illustrated with an educational


attainment model.
▪ We will suppose that educational attainment depends jointly
on cognitive ability and family background.
▪ As usual, ASVABC will be used as the measure of cognitive
ability.
▪ However, the data set has no "family background" variable.
Indeed, it isn’t easy to conceive how such a variable might
be defined.
66
PROXY VARIABLES

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐼𝑁𝐷𝐸𝑋 + 𝑢

𝐼𝑁𝐷𝐸𝑋 = 𝜆 + 𝜇1 𝐻𝐺𝐶𝑀 + 𝜇2 𝐻𝐺𝐶𝐹

▪ Instead, we will try to find a proxy. One obvious


variable is the mother's educational attainment, HGCM.
▪ However, the father’s educational attainment may also
be relevant.
▪ So, we will hypothesize that the family background
index depends on both.
67
PROXY VARIABLES

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐼𝑁𝐷𝐸𝑋 + 𝑢

𝐼𝑁𝐷𝐸𝑋 = 𝜆 + 𝜇1 𝐻𝐺𝐶𝑀 + 𝜇2 𝐻𝐺𝐶𝐹

$HGC = \alpha + \beta_1 ASVABC + \beta_2(\lambda + \mu_1 HGCM + \mu_2 HGCF) + u$

$\;\;\;\;\;\;\; = (\alpha + \beta_2 \lambda) + \beta_1 ASVABC + \beta_2 \mu_1 HGCM + \beta_2 \mu_2 HGCF + u$

Thus, we obtain a relationship expressing HGC as a


function of ASVABC, HGCM, and HGCF.

68
PROXY VARIABLES
. reg hgc asvabc hgcm hgcf

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

Here is the corresponding regression using the EAEF Data Set.

69
PROXY VARIABLES
. reg hgc asvabc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 284.89
Model | 1153.80864 1 1153.80864 Prob > F = 0.0000
Residual | 2300.43873 568 4.05006818 R-squared = 0.3340
---------+------------------------------ Adj R-squared = 0.3329
Total | 3454.24737 569 6.07073351 Root MSE = 2.0125

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1545378 .0091559 16.879 0.000 .1365543 .1725213
_cons | 5.770845 .4668473 12.361 0.000 4.853888 6.687803
------------------------------------------------------------------------------

Here is the regression of HGC on ASVABC alone.

70
PROXY VARIABLES
. reg hgc asvabc hgcm hgcf

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

. reg hgc asvabc

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1545378 .0091559 16.879 0.000 .1365543 .1725213
_cons | 5.770845 .4668473 12.361 0.000 4.853888 6.687803
------------------------------------------------------------------------------

A comparison of the regressions indicates that the coefficient


of ASVABC is biased upwards if we do not attempt to control
for family background.
71
PROXY VARIABLES
. reg hgc asvabc hgcm hgcf

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

. cor asvabc hgcm hgcf
(obs=570)

| asvabc hgcm hgcf
--------+---------------------------
asvabc| 1.0000
hgcm| 0.3819 1.0000
hgcf| 0.4179 0.6391 1.0000

. reg hgc asvabc

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1545378 .0091559 16.879 0.000 .1365543 .1725213
_cons | 5.770845 .4668473 12.361 0.000 4.853888 6.687803
------------------------------------------------------------------------------

This is what we should expect. HGCM and HGCF are likely to


positively affect educational attainment and correlate positively
with ASVABC.
72
PROXY VARIABLES
. reg hgc asvabc hgcm hgcf library siblings

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 5, 564) = 66.87
Model | 1285.58208 5 257.116416 Prob > F = 0.0000
Residual | 2168.66529 564 3.84515122 R-squared = 0.3722
---------+------------------------------ Adj R-squared = 0.3666
Total | 3454.24737 569 6.07073351 Root MSE = 1.9609

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1277852 .010054 12.710 0.000 .1080373 .147533
hgcm | .0619975 .0427558 1.450 0.148 -.0219826 .1459775
hgcf | .1045035 .0314928 3.318 0.001 .042646 .166361
library | .1151269 .1969844 0.584 0.559 -.2717856 .5020394
siblings | -.0509486 .039956 -1.275 0.203 -.1294293 .027532
_cons | 5.236995 .5665539 9.244 0.000 4.124181 6.349808
------------------------------------------------------------------------------

LIBRARY (a dummy variable equal to 1 if anyone in the family


owned a library card when the respondent was 14) and
SIBLINGS (number of brothers and sisters of the respondent)
are two other variables in the data set that might act as
proxies for family background. 73
PROXY VARIABLES
. reg hgc asvabc hgcm hgcf library siblings

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 5, 564) = 66.87
Model | 1285.58208 5 257.116416 Prob > F = 0.0000
Residual | 2168.66529 564 3.84515122 R-squared = 0.3722
---------+------------------------------ Adj R-squared = 0.3666
Total | 3454.24737 569 6.07073351 Root MSE = 1.9609

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1277852 .010054 12.710 0.000 .1080373 .147533
hgcm | .0619975 .0427558 1.450 0.148 -.0219826 .1459775
hgcf | .1045035 .0314928 3.318 0.001 .042646 .166361
library | .1151269 .1969844 0.584 0.559 -.2717856 .5020394
siblings | -.0509486 .039956 -1.275 0.203 -.1294293 .027532
_cons | 5.236995 .5665539 9.244 0.000 4.124181 6.349808
------------------------------------------------------------------------------

The LIBRARY variable was one of three variables included in the


National Longitudinal Survey of Youth to help determine the
influence of family background on education. It has the anticipated
positive coefficient, but it is not significant.
74
PROXY VARIABLES
. reg hgc asvabc hgcm hgcf library siblings

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 5, 564) = 66.87
Model | 1285.58208 5 257.116416 Prob > F = 0.0000
Residual | 2168.66529 564 3.84515122 R-squared = 0.3722
---------+------------------------------ Adj R-squared = 0.3666
Total | 3454.24737 569 6.07073351 Root MSE = 1.9609

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1277852 .010054 12.710 0.000 .1080373 .147533
hgcm | .0619975 .0427558 1.450 0.148 -.0219826 .1459775
hgcf | .1045035 .0314928 3.318 0.001 .042646 .166361
library | .1151269 .1969844 0.584 0.559 -.2717856 .5020394
siblings | -.0509486 .039956 -1.275 0.203 -.1294293 .027532
_cons | 5.236995 .5665539 9.244 0.000 4.124181 6.349808
------------------------------------------------------------------------------

There is a tendency for parents who are ambitious for their children
to limit their number, so SIBLINGS should be expected to have a
negative coefficient. It does, but it is also insignificant.
75
PROXY VARIABLES
. reg hgc asvabc hgcm hgcf library siblings

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 5, 564) = 66.87
Model | 1285.58208 5 257.116416 Prob > F = 0.0000
Residual | 2168.66529 564 3.84515122 R-squared = 0.3722
---------+------------------------------ Adj R-squared = 0.3666
Total | 3454.24737 569 6.07073351 Root MSE = 1.9609

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1277852 .010054 12.710 0.000 .1080373 .147533
hgcm | .0619975 .0427558 1.450 0.148 -.0219826 .1459775
hgcf | .1045035 .0314928 3.318 0.001 .042646 .166361
library | .1151269 .1969844 0.584 0.559 -.2717856 .5020394
siblings | -.0509486 .039956 -1.275 0.203 -.1294293 .027532
_cons | 5.236995 .5665539 9.244 0.000 4.124181 6.349808
------------------------------------------------------------------------------

Further background variables may be relevant for educational


attainment: faith, ethnicity, and region of residence. These variables
are supplied in the data set, but it will be left to you to experiment
with them.
76
EXAMPLE
➢ The demand for Brazilian coffee in the United
States is a function of
▪ the real price of Brazilian coffee Pbc,
▪ the real price of tea Ptea and
▪ real disposable income in the United States, Yd.

Coffeebc = b0 + b1Pbc + b2Ptea + b3Yd + e
           b1 < 0      b2 > 0     b3 > 0
77
EXAMPLE
Coffeebc = 9.1 + 7.8Pbc + 2.4Ptea + 0.0035Yd
    s.e. =       (15.6)   (1.2)     (0.0010)
       t =        0.5      2.0       3.5
R̄² = 0.60    N = 25

The coefficients of the second and third variables, Ptea


and Yd, appear significant in your hypothesized
direction. Still, the first variable, Pbc, appears to have an
insignificant coefficient with an unexpected sign.

78
EXAMPLE
➢ If you think there is a possibility that the demand
for Brazilian coffee is perfectly price-inelastic (that is, its
price coefficient is zero), you might decide to run the
same equation without the price variable, obtaining:

Coffeebc = 9.3 + 2.6Ptea + 0.0036Yd

    s.e. =       (1.0)     (0.0009)
       t =        2.6       4.0
R̄² = 0.61    N = 25

79
EXAMPLE
➢ By comparing two equations, we can apply our four
specification criteria for the inclusion of a variable in
an equation
1. Theory: If it’s possible that the demand for coffee could be
price-inelastic, the theory behind dropping the variable seems
plausible.
2. t-test: The t-score of the possibly irrelevant variable is 0.5,
insignificant at any level.
3. R̄²: R̄² increases when the variable is dropped, indicating
that the variable is irrelevant.
4. Bias: The remaining coefficients change only slightly when
Pbc is dropped, suggesting that only slight bias is caused by
excluding the variable. 80
EXAMPLE
➢ Based upon this analysis, you might conclude that the
demand for Brazilian coffee is indeed price-inelastic and
that the variable is irrelevant and should be dropped from
the model.
➢ However, this conclusion would be unwarranted.
➢ Although the elasticity of demand for coffee in general might be
quite low (the evidence suggests that it is inelastic only
over a particular range of prices), it is hard to believe that
Brazilian coffee is immune to price competition from other
kinds of coffee.
➢ Indeed, one would expect quite a bit of sensitivity in the
demand for Brazilian coffee concerning the price of, for
example, Colombian coffee. 81
EXAMPLE
➢ To test this hypothesis, Pcc, the price of Colombian coffee,
should be added to the first equation; the demand for Brazilian
coffee is expected to be a positive function of it.

Coffeebc = 10.0 + 8.0Pcc − 5.6Pbc + 2.6Ptea + 0.0030Yd

    s.e. =        (4.0)    (2.0)    (1.3)     (0.0010)
       t =         2.0     −2.8      2.0       3.0
R̄² = 0.65    N = 25

82
EXAMPLE
➢ By comparing the first and last equations, we can once again
apply our four specification criteria:
1. Theory: The model should always have included both
prices; their logical justification is quite strong.
2. t-Test: The t-score of the new variable, the price of
Colombian coffee, is 2.0, significant at most levels.
3. R̄²: R̄² increases when the variable is added, indicating that
a relevant variable had been omitted.
4. Bias: Although two of the coefficients remain virtually
unchanged, indicating that the correlations between these
variables and the price of Colombian coffee variable are
low, the coefficient for the price of Brazilian coffee does
change significantly, indicating bias in the original result.
83
EXAMPLE
➢ Theoretical considerations should never be discarded,
even in the face of statistical insignificance.
➢ If a variable known to be extremely important from a
theoretical point of view turns out to be statistically
insignificant in a particular sample, that variable
should be left in the equation even though it makes
the results look bad.
➢ The more thinking done before the first regression is
run, and the fewer alternative specifications
estimated, the better the regression results will likely
be.
84
Multiple Regression Analysis
y = b0 + b1x1 + b2x2 + . . . bkxk + u

MULTICOLLINEARITY
OLS ASSUMPTIONS
1. Assumptions on regressors
a. Fixed - nonstochastic regressors
b. No multicollinearity
2. Assumptions on the disturbances
a. Random disturbances have zero mean: E[ui] = 0
b. Homoskedasticity: Var(ui) = σ²
c. No serial correlation: Cov(ui, uj) = 0 for i ≠ j
3. Assumptions on model and its parameters
a. Constant parameters
b. Linear model
4. Assumption on the probability distribution
a. Normal distribution: u ~ N(0, σ²) 86
THE GAUSS-MARKOV THEOREM
➢ Given Gauss-Markov Assumptions it can
be shown that OLS is “BLUE”
➢ Best

➢ Linear

➢ Unbiased

➢ Estimator

Thus, if the assumptions hold, use OLS

87
VARIANCE OF OLS
Given the Gauss–Markov assumptions,

$\mathrm{Var}(\hat{\beta}_j) = \dfrac{\sigma^2}{SST_j \left( 1 - R_j^2 \right)}$, where

$SST_j = \sum \left( x_{ij} - \bar{x}_j \right)^2$ and $R_j^2$ is the $R^2$
from regressing $x_j$ on all the other $x$'s.

Variance Inflation Factor: $VIF_j = 1/(1 - R_j^2)$
If there is no multicollinearity, VIF = 1;
VIF > 10 indicates severe multicollinearity.
88
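A minimal Python/NumPy sketch of the VIF computed from the auxiliary regressions described above (made-up, deliberately correlated data):

import numpy as np

# Sketch: compute R_j^2 from the auxiliary regression of x_j on the other
# regressors, and the corresponding VIF (made-up data with correlated x's).
rng = np.random.default_rng(8)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)   # strongly related to x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: regress x_j on the other columns plus a constant."""
    xj = X[:, j]
    others = np.column_stack([np.ones(len(xj)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
    resid = xj - others @ coef
    r2_j = 1 - (resid @ resid) / np.sum((xj - xj.mean()) ** 2)
    return 1.0 / (1.0 - r2_j)

print([round(vif(X, j), 2) for j in range(X.shape[1])])  # large VIFs for x1 and x2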
ERROR VARIANCE ESTIMATE

𝜎ො 2 = ෍ 𝑢ො 𝑖2 ൘ 𝑛 − 𝑘 − 1 ≡ 𝑆𝑆𝑅 Τ𝑑𝑓
1Τ2
thus, 𝑠𝑒 𝛽መ𝑗 = 𝜎ො Τ 𝑆𝑆𝑇𝑗 1 − 𝑅𝑗2

➢ df = n – (k + 1), or df = n – k – 1
➢ df (i.e. degrees of freedom) is the
(number of observations) – (number of
estimated parameters)
89
COMPONENTS OF OLS
VARIANCES
➢ The error variance: a larger s2 implies a
larger variance for the OLS estimators
➢ The total sample variation: a larger SSTj
implies a smaller variance for the
estimators
➢ Linear relationships among the
independent variables: a larger Rj2
implies a larger variance for the estimators

90
MULTICOLLINEARITY
y = α + β1x1 + β2x2 + u,    x2 = λ + μx1

b1 = [Cov(x1, y)Var(x2) − Cov(x2, y)Cov(x1, x2)] / [Var(x1)Var(x2) − {Cov(x1, x2)}²]

   = [Cov(x1, y)Var(λ + μx1) − Cov(λ + μx1, y)Cov(x1, λ + μx1)] / [Var(x1)Var(λ + μx1) − {Cov(x1, λ + μx1)}²]

   = [Cov(x1, y)Var(μx1) − Cov(μx1, y)Cov(x1, μx1)] / [Var(x1)Var(μx1) − {Cov(x1, μx1)}²]

What would happen if you tried to run a regression when there is an exact linear
relationship among the explanatory variables?
We will investigate, using the model with two explanatory variables shown above.
91
MULTICOLLINEARITY

y = α + β1x1 + β2x2 + u,    x2 = λ + μx1

b1 = [Cov(x1, y)Var(μx1) − Cov(μx1, y)Cov(x1, μx1)] / [Var(x1)Var(μx1) − {Cov(x1, μx1)}²]

   = [Cov(x1, y)μ²Var(x1) − μCov(x1, y)·μCov(x1, x1)] / [Var(x1)μ²Var(x1) − {μCov(x1, x1)}²]

   = [μ²Cov(x1, y)Var(x1) − μ²Cov(x1, y)Var(x1)] / [μ²Var(x1)Var(x1) − μ²{Var(x1)}²] = 0/0

It turns out that both the numerator and the denominator are equal
to zero. The regression coefficient is not defined.
92
DETECTING
MULTICOLLINEARITY
➢ It is unusual for there to be an exact relationship among the
explanatory variables in a regression. When this occurs, it is
typically because of a logical error in the specification.
➢ How can we measure the multicollinearity in the regression
equation?
Eigenvalues of the correlation matrix R:
λ1 > λ2 > . . . > λs > λs+1 > . . . > λk
det(R) = λ1 · λ2 · . . . · λk
Trace(R) = λ1 + λ2 + . . . + λk
If λs ≈ λs+1 ≈ . . . ≈ λk ≈ 0, the regressors are close to being linearly dependent.
Condition Index = κ = sqrt(λmax / λmin)

Typical warning signs:
1. High R² and F values but insignificant coefficient estimates
2. Unexpected coefficient signs and values
3. Condition index > 30
93
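These eigenvalue diagnostics can be computed from the regressors' correlation matrix. A hedged Stata sketch with illustrative regressor names x1–x3 (assumed to be in memory):

* Sketch: eigenvalues of the regressor correlation matrix and the condition index
correlate x1 x2 x3
matrix R = r(C)                      // correlation matrix saved by correlate
matrix symeigen V L = R              // L holds the eigenvalues (largest first), V the eigenvectors
matrix list L
display sqrt(L[1,1]/L[1,colsof(L)])  // condition index = sqrt(lambda_max / lambda_min)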
ALLEVIATING MULTICOLLINEARITY
PROBLEM
pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]
What can you do about this problem if you encounter it? We will look at the
model with two explanatory variables. Before doing this, two important
points should be emphasized.
First, multicollinearity does not cause the regression coefficients to be
biased. Their probability distributions are still centered over the actual values
if the regression specification is correct, but they have unsatisfactorily
large variances.
Second, the standard errors and t-tests remain valid. The standard errors
are larger than they would have been without multicollinearity, warning us
that the regression estimates are erratic.
Since the problem of multicollinearity is caused by the population variances
of the coefficients being unsatisfactorily large, we will seek ways of reducing
the variances. 94
ALLEVIATING
MULTICOLLINEARITY PROBLEM
Possible measures for alleviating multicollinearity
pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]

1. Reduce 𝜎𝑢2 by including further relevant variables in the


model.
We will look at the various components of the population
variance.
We can reduce it by bringing more variables into the
model and reducing the population variance of the
disturbance term. 95
ALLEVIATING
MULTICOLLINEARITY PROBLEM
Possible measures for alleviating multicollinearity
pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]
2. Increase the number of observations.
Surveys: increase the budget, use clustering
Time series: use quarterly instead of annual data
The next factor to look at is n, the number of observations. If you
are working with cross-section data (individuals, households,
enterprises, etc.) and undertaking a survey, you could increase the
sample size by negotiating a bigger budget.
Suppose you are working with time series data. In that case, you can
increase the sample by working with shorter time intervals for the
data, for example, quarterly or monthly data instead of annual data.
96
ALLEVIATING
MULTICOLLINEARITY PROBLEM
Possible measures for alleviating multicollinearity
pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]
3. Increase Var(x1).
A third possible way of reducing the multicollinearity problem might be to
increase the variance of the explanatory variables. This is possible only at
the design stage of a survey.
For example, suppose you were planning a household survey to investigate
how expenditure patterns vary with income. In that case, you should ensure
that the sample includes relatively rich, relatively poor, and middle-income
households.
Another possibility might be to reduce the correlation between the
explanatory variables. This is possible only at the design stage of a survey,
and even then, it takes work.
97
ALLEVIATING
MULTICOLLINEARITY PROBLEM
Possible measures for alleviating multicollinearity
pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]

4. Reduce rx1,x2.

Another possibility might be to reduce the correlation


between the explanatory variables.
This is possible only at the design stage of a survey, and
even then, it takes work.

98
ALLEVIATING
MULTICOLLINEARITY PROBLEM
Possible measures for alleviating multicollinearity
pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]

5. Combine the correlated variables.

If the correlated variables are conceptually


similar, combining them into some overall
index may be reasonable.

99
ALLEVIATING
MULTICOLLINEARITY PROBLEM
Possible measures for alleviating multicollinearity
pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]

6. Drop some of the correlated variables.

Drop some of the correlated variables if they have


insignificant coefficients.
However, this approach to multicollinearity is dangerous
because some of the variables dropped may genuinely
belong in the model, and their omission may cause omitted
variable bias. 100
ALLEVIATING
MULTICOLLINEARITY PROBLEM
Possible measures for alleviating multicollinearity
pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]

7. Empirical restriction 𝑦 = 𝛼 + 𝛽1 𝑥 + 𝛽2 𝑝 + 𝑢

A further way of dealing with the problem of multicollinearity is to use


extraneous information concerning the coefficient of one of the variables, if
available.
For example, suppose that y in the above equation is the demand for a
consumer expenditure category, x is aggregate disposable personal income,
and p is a price index.
You would use time series data to fit a model of this type. If x and p are
highly correlated, which is often the case with time series variables, the
problem of multicollinearity might be eliminated in the following way. 101
ALLEVIATING
MULTICOLLINEARITY PROBLEM
Possible measures for alleviating multicollinearity
pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]

7. Empirical restriction
𝑦 = 𝛼 + 𝛽1 𝑥 + 𝛽2 𝑝 + 𝑢 𝑦 ′ = 𝛼 ′ + 𝛽1′ 𝑥 ′ + 𝑢

𝑦ො ′ = 𝑎′ + 𝑏1′ 𝑥 ′

Obtain data on income and expenditure on the category from a household


survey and regress y’ on x’. (The ‘ marks indicate that the data are
household data, not aggregate data.)
This is a simple regression because there will be relatively slight variation
in the price paid by the households.
102
ALLEVIATING
MULTICOLLINEARITY PROBLEM
Possible measures for alleviating multicollinearity
7. Empirical restriction
𝑦 = 𝛼 + 𝛽1 𝑥 + 𝛽2 𝑝 + 𝑢 𝑦 ′ = 𝛼 ′ + 𝛽1′ 𝑥 ′ + 𝑢

𝑦ො ′ = 𝑎′ + 𝑏1′ 𝑥 ′
𝑦=𝛼+ 𝑏1′ 𝑥 + 𝛽2 𝑝 + 𝑢

𝑧 = 𝑦 − 𝑏1′ 𝑥 = 𝛼 + 𝛽2 𝑝 + 𝑢

Now substitute b1' for β1 in the time series model, subtract b1'x from
both sides, and regress z = y − b1'x on price.
This is a simple regression, so multicollinearity has been
eliminated.
103
ALLEVIATING
MULTICOLLINEARITY PROBLEM
Possible measures for alleviating multicollinearity
7. Empirical restriction
𝑦 = 𝛼 + 𝛽1 𝑥 + 𝛽2 𝑝 + 𝑢 𝑦 ′ = 𝛼 ′ + 𝛽1′ 𝑥 ′ + 𝑢
𝑦ො ′ = 𝑎′ + 𝑏1′ 𝑥 ′
𝑦=𝛼+ 𝑏1′ 𝑥 + 𝛽2 𝑝 + 𝑢

𝑧 = 𝑦 − 𝑏1′ 𝑥 = 𝛼 + 𝛽2 𝑝 + 𝑢

This technique has two drawbacks.
First, the income coefficient may be conceptually different in time series and
cross-section contexts.
Second, since we subtract the estimated income component b1'x, not the
true income component β1x, from y when constructing z, we have
introduced an element of measurement error into the dependent variable.
104
ALLEVIATING
MULTICOLLINEARITY PROBLEM
Possible measures for alleviating multicollinearity
pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]
8. Theoretical restriction
𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝑢

Last, but by no means least, is the use of a theoretical restriction, which is


defined as a hypothetical relationship among the parameters of a
regression model.
It will be explained using an educational attainment model as an example.
Suppose we hypothesize that the highest grade completed, HGC, depends
on ASVABC and on the highest grade completed by the respondent's mother
and father, HGCM and HGCF, respectively.
105
ALLEVIATING MULTICOLLINEARITY
PROBLEM
. reg hgc asvabc hgcm hgcf

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------
A one-point increase in ASVABC increases HGC by 0.13 years.
HGC increases by 0.07 years for every extra year of schooling of the mother and
0.11 years for every additional year of schooling of the father.
Mother's education is generally held to be at least as important as, if not more
important than, father's education for educational attainment, so this outcome is unexpected. 106
ALLEVIATING MULTICOLLINEARITY
. reg hgc asvabc hgcm hgcf
PROBLEM
Source | SS df MS Number of obs = 570
---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

It is also surprising that the coefficient of HGCM is not significant, even at


the 5% level, using a one-tailed test.
. cor hgcm hgcf
(obs=570)
        |   hgcm    hgcf
--------+------------------
   hgcm |  1.0000
   hgcf |  0.6391  1.0000

However, assortative mating leads to a high correlation between HGCM and HGCF,
and the regression appears to suffer from multicollinearity. 107
ALLEVIATING MULTICOLLINEARITY
PROBLEM
Possible measures for alleviating multicollinearity
𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝑆𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑀 + 𝛽3 𝐻𝐺𝐶𝐹 + 𝑢

Theoretical restriction 𝛽2 = 𝛽3

Suppose that we hypothesize that mother's and father's


education are equally important. We can then impose the
restriction b2 = b3.
This allows us to re-write the equation as shown.

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝑆𝐴𝐵𝐶 + 𝛽2 (𝐻𝐺𝐶𝑀 + 𝐻𝐺𝐶𝐹) + 𝑢


= 𝛼 + 𝛽1 𝐴𝑆𝑉𝑆𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + 𝑢

108
ALLEVIATING MULTICOLLINEARITY
. g hgcp=hgcm+hgcf PROBLEM
. reg hgc asvabc hgcp

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 166.22
Model | 1276.73764 2 638.368819 Prob > F = 0.0000
Residual | 2177.50973 567 3.84040517 R-squared = 0.3696
---------+------------------------------ Adj R-squared = 0.3674
Total | 3454.24737 569 6.07073351 Root MSE = 1.9597
------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295653 .0099485 13.024 0.000 .1100249 .1491057
hgcp | .093741 .0165688 5.658 0.000 .0611973 .1262847
_cons | 4.823123 .4844829 9.955 0.000 3.871523 5.774724
------------------------------------------------------------------------------
𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝑆𝐴𝐵𝐶 + 𝛽2 (𝐻𝐺𝐶𝑀 + 𝐻𝐺𝐶𝐹) + 𝑢
= 𝛼 + 𝛽1 𝐴𝑆𝑉𝑆𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + 𝑢
Defining HGCP to be the sum of HGCM and HGCF, the equation may be
rewritten as shown. The problem caused by the high correlation between
HGCM and HGCF has been eliminated.
The estimate of b2 is now 0.094. 109
ALLEVIATING MULTICOLLINEARITY
. g hgcp=hgcm+hgcf PROBLEM
. reg hgc asvabc hgcp

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295653 .0099485 13.024 0.000 .1100249 .1491057
hgcp | .093741 .0165688 5.658 0.000 .0611973 .1262847
_cons | 4.823123 .4844829 9.955 0.000 3.871523 5.774724
------------------------------------------------------------------------------

. reg hgc asvabc hgcm hgcf

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

Not surprisingly, this is a compromise between the coefficients of


HGCM and HGCF in the previous specification.
110
ALLEVIATING MULTICOLLINEARITY
. g hgcp=hgcm+hgcf PROBLEM
. reg hgc asvabc hgcp

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295653 .0099485 13.024 0.000 .1100249 .1491057
hgcp | .093741 .0165688 5.658 0.000 .0611973 .1262847
_cons | 4.823123 .4844829 9.955 0.000 3.871523 5.774724
------------------------------------------------------------------------------

. reg hgc asvabc hgcm hgcf

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

The standard error of HGCP is much smaller than those of HGCM and
HGCF. The restriction has led to a large gain in efficiency, and the
multicollinearity problem has been eliminated.
111
ALLEVIATING MULTICOLLINEARITY
. g hgcp=hgcm+hgcf PROBLEM
. reg hgc asvabc hgcp

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295653 .0099485 13.024 0.000 .1100249 .1491057
hgcp | .093741 .0165688 5.658 0.000 .0611973 .1262847
_cons | 4.823123 .4844829 9.955 0.000 3.871523 5.774724
------------------------------------------------------------------------------

. reg hgc asvabc hgcm hgcf

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------
The t statistic is very high. Thus, imposing the restriction has improved
the regression results. However, the restriction may not be valid. We
should test it. Testing theoretical restrictions is one of the topics discussed later.
112
ALLEVIATING MULTICOLLINEARITY
PROBLEM

➢ Biased estimation techniques


➢ Some biased estimators may reduce the
effects of multicollinearity
▪ Ridge Estimation
▪ Principal Component Estimation
▪ Stein Type Shrinkage Estimates

113
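None of these biased estimators is part of the basic regress command. As one hedged illustration of the principal-component idea, the sketch below (illustrative variable names; a data set with y and a correlated block x1–x3 is assumed in memory) replaces the correlated regressors with their first principal component before running OLS, trading some bias for a large reduction in variance.

* Sketch: principal-component regression with a block of correlated regressors
pca x1 x2 x3, components(1)        // first principal component of the correlated set
predict pc1, score                 // component score for each observation
regress y pc1                      // regress the dependent variable on the component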
Multiple Regression Analysis
y = b0 + b1x1 + b2x2 + . . . bkxk + u

INFERENCE: PRECISION OF THE


MULTIPLE REGRESSION COEFFICIENTS
PRECISION OF THE MULTIPLE
REGRESSION COEFFICIENTS
𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝑢 𝑦ො𝑖 = 𝑎 + 𝑏1 𝑥1𝑖 + 𝑏2 𝑥2𝑖

pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]
This sequence investigates the population variances and standard errors of the
slope coefficients in a model with two explanatory variables.
The expression for the population variance of b1 is shown above. The
expression for b2 is the same, with the subscripts 1 and 2 interchanged.
The first factor in the expression is identical to that for the population variance
of the slope coefficient in a simple regression model.
The population variance of b1 depends on the population variance of the
disturbance term, the number of observations, and the variance of x1 for the
same reasons as in a simple regression model. 115
PRECISION OF THE MULTIPLE
REGRESSION COEFFICIENTS
𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝑢 𝑦ො𝑖 = 𝑎 + 𝑏1 𝑥1𝑖 + 𝑏2 𝑥2𝑖

pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]

In multiple regression analysis, the expression is multiplied by a factor that


depends on the correlation between x1 and x2.
The higher the correlation between the explanatory variables, the greater will be
the population variance.
This is easy to understand intuitively. The greater the correlation, the harder it is
to discriminate between the effects of the explanatory variables on y, and the less
accurate will be the regression estimates.
The population variance expression above is valid only for a model with two
explanatory variables. When there are more than two, the expression becomes
much more complex, and switching to matrix algebra is sensible.
116
PRECISION OF THE MULTIPLE
REGRESSION COEFFICIENTS

𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝑢 𝑦ො𝑖 = 𝑎 + 𝑏1 𝑥1𝑖 + 𝑏2 𝑥2𝑖

pop.var(b1) = [σu² / (n Var(x1))] × [1 / (1 − r²x1,x2)]
The population variance of u has to be estimated. The sample variance of the
residuals, Var(e), provides a consistent estimator, but in a finite sample it is
biased downwards by a factor (n − k − 1)/n, where k is the number of
explanatory variables. The corrected estimator is

   su² = [n / (n − k − 1)] × Var(e)

   s.e.(b1) = sqrt{ [su² / (n Var(x1))] × [1 / (1 − r²x1,x2)] }

Thus, the expression above estimates the standard deviation of the probability
distribution of b1, known as the standard error of b1 for short.
117
ASSUMPTIONS OF THE CLASSICAL
LINEAR MODEL (CLM)
➢ So far, we know that given the Gauss-
Markov assumptions, OLS is BLUE,
➢ In order to do classical hypothesis testing,
we need to add another assumption (beyond
the Gauss-Markov assumptions)
➢ Assume that u is independent of x1, x2,…, xk
and u is normally distributed with zero
mean and variance s2: u ~ Normal(0,s2)

118
CLM ASSUMPTIONS (cont)
➢ Under CLM, OLS is not only BLUE, but is
the minimum variance unbiased estimator
➢ We can summarize the population
assumptions of CLM as follows
➢ y|x ~ Normal(b0 + b1x1 +…+ bkxk, s2)
➢ While for now we just assume normality,
clear that sometimes not the case
➢ Large samples will let us drop normality

119
The homoskedastic normal distribution with
a single explanatory variable
[Figure: at each value of x (for example x1 and x2), y is normally distributed
around the population regression line E(y|x) = b0 + b1x, with the same variance
at every x.]
120
NORMAL SAMPLING
DISTRIBUTIONS
Under the CLM assumptions, conditional on
the sample values of the independent
variables
β̂j ~ Normal(βj, Var(β̂j)), so that
(β̂j − βj) / sd(β̂j) ~ Normal(0, 1)
𝛽መj is distributed normally because it
is a linear combination of the errors

121
The t Test
Under the CLM assumptions
(β̂j − βj) / se(β̂j) ~ t(n−k−1)
Note this is a 𝑡 distribution (vs
normal)
because we have to estimate 𝜎 2 by 𝜎ො 2
Note the degrees of freedom:𝑛 − 𝑘 − 1

122
The t Test
➢ Knowing the sampling distribution for the
standardized estimator allows us to carry
out hypothesis tests
➢ Start with a null hypothesis
➢ For example, H0: bj=0
➢ If we fail to reject the null, we conclude that xj
has no effect on y, controlling for the other x's

123
The t Test
To perform our test we first need to form
"the" t statistic for β̂j:  t(β̂j) ≡ β̂j / se(β̂j)
We will then use our t statistic along with
a rejection rule to determine whether to
reject the null hypothesis, H0

124
t Test: ONE-SIDED ALTERNATIVES
➢ Besides our null, H0, we need an
alternative hypothesis, H1, and a
significance level
➢ H1 may be one-sided, or two-sided
◼ H1: bj > 0 and H1: bj < 0 are one-sided
◼ H1: bj  0 is a two-sided alternative
➢ If we want to have only a 5% probability
of rejecting H0 if it is really true, then we
say our significance level is 5% 125
ONE-SIDED ALTERNATIVES (cont)
➢ Having picked a significance level, , we
look up the (1 – )th percentile in a t
distribution with n – k – 1 df and call this c,
the critical value
➢ We can reject the null hypothesis if the t
statistic is greater than the critical value
➢ If the t statistic is less than the critical
value then we fail to reject the null

126
ONE-SIDED ALTERNATIVES (cont)

yi = b0 + b1xi1 + … + bkxik + ui

H0: bj = 0 H1: bj > 0

[Figure: t distribution for the one-sided test. Fail to reject H0 when t ≤ c
(area 1 − α); reject when t > c (area α), where c is the critical value.]
127
ONE-SIDED vs TWO-SIDED
➢ Because the t distribution is symmetric,
testing H1: bj < 0 is straightforward. The
critical value is just the negative of before
➢ We can reject the null if the t statistic < –c,
and if the t statistic > –c then we fail
to reject the null
➢ For a two-sided test, we set the critical
value based on α/2 and reject H0 in favour of H1: bj ≠ 0 if
the absolute value of the t statistic > c
128
TWO-SIDED ALTERNATIVES

yi = b0 + b1Xi1 + … + bkXik + ui

H0: bj = 0 H1: bj  0
fail to reject

reject reject
/2 (1 - ) /2
-c 0 c
129
SUMMARY FOR H0: bj = 0
➢ Unless otherwise stated, the alternative is
assumed to be two-sided
➢ If we reject the null, we typically say “xj is
statistically significant at the α% level”
➢ If we fail to reject the null, we typically
say “xj is statistically insignificant at the α% level”

130
COEFFICIENT HYPOTHESIS TEST:
EXAMPLE
. reg earnings hgc asvabc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 39.98
Model | 4745.74965 2 2372.87483 Prob > F = 0.0000
Residual | 33651.2874 567 59.3497133 R-squared = 0.1236
---------+------------------------------ Adj R-squared = 0.1205
Total | 38397.0371 569 67.4816117 Root MSE = 7.7039

------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .7390366 .1606216 4.601 0.000 .4235506 1.054523
asvabc | .1545341 .0429486 3.598 0.000 .0701764 .2388918
_cons | -4.624749 2.0132 -2.297 0.022 -8.578989 -.6705095
------------------------------------------------------------------------------

H0: β1 = 0    H1: β1 ≠ 0
t = b1/se(b1) = 0.739/0.161 = 4.60
|t| > t-critical and Prob(t) < 0.05, so we reject H0: HGC has a significant effect on Earnings at the 95%
confidence level. 131
COEFFICIENT HYPOTHESIS TEST:
EXAMPLE
. reg earnings hgc asvabc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 39.98
Model | 4745.74965 2 2372.87483 Prob > F = 0.0000
Residual | 33651.2874 567 59.3497133 R-squared = 0.1236
---------+------------------------------ Adj R-squared = 0.1205
Total | 38397.0371 569 67.4816117 Root MSE = 7.7039

------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | .7390366 .1606216 4.601 0.000 .4235506 1.054523
asvabc | .1545341 .0429486 3.598 0.000 .0701764 .2388918
_cons | -4.624749 2.0132 -2.297 0.022 -8.578989 -.6705095
------------------------------------------------------------------------------

H0: β2 = 0    H1: β2 ≠ 0
t = b2/se(b2) = 0.155/0.043 = 3.60
|t| > t-critical and Prob(t) < 0.05, so we reject H0: ASVABC has a significant effect on Earnings at the
95% confidence level. 132
CONFIDENCE INTERVALS
Another way to use classical statistical
testing is to construct a confidence interval
using the same critical value as was used for
a two-sided test
A (1 - ) % confidence interval is defined as
𝛼
𝛽መ𝑗 ± 𝑐 • 𝑠𝑒 𝛽መ𝑗 , where c is the 1− percentile
2
in a 𝑡𝑛−𝑘−1 distribution

133
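For instance, the 95% interval that Stata reports for the coefficient on HGC in the earlier earnings regression can be reproduced by hand. A sketch, assuming that data set is still in memory:

* Sketch: reproduce the 95% confidence interval for the coefficient on hgc
regress earnings hgc asvabc
display _b[hgc] - invttail(e(df_r), 0.025)*_se[hgc]   // lower limit
display _b[hgc] + invttail(e(df_r), 0.025)*_se[hgc]   // upper limit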
TESTING OTHER HYPOTHESES
A more general form of the t statistic
recognizes that we may want to test
something like H0: bj = aj
In this case, the appropriate t statistic is

t = (β̂j − aj) / se(β̂j), where
aj = 0 for the standard test

134
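As a hedged illustration with the same earnings regression, suppose we want to test H0: βhgc = 1 (aj = 1), i.e. that each extra year of schooling raises hourly earnings by one unit. The t statistic and two-sided p-value can be computed from the saved results, and Stata's test command gives the equivalent Wald test:

* Sketch: test H0: coefficient on hgc equals 1 against a two-sided alternative
regress earnings hgc asvabc
display (_b[hgc] - 1)/_se[hgc]                          // t statistic (b_hgc - 1)/se(b_hgc)
display 2*ttail(e(df_r), abs((_b[hgc] - 1)/_se[hgc]))   // two-sided p-value
test hgc = 1                                            // Wald F test; F = t squared here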
Computing p-values for t tests
➢ An alternative to the classical approach is
to ask, “what is the smallest significance
level at which the null would be
rejected?”
➢ So, compute the t statistic, and then look
up what percentile it is in the appropriate
t distribution – this is the p-value
➢ The p-value is the probability of observing
a t statistic at least as extreme as the one
we did, if the null were true 135
Stata and p-values, t tests, etc.
➢ Most computer packages will compute the
p-value for you, assuming a two-sided test
➢ If you really want a one-sided alternative,
just divide the two-sided p-value by 2
➢ Stata provides the t statistic, p-value, and
95% confidence interval for H0: bj = 0 for
you, in columns labeled “t”, “P > |t|” and
“[95% Conf. Interval]”, respectively

136
TESTING A LINEAR
COMBINATION
Suppose instead of testing whether b1 is
equal to a constant, you want to test if it is
equal to another parameter,
that is H0 : b1 = b2
Use same basic procedure for forming a t
statistic
t = (β̂1 − β̂2) / se(β̂1 − β̂2)

137
TESTING LINEAR COMBO
Since se(β̂1 − β̂2) = sqrt{Var(β̂1 − β̂2)}, and
Var(β̂1 − β̂2) = Var(β̂1) + Var(β̂2) − 2Cov(β̂1, β̂2), then
se(β̂1 − β̂2) = { [se(β̂1)]² + [se(β̂2)]² − 2s12 }^(1/2)
where s12 is an estimate of Cov(β̂1, β̂2)

138
TESTING A LINEAR COMBO
➢ So, to use formula, need s12, which
standard output does not have
➢ Many packages will have the option to get
it or will perform the test for you
➢ In Stata, after reg y x1 x2 … xk you would
type test x1 = x2 to get a p-value for the test
➢ More generally, you can always restate the
problem to get the test you want

139
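A hedged Stata sketch of both routes, with illustrative regressor names x1, x2, x3: test reports the F statistic for the restriction, while lincom reports the estimated difference b1 − b2 together with the standard error needed in the t formula above.

* Sketch: two ways to test H0: b1 = b2 after a regression of y on x1 ... xk
regress y x1 x2 x3
test x1 = x2            // F test of the single restriction (equals t squared)
lincom x1 - x2          // estimate of b1 - b2 with its standard error and t statistic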
EXAMPLE:
➢ Suppose you are interested in the effect of campaign
expenditures on outcomes
➢ Model is
voteA = b0+b1log(expendA)+b2log(expendB)+b3prtystrA + u
➢ H0: b1 = - b2, or H0: q1 = b1 + b2 = 0
b1 = q1 – b2, so substitute in and rearrange the model 
voteA = b0 + q1log(expendA) + b2[log(expendB) – log(expendA)] + b3prtystrA + u
140
EXAMPLE:
➢ This is the same model as originally, but
now you get a standard error for q1 = b1 + b2
directly from the basic regression
➢ Any linear combination of parameters could
be tested in a similar manner
➢ Other examples of hypotheses about a
single linear combination of parameters:
◼ b1 = 1 + b2 ; b1 = 5b2 ; b1 = -1/2b2 ; etc

141
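A hedged Stata sketch of the reparameterization, assuming a data set with the variables voteA, expendA, expendB, and prtystrA as in the example:

* Sketch: get a standard error for theta1 = b1 + b2 directly
gen lexpA = log(expendA)
gen lexpB = log(expendB)
gen ldiff = lexpB - lexpA                  // log(expendB) - log(expendA)
regress voteA lexpA ldiff prtystrA         // coefficient on lexpA is now theta1 = b1 + b2
* or, equivalently, after the original specification:
regress voteA lexpA lexpB prtystrA
lincom lexpA + lexpB                       // theta1 with its standard error and t statistic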
MULTIPLE LINEAR
RESTRICTIONS
➢ Everything we’ve done so far has involved
testing a single linear restriction (e.g., b1 = 0
or b1 = b2 )
➢ However, we may want to test multiple
hypotheses about our parameters jointly
➢ A typical example is testing “exclusion
restrictions” – we want to know if a group
of parameters are all equal to zero

142
TESTING EXCLUSION
RESTRICTIONS
➢ Now the null hypothesis might be
something like H0: bk-q+1 = 0, ... , bk = 0
➢ The alternative is just H1: H0 is not true
➢ Can’t just check each t statistic separately
because we want to know if the q
parameters are jointly significant at a given
level – it is possible for none to be
individually significant at that level

143
EXCLUSION RESTRICTIONS (cont)
To do the test, we need to estimate the “restricted
model” without xk-q+1,, …, xk included, as well as
the “unrestricted model” with all x’s included
Intuitively, we want to know if the change in SSR
is big enough to warrant the inclusion of xk-q+1,,
…, xk
F ≡ [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]
where r is restricted and ur is unrestricted 144
The F statistic
➢ The F statistic is always positive since the
SSR from the restricted model can’t be less
than the SSR from the unrestricted
➢ Essentially, the F statistic measures the
relative increase in SSR when moving from
the unrestricted to a restricted model
▪ q = number of restrictions, or dfr – dfur
▪ n – k – 1 = dfur
145
The F statistic
➢ To decide if the increase in SSR when we
move to a restricted model is “big enough”
to reject the exclusions, we need to know
about the sampling distribution of our F stat
➢ Not surprisingly, F ~ Fq,n-k-1, where q is
referred to as the numerator degrees of
freedom and n – k – 1 as the denominator
degrees of freedom

146
The F statistic

[Figure: F distribution f(F). If F > c, reject H0 at the α significance level
(upper-tail area α); fail to reject if F ≤ c (area 1 − α).]
147
The R² form of the F statistic
Because the SSR's may be large and unwieldy, an
alternative form of the formula is useful
We use the fact that SSR = SST(1 – R²) for any
regression, so we can substitute for SSRr and SSRur

F = [(R²ur − R²r)/q] / [(1 − R²ur)/(n − k − 1)]
where again r is restricted and ur is unrestricted

148
OVERALL SIGNIFICANCE
𝑦 = 𝛼 + 𝛽1 𝑥1 +. . . +𝛽𝑘 𝑥𝑘 + 𝑢
A special case of exclusion restrictions is to test
◼ H0: b1 = b2 =…= bk = 0
◼ H1: At least one of the b ≠ 0
Since the R2 from a model with only an intercept
will be zero, the F statistic is simply

F = [R²/k] / [(1 − R²)/(n − k − 1)]
149
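As a quick arithmetic check, the R² form reproduces the F statistic reported in the HGC regression output shown on the following slides (R² = 0.3700, k = 3, n = 570). A one-line Stata calculation:

display (0.3700/3)/((1 - 0.3700)/566)     // ≈ 110.8, matching F(3,566) in the output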
OVERALL SIGNIFICANCE
➢ In the multiple regression model, the roles of
the F and t tests differ.
▪ The F test tests the joint explanatory power
of the variables,
▪ while the t-tests test their explanatory
power individually.

150
F TESTS OF OVERALL SIGNIFICANCE
𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑀 + 𝛽3 𝐻𝐺𝐶𝐹 + 𝑢
𝐻0 : 𝛽1 = 𝛽2 = 𝛽3 = 0
Source | SS df MS Number of obs = 570
---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

F(k, n − k − 1) = [ESS/k] / [RSS/(n − k − 1)]     F(3,566) = (1278/3)/(2176/566) = 110.8
Hence, the F statistic is 110.8. All serious regression packages compute it for you
as part of the diagnostics in the regression output.
151
F TESTS OF OVERALL SIGNIFICANCE
𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑀 + 𝛽3 𝐻𝐺𝐶𝐹 + 𝑢
𝐻0 : 𝛽1 = 𝛽2 = 𝛽3 = 0
. reg hgc asvabc hgcm hgcf
Source | SS df MS Number of obs = 570
---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------
Fcrit, 0.1%(3,120) = 5.78        F(3,566) = (1278/3)/(2176/566) = 110.8
This result could have been anticipated because ASVABC and HGCF have highly
significant t statistics. So, we knew in advance that both b1 and b3 were non-zero.
152
F TESTS OF OVERALL SIGNIFICANCE
𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑀 + 𝛽3 𝐻𝐺𝐶𝐹 + 𝑢
𝐻0 : 𝛽1 = 𝛽2 = 𝛽3 = 0
. reg hgc asvabc hgcm hgcf
Source | SS df MS Number of obs = 570
---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------
Fcrit, 0.1%(3,120) = 5.78        F(3,566) = (1278/3)/(2176/566) = 110.8
It is unusual for the F statistic to be insignificant if some of the t statistics are
significant. In principle, it could happen, however. Suppose you ran a regression
with 40 explanatory variables, none being a true determinant of the dependent
variable. At a 5% significance level, you would expect about two of their t statistics
to appear significant purely by chance, while the F statistic for their joint
explanatory power would, in all likelihood, correctly be insignificant. 153
F TESTS OF OVERALL SIGNIFICANCE
𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑀 + 𝛽3 𝐻𝐺𝐶𝐹 + 𝑢
𝐻0 : 𝛽1 = 𝛽2 = 𝛽3 = 0
. reg hgc asvabc hgcm hgcf
Source | SS df MS Number of obs = 570
---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607
------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------
Fcrit, 0.1%(3,120) = 5.78        F(3,566) = (1278/3)/(2176/566) = 110.8
The opposite can easily happen, however. Suppose you have a multiple
regression model which is correctly specified and the R2 is high. You would
expect to have a highly significant F statistic.
However, if the explanatory variables are highly correlated and the model is
subject to severe multicollinearity, the standard errors of the slope coefficients
could all be so large that none of the t statistics is significant. 154
GENERAL LINEAR RESTRICTIONS
➢ The basic form of the F statistic will work
for any set of linear restrictions
➢ First estimate the unrestricted model and
then estimate the restricted model
➢ In each case, make a note of the SSR
➢ Imposing the restrictions can be tricky –
we will likely have to redefine variables
again

155
GENERAL LINEAR RESTRICTIONS
𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝑢 𝑅𝑆𝑆1
RSS1 > RSS2
𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝛽3 𝑥3 + 𝑢 𝑅𝑆𝑆2

𝐻0 : 𝛽2 = 𝛽3 = 0
𝐻1 : 𝛽2 ≠ 0 or 𝛽3 ≠ 0 or both 𝛽2 and 𝛽3 ≠ 0

We now come to the other F test of goodness of fit. This is a test of the joint
explanatory power of a group of variables when they are added to a regression
model.
For example, y may be written as a simple function of x1 in the original
specification. In the second, we add x2 and x3.
The null hypothesis for the F test is that neither x2 nor x3 belongs in the model.
The alternative hypothesis is that at least one does, perhaps both.
156
GENERAL LINEAR RESTRICTIONS
Restricted Equation: y = α + β1x1 + u   →  RSS1, R₁²
Full (Unrestricted) Equation: y = α + β1x1 + β2x2 + β3x3 + u   →  RSS2, R₂²
H0: β2 = β3 = 0
H1: β2 ≠ 0 or β3 ≠ 0 or both β2 and β3 ≠ 0

F = [(RSS1 − RSS2)/q] / [RSS2/(n − k − 1)] = [(R₂² − R₁²)/q] / [(1 − R₂²)/(n − k − 1)]

Since imposing the restrictions cannot improve the fit, RSS1 > RSS2 and R₂² > R₁².
q = number of restrictions,
k = number of explanatory variables in the unrestricted equation.
157
GENERAL LINEAR RESTRICTIONS
𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝛽3 𝑥3 + 𝑢
F = [(RSS1 − RSS2)/q] / [RSS2/(n − k − 1)] = [(R₂² − R₁²)/q] / [(1 − R₂²)/(n − k − 1)]
𝐻0 : 𝛽2 = 𝛽3 = 0
𝐻1 : 𝛽2 ≠ 0 or 𝛽3 ≠ 0 or both 𝛽2 and 𝛽3 ≠ 0

▪ For this F test and several others we will encounter, it is helpful to think of the F
statistic as having the structure indicated above.
▪ The “improvement” is the reduction in the residual sum of squares when the
change is made, in this case, when the group of new variables is added.
▪ The “cost” is the reduction in the number of degrees of freedom remaining after
making the change. In the present case, it is equal to the number of new variables
added because that number of new parameters is estimated.
▪ The "remaining unexplained" is the residual sum of squares after making the
change-improvement. The "degrees of freedom remaining" is the number of
degrees of freedom remaining after making the change. 158
GENERAL LINEAR RESTRICTIONS
. reg hgc asvabc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 284.89
Model | 1153.80864 1 1153.80864 Prob > F = 0.0000
Residual | 2300.43873 568 4.05006818 R-squared = 0.3340
---------+------------------------------ Adj R-squared = 0.3329
Total | 3454.24737 569 6.07073351 Root MSE = 2.0125

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1545378 .0091559 16.879 0.000 .1365543 .1725213
_cons | 5.770845 .4668473 12.361 0.000 4.853888 6.687803
------------------------------------------------------------------------------

We will illustrate the test with an educational attainment example.


Here is HGC regressed on ASVABC.
We make a note of the residual sum of squares.
159
GENERAL LINEAR RESTRICTIONS
. reg hgc asvabc hgcm hgcf

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

Now, we have added the highest grade completed by each parent.


Does parental education have a significant impact? A t-test would
show that HGCF has a highly significant coefficient, but we will
perform the F test anyway. We make a note of RSS.
160
GENERAL LINEAR RESTRICTIONS
. reg hgc asvabc RSS1

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 284.89
Model | 1153.80864 1 1153.80864 Prob > F = 0.0000
Residual | 2300.43873 568 4.05006818 R-squared = 0.3340
---------+------------------------------ Adj R-squared = 0.3329
Total | 3454.24737 569 6.07073351

. reg hgc asvabc hgcm hgcf


RSS2
Source | SS df MS Number of obs = 570
---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607
161
GENERAL LINEAR RESTRICTIONS
𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝑢 𝑅𝑆𝑆1

𝑦 = 𝛼 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝛽3 𝑥3 + 𝑢 𝑅𝑆𝑆2

𝐻0 : 𝛽2 = 𝛽3 = 0
𝐻1 : 𝛽2 ≠ 0 or 𝛽3 ≠ 0 or both 𝛽2 and 𝛽3 ≠ 0

F(2, 570 − 3 − 1) = [(RSS1 − RSS2)/2] / [RSS2/(570 − 3 − 1)] = [(2300.4 − 2176.0)/2] / [2176.0/566] = 16.18

𝐹crit,0.1% (2,120) = 7.32


The F statistic is 16.18.
The critical value of F(2,120) at the 0.1% level is 7.32. The critical value
of F(2,566) must be lower, so we reject H0 and conclude that the
parental education variables have significant joint explanatory power.

162
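The same joint test can be obtained directly from the unrestricted regression with Stata's test command, which should reproduce F ≈ 16.18 up to rounding (a sketch, assuming the data set used above is loaded):

regress hgc asvabc hgcm hgcf
test hgcm hgcf            // H0: the coefficients on hgcm and hgcf are both zero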
GENERAL LINEAR RESTRICTIONS
➢ The basic form of the F statistic will work
for any set of linear restrictions
➢ First estimate the unrestricted model and
then estimate the restricted model
➢ In each case, make a note of the SSR
➢ Imposing the restrictions can be tricky – we
will likely have to redefine variables again

163
TESTING A LINEAR RESTRICTION
An Example

HGC =  + b 1 ASVABC + b 2 HGCM + b 3 HGCF + u

It was argued that educational attainment might be


related to cognitive ability and family background, with
the mother's and father's educational attainment
proxying for the latter.

164
TESTING A LINEAR RESTRICTION
An Example
. reg hgc asvabc hgcm hgcf

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

However, when we run the regression on the data set, we
find that the mother's education coefficient is insignificant.

165
TESTING A LINEAR RESTRICTION
. reg hgc asvabc hgcm hgcf An Example
Source | SS df MS Number of obs = 570
---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------
. cor hgcm hgcf
(obs=570)
| hgcm hgcf
--------+------------------
hgcm| 1.0000
hgcf| 0.6391 1.0000

This might be due to multicollinearity, because the mother's and father's
education are highly correlated.
166
TESTING A LINEAR RESTRICTION
An Example

HGC =  + b 1 ASVABC + b 2 HGCM + b 3 HGCF + u

In the discussion of multicollinearity, several measures for


alleviating the problem were suggested, among them the
use of an appropriate theoretical restriction.

167
TESTING A LINEAR RESTRICTION
An Example
HGC =  + b 1 ASVABC + b 2 HGCM + b 3 HGCF + u
b3 = b2

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 (𝐻𝐺𝐶𝑀 + 𝐻𝐺𝐶𝐹) + 𝑢


= 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + 𝑢

HGCP = HGCM + HGCF


In particular, in the case of the present model, it was suggested that the
impact of parental education might be the same for both parents, that is,
that b2 and b3 might be equal.
If this is the case, the model may be rewritten as shown. We now have a
total parental education variable, HGCP, instead of separate variables for
mother's and father's education, and the multicollinearity caused by the
correlation between the latter has been eliminated.
168
TESTING A LINEAR RESTRICTION
An Example
. g hgcp = hgcm + hgcf

. reg hgc asvabc hgcp

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 166.22
Model | 1276.73764 2 638.368819 Prob > F = 0.0000
Residual | 2177.50973 567 3.84040517 R-squared = 0.3696
---------+------------------------------ Adj R-squared = 0.3674
Total | 3454.24737 569 6.07073351 Root MSE = 1.9597

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295653 .0099485 13.024 0.000 .1100249 .1491057
hgcp | .093741 .0165688 5.658 0.000 .0611973 .1262847
_cons | 4.823123 .4844829 9.955 0.000 3.871523 5.774724
------------------------------------------------------------------------------

Here is the regression with HGCP replacing HGCM and HGCF.

169
TESTING A LINEAR RESTRICTION
. reg hgc asvabc hgcm hgcf An Example
------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

. reg hgc asvabc hgcp

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295653 .0099485 13.024 0.000 .1100249 .1491057
hgcp | .093741 .0165688 5.658 0.000 .0611973 .1262847
_cons | 4.823123 .4844829 9.955 0.000 3.871523 5.774724
------------------------------------------------------------------------------

A comparison of the regressions reveals that the standard error of the


coefficient of HGCP is much smaller than those of HGCM and HGCF, and
consequently, its t statistic is higher. Its coefficient is a compromise between
HGCM and HGCF, as might be expected.
170
TESTING A LINEAR RESTRICTION
An Example
. reg hgc asvabc hgcm hgcf

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

. reg hgc asvabc hgcp

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295653 .0099485 13.024 0.000 .1100249 .1491057
hgcp | .093741 .0165688 5.658 0.000 .0611973 .1262847
_cons | 4.823123 .4844829 9.955 0.000 3.871523 5.774724
------------------------------------------------------------------------------

However, using a restriction will only lead to a gain in efficiency if the


restriction is valid. Its use will lead to biased coefficients and invalid
standard errors and tests if it is not valid.
171
TESTING A LINEAR RESTRICTION
An Example
. reg hgc asvabc hgcm hgcf

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcm | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .1102684 .0311948 3.535 0.000 .0489967 .1715401
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

. reg hgc asvabc hgcp

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295653 .0099485 13.024 0.000 .1100249 .1491057
hgcp | .093741 .0165688 5.658 0.000 .0611973 .1262847
_cons | 4.823123 .4844829 9.955 0.000 3.871523 5.774724
------------------------------------------------------------------------------

Do the coefficients of HGCM and HGCF in the unrestricted regression look


as if they satisfy the restriction? Not really, in this case. The coefficient of
HGCM is much smaller than that of HGCF, but it should be noted that the
standard errors are quite large.
172
TESTING A LINEAR RESTRICTION
An Example
. reg hgc asvabc hgcm hgcf

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

. reg hgc asvabc hgcp

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 166.22
Model | 1276.73764 2 638.368819 Prob > F = 0.0000
Residual | 2177.50973 567 3.84040517 R-squared = 0.3696
---------+------------------------------ Adj R-squared = 0.3674
Total | 3454.24737 569 6.07073351 Root MSE = 1.9597

We will now perform a proper test. The imposition of a restriction makes it


more difficult for the regression model to fit the data because there is one
fewer parameter to adjust. There will, therefore, be an increase in RSS
(and a decrease in R²) when the restriction is imposed.

173
TESTING A LINEAR RESTRICTION
. reg hgc asvabc hgcm hgcf An Example
Source | SS df MS Number of obs = 570
---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

. reg hgc asvabc hgcp

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 2, 567) = 166.22
Model | 1276.73764 2 638.368819 Prob > F = 0.0000
Residual | 2177.50973 567 3.84040517 R-squared = 0.3696
---------+------------------------------ Adj R-squared = 0.3674
Total | 3454.24737 569 6.07073351 Root MSE = 1.9597

If the restriction is valid, the deterioration in the fit should be a small, random
amount. However, if the restriction is invalid, the distortion caused by its
imposition will significantly deteriorate the fit.
In the present case, we can see that the increase in RSS is very small, so
we are unlikely to reject the restriction.
174
TESTING A LINEAR RESTRICTION
An Example

HGC =  + b 1 ASVABC + b 2 HGCM + b 3 HGCF + u

b3 = b2

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 (𝐻𝐺𝐶𝑀 + 𝐻𝐺𝐶𝐹) + 𝑢


= 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + 𝑢

HGCP = HGCM + HGCF

𝐻0 : 𝛽3 = 𝛽2 , 𝐻1 : 𝛽3 ≠ 𝛽2

The null hypothesis is that the restriction is valid, and the


alternative one is that it is invalid.

175
TESTING A LINEAR RESTRICTION
An Example
HGC =  + b 1 ASVABC + b 2 HGCM + b 3 HGCF + u
b3 = b2

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 (𝐻𝐺𝐶𝑀 + 𝐻𝐺𝐶𝐹) + 𝑢


= 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + 𝑢

HGCP = HGCM + HGCF


𝐻0 : 𝛽3 = 𝛽2 , 𝐻1 : 𝛽3 ≠ 𝛽2

F = [(RSSR − RSSU)/1] / [RSSU/(n − k − 1)] = (2177.51 − 2176.01) / (2176.01/566) = 0.39
The test statistic is a member of the family of F tests where the numerator
is the improvement in the fit on relaxing the restriction, divided by the cost
of relaxing it (one degree of freedom because one additional parameter
has to be estimated). 176
TESTING A LINEAR RESTRICTION
An Example
HGC =  + b 1 ASVABC + b 2 HGCM + b 3 HGCF + u
b3 = b2
𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 (𝐻𝐺𝐶𝑀 + 𝐻𝐺𝐶𝐹) + 𝑢
= 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + 𝑢

HGCP = HGCM + HGCF


𝐻0 : 𝛽3 = 𝛽2 , 𝐻1 : 𝛽3 ≠ 𝛽2

F = [(RSSR − RSSU)/1] / [RSSU/(n − k − 1)] = (2177.51 − 2176.01) / (2176.01/566) = 0.39
The denominator of the test statistic is RSS after improving (that is, RSS for
the unrestricted model), divided by the number of degrees of freedom
remaining.
The F statistic is 0.39. An F statistic below 1 is never significant (look at the
F table), so we do not reject H0. The restriction appears to be valid. At
least, it is not rejected by the data. 177
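The same restriction can also be tested without fitting the restricted model: after the unrestricted regression, Stata's test command gives the Wald version of this F statistic, which equals 0.39 up to rounding (a sketch, assuming the same data set is loaded):

regress hgc asvabc hgcm hgcf
test hgcm = hgcf          // H0: b2 = b3, reported as F(1,566)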
ALTERNATIVE TESTING PROCEDURE OF
A LINEAR RESTRICTION
An Example

HGC =  + b 1 ASVABC + b 2 HGCM + b 3 HGCF + u

b3 = b2
𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 (𝐻𝐺𝐶𝑀 + 𝐻𝐺𝐶𝐹) + 𝑢
= 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + 𝑢

Linear restrictions can also be tested using a t-test.


This involves writing down the model for the restricted version and
adding the term that would convert it back to the unrestricted version.
The test evaluates whether this additional term is needed.

178
TESTING A LINEAR RESTRICTION
An Example

HGC =  + b 1 ASVABC + b 2 HGCM + b 3 HGCF + u

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + 𝑢

0 = 𝛽2 𝐻𝐺𝐶𝑀 + 𝛽3 𝐻𝐺𝐶𝐹 − 𝛽2 𝐻𝐺𝐶𝑃


= 𝛽2 𝐻𝐺𝐶𝑀 + 𝛽3 𝐻𝐺𝐶𝐹 − 𝛽2 (𝐻𝐺𝐶𝑀 + 𝐻𝐺𝐶𝐹)
= (𝛽3 − 𝛽2 )𝐻𝐺𝐶𝐹

To find the conversion term, we write the restricted version of the


model under the unrestricted version and subtract.

179
TESTING A LINEAR RESTRICTION
An Example

HGC =  + b 1 ASVABC + b 2 HGCM + b 3 HGCF + u

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + 𝑢

0 = 𝛽2 𝐻𝐺𝐶𝑀 + 𝛽3 𝐻𝐺𝐶𝐹 − 𝛽2 𝐻𝐺𝐶𝑃


= β2HGCM + β3HGCF − β2(HGCM + HGCF)
= (𝛽3 − 𝛽2 )𝐻𝐺𝐶𝐹

We see that the term which converts the restricted model

back to the unrestricted one is (b3 - b2) HGCF.

180
TESTING A LINEAR RESTRICTION
An Example
HGC =  + b 1 ASVABC + b 2 HGCM + b 3 HGCF + u

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + 𝑢

0 = 𝛽2 𝐻𝐺𝐶𝑀 + 𝛽3 𝐻𝐺𝐶𝐹 − 𝛽2 𝐻𝐺𝐶𝑃


= β2HGCM + β3HGCF − β2(HGCM + HGCF)
= (𝛽3 − 𝛽2 )𝐻𝐺𝐶𝐹

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + (𝛽3 − 𝛽2 )𝐻𝐺𝐶𝐹 + 𝑢

We add this term to the restricted model and investigate


whether it is needed.

181
TESTING A LINEAR RESTRICTION
An Example
HGC =  + b 1 ASVABC + b 2 HGCM + b 3 HGCF + u

𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + 𝑢

0 = 𝛽2 𝐻𝐺𝐶𝑀 + 𝛽3 𝐻𝐺𝐶𝐹 − 𝛽2 𝐻𝐺𝐶𝑃


= β2HGCM + β3HGCF − β2(HGCM + HGCF)
= (𝛽3 − 𝛽2 )𝐻𝐺𝐶𝐹
𝐻𝐺𝐶 = 𝛼 + 𝛽1 𝐴𝑆𝑉𝐴𝐵𝐶 + 𝛽2 𝐻𝐺𝐶𝑃 + (𝛽3 − 𝛽2 )𝐻𝐺𝐶𝐹 + 𝑢

𝐻0 : 𝛽3 − 𝛽2 = 0, 𝐻1 : 𝛽3 − 𝛽2 ≠ 0
The null hypothesis is that the coefficient of the conversion term is 0, and the
alternative hypothesis is that it is different from 0.
Of course, the null hypothesis is that the restriction is valid. If it is valid, the
conversion term is unnecessary, and the restricted version adequately
represents the data. 182
TESTING A LINEAR RESTRICTION
An Example
. reg hgc asvabc hgcp hgcf

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcp | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .0408654 .0653386 0.625 0.532 -.0874704 .1692012
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

Here is the corresponding regression.


We see that the coefficient of HGCF is not significantly different from
zero, indicating that the term is not needed and that the restricted
version adequately represents the data.

183
TESTING A LINEAR RESTRICTION
. reg hgc asvabc hgcp hgcf An Example
Source | SS df MS Number of obs = 570
---------+------------------------------ F( 3, 566) = 110.83
Model | 1278.24153 3 426.080508 Prob > F = 0.0000
Residual | 2176.00584 566 3.84453329 R-squared = 0.3700
---------+------------------------------ Adj R-squared = 0.3667
Total | 3454.24737 569 6.07073351 Root MSE = 1.9607

------------------------------------------------------------------------------
hgc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
asvabc | .1295006 .0099544 13.009 0.000 .1099486 .1490527
hgcp | .069403 .0422974 1.641 0.101 -.013676 .152482
hgcf | .0408654 .0653386 0.625 0.532 -.0874704 .1692012
_cons | 4.914654 .5063527 9.706 0.000 3.920094 5.909214
------------------------------------------------------------------------------

It can be shown mathematically that the F and t-tests are equivalent.


The F statistic is the square of the t statistic, and the critical value of F is
the square of the critical value of t.

184
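In this example the equivalence is easy to verify from the two sets of output: the t statistic on HGCF in the reparameterized regression is 0.625, and 0.625² ≈ 0.39, which is the F statistic obtained from the RSS comparison. The critical values are related in the same way; for instance, with 566 degrees of freedom the 5% two-sided t critical value is about 1.96, and 1.96² ≈ 3.84 is the 5% critical value of F(1,566).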
MULTIPLE REGRESSION
ANALYSIS

y = b0 + b1x1 + b2x2 + . . . bkxk + u

MULTIPLE RESTRICTIONS
MULTIPLE RESTRICTIONS

𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + 𝛽4 𝑋4 + 𝛽5 𝑋5 + 𝑢

𝛽3 = 𝛽2 , 𝛽4 + 𝛽5 = 0

Multiple reparameterizations can test multiple restrictions.

Each one will result in one of the original parameters being dropped
and replaced by a test statistic for the restriction.

For example, suppose that we have the model and hypothetical


restrictions shown.

186
MULTIPLE RESTRICTIONS

𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + 𝛽4 𝑋4 + 𝛽5 𝑋5 + 𝑢

𝛽3 = 𝛽2 , 𝛽4 + 𝛽5 = 0

𝜃 = 𝛽3 − 𝛽2 , 𝜑 = 𝛽4 + 𝛽5

𝛽3 = 𝛽2 + 𝜃, 𝛽5 = 𝜑 − 𝛽4

We define the test statistics θ and φ.

The restrictions can be written θ = φ = 0.

We use these definitions to express one β parameter in terms of the
other β parameter and θ or φ.
187
MULTIPLE RESTRICTIONS

𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + 𝛽4 𝑋4 + 𝛽5 𝑋5 + 𝑢

𝛽3 = 𝛽2 , 𝛽4 + 𝛽5 = 0

𝜃 = 𝛽3 − 𝛽2 , 𝜑 = 𝛽4 + 𝛽5

𝛽3 = 𝛽2 + 𝜃, 𝛽5 = 𝜑 − 𝛽4

Y = β1 + β2X2 + (β2 + θ)X3 + β4X4 + (φ − β4)X5 + u
  = β1 + β2(X2 + X3) + β4(X4 − X5) + θX3 + φX5 + u
  = β1 + β2Z + β4W + θX3 + φX5 + u

We substitute into the model.


We bring the b2 components together.
We do the same for the b4 components. 188
MULTIPLE RESTRICTIONS
𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + 𝛽4 𝑋4 + 𝛽5 𝑋5 + 𝑢

𝛽3 = 𝛽2 , 𝛽4 + 𝛽5 = 0

𝜃 = 𝛽3 − 𝛽2 , 𝜑 = 𝛽4 + 𝛽5

𝛽3 = 𝛽2 + 𝜃, 𝛽5 = 𝜑 − 𝛽4

Y = β1 + β2X2 + (β2 + θ)X3 + β4X4 + (φ − β4)X5 + u
  = β1 + β2(X2 + X3) + β4(X4 − X5) + θX3 + φX5 + u
  = β1 + β2Z + β4W + θX3 + φX5 + u

Defining Z = X2 + X3 and W = X4 – X5,
we can rewrite the model as shown.

189
MULTIPLE RESTRICTIONS

𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + 𝛽4 𝑋4 + 𝛽5 𝑋5 + 𝑢

𝛽3 = 𝛽2 , 𝛽4 + 𝛽5 = 0

𝜃 = 𝛽3 − 𝛽2 , 𝜑 = 𝛽4 + 𝛽5

𝛽3 = 𝛽2 + 𝜃, 𝛽5 = 𝜑 − 𝛽4

Y = β1 + β2X2 + (β2 + θ)X3 + β4X4 + (φ − β4)X5 + u
  = β1 + β2(X2 + X3) + β4(X4 − X5) + θX3 + φX5 + u
  = β1 + β2Z + β4W + θX3 + φX5 + u

We can now test the restrictions by regressing Y on
Z, W, X3, and X5 and performing t-tests on the
coefficients of X3 and X5, where Z = X2 + X3 and W = X4 − X5.
190
MULTIPLE RESTRICTIONS
𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + 𝛽4 𝑋4 + 𝛽5 𝑋5 + 𝑢

𝛽3 = 𝛽2 , 𝛽4 + 𝛽5 = 0

𝜃 = 𝛽3 − 𝛽2 , 𝜑 = 𝛽4 + 𝛽5

𝛽3 = 𝛽2 + 𝜃, 𝛽5 = 𝜑 − 𝛽4

Y = β1 + β2X2 + (β2 + θ)X3 + β4X4 + (φ − β4)X5 + u
  = β1 + β2(X2 + X3) + β4(X4 − X5) + θX3 + φX5 + u
  = β1 + β2Z + β4W + θX3 + φX5 + u
Fit and save RSSU

𝑌 = 𝛽1 + 𝛽2 𝑍 + 𝛽4 𝑊 + 𝑢 Fit and save RSSR


We could also perform a joint test of the restrictions, hypothesizing H0: θ = φ = 0. This
would involve comparing the residual sum of squares with that obtained when fitting the
fully restricted model, in which Y depends only on Z and W. 191
MULTIPLE RESTRICTIONS
𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + 𝛽4 𝑋4 + 𝛽5 𝑋5 + 𝑢

𝛽3 = 𝛽2 , 𝛽4 + 𝛽5 = 0

𝜃 = 𝛽3 − 𝛽2 , 𝜑 = 𝛽4 + 𝛽5
β3 = β2 + θ,  β5 = φ − β4
F(2, n − k) = [(RSSR − RSSU)/2] / [RSSU/(n − k)]
Y = β1 + β2X2 + (β2 + θ)X3 + β4X4 + (φ − β4)X5 + u
  = β1 + β2(X2 + X3) + β4(X4 − X5) + θX3 + φX5 + u
  = β1 + β2Z + β4W + θX3 + φX5 + u
Fit and save RSSU
𝑌 = 𝛽1 + 𝛽2 𝑍 + 𝛽4 𝑊 + 𝑢 Fit and save RSSR
The test statistic would be as shown, where RSSU is the residual sum of squares in
the unrestricted model, RSSR is the residual sum of squares in the model with both
restrictions and k is the number of parameters in the original, unrestricted version.
192
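In practice, the two restrictions can also be tested jointly straight from the unrestricted regression. A hedged Stata sketch with illustrative variable names x2–x5 (a data set containing Y and X2–X5 is assumed in memory):

regress y x2 x3 x4 x5
test (x3 = x2) (x4 + x5 = 0)    // joint Wald F test of both restrictions, 2 numerator df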
