LECTURE 4
Introductory Econometrics
Hypothesis testing
October 25, 2011
ON THE PREVIOUS LECTURE
We have listed the classical assumptions of regression models:
model linear in parameters, explanatory variables linearly independent
(normally distributed) error term with zero mean and constant variance, no serial correlation
no correlation between error term and explanatory variables
We have shown that if these assumptions hold, the OLS estimator is
consistent
unbiased
efficient
normally distributed
ON TODAY'S LECTURE
We are going to discuss how hypotheses about coefficients can be tested in regression models
We will explain what significance of coefficients means
We will learn how to read regression output
QUESTIONS WE ASK
What conclusions can we draw from our regression?
What can we learn about the real world from a sample?
Is it likely that our results could have been obtained by
chance?
If our theory is correct, what are the odds that this
particular sample would have been observed?
HYPOTHESIS TESTING
We cannot prove that a given hypothesis is correct using
hypothesis testing
All that can be done is to state that a particular sample
conforms to a particular hypothesis
We can often reject a given hypothesis with a certain degree of confidence
In such a case, we conclude that it is very unlikely the
sample result would have been observed if the
hypothesized theory were correct
NULL AND ALTERNATIVE HYPOTHESES
First step in hypothesis testing: state explicitly the
hypothesis to be tested
Null hypothesis: statement of the range of values of the regression coefficient that would be expected to occur if the researcher's theory were not correct
Alternative hypothesis: specification of the range of values of the coefficient that would be expected to occur if the researcher's theory were correct
In other words: we define the null hypothesis as the result we do not expect
NULL AND ALTERNATIVE HYPOTHESES
Notation:
$H_0$ . . . null hypothesis
$H_A$ . . . alternative hypothesis
Examples:
One-sided test: $H_0: \beta \leq 0$ vs $H_A: \beta > 0$
Two-sided test: $H_0: \beta = 0$ vs $H_A: \beta \neq 0$
TYPE I AND TYPE II ERRORS
It would be unrealistic to think that conclusions drawn
from regression analysis will always be right
There are two types of errors we can make
Type I : We reject a true null hypothesis
Type II : We do not reject a false null hypothesis
Example:
$H_0: \beta = 0$
$H_A: \beta \neq 0$
Type I error: it holds that $\beta = 0$, but we conclude that $\beta \neq 0$
Type II error: it holds that $\beta \neq 0$, but we conclude that $\beta = 0$
TYPE I AND TYPE II ERRORS
Example:
$H_0$: The defendant is innocent
$H_A$: The defendant is guilty
Type I error = Sending an innocent person to jail
Type II error = Freeing a guilty person
Obviously, lowering the probability of Type I error means
increasing the probability of Type II error
In hypothesis testing, we focus on Type I error and we
ensure that its probability is not unreasonably large
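To make this concrete, here is a minimal Python sketch (an illustration added to these notes, not part of the original slides): it repeatedly tests a true null hypothesis and counts how often it is rejected. The rejection frequency settles near the chosen 5% probability of a Type I error. The sample size, seed, and use of scipy are arbitrary choices.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n, n_sim = 50, 10_000
    crit = stats.t.ppf(0.975, df=n - 1)        # two-sided 5% critical value

    rejections = 0
    for _ in range(n_sim):
        x = rng.normal(size=n)                 # H_0 is true: the population mean is 0
        t_stat = x.mean() / (x.std(ddof=1) / np.sqrt(n))
        if abs(t_stat) > crit:                 # statistic falls into the rejection region
            rejections += 1

    print(rejections / n_sim)                  # close to 0.05, the probability of a Type I error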
DECISION RULE
A sample statistic must be calculated that allows the null hypothesis to be rejected or not, depending on the magnitude of that sample statistic compared with a preselected critical value found in tables
The critical value divides the range of possible values of the statistic into two regions: the acceptance region and the rejection region
The idea is that if the value of the coefficient is as stated under $H_0$, the value of the sample statistic should be unlikely to fall into the rejection region
If the value of the sample statistic falls into the rejection region, we reject $H_0$
ONE-SIDED REJECTION REGION
$H_0: \beta \leq 0$ vs $H_A: \beta > 0$
[Figure: sampling distribution of $\hat\beta$, with the acceptance region to the left of the critical value, the rejection region in the right tail, and the shaded tail area equal to the probability of a Type I error]
TWO-SIDED REJECTION REGION
$H_0: \beta = 0$ vs $H_A: \beta \neq 0$
[Figure: sampling distribution of $\hat\beta$, with the acceptance region in the middle, rejection regions in both tails, and the two shaded tail areas together equal to the probability of a Type I error]
THE t-TEST
We use the t-test to test hypotheses about individual regression slope coefficients
Tests of more than one coefficient at a time (joint hypotheses) are typically done with the F-test (see next lecture)
The t-test is appropriate to use when the stochastic error term is normally distributed and when the variance of that distribution must be estimated
The t-test accounts for differences in the units of measurement of the variables
THE t-TEST
Consider the model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
Suppose we want to test ($b$ is some constant)
$H_0: \beta_1 = b$ vs $H_A: \beta_1 \neq b$
We know from the last lecture that $\hat\beta_1 \sim N\!\left(\beta_1, \mathrm{Var}(\hat\beta_1)\right)$, so that
$\frac{\hat\beta_1 - \beta_1}{\sqrt{\mathrm{Var}(\hat\beta_1)}} \sim N(0, 1)\,,$
where $\mathrm{Var}(\hat\beta_1)$ is an element of the covariance matrix of $\hat\beta$
THE t-TEST
$\mathrm{Var}(\hat\beta) = \begin{pmatrix} \mathrm{Var}(\hat\beta_0) & \mathrm{Cov}(\hat\beta_0, \hat\beta_1) & \mathrm{Cov}(\hat\beta_0, \hat\beta_2) \\ \mathrm{Cov}(\hat\beta_1, \hat\beta_0) & \mathrm{Var}(\hat\beta_1) & \mathrm{Cov}(\hat\beta_1, \hat\beta_2) \\ \mathrm{Cov}(\hat\beta_2, \hat\beta_0) & \mathrm{Cov}(\hat\beta_2, \hat\beta_1) & \mathrm{Var}(\hat\beta_2) \end{pmatrix} = \sigma^2 (X'X)^{-1}$

$\mathrm{Var}(\hat\beta_1) = \left[ \sigma^2 (X'X)^{-1} \right]_{22}$
THE t-TEST
Problem: we do not know the value of the parameter $\sigma^2$
It has to be estimated as
$\hat\sigma^2 := s^2 = \frac{e'e}{n - k}\,,$
where $k$ is the number of regression coefficients (here $k = 3$)
It can be shown that
$\frac{(n - k)\, s^2}{\sigma^2} \sim \chi^2_{n-k}$
We denote the standard error of $\hat\beta_1$ (the sample counterpart of the standard deviation of $\hat\beta_1$)
$\mathrm{s.e.}(\hat\beta_1) = \sqrt{\left[ s^2 (X'X)^{-1} \right]_{22}}$
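A hedged NumPy sketch of these formulas, using simulated data (the coefficients and sample size are made up for illustration): it computes the OLS estimate, the residual variance estimate $s^2$, and the standard error of $\hat\beta_1$ from the $(X'X)^{-1}$ matrix. The "22" element of the slides is index [1, 1] in zero-based NumPy indexing.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 526, 3                                  # observations, regression coefficients
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    y = 1.0 + 0.5 * x1 - 0.2 * x2 + rng.normal(size=n)

    X = np.column_stack([np.ones(n), x1, x2])      # design matrix with a constant
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                   # OLS estimate
    e = y - X @ beta_hat                           # residuals
    s2 = e @ e / (n - k)                           # s^2 = e'e / (n - k)
    se_beta1 = np.sqrt(s2 * XtX_inv[1, 1])         # s.e. of beta_1: the "22" element, 0-based [1, 1]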
THE t-TEST
We define the t-statistic
$t := \frac{\left( \hat\beta_1 - b \right) \big/ \sqrt{\left[ \sigma^2 (X'X)^{-1} \right]_{22}}}{\sqrt{\frac{(n-k)\, s^2}{\sigma^2} \cdot \frac{1}{n-k}}} \sim \frac{N(0, 1)}{\sqrt{\chi^2_{n-k} / (n-k)}} = t_{n-k}$

$t = \frac{\hat\beta_1 - b}{\sqrt{\left[ s^2 (X'X)^{-1} \right]_{22}}} = \frac{\hat\beta_1 - b}{\mathrm{s.e.}(\hat\beta_1)}$

This statistic depends only on the estimate $\hat\beta_1$ and our hypothesis about $\beta_1$, and it has a known distribution
TWO-SIDED t-TEST
Our hypothesis is
$H_0: \beta_1 = b$ vs $H_A: \beta_1 \neq b$
Hence, our t-statistic is
$t = \frac{\hat\beta_1 - b}{\mathrm{s.e.}(\hat\beta_1)}$
We set the probability of Type I error to 5%
We say the significance level of the test is 5%, or that we have a test at the 95% confidence level
We compare our statistic to the critical values $t_{n-k,\,0.975}$ and $t_{n-k,\,0.025}$ (note that $t_{n-k,\,0.025} = -t_{n-k,\,0.975}$)
TWO-SIDED t-TEST
[Figure: density of the $t_{n-k}$ distribution with rejection regions of 2.5% in each tail, bounded by the critical values $t_{n-k,\,0.025}$ and $t_{n-k,\,0.975}$; significance level 5%]
We reject $H_0$ if $|t| > t_{n-k,\,0.975}$
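As an illustrative sketch (not from the slides), the two-sided decision rule in Python, using scipy for the critical value; the numbers are the education coefficient from the wage example later in this lecture.

    from scipy import stats

    beta_hat, b, se = 0.644272, 0.0, 0.0538061    # estimate, hypothesized value, standard error
    n, k = 526, 3
    t = (beta_hat - b) / se                       # t-statistic
    crit = stats.t.ppf(0.975, df=n - k)           # critical value t_{n-k, 0.975}
    print(t, crit, abs(t) > crit)                 # reject H_0: |t| ≈ 11.97 > crit ≈ 1.96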
ONE-SIDED t-TEST
Suppose our hypothesis is
$H_0: \beta_1 \leq b$ vs $H_A: \beta_1 > b$
Our t-statistic is still
$t = \frac{\hat\beta_1 - b}{\mathrm{s.e.}(\hat\beta_1)}$
We set the probability of Type I error to 5%
We compare our statistic to the critical value $t_{n-k,\,0.95}$
ONE-SIDED t-TEST
[Figure: density of the $t_{n-k}$ distribution with a rejection region of 5% in the right tail, bounded by the critical value $t_{n-k,\,0.95}$; significance level 5%]
We reject $H_0$ if $t > t_{n-k,\,0.95}$
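The one-sided rule differs only in the critical value; a minimal sketch, reusing the t-statistic from the two-sided sketch above:

    from scipy import stats

    t, n, k = 11.97, 526, 3                       # t-statistic from the two-sided sketch above
    crit = stats.t.ppf(0.95, df=n - k)            # one-sided critical value t_{n-k, 0.95}
    print(t > crit)                               # True: reject H_0: beta_1 <= b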
SIGNIFICANCE OF THE COEFFICIENT
The most common test performed in regression is
$H_0: \beta = 0$ vs $H_A: \beta \neq 0$
with the t-statistic
$t = \frac{\hat\beta}{\mathrm{s.e.}(\hat\beta)} \sim t_{n-k}$
If we reject $H_0: \beta = 0$, we say the coefficient is significant
This t-statistic (and the corresponding p-value) are displayed in most regression outputs
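A small sketch of how the reported p-value is obtained (an illustration added to these notes); the numbers are the experience coefficient from the example that follows, and the p-value matches the regression output:

    from scipy import stats

    beta_hat, se, n, k = 0.0700954, 0.0109776, 526, 3
    t = beta_hat / se                              # t-statistic for H_0: beta = 0
    p_value = 2 * stats.t.sf(abs(t), df=n - k)     # two-sided p-value
    print(t, p_value)                              # t ≈ 6.39, p ≈ 3.8e-10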
EXAMPLE
Let us study the impact of years of education on wages:
$\mathrm{wage} = \beta_0 + \beta_1\,\mathrm{education} + \beta_2\,\mathrm{experience} + \varepsilon$
Output from Gretl:

Model 3: OLS, using observations 1-526
Dependent variable: wage

             coefficient   std. error   t-ratio    p-value
  --------------------------------------------------------
  const      -3.39054      0.766566     -4.423     1.18e-05  ***
  educ        0.644272     0.0538061    11.97      2.28e-29  ***
  exper       0.0700954    0.0109776     6.385     3.78e-10  ***

Mean dependent var   5.896103   S.D. dependent var   3.693086
Sum squared resid    5548.160   S.E. of regression   3.257044
R-squared            0.225162   Adjusted R-squared   0.222199
F(2, 523)            75.98998   P-value(F)           1.07e-29
Log-likelihood      -1365.969   Akaike criterion     2737.937
Schwarz criterion    2750.733   Hannan-Quinn         2742.948
EXAMPLE
Output from Stata:

      Source |       SS       df       MS         Number of obs =     526
-------------+------------------------------      F(  2,   523) =   75.99
       Model |   1612.2545     2  806.127251      Prob > F      =  0.0000
    Residual |  5548.15979   523  10.6083361      R-squared     =  0.2252
-------------+------------------------------      Adj R-squared =  0.2222
       Total |  7160.41429   525  13.6388844      Root MSE      =   3.257

        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
        educ |   .6442721   .0538061    11.97   0.000     .5385695    .7499747
       exper |   .0700954   .0109776     6.39   0.000     .0485297    .0916611
       _cons |  -3.390539   .7665661    -4.42   0.000    -4.896466   -1.884613
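Such an output can be reproduced, for instance, with Python's statsmodels; a minimal sketch, assuming the wage data are available as a CSV file with columns wage, educ, and exper (the file name is hypothetical):

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("wage1.csv")                 # hypothetical file with columns wage, educ, exper
    model = smf.ols("wage ~ educ + exper", data=df).fit()
    print(model.summary())                        # coefficients, std. errors, t-ratios, p-values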
CONFIDENCE INTERVAL
A 95% confidence interval of $\beta$ is an interval centered around $\hat\beta$ such that $\beta \in (\hat\beta - c,\ \hat\beta + c)$ with probability 95%:
$P\left( \hat\beta - c < \beta < \hat\beta + c \right) = P\left( -\frac{c}{\mathrm{s.e.}(\hat\beta)} < \frac{\hat\beta - \beta}{\mathrm{s.e.}(\hat\beta)} < \frac{c}{\mathrm{s.e.}(\hat\beta)} \right) = 0.95$
Since $\frac{\hat\beta - \beta}{\mathrm{s.e.}(\hat\beta)} \sim t_{n-k}$, we derive the confidence interval:
$\hat\beta \pm t_{n-k,\,0.975} \cdot \mathrm{s.e.}(\hat\beta)$
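A minimal sketch of this computation (added illustration), using the educ coefficient from the example; the result matches the 95% interval in the Stata output:

    from scipy import stats

    beta_hat, se, n, k = 0.6442721, 0.0538061, 526, 3
    c = stats.t.ppf(0.975, df=n - k) * se         # half-width: t_{n-k, 0.975} * s.e.
    print(beta_hat - c, beta_hat + c)             # ≈ (0.5386, 0.7500)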
SUMMARY
We discussed the principle of hypothesis testing
We derived the t-statistic
We defined the concept of the p-value
We explained what significance of a coefficient means
We went through an example of regression output
TO BE CONTINUED . . . :)
Next exercise session:
revision of the t-test
introduction to statistical software (hopefully)
Next lecture:
testing of multiple linear restrictions
assessing the goodness of fit ($R^2$)
Home assignment:
to be submitted on the next lecture