12/18/2024
Extending the Multiple Regression
Model
ECN326: Basic Econometrics
Dr Arshad Ali Bhatti
Spring 2019
Dr Arshad Ali Bhatti/ Spring 2019 12 - 1
Introduction
• Sometimes we need to include explanatory variables in our
regression model that are qualitative in nature
• Dummy variables allow us to incorporate such variables into
the regression model
• This lecture shows how dummy variables can be used to
– Allow the intercept in the regression model to differ across different
groups in the sample
– Allow the slope(s) in a regression model to vary across groups
– Perform hypothesis tests on several coefficients jointly in the
regression model
Dummy Explanatory Variables
• Allow the incorporation of qualitative variables into
the regression model
• For example
– Gender—a qualitative variable taking on only two values:
male or female
• A dummy variable will convert gender into a quantitative variable,
perhaps taking on the value 0 for men and 1 for women
• The categories behind a dummy variable must be mutually
exclusive and exhaustive
– It must be possible to assign each observation exactly one
value of the dummy variable
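As a minimal sketch of the coding step, the snippet below converts a qualitative gender label into a 0/1 dummy. The sample labels and the helper name `to_dummy` are illustrative assumptions, not part of the lecture; the error branch enforces the requirement that every observation can be assigned exactly one value.

```python
# Hypothetical sample; labels and names are illustrative only.
genders = ["male", "female", "female", "male", "female"]


def to_dummy(label: str) -> int:
    """Return 1 for a woman and 0 for a man; reject any other label."""
    mapping = {"male": 0, "female": 1}
    if label not in mapping:
        # The categories must be exhaustive: no observation may be left uncoded.
        raise ValueError(f"cannot assign a dummy value to {label!r}")
    return mapping[label]


gender_dummy = [to_dummy(g) for g in genders]
print(gender_dummy)
```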
Defining and Interpreting Dummy Variables
• Dummy variables allow the intercept of the
regression line to vary for different groups in
the population
– The underlying grouping variable is qualitative in nature
• Examples include race, gender, union status, region of
country, bank ownership (private or public), etc.
Defining and Interpreting Dummy Variables
• Consider the following regression equation
that relates earnings to gender
Earningsi = β0 + β1Genderi + εi
where Gender = 1 for a woman and Gender = 0 for a man
Defining and Interpreting Dummy Variables
• How should the coefficients in this simple
model be interpreted?
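– If we take the expected value of the equation for
women (i.e., Gender = 1) we have the following
conditional expectation:
E[Earningsi | Gender = 1] = β0 + β1
where β0 is the intercept and β1 the coefficient on Gender,
giving us the mean earnings for women
– For men (i.e., Gender = 0) it is
E[Earningsi | Gender = 0] = β0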
Defining and Interpreting Dummy Variables
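• Subtracting the mean male earnings from the
mean female earnings gives us the difference
in mean earnings between women and men
E[Earningsi | Gender = 1] − E[Earningsi | Gender = 0] = (β0 + β1) − β0 = β1
– The coefficient on the dummy variable is therefore the
difference in mean earnings between the two groups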
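The equivalence between the dummy coefficient and the difference in group means can be checked numerically. The sketch below uses made-up earnings figures (an assumption for illustration, not data from the lecture) and ordinary least squares via `numpy.linalg.lstsq`.

```python
import numpy as np

# Made-up earnings data for illustration only.
earnings = np.array([30.0, 34.0, 38.0, 25.0, 29.0, 27.0])
gender = np.array([0, 0, 0, 1, 1, 1])  # 1 = woman, 0 = man

# Design matrix: a column of ones (intercept) and the gender dummy.
X = np.column_stack([np.ones_like(earnings), gender])
b0, b1 = np.linalg.lstsq(X, earnings, rcond=None)[0]

mean_men = earnings[gender == 0].mean()
mean_women = earnings[gender == 1].mean()

print(b0, mean_men)               # intercept vs. mean earnings of men
print(b1, mean_women - mean_men)  # dummy coefficient vs. difference in means
```

With a single dummy regressor, the estimated intercept reproduces the sample mean of the group coded 0 and the slope reproduces the gap between the two sample means.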
Defining and Interpreting Dummy Variables
• Let’s enhance the previous example by adding Education
to the equation
• A scatter plot of men’s and women’s wages in relation to
their level of education is shown below
[Figure 1: scatter plot of earnings against education for men and women]
Defining and Interpreting Dummy Variables
• The regression equation is
Earningsi = β0 + β1Genderi + β2Educationi + εi
– β1 measures the difference in mean
earnings between women and men,
holding education constant
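Taking conditional expectations for each group makes the shift in intercept explicit. The following is a short worked derivation, assuming the model is Earnings = β0 + β1·Gender + β2·Education + ε with the error term having zero mean given Gender and Education:

```latex
\begin{align*}
E[\text{Earnings}_i \mid \text{Gender}_i = 0,\ \text{Education}_i]
  &= \beta_0 + \beta_2\,\text{Education}_i \\
E[\text{Earnings}_i \mid \text{Gender}_i = 1,\ \text{Education}_i]
  &= (\beta_0 + \beta_1) + \beta_2\,\text{Education}_i
\end{align*}
```

The two lines share the slope β2, so they are parallel, and the vertical distance between them is β1 at every level of education.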
Interaction Variables
• We may wish to allow the slope coefficient(s)
to vary across groups as well
– Done via an interaction variable (or term)
Interaction Variables
• In the previous example
– β1, the coefficient on Gender, measures the difference in
earnings between women and men, holding education constant
– β2, the slope coefficient on education, measures the
increment in earnings resulting from an additional year of
schooling
• The return to education is assumed to be the same for men and
women
• But this may not be true; Figure 2 may be more representative of
the real world
[Figure 2: earnings against education, with different slopes for men and women]
Interaction Variables
• An interaction variable or term
– Will capture any interaction between gender and
the impact of education on earnings
– Included via a new variable in the model
• The education variable multiplied by the gender
variable
Earningsi = β0 + β1Genderi + β2Educationi + β3(Educationi × Genderi) + εi
where β3 represents the difference in the slope between
men and women
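A minimal sketch of the interaction model on simulated data follows. All numbers (sample size, true coefficients, noise) are assumptions chosen for illustration. With Gender = 1 for women, the estimated slope on education is β2 for men and β2 + β3 for women.

```python
import numpy as np

# Simulated data; every number here is an illustrative assumption.
rng = np.random.default_rng(0)
n = 500
education = rng.uniform(8, 20, size=n)
gender = rng.integers(0, 2, size=n)  # 1 = woman, 0 = man
noise = rng.normal(0, 1, size=n)

# Simulated truth: slope 2.0 for men, 2.0 - 0.5 = 1.5 for women.
earnings = 5.0 - 1.0 * gender + 2.0 * education - 0.5 * education * gender + noise

# Columns: intercept, dummy, education, interaction (education x gender).
X = np.column_stack([np.ones(n), gender, education, education * gender])
b0, b1, b2, b3 = np.linalg.lstsq(X, earnings, rcond=None)[0]

slope_men = b2          # return to education when Gender = 0
slope_women = b2 + b3   # return to education when Gender = 1
print(slope_men, slope_women)
```

Because this model includes both the dummy and its interaction with the only other regressor, the two group-specific slopes coincide with the slopes from separate regressions on men and on women.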
Interaction Variables
• Dummy variables allow the intercept to differ across groups
• Interaction terms allow the slope to differ across groups
• A model might contain
– Neither dummy variables nor interaction terms
• All groups have the same intercept and slopes
– Both dummy variables and interaction terms
• Intercept and slopes vary across groups
– Only dummy variables
• Only the intercept varies across groups
– Only interaction terms
• Only the slopes vary across groups
• You may interact a dummy variable with some, but not all,
explanatory variables
– Only the interacted variables are allowed to have a different effect, by
group, on the dependent variable
Experimental Vs Observational Studies
• Dummy variables can summarize the key differences
between experimental and observational studies
• Experimental studies may involve a treatment group
and a control group
– A dummy variable labels each observation as belonging to
the treatment (experimental) group or the control group
• Observational studies are those in which differences between
groups of interest are observed rather than assigned by the researcher
Dummy Variables When There Are More
Than Two Groups
• The model can be extended to the case where
the qualitative variable takes on more than
two possible values
• Consider a model relating the earnings of a
person (Y) to the education of that person’s
parents
Dummy Variables When There Are More
Than Two Groups
• We assume the education of the parents is
classified into four categories
• Creating three dummy variables will
incorporate the parents’ education into the
model
Dummy Variables When There Are More
Than Two Groups
• Only three dummy variables are created even though
there are four categories
– Allows us to estimate one intercept for each group
• If a qualitative variable assumes J outcomes, J − 1
dummy variables are included in the model
– Knowing a person is not in one of the J − 1 categories tells
us they must be in the Jth category
– The Jth category is redundant: including all J dummy
variables together with the intercept would create perfect
multicollinearity (the dummy variable trap)
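The dummy variable trap can be seen directly in the rank of the design matrix. The sketch below uses four hypothetical category codes (an assumption for illustration): with an intercept and all J dummies the columns are linearly dependent, because the dummies sum to the column of ones, while dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical category codes 0..3 for eight observations.
category = np.array([0, 1, 2, 3, 0, 1, 2, 3])
J = 4

dummies = np.eye(J)[category]            # one 0/1 column per category
intercept = np.ones((len(category), 1))

X_trap = np.hstack([intercept, dummies])        # intercept + J dummies
X_ok = np.hstack([intercept, dummies[:, :-1]])  # intercept + (J - 1) dummies

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print(X_trap.shape[1], rank_trap)  # 5 columns but rank 4: perfect multicollinearity
print(X_ok.shape[1], rank_ok)      # 4 columns and rank 4: estimable
```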
Dummy Variables When There Are More
Than Two Groups
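• Now, we can structure the above model as
follows:
Yi = β0 + β1D1i + β2D2i + β3D3i + εi
where D1i, D2i and D3i are the three dummy variables for
the parents’ education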
Dummy Variables When There Are More
Than Two Groups
• A dummy variable indicating those parents
with college education or higher was left out
• Any group might have been omitted
• But the interpretation of the regression
coefficients is affected by the omitted group
Dummy Variables When There Are More
Than Two Groups
• The coefficient on each included dummy
variable measures the difference in the mean of Y
between the corresponding category and
the excluded category
• The results are always compared to the one
category that is omitted
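To illustrate how the omitted group changes the interpretation, the sketch below fits the same made-up data twice, once omitting group 3 and once omitting group 0. The data, the group codes, and the helper name `fit_with_omitted` are assumptions for illustration; the group means are 21, 26, 32 and 42.

```python
import numpy as np

# Made-up earnings for four hypothetical groups coded 0..3.
y = np.array([20.0, 22.0, 25.0, 27.0, 30.0, 34.0, 40.0, 44.0])
group = np.array([0, 0, 1, 1, 2, 2, 3, 3])


def fit_with_omitted(omitted):
    """Regress y on an intercept and dummies for every group except `omitted`."""
    kept = [g for g in range(4) if g != omitted]
    columns = [np.ones(len(y))] + [(group == g).astype(float) for g in kept]
    X = np.column_stack(columns)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return dict(zip(["intercept"] + kept, beta))


base3 = fit_with_omitted(3)  # intercept = mean of group 3
base0 = fit_with_omitted(0)  # intercept = mean of group 0
print(base3)
print(base0)
```

In both fits the intercept equals the mean of the omitted group and each dummy coefficient equals that group's mean minus the omitted group's mean, so the fitted values are identical while the reported coefficients differ.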
Hypothesis Tests on Several Regression
Coefficients: F Tests
• We have learned to perform hypothesis tests on
individual regression coefficients
– For example, in this model [Yi = β0 + β1X1i + β2X2i + β3X3i + εi]
we could test whether
• β1 = 0 or β3 = 5
• This section shows how to test more complicated
hypotheses
– For example, whether both β1 = 0 and β3 = 0 or whether
β1 = 0 and β3 = 5 jointly
– Called F tests
Joint Tests on Several Regression
Coefficients
• Suppose we have this regression model
Yi = β0 + β1X1i + β2X2i + β3X3i + εi
• To test whether both β1 = 0 and β3 = 0, the null
and alternative hypotheses are specified as
follows:
H0: β1 = 0 and β3 = 0
H1: H0 is not true
– H1 indicates only that the null hypothesis is false, without
necessarily indicating why it is false
– The null hypothesis is very specific in indicating that both
coefficients are equal to zero
Joint Tests on Several Regression
Coefficients
• We then form the restricted
regression: the model that holds if the null is true
– Imposes the null hypothesis under consideration
– For H0: β1 = 0 and β3 = 0 it is Yi = β0 + β2X2i + εi
• An unrestricted model does not impose the
restrictions embodied in the null hypothesis
– Does not restrict the coefficients in any way; it is
given by the original model
Joint Tests on Several Regression
Coefficients
• Does imposing the null hypothesis have much
of an impact on how well the model fits the
data?
– If the null hypothesis is true, both models should
“fit” the data about equally well
• Even if the null hypothesis were true, the unrestricted
model would fit slightly better because its extra
coefficients capture random variation in the sample
• But does the unrestricted model provide a sufficiently
better fit that we are willing to reject the null
hypothesis?
Joint Tests on Several Regression
Coefficients
• If the null hypothesis is true
– The residual sum of squares (RSS) and the R² would be
nearly the same in the restricted and unrestricted models
– In any given sample they differ somewhat: the restricted
RSS is never smaller than the unrestricted RSS
• A relevant test statistic compares the RSS or the
R² of the two models to determine whether the difference
is large enough to be statistically significant
Joint Tests on Several Regression
Coefficients
• To test for statistical significance, the statistic is
F = [(RSSrestricted − RSSunrestricted) / q] / [RSSunrestricted / (n − k − 1)]
Where
RSSunrestricted = the sum of squared residuals for
the unrestricted regression
RSSrestricted = the sum of squared residuals for
the restricted regression
q = the number of restrictions
n − k − 1 = the number of observations minus the number of
estimated coefficients in the unrestricted regression
(k explanatory variables plus the intercept)
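The RSS form of the statistic, F = [(RSSrestricted − RSSunrestricted) / q] / [RSSunrestricted / (n − k − 1)], can be sketched on simulated data. The data-generating values below are assumptions for illustration; the null being tested is β1 = 0 and β3 = 0, so q = 2 and the restricted model drops X1 and X3.

```python
import numpy as np


def rss(X, y):
    """Sum of squared residuals from an OLS fit of y on the columns of X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)


# Simulated data; all numbers are illustrative assumptions.
rng = np.random.default_rng(1)
n, k, q = 124, 3, 2
X1, X2, X3 = rng.normal(size=(3, n))
# beta1 = 0.5 in the simulation, so the null is false by construction.
y = 1.0 + 0.5 * X1 + 2.0 * X2 + 0.0 * X3 + rng.normal(size=n)

ones = np.ones(n)
X_unres = np.column_stack([ones, X1, X2, X3])  # original model
X_res = np.column_stack([ones, X2])            # imposes beta1 = 0 and beta3 = 0

rss_u = rss(X_unres, y)
rss_r = rss(X_res, y)
F = ((rss_r - rss_u) / q) / (rss_u / (n - k - 1))
print(rss_r, rss_u, F)
```

The restricted RSS can never fall below the unrestricted RSS, so the statistic is non-negative by construction.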
Joint Tests on Several Regression
Coefficients
• We can convert the format of the F-statistic to contain the R²
F = [(R²unrestricted − R²restricted) / q] / [(1 − R²unrestricted) / (n − k − 1)]
• F will be close to zero when the null is true,
because R²restricted ≈ R²unrestricted
• Large values of F provide
evidence favoring H1
• Under the null, F follows the F distribution with q and n − k − 1
degrees of freedom
• Compare the computed statistic F* to the critical value Fq, n−k−1
(as reported in F-Tables)
• Reject the null hypothesis if F* > Fq, n−k−1
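The R² form follows from the RSS form in one step. The derivation below is a sketch that assumes both regressions have the same dependent variable, so they share the same total sum of squares (written TSS here, a symbol not used on the slides), and uses the definition R² = 1 − RSS/TSS:

```latex
\begin{align*}
RSS_{r} - RSS_{u} &= \bigl(R^2_{u} - R^2_{r}\bigr)\,TSS,
\qquad
RSS_{u} = \bigl(1 - R^2_{u}\bigr)\,TSS \\[4pt]
F &= \frac{(RSS_{r} - RSS_{u})/q}{RSS_{u}/(n-k-1)}
   = \frac{\bigl(R^2_{u} - R^2_{r}\bigr)/q}{\bigl(1 - R^2_{u}\bigr)/(n-k-1)}
\end{align*}
```

Here the subscripts r and u abbreviate restricted and unrestricted; TSS cancels from the numerator and the denominator.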
Joint Tests on Several Regression
Coefficients
• In summary, implementing an F test involves four
separate steps
a. Run the unrestricted regression and calculate the
resulting R²
b. Run the restricted regression, again calculating the
resulting R²
c. Form the F statistic (F*)
d. Look up the critical value Fq,n − k − 1 in the F-Tables for
the significance level you think appropriate
• If F* > Fq,n − k − 1 reject the null hypothesis
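The four steps can be sketched as a small function. The function names, the simulated data, and the choice of passing the critical value in by hand are assumptions for illustration; 3.07 is the 5% critical value for (2, 120) degrees of freedom as listed in standard F-tables.

```python
import numpy as np


def r_squared(X, y):
    """R-squared from an OLS fit of y on X (X includes the intercept column)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - float(resid @ resid) / tss


def f_test(X_unres, X_res, y, q, critical_value):
    """Return (F*, reject) following steps a to d."""
    n = len(y)
    k = X_unres.shape[1] - 1                  # explanatory variables, excluding intercept
    r2_u = r_squared(X_unres, y)              # step a: unrestricted R-squared
    r2_r = r_squared(X_res, y)                # step b: restricted R-squared
    f_star = ((r2_u - r2_r) / q) / ((1.0 - r2_u) / (n - k - 1))  # step c
    return f_star, f_star > critical_value    # step d: compare with the table value


# Simulated example: X1 and X3 matter by construction, so the null is false.
rng = np.random.default_rng(2)
n = 124
X1, X2, X3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * X1 + 1.0 * X2 + 2.0 * X3 + rng.normal(size=n)
ones = np.ones(n)

f_star, reject = f_test(
    np.column_stack([ones, X1, X2, X3]),  # unrestricted model
    np.column_stack([ones, X2]),          # restricted model: beta1 = beta3 = 0
    y,
    q=2,
    critical_value=3.07,                  # 5% value for (2, 120) d.f. from F-tables
)
print(f_star, reject)
```

Passing the critical value as an argument mirrors the table lookup in step d and keeps the sketch free of extra dependencies.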
Testing Whether All of the Regression Slope
Coefficients are Zero: The F Test
• We have discussed the use of F tests to test
hypotheses about a subset of coefficients
• But one specific F test is so common that it is
sometimes called “The F Test”
– Tests the hypothesis that all of the slope coefficients in a
model are jointly zero
• Does not require the intercept to be zero
• “The F Test” is not “The Only F Test”
– But a specific example of the more general form of F tests
described earlier
Testing Whether All of the Regression Slope
Coefficients are Zero: The F Test
• Consider the regression model:
Yi = β0 + β1X1i + β2X2i + β3X3i + εi
• “The F Test” will test whether all three slope
coefficients are equal to zero
• The null and alternative hypotheses are
H0: β1 = β2 = β3 = 0
H1: H0 is not true
• H0 states that none of the explanatory variables in the model
helps explain Y (the intercept is not restricted)
Testing Whether All of the Regression Slope
Coefficients are Zero: The F Test
• In the special case of “The F Test”, the restricted model
(the model if the null is true) is a regression of Y
on a constant alone
• A constant cannot explain any of the variation
in the dependent variable
– Thus the R² for the restricted model is zero
– The number of restrictions equals the number
of slope coefficients set to zero, q = k
Testing Whether All of the Regression Slope
Coefficients are Zero: The F Test
• The F statistic would simplify to
F = [R² / k] / [(1 − R²) / (n − k − 1)]
where R² is the R² of the unrestricted regression
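As a quick numerical check of the simplified form F = [R² / k] / [(1 − R²) / (n − k − 1)], the sketch below evaluates it for hypothetical values of R², n and k. The function name `overall_f` and the inputs are assumptions for illustration.

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """F statistic for H0: all k slope coefficients are zero."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Hypothetical inputs: R-squared of 0.5 with n = 104 observations and k = 3 slopes.
# (0.5 / 3) / (0.5 / 100) = 100 / 3
f_star = overall_f(0.5, 104, 3)
print(f_star)
```

This is the general R² form of the statistic with the restricted R² set to zero and q = k.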
Applications
References
• See Course outline plus
• Ashenfelter (2003), Statistics and
Econometrics, John Wiley