5/31/2023
National Economics University
E-PhD, Cohort 6
EPHD 3121: Econometrics
LECTURE 1: MULTIPLE REGRESSION MODELS
Bach Ngoc Thang
thangbn@[Link]
Hanoi, 2023
Instructor’s brief profile
• Education
• Ph.D. in Economics (UQ, 2014); M.A. and B.A. in
Development Economics (NEU, 2005 and 2002).
• Teaching
• Macroeconomics 1 & 2 (NEU, since 2014); Econometrics (E-PhD, since 2019);
Principles of Macroeconomics (BBAE, since 2020); International Trade (UQ, 2010 – 14).
• Research interests
• Development economics, governance, SMEs.
• Some notable publications in World Development, Journal of
Development Studies, Economics of Transition and
Institutional Change, The Developing Economies, etc.
• Google Scholar:
[Link]
AAJ&view_op=list_works&sortby=pubdate
Outline
• Course introduction
• From economic to econometric model
• Estimating the parameters
• Hamburger chain data ([Link])
• Sampling properties
• Model specification
• Family income equation (edu_inc.dta)
• Poor data, collinearity, and insignificance
• Cars data ([Link])
• Required reading: Chap. 5&6 (Hill et al., 2011)
Course introduction
• Course objectives
• Changes in this semester
• Students’ assessment
• Course syllabus
• Students’ expectations?
• What do you expect to learn from this course?
• What could the instructor do to improve students’ learning outcomes?
Economic model
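• The interplay between sales and advertising expenditure:
$$ Sales = \beta_1 + \beta_2\, Price + \beta_3\, Advert \qquad (1) $$
where $\beta_1, \beta_2, \beta_3$ are the unknown parameters.
• A quantitative inference:
Marginal analysis: the change in Sales when Advert increases by one unit, with Price held constant:
$$ \beta_3 = \left.\frac{\Delta Sales}{\Delta Advert}\right|_{Price \text{ held constant}} = \frac{\partial Sales}{\partial Advert} \qquad (2) $$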
Where does an economic model come from?
• From economic/management theory:
• The relationship between advertising and sales?
• The responsiveness of sales to advertising? Which
goods/services are more responsive to advertising?
• From empirical investigation:
• A toolkit model: lack of theoretical foundations?
• Data mining?
• From practices or observations:
• Are all the facts/objects observable?
• A combination of all the above?
• A thorough literature review is needed.
The econometric model
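• Again, the sales-advertising nexus:
$$ Sales = E(Sales) + e = \beta_1 + \beta_2\, Price + \beta_3\, Advert + e \qquad (3) $$
• The general model:
$$ y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_K x_K + e \qquad (4) $$
where $\beta_1, \beta_2, \ldots, \beta_K$ are the unknown coefficients to be estimated.
• The marginal analysis:
$$ \beta_k = \left.\frac{\Delta E(y)}{\Delta x_k}\right|_{\text{other } x\text{s held constant}} = \frac{\partial E(y)}{\partial x_k} \qquad (5) $$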
The multiple regression plane
Monthly sales, price, and advertising in Big Andy’s Burger Barn
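Estimating the unknown parameters
• Minimizing the sum of squares function:
$$ S(\beta_1, \beta_2, \beta_3) = \sum_{i=1}^{N} \left( y_i - E(y_i) \right)^2 = \sum_{i=1}^{N} \left( y_i - \beta_1 - \beta_2 x_{i2} - \beta_3 x_{i3} \right)^2 \qquad (5) $$
• The least squares estimators:
$b_1, b_2, b_3$ are the values that minimize $S$; they estimate the unknown parameters/coefficients $\beta_1, \beta_2, \beta_3$.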
• Are the above OLS estimators BLUE?
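The minimization above can be sketched numerically. This is a minimal illustration rather than the course's Stata workflow: it solves the normal equations in deviation-from-mean form, which works for exactly two regressors plus a constant. The function name `ols_two_regressors` and all data values are assumptions made up for this example; they are not the Big Andy's data.

```python
# Least squares for y = b1 + b2*x2 + b3*x3, solved from the normal equations
# written in deviations from sample means (valid for two regressors + constant).
# All numbers below are invented for illustration only.

def ols_two_regressors(y, x2, x3):
    n = len(y)
    y_bar, x2_bar, x3_bar = sum(y) / n, sum(x2) / n, sum(x3) / n
    yd = [v - y_bar for v in y]
    d2 = [v - x2_bar for v in x2]
    d3 = [v - x3_bar for v in x3]
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    s2y = sum(a * b for a, b in zip(d2, yd))
    s3y = sum(a * b for a, b in zip(d3, yd))
    det = s22 * s33 - s23 ** 2  # equals zero under exact collinearity
    b2 = (s2y * s33 - s3y * s23) / det
    b3 = (s3y * s22 - s2y * s23) / det
    b1 = y_bar - b2 * x2_bar - b3 * x3_bar
    return b1, b2, b3

price = [5.0, 5.5, 6.0, 6.5, 5.2, 6.8, 5.9, 6.1]
advert = [1.0, 0.5, 2.0, 1.5, 2.5, 0.8, 1.2, 3.0]
sales = [80.1, 75.3, 76.9, 72.0, 82.5, 68.7, 75.0, 79.4]

b1, b2, b3 = ols_two_regressors(sales, price, advert)
residuals = [y - b1 - b2 * p - b3 * a for y, p, a in zip(sales, price, advert)]
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, b3 = {b3:.3f}")
```

At the minimum, the residuals sum to zero and are orthogonal to each regressor, which is a quick check that the first-order conditions hold.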
Stata practice: Sales equation
• Data source: [Link]
• To do:
• Variable description and summary statistics
• Estimating the OLS regression of Sales on Price and Advert.
• Interpreting the estimated results.
OLS estimates for Sales equation
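The error variance and standard error (s.e.)
• The error variance:
$$ \hat{\sigma}^2 = \frac{SSE}{N-K} = \frac{\sum_{i=1}^{N} \hat{e}_i^2}{N-K} = \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N-K} \qquad (6) $$
• The variance of the estimated coefficient $b_2$ (its standard error is the square root of the estimated variance):
$$ var(b_2) = \frac{\sigma^2}{(1 - r_{23}^2) \sum_{i=1}^{N} (x_{i2} - \bar{x}_2)^2} \qquad (7) $$
where $r_{23}$ is the sample correlation between $x_2$ and $x_3$:
$$ r_{23} = \frac{\sum (x_{i2} - \bar{x}_2)(x_{i3} - \bar{x}_3)}{\sqrt{\sum (x_{i2} - \bar{x}_2)^2 \sum (x_{i3} - \bar{x}_3)^2}} \qquad (8) $$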
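Equations (6) to (8) can be traced step by step in a short sketch. It assumes a model with a constant and two regressors (K = 3) and replaces the unknown error variance in (7) with its estimate from (6). The function name `fit_and_se` and the data are hypothetical, chosen only for illustration.

```python
import math

def _mean(v):
    return sum(v) / len(v)

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def fit_and_se(y, x2, x3):
    """OLS of y on a constant, x2 and x3; returns b2, sigma2_hat, r23, se(b2)."""
    n, k = len(y), 3
    yd = [v - _mean(y) for v in y]
    d2 = [v - _mean(x2) for v in x2]
    d3 = [v - _mean(x3) for v in x3]
    s22, s33, s23 = _dot(d2, d2), _dot(d3, d3), _dot(d2, d3)
    det = s22 * s33 - s23 ** 2
    b2 = (_dot(d2, yd) * s33 - _dot(d3, yd) * s23) / det
    b3 = (_dot(d3, yd) * s22 - _dot(d2, yd) * s23) / det
    # Residuals in deviation form (the intercept absorbs the sample means)
    resid = [a - b2 * p - b3 * q for a, p, q in zip(yd, d2, d3)]
    sigma2_hat = _dot(resid, resid) / (n - k)        # equation (6)
    r23 = s23 / math.sqrt(s22 * s33)                 # equation (8)
    var_b2 = sigma2_hat / ((1 - r23 ** 2) * s22)     # equation (7), estimated
    return b2, sigma2_hat, r23, math.sqrt(var_b2)

# Invented data, for illustration only
price = [5.0, 5.5, 6.0, 6.5, 5.2, 6.8, 5.9, 6.1]
advert = [1.0, 0.5, 2.0, 1.5, 2.5, 0.8, 1.2, 3.0]
sales = [80.1, 75.3, 76.9, 72.0, 82.5, 68.7, 75.0, 79.4]

b2, sigma2_hat, r23, se_b2 = fit_and_se(sales, price, advert)
print(f"b2 = {b2:.3f}, sigma2_hat = {sigma2_hat:.3f}, r23 = {r23:.3f}, se(b2) = {se_b2:.3f}")
```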
The standard error (cont’d)
$$ var(b_2) = \frac{\sigma^2}{(1 - r_{23}^2) \sum_{i=1}^{N} (x_{i2} - \bar{x}_2)^2} \qquad (7a) $$
• Factors affecting the variance of $b_2$:
• A larger error variance $\sigma^2$ increases it.
• A larger sample size N reduces it.
• More variation in an explanatory variable around its mean reduces it.
• A larger correlation between x2 and x3 increases it.
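The last factor can be made concrete with a little arithmetic. Holding the error variance and the variation in x2 fixed, equation (7a) implies that var(b2) is scaled by 1/(1 − r23²) relative to the uncorrelated case. The helper name `variance_inflation` below is an illustrative choice, not notation from the slides.

```python
def variance_inflation(r23):
    """Factor multiplying var(b2) relative to the r23 = 0 case, from equation (7a)."""
    if abs(r23) >= 1:
        raise ValueError("exact collinearity: var(b2) is not defined")
    return 1.0 / (1.0 - r23 ** 2)

# The factor rises slowly at first and then very quickly as |r23| approaches 1.
for r in (0.0, 0.5, 0.8, 0.9, 0.99):
    print(f"r23 = {r:4.2f} -> var(b2) multiplied by {variance_inflation(r):7.2f}")
```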
Sampling properties:
Assumptions of the MRM
• MR1: $y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + e_i,\ i = 1, \ldots, N$
• MR2: $E(y_i) = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} \Leftrightarrow E(e_i) = 0$
• MR3: $var(y_i) = var(e_i) = \sigma^2$
• MR4: $cov(y_i, y_j) = cov(e_i, e_j) = 0,\ i \neq j$
• MR5: The values of each $x_{ik}$ are not random and are not exact linear functions of the other explanatory variables.
• MR6: $y_i \sim N[(\beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK}), \sigma^2] \Leftrightarrow e_i \sim N(0, \sigma^2)$
Model specification:
Omitted variables
• Essential features of model choice:
• Choice of functional forms
• Choice of explanatory variables to be included in the model
• Whether assumptions MR1 – MR6 hold
• Omitted variables:
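• The econometric model of family income ($y$) regressed on husband’s ($x_2$) and wife’s ($x_3$) years of education:
$$ y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + e $$
• The omitted-variable bias from omitting wife’s years of education, where $b_2^*$ is the least squares estimator of $\beta_2$ when $x_3$ is left out:
$$ bias(b_2^*) = E(b_2^*) - \beta_2 = \beta_3 \frac{cov(x_2, x_3)}{var(x_2)} \qquad (9) $$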
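Equation (9) has an exact in-sample counterpart that is easy to check: the slope from the short regression (x3 omitted) equals the long-regression slope b2 plus b3 times the sample cov(x2, x3)/var(x2). The sketch below verifies this on invented numbers; the function name `long_and_short` and the data are assumptions for illustration, not the edu_inc data.

```python
def _mean(v):
    return sum(v) / len(v)

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def long_and_short(y, x2, x3):
    """Return (b2, b3) from y on x2, x3 and the slope b2_short from y on x2 only."""
    yd = [v - _mean(y) for v in y]
    d2 = [v - _mean(x2) for v in x2]
    d3 = [v - _mean(x3) for v in x3]
    s22, s33, s23 = _dot(d2, d2), _dot(d3, d3), _dot(d2, d3)
    det = s22 * s33 - s23 ** 2
    b2 = (_dot(d2, yd) * s33 - _dot(d3, yd) * s23) / det
    b3 = (_dot(d3, yd) * s22 - _dot(d2, yd) * s23) / det
    b2_short = _dot(d2, yd) / s22     # slope when x3 is omitted
    shift = b3 * s23 / s22            # b3 * cov(x2, x3) / var(x2); (N - 1) cancels
    return b2, b3, b2_short, shift

# Invented data: family income (thousands), husband's and wife's years of education
hedu = [12, 16, 10, 14, 18, 12, 16, 11]
wedu = [12, 14, 11, 12, 17, 13, 16, 10]
faminc = [62, 95, 48, 70, 120, 66, 101, 50]

b2, b3, b2_short, shift = long_and_short(faminc, hedu, wedu)
direction = "upward" if shift > 0 else "downward"
print(f"long b2 = {b2:.3f}, short b2* = {b2_short:.3f}, shift = {shift:.3f} ({direction})")
```

The sign of the shift depends on the signs of b3 and of the covariance between the two regressors, which is how the direction of the bias is usually reasoned about.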
Correlation matrix:
Family income data
Stata practice:
Family income data
• Data source: edu_inc.dta
• To do:
• Regressing family income (Faminc) on both husband’s
and wife’s years of education (Hedu and Wedu).
• Omitting wife’s years of education in the above
specification.
• Determining whether the resulting estimate is biased upward or downward.
• Adding the number of young children (Kl6) as another
regressor.
Family income data: estimated models
• On both husband’s and wife’s years of education:
• On husband’s years of education only:
• Adding the number of young children:
Model specification:
Irrelevant variables
• Adding two artificially generated variables X5 and
X6:
• The consequences of adding irrelevant explanatory variables:
• Reducing the precision of the estimated coefficients (larger standard errors), although the estimates remain unbiased.
Choosing the model
• Base the choice on theory and a general understanding of the relationship.
• Unobserved heterogeneity: omitted-variable bias,…
• Significance tests: F- and t-tests.
• Using model selection criteria: Adjusted R-squared,
Akaike information criterion (AIC), Schwarz
criterion (BIC): see pages 237 - 8.
• The general specification test (RESET): see pages
238 - 9.
• Violation of the MRM assumptions MR1 – 6?
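As a rough sketch of the model selection criteria listed above, the function below computes adjusted R-squared, AIC and the Schwarz criterion from sums of squares. The formulas are assumed to follow the per-observation definitions in Hill et al. (2011); likelihood-based versions reported by statistical packages can differ in scale. The function name and all numbers are hypothetical.

```python
import math

def selection_criteria(sse, sst, n, k):
    """Adjusted R-squared, AIC and Schwarz criterion (SC/BIC).

    sse: sum of squared residuals, sst: total sum of squares,
    n: number of observations, k: number of coefficients incl. the intercept.
    """
    adj_r2 = 1 - (sse / (n - k)) / (sst / (n - 1))
    aic = math.log(sse / n) + 2 * k / n
    sc = math.log(sse / n) + k * math.log(n) / n
    return adj_r2, aic, sc

# Hypothetical comparison: an extra regressor lowers SSE only slightly.
small = selection_criteria(sse=1800.0, sst=3100.0, n=75, k=3)
large = selection_criteria(sse=1790.0, sst=3100.0, n=75, k=4)
for name, (adj_r2, aic, sc) in (("K = 3", small), ("K = 4", large)):
    print(f"{name}: adj. R2 = {adj_r2:.4f}, AIC = {aic:.4f}, SC = {sc:.4f}")
```

A higher adjusted R-squared and lower AIC and SC values favour a model; in this invented comparison all three point to the smaller one because the fall in SSE does not offset the penalty for the extra coefficient.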
Poor data, collinearity and insignificance
• The survey data issues:
• Most economic data is non-experimental, or
“uncontrolled” data.
• Possible solutions: randomized controlled trials (RCTs), quasi-experiments, field or laboratory experiments. But what are their disadvantages?
• More systemic issues: regressors are no longer exogenous or independent
• Correlated by definition/construction.
• Co-movement or confounding factors.
• Unobserved heterogeneity.
Collinearity
• Consequences:
• The standard errors are large, leading to insignificant
estimates of the coefficients/parameters.
• Estimates are sensitive to the addition or deletion of a few observations or variables.
• An example:
• Data source: [Link]
• To do: (i) regressing fuel efficiency (miles per gallon, MPG) on the number of cylinders; (ii) adding engine displacement (ENG) and vehicle weight (WGT).
Collinearity:
Estimated models of car data
• On the number of cylinders:
• Adding on engine displacement and vehicle weight:
Identifying and mitigating collinearity
• Identifying:
• An excessively high pairwise correlation between explanatory variables, or a high R-squared from an auxiliary OLS regression of one explanatory variable on all the remaining explanatory variables. As a rule of thumb, values above 0.8 signal a problem.
• Mitigating:
• A better sample.
• Using non-sample information, for example, using priors
to impose some restrictions on the parameters.
• Production technology: CTS, IRT, …
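The identification step above can be sketched as an auxiliary regression: regress one explanatory variable on the others and inspect the R-squared. The example assumes three explanatory variables, so the auxiliary regression has a constant and two regressors. The function name `auxiliary_r2` and the car-like numbers are invented for illustration and are not the course data.

```python
def _mean(v):
    return sum(v) / len(v)

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def auxiliary_r2(target, other_a, other_b):
    """R-squared from regressing `target` on a constant, other_a and other_b."""
    td = [v - _mean(target) for v in target]
    da = [v - _mean(other_a) for v in other_a]
    db = [v - _mean(other_b) for v in other_b]
    saa, sbb, sab = _dot(da, da), _dot(db, db), _dot(da, db)
    det = saa * sbb - sab ** 2
    ca = (_dot(da, td) * sbb - _dot(db, td) * sab) / det
    cb = (_dot(db, td) * saa - _dot(da, td) * sab) / det
    resid = [t - ca * a - cb * b for t, a, b in zip(td, da, db)]
    return 1 - _dot(resid, resid) / _dot(td, td)

# Invented values: cylinders, engine displacement, vehicle weight
cyl = [4, 4, 6, 6, 8, 8, 4, 6]
eng = [98, 120, 200, 232, 318, 350, 105, 250]
wgt = [2100, 2300, 3000, 3200, 4000, 4300, 2200, 3400]

r2 = auxiliary_r2(cyl, eng, wgt)
flag = "possible collinearity problem" if r2 > 0.8 else "no flag"
print(f"auxiliary R-squared for cylinders: {r2:.3f} ({flag})")
```

The 0.8 cut-off is the rule of thumb quoted on the slide; it is a screening device, not a formal test.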
Next week
• Lecture 2: Using indicator variables
• Indicator and qualitative factors.
• Application.
• Log-linear and log-log model.
• Treatment effects.
• Required reading: Chap. 4&7 (Hill et al., 2011).