
Chapter 2: The Simple Linear Regression

1. The definition of the simple linear regression (SLR) model
2. The method of Ordinary Least Squares (OLS)
3. Interpretation of the SLR model
4. Properties of the OLS estimator
5. Units of measurement and functional form
6. Assumptions of the OLS estimator
7. Mean and variances of the OLS estimators

2.1. The definition of the simple linear regression model

Econometrics is based on techniques such as regression analysis and hypothesis testing.

What is regression analysis?

"Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter" [2, p.18].

- Regression studies the dependence of one variable (the dependent variable) on other variables (the explanatory or independent variables).
- The goal of regression analysis is to estimate or predict the population mean of one variable on the basis of the known or fixed values of the other variables.
Simple linear regression (SLR) is a linear regression model with a single explanatory variable:

y = β₀ + β₁x + u

The slope parameter (β₁ = ∂y/∂x) shows how much y changes if x increases by one unit. This interpretation is only correct if all other factors are held constant (∂u/∂x = 0). β₀ is the intercept parameter or constant term.

Terminology:
- y: dependent variable, explained variable, response variable, predicted variable, regressand, ...
- x: independent variable, explanatory variable, control variable, predictor, regressor, ...
- u: error term, unobservables, disturbance, white noise, ...
Graphical presentation of the coefficients

β₁: the constant slope indicates that a one-unit change in X has the same effect on Y regardless of X's initial value.
Examples:

- Rice output and fertilizer: β₁ quantifies the effect of fertilizer on output, holding all other factors constant; u contains sunlight, rainfall, moisture, diseases, soil fertility, ...
- Wage and schooling: β₁ measures the change in wage given an additional year of formal schooling, holding all other factors fixed; u contains work experience, ethnicity, gender, ability, ...
2.2. The method of Ordinary Least Squares (OLS)

The population linear regression model (PRM):

Y = β₀ + β₁X + u

- Y and X are two variables that describe the properties of the population under consideration: Y is explained by X, or Y varies with changes in X.
- Y and X are observed.
- u is an error term, representing all unobserved factors other than X that affect Y.
- β₀ and β₁ are unknown population parameters.

The goal of the linear regression is to estimate the population mean of the dependent variable on the basis of the known values of the independent variable(s), i.e., to estimate E(y|x), which is known as the conditional expectation function or the population regression function.
The conditional mean E(y|x): for example, E(wage|edu) is the mean of the distribution of wages given a level of education (e.g., the distribution of wages given education at the college level).
The population regression function (PRF):

E(y|x) = β₀ + β₁x

Under certain assumptions, we can capture a ceteris paribus relationship between y and x:

- Linear in parameters β₀ and β₁.
- Zero conditional mean assumption, E(u|x) = 0: the average value of u does not depend on the value of x and equals zero.
- This assumption implies that E(u) = 0 and Cov(x, u) = 0.

Taking the expected value of the PRM (y = β₀ + β₁x + u) conditional on x and using the zero conditional mean assumption, we obtain the PRF:

E(y|x) = E(β₀ + β₁x + u|x) = β₀ + β₁x + E(u|x) = β₀ + β₁x

The average value of y can be expressed as a linear function of x.
Zero conditional mean assumption: E(u|x) = 0

This assumption means that u, which represents other unobservable factors, does not have a systematic effect on Y. This assumption is critical for causal analysis; it cannot be tested statistically and has to be argued from economic theory.

uᵢ = the observed value - the mean. The positive and negative values of uᵢ cancel each other out, which makes their average (mean) effect on Y equal to zero.

Note that x₁, x₂, x₃, ... here refer to values of xᵢ, not to different variables.
E(u|x) = 0 is a strong assumption for ceteris paribus analysis; it cannot be tested statistically and has to be argued from economic theory.

Example: Wage = β₀ + β₁edu + u, where u is ability.

To capture the ceteris paribus relationship between wage and education, we have to assume that average ability is the same for all levels of education:

E(ability|edu = 0) = E(ability|edu = 6) = E(ability|edu = 12) = E(ability|edu = 16) = E(ability|edu = 22) = 0

Then E(wage|edu) = β₀ + β₁edu.

Note:
- E(u) = 0 is the assumption for defining the intercept, β₀.
- E(u|x) = 0 is the assumption with impact on the slope, β₁.

Is this assumption likely to be violated?
Example: Rice output = β₀ + β₁fertilizer + u, where u is land quality.

To capture the ceteris paribus relationship between rice output and fertilizer, we have to assume that the average quality of land is independent of the amount of fertilizer. In other words, E(u|fertilizer) = 0 holds if the amount of fertilizer is applied independently of other plot characteristics (e.g., land quality). Then E(output|fertilizer) = β₀ + β₁fertilizer.

Is this assumption likely to be violated?
The sample regression function (SRF):

ŷᵢ = β̂₀ + β̂₁xᵢ

For example, Încomeᵢ = 815 + 53·Eduᵢ is a sample regression function, where ŷᵢ are the fitted values.

By joining all conditional mean values (the fitted values), we obtain the regression line.

The average income for all household heads with 12 years of education is E(Income|edu = 12) = 815 + 53×12 = 1451.

Note: it is false to interpret this as saying that every household head with 12 years of education will earn 1451.
The sample regression function (SRF):

ŷ is called "y-hat" or "y-cap":
- ŷᵢ is the estimator of E(y|xᵢ), i.e., the fitted or predicted value of yᵢ.
- β̂₀ is the estimator of β₀.
- β̂₁ is the estimator of β₁.
- ûᵢ is the estimator of the error term, also known as the residual: ûᵢ = yᵢ - ŷᵢ.
- The subscript i is used to index the observations of a sample.

Population vs. sample:
- Model: y = β₀ + β₁x + u, the population linear regression model (PRM), vs. yᵢ = β̂₀ + β̂₁xᵢ + ûᵢ, the sample linear regression model (SRM).
- Function: E(y|x) = β₀ + β₁x, the population regression function (PRF), vs. ŷᵢ = β̂₀ + β̂₁xᵢ, the sample regression function (SRF).
What is OLS?
Answer: OLS is an estimator for β̂₀ and β̂₁ that minimizes the sum of the squared residuals: Σᵢ₌₁ⁿ ûᵢ².
Deriving the OLS estimators

Let {(xᵢ, yᵢ): i = 1, 2, ..., n} denote a random sample of size n from the population, and let β̂₀ and β̂₁ be candidate values for the population parameters β₀ and β₁.

From the sample, we have to find the β̂₀ and β̂₁ that minimize the residual sum of squares (RSS):

RSS = Σᵢ₌₁ⁿ ûᵢ² = Σᵢ₌₁ⁿ (yᵢ - β̂₀ - β̂₁xᵢ)² → min

- Take the partial derivatives of the RSS with respect to β̂₀ and β̂₁.
- Set each of the partial derivatives to zero.
- Solve for β̂₀ and β̂₁.
- β̂₀ and β̂₁ are thereby chosen to make the residuals add up to zero.
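Written out, the two first-order conditions of this minimization are the normal equations (standard OLS algebra):

∂RSS/∂β̂₀ = -2 Σᵢ₌₁ⁿ (yᵢ - β̂₀ - β̂₁xᵢ) = 0
∂RSS/∂β̂₁ = -2 Σᵢ₌₁ⁿ xᵢ(yᵢ - β̂₀ - β̂₁xᵢ) = 0

The first condition is why the residuals add up to zero, and the second is why they are uncorrelated with x.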
OLS estimators

β̂₀ = ȳ - β̂₁x̄

β̂₁ = Σᵢ₌₁ⁿ (xᵢ - x̄)(yᵢ - ȳ) / Σᵢ₌₁ⁿ (xᵢ - x̄)² = Cov(x, y)/Var(x)

- These sample functions are the OLS estimators for β₀ and β₁.
- For a given sample, the numerical values of β̂₀ and β̂₁ are called the OLS estimates of β₀ and β₁.
We have derived the formulas for calculating the OLS estimates of our parameters, but you don't have to compute them by hand. Regressions in Stata are very easy: to run the regression of y on x, just type: reg y x
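As a minimal sketch (on simulated data with made-up parameters, since no dataset is assumed here), the following Stata commands check the formula β̂₁ = Cov(x, y)/Var(x) against reg:

clear
set seed 123
set obs 100
gen x = rnormal(5, 2)
gen y = 2 + 0.5*x + rnormal(0, 1)    // simulated data: true beta0 = 2, beta1 = 0.5
reg y x                              // OLS estimates of beta0 and beta1
correlate y x, covariance            // stores the sample covariance and variances
display "b1 by hand = " r(cov_12)/r(Var_2)

Because the sample covariance and the sample variance both divide by n - 1, their ratio reproduces the OLS slope from reg exactly.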
Estimator vs. estimate

- An estimator, also referred to as a "sample statistic", is a rule, formula, or procedure that explains how to estimate the population parameter from the sample data.
- An estimate is a specific numerical value generated by the estimator in an application.

Ŵageᵢ = β̂₀ + β̂₁eduᵢ: the estimators are β̂₀ and β̂₁.
Ŵageᵢ = 815 + 53eduᵢ: the estimates are 815 and 53.
How do we fit the regression line ŷᵢ = β̂₀ + β̂₁xᵢ to the data?
Answer: we minimize the sum of the squared residuals: min Σᵢ₌₁ⁿ ûᵢ².

[Figure: scatter plot with the fitted line ŷ = 815 + 53x; for each observation, the residual ûᵢ = yᵢ - ŷᵢ is the vertical distance between the actual value yᵢ and the fitted value ŷᵢ.]
The difference between the residual and the error: the error uᵢ is the deviation of yᵢ from the population regression line, while the residual ûᵢ is the deviation of yᵢ from the fitted (sample) regression line.
2.3. Interpretation of the SLR model

ŷ = β̂₀ + β̂₁x: the estimated or sample regression function.

β̂₀: the constant or intercept, which shows the average value of the dependent variable ŷ when the independent variable x is set to zero.

β̂₁: the slope = Δŷ/Δx, which indicates the average amount by which ŷ changes when x increases by one unit, holding the other factors in the model constant.

- β̂₁ > 0: a positive association between y and x.
- β̂₁ < 0: a negative association between y and x.
- β̂₁ = 0: no association between y and x.
Example: meat consumption and income in Lao Cai. Meat = monthly meat consumption per household (1000 VND); Income = monthly household income per capita (1000 VND).

Fitted regression: M̂eat = 299 + 0.161·Income

- β̂₀: the intercept (constant) of 299 means that a household with zero income has a predicted meat consumption of 299 thousand VND.
- β̂₁: the slope estimate of 0.161 means that a household's meat consumption would increase by 0.161 thousand VND if its per capita income increased by one thousand VND.

What is the average meat consumption for households with a monthly income per capita of one million VND?

M̂eat = E(meat|income = 1000) = 299 + 0.161×1000 = 460 thousand VND
Example: wage and education among workers in FDI enterprises in Hanoi, 2018. Wage = monthly wage (1000 VND); Education = years of education.

Fitted regression: Ŵage = -856 + 754·Education

- β̂₀: the intercept (constant) of -856 means that workers without education have a predicted wage of -856 thousand VND/month. Is this meaningful?
- β̂₁: the slope estimate of 754 means that, on average, each additional year of education increases the wage by 754 thousand VND/month. For example, the average predicted wage for workers with 12 years of education is -856 + 754×12 = 8192 thousand VND/month (about 8.2 million VND).

Notes: the result should not be interpreted as a causal effect, and the intercept often does not make sense to interpret.
Ŵage = -856 + 754·Education

β̂₀: the intercept or constant often has no real meaning, for several reasons:
- Zero settings for all independent variables are often impossible or irrational (e.g., can we set a household's food consumption to zero?).
- The intercept might be outside of the observed data (e.g., there is no worker without education in the sample).
- The intercept is estimated to make sure that the residual mean equals zero, E(û) = 0, which can make the intercept itself meaningless.
The intercept is outside of the data range: the intercept of -856 corresponds to setting the independent variable (education) to zero, i.e., zero years of education. But the data show that the smallest value of education is 6 (6 years of education). Therefore, we cannot interpret the intercept, because it is outside the range of the study data.
β̂₀: the intercept absorbs the bias of the regression model.

- Some relevant variables may be omitted from the model; e.g., factors other than education, such as work experience or ability, can affect the wage.
- Such an omission could cause a bias in the residuals, E(ûᵢ) ≠ 0; e.g., the residuals could have an overall positive mean, E(ûᵢ) > 0, or an overall negative mean, E(ûᵢ) < 0.
- The intercept prevents this bias by compelling the residual mean to equal zero: E(ûᵢ) = 0.

The regression model can always be redefined with the same slope but a new intercept and error, where the new error has a zero mean: E(ûᵢ) = 0.

[Figure: three parallel lines with the same slope β̂₁; the lines with E(ûᵢ) < 0 and E(ûᵢ) > 0 are shifted versions of the line with E(ûᵢ) = 0, obtained by adjusting the intercept.]
2.4. Properties of the OLS estimator: fitted values and residuals

Example: ŷ = 356.067 + 0.143·X (coefficients rounded); e.g., for observation 1 with X = 3000, ŷ ≈ 785.42.

Obs   X      y      ŷ (predicted y)   û = y - ŷ (residual)
1     3000   995    785.423017        209.577
2     8208   2900   1530.78546        1369.215
3     3613   1450   873.15481         576.8452
4     4624   1460   1017.84787        442.1521
5     4751   510    1036.02395        -526.024
6     5151   760    1093.27145        -333.271
7     5884   1005   1198.17749        -193.177
8     2696   100    741.914917        -641.915
9     2485   912    711.716861        200.2831
10    8860   570    1624.09889        -1054.1

(Here the fitted regression is M̂eat = 356 + 0.143·Income, with X = monthly income per capita and y = meat consumption.)
Some algebraic properties of OLS

From the first-order conditions of OLS, we have some algebraic properties of OLS (verified numerically in the table and the Stata sketch below):

- The sum and mean of the residuals will always equal zero: Σᵢ₌₁ⁿ ûᵢ = 0 and E(ûᵢ) = 0.
- The residuals will be uncorrelated with the independent variable: Cov(xᵢ, ûᵢ) = 0, or Σᵢ₌₁ⁿ xᵢûᵢ = 0.
- The residuals will be uncorrelated with the fitted or predicted values: Cov(ŷᵢ, ûᵢ) = 0.
- The sample averages of y and x lie on the regression line: ȳ = β̂₀ + β̂₁x̄.
- The average of the predicted values is equal to the average of the actual values of y.
Obs   X      Y      Predicted Ŷ     Residual û
1     3000   995    785.423017      209.576983
2     8208   2900   1530.78546      1369.214536
3     3613   1450   873.15481       576.845190
4     4624   1460   1017.84787      442.152134
5     4751   510    1036.02395      -526.023947
6     5151   760    1093.27145      -333.271447
7     5884   1005   1198.17749      -193.177490
8     2696   100    741.914917      -641.914917
9     2485   912    711.716861      200.283139
10    8860   570    1624.09889      -1054.098889
11    1436   512    561.585292      -49.585292

Means: Ȳ = 1015.818; mean of Ŷ = 1015.81818; mean of the residuals E(ûᵢ) = 0.00000000
Sums: Σᵢ₌₁ⁿ ûᵢ = 0.00000000; Cov(xᵢ, ûᵢ) = 0.0000; Cov(ŷᵢ, ûᵢ) = 0.00000
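A minimal Stata sketch of these checks, entering the 11 observations above as x (income) and y (meat consumption):

clear
input x y
3000 995
8208 2900
3613 1450
4624 1460
4751 510
5151 760
5884 1005
2696 100
2485 912
8860 570
1436 512
end
reg y x
predict yhat, xb            // fitted values
predict uhat, residuals     // residuals
summarize uhat              // mean of the residuals is (numerically) zero
correlate x uhat            // zero correlation with the independent variable
correlate yhat uhat         // zero correlation with the fitted values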
Decomposition of total variation

Ȳ            Y     Ŷ            (Y - Ȳ)²      (Ŷ - Ȳ)²      û             û²
1015.818182  995   785.4230167  433.3966942   53081.93213   209.576983    43922.51
1015.818182  2900  1530.785464  3550141.124   265191.3019   1369.214536   1874748
1015.818182  1450  873.1548101  188513.8512   20352.83762   576.845190    332750.4
1015.818182  1460  1017.847866  197297.4876   4.119617482   442.152134    195498.5
1015.818182  510   1036.023947  255852.0331   408.2729503   -526.023947   276701.2
1015.818182  760   1093.271447  65442.94215   5999.008273   -333.271447   111069.9
1015.818182  1005  1198.17749   117.0330579   33254.91739   -193.177490   37317.54
1015.818182  100   741.9149168  838722.9421   75022.99858   -641.914917   412054.8
1015.818182  912   711.7168607  10778.21488   92477.61353   200.283139    40113.34
1015.818182  570   1624.098889  198753.8512   370005.4186   -1054.098889  1111124
1015.818182  512   561.5852924  253832.7603   206327.5178   -49.585292    2458.701

TSS (total sum of squares) = Σ(Y - Ȳ)² = 5559885.636
ESS (explained sum of squares) = Σ(Ŷ - Ȳ)² = 1122125.939
RSS (residual sum of squares) = Σû² = 4437760
Goodness-of-fit measure (R-squared)

- TSS = Σᵢ₌₁ⁿ (yᵢ - ȳ)²: the total sum of squares, representing the total variation in the dependent variable.
- ESS = Σᵢ₌₁ⁿ (ŷᵢ - ȳ)²: the explained sum of squares, representing the variation explained by the regression.
- RSS = Σᵢ₌₁ⁿ ûᵢ², where ûᵢ = yᵢ - ŷᵢ: the residual sum of squares, representing the variation not explained by the regression.

Total variation = explained part + unexplained part: TSS = ESS + RSS.

R-squared, or the coefficient of determination, measures the proportion of the total variation in Y that is explained by the regression:

R² = ESS/TSS = 1 - RSS/TSS
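Plugging the sums from the decomposition table above into this formula gives the R² for the meat consumption example:

R² = ESS/TSS = 1122125.939/5559885.636 ≈ 0.202
R² = 1 - RSS/TSS = 1 - 4437760/5559885.636 ≈ 0.202

so the regression explains about 20.2% of the total variation in meat consumption.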
[Figure: graphical presentation of the decomposition of the total variation into explained and unexplained parts.]
Goodness-of-fit measure (R-squared): examples

Ŵage = -856 + 754·Education; N = 265; R² = 0.172: the regression explains 17.2% of the total variation in monthly wage.

M̂eat = 356 + 0.143·Income; N = 11; R² = 0.202: the regression explains 20.2% of the total variation in meat consumption.

Note: a high R-squared doesn't mean the regression has a causal interpretation.

2.5. Units of measurement and functional form

Units of measurement

Change in the measurement unit of the dependent variable Y:
- Y divided by a constant C (Y/C): the intercept becomes β₀/C and the slope becomes β₁/C.
- Y multiplied by a constant C (Y·C): the intercept becomes β₀·C and the slope becomes β₁·C.

Ŵage = 608 + 69.6·edu (wage measured in 1,000 VND)
Ŵage = 0.608 + 0.0696·edu (wage measured in 1,000,000 VND, i.e., wage/1000)

Change in the measurement unit of the independent variable X:
- X divided by a constant C (X/C): the intercept stays β₀ and the slope becomes β₁·C.
- X multiplied by a constant C (X·C): the intercept stays β₀ and the slope becomes β₁/C.

Ŵage = 608 + 70·edu (education measured in years)
Ŵage = 608 + 5.8·edu (education measured in months, edu×12; 69.6/12 = 5.8)
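A minimal Stata sketch of these rescaling rules (assuming variables wage, in 1,000 VND, and edu, in years, are already loaded):

reg wage edu                  // baseline regression
gen wage_m = wage/1000        // Y rescaled: wage in 1,000,000 VND
reg wage_m edu                // intercept and slope are both divided by 1000
gen edu_months = edu*12       // X rescaled: education in months
reg wage edu_months           // intercept unchanged; slope divided by 12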
Functional forms

wage = β₀ + β₁·Education + u

- Each additional year of education has the same effect on wages. Is this reasonable?
- In fact, the effect may be greater at higher levels of education.

We can instead model each year of education as increasing wages by a constant percentage. From the exponential function Wage = exp(β₀ + β₁·Education + u) with β₁ > 0 (the curve is plotted with u = 0), an approximately constant percentage change effect can be modeled as:

log(wage) = β₀ + β₁·edu + u, where log(wage) is the natural logarithm of wage

β₁ = ∂log(wage)/∂edu = (∂wage/wage)/∂edu: the proportional change of wage if education increases by one year.

If Δu = 0, then %Δwage ≈ (100·β₁)·Δedu when Δedu is very small. We multiply β₁ by 100 to obtain the percentage change given one extra year of education.

L̂og(wage) = 8.070 + 0.0436·edu; R² = 0.15
%ΔŴage ≈ 100×0.0436×1 = 4.36%
Wage increases by about 4.36% for each additional year of education.

Important note: the exact %Δwage = exp(0.0436×1) - 1 = 0.04456 ≈ 4.46%.
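A minimal Stata sketch of this log-level regression (again assuming variables wage and edu are loaded):

gen lwage = ln(wage)                          // natural log of wage
reg lwage edu
display "approx % effect = " 100*_b[edu]
display "exact % effect  = " 100*(exp(_b[edu]) - 1)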
2.6. Standard assumptions for the simple linear regression model

- Assumption SLR.1 (Linear in parameters)
- Assumption SLR.2 (Random sampling)
- Assumption SLR.3 (Sample variation in the explanatory variable)
- Assumption SLR.4 (Zero conditional mean)
- Assumption SLR.5 (Homoskedasticity)

Assumption SLR.1 (Linear in parameters)

The relationship between Y and X is linear in parameters.

(1) Linear in both the parameters and the variables:
y = β₀ + β₁x + u
This implies that a one-unit change in X has the same effect on Y regardless of X's initial value: y is a linear function of x, and the regression curve is a straight line.

(2) Linear in parameters but non-linear in the variables:
y = β₀ + β₁x² + u

(3) Linear in the variables but non-linear in parameters:
y = β₀ + β₁²x + u   or   y = β₀ + exp(β₁)x + u

From now on, the term "linear regression" means linearity in the parameters (the βs).

Example: both of these regression models are linear in parameters (income measured in 1,000,000 VND/month):
- English score = β₀ + β₁·income + u (Stata command: curvefit english income, function(1))
- Math score = β₀ + β₁·income + β₂·income² + u (Stata command: curvefit math income, function(4))

Assumption SLR.2 (Random sampling)

The data is a random sample drawn from the population: every member of the population has an equal chance of being selected for the sample.

Is this assumption likely to be violated? Why?
Assumption SLR.3 (Sample variation in the independent variable): Σᵢ₌₁ⁿ (xᵢ - x̄)² > 0

- Sample variation in the independent variable means that the values of the independent variable are not all the same: Σᵢ₌₁ⁿ (xᵢ - x̄)² > 0.
- If the values of the independent variable are identical, it is impossible to estimate β̂₁, because the denominator of

β̂₁ = Σᵢ₌₁ⁿ (xᵢ - x̄)(yᵢ - ȳ) / Σᵢ₌₁ⁿ (xᵢ - x̄)²

equals zero: Σᵢ₌₁ⁿ (xᵢ - x̄)² = 0.

[Figure: no sample variation in education; every employee has the same level of education (12 years).]
Assumption SLR.4 (Zero conditional mean): E(u|x) = 0

We have already discussed this crucial assumption.

- Given any value of the explanatory variable, the expected value of the error term is zero: E(u|x) = 0.
- In other words, the value of the independent variable (x) must contain no information about the mean of the unobservables (u).
- Note: E(u|x) = 0 means there is no linear or non-linear relationship between x and u.
- E(u|x) = 0 implies that the explanatory variable is exogenous: Cov(x, u) = E(xu) = 0, i.e., there is no linear association between x and u.

Question: does Cov(x, u) = 0 imply that E(u|x) = 0? No: Cov(x, u) = 0, or Corr(x, u) = 0, does not imply that E(u|x) = 0.
Assumption SLR.5 (Homoskedasticity): Var(u|x) = σ²

- Given any value of the explanatory variable, the error term has the same variance: Var(u|x) = σ², which is the same as Var(y|x) = σ².
- In other words, the value of the independent variable (x) must contain no information about the variance of the unobservables (u).
- E.g., even though the average wage goes up with education, the spread of wages around the mean is assumed to stay the same at all levels of education.
- Note: because Var(u|x) = Var(y|x), heteroskedasticity occurs whenever Var(y|x) is a function of x.
Homoskedasticity vs. heteroskedasticity

- Homoskedasticity: for any value of x, the variance of y is similar.
- Heteroskedasticity: wage variation around the mean is not constant; it increases with the education level.

Heteroskedasticity before and after the logarithm transformation:

- Heteroskedasticity: wage variation around the mean tends to increase with education levels: wage = β₀ + β₁·edu. Stata commands:
reg wage edu
hettest

- Homoskedasticity: variation in log wage around the mean is relatively constant across all education levels: log_wage = β₀ + β₁·edu. Stata commands:
gen Log_Wage = ln(wage)
reg Log_Wage edu
hettest
Gauss-Markov assumptions of the simple linear regression (SLR)

- Under SLR.1-SLR.4, the OLS estimator β̂ⱼ is unbiased: E(β̂ⱼ) = βⱼ.
- Under SLR.1-SLR.5, the OLS estimator β̂ⱼ has the smallest variance among all linear unbiased estimators: OLS is the best linear unbiased estimator (BLUE) of βⱼ.

1. Assumption SLR.1 (Linear in parameters)
2. Assumption SLR.2 (Random sampling)
3. Assumption SLR.3 (Sample variation in the explanatory variable)
4. Assumption SLR.4 (Zero conditional mean: E(u|x) = 0)
5. Assumption SLR.5 (Homoskedasticity: Var(u|x) = σ²)

What happens if any of the four assumptions SLR.1-SLR.4 is not satisfied? What happens if the assumptions SLR.1-SLR.4 are satisfied but assumption SLR.5 is not?
2.7. Mean and variances of the OLS estimators

Interpretation of unbiasedness

- Under SLR.1-SLR.4, the OLS estimator β̂ⱼ is unbiased: E(β̂ⱼ) = βⱼ.
- Unbiasedness does not imply that, in a given sample, our estimated parameters equal the exact true values of the population parameters.
- In a given sample, the estimates may be larger (β̂₀ > β₀; β̂₁ > β₁) or smaller (β̂₀ < β₀; β̂₁ < β₁) than the true values.

Instead, unbiasedness should be interpreted as follows:
- if sampling is repeated from the same population,
- and the estimation is repeated many times,
- then the expected value of the estimated parameters will be equal to the true population parameters: E(β̂₀) = β₀ and E(β̂₁) = β₁.
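A minimal Stata simulation sketch of this repeated-sampling interpretation (with made-up true parameters β₀ = 1 and β₁ = 0.5):

clear all
program define olssim, rclass
    drop _all
    set obs 100
    gen x = rnormal(5, 2)
    gen y = 1 + 0.5*x + rnormal(0, 1)    // true beta0 = 1, beta1 = 0.5
    reg y x
    return scalar b1 = _b[x]
end
simulate b1 = r(b1), reps(1000) nodots: olssim
summarize b1                             // the mean of the 1000 slope estimates is close to 0.5

Any single estimate can be above or below 0.5; it is the average across repeated samples that centers on the true value.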
Variances of the OLS estimators

Estimating the variance of the error term, σ²:

- From the population regression model y = β₀ + β₁x + u, u is the error term representing all unobservables.
- We have a sample regression model: yᵢ = β̂₀ + β̂₁xᵢ + ûᵢ, where ûᵢ is the residual.
- Recall that ûᵢ = yᵢ - ŷᵢ: the residual û can be seen as an estimate of the error term u.
- Now we have to obtain an unbiased estimate of the variance of the error term.

An unbiased estimate of the error variance can be calculated as:

σ̂² = (1/(n - k - 1)) Σᵢ₌₁ⁿ ûᵢ² = (1/(n - 2)) Σᵢ₌₁ⁿ ûᵢ², where k = the number of independent variables (k = 1 here)

The standard error of the regression is σ̂ = √σ̂².

Note: under assumptions SLR.1-SLR.5, E(σ̂²) = σ², i.e., σ̂² is an unbiased estimate of the error variance. σ̂ measures the average distance between the observed values and the regression line (the fitted values).
Estimating the variance of the error term:

σ̂² = (1/(n - k - 1)) Σᵢ₌₁ⁿ ûᵢ² = (1/(n - 2)) Σᵢ₌₁ⁿ ûᵢ²

Example: M̂eat = 356 + 0.143·Income; N = 11. [Figure: scatter of meat consumption against income with the fitted line; σ̂ measures the average distance between the observed values and the regression line (the fitted values).]
Estimating the variance of the error term for this example:

We have Σᵢ₌₁ⁿ ûᵢ² = 4437760, with n = 11 and k = 1:

σ̂² = (1/(n - k - 1)) Σᵢ₌₁ⁿ ûᵢ² = 4437760/(11 - 2) = 4437760/9 = 493084.44

σ̂ = √493084.44 ≈ 702.2

σ̂ indicates that the average difference between the observed and fitted meat consumption values is about 702 thousand VND.

Notes:
- The standard error of the regression (σ̂) has the same unit as the dependent variable.
- σ̂ also has other names: the standard error of the estimate and the root mean squared error (RMSE).
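In Stata, σ̂ is reported as the "Root MSE" in the reg output. Continuing the earlier sketch, where the 11 observations were entered as x (income) and y (meat):

reg y x
display "sigma-hat = " e(rmse)    // about 702.2 for this sample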
Variances and standard errors of the regression coefficients

With σ̂² = (1/(n - k - 1)) Σᵢ₌₁ⁿ ûᵢ² and σ̂ = √σ̂²:

Variance and standard error of the slope coefficient β̂₁:

Var(β̂₁) = σ̂² / Σᵢ₌₁ⁿ (xᵢ - x̄)²;   Se(β̂₁) = √Var(β̂₁) = σ̂ / √(Σᵢ₌₁ⁿ (xᵢ - x̄)²)

Variance and standard error of the intercept coefficient β̂₀:

Var(β̂₀) = σ̂² · (n⁻¹ Σᵢ₌₁ⁿ xᵢ²) / Σᵢ₌₁ⁿ (xᵢ - x̄)²;   Se(β̂₀) = √Var(β̂₀)

Notes:
- Standard errors are the estimated standard deviations of the regression coefficients.
- They measure how precisely the coefficients are estimated.
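A sketch of computing σ̂² and Se(β̂₁) by hand after a regression (hypothetical variables y and x):

reg y x
predict res, residuals
gen res2 = res^2
quietly summarize res2
scalar sigma2 = r(sum)/(e(N) - 2)       // error variance estimate, k = 1
quietly summarize x
scalar sxx = (r(N) - 1)*r(Var)          // sum of (x_i - xbar)^2
display "Se(b1) = " sqrt(sigma2/sxx)    // matches _se[x] reported by reg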


Properties of the mean and variances

[Figure: sampling distribution of β̂ⱼ centered at the true value βⱼ, with E(β̂ⱼ) = βⱼ.]
Exercises

1. Is the residual the error term? Explain.
2. Why do the sum and mean of the residuals always equal zero?
3. What happens to the sum and mean of the residuals if we exclude the intercept from the OLS model?
4. What happens to the OLS estimator if the sample is not randomly selected from the population?
5. What happens to a simple linear regression model if the value of the explanatory variable is the same for all observations?
6. Suppose our model satisfies SLR assumptions 1-4 but suffers from heteroskedasticity. In this case, are our estimates biased? What is the consequence of the heteroskedasticity?
7. Comment on the statement that a model with a high R-squared shows a strong causal relationship.
8. Which model violates the assumptions of OLS?
Y = β₀ + β₁(1/X) + u (1)
Y = β₀ + (1/β₁)X + u (2)
Y = β₀ + β₁²X + u (3)
Exercises (cont.)

9. Let Qd denote the quantity demanded of a given product, and let P denote the price of that product. A simple model connects quantity demanded to price: Qd = β₀ + β₁P + u.
(i) What possible factors are contained in u? Is it likely that these will be related to price?
(ii) Will a simple regression analysis show the ceteris paribus effect of price on quantity demanded? Explain.

10. The following table contains monthly meat consumption per household (thousand VND) and monthly household income per capita (thousand VND) for 20 households.

Obs   meat   income      Obs   meat   income
1     1390   5031        11    1770   4365
2     1320   6491        12    1620   4727
3     2900   4900        13    1460   5067
4     790    3267        14    650    5094
5     1600   5164        15    995    3000
6     2400   3260        16    2900   8208
7     1310   4847        17    1450   3613
8     1690   8395        18    1460   4624
9     1880   6625        19    510    4751
10    1205   2394        20    760    5151

(i) Estimate the relationship between the dependent variable (meat consumption) and the independent variable (household income per capita) using an OLS regression model. Comment on the link between the two variables. What is the meaning of the intercept and slope coefficients?
(ii) How much higher is the level of meat consumption predicted to be if monthly income per capita increases by 200 thousand VND?
(iii) Is it true to say that, given a one million VND increase in household income per capita, the value of meat consumption increases by the same amount for all households?
(iv) Calculate the fitted values of the dependent variable and the residuals. Do the sum and mean of the residuals equal zero? What is the average of the fitted values and of the observed values of the dependent variable?
(v) Please interpret the R-squared. How much of the variation in meat consumption is unexplained by the regression?
(vi) Calculate the standard errors of the regression coefficients and the standard error of the regression. What is the unit of the standard error of the regression?
11. Using a simple linear regression model, a researcher investigates the dependence of the monthly wage (in thousand VND) on
the number of years of education among wage workers in Hanoi in 2018.
(i) What is the average predicted wage when education equals zero?
(ii) How much does the monthly wage increase if the number of years of education increases from 12 to 16 years?
(iii) Does this model infer a causal relationship between wage and education?
(iv) What percentage of the variance in wages is explained by education?
12. A sample of 11 households with their income and food consumption is given in the table.

Income (thousand VND/person/month)   Food consumption (thousand VND/person/month)
3000                                 995
8208                                 2900
3613                                 1450
4624                                 1460
4751                                 510
5151                                 760
5884                                 1005
2696                                 100
2485                                 912
8860                                 570
1436                                 512

Using the OLS estimator, estimate the relationship between the dependent variable (food consumption) and the explanatory variable (income):

Food = β₀ + β₁Income + u

(i) Using the regression result, please report the marginal propensity to consume food (MPCF).
(iii) What is the MPCF if the regression model excludes the intercept: Food = β₁Income + u? (Note: please use "constant is zero" in Excel, or the noconstant option in Stata.)
(iv) Using the result from the model without the intercept, calculate the fitted values of the dependent variable and the residuals. Do the sum and mean of the residuals equal zero? What is the average of the fitted values and of the observed values of the dependent variable?
(v) Does the exclusion of the intercept from the model cause bias? Explain.
