Simple Linear Regression
Abhishek Dureja
Teaching Assistant: Anshika Arora
Plaksha University
Linear Regression
Conditional Expectations Function
▶ As econometricians, we try to summarize and explain various
economic relationships and outcomes
▶ In particular, given 2 variables Y and X from population, we are
interested in:
i Explaining Y in terms of X
ii Studying how Y varies with changes in X
▶ We are eventually trying to use variation in X to explain variation
in Y
2 / 31
Linear Regression
Conditional Expectations Function
▶ Example I: Y is hourly wage and X is years of education
=⇒ We are trying to explain how wage rate changes with years
of education
▶ Example II: Y is consumption and X is income
=⇒ We are trying to explain how consumption varies with income
3 / 31
Linear Regression
Conditional Expectations Function
▶ We want to explain variation in Y using variation in X
▶ The important question is: How can we do this?
▶ The conditional expectation function (CEF) provides a very
good means to summarize relationships between 2 variables
▶ Let us understand the idea and use of a conditional expectations
function in estimating the relationship between Y and X using
example II
4 / 31
Regression Analysis
Conditional Expectation Function
▶ Assume a population (not sample) of 60 families
▶ Information on weekly consumption and income recorded
▶ We want to understand how consumption (Y) varies with income
(X)
▶ To understand how consumption (Y) varies with income (X) we
will make use of the conditional expectations function
5 / 31
Regression Analysis
Conditional Expectation Function
6 / 31
Regression Analysis
Conditional Expectation Function
▶ The following gives you the scatter plot of the population data
[Figure: Scatterplot of Weekly Income ($, x-axis, 75–250) vs. Weekly
Consumption Expenditure ($, y-axis, 60–180)]
7 / 31
Linear Regression
Conditional Expectations Function
▶ The important question is how do we explain Y in terms of X
▶ We know one thing for sure:
→ We need to explain Yi in terms of Xi such that most of
the variation in Yi is explained by variation in Xi
(i is indexing individual i)
=⇒ This is like a fitting problem
▶ Which function of Xi best fits the scatterplot so as to explain
variation in Yi
▶ Let m(Xi ) be any (arbitrary) function of Xi
(More generally, we can also write m(X) as well)
8 / 31
Linear Regression
Conditional Expectations Function
▶ We need to find m(Xi ) such that we are able to explain most
variation in Yi
▶ Mathematically, this is the problem of minimising the mean squared
error
arg min_{m(Xi)} E[(Yi − m(Xi))²]
▶ For any given Xi, m(Xi) will give us the predicted value of Yi, i.e. Ŷi
▶ But the actual value is Yi
=⇒ Yi − m(Xi) is the prediction error
=⇒ E[(Yi − m(Xi))²] is the mean squared error (MSE)
▶ Minimising MSE is our decision criterion
9 / 31
Linear Regression
Conditional Expectations Function
arg min_{m(Xi)} E[(Yi − m(Xi))²]
▶ Hence, we need to find a functional form of m(Xi ) that minimises
the mean squared error
▶ The functional form, i.e. m(Xi ), that minimises the mean
squared error is nothing but the conditional expectations
function (CEF)
▶ Mathematically,
E[Yi | Xi] = arg min_{m(Xi)} E[(Yi − m(Xi))²]
10 / 31
Linear Regression
Conditional Expectations Function
CEF-Prediction Property
▶ Let m(Xi ) be any function of Xi
▶ The Conditional Expectation Function (CEF) solves
E[Yi | Xi] = arg min_{m(Xi)} E[(Yi − m(Xi))²]
=⇒ CEF is the minimum mean square error (MMSE)
predictor of Yi given Xi
▶ E[(Yi − m(Xi))²] is minimized when m(Xi) = E[Yi | Xi]
▶ We will not prove this result here
11 / 31
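The CEF-prediction property above can be checked numerically. The sketch below uses a made-up toy population (not from the slides): predicting Y with the conditional mean E[Y|X] attains a smaller mean squared error than any other per-group prediction rule.

```python
# Toy population: X takes two values; Y varies within each group.
data = [(1, 60), (1, 70), (1, 80), (2, 100), (2, 120), (2, 140)]

def mse(predict):
    """Mean squared error of a prediction rule over the population."""
    return sum((y - predict(x)) ** 2 for x, y in data) / len(data)

# Conditional means: E[Y|X=1] = 70, E[Y|X=2] = 120
cef = {1: 70.0, 2: 120.0}
mse_cef = mse(lambda x: cef[x])

# Perturbing either group's prediction never lowers the MSE
for d1 in (-5.0, 0.0, 5.0):
    for d2 in (-5.0, 0.0, 5.0):
        shifted = lambda x, d1=d1, d2=d2: cef[x] + (d1 if x == 1 else d2)
        assert mse(shifted) >= mse_cef
print(round(mse_cef, 2))
```

Any deviation from the conditional means strictly increases the MSE, which is exactly the minimum-MSE property stated above.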
Linear Regression
Conditional Expectations Function
▶ Now we theoretically know that CEF has a very strong predictive
power
▶ CEF can help us to summarise relationships between 2
variables: Y and X
▶ Let us now return to example II and see how well CEF does in
summarising the consumption-income relationship
12 / 31
Regression Analysis
Conditional Expectation Function
▶ Since the conditional expectation is the mean value of Y (the
dependent variable) given X (the independent variable)
=⇒ Let us divide the population into 10 income groups
=⇒ We are dividing the population based on the value of
independent variables
▶ Let us now compute the mean consumption conditional on each
value of income
13 / 31
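The grouping step described above can be sketched in a few lines. The numbers here are illustrative, not the slides' actual 60-family table: we group the population by income X and compute the conditional mean of consumption Y within each group, which gives the values of the CEF.

```python
# Hypothetical (income, consumption) pairs for a small population
population = [
    (80, 55), (80, 60), (80, 65),     # families with weekly income $80
    (100, 65), (100, 70), (100, 75),  # families with weekly income $100
    (120, 79), (120, 84), (120, 89),  # families with weekly income $120
]

# Collect consumption values for each income level
groups = {}
for income, consumption in population:
    groups.setdefault(income, []).append(consumption)

# The CEF: one mean consumption value per income level
cef = {x: sum(ys) / len(ys) for x, ys in groups.items()}
print(cef)
```

Each entry of `cef` is E[Y | X = x] for one income level, i.e. one point of the conditional expectation function.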
Regression Analysis
Conditional Expectation Function
▶ The last row provides the conditional mean of Y
14 / 31
Regression Analysis
Conditional Expectation Function
15 / 31
Linear Regression
Conditional Expectations Function
CEF for years of schooling and wage rate
▶ The following figure plots the CEF for wage and years of education
16 / 31
Linear Regression and Causality
Economic Relationships and the Conditional Expectation Function
▶ The figure plots the CEF of log weekly wages given schooling for
a sample of middle-aged men
(i.e. men who have completed their education)
▶ The distribution of earnings is also plotted for a few key values: 4,
8, 12, and 16 years of schooling
▶ The CEF in the figure reflects an important fact
▶ Despite enormous variation in individual circumstances, people
with more schooling generally earn more, on average
=⇒ CEF is able to summarize the relationship between earnings and
years of schooling
17 / 31
Linear Regression and Causality
Economic Relationships and the Conditional Expectation Function
▶ The properties of CEF are central to the linear regression
The CEF-Decomposition Property
▶ The CEF-Decomposition Property states that Yi can be decomposed
as
Yi = E [Yi |Xi ] + ϵi
where ϵi is mean independent of Xi, i.e. E[ϵi | Xi] = 0
=⇒ ϵi is uncorrelated with any function of Xi
18 / 31
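The decomposition property can be verified directly on a toy population (made-up numbers, not from the slides): the residual e_i = Y_i − E[Y|X_i] averages to zero within every X group, which is the mean-independence condition E[e|X] = 0.

```python
# Toy population of (X, Y) pairs
data = [(1, 60), (1, 70), (1, 80), (2, 100), (2, 120), (2, 140)]

# Build the CEF: the mean of Y within each X group
groups = {}
for x, y in data:
    groups.setdefault(x, []).append(y)
cef = {x: sum(ys) / len(ys) for x, ys in groups.items()}

# The residual Y - E[Y|X] has mean zero conditional on every value of X
for x, ys in groups.items():
    resid_mean = sum(y - cef[x] for y in ys) / len(ys)
    assert abs(resid_mean) < 1e-12  # E[e | X = x] = 0 for each x
print("E[e|X] = 0 holds in this toy population")
```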
Regression Analysis
Economic Relationships and the Conditional Expectation Function
▶ Given Xi , E [Y |Xi ] is fixed; but Yi varies
▶ Yi can be written as:
Yi = E [Y |Xi ] + ϵi
▶ E [Y |Xi ] is the mean value of Y given Xi
▶ ϵi is stochastic disturbance or stochastic error term
▶ ϵi represents the effect of all excluded/omitted explanatory variables
that affect Y
▶ For many reasons, it is not possible to include these variables:
i Unavailability of data
ii Peripheral variables: Joint influence of these variables may be very
small
iii Intrinsic randomness in human behavior
19 / 31
Linear Regression
Population Regression Function
▶ To estimate the relationship between Y and X
=⇒ We can use the CEF
▶ The next important question is: How can we estimate the CEF?
▶ We cannot simply join all the conditional means by hand
▶ We need a way to estimate the CEF
20 / 31
Linear Regression
Population Regression Function
▶ The important question to ask is:
→ How to estimate the CEF?
▶ If the joint distribution of (Y,X) is bivariate normal
=⇒ CEF is linear in X (independent variable)
=⇒ Conditional expectation of Y can be written as a linear
function of X
=⇒ E [Y |X ] = β0 + β1 X
21 / 31
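A small simulation illustrates the linearity of the CEF under joint normality. The model here is an assumption for illustration: Y = 2 + 0.5·X + u with X and u independent standard normals, which makes (X, Y) jointly normal, so the conditional means of Y should fall on the line E[Y|X] = 2 + 0.5·X.

```python
import random

# Simulate a jointly normal (X, Y) population:
# X ~ N(0,1), u ~ N(0,1), Y = 2 + 0.5*X + u
random.seed(42)
pairs = []
for _ in range(200_000):
    x = random.gauss(0, 1)
    pairs.append((x, 2.0 + 0.5 * x + random.gauss(0, 1)))

def cond_mean_near(x0, width=0.1):
    """Average of Y over observations with X within `width` of x0."""
    ys = [y for x, y in pairs if abs(x - x0) < width]
    return sum(ys) / len(ys)

# The binned conditional means track the straight line 2 + 0.5*X
for x0 in (-1.0, 0.0, 1.0):
    assert abs(cond_mean_near(x0) - (2.0 + 0.5 * x0)) < 0.05
print("conditional means lie close to the line 2 + 0.5*X")
```

With a nonlinear true relationship (say Y depending on X²) the binned means would curve away from any straight line; joint normality is what guarantees the linear shape.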
Linear Regression
Population Regression Function
=⇒ E [Y |X ] = β0 + β1 X
▶ The assumption that the joint distribution of (Y,X) is normal may
be strong
▶ The above relationship will still hold if the conditional distribution of
Y |X is normal
▶ What does the normality of Y | X look like?
22 / 31
Linear Regression
Normality of Y |Xi
23 / 31
Linear Regression
Population Regression Function
▶ If the conditional distribution of Y |Xi follows a Normal distribution,
then CEF can be written as
E [Y |Xi ] = β0 + β1 Xi
▶ Using the CEF decomposition property:
Yi = E [Y |Xi ] + ui
=⇒ Yi = β0 + β1 Xi + ui
▶ More generally:
=⇒ Y = β0 + β1 X + u
24 / 31
Linear Regression
Conditional Expectations Function
=⇒ Y = β0 + β1 X + u
▶ The above relationship is known as the population regression
function (PRF)
▶ The true relationship between Y and X in the population is
known as the population regression function (PRF)
▶ PRF represents the true functional form relationship between Y and
X
25 / 31
Linear Regression
Conditional Expectations Function
▶ The PRF is given as
=⇒ Y = β0 + β1 X + u
▶ PRF represents the true relationship between Y and X in the
population
▶ We have assumed that the PRF (true relationship) between Y and
X is linear in parameters (β0 and β1 ) and variable X
26 / 31
Linear Regression
Conditional Expectations Function
=⇒ Y = β0 + β1 X + u
▶ Even if Y | X is not normally distributed, we can still estimate the
CEF using the above linear relationship
▶ The reason is that the linear PRF can then be seen as the best
linear approximation to the CEF
27 / 31
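This best-linear-approximation idea can be checked with a hypothetical nonlinear CEF (not taken from the slides): here E[Y|X] = √X, and the OLS/population-regression coefficients give the line closest to the CEF values in the mean-squared sense.

```python
# Points of a nonlinear CEF: E[Y|X] = sqrt(X)
xs = [1.0, 4.0, 9.0, 16.0, 25.0]
ys = [x ** 0.5 for x in xs]  # CEF values: 1, 2, 3, 4, 5

# OLS formulas for slope and intercept
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
     / sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

def line_mse(a, b):
    """Mean squared gap between the line a + b*x and the CEF values."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n

# No perturbed line approximates the CEF better than (b0, b1)
best = line_mse(b0, b1)
for da in (-0.1, 0.1):
    for db in (-0.02, 0.02):
        assert line_mse(b0 + da, b1 + db) > best
print(round(b0, 3), round(b1, 3))
```

Even though no straight line can match √X exactly, the regression line is the one with the smallest mean squared gap, which is the sense in which the linear PRF approximates a nonlinear CEF.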
Linear Regression
Conditional Expectations Function
PRF estimation of CEF
28 / 31
Linear Regression
Conditional Expectations Function
▶ The dark line represents the CEF, which captures the relationship
between weekly earnings and years of education
▶ The dotted line represents the population regression line
▶ The regression line fits the somewhat bumpy and nonlinear
CEF
▶ Even though the regression line is a model for Yi
=⇒ the regression line fits the nonlinear CEF as if we were
estimating a model for the CEF, i.e. E[Yi | Xi]
=⇒ We will use linear regression to model relationship
between Y and X
=⇒ Linear regression is used to model the conditional
expectations function
29 / 31
Linear Regression
Population Regression Function
▶ Now, given our discussion, we know two things:
i CEF, i.e. E[Y | X], is the best predictor of Y as it minimises the
mean squared error (it is the MMSE predictor)
ii True relationship between Y and X is given by the PRF:
Y = β0 + β1 X + u
30 / 31
Linear Regression
Conditional Expectations Function
▶ We have assumed that the true relationship between Y and X in the
population is given as follows
Y = β0 + β1 X + u
where
i Y is the dependent variable
ii X is the independent variable
iii u is called the error term or disturbance in the relationship
▶ u represents factors other than X that affect Y
31 / 31