Lecture 3 - Conditional Expectation Function

The document discusses Simple Linear Regression and the Conditional Expectations Function (CEF) as a method to explain the relationship between two variables, Y and X. It emphasizes the importance of minimizing the mean squared error to find the best-fitting function that explains variation in Y based on changes in X. The document also outlines the Population Regression Function (PRF) and its role in estimating the CEF, highlighting that the true relationship between Y and X is linear in parameters.


Simple Linear Regression

Abhishek Dureja

Teaching Assistant: Anshika Arora

Plaksha University
Linear Regression
Conditional Expectations Function

▶ As econometricians, we try to summarize and explain various economic relationships and outcomes

▶ In particular, given two variables Y and X from the population, we are interested in:

i Explaining Y in terms of X

ii Studying how Y varies with changes in X

▶ We are eventually trying to use variation in X to explain variation in Y

2 / 31
Linear Regression
Conditional Expectations Function

▶ Example I: Y is hourly wage and X is years of education

=⇒ We are trying to explain how the wage rate changes with years of education

▶ Example II: Y is consumption and X is income

=⇒ We are trying to explain how consumption varies with income

3 / 31
Linear Regression
Conditional Expectations Function

▶ We want to explain variation in Y using variation in X

▶ The important question is: How can we do this?

▶ The conditional expectation function (CEF) provides a very good means to summarize relationships between two variables

▶ Let us understand the idea and use of the conditional expectation function in estimating the relationship between Y and X using Example II

4 / 31
Regression Analysis
Conditional Expectation Function

▶ Assume a population (not a sample) of 60 families

▶ Information on weekly consumption and income is recorded

▶ We want to understand how consumption (Y) varies with income (X)

▶ To do so, we will make use of the conditional expectation function

5 / 31
Regression Analysis
Conditional Expectation Function

6 / 31
Regression Analysis
Conditional Expectation Function

▶ The following gives the scatter plot of the population data

[Figure: Scatter plot of weekly income ($, horizontal axis) against weekly consumption expenditure ($, vertical axis) for the 60 families]
7 / 31
Linear Regression
Conditional Expectations Function

▶ The important question is: how do we explain Y in terms of X?

▶ We know one thing for sure:

→ We need to explain Yi in terms of Xi such that most of the variation in Yi is explained by variation in Xi
(i indexes individual i)

=⇒ This is like a fitting problem

▶ Which function of Xi best fits the scatter plot so as to explain variation in Yi?

▶ Let m(Xi) be any (arbitrary) function of Xi
(more generally, we can also write m(X))

8 / 31
Linear Regression
Conditional Expectations Function

▶ We need to find m(Xi) such that we are able to explain most of the variation in Yi

▶ Mathematically, this is the problem of minimising the mean squared error:

  arg min_{m(Xi)} E[(Yi − m(Xi))²]

▶ For any given Xi, m(Xi) gives us the predicted value of Yi, denoted Ŷi

▶ But the actual value is Yi

=⇒ Yi − m(Xi) is the prediction error

=⇒ E[(Yi − m(Xi))²] is the mean squared error (MSE)

▶ Minimising the MSE is our decision criterion (a small numerical sketch follows)
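As a quick illustration (not from the slides), the minimal sketch below simulates a hypothetical population in which E[Y | X] = 20 + 0.6X and compares the MSE achieved by the conditional mean with that of an arbitrary alternative m(X). All numbers and names are assumptions chosen purely for illustration.

# A minimal sketch: compare the MSE of the conditional mean E[Y|X]
# with that of an arbitrary alternative function m(X), on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                                        # large "population" so sample means approximate expectations

x = rng.integers(80, 261, size=n).astype(float)    # hypothetical weekly income values
y = 20 + 0.6 * x + rng.normal(0, 10, size=n)       # hypothetical consumption; E[Y|X] = 20 + 0.6X

cef = 20 + 0.6 * x                                 # the true conditional expectation function
alt = 0.8 * x                                      # some other arbitrary function m(X)

mse_cef = np.mean((y - cef) ** 2)                  # E[(Y - E[Y|X])^2]
mse_alt = np.mean((y - alt) ** 2)                  # E[(Y - m(X))^2]
print(f"MSE using E[Y|X] = 20 + 0.6X : {mse_cef:.1f}")
print(f"MSE using m(X)   = 0.8X      : {mse_alt:.1f}")

Under these assumed values, the MSE of the arbitrary m(X) comes out strictly larger than the MSE of the conditional mean, which is the point of the decision criterion above.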


9 / 31
Linear Regression
Conditional Expectations Function

  arg min_{m(Xi)} E[(Yi − m(Xi))²]

▶ Hence, we need to find a functional form of m(Xi) that minimises the mean squared error

▶ The functional form m(Xi) that minimises the mean squared error is nothing but the conditional expectation function (CEF)

▶ Mathematically,

  E[Yi | Xi] = arg min_{m(Xi)} E[(Yi − m(Xi))²]

10 / 31
Linear Regression
Conditional Expectations Function

CEF-Prediction Property

▶ Let m(Xi) be any function of Xi

▶ The conditional expectation function (CEF) solves

  E[Yi | Xi] = arg min_{m(Xi)} E[(Yi − m(Xi))²]

=⇒ The CEF is the minimum mean squared error (MMSE) predictor of Yi given Xi

▶ E[(Yi − m(Xi))²] is minimized when m(Xi) = E[Yi | Xi]

▶ We do not prove this formally, but a short sketch of the argument is given below
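For reference, here is a brief sketch of the standard argument (not reproduced from the slides), written in LaTeX. Adding and subtracting E[Yi | Xi] inside the square and expanding gives

\begin{align*}
E\big[(Y_i - m(X_i))^2\big]
  &= E\big[(Y_i - E[Y_i \mid X_i])^2\big] \\
  &\quad + 2\,E\big[(Y_i - E[Y_i \mid X_i])\,(E[Y_i \mid X_i] - m(X_i))\big] \\
  &\quad + E\big[(E[Y_i \mid X_i] - m(X_i))^2\big]
\end{align*}

The cross term is zero by the law of iterated expectations, since E[Yi − E[Yi | Xi] | Xi] = 0 and the second factor is a function of Xi only. The first term does not depend on m, so the MSE is minimized by making the last term zero, i.e. by choosing m(Xi) = E[Yi | Xi].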

11 / 31
Linear Regression
Conditional Expectations Function

▶ We now know theoretically that the CEF has very strong predictive power

▶ The CEF can help us summarise the relationship between two variables, Y and X

▶ Let us now return to Example II and see how well the CEF does in summarising the consumption-income relationship

12 / 31
Regression Analysis
Conditional Expectation Function

▶ The conditional expectation is the mean value of Y (the dependent variable) given X (the independent variable)

=⇒ Let us divide the population into 10 income groups

=⇒ We are dividing the population based on the value of the independent variable

▶ Let us now compute the mean consumption conditional on each value of income (a small code sketch of this calculation follows)
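As an aside (not part of the slides), these conditional means can be computed mechanically. The minimal sketch below assumes the population data sit in a pandas DataFrame with columns "income" and "consumption"; the values shown are purely hypothetical, not the 60-family data.

# A minimal sketch, assuming the population data are available as a DataFrame
# with one row per family (values below are illustrative only).
import pandas as pd

population = pd.DataFrame({
    "income":      [80, 80, 80, 100, 100, 100, 120, 120, 140, 140],   # hypothetical
    "consumption": [55, 60, 65,  70,  74,  80,  84,  90,  93, 100],   # hypothetical
})

# E[Y | X = x]: mean consumption within each income group
cef = population.groupby("income")["consumption"].mean()
print(cef)   # e.g. mean consumption is 60.0 at income 80 and 87.0 at income 120

Joining these group means is exactly what the conditional-mean table and figure on the following slides do for the actual population.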

13 / 31
Regression Analysis
Conditional Expectation Function

▶ The last row of the table provides the conditional mean of Y for each income group

14 / 31
Regression Analysis
Conditional Expectation Function

15 / 31
Linear Regression
Conditional Expectations Function

CEF for years of schooling and wage rate

▶ The following figure plots the CEF for wage and years of education

16 / 31
Linear Regression and Causality
Economic Relationships and the Conditional Expectation Function

▶ The figure plots the CEF of log weekly wages given schooling for a sample of middle-aged men
(i.e. men who have completed their education)

▶ The distribution of earnings is also plotted for a few key values: 4, 8, 12, and 16 years of schooling

▶ The CEF in the figure reflects an important fact

▶ Despite enormous variation in individual circumstances, people with more schooling generally earn more, on average

=⇒ The CEF is able to summarize the relationship between earnings and years of schooling

17 / 31
Linear Regression and Causality
Economic Relationships and the Conditional Expectation Function

▶ The properties of the CEF are central to linear regression

The CEF-Decomposition Property

▶ The CEF-decomposition property states that Yi can be decomposed as

  Yi = E[Yi | Xi] + ϵi

where ϵi is mean independent of Xi, i.e. E[ϵi | Xi] = 0

=⇒ ϵi is uncorrelated with any function of Xi (a short derivation is sketched below)
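For completeness, here is a brief sketch (standard argument, not reproduced from the slides) of why mean independence implies that ϵi is uncorrelated with any function h(Xi):

\begin{align*}
E[\epsilon_i] &= E\big[\,E[\epsilon_i \mid X_i]\,\big] = 0 \\
E[h(X_i)\,\epsilon_i] &= E\big[\,h(X_i)\,E[\epsilon_i \mid X_i]\,\big] = 0 \\
\operatorname{Cov}\big(h(X_i), \epsilon_i\big) &= E[h(X_i)\,\epsilon_i] - E[h(X_i)]\,E[\epsilon_i] = 0
\end{align*}

Both expectations use the law of iterated expectations together with E[ϵi | Xi] = 0.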

18 / 31
Regression Analysis
Economic Relationships and the Conditional Expectation Function

▶ Given Xi, E[Y | Xi] is fixed, but Yi varies

▶ Yi can be written as:

  Yi = E[Y | Xi] + ϵi

▶ E[Y | Xi] is the mean value of Y given Xi

▶ ϵi is the stochastic disturbance or stochastic error term

▶ ϵi represents the effect of all excluded/omitted explanatory variables that affect Y

▶ For several reasons, it is not possible to include these variables:

i Unavailability of data

ii Peripheral variables: the joint influence of these variables may be very small

iii Intrinsic randomness in human behavior

19 / 31
Linear Regression
Population Regression Function

▶ To estimate the relationship between Y and X

=⇒ We can use the CEF

▶ The next important question is: how can we estimate the CEF?

▶ We cannot simply join all the conditional means by hand

▶ We need a way to estimate the CEF

20 / 31
Linear Regression
Population Regression Function

▶ The important question to ask is:

→ How do we estimate the CEF?

▶ If the joint distribution of (Y, X) is bivariate normal

=⇒ The CEF is linear in X (the independent variable)

=⇒ The conditional expectation of Y can be written as a linear function of X (illustrated in the sketch below)

=⇒ E[Y | X] = β0 + β1 X
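As a quick check (not from the slides), the minimal sketch below draws (X, Y) from an assumed bivariate normal distribution and verifies that the conditional means of Y within narrow bins of X lie approximately on a straight line; all parameter values are hypothetical.

# A minimal sketch: under an assumed bivariate normal (X, Y), binned conditional
# means of Y approximately follow the linear CEF  E[Y|X] = beta0 + beta1 * X.
import numpy as np

rng = np.random.default_rng(1)
mu = [150.0, 110.0]                       # hypothetical means of (X, Y)
cov = [[900.0, 450.0],                    # hypothetical covariance matrix
       [450.0, 400.0]]
x, y = rng.multivariate_normal(mu, cov, size=200_000).T

beta1 = cov[0][1] / cov[0][0]             # slope: Cov(X, Y) / Var(X) = 0.5
beta0 = mu[1] - beta1 * mu[0]             # intercept: E[Y] - beta1 * E[X] = 35

edges = np.arange(110, 191, 20)           # bin X and compare binned means with the line
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (x >= lo) & (x < hi)
    mid = (lo + hi) / 2
    print(f"X in [{lo}, {hi}): mean Y = {y[mask].mean():.2f}, "
          f"linear CEF at midpoint = {beta0 + beta1 * mid:.2f}")

The two columns of the printout agree closely, which is the linearity-of-the-CEF property the slide states for the bivariate normal case.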

21 / 31
Linear Regression
Population Regression Function

=⇒ E[Y | X] = β0 + β1 X

▶ The assumption that the joint distribution of (Y, X) is normal may be strong

▶ The above relationship will still hold if the conditional distribution of Y | X is normal

▶ What does the normality of Y | X look like?

22 / 31
Linear Regression
Normality of Y |Xi

23 / 31
Linear Regression
Population Regression Function

▶ If the conditional distribution of Y | Xi follows a normal distribution, then the CEF can be written as

  E[Y | Xi] = β0 + β1 Xi

▶ Using the CEF-decomposition property:

  Yi = E[Y | Xi] + ui
(where ui is the error term, written ϵi earlier)

=⇒ Yi = β0 + β1 Xi + ui

▶ More generally:

=⇒ Y = β0 + β1 X + u
(a small estimation sketch in code follows)
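Looking ahead (this is not part of the slides), a minimal sketch of recovering β0 and β1 from data by least squares; the data-generating values β0 = 20, β1 = 0.6 and all Python names are assumptions for illustration only.

# A minimal sketch, assuming simulated data with true beta0 = 20 and beta1 = 0.6.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(80, 260, size=5_000)             # hypothetical income values
u = rng.normal(0, 10, size=5_000)                # error term with E[u | x] = 0
y = 20 + 0.6 * x + u                             # Y = beta0 + beta1*X + u

# Least-squares estimates of the slope and intercept
b1_hat = np.cov(x, y, ddof=0)[0, 1] / np.var(x)  # Cov(X, Y) / Var(X)
b0_hat = y.mean() - b1_hat * x.mean()            # mean(Y) - b1 * mean(X)
print(f"estimated intercept: {b0_hat:.2f}, estimated slope: {b1_hat:.3f}")

With a reasonably large sample the estimates land close to the assumed values of 20 and 0.6.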
24 / 31
Linear Regression
Conditional Expectations Function

=⇒ Y = β0 + β1 X + u

▶ The above relationship is known as the population regression function (PRF)

▶ The true relationship between Y and X in the population is known as the population regression function (PRF)

▶ The PRF represents the true functional relationship between Y and X

25 / 31
Linear Regression
Conditional Expectations Function

▶ The PRF is given as

=⇒ Y = β0 + β1 X + u

▶ The PRF represents the true relationship between Y and X in the population

▶ We have assumed that the PRF (the true relationship) between Y and X is linear in the parameters (β0 and β1) and in the variable X

26 / 31
Linear Regression
Conditional Expectations Function

=⇒ Y = β0 + β1 X + u

▶ Even if Y | X is not normally distributed, we can still estimate the CEF using the above relationship

▶ The reason is that the (linear) PRF can then be seen as the best linear approximation to the CEF (see the formulas sketched below)
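For reference (not derived in the slides), the population coefficients of this best linear approximation take a standard form, assuming Var(X) > 0:

\begin{align*}
\beta_1 &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} \\
\beta_0 &= E[Y] - \beta_1\, E[X]
\end{align*}

These are the values of β0 and β1 that minimize E[(Y − β0 − β1 X)²], and they also give the best linear approximation to E[Y | X] in the mean-squared-error sense.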

27 / 31
Linear Regression
Conditional Expectations Function

PRF estimation of CEF

28 / 31
Linear Regression
Conditional Expectations Function

▶ The dark line represents the CEF, which captures the relationship between weekly earnings and years of education

▶ The dotted line represents the population regression line

▶ The regression line fits the somewhat bumpy and nonlinear CEF

▶ Even though the regression line is a model for Yi

=⇒ The regression line fits the nonlinear CEF as if we were estimating a model for the CEF, i.e. E[Yi | Xi]

=⇒ We will use linear regression to model the relationship between Y and X

=⇒ Linear regression is used to model the conditional expectation function
29 / 31
Linear Regression
Population Regression Function

▶ Now, given our discussion, we know two things:

i The CEF, i.e. E[Y | X], is the best predictor of Y, since it minimises the mean squared error (it is the MMSE predictor)

ii The true relationship between Y and X is given by the PRF:

  Y = β0 + β1 X + u

30 / 31
Linear Regression
Conditional Expectations Function

▶ We have assumed that the true relationship between Y and X in the population is given as follows:

  Y = β0 + β1 X + u

where

i Y is the dependent variable

ii X is the independent variable

iii u is called the error term or disturbance in the relationship

▶ u represents factors other than X that affect Y

31 / 31
