Meeting 2 - Simple Linear Regression

The document discusses the fundamentals of Simple Linear Regression, focusing on the population model, assumptions, and the Ordinary Least Squares (OLS) method. It emphasizes the importance of the error term and the relationship between the independent variable x and the dependent variable y, particularly in terms of estimating population parameters β0 and β1. The document also outlines the derivation of OLS estimates, properties of residuals, and the minimization approach used in regression analysis.


Simple Linear Regression

Meeting 2
Population Model
• Cross-sectional analysis
• Assume that sample is collected randomly from the population.
• We want to know how y varies with changes in x.
• What if y is affected by factors other than x?
• What is the functional form?
• How can we distinguish causality from correlation?
• Consider the following model, which holds in the population:
𝑦 = β0 + β1 𝑥 + 𝑢
Population Model
• We allow for other factors to affect y by including u (error term).
• If the other factors in u are held fixed, ∆u = 0, then x has a linear
effect on y.
• Linearity: a one-unit change in x has the same effect on y, regardless of the initial value of x.

• The goal of empirical work is to estimate β0 and β1 (the population parameters).
• β0 and β1 are not directly observable.
• We estimate β0 and β1 using data and ASSUMPTIONS.
A simple assumption
● The average value of u, the error term, in the population is 0: E(u) = 0
● This is not a restrictive assumption, since we can always use the intercept β0 to normalize E(u) to 0.
● Show this!
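A minimal sketch of the requested normalization argument (not shown on the original slide): if E(u) = α ≠ 0, we can absorb α into the intercept,

\[
y = \beta_0 + \beta_1 x + u = (\beta_0 + \alpha) + \beta_1 x + (u - \alpha),
\]

so defining a new intercept β̃0 = β0 + α and a new error ũ = u − α gives a model of exactly the same form with E(ũ) = 0.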
Zero conditional mean / Mean independence
● We need to make a crucial assumption about how u and x are related
● We want it to be the case that knowing something about x does not
give us any information about u, so that they are completely
unrelated.
● E(u|x) = E(u) = 0, which implies
● E(y|x) = β0 + β1x
● This is the most crucial, challenging assumption for the interpretation of β1 as a causal parameter.
[Figure: E(y|x) as a linear function of x, where for any x the distribution of y is centered about E(y|x) = β0 + β1x.]
Ordinary Least Squares
● Basic idea of regression is to estimate the population parameters
from a sample
● Let {(xi, yi): i = 1, …, n} denote a random sample of size n from the population
● For each observation in this sample, it will be the case that yi = β0 + β1xi + ui
● ui is unobserved.
Deriving OLS Estimates
● To derive the OLS estimates we need to realize that our
main assumption of E(u|x) = E(u) = 0 also implies that
● Cov(x,u) = E(xu) = 0
● Why? Remember from basic probability that Cov(X,Y) =
E(XY) – E(X)E(Y).
● Derive this!
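A minimal sketch of the requested derivation (not shown on the original slide), using the law of iterated expectations:

\[
\operatorname{Cov}(x, u) = E(xu) - E(x)E(u) = E(xu), \quad \text{since } E(u) = 0,
\]
\[
E(xu) = E\bigl[ E(xu \mid x) \bigr] = E\bigl[ x \, E(u \mid x) \bigr] = E[x \cdot 0] = 0.
\]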
Deriving OLS continued
● We can write our two restrictions just in terms of x, y, β0 and β1, since u = y − β0 − β1x:
E(y − β0 − β1x) = 0
E[x(y − β0 − β1x)] = 0

● These are called moment restrictions.


• β̂0 and β̂1 are the estimates from the data.
More Derivation

Plug β̂0 into the second equation!
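The derivation steps on this slide are images in the original; the standard algebra they show is sketched below. The sample analog of the first moment restriction gives

\[
\frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \bigr) = 0
\quad \Rightarrow \quad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.
\]

Plugging β̂0 into the sample analog of the second restriction and solving yields

\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\]

which is exactly the ratio described on the next slide.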


Summary of OLS slope estimate
● The slope estimate is the sample covariance between x and
y divided by the sample variance of x.
● If x and y are positively correlated, the slope will be positive
● If x and y are negatively correlated, the slope will be
negative
● We only need x to vary in our sample (otherwise the sample variance of x is zero and the slope estimate is undefined)
More OLS
● Intuitively, OLS is fitting a line through the sample points
such that the sum of squared residuals is as small as
possible, hence the term least squares.
● The residual, û, is an estimate of the error term, u, and is the difference between the sample point and the fitted line (the sample regression function): ûᵢ = yᵢ − ŷᵢ
[Figure: Sample regression line, sample data points, and the associated estimated error terms.]
Alternate approach to derivation
● Given the intuitive idea of fitting a line, we can set up a
formal minimization problem
● That is, we want to choose our parameters such that we
minimize the following:
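The objective on the original slide is an image; the standard OLS criterion it shows is the sum of squared residuals:

\[
\min_{\hat{\beta}_0, \hat{\beta}_1} \; \sum_{i=1}^{n} \bigl( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \bigr)^2
\]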
Alternate approach, continued
● If one uses calculus to solve the minimization problem for the two parameters, one obtains the following first order conditions, which are the same as those we obtained before, multiplied by n:

\[
\sum_{i=1}^{n} \bigl( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \bigr) = 0
\]
\[
\sum_{i=1}^{n} x_i \bigl( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \bigr) = 0
\]
A short simulation
Residuals and fitted values are uncorrelated, by construction!
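The simulation code and output are not reproduced in this document. Below is a minimal sketch of what such a simulation could look like, in Python with numpy (all variable names are illustrative): it generates data from the population model, computes the OLS estimates as sample covariance over sample variance, and checks the claim numerically.

```python
# Illustrative reconstruction of the slide's simulation (assumed, not original).
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Generate data from the population model y = beta0 + beta1*x + u
beta0, beta1 = 1.0, 2.0
x = rng.normal(0.0, 1.0, size=n)
u = rng.normal(0.0, 1.0, size=n)   # E(u) = 0, independent of x
y = beta0 + beta1 * x + u

# OLS estimates: slope = sample cov(x, y) / sample var(x)
beta1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0_hat = y.mean() - beta1_hat * x.mean()

fitted = beta0_hat + beta1_hat * x
resid = y - fitted

print("beta0_hat:", beta0_hat)           # close to 1.0
print("beta1_hat:", beta1_hat)           # close to 2.0
print("sum of residuals:", resid.sum())  # ~0 by construction
print("cov(fitted, resid):", np.cov(fitted, resid, ddof=1)[0, 1])  # ~0
```

The near-zero covariance between fitted values and residuals is not a feature of this particular sample: it holds by construction, as the algebraic properties on the next slide make precise.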
Algebraic Properties of OLS
● The sum of the OLS residuals is zero; this follows directly from the first order condition for β̂0.
● Thus, the sample average of the OLS residuals is zero as
well.
● The sample covariance (correlation) between the regressors
and the OLS residuals is zero.
● Because fitted values are linear functions of the xi, fitted values and residuals are uncorrelated too.
● The OLS regression line always goes through the mean of
the sample.
● If we plug in x̄, we predict ȳ; that is, the point (x̄, ȳ) is on the OLS regression line: ȳ = β̂0 + β̂1x̄
Algebraic Properties of OLS
• Residuals sum to zero!
• The sample average of the fitted values equals ȳ, since the residuals sum to zero.

• Sample covariance between x and residuals is always zero:

• The fitted values and residuals are uncorrelated too:

• The OLS regression line always goes through the mean of the sample.
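The formulas on this recap slide are images in the original; the standard identities they display, in the notation above, are presumably:

\[
\sum_{i=1}^{n} \hat{u}_i = 0, \qquad
\sum_{i=1}^{n} x_i \hat{u}_i = 0, \qquad
\sum_{i=1}^{n} \hat{y}_i \hat{u}_i = 0, \qquad
\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}.
\]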
