Simple Regression
Topics
Def. Simple Regression
How does OLS work?
Properties of OLS
Expected values and variance of the OLS estimators
BLUE estimators
Regression
Regression is concerned with describing and evaluating the
relationship between a given variable (usually called the
Dependent Variable) and one or more other variables (usually
known as the Independent Variable(s)).
Simple regression
– For simplicity, say k = 1. This is the situation where y depends on
only one independent variable (x).
Terminology for Simple Regression
Denote the dependent variable by y and the independent variable by x. Some
alternative names: y is also known as the regressand, the explained variable or
the effect variable; x is also known as the regressor, the explanatory variable
or the causal variable.
Simple Regression: An Example
Suppose that we have the following data on the excess returns on a
fund manager’s portfolio (“fund XXX”) together with the excess
returns on a market index:
We have some intuition that the beta on this fund is positive, and we
therefore want to find whether there appears to be a relationship
between x and y given the data that we have. The first stage would
be to form a scatter plot of the two variables.
Graph (Scatter Diagram)
[Scatter plot: excess return on fund XXX (vertical axis, 0 to 45) against
excess return on market portfolio (horizontal axis, 0 to 25)]
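As a minimal sketch, this is how such a scatter plot might be produced in
Python; the five (x, y) pairs are hypothetical placeholders, since the slide's
data table is not reproduced in the text:

```python
# Scatter plot of fund excess returns against market excess returns.
# The data points below are hypothetical, not the slide's actual observations.
import matplotlib.pyplot as plt

x = [6.8, 9.5, 12.1, 17.2, 21.3]    # excess return on market portfolio (%)
y = [10.2, 14.8, 17.5, 26.1, 32.4]  # excess return on fund XXX (%)

plt.scatter(x, y)
plt.xlabel("Excess return on market portfolio")
plt.ylabel("Excess return on fund XXX")
plt.show()
```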
Finding a Line of Best Fit
We can use the general equation for a straight line,
y = a + bx
to get the line that best “fits” the data.
Is this realistic? No. A single straight line cannot pass exactly through
every data point, so what we do is to add a random disturbance term, u, into
the equation:
$y_t = \alpha + \beta x_t + u_t$, where t = 1, 2, 3, 4, 5
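As a hedged illustration of this data-generating process, the sketch below
simulates the model with assumed values for $\alpha$, $\beta$ and the
disturbance standard deviation (none of these values come from the slides):

```python
# Simulate y_t = alpha + beta*x_t + u_t for t = 1..5.
# alpha, beta and sigma are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = -1.7, 1.6, 2.0

x = np.array([6.8, 9.5, 12.1, 17.2, 21.3])  # hypothetical market excess returns
u = rng.normal(0.0, sigma, size=x.size)     # random disturbances u_t
y = alpha + beta * x + u                    # observed fund excess returns
```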
Actual and Fitted Value
Choose the intercept and slope so that the (vertical) distances from the
data points to the fitted line are minimised (so that the line fits the data
as closely as possible):
[Diagram: data points scattered around a fitted regression line, with each
residual shown as the vertical distance between a point and the line]
$y_t$ denotes the actual data point t
$\hat{y}_t$ denotes the fitted value from the regression line
$\hat{u}_t$ denotes the residual, $y_t - \hat{y}_t$
Finding a Line of Best Fit
Ordinary least squares (OLS):
Choose $\hat{\alpha}$ and $\hat{\beta}$ so that the (vertical) distances from
the data points to the fitted line are minimised (so
that the line fits the data as closely as possible):
$y_t = \alpha + \beta x_t + u_t$
What we actually do is take each distance and
square it and minimise the total sum of the squares
(hence least squares).
How Does OLS Work?
So min. $\hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \hat{u}_4^2 + \hat{u}_5^2$, or minimise $\sum_{t=1}^{5} \hat{u}_t^2$. This is
known as the residual sum of squares.
Recall the model: $y_t = \alpha + \beta x_t + u_t$
But what was $\hat{u}_t$? It was the difference between the
actual point and the line, $y_t - \hat{y}_t$.
So minimising $\sum_t (y_t - \hat{y}_t)^2$ is equivalent to minimising
$\sum_t \hat{u}_t^2$ with respect to $\hat{\alpha}$ and $\hat{\beta}$.
Deriving the OLS Estimator
Since $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$, let
$L = \sum_t (y_t - \hat{y}_t)^2 = \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t)^2$
We want to minimise L with respect to (w.r.t.) $\hat{\alpha}$ and $\hat{\beta}$, so differentiate L
w.r.t. $\hat{\alpha}$ and $\hat{\beta}$:
$\frac{\partial L}{\partial \hat{\alpha}} = -2 \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \quad (1)$
$\frac{\partial L}{\partial \hat{\beta}} = -2 \sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \quad (2)$
From (1), $\sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \Rightarrow \sum_t y_t - T\hat{\alpha} - \hat{\beta} \sum_t x_t = 0$
But $\sum_t y_t = T\bar{y}$ and $\sum_t x_t = T\bar{x}$.
Deriving the OLS Estimator (cont’d)
So we can write $T\bar{y} - T\hat{\alpha} - T\hat{\beta}\bar{x} = 0$ or $\bar{y} - \hat{\alpha} - \hat{\beta}\bar{x} = 0 \quad (3)$
From (2), $\sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \quad (4)$
From (3), $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \quad (5)$
Substitute into (4) for $\hat{\alpha}$ from (5):
$\sum_t x_t (y_t - \bar{y} + \hat{\beta}\bar{x} - \hat{\beta} x_t) = 0$
$\sum_t x_t y_t - \bar{y} \sum_t x_t + \hat{\beta}\bar{x} \sum_t x_t - \hat{\beta} \sum_t x_t^2 = 0$
$\sum_t x_t y_t - T\bar{x}\bar{y} + \hat{\beta} T\bar{x}^2 - \hat{\beta} \sum_t x_t^2 = 0$
Deriving the OLS Estimator (cont’d)
Rearranging for $\hat{\beta}$,
$\hat{\beta}\left(T\bar{x}^2 - \sum_t x_t^2\right) = T\bar{x}\bar{y} - \sum_t x_t y_t$
So overall we have
$\hat{\beta} = \frac{\sum_t x_t y_t - T\bar{x}\bar{y}}{\sum_t x_t^2 - T\bar{x}^2}$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$
This method of finding the optimum is known as ordinary least
squares.
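A minimal sketch of these formulae in code, using the hypothetical data from
the earlier scatter-plot sketch:

```python
# OLS estimates for y_t = alpha + beta*x_t + u_t from the closed-form formulae above.
import numpy as np

def ols_simple(x, y):
    T = x.size
    x_bar, y_bar = x.mean(), y.mean()
    # beta_hat = (sum x_t*y_t - T*x_bar*y_bar) / (sum x_t^2 - T*x_bar^2)
    beta_hat = (np.sum(x * y) - T * x_bar * y_bar) / (np.sum(x**2) - T * x_bar**2)
    alpha_hat = y_bar - beta_hat * x_bar  # alpha_hat = y_bar - beta_hat*x_bar
    return alpha_hat, beta_hat

x = np.array([6.8, 9.5, 12.1, 17.2, 21.3])   # hypothetical data
y = np.array([10.2, 14.8, 17.5, 26.1, 32.4])
alpha_hat, beta_hat = ols_simple(x, y)
```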
What do We Use $\hat{\alpha}$ and $\hat{\beta}$ For?
In the CAPM example used above, plugging the 5 observations into the
formulae given above would lead to the estimates
$\hat{\alpha} = -1.74$ and $\hat{\beta} = 1.64$. We would write the fitted line as:
$\hat{y}_t = -1.74 + 1.64 x_t$
Question: If an analyst tells you that she expects the market to yield a
return 20% higher than the risk-free rate next year, what would you
expect the return on fund XXX to be (prediction)?
Solution: The expected value of y is $-1.74 + 1.64 \times$ (value of x), so plug
x = 20 into the equation to get the expected value for y:
$\hat{y}_i = -1.74 + 1.64 \times 20 = 31.06$
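As a usage note, the prediction step is just a point on the fitted line,
using the slide's estimates:

```python
# Prediction from the fitted line y_hat = -1.74 + 1.64*x (slide's estimates).
alpha_hat, beta_hat = -1.74, 1.64
x_new = 20.0                           # expected market excess return (%)
y_pred = alpha_hat + beta_hat * x_new  # 31.06, expected excess return on fund XXX
```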
The Population and the Sample
The population is the total collection of all objects or people
to be studied, for example,
if we are interested in predicting the outcome of an election, the
population of interest is the entire electorate.
A sample is a selection of just some items from the
population.
A random sample is a sample in which each individual item
in the population is equally likely to be drawn.
The PRF and the SRF
The population regression function (PRF) is a description of
the model that is thought to be generating the actual data and
of the true relationship between the variables (i.e. the true values
of $\alpha$ and $\beta$).
The PRF is $y_t = \alpha + \beta x_t + u_t$
The SRF is $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$
and we also know that $\hat{u}_t = y_t - \hat{y}_t$.
We use the SRF to infer likely values of the PRF.
We also want to know how "good" our estimates of $\alpha$ and $\beta$ are.
Expected Value and Variances of the OLS Estimators
We observe data for xt, but since yt also depends on ut, we must be
specific about how the ut are generated.
We usually make the following set of assumptions about the ut’s (the
unobservable error terms):
Technical Notation / Interpretation:
1. E(ut) = 0: The errors have zero mean
2. Var(ut) = $\sigma^2$: The variance of the errors is constant and finite
over all values of xt (homoskedasticity)
3. Cov(ui, uj) = 0: The errors are statistically independent of
one another (zero autocorrelation)
4. Cov(ut, xt) = 0: No relationship between the error and the
corresponding x variate
Homoskedasticity vs. Heteroskedasticity
[Two diagrams of the errors plotted against $x_t$: under homoskedasticity the
spread of the errors is constant for all values of $x_t$; under
heteroskedasticity the spread changes with $x_t$]
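A brief simulation sketch of the distinction, with illustrative parameter
values:

```python
# Simulate homoskedastic vs. heteroskedastic errors (illustrative values only).
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1.0, 25.0, 200)

u_homo = rng.normal(0.0, 2.0, size=x.size)     # constant error variance for all x_t
u_hetero = rng.normal(0.0, 0.4 * x, size=x.size)  # error spread grows with x_t
```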
Properties of the OLS Estimator
If assumptions 1. through 4. hold, then the estimators $\hat{\alpha}$ and $\hat{\beta}$
determined by OLS are known as Best Linear Unbiased
Estimators (BLUE).
What does the acronym stand for?
"Estimator" - $\hat{\beta}$ is an estimator of the true value of $\beta$.
"Linear" - $\hat{\beta}$ is a linear estimator, i.e. a linear function of the
dependent variable.
"Unbiased" - On average, the actual values of $\hat{\alpha}$ and $\hat{\beta}$ will
be equal to the true values.
"Best" - means that the OLS estimator has minimum variance
among the class of linear unbiased estimators. The
Gauss-Markov theorem proves that the OLS
estimator is best.
Properties of the OLS Estimator (cont’)
An alternative assumption to 4., which is slightly stronger, is that the
xt’s are non-stochastic or fixed in repeated samples.
A fifth assumption is required if we want to make inferences about
the population parameters (the actual and ) from the sample
parameters ( $ and $ )
Additional Assumption is needed, if you want to check hypothesis
5. ut is normally distributed
Linearity
In order to use OLS, we need a model which is linear in the
parameters ($\alpha$ and $\beta$). It does not necessarily have to be linear in
the variables (y and x).
Linear in the parameters means that the parameters are not
multiplied together, divided, squared or cubed, etc.
Some models can be transformed to linear ones by a suitable
substitution or manipulation, e.g. the exponential regression model
$Y_t = e^{\alpha} X_t^{\beta} e^{u_t} \Leftrightarrow \ln Y_t = \alpha + \beta \ln X_t + u_t$
Then let $y_t = \ln Y_t$ and $x_t = \ln X_t$:
$y_t = \alpha + \beta x_t + u_t$
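A minimal sketch of estimating the log-log form, reusing the ols_simple
function defined earlier on hypothetical positive-valued data:

```python
# Estimate ln(Y_t) = alpha + beta*ln(X_t) + u_t by OLS on the transformed data.
import numpy as np

X = np.array([1.2, 2.5, 4.1, 6.8, 9.0])    # hypothetical positive-valued data
Y = np.array([3.1, 5.9, 8.4, 12.2, 15.0])

y, x = np.log(Y), np.log(X)                # substitution: y_t = ln Y_t, x_t = ln X_t
alpha_hat, beta_hat = ols_simple(x, y)     # beta_hat is interpretable as an elasticity
```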
Linear and Non-linear Models
This is known as the exponential regression model. Here, the
coefficients can be interpreted as elasticities.
Similarly, if theory suggests that y and x should be inversely related:
$y_t = \alpha + \frac{\beta}{x_t} + u_t$
then the regression can be estimated using OLS by substituting
$z_t = \frac{1}{x_t}$
(a code sketch of this substitution follows below)
But some models are intrinsically non-linear, e.g.
$y_t = \alpha + \beta x_t^{\gamma} + u_t$
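Here is the promised sketch of the inverse-relationship substitution, again
reusing ols_simple and hypothetical data:

```python
# Estimate y_t = alpha + beta*(1/x_t) + u_t by OLS via the substitution z_t = 1/x_t.
import numpy as np

x = np.array([1.5, 2.0, 4.0, 8.0, 16.0])   # hypothetical data
y = np.array([9.8, 7.6, 4.9, 3.4, 2.8])

z = 1.0 / x                                # the substitution z_t = 1/x_t
alpha_hat, beta_hat = ols_simple(z, y)     # the model is linear in alpha and beta
```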
Unbiasedness
The least squares estimators $\hat{\alpha}$ and $\hat{\beta}$ are unbiased. That is,
E($\hat{\alpha}$) = $\alpha$ and E($\hat{\beta}$) = $\beta$.
Thus on average the estimated values will be equal to the true
values. Proving this also requires the assumption that E(ut) = 0.
Unbiasedness is a stronger condition than consistency.
Efficiency
An estimator $\hat{\beta}$ of a parameter $\beta$ is said to be efficient if it is unbiased
and no other unbiased estimator has a smaller variance. If the
estimator is efficient, we are minimising the probability that it is a
long way off from the true value of $\beta$.
Consistency
The least squares estimators $\hat{\alpha}$ and $\hat{\beta}$ are consistent. That is, the
estimates will converge to their true values as the sample size
increases to infinity. The assumptions E(xt ut) = 0 and
Var(ut) = $\sigma^2 < \infty$ are needed to prove this. Consistency implies that
$\lim_{T \to \infty} \Pr\left[\left|\hat{\beta} - \beta\right| > \delta\right] = 0 \quad \forall\, \delta > 0$
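A hedged simulation sketch of consistency (and, via the replication means,
unbiasedness), assuming illustrative true values and reusing ols_simple:

```python
# As T grows, the beta_hat estimates concentrate around the true beta.
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = -1.7, 1.6                       # assumed true values (illustrative)

for T in (10, 100, 10_000):
    estimates = []
    for _ in range(500):                      # 500 replications per sample size
        x = rng.uniform(0.0, 25.0, size=T)
        y = alpha + beta * x + rng.normal(0.0, 2.0, size=T)
        estimates.append(ols_simple(x, y)[1])
    est = np.asarray(estimates)
    # mean ~ beta for every T (unbiasedness); std shrinks as T grows (consistency)
    print(f"T={T}: mean={est.mean():.3f}, std={est.std():.4f}")
```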
Estimator or Estimate?
Estimators are the formulae used to calculate the
coefficients
Estimates are the actual numerical values for the
coefficients.
Standard Errors of Parameters
Any set of regression estimates $\hat{\alpha}$ and $\hat{\beta}$ is specific to the sample
used in their estimation.
Recall that the estimators of $\alpha$ and $\beta$ from the sample parameters
($\hat{\alpha}$ and $\hat{\beta}$) are given by
$\hat{\beta} = \frac{\sum x_t y_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2}$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$
What we need is some measure of the reliability or precision of the
estimators ($\hat{\alpha}$ and $\hat{\beta}$). The precision of an estimate is given by its
standard error. Given assumptions 1 - 4 above, the standard errors can be
shown to be given by
$SE(\hat{\alpha}) = s\sqrt{\frac{\sum x_t^2}{T\sum (x_t - \bar{x})^2}} = s\sqrt{\frac{\sum x_t^2}{T\sum x_t^2 - T^2\bar{x}^2}}$
$SE(\hat{\beta}) = s\sqrt{\frac{1}{\sum (x_t - \bar{x})^2}} = s\sqrt{\frac{1}{\sum x_t^2 - T\bar{x}^2}}$
where s is the estimated standard deviation of the residuals.
Estimating the Variance of the Disturbance Term
The variance of the random variable ut is given by
Var(ut) = E[(ut - E(ut))²]
which, since E(ut) = 0, reduces to Var(ut) = E(ut²).
We could estimate this using the average of the squared ut:
$s^2 = \frac{1}{T}\sum u_t^2$
Unfortunately this is not workable, since ut is not observable. We can
use the sample counterpart to ut, which is $\hat{u}_t$:
$s^2 = \frac{1}{T}\sum \hat{u}_t^2$
But this estimator is a biased estimator of $\sigma^2$.
An unbiased estimator of $\sigma^2$ is given by
$s^2 = \frac{\sum \hat{u}_t^2}{T - 2}$
where $\sum \hat{u}_t^2$ is the residual sum of squares and T is the sample
size.
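A minimal sketch tying the last two slides together: estimate s from the
residuals and then plug it into the standard-error formulae, reusing
ols_simple and the hypothetical data from before:

```python
# Residual standard deviation and standard errors for the simple OLS fit.
import numpy as np

def ols_standard_errors(x, y, alpha_hat, beta_hat):
    T = x.size
    u_hat = y - (alpha_hat + beta_hat * x)   # residuals u_hat_t = y_t - y_hat_t
    s2 = np.sum(u_hat**2) / (T - 2)          # unbiased estimator of sigma^2
    s = np.sqrt(s2)
    ssx = np.sum((x - x.mean())**2)          # sum of (x_t - x_bar)^2
    se_alpha = s * np.sqrt(np.sum(x**2) / (T * ssx))
    se_beta = s * np.sqrt(1.0 / ssx)
    return s, se_alpha, se_beta

x = np.array([6.8, 9.5, 12.1, 17.2, 21.3])   # hypothetical data
y = np.array([10.2, 14.8, 17.5, 26.1, 32.4])
alpha_hat, beta_hat = ols_simple(x, y)
s, se_alpha, se_beta = ols_standard_errors(x, y, alpha_hat, beta_hat)
```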
Questions?