Chapter 2 SLRM

Regression analysis is used to describe the relationship between a dependent variable and one or more independent variables. It aims to estimate or predict the average value of the dependent variable based on the independent variables. Simple linear regression involves one dependent and one independent variable, where the relationship is modeled with a straight line. The regression line is estimated by minimizing the sum of squared residuals, with the goal of making the sample regression function as close as possible to the true population regression function.
Chapter 2

A brief overview of the classical linear regression model

Regression

• Regression is probably the single most important tool at the econometrician's disposal.

But what is regression analysis?

• It is concerned with describing and evaluating the relationship between a given variable (usually called the dependent variable) and one or more other variables (usually known as the independent variable(s)).
• Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the dependent variable in terms of the known or fixed (in repeated sampling) values of the latter.
Regression vs. correlation

• In correlation analysis, the primary objective is to measure the strength or degree of linear association between two variables.
• The correlation coefficient measures this strength of (linear) association.
  – Smoking vs. lung cancer
  – Advertising expenditure vs. sales volume, etc.
• In correlation analysis there is no distinction between the dependent and explanatory variables.
• Regression analysis, however, tries to estimate or predict the average value of one variable (the dependent variable) on the basis of fixed values of the other variable(s).
Simple Linear Regression (SLR)

• SLR is sometimes called two-variable (bivariate) regression: one variable is independent and the other is dependent.
• It assumes that there is one affecting (independent) variable and one affected (dependent) variable.
• It is the technique used to develop the equation of the straight line and to predict values from it.
• Linear: the relationship between the two variables is a straight line (with a positive or negative slope).
 Simple linear equation: Y = β0 + β1X
 Simple linear regression model: Y = β0 + β1X + u
NB: the two variables are Y (dependent) and X (independent).
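As a concrete illustration (my own addition, not from the slides), the minimal sketch below simulates data from the SLR model with assumed, made-up values β0 = 2 and β1 = 0.5 and a random disturbance u, using numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1 = 2.0, 0.5            # assumed (made-up) true intercept and slope
x = rng.uniform(0, 10, size=50)    # the single independent variable X
u = rng.normal(0, 1, size=50)      # disturbance term u with zero mean

y = beta0 + beta1 * x + u          # the SLR model: Y = beta0 + beta1*X + u
print(y[:5])
```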
SLR…

• This is the situation where y depends on only one x variable.
• Examples of the kind of relationship that may be of interest include:
  – How asset returns vary with their level of market risk
  – Measuring the long-term relationship between stock prices and dividends
  – Constructing an optimal hedge ratio
The Meaning of Linearity

1. Linear both in the parameters and in the variables:
   Y = β0 + β1X
2. Linear in the parameters but non-linear in the variables:
   Y = β0 + β1X²
3. Non-linear in the parameters:
   Y = β0 + β1²X or Y = β0 + β1²X²
NB: Linear regression means a regression that is linear in the parameters; equations 1 and 2 above are linear regressions.
A model is linear in a parameter if that parameter is not divided, squared, multiplied by another parameter, etc.
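A small sketch (my own illustration, with simulated data and assumed values b0 = 1, b1 = 3) showing that case 2 above, Y = β0 + β1X², is still a linear regression: it can be fitted by ordinary least squares on the transformed regressor X², here using numpy.polyfit for convenience:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=100)
y = 1.0 + 3.0 * x**2 + rng.normal(0, 1, size=100)   # assumed b0 = 1, b1 = 3

# Y = b0 + b1*X^2 is linear in the parameters, so OLS applies
# once we regress Y on the transformed variable X^2.
slope, intercept = np.polyfit(x**2, y, deg=1)        # returns [slope, intercept]
print(intercept, slope)                              # should be close to 1 and 3
```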
Some Notation

• If Y is the dependent variable affected by the independent variable X, the following alternative names for the y and x variables can be used:

y                      x
dependent variable     independent variable
regressand             regressor
effect variable        causal variable
explained variable     explanatory variable
endogenous             exogenous
response               stimulus/control
predictand             predictor
SLR Model explained
Y = β0 + β1X + u
β0 + β1X = systematic (explained) part of Y
u = unsystematic (unexplained) factors that affect Y, called the error term, disturbance term or residual
 It accounts for unobserved influences on Y
 It accounts for all factors other than X that affect Y
Why not introduce these variables into the model explicitly? We usually leave some variables out because of:
• Vagueness of theory
• Unavailability of data, or because they are not core variables
Why do we include a disturbance term?
• The disturbance term can capture a number of features:
  - We always leave out some determinants of yt
  - There may be errors in the measurement of yt that cannot be modelled
  - Random outside influences on yt which we cannot model

In regression analysis our interest is in estimating the values of the unknowns β0 and β1 on the basis of observations on y and x.
- The value of Y is then estimated from the estimated values of β0 and β1.
β1 is the SLOPE PARAMETER. It captures the ceteris paribus effect of X on Y:
Δy = β1ΔX, if Δu = 0
For example, if β1 = 3, a 2-unit increase in x would cause a 6-unit change in y (2 × 3 = 6).
If x and y are positively (negatively) correlated, β1 will be positive (negative).
β0 is the INTERCEPT PARAMETER or CONSTANT TERM.
Determining the Regression Coefficients

• So how do we determine what β0 and β1 are?
• Choose β̂0 and β̂1 so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible).
• [Figure: scatter plot of y against x with a fitted straight line, illustrating how β̂0 and β̂1 are chosen, in addition to the OLS computations shown in the following example.]
SLR Model…

Example
• Assume that a sample of 10 salespersons is taken from the sales department. Sales calls and unit sales are observed for the sample.
• Assume that the relationship between unit sales (Y) and sales calls (X) is linear: Y = β0 + β1X.

Sample    X     Y
1         14    28
2         35    66
3         22    38
4         29    70
5          6    22
6         15    27
7         17    28
8         20    47
9         12    14
10        29    68
n = 10    ΣX = 199    ΣY = 408

In this example:
n (sample size) = 10
Mean of X, X̄ = 199/10 = 19.9
Mean of Y, Ȳ = 408/10 = 40.8
Example…

(X−X̄)    (X−X̄)²    (Y−Ȳ)    (X−X̄)(Y−Ȳ)
-5.9      34.81     -12.8      75.52
15.1     228.01      25.2     380.52
 2.1       4.41      -2.8      -5.88
 9.1      82.81      29.2     265.72
-13.9    193.21     -18.8     261.32
-4.9      24.01     -13.8      67.62
-2.9       8.41     -12.8      37.12
 0.1       0.01       6.2       0.62
-7.9      62.41     -26.8     211.72
 9.1      82.81      27.2     247.52
Sum      720.90               1,541.80
Example…

β̂1 = 1,541.80 / 720.90 ≈ 2.139 (≈ 2.14)
β̂0 = 40.8 − (2.139 × 19.9) ≈ −1.76
Therefore, Ŷ = −1.76 + 2.14X (approximately)
Estimated Y is represented by Ŷ (Y hat) and actual Y is represented by Y; thus Ŷ = −1.76 + 2.14X.
For an X value of 14, Ŷ is about 28.18 (−1.76 + 2.139 × 14), slightly different from the actual Y of 28. This difference is due to the error term, indicating that there are other factors, represented by u (the residual), that are not explained by the model.
The following table shows the estimated Y for all observations of X.
Example…

Sample    X     Y     XY      Ŷ
1         14    28     392    28.1817
2         35    66    2310    73.0944
3         22    38     836    45.2913
4         29    70    2030    60.2622
5          6    22     132    11.0721
6         15    27     405    30.3204
7         17    28     476    34.5978
8         20    47     940    41.0139
9         12    14     168    23.9043
10        29    68    1972    60.2622

As you can see in the table above, the estimated Y (Ŷ) for every given value of X is slightly different from the actual Y. These differences are represented by u (the residual).
The residual means that there are remaining factors other than X that are not explained (not modelled) in the equation.
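The calculations above can be reproduced with a short script. The following is a minimal sketch (my own addition) that computes β̂1, β̂0 and the fitted values Ŷ from the sales-call data using the least-squares formulas:

```python
import numpy as np

# Sales calls (X) and unit sales (Y) for the 10 salespersons in the example
x = np.array([14, 35, 22, 29, 6, 15, 17, 20, 12, 29], dtype=float)
y = np.array([28, 66, 38, 70, 22, 27, 28, 47, 14, 68], dtype=float)

x_bar, y_bar = x.mean(), y.mean()        # 19.9 and 40.8

# Least-squares estimators of the slope and intercept
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)   # ~ 2.139
b0 = y_bar - b1 * x_bar                                             # ~ -1.76

y_hat = b0 + b1 * x          # fitted values, e.g. ~ 28.18 for X = 14
u_hat = y - y_hat            # residuals

print(round(b0, 3), round(b1, 3))
print(np.round(y_hat, 4))
```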
Population regression function vs. sample regression function

The population regression function (PRF), or data generating process (DGP), is an idealized concept; in practice one rarely has access to the entire population of interest.
The PRF is a description of the model that is thought to be generating the actual data and of the true relationship between the variables (i.e. the true values of β0 and β1).
The primary objective in regression analysis is to estimate the PRF on the basis of the sample regression function (SRF) as accurately as possible.
How should the SRF be constructed so that β̂0 is as close as possible to the true β0 and β̂1 is as close as possible to the true β1, even though we will never know the true β0 and β1?
PRF vs SRF

PRF: Yi = β0 + β1Xi + ui
SRF: Yi = β̂0 + β̂1Xi + ûi, with fitted values Ŷi = β̂0 + β̂1Xi

Errors arise when population parameters are estimated on the basis of parameters measured from sample data.
The objective is to determine the SRF in such a manner that it is as close as possible to the actual Y.
This is done by making the sum of squared residuals of the SRF as small as possible; hence the method is called ordinary least squares (OLS). The sum of squared residuals is a function of the estimators in the SRF.
Ordinary Least Squares (OLS)
• The most common method used to fit a straight line to the data is known as OLS (ordinary least squares).
 It is one of the most powerful and popular methods of regression analysis.
 We use the OLS method to determine the estimated values of β0 and β1 and the residuals ût.
 Least squares stands for the minimum sum of squared errors (SSE).
• What we actually do is take each distance and square it (i.e. take the area of each of the squares in the diagram) and minimise the total sum of the squares (hence least squares).
• Tightening up the notation, let
  yt denote the actual data point t,
  ŷt denote the fitted value from the regression line, and
  ût denote the residual, yt − ŷt.
Actual and Fitted Value

[Figure: scatter of actual values yi around the fitted regression line, with the residual ûi shown as the vertical distance between an actual point yi and the corresponding fitted value ŷi.]

 The values on the straight line are the estimated values of Y for given values of X.
 A point on the line is the estimated Y, while a point above (or below) the line is the actual Y.
 The difference between the two points (the error) is the residual.
 Thus the OLS method is a method used to minimise the sum of the squared values of these errors.
Estimator or Estimate?

• Estimators are the formulae used to calculate the coefficients.
• Estimates are the actual numerical values of the coefficients.
• We use the SRF to make inferences about the PRF.
Numerical properties of the estimators

• The OLS estimators are expressed solely in terms of the sample data.
• They are point estimators; that is, given the sample, each estimator provides only a single (point) value of the relevant population parameter.
• Properties of the regression line include:
  • The mean value of the estimated Y equals the mean value of the actual Y.
  • The mean value of the residuals is zero.
  • The residuals ûi are uncorrelated with the predicted Ŷi: cov(ûi, Ŷi) = 0.
  • The residuals ûi are uncorrelated with Xi: cov(ûi, Xi) = 0.
  • The line passes through the sample means of Y and X.
  • The OLS line gives equal weight to residuals that are equal in magnitude. Consider the two residuals −4 and 4: in both observations the estimated y-value is the same distance (4 units) from the observed y-value; it just happens that y was overestimated in the first case and underestimated in the second.
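These properties can be checked numerically for the sales-call example. A brief sketch (my own addition, recomputing the OLS fit from the data above):

```python
import numpy as np

x = np.array([14, 35, 22, 29, 6, 15, 17, 20, 12, 29], dtype=float)
y = np.array([28, 66, 38, 70, 22, 27, 28, 47, 14, 68], dtype=float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

print(np.isclose(y_hat.mean(), y.mean()))        # mean of fitted Y equals mean of actual Y
print(np.isclose(u_hat.mean(), 0.0))             # residuals have zero mean
print(np.isclose(np.sum(u_hat * y_hat), 0.0))    # residuals uncorrelated with fitted Y
print(np.isclose(np.sum(u_hat * x), 0.0))        # residuals uncorrelated with X
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # line passes through (X-bar, Y-bar)
```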
How OLS Works

• So we minimise û1² + û2² + û3² + û4² + û5², or, more compactly, minimise ∑ ût² (summing over the observations, t = 1, …, 5 in this illustration).
• This is known as the residual sum of squares.
• But what was ût? It was the difference between the actual point and the line, yt − ŷt.
• So minimising ∑(yt − ŷt)² is equivalent to minimising ∑ût² with respect to β̂0 and β̂1.
• In order to use OLS, we need a model which is linear in the parameters (β0 and β1). It does not necessarily have to be linear in the variables (y and x).
• Linear in the parameters means that the parameters are not multiplied together, divided, squared or cubed, etc.

Continuing with the preceding example for X (sales calls) and Y (sales in units), see the following computation of ∑(yt − ŷt)².
Example…

Assumptions about the error term:
1. It is a random variable with a mean or expected value of zero.
2. The variance of u (σ²) is the same for all values of X, which means that the variance of Y about the regression line equals σ² and is the same for all values of X.
3. The values of the error are independent: the value of the error for a particular value of X is not related to the value of the error for any other value of X; thus the value of Y for a particular value of X is not related to the value of Y for any other value of X.
4. The error is a normally distributed random variable. Because Y is a linear function of the error, Y is also a normally distributed random variable.

Y       Ŷ            (Y−Ŷ)       (Y−Ŷ)²
28      28.1817      -0.1817      0.033015
66      73.0944      -7.0944     50.33051
38      45.2913      -7.2913     53.16306
70      60.2622       9.7378     94.82475
22      11.0721      10.9279    119.419
27      30.3204      -3.3204     11.02506
28      34.5978      -6.5978     43.53096
47      41.0139       5.9861     35.83339
14      23.9043      -9.9043     98.09516
68      60.2622       7.7378     59.87355
Sum: 408.00           (0.00)    566.13
Residuals and Goodness-of-Fit
• Residuals: residuals in linear regression are the differences between the actual values and the predicted values.
• Goodness of fit (GOF): goodness-of-fit measures for linear regression attempt to capture how well a model fits a given set of data.
• A well-fitting regression model results in predicted values close to the observed data values.
• Models almost never describe exactly the process that generated a dataset.
• Models approximate reality. However, even models that only approximate reality can be used to draw useful inferences or to predict future observations.
GOF…
The TOTAL SUM OF SQUARES (TSS) measures the total variation of the actual Y about its mean, with n − 1 degrees of freedom. This total variation is divided into two parts:
1. The variation explained by the regression (explained by the independent variable), with degrees of freedom equal to the number of independent variables (one here); this is the sum of squares due to regression (SSR).
2. The error, or unexplained variation, with n − 2 degrees of freedom (two parameters are estimated); this is called the sum of squared errors (SSE).
SSR = TSS − SSE
GOF…
Estimating the error variance σ²
Recall that the deviations of the Y values about the estimated regression line are called residuals.
 Sum of squared errors (SSE): the squared error due to the residuals,
  SSE = ∑(yt − ŷt)²
The mean squared error (MSE) provides the estimate of σ². It is the SSE divided by its degrees of freedom, n − 2 (two degrees of freedom are lost because two parameters, β0 and β1, are estimated).
 From the preceding example we can compute the MSE as
  SSE = 566.13, MSE = 566.13 / (10 − 2) = 70.766
  σ̂ = √70.766 = 8.41
σ̂ = 8.41 is referred to as the standard error of the estimate.
The smaller σ̂ is, the closer the estimated Y is to the actual Y.
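A short continuation of the earlier sketch (my own illustration) computing SSE, MSE and the standard error of the estimate for the sales-call data:

```python
import numpy as np

x = np.array([14, 35, 22, 29, 6, 15, 17, 20, 12, 29], dtype=float)
y = np.array([28, 66, 38, 70, 22, 27, 28, 47, 14, 68], dtype=float)
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)   # sum of squared errors, ~ 566.1
mse = sse / (n - 2)              # mean squared error, estimate of sigma^2, ~ 70.8
s = np.sqrt(mse)                 # standard error of the estimate, ~ 8.41
print(round(sse, 2), round(mse, 3), round(s, 2))
```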
GOF…
R², or the Coefficient of Determination:
R² measures the proportion of the variation in y that is explained by the regression. Equivalently, it can be obtained as one minus the fraction of the variation that is unexplained. For example, if R² = 0.89, then 89% of the total variation in y can be explained by the linear relationship between X and y, and the remaining 11% is unexplained.

R² ranges from 0 to 1. Zero indicates that the model does not improve the prediction at all, and 1 indicates perfect prediction.

The square of the correlation coefficient r describes the strength of a straight-line relationship; in simple linear regression R² = r².
GOF…
Compute R²
SST = SSR + SSE, thus SSR = SST − SSE

Y       (Y−Ȳ)     (Y−Ȳ)²
28      -12.8     163.84
66       25.2     635.04
38       -2.8       7.84
70       29.2     852.64
22      -18.8     353.44
27      -13.8     190.44
28      -12.8     163.84
47        6.2      38.44
14      -26.8     718.24
68       27.2     739.84
408.00            SST = 3,863.60

SSE was computed previously: SSE = 566.13
SST = 3,863.60
SSR = 3,863.60 − 566.13 = 3,297.47
R² = SSR/SST = 3,297.47 / 3,863.60 ≈ 0.853 (85.3%)
So about 85.3% of the total variation in y can be explained by the linear relationship between X and y, and the remaining 14.7% is unexplained.
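These goodness-of-fit quantities can be computed directly from the data; a minimal sketch (my own addition) for the sales-call example:

```python
import numpy as np

x = np.array([14, 35, 22, 29, 6, 15, 17, 20, 12, 29], dtype=float)
y = np.array([28, 66, 38, 70, 22, 27, 28, 47, 14, 68], dtype=float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)   # total sum of squares, 3,863.60
sse = np.sum((y - y_hat) ** 2)      # unexplained variation, ~ 566.1
ssr = sst - sse                     # explained variation, ~ 3,297.5
r_squared = ssr / sst               # ~ 0.853
print(round(sst, 2), round(sse, 2), round(ssr, 2), round(r_squared, 3))
```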
Correlation Coefficient

• The correlation coefficient (r) is a descriptive measure of the strength of the linear association between two variables, x and y.
• Values of the correlation coefficient are always between −1 and +1.
• A value of +1 indicates that the two variables x and y are perfectly related in a positive linear sense; that is, all data points lie on a straight line with positive slope.
• A value of −1 indicates that x and y are perfectly related in a negative linear sense, with all data points on a straight line with negative slope.
• Values of the correlation coefficient close to zero indicate that x and y are not linearly related.
• In simple linear regression r = √R² (taking the sign of the slope); thus for this example r = √0.853 ≈ 0.92.
R²…

• Here is the basic idea.
• Think about trying to predict a new value of y. With no other information than our sample of values of y, a reasonable choice is ȳ.
• Now consider how your prediction would change if you had an explanatory variable.
• If we use the regression equation to predict, we would use Ŷ = b0 + b1x. This prediction takes into account the value of the explanatory variable x.
• Let's compare our two choices for predicting y. With the explanatory variable x, we use Ŷ; without this information, we use ȳ. How can we compare these two choices?
• When we use ȳ to predict, our prediction error is y − ȳ.
• If, instead, we use Ŷ, our prediction error is y − ŷ.
• The use of x in our prediction changes our prediction error from y − ȳ to y − ŷ.
• The difference between the two predictions is ŷ − ȳ. Our comparison uses the sums of squares of these differences, ∑(ŷ − ȳ)² and ∑(y − ȳ)².
• The ratio of these two quantities is the square of the correlation: R² = ∑(ŷ − ȳ)² / ∑(y − ȳ)².
The Assumptions Underlying the Classical Linear Regression Model (CLRM)
• In order to achieve a ceteris paribus analysis of the effect of x on y, and for valid interpretation of the regression estimates, we need assumptions about the Xi variable(s) and the error term.
• The model which we have used is known as the classical linear regression model.
• We observe data for xt, but since yt also depends on ut, we must be specific about how the ut are generated.
• We usually make the following set of assumptions about the ut's (the unobservable error terms):

Technical notation          Interpretation
1. E(ut) = 0                The errors have zero mean
2. Var(ut) = σ²             The variance of the errors is constant and finite over all values of xt
3. Cov(ui, uj) = 0          The errors are statistically independent of one another
4. Cov(ut, xt) = 0          There is no relationship between the error and the corresponding x variable
The Assumptions Underlying the CLRM, Again
• An alternative assumption to 4., which is slightly stronger, is that the xt's are non-stochastic or fixed in repeated samples.
• A fifth assumption is required if we want to make inferences about the population parameters (the actual β0 and β1) from the sample parameters (β̂0 and β̂1).
• Additional assumption:
5. ut is normally distributed.

Stochastic means that there is probability (randomness) in the occurrence of events; outcomes can be predicted with statistical approaches but not precisely, e.g. the number of phone calls at a customer centre. In regression the dependent variable is stochastic.
Non-stochastic: the explanatory variables are non-stochastic (fixed).
Properties of the OLS Estimator: the Gauss-Markov Theorem

• If assumptions 1. through 4. hold, then the estimators β̂0 and β̂1 determined by OLS are known as Best Linear Unbiased Estimators (BLUE).
What does the acronym stand for?
• "Estimator" – β̂ is an estimator of the true value of β.
• "Linear" – β̂ is a linear estimator (a linear function of the data).
• "Unbiased" – on average, the actual values of β̂0 and β̂1 will be equal to the true values.
• "Best" – the OLS estimator β̂ has minimum variance among the class of linear unbiased estimators; the Gauss-Markov theorem proves that the OLS estimator is best.
Consistency/Unbiasedness/Efficiency

• Consistency
The least squares estimators β̂0 and β̂1 are consistent: the estimates converge to their true values as the sample size increases to infinity. The assumptions E(xt ut) = 0 and Var(ut) = σ² < ∞ are needed to prove this. Consistency implies that, for any δ > 0,
  lim (T→∞) Pr[ |β̂1 − β1| > δ ] = 0

• Unbiasedness
The least squares estimates of β0 and β1 are unbiased: E(β̂0) = β0 and E(β̂1) = β1. Thus, on average, the estimated values will equal the true values. Proving this also requires the assumption that E(ut) = 0. Unbiasedness is a stronger condition than consistency.

• Efficiency
An estimator β̂ of a parameter β is said to be efficient if it is unbiased and no other unbiased estimator has a smaller variance. If the estimator is efficient, we are minimising the probability that it is a long way off from the true value of β.
Hypothesis Testing for Significance
In the linear equation, if β1 is 0, the estimated value of Y equals β0. This means that the value of Y does not depend on the value of X, and hence we would conclude that X and Y are not linearly related. Conversely, if the value of β1 is not 0, we conclude that the two variables are related.
Thus, to test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.
Hypothesis testing is used to confirm whether the estimated regression coefficients bear any statistical significance. Either the confidence-interval approach or the t-test approach can be used.
t-test
• The standard error of the estimate determined previously is used to test for a significant relationship between X and Y.
• The purpose of the t-test is to see whether we can conclude that β1 ≠ 0. We use sample data to test the following hypotheses about β1:
  H0: β1 = 0
  H1: β1 ≠ 0
If H0 is rejected, we conclude that β1 ≠ 0 and say that there is a significant relationship between X and Y.
t-test…

• Consider what would happen if we used a different random sample for the same regression study.
• A regression analysis of this new sample might result in an estimated regression equation similar to our previous estimated regression equation, Ŷ = −1.76 + 2.14X.
• However, it is doubtful that we would obtain exactly the same equation (with exactly the same intercept and slope).
• Indeed, b0 and b1, the least squares estimators, are sample statistics with their own sampling distributions.
• The properties of the sampling distribution of b1 follow.
Expected value: E(b1) = β1
Standard deviation: σ_b1 = σ / √∑(X − X̄)²
Note that the expected value of b1 is equal to β1, so b1 is an unbiased estimator of β1.
Because we do not know the value of σ, we develop an estimate of σ_b1, denoted s_b1, by estimating σ with s. Thus we obtain the following estimate:
s_b1 = s / √∑(X − X̄)²
t-test…

X      (X−X̄)    (X−X̄)²
14     -5.9      34.81
35     15.1     228.01
22      2.1       4.41
29      9.1      82.81
 6    -13.9     193.21
15     -4.9      24.01
17     -2.9       8.41
20      0.1       0.01
12     -7.9      62.41
29      9.1      82.81
199.00           720.90

s (= σ̂) = 8.41 and ∑(X − X̄)² = 720.9, so
s_b1 = 8.41 / √720.9 ≈ 0.313

The t-test for a significant relationship is based on the fact that the test statistic
  t = (b1 − β1) / s_b1
follows a t distribution with n − 2 degrees of freedom.
If the null hypothesis is true, then β1 = 0 and the test statistic becomes t = b1 / s_b1.
Thus t = 2.14 / 0.313 ≈ 6.8.
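The standard error of b1 and the t statistic can be computed as below (a continuation of the earlier sketch, my own addition):

```python
import numpy as np

x = np.array([14, 35, 22, 29, 6, 15, 17, 20, 12, 29], dtype=float)
y = np.array([28, 66, 38, 70, 22, 27, 28, 47, 14, 68], dtype=float)
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))   # standard error of estimate, ~ 8.41

s_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))   # standard error of b1, ~ 0.313
t_stat = b1 / s_b1                                # test statistic under H0: beta1 = 0, ~ 6.8
print(round(s_b1, 3), round(t_stat, 2))
```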
Let us conduct this test of significance.
• The p-value, or probability value, tells you how likely it is that your data could have occurred under the null hypothesis. It does this by calculating the likelihood of your test statistic, which is the number produced by a statistical test (here, t) applied to your data.
• The significance level (alpha, α) refers to a pre-chosen probability, while the term "p-value" indicates a probability that you calculate after a given study.
• If your p-value is less than the chosen significance level, then you reject the null hypothesis.
• The most common significance level is 0.05 (95% confidence).
• A small p-value thus does not provide much support for the null hypothesis; its importance depends on the level of significance chosen for the test.
Decision rule: Reject H0 if p-value ≤ α.
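As an illustration (my own addition, assuming scipy is available), the two-sided p-value for the t statistic above can be computed and compared with α = 0.05:

```python
from scipy import stats

t_stat = 6.8      # t statistic from the sales-call example
df = 10 - 2       # n - 2 degrees of freedom

# Two-sided p-value: probability of a |t| at least this large if H0 (beta1 = 0) were true
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(p_value)             # on the order of 1e-4, far below alpha = 0.05
print(p_value <= 0.05)     # True: reject H0; X and Y are significantly related
```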
