CHAPTER TWO
SIMPLE LINEAR REGRESSION
2.1 Definition
Regression analysis is the process of estimating the relationship between two or more
variables. In any regression there is a dependent variable and one or more explanatory
(independent) variables; regression is therefore used to study the dependence of one
variable (the dependent variable) on one or more explanatory (independent) variables.
We regress the dependent variable on the explanatory variables ("regress Y on X"), and
we estimate or predict the expected (mean) value of the dependent variable in terms of
the known (fixed) values of the independent variables.
The Population Regression Function (PRF), which gives the conditional expected value
of the dependent variable (conditional upon X, the independent variable), is written as:

E(Y|Xi) = β0 + β1Xi

Where: E(Y|Xi) is the conditional mean of Y at a given value of X;
Y is the dependent variable and X is the independent variable;
β0 and β1 are the regression coefficients: β0 is the intercept coefficient and β1 is the slope coefficient.
Consider, for example, the relation between household consumption expenditure and income,
where consumption expenditure (Y) is the dependent variable and income (X) is the explanatory
variable. That is, consumption expenditure increases as income increases.
Using the population regression line, the PRF is graphically shown as follows
[Figure: the population regression line E(Y|X) = β0 + β1X; points A, B, C and D mark the conditional mean of consumption at successive income levels.]
The population regression line is the locus of the conditional means of the dependent
variable for the fixed values of the explanatory variable(s). It traces the average
consumption (Y) at each given level of income (X): on average, consumption expenditure
increases as income increases.
Note that at any given X, expenditure is random; it could be above or below the
regression line. That is, Y is randomly distributed while X is statistically fixed. However,
the conditional mean of consumption expenditure, E(Y|Xi), of households at each income
level is predictable, and it is denoted by a point on the regression line, such as points A,
B, C and D. For example, at income 100, the expected expenditure is birr 70 (point A on the
PRF line); but the actual consumption expenditure of a particular household could be anywhere
above or below point A (it could be 50, 55, 60, … or 75, 80, 90, etc.).
Our objective in regression analysis is to find out how the average value of the
dependent variable varies with the given value of the explanatory variable.
Since we do not have data on the entire population, we rely on sample data to
estimate the population mean values. The sample counterpart of the population
regression function is referred to as the Sample Regression Function (SRF), which is given as:

Ŷi = β̂0 + β̂1Xi

If we draw the line of the estimated mean values (SRF), it will not necessarily coincide
with the PRF line; it only lies approximately close to it (in the figure below, the broken
line shows the estimated SRF).
[Figure: the SRF (broken line) plotted against the PRF, with conditional means marked at points A, B, C and D.]
The simple linear regression model is a regression model that contains only two
variables: one dependent and one explanatory variable.
Ordinary Least Squares (OLS) is one of the most widely used estimation methods in regression.
OLS estimators are used to estimate the population parameters. However, OLS estimates are
valid only if certain key assumptions are satisfied; these are referred to as the assumptions
of the Classical Linear Regression Model (CLRM) and are discussed below.
The model must be linear in the parameters, the β's; however, it may or may not be
linear with respect to the explanatory variable, X.
Example:

Y = β0 + β1X + u
Y = β0 + β1X² + u          are all linear-in-parameters regression functions.
lnY = β0 + β1 lnX + u

But

Y = β0 + β1²X + u
Y = β0 + (1/β1)X + u       are non-linear-in-parameters regression functions.
Assumption 3: The expected value of the error terms (mean value) is zero.
Zero mean value of the error terms means that, at each given value of the
explanatory variable, the errors average out to zero; that is, E(ui | Xi) = 0.
[Figure: error terms scattered symmetrically around zero at each value of X.]
Assumption 4: Homoscedasticity, or constant variance of the error terms, ui. The
conditional variance of the error term (conditional upon the explanatory variable, X) is
the same for all observations; more specifically:

Var(ui | Xi) = E[ui − E(ui | Xi)]² = E(ui² | Xi) = σ²,   i = 1, 2, …, n
Even if we vary the value of the explanatory variable (X), the variance of the error terms
corresponding to each value of the explanatory variable remains the same. The opposite of
homoscedasticity is heteroscedasticity, which means the variance of the error term is
not constant.
Assumption 5: No autocorrelation between the disturbance or error terms.
Given any two X values, Xi and Xj (i ≠ j), the correlation between the corresponding error
terms ui and uj (i ≠ j) is zero. There should be no correlation (covariance) between any two
error terms. That is:

Cov(ui, uj) = E{[ui − E(ui)][uj − E(uj)]} = E(ui uj) = 0, since E(ui) = E(uj) = 0.
Assumption 6: Zero covariance between the error terms and the explanatory
variable, Xi. That is: E(ui Xi) = Cov(ui, Xi) = 0.
Assumption 7: The number of observations, n, must be greater than the number of
explanatory variables. In other words, the number of observations n must be greater
than the number of parameters to be estimated.
When the error term is normally distributed, the dependent variable Y and the estimated
parameters of the regression are also normally distributed, so that tests of the statistical
significance of the parameters can be conducted.
The OLS regression method is based on the assumptions of the CLRM. OLS estimates are
acceptable only if the CLRM assumptions are satisfied; if those assumptions are not
satisfied, the OLS results cannot be relied upon. The OLS method has some very attractive
statistical properties, discussed later, that have made it one of the most powerful and
popular methods of regression analysis.
Since the PRF is not observable, we estimate it from the SRF (Sample Regression
Function):
Ŷi = β̂0 + β̂1Xi

Yi = Ŷi + ûi, so that the residual is

ûi = Yi − Ŷi = Yi − (β̂0 + β̂1Xi)

[Figure: the SRF Ŷi = β̂0 + β̂1Xi with sample points scattered around it; the vertical distance from each point to the line is the residual ûi.]
Now, given data (observations) on Y and X, we would like to determine the SRF in such a
manner that the estimated value Ŷi is as close as possible to the actual Yi. To this end,
the OLS method determines (estimates) β̂0 and β̂1 in such a way that the Residual Sum of
Squares (RSS) is as small as possible:

min over β̂0, β̂1 of  Σûi² = Σ(Yi − β̂0 − β̂1Xi)²
Then take the partial derivatives with respect to β̂0 and β̂1 and set them to zero (the
first-order conditions):

∂(Σûi²)/∂β̂0 = −2Σ(Yi − β̂0 − β̂1Xi) = −2Σûi = 0   …(1)

∂(Σûi²)/∂β̂1 = −2Σ(Yi − β̂0 − β̂1Xi)Xi = −2ΣXiûi = 0   …(2)

From (1): ΣYi/n − β̂0 − β̂1(ΣXi/n) = 0; thus,

β̂0 = Ȳ − β̂1X̄
β̂0 is the least squares point estimator of β0, the intercept term, where Ȳ and X̄ are
the sample means of Y and X respectively.
Substituting β̂0 = Ȳ − β̂1X̄ into equation (2):

Σ(Yi − β̂0 − β̂1Xi)Xi = 0
ΣXiYi − β̂0ΣXi − β̂1ΣXi² = 0
ΣXiYi − (Ȳ − β̂1X̄)ΣXi − β̂1ΣXi² = 0
ΣXiYi − nX̄Ȳ + β̂1nX̄² − β̂1ΣXi² = 0

β̂1 = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²) = Σxiyi / Σxi²

where xi = Xi − X̄ and yi = Yi − Ȳ are deviations from the sample means. β̂1 is the least
squares point estimator of β1, the slope coefficient.
Numerical Example 1: Consider hypothetical data on output (Y) produced and labour
input (X) used by a firm, given as follows:

Obs. (firm):  1    2    3    4    5    6    7    8    9    10
Y:           11   10   12    6   10    7    9   10   11   10
X:           10    7   10    5    8    8    6    7    9   10
Then we have two variables, Y (the dependent variable) and X (the explanatory variable),
and sample size n = 10, with:

ΣXi = 80, ΣYi = 96, ΣXiYi = 789, ΣXi² = 668, ΣYi² = 952, X̄ = 8, Ȳ = 9.6

β̂1 = (ΣXiYi − nȲX̄) / (ΣXi² − nX̄²) = (789 − 10(9.6)(8)) / (668 − 10(8)²) = 21/28 = 0.75

β̂0 = Ȳ − β̂1X̄ = 9.6 − 0.75(8) = 3.6
where β̂0 = 3.6 and β̂1 = 0.75 are point estimates of the true parameters β0 and β1. The value
of β̂1 (0.75) is interpreted as the marginal product of labour: for a one-unit increase in
labour employment, total output increases by 0.75 units, on average.
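As a quick check (not part of the original notes), the following minimal Python sketch applies the deviation-form OLS formulas derived above to the data of Numerical Example 1; it assumes numpy is available, and the variable names are illustrative.

```python
import numpy as np

# Data from Numerical Example 1: output (Y) and labour input (X)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
n = len(Y)
x_bar, y_bar = X.mean(), Y.mean()

# beta1_hat = (Sum(XiYi) - n*Xbar*Ybar) / (Sum(Xi^2) - n*Xbar^2)
beta1_hat = (np.sum(X * Y) - n * x_bar * y_bar) / (np.sum(X ** 2) - n * x_bar ** 2)
# beta0_hat = Ybar - beta1_hat * Xbar
beta0_hat = y_bar - beta1_hat * x_bar

print(beta1_hat, beta0_hat)  # 0.75 and 3.6, as computed above
```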
The estimated regression line and the residuals have the following numerical properties:
1. The sum, and hence the mean, of the residuals is zero: Σûi = 0 and û̄ = Σûi/n = 0. This
follows directly from the first normal equation, −2Σ(Yi − β̂0 − β̂1Xi) = 0. Note also that the
estimated regression line passes through the sample means (X̄, Ȳ), since β̂0 = Ȳ − β̂1X̄.
2. The mean of the estimated (fitted) values Ŷi is equal to the mean of the actual Y.
Since Yi = Ŷi + ûi, summing over the sample and dividing by n gives
Ȳ = (ΣŶi)/n + (Σûi)/n = (ΣŶi)/n + 0; hence the mean of Ŷi equals Ȳ.
3. The residuals and the estimated values of the dependent variable are uncorrelated,
that is, Cov(Ŷi, ûi) = 0:

Cov(Ŷi, ûi) = E[(Ŷi − Ȳ)(ûi − û̄)] = E(ŷi ûi) = E(β̂1 xi ûi) = β̂1 E(xi ûi) = 0

since E(xi ûi) = Cov(xi, ûi) = 0 (from the second normal equation), where ŷi = Ŷi − Ȳ = β̂1xi.
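The three numerical properties above can be verified directly on Numerical Example 1 with a short Python sketch (an illustrative check, not part of the original notes; numpy assumed available):

```python
import numpy as np

Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)

# Fitted values and residuals from the estimates obtained above
Y_hat = 3.6 + 0.75 * X
u_hat = Y - Y_hat

print(u_hat.sum())                              # ~0: the residuals sum to zero
print(Y_hat.mean(), Y.mean())                   # both 9.6: mean of fitted = mean of actual
print(np.sum((Y_hat - Y_hat.mean()) * u_hat))   # ~0: fitted values and residuals uncorrelated
```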
The coefficient of determination is a summary measure that tells how well the
sample regression line fits the observations (data). Using the sample observations we
obtain the SRF; the measure of 'goodness of fit', denoted by r² in the simple regression
model, helps us to see how close the estimated sample regression line is to the data and,
thereby, to the population regression line.
Recall that Yi = Ŷi + ûi. Written in deviation form, yi = ŷi + ûi, where yi = Yi − Ȳ and
ŷi = Ŷi − Ȳ. Squaring and summing over the sample:

TSS = Σyi² = Σŷi² + Σûi²
ESS = Σŷi² = β̂1²Σxi²
RSS = Σûi² = Σyi² − β̂1²Σxi²

TSS = ESS + RSS
Where:
TSS: Total Sum of Squares (the total variation of the dependent variable);
ESS: Explained Sum of Squares (the variation explained, or accounted for, by the explanatory variable);
RSS: Residual Sum of Squares (the unexplained variation, i.e. the variation in the dependent
variable that is not explained by the explanatory variable in the model).
The coefficient of determination (r²) is computed as the ratio of ESS to TSS obtained
from the data:

r² = ESS/TSS = 1 − RSS/TSS
From Numerical Example 1 above, we compute TSS, ESS and RSS as follows:

a) TSS = Σyi² = ΣYi² − nȲ² = 952 − 10(9.6)² = 30.4
b) ESS = β̂1²Σxi² = (0.75)²(28) = 15.75
c) RSS = TSS − ESS = 30.4 − 15.75 = 14.65

r² = ESS/TSS = 15.75/30.4 = 0.52
Interpretation: r² = 0.52 means that about 52% of the variation in output (Y) is explained
by the variation in labour input (X).
If r² = 0, the model explains nothing: the explanatory variable does not explain any of the
changes in the dependent variable.
If r² = 1, the fit is perfect: Ŷi = Yi for every observation.
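The decomposition and r² for Numerical Example 1 can be reproduced with the following illustrative Python sketch (not part of the original notes; numpy assumed available):

```python
import numpy as np

Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
beta0_hat, beta1_hat = 3.6, 0.75

u_hat = Y - (beta0_hat + beta1_hat * X)
TSS = np.sum((Y - Y.mean()) ** 2)                   # total variation in Y
ESS = beta1_hat ** 2 * np.sum((X - X.mean()) ** 2)  # variation explained by X
RSS = np.sum(u_hat ** 2)                            # unexplained variation

print(TSS, ESS, RSS)   # 30.4, 15.75, 14.65
print(ESS / TSS)       # r-squared, about 0.52
```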
According to the Gauss–Markov theorem, given the assumptions of the CLRM, the OLS
estimators are BLUE (Best Linear Unbiased Estimators); that is:
1) They are linear, that is, linear functions of a random variable such as the dependent
variable Y.
2) They are unbiased: the expected value of each estimator is equal to its true value,
E(β̂1) = β1 and E(β̂0) = β0.
3) They have minimum variance (they are efficient estimators) in the class of all such
linear unbiased estimators.
Proof

1) β̂1 is linear in Yi.

β̂1 = Σxiyi/Σxi² = ΣxiYi/Σxi²   (since Σxi(−Ȳ) = −ȲΣxi = 0)

Let wi = xi/Σxi². Then

β̂1 = ΣwiYi = w1Y1 + w2Y2 + … + wnYn,

where the wi's are fixed since the xi's are fixed. Hence β̂1 is a linear function of the Yi's.

2) β̂1 is unbiased: E(β̂1) = β1.

Substituting Yi = β0 + β1Xi + ui into β̂1 = ΣwiYi, and using Σwi = 0 and ΣwiXi = 1, gives
β̂1 = β1 + Σwiui. Taking expectations (the wi's are non-stochastic):

E(β̂1) = E(β1 + Σwiui) = β1 + w1E(u1) + w2E(u2) + … + wnE(un) = β1 + 0

Therefore, E(β̂1) = β1.
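Unbiasedness can also be illustrated by simulation. The sketch below (not from the original notes) repeatedly draws samples from an assumed model with illustrative "true" values β0 = 2, β1 = 0.5 and σ = 1, re-estimates β̂1 each time, and shows that the average of the estimates is close to the true β1:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0   # illustrative "true" parameter values (assumed)
X = np.linspace(1, 10, 20)            # fixed regressor values, as the CLRM assumes

estimates = []
for _ in range(5000):
    u = rng.normal(0.0, sigma, size=X.size)   # error term with zero mean
    Y = beta0 + beta1 * X + u
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    estimates.append(b1)

print(np.mean(estimates))   # close to the true beta1 = 0.5, illustrating E(beta1_hat) = beta1
```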
Because the OLS estimates are obtained from a single sample and vary from sample to sample,
we rely on the precision of these estimates in representing the true parameters (β0, β1).
The measure of such precision is the standard error.
The variances and standard errors of the OLS estimators are computed as follows.

a) σ̂² = Σûi²/(n − k) = RSS/(n − k), and σ̂ = √σ̂² is the standard error of the regression,

Where:
σ̂² is the estimator of the actual variance of the error term (σ²);
RSS is the Residual Sum of Squares; and
n − k is the degrees of freedom, where n is the sample size and k is the number of
parameters estimated (k = 2 in the two-variable model).

From Numerical Example 1: σ̂² = 14.65/8 = 1.83.

b) var(β̂0) = σ̂²·ΣXi²/(n·Σxi²) and se(β̂0) = √var(β̂0)

From our example: var(β̂0) = 1.83(668)/(10 × 28) = 4.366 and se(β̂0) = √4.366 = 2.09

c) var(β̂1) = σ̂²/Σxi² and se(β̂1) = σ̂/√Σxi²

From our example: var(β̂1) = 1.83/28 = 0.065 and se(β̂1) = √0.065 = 0.256

d) cov(β̂0, β̂1) = −X̄·var(β̂1)
More variation in X (the explanatory variable) and a larger sample size (n) increase the
precision of the estimators β̂0 and β̂1; this is so because they reduce the variances of the
estimators.
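These variance and standard-error formulas can be checked numerically for Numerical Example 1 with the following illustrative Python sketch (not part of the original notes; numpy assumed available):

```python
import numpy as np

Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
n, k = len(Y), 2
beta0_hat, beta1_hat = 3.6, 0.75

u_hat = Y - (beta0_hat + beta1_hat * X)
sigma2_hat = np.sum(u_hat ** 2) / (n - k)             # RSS/(n - k), about 1.83

sum_x2 = np.sum((X - X.mean()) ** 2)                  # sum of xi^2 in deviation form = 28
var_b1 = sigma2_hat / sum_x2                          # about 0.065
var_b0 = sigma2_hat * np.sum(X ** 2) / (n * sum_x2)   # about 4.37

print(np.sqrt(var_b0), np.sqrt(var_b1))               # standard errors, about 2.09 and 0.256
```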
2.5. Implications of the Normality Assumption
The classical normal linear regression model assumes that ui ~ N(0, σ²): the error
term is normally distributed with mean zero and variance σ². This implies that the
dependent variable and the estimated coefficients are also normally distributed:

Yi ~ N(β0 + β1Xi, σ²), because Yi is a linear function of ui;

and similarly β̂0 and β̂1 are normally distributed with the following means and variances:

β̂0 ~ N(β0, σ²ΣXi²/(nΣxi²))  and  β̂1 ~ N(β1, σ²/Σxi²)
When σ² is known, the standardized estimator follows the standard normal distribution:

Z = (β̂1 − β1)/se(β̂1) ~ N(0, 1)

When σ² is replaced by its estimator σ̂², the statistic follows the t distribution with
n − k degrees of freedom:

t = (β̂1 − β1)/se(β̂1) ~ t(n − k), and similarly t = (β̂0 − β0)/se(β̂0) ~ t(n − k)
Symbolically, an interval estimate is constructed such that P(β̂ − δ ≤ β ≤ β̂ + δ) = 1 − α,
where α (0 < α < 1) is known as the level of significance (the probability of committing a
Type I error) and 1 − α is the confidence coefficient.
The confidence interval for a true parameter β (either β0 or β1) is constructed as follows,
given the value of α and the degrees of freedom for the t-critical value:

P[−t(α/2) ≤ (β̂ − β)/se(β̂) ≤ t(α/2)] = 1 − α

where n − k denotes the degrees of freedom for the t-critical value; since we have two
parameters in the two-variable model, k = 2, hence df = n − 2 = 8, and if α = 5%, then
1 − α, the confidence level, will be 0.95 or 95%. A 95% confidence interval is then given
by rearranging the above statement:

P[β̂ − t(α/2)·se(β̂) ≤ β ≤ β̂ + t(α/2)·se(β̂)] = 95%
Using the previous firm example, we estimated Ŷi = 3.6 + 0.75Xi, with β̂0 = 3.6 and
se(β̂0) = 2.09. Since we intend to construct a 95% confidence interval, α = 0.05 and
α/2 = 0.025, so the t-critical value with 8 degrees of freedom is 2.306, and the interval
for β0 is 3.6 ± 2.306(2.09), i.e. (−1.22, 8.42).
Interpretation: given the confidence coefficient of 95%, in the long run, in 95 out of
100 cases intervals like (−1.22, 8.42) will contain the true β0.
Similarly, P[β̂1 − t(α/2)·se(β̂1) ≤ β1 ≤ β̂1 + t(α/2)·se(β̂1)] = 95% gives 0.75 ± 2.306(0.256),
i.e. (0.16, 1.34), the interval estimate of the true β1 at the 95% confidence level.
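A small Python sketch (not part of the original notes; scipy assumed available) reproduces both 95% confidence intervals from the estimates and standard errors above:

```python
from scipy.stats import t

# Estimates and standard errors from Numerical Example 1
beta0_hat, se_b0 = 3.6, 2.09
beta1_hat, se_b1 = 0.75, 0.256
df, alpha = 8, 0.05

t_crit = t.ppf(1 - alpha / 2, df)   # about 2.306

print(beta0_hat - t_crit * se_b0, beta0_hat + t_crit * se_b0)   # about (-1.22, 8.42)
print(beta1_hat - t_crit * se_b1, beta1_hat + t_crit * se_b1)   # about (0.16, 1.34)
```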
Hypothesis testing may use a two-tail or a one-tail test. Whether one uses a two-tail or a
one-tail test depends upon how the alternative hypothesis is formulated.
[Figure: the t distribution with the 95% acceptance region in the middle and critical (rejection) regions of α/2 = 0.025 in each tail, bounded by −t(α/2) and t(α/2).]
Decision rule:
Reject H0 if |t computed| > t(α/2, n − k), the critical value from the t table.
For Numerical Example 1, testing H0: β1 = 0 against H1: β1 ≠ 0:

t = (β̂1 − 0)/se(β̂1) = 0.75/0.256 = 2.93

The t-critical value from the table at α/2 = 0.025 and df = 8 is 2.306, so the acceptance
region is [−2.306, 2.306]. The computed t-value of 2.93 lies outside the acceptance region;
in other words, the estimated value lies in the critical (rejection) region, so we reject H0
and conclude that β̂1 is statistically significantly different from zero.
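The same decision can be expressed in a short, illustrative Python sketch (not from the original notes; scipy assumed available):

```python
from scipy.stats import t

beta1_hat, se_b1 = 0.75, 0.256   # estimates from Numerical Example 1
beta1_H0 = 0.0                   # null hypothesis: beta1 = 0
df, alpha = 8, 0.05

t_computed = (beta1_hat - beta1_H0) / se_b1   # about 2.93
t_critical = t.ppf(1 - alpha / 2, df)         # about 2.306 (two-tail)

print(abs(t_computed) > t_critical)   # True -> reject H0: the slope is statistically significant
```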
The overall significance of the regression can be tested using the decomposition of the
total variation:

Σyi² = Σŷi² + Σûi²,  that is,  TSS = ESS + RSS

Thus,

F = (ESS/df of ESS)/(RSS/df of RSS) = [ESS/(k − 1)] / [RSS/(n − k)] ~ F(k − 1, n − k)
Decision Rule
Reject H0 if F computed > F critical(k − 1, n − k) and conclude that the regression
coefficients are jointly statistically significantly different from zero; the joint effect of
the explanatory variable(s) on the dependent variable is significant, and the model is
reliable for prediction.
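For Numerical Example 1 the F test can be reproduced with the following illustrative Python sketch (not part of the original notes; scipy assumed available):

```python
from scipy.stats import f

ESS, RSS = 15.75, 14.65   # from Numerical Example 1
n, k = 10, 2

F_computed = (ESS / (k - 1)) / (RSS / (n - k))   # about 8.6
F_critical = f.ppf(0.95, k - 1, n - k)           # F(1, 8) critical value at the 5% level, about 5.32

print(F_computed > F_critical)   # True -> the regression is jointly statistically significant
```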
In reporting regression results we should follow a standard format in which all relevant
results are included. The most commonly reported results include: the coefficient
estimates, standard errors, computed t-statistics, sample size, r², F-values, the level of
significance (α), and other test results.
Ŷi = 3.6 + 0.75Xi
se = (2.09) (0.256)    r² = 0.52    n = 10    RSS = 14.65
t  = (1.72) (2.93)     df = 8      F(1, 8) = 8.6
Numerical Example 2: Consider the following data on a dependent variable Y and an
explanatory variable X for 10 observations:

Obs.:  1    2    3    4    5    6    7    8    9    10
Y:    70   65   90   95  110  115  120  140  155  150
X:    80  100  120  140  160  180  200  220  240  260
We have two variables, Y (the dependent variable) and X (the explanatory variable), and
sample size n = 10, with X̄ = 170 and Ȳ = 111. In deviation form: Σxiyi = 16,800 and
Σxi² = 33,000.

The model is specified as: Yi = β0 + β1Xi + ui

a) Estimate the two regression coefficients:

β̂1 = Σxiyi/Σxi² = 16,800/33,000 = 0.5091

β̂0 = Ȳ − β̂1X̄ = 111 − 0.5091(170) = 24.45
d) Compute the estimated variance of the error term, σ̂², and the standard error of β̂0:

σ̂² = RSS/(n − 2) = Σûi²/8 ≈ 42.16 and σ̂ = √σ̂² ≈ 6.49

var(β̂0) = σ̂²·ΣXi²/(n·Σxi²) = 41.104, so se(β̂0) = √var(β̂0) = √41.104 ≈ 6.41
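The estimates for this second example can likewise be reproduced with a short Python sketch (an illustrative check, not part of the original notes; numpy assumed available):

```python
import numpy as np

# Data from Numerical Example 2
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
n, k = len(Y), 2

x = X - X.mean()
y = Y - Y.mean()
beta1_hat = np.sum(x * y) / np.sum(x ** 2)    # about 0.5091
beta0_hat = Y.mean() - beta1_hat * X.mean()   # about 24.45

u_hat = Y - (beta0_hat + beta1_hat * X)
sigma2_hat = np.sum(u_hat ** 2) / (n - k)                    # estimated error variance
var_b0 = sigma2_hat * np.sum(X ** 2) / (n * np.sum(x ** 2))  # variance of the intercept estimator

print(beta1_hat, beta0_hat, sigma2_hat, np.sqrt(var_b0))     # slope, intercept, sigma^2_hat, se(beta0_hat)
```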
From the results above, we can conclude that both coefficients are statistically
significantly different from zero at the 0.05 level of significance, and the claim of the
null hypothesis is rejected. This is because, for both coefficients, the absolute value of
the computed t-statistic is greater than the t-critical value at the given degrees of
freedom (df = n − k = 8) and level of significance (α = 0.05).