

CHAPTER TWO
SIMPLE LINEAR REGRESSION
2.1 Definition
Regression analysis is the process of estimating the relationship between two or more
variables. In any regression there are dependent variables and explanatory
(independent) variables; hence, regression is used to study the dependence of one
variable (the dependent variable) on one or more explanatory (independent) variables.

We regress the dependent variable on the explanatory variables (regress Y on X), and
we estimate or predict the expected (mean) value of the dependent variable in terms of
the known (fixed) values of the independent variables.

Regression analysis is concerned with the statistical dependence, not the functional or
deterministic relationship, among variables. The dependent variable is assumed to be
random or stochastic with a certain probability distribution. The expected value of the
dependent variable for a given fixed value of the independent variable is a function of
the independent variable. If the econometric model is given as:

    Yᵢ = β₀ + β₁Xᵢ + uᵢ

the Population Regression Function (PRF), which shows the conditional expected value
of the dependent variable (conditional upon X, the independent variable), is given as:

    E(Y|Xᵢ) = β₀ + β₁Xᵢ

Where: E(Y|Xᵢ) is the conditional mean of Y at a given value of X;
Y is the dependent variable and X is the independent variable;
β₀ and β₁ are the regression coefficients: β₀ is the intercept coefficient and β₁ is the
slope coefficient.

Suppose the relationship between household consumption expenditure and income, where
consumption expenditure (Y) is the dependent variable and income (X) is the explanatory
variable; that is, consumption expenditure increases as income increases.

Y is the actual (observed) consumption expenditure; however, the expected consumption
expenditure for any given household is given by the conditional mean of Y at X:

    E(Y|Xᵢ) = β₀ + β₁Xᵢ  — the PRF that shows the conditional mean of Y


Suppose we have data on E(Y|X) and X as follows:

              A     B     C     D     E
    E(Y|X)    70    80    95    120   140
    X         100   120   140   160   180

Using the population regression line, the PRF is shown graphically as follows:

[Figure: the PRF line E(Y|X) = β₀ + β₁X, with consumption expenditure in birr (Y) on the vertical axis and income (X) on the horizontal axis; the conditional means at points A, B, C and D lie on the upward-sloping line.]

The population regression line is the locus of the conditional means of the dependent
variable for the fixed values of the explanatory variable(s). It shows the average
distribution of consumption (Y) at each given income (X). On average, consumption
expenditure increases as income increases.

Note that at any given X, expenditure is random; it could be above or below the
regression line. That is, Y is randomly distributed while X is statistically fixed. However,
the conditional mean of consumption expenditure, E(Y|X), of households at each income
level is predictable, and it is denoted by the points on the regression line such as A, B, C
and D. For example, at income 100 the expected expenditure is 70 birr (point A on the
PRF line), but the actual consumption expenditure may lie anywhere above or below
point A (it could be 50, 55, 60, … or 75, 80, 90, etc.).

Our objective in regression analysis is to find out how the average value of the
dependent variable varies with the given value of the explanatory variable.


Since we do not have the entire population data, we rely on sample data to estimate the
population mean value. Thus, the sample counterpart of the population regression
function is referred to as the Sample Regression Function (SRF), given as:

    Ŷᵢ = β̂₀ + β̂₁Xᵢ

Ŷᵢ (Y hat) is the estimator of the conditional mean of the dependent variable, E(Y|Xᵢ);
β̂₀ is the estimator of β₀ and β̂₁ is the estimator of β₁.

If we draw the line of the estimated mean values (the SRF), it will not necessarily overlap
with the PRF line; it only lies approximately close to the PRF line (in the figure below,
the broken line shows the estimated SRF).

[Figure: the estimated SRF line (broken) plotted alongside the PRF line, with Y on the vertical axis, X on the horizontal axis, and the conditional-mean points A, B, C and D on the PRF.]

2.2. Simple Linear Regression Model (Two-Variable Case)

A regression model can be a simple or a multiple regression model. A multiple
regression model contains three or more variables, for example:

    Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + β₃X₃ᵢ + β₄X₄ᵢ + uᵢ   (a 5-variable model)

The simple linear regression model is a regression model that contains only two
variables: one dependent variable and one explanatory variable:

    Yᵢ = β₀ + β₁Xᵢ + uᵢ

2.2.1 Assumptions of the Classical Simple Linear Regression Model (CLRM)

Ordinary Least Squares (OLS) is one of the most widely used methods in regression. OLS
estimators are used to estimate the population parameters. However, OLS estimates are
valid only if certain key assumptions are satisfied; these are referred to as the
assumptions of the CLRM and are discussed below.


Assumption 1: The model is linear in parameters.

That is, the expected value of the dependent variable, E(Y|Xᵢ), is a linear function of the
parameters (the β's); however, it may or may not be linear with respect to the
explanatory variable, X.
Example:

    Y₁ = β₀ + β₁X + u₁
    Y₂ = β₀ + β₁X² + u₂          are all linear-in-parameters regression functions
    lnY = β₀ + β₁lnX + u₃

But

    Y₁ = β₀ + β₁²X + u₁
    Y₂ = β₀ + (β₂/β₁)X + u₂      are non-linear-in-parameters regression functions
    Y₃ = β₀ + β₁β₂X + u₃

Assumption 2: The explanatory variables (X values) are fixed in repeated sampling.

The values taken by the regressors (X) are considered fixed in repeated samples. More
technically, X is assumed to be non-stochastic (non-random).

Assumption 3: The expected value (mean) of the error terms is zero.
The zero mean value of the error terms means that the expected value of the error term
at each given value of the explanatory variable is zero: E(uᵢ|Xᵢ) = 0.
Assumption 4: Homoscedasticity, or constant variance of the error terms uᵢ.

Var(uᵢ|Xᵢ) = σ²: for any given value of X, the variance of uᵢ is the same for all
observations. It is the conditional variance of the error term (conditional upon the
explanatory variable, X); more specifically:

    Var(uᵢ|Xᵢ) = E[uᵢ − E(uᵢ|Xᵢ)]²
               = E(uᵢ²|Xᵢ)          since E(uᵢ|Xᵢ) = 0
               = σ²                 for all i = 1, …, n


Even if we vary the value of the explanatory variable (X), the variance of the error terms
corresponding to each value of the explanatory variable is the same. The opposite of
homoscedasticity is heteroscedasticity, which means the variance of the error term is
not constant.
Assumption 5: No autocorrelation between the disturbance (error) terms.
Given any two X values, Xᵢ and Xⱼ (i ≠ j), the correlation between any two error terms
uᵢ and uⱼ (i ≠ j) is zero. There should be no correlation (covariance) between any two
error terms. That is:

    Cov(uᵢ, uⱼ | Xᵢ, Xⱼ) = E{[uᵢ − E(uᵢ)][uⱼ − E(uⱼ)]}
                         = E(uᵢuⱼ) = 0        since E(uᵢ) = E(uⱼ) = 0

Assumption 6: Zero covariance between the error term and the explanatory variable Xᵢ.
That is, E(uᵢXᵢ) = 0:

    Cov(uᵢ, Xᵢ) = E{[uᵢ − E(uᵢ)][Xᵢ − E(Xᵢ)]}
                = E[uᵢ(Xᵢ − E(Xᵢ))]            since E(uᵢ) = 0
                = E(uᵢXᵢ) − E(Xᵢ)E(uᵢ)         since E(Xᵢ) is non-random
                = E(uᵢXᵢ)
                = 0, by assumption

Assumption 7: The number of observations n must be greater than the number of
explanatory variables; in other words, the number of observations n must be greater
than the number of parameters to be estimated.

Assumption 8: The regression model has to be specified correctly.

Alternatively, there is no specification bias or error in the model used in the econometric
analysis.
Model specification has to address the following points:
• What variables should be included in the model?
• What is the functional form of the model? Is it linear in the parameters, in the
  variables, or both?
• What are the probabilistic assumptions made about Yᵢ, Xᵢ and uᵢ entering the model?
Assumption 10: No perfect multicollinearity among the regressors (explanatory
variables).
Assumption 11: The error terms are normally distributed for all observations.


When the error term is normally distributed, Y (the dependent variable) and the
estimated parameters of the regression are also normally distributed, so that tests can
be conducted on the statistical significance of the parameters.

The nature of the probability distribution of uᵢ assumes an extremely important role in
hypothesis testing. Combining the three assumptions about the error term (assumptions
3, 4 and 11), we summarize:

    uᵢ ~ N(0, σ²)

which is read as: the error term uᵢ is normally distributed with zero mean and constant
variance σ².

2.2.3 Deriving the Ordinary Least Squares (OLS) Estimators

As stated earlier, the Ordinary Least Squares (OLS) method is one of the most commonly
used methods of estimation. OLS is used to derive the estimators (formulas) based on
the data, and these estimators are used to compute estimated values of the population
parameters.

The OLS regression method is based on the assumptions of the CLRM. OLS estimates are
acceptable if the CLRM assumptions are satisfied in the process; if those assumptions
are not satisfied, OLS cannot be used. The OLS method has some very attractive
statistical properties, discussed later, that have made it one of the most powerful and
popular methods of regression analysis.

Given a two-variable PRF (Population Regression Function):

    Yᵢ = β₀ + β₁Xᵢ + uᵢ

Since the PRF is not directly observable, we estimate it from the SRF (Sample Regression
Function):

    Ŷᵢ = β̂₀ + β̂₁Xᵢ
    Yᵢ = β̂₀ + β̂₁Xᵢ + ûᵢ
    ûᵢ = Yᵢ − Ŷᵢ = Yᵢ − (β̂₀ + β̂₁Xᵢ)

[Figure: scatter of observed points around the fitted SRF line; each residual ûᵢ (eᵢ) is the vertical distance between an observed point and the SRF line.]

Now, given data (observations) on Y and X, we would like to determine the SRF in such a
manner that the estimated value Ŷᵢ is as close as possible to the actual Y. To this end,


the OLS method determines (estimates) β̂₀ and β̂₁ in such a way that the sum of the
squared residuals (RSS) is as small as possible.

The objective is to minimize ∑ûᵢ² with respect to β̂₀ and β̂₁:

    ûᵢ = Yᵢ − β̂₀ − β̂₁Xᵢ ;  squaring and summing both sides gives

    Min  ∑ûᵢ² = ∑(Yᵢ − β̂₀ − β̂₁Xᵢ)²     with respect to β̂₀, β̂₁

Then take the partial derivatives with respect to β̂₀ and β̂₁ (first-order conditions):

    ∂(∑ûᵢ²)/∂β̂₀ = −2∑(Yᵢ − β̂₀ − β̂₁Xᵢ) = −2∑ûᵢ = 0 ………(1)

    ∂(∑ûᵢ²)/∂β̂₁ = −2∑(Yᵢ − β̂₀ − β̂₁Xᵢ)Xᵢ = −2∑Xᵢûᵢ = 0 ………(2)

From equation (1), we have:

    ∑Yᵢ − nβ̂₀ − β̂₁∑Xᵢ = 0 ;  dividing both sides by the sample size n:

    ∑Yᵢ/n − β̂₀ − β̂₁∑Xᵢ/n = 0 ;  thus,

    β̂₀ = Ȳ − β̂₁X̄

where Ȳ and X̄ are the average (mean) values of Y and X respectively. β̂₀ is the least
squares point estimator of β₀, the intercept term.

From equation (2), we have:

    ∑[(Yᵢ − β̂₀ − β̂₁Xᵢ)Xᵢ] = 0
    ∑XᵢYᵢ − β̂₀∑Xᵢ − β̂₁∑Xᵢ² = 0

Note that β̂₀ = Ȳ − β̂₁X̄ and ∑Xᵢ = nX̄. Substituting:

    ∑XᵢYᵢ − (Ȳ − β̂₁X̄)nX̄ − β̂₁∑Xᵢ² = 0
    ∑XᵢYᵢ − nX̄Ȳ + β̂₁nX̄² − β̂₁∑Xᵢ² = 0
    ∑XᵢYᵢ − nX̄Ȳ = β̂₁∑Xᵢ² − β̂₁nX̄² = β̂₁(∑Xᵢ² − nX̄²) ;  hence

    β̂₁ = (∑XᵢYᵢ − nX̄Ȳ) / (∑Xᵢ² − nX̄²)

β̂₁ is the least squares point estimator of β₁.

In deviation form: given that ∑XᵢYᵢ − nX̄Ȳ = ∑[(Xᵢ − X̄)(Yᵢ − Ȳ)] = ∑xᵢyᵢ,

where (Yᵢ − Ȳ) = yᵢ and (Xᵢ − X̄) = xᵢ, and

∑(Xᵢ − X̄)² = ∑Xᵢ² − nX̄² = ∑xᵢ², the formula for β̂₁ can be written in deviation form as:

    β̂₁ = ∑xᵢyᵢ / ∑xᵢ²

The lower-case letters xᵢ and yᵢ denote the deviation of each observed value from its
mean; the deviation form of a variable is the deviation of each individual observation
from its mean (average) value:

    yᵢ = Yᵢ − Ȳ  — the deviation of the Y values from their mean
    xᵢ = Xᵢ − X̄  — the deviation of the X values from their mean

Numerical Example 1: Consider hypothetical data on output (Y) produced and labour
input (X) used by a firm, given as follows:

    Obs. (Firm)   1    2    3    4    5    6    7    8    9    10
    Y             11   10   12   6    10   7    9    10   11   10
    X             10   7    10   5    8    8    6    7    9    10

Then we have two variables, Y (the dependent variable) and X (the explanatory variable),
with sample size n = 10, and:

    ∑Yᵢ = 96, ∑Xᵢ = 80, ∑XᵢYᵢ = 789, ∑Xᵢ² = 668, ∑Yᵢ² = 952, Ȳ = 9.6, X̄ = 8

The model is specified as:

    Yᵢ = β₀ + β₁Xᵢ + uᵢ

Then estimate the values of the regression coefficients based on the data:

    β̂₁ = (∑XᵢYᵢ − nX̄Ȳ) / (∑Xᵢ² − nX̄²) = (789 − 10(9.6)(8)) / (668 − 10(8)²) = 21/28 = 0.75

    β̂₀ = Ȳ − β̂₁X̄ = 9.6 − 0.75(8) = 3.6

Thus, the SRF (the estimated equation) will be:  Ŷᵢ = 3.6 + 0.75Xᵢ

(Note: ûᵢ = Yᵢ − Ŷᵢ)


Here β̂₀ = 3.6 and β̂₁ = 0.75 are point estimates of the true parameters. The value of
β̂₁ (0.75) is interpreted as the marginal product of labour: for a one-unit increase in
labour employment, total output will increase by 0.75 units.

Note also the following summations in deviation form:

    ∑xᵢ² = 28, ∑yᵢ² = 30.4, ∑xᵢyᵢ = 21, and ∑ŷᵢ² = β̂₁²∑xᵢ²   (Note: ŷᵢ = Ŷᵢ − Ȳ = β̂₁xᵢ)
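As a cross-check, these estimates can be reproduced directly from the deviation-form formulas. The following is a minimal Python/NumPy sketch (the variable names are illustrative, not from the text):

```python
import numpy as np

# Data from Numerical Example 1: output (Y) and labour input (X)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
n = len(Y)

# Deviation-form OLS estimators
x = X - X.mean()                              # x_i = X_i - X_bar
y = Y - Y.mean()                              # y_i = Y_i - Y_bar
beta1_hat = (x * y).sum() / (x ** 2).sum()    # 21 / 28 = 0.75
beta0_hat = Y.mean() - beta1_hat * X.mean()   # 9.6 - 0.75(8) = 3.6

print(beta1_hat, beta0_hat)                   # 0.75  3.6
```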

2.2.4 Important Properties of the OLS Estimators

1. The SRF passes through the sample means of Y and X (Ȳ and X̄ respectively); in other
words, the SRF line contains the point (X̄, Ȳ). This follows because β̂₀ = Ȳ − β̂₁X̄, so that
Ȳ = β̂₀ + β̂₁X̄.

[Figure: the SRF line Ŷᵢ = β̂₀ + β̂₁Xᵢ passing through the point (X̄, Ȳ) at point A.]
2. The mean value of the estimated (fitted) values Ŷᵢ is equal to the mean of the actual Y.

    Yᵢ = Ŷᵢ + ûᵢ ;  summing both sides and dividing by n:

    ∑Yᵢ/n = ∑Ŷᵢ/n + ∑ûᵢ/n ,  where ∑ûᵢ/n = 0 (by assumption 3 / the first-order condition)

    Ȳ = (mean of Ŷᵢ) + 0 ;  hence the mean of Ŷᵢ equals Ȳ.
3. The residuals and the estimated values of the dependent variable are uncorrelated;
that is, Cov(Ŷᵢ, ûᵢ) = 0:

    Cov(Ŷᵢ, ûᵢ) = E[(Ŷᵢ − Ȳ)(ûᵢ − 0)] = E(ŷᵢûᵢ)     (since the mean of Ŷᵢ is Ȳ and the mean of ûᵢ is 0)
               = E(β̂₁xᵢûᵢ) = β̂₁E(xᵢûᵢ) = 0 ,       since E(xᵢûᵢ) = Cov(xᵢ, ûᵢ) = 0
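These two properties can be checked numerically with the Example 1 fit from the earlier sketch (continuing with the same illustrative variable names):

```python
# Numerical check of properties 2 and 3 for the Example 1 fit
Y_hat = beta0_hat + beta1_hat * X                      # fitted values
u_hat = Y - Y_hat                                      # residuals
print(np.isclose(Y_hat.mean(), Y.mean()))              # property 2: mean of fitted = mean of actual
print(np.isclose(np.cov(Y_hat, u_hat)[0, 1], 0.0))     # property 3: cov(Y_hat, u_hat) ≈ 0
```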


2.2.5. Measure of Goodness of Fit (the Coefficient of Determination, r²)

The coefficient of determination is a measure that shows how well the regression model
explains the variation in the actual values of the dependent variable. Having estimated a
particular linear model, a natural question that comes up is:
• How well does the estimated value fit the actual data, i.e. how close is Ŷᵢ to the actual
  value of Y?

The coefficient of determination is a summary measure that tells how well the sample
regression line fits the observations (data) in simple regression. Using the sample
observations we produce the SRF (Sample Regression Function). The measure of
'goodness of fit', denoted by r² in the simple regression model, helps us to see how close
the estimated sample regression line is to the population regression line.
Recall that Yᵢ = Ŷᵢ + ûᵢ. Written in deviation form, where yᵢ = Yᵢ − Ȳ and ŷᵢ = Ŷᵢ − Ȳ:

    yᵢ = ŷᵢ + ûᵢ ;  squaring and summing both sides:

    ∑yᵢ² = ∑(ŷᵢ + ûᵢ)² = ∑(ŷᵢ² + 2ŷᵢûᵢ + ûᵢ²)
    ∑yᵢ² = ∑ŷᵢ² + 2∑ŷᵢûᵢ + ∑ûᵢ² ,   where ∑ŷᵢûᵢ = 0 ;  thus

    ∑yᵢ²  =  ∑ŷᵢ²  +  ∑ûᵢ²
     TSS      ESS       RSS

    ESS = ∑ŷᵢ² = β̂₁²∑xᵢ² = ∑yᵢ² − ∑ûᵢ²
    RSS = ∑ûᵢ² = ∑yᵢ² − ∑ŷᵢ² = ∑yᵢ² − β̂₁²∑xᵢ²
    TSS = ESS + RSS

Where:
TSS: total sum of squares (the total variation of the dependent variable);
ESS: explained sum of squares, the variation accounted for by the explanatory variable;
RSS: residual sum of squares (the unexplained variation), the variation in the dependent
variable that is not explained by the explanatory variable in the model.
The coefficient of determination (r²) is computed as the ratio of ESS to TSS obtained
from the data:

    r² = ESS/TSS = 1 − RSS/TSS

From our Numerical Example 1 above, we compute TSS, ESS and RSS as follows:
a) TSS = ∑yᵢ² = ∑Yᵢ² − nȲ² = 952 − 10(9.6)² = 30.4
b) ESS = β̂₁²∑xᵢ² = (0.75)²(28) = 15.75
c) RSS = TSS − ESS = 30.4 − 15.75 = 14.65

    r² = ESS/TSS = 15.75/30.4 = 0.52

Interpretation: r² = 0.52 means that about 52% of the variation in output (Y) is explained
by the variation in labour input (X).
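A short continuation of the earlier Python sketch computes the same quantities (again, the variable names are illustrative and carry over from the previous blocks):

```python
# Goodness of fit for Example 1, reusing Y, X, x, y, beta0_hat, beta1_hat from above
u_hat = Y - (beta0_hat + beta1_hat * X)   # residuals
TSS = (y ** 2).sum()                      # 30.4
ESS = beta1_hat ** 2 * (x ** 2).sum()     # 15.75
RSS = (u_hat ** 2).sum()                  # 14.65
r_squared = ESS / TSS                     # ≈ 0.52
```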

Note the following points about r²:

• r² is non-negative; it cannot be negative.
• It always lies between zero and one: 0 ≤ r² ≤ 1.
• If r² = 0, the model does not explain anything: the explanatory variable does not explain
  the changes in the dependent variable.
• If r² = 1, we have a perfect fit: Ŷᵢ = Yᵢ for every observation.

2.3. The BLUE Property of the OLS Estimators: the Gauss–Markov Theorem

The Gauss–Markov theorem states that, given the assumptions of the classical linear
regression model, the OLS estimators satisfy the Best Linear Unbiased Estimator (BLUE)
property. The OLS estimators are said to be Best Linear Unbiased Estimators (BLUE) of
the population values if the following are satisfied:

1) They are linear, that is, linear functions of a random variable, such as the dependent
   variable Y.
2) They are unbiased, that is, the expected value of each estimator is equal to its true
   value: E(β̂₁) = β₁ and E(β̂₀) = β₀.
3) They have minimum variance (they are efficient estimators) in the class of all such
   linear unbiased estimators.
Proof
1) β̂₁ is linear in Yᵢ

Given the model Yᵢ = β₀ + β₁Xᵢ + uᵢ,

    β̂₁ = ∑xᵢyᵢ / ∑xᵢ² = ∑xᵢYᵢ / ∑xᵢ²        (since ∑xᵢ = 0, so ∑xᵢȲ = 0)

Let wᵢ = xᵢ / ∑xᵢ². Then

    β̂₁ = ∑wᵢYᵢ = w₁Y₁ + w₂Y₂ + … + wₙYₙ ,   where the wᵢ's are fixed since the xᵢ's are fixed.

Hence β̂₁ is a linear function of Yᵢ.


2) β̂₁ is an unbiased estimator of β₁, i.e. E(β̂₁) = β₁.

From proof (1) we have:

    β̂₁ = ∑wᵢYᵢ = ∑wᵢ(β₀ + β₁Xᵢ + uᵢ)
        = β₀∑wᵢ + β₁∑wᵢXᵢ + ∑wᵢuᵢ
        = β₁ + ∑wᵢuᵢ                          (since ∑wᵢ = 0 and ∑wᵢXᵢ = 1)

    E(β̂₁) = E(β₁ + ∑wᵢuᵢ)
          = E(β₁) + E(w₁u₁ + w₂u₂ + … + wₙuₙ)
          = β₁ + w₁E(u₁) + w₂E(u₂) + … + wₙE(uₙ)
          = β₁ + 0

    Hence E(β̂₁) = β₁.

3) Minimum variance: see the proof in the text.
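The unbiasedness result can also be illustrated by simulation: holding X fixed, repeatedly drawing new error terms and re-estimating β̂₁ each time, the average of the estimates settles near the true slope. A small sketch under assumed "true" parameter values (chosen only for illustration), reusing X from the earlier Example 1 block:

```python
# Monte Carlo illustration of E(beta1_hat) = beta1 (illustrative true values)
rng = np.random.default_rng(0)
beta0_true, beta1_true, sigma = 3.6, 0.75, 1.35   # assumed for the simulation
estimates = []
for _ in range(5000):
    u = rng.normal(0.0, sigma, size=len(X))       # new errors; X fixed in repeated sampling
    Y_sim = beta0_true + beta1_true * X + u
    x_dev = X - X.mean()
    estimates.append((x_dev * (Y_sim - Y_sim.mean())).sum() / (x_dev ** 2).sum())
print(np.mean(estimates))                         # close to 0.75
```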

2.4. Variance and Covariance of the OLS Estimators β̂₀ and β̂₁

The estimates β̂₀ and β̂₁ differ from sample to sample. Since we have only one sample at
a time, we need a measure of the precision of these estimates in representing the true
parameters (β₀ and β₁). The measure of such precision is the standard error.

The variances and standard errors of the OLS estimators are computed below.

a) The variance of the error term (σ²) can be estimated from the data as follows:

    σ̂² = ∑ûᵢ²/(n − k) = RSS/(n − k)   and   σ̂ = √σ̂²  — the standard error of the regression

Where:
σ̂² is the estimator of the actual variance of the error term (σ²);
RSS is the residual sum of squares; and
(n − k) is the degrees of freedom, where n is the sample size and k is the number of
variables (estimated parameters) in the model.

For our Example 1 above: RSS = 14.65, n = 10 and k = 2, so n − k = 8.
Thus, the variance of the disturbance term is estimated as:

    σ̂² = ∑ûᵢ²/(n − k) = 14.65/8 = 1.83 ,  and the standard error:  σ̂ = √1.83 = 1.353

b) Var(β̂₀) = σ̂²∑Xᵢ² / (n∑xᵢ²)   and   se(β̂₀) = √Var(β̂₀)

From our example:  Var(β̂₀) = 1.83(668)/(10 × 28) = 4.366 ;   se(β̂₀) = √4.366 = 2.09

c) Var(β̂₁) = σ̂² / ∑xᵢ²   and   se(β̂₁) = √Var(β̂₁)

From our Numerical Example 1:  Var(β̂₁) = 1.83/28 = 0.065 ;   se(β̂₁) = √0.065 = 0.256

d) Cov(β̂₀, β̂₁) = −X̄ · Var(β̂₁)

More variation in X (the explanatory variable) and a larger sample size (n) increase the
precision of the estimators β̂₀ and β̂₁, because both reduce the variances of the
estimators.
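Continuing the Python sketch for Example 1, the error variance and coefficient standard errors can be computed as follows (illustrative names, reusing earlier variables):

```python
# Error variance and standard errors of the coefficients (Example 1)
k = 2                                                       # number of estimated parameters
sigma2_hat = RSS / (n - k)                                  # 14.65 / 8 ≈ 1.83
se_beta1 = (sigma2_hat / (x ** 2).sum()) ** 0.5             # ≈ 0.256
se_beta0 = (sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum())) ** 0.5   # ≈ 2.09
```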
2.5. Implications of the Normality Assumption
The classical normal linear regression model assumes that uᵢ ~ N(0, σ²): the error term
is normally distributed with mean zero and variance σ². This implies that the dependent
variable and the estimated coefficients are also normally distributed.

• Yᵢ ~ N(β₀ + β₁Xᵢ, σ²), because Yᵢ is a linear function of uᵢ; hence Y is also normally
  distributed.

• Similarly, β̂₀ and β̂₁ are normally distributed with the following means and variances:

    β̂₀ ~ N(β₀, σ²∑Xᵢ²/(n∑xᵢ²))   and   β̂₁ ~ N(β₁, σ²/∑xᵢ²)

Given the properties of the normal distribution, the standardized values of the
coefficients are computed as follows:

    Z = (β̂₁ − β₁)/σ(β̂₁) ~ N(0, 1) ;  but since σ is unknown and is replaced by σ̂, we use

    t = (β̂₁ − β₁)/se(β̂₁) ~ t(n − k)   and, similarly,   t = (β̂₀ − β₀)/se(β̂₀) ~ t(n − k)

2.6. Interval Estimation for the Regression Coefficients β₀ and β₁

The reliability of a point estimator is measured by its standard error. Therefore, instead
of relying on the point estimate alone, we may construct an interval around the point
estimator (β̂₀ or β̂₁), say within two or three standard errors on either side of the point
estimator, such that this interval has, say, a 95% probability of including the true
parameter value (β₀ or β₁).

Symbolically:  P(β̂₁ − δ ≤ β₁ ≤ β̂₁ + δ) = 1 − α ,  where α (0 < α < 1) is known as the level
of significance (the probability of committing a Type I error).

Note: Type I error — rejecting a true hypothesis;
      Type II error — accepting a false hypothesis.


Confidence interval for the regression coefficients β₀ and β₁

The confidence interval for the true β₁ is constructed as follows, given α and the degrees
of freedom for the t-critical value:

    P[ −t_{α/2}(n−k) ≤ (β̂₁ − β₁)/se(β̂₁) ≤ t_{α/2}(n−k) ] = 1 − α

Where (n − k) denotes the degrees of freedom for the t-critical value; since we have two
parameters (in the two-variable model), k = 2 and hence df = n − 2. If α = 5%, then 1 − α,
the confidence level, will be 0.95 or 95%. A 95% confidence interval for β₁ is then given
by rearranging the above statement:

    P[ β̂₁ − t_{α/2}(n−k)·se(β̂₁) ≤ β₁ ≤ β̂₁ + t_{α/2}(n−k)·se(β̂₁) ] = 95%

Using the previous firm example, we estimated the model as Ŷᵢ = 3.6 + 0.75Xᵢ, with
β̂₀ = 3.6 and se(β̂₀) = 2.09.
Since we intend to construct a 95% confidence interval, α = 0.05 and α/2 = 0.025; the
t-critical value is t₀.₀₂₅(8) = 2.306 (obtained from the t-table).

Then the confidence interval for β₀ will be:

    P[ β̂₀ − t₀.₀₂₅(8)·se(β̂₀) ≤ β₀ ≤ β̂₀ + t₀.₀₂₅(8)·se(β̂₀) ] = 95%
    P[ 3.6 − 2.306(2.09) ≤ β₀ ≤ 3.6 + 2.306(2.09) ] = 95%
    P[ −1.22 ≤ β₀ ≤ 8.42 ] = 95%

Therefore, the 95% confidence interval for β₀ is (−1.22, 8.42).

Interpretation: given the confidence coefficient of 95%, in the long run, in 95 out of 100
cases intervals like (−1.22, 8.42) will contain the true β₀.

The confidence interval for β₁:

    P[ β̂₁ − t_{α/2}(n−k)·se(β̂₁) ≤ β₁ ≤ β̂₁ + t_{α/2}(n−k)·se(β̂₁) ] = 95%
    P[ 0.75 − 2.306(0.256) ≤ β₁ ≤ 0.75 + 2.306(0.256) ] = 95%

    (0.16, 1.34) is the interval estimate of the true value of β₁ at the 95% confidence level.

2.7 Hypothesis Testing

Hypothesis testing is the process of determining whether or not a given hypothesized
statement is valid. The goal of hypothesis tests is to ascertain whether the statistics from
sampled data are reliable enough to make inferences about population values. In
hypothesis testing, an assumption about the probability distribution is essential.


A hypothesis test can be a two-tail or a one-tail test. Whether one uses a two-tail or a
one-tail test depends upon how the alternative hypothesis is formulated.

1. Test for the statistical significance of an individual coefficient: the t-test

The t-statistic is used to test a hypothesis about the statistical significance of an
individual coefficient (parameter) in the model. A parameter or coefficient in a model
measures the effect of an explanatory variable on the dependent variable of the given
model. A coefficient is said to be statistically significant if the value of the test statistic
lies in the critical region (see the figure below); in that case the null hypothesis is
rejected. Similarly, a test is said to be statistically insignificant if the value of the test
statistic lies in the acceptance region; in that case the null hypothesis cannot be rejected.
Note that the test of significance of a coefficient is a two-tail test. The significance of a
coefficient is tested under the following hypotheses:

    H₀: βᵢ = 0   (the coefficient is statistically zero, i.e. insignificant)
    H₁: βᵢ ≠ 0   (the coefficient is significantly different from zero, i.e. statistically significant)

    t = (β̂ᵢ − 0)/se(β̂ᵢ) = β̂ᵢ/se(β̂ᵢ)

[Figure: t-distribution showing the central 95% acceptance region and the two critical (rejection) regions of α/2 = 2.5% in each tail, bounded by −t_{α/2} and t_{α/2}.]

Two-sided (two-tail) test for coefficients:

We use a two-tail test when we do not have a strong a priori or theoretical expectation
about the direction in which the alternative hypothesis should move relative to the null
hypothesis.

Decision rule:
Reject H₀ if |t_computed| > t_{α/2}(n−k);
do not reject (accept) H₀ if |t_computed| < t_{α/2}(n−k).

From our previous example of the firm:

    Ŷᵢ = 3.6 + 0.75Xᵢ ;   β̂₁ = 0.75

Suppose we want to test the following hypothesis at α = 0.05:  H₀: β₁ = 0 ;  H₁: β₁ ≠ 0


    t_computed = (β̂₁ − 0)/se(β̂₁) = 0.75/0.256 = 2.93  — the t-statistic computed from the
    sample data for β̂₁

[Figure: acceptance region between −2.306 and 2.306; the computed t-value (2.93) falls in the rejection region to the right.]

The t-critical value from the table at α/2 = 0.025 with n − k = 8 degrees of freedom is
2.306, i.e. t₀.₀₂₅(8) = 2.306. The computed t-value is 2.93, which is outside the
acceptance region; in other words, the estimated value lies in the critical (rejection)
region.

Conclusion: since the computed t-value lies outside the acceptance region, we reject H₀;
β₁ is significantly different from zero.

The exact level of significance (the P-value): the P-value is defined as the lowest
significance level at which the null hypothesis can be rejected. From our illustrative
example, given the null hypothesis that β₁ = 0, we computed a t-value of 2.93. What is
the P-value of obtaining a t-value as large as or larger than 2.93? The probability of
obtaining a |t|-value of 2.93 or greater (with df = 8) is about 0.02, i.e. P(|t| ≥ 2.93) ≈ 0.02,
which indicates a low probability of getting such a t-value by chance; hence we reject the
null hypothesis at α = 0.05. Statistical software always provides the p-value of any
estimated statistic.
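The computed t-statistic and its two-sided p-value can be obtained as follows (continuing the earlier sketch):

```python
# t-statistic and two-sided p-value for H0: beta1 = 0 (Example 1)
t_stat = beta1_hat / se_beta1                             # ≈ 2.93
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - k))    # ≈ 0.02
```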

2. Testing the overall significance of the model: the F-test

The F-test is conducted to verify whether the regression coefficients of a model are
jointly significant or not; its result is used to determine whether the estimated model is
adequate. It is especially important in multiple regression models, which contain more
than one explanatory variable (to be discussed in Chapter 3). Given a general regression
function of the form:

    Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + … + uᵢ

the null hypothesis is that the slope coefficients are jointly zero. The F-value is computed
from the data as follows. Starting from TSS = ESS + RSS and dividing both sides by σ²:

    TSS/σ² = ESS/σ² + RSS/σ² ,   i.e.   ∑yᵢ²/σ² = ∑ŷᵢ²/σ² + ∑ûᵢ²/σ²

Dividing the explained and residual parts by their degrees of freedom gives the F-ratio:

    F = (ESS/df_ESS)/(RSS/df_RSS) = [ESS/(k − 1)] / [RSS/(n − k)] ~ F(k − 1, n − k)


To compute the F-statistic from the sample information, use the formula:

    F = [ESS/(k − 1)] / [RSS/(n − k)] = [r²/(k − 1)] / [(1 − r²)/(n − k)]

Decision rule:
Reject H₀ if F_computed > F_critical(k − 1, n − k) and conclude that the regression
coefficients are jointly (statistically) significantly different from zero: the joint effect of
the explanatory variables on the dependent variable is significant, and the model is
reliable for prediction.

Given the previously estimated equation for output and labour:

    Ŷᵢ = 3.6 + 0.75Xᵢ

we have ESS = 15.75, RSS = 14.65, n = 10 and k = 2, so that k − 1 = 1 and n − k = 8.
The F-value is computed from the sample data as follows:

    F(1, 8) = [ESS/(k − 1)] / [RSS/(n − k)] = (15.75/1)/(14.65/8) = 15.75/1.83 = 8.6

Next, read the F-critical value from the F-table at (k − 1, n − k) = (1, 8) degrees of
freedom and the 5% level of significance, which is F(1, 8) = 5.32.
Note: F(1, 8) at the 5% level is 5.32, which is lower than the computed F of 8.6; hence the
coefficients are jointly significant.
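The same comparison can be made in Python, using SciPy's F distribution for the critical value (continuing the earlier sketch):

```python
# Overall significance: F-statistic and 5% critical value (Example 1)
F_stat = (ESS / (k - 1)) / (RSS / (n - k))         # ≈ 8.6
F_crit = stats.f.ppf(0.95, dfn=k - 1, dfd=n - k)   # ≈ 5.32
reject_H0 = F_stat > F_crit                        # True: reject H0, jointly significant
```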

Reporting the Results of the Regression

In reporting regression results we should follow a standard format in which all relevant
results are included. The most commonly reported results include: the coefficient
estimates, standard errors, computed t-statistics, the sample size, r², F-values, the level
of significance (α), and other test results.

    Ŷᵢ = 3.6 + 0.75Xᵢ
    se = (2.09)  (0.256)      r² = 0.52,  n = 10,  RSS = 14.65
    t  = (1.72)  (2.93)       df = 8,  F(1, 8) = 8.6

Numerical Example 2: Data on weekly household consumption expenditure (Y) and
income (X).

    Obs.   1    2    3    4    5     6     7     8     9     10
    Y      70   65   90   95   110   115   120   140   155   150
    X      80   100  120  140  160   180   200   220   240   260


We have two variables, Y (the dependent variable) and X (the explanatory variable), with
sample size n = 10:
∑Yᵢ = 1,110, ∑Xᵢ = 1,700, ∑XᵢYᵢ = 205,500, ∑Xᵢ² = 322,000, ∑Yᵢ² = 132,100,
Ȳ = 111, X̄ = 170. In deviation form: ∑xᵢyᵢ = 16,800 and ∑xᵢ² = 33,000.
The model is specified as:  Yᵢ = β₀ + β₁Xᵢ + uᵢ
a) Estimate the two regression coefficients:

    β̂₁ = ∑xᵢyᵢ/∑xᵢ² = 16,800/33,000 = 0.5091

    β̂₀ = Ȳ − β̂₁X̄ = 111 − 0.5091(170) = 24.45

b) Compute TSS, ESS and RSS:

    TSS = ∑yᵢ² = ∑Yᵢ² − nȲ² = 132,100 − 123,210 = 8,890
    ESS = β̂₁²∑xᵢ² = (0.5091)²(33,000) = 8,553
    RSS = ∑ûᵢ² = TSS − ESS = 8,890 − 8,553 = 337

c) Compute r², the goodness of fit, and r:

    r² = ESS/TSS = 8,553/8,890 = 0.962   and   r = √0.962 = 0.981

d) Compute the estimated error variance σ̂² and the standard error of the regression:

    σ̂² = RSS/(n − k) = 337/8 = 42.125   and   σ̂ = √42.125 = 6.49

e) Compute the standard errors of the coefficients:

    var(β̂₁) = σ̂²/∑xᵢ² = 42.125/33,000 = 0.00128 ;      se(β̂₁) = √0.00128 = 0.0357
    var(β̂₀) = σ̂²∑Xᵢ²/(n∑xᵢ²) = 42.125(322,000)/(10 × 33,000) = 41.104 ;   se(β̂₀) = √41.104 = 6.411

f) Compute the t-values for the coefficients:

    t(β̂₀) = β̂₀/se(β̂₀) = 24.45/6.411 = 3.812   and   t(β̂₁) = β̂₁/se(β̂₁) = 0.5091/0.0357 = 14.261

The estimated SRF is reported as follows:

    Ŷᵢ = 24.45 + 0.5091Xᵢ
    se = (6.411)  (0.0357)
    t  = (3.812)  (14.261)
    r² = 0.962,  df = 8,  ESS = 8,553,  RSS = 337,  n = 10,  α = 0.05,  F(1, 8) = 203.039
    t_{α/2}(8) = t₀.₀₂₅(8) = 2.306 (from the t-table)


From the report above, we can directly conclude that both coefficients are statistically
significantly different from zero at the 0.05 level of significance, and the claim of the null
hypothesis is rejected. This is because the absolute value of the computed t-statistic
(3.812 for β̂₀ and 14.261 for β̂₁) is greater than the t-critical value (2.306) at the given
degrees of freedom (n − k = 8) and level of significance (α = 0.05) for both coefficients.
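The full set of reported quantities for Example 2 can be reproduced with the statsmodels package, assuming it is installed (the variable names below are illustrative):

```python
import numpy as np
import statsmodels.api as sm

# Weekly consumption expenditure (Y) and income (X) from Numerical Example 2
Y2 = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X2 = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)

# OLS with an intercept; the summary shows coefficients, standard errors,
# t-statistics, r-squared and the F-statistic in one table
results = sm.OLS(Y2, sm.add_constant(X2)).fit()
print(results.summary())        # expect ≈ 24.45 and ≈ 0.5091, r² ≈ 0.962, F ≈ 203
```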
