UNIT 8 STATISTICAL INFERENCE IN MULTIPLE LINEAR REGRESSION


Structure
8.1 Introduction
Objectives
8.2 Properties of Estimated Regression Coefficients
8.3 Test of Significance of Multiple Regression Model
8.4 Test of Significance of Individual Regression Coefficients
8.5 Confidence Interval of Regression Coefficients
8.6 R² and Adjusted R²
8.7 Summary
8.8 Solutions and Answers

8.1 INTRODUCTION
In Unit 6, you have learnt how to test the significance of the individual regression coefficients, the intercept and the slope, as well as the overall regression model for simple linear regression. You have also learnt how to compute the confidence intervals of regression coefficients. In Unit 7, you studied the multiple linear regression model used for explaining the relationship between the response variable and more than one regressor variable. In this unit, you will learn about the inferential aspects of the multiple regression model. The concepts are the same as those discussed in Unit 6 for simple linear regression. In Sec. 8.2, we describe the properties of the estimated regression coefficients of the fitted multiple linear regression model. We discuss testing the significance of the overall fitted multiple regression model and of the individual regression coefficients in Secs. 8.3 and 8.4, respectively.

In Sec. 8.5, we explain how to determine the (1 – α)100% confidence interval of individual regression coefficients. We discuss the coefficient of determination and the adjusted coefficient of determination in Sec. 8.6.

In the next unit, you will learn about the multiple linear regression model when
some of the regressor variables are quantitative and some are qualitative.

Objectives:
After studying the unit, you should be able to:

• describe the properties of the estimated regression coefficients of a fitted multiple regression model;

• test the significance of multiple regression model;

• conduct hypotheses testing of individual regression coefficients;

• determine confidence interval of individual regression coefficients; and

• interpret the coefficient of determination and the adjusted coefficient of determination.
8.2 PROPERTIES OF ESTIMATED REGRESSION COEFFICIENTS
In Sec. 5.3.2 of Unit 5, we described the properties of the least squares estimators of the regression coefficients in the simple linear regression model. In this section, we consider two properties of the least squares estimators $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k)'$ of the regression coefficients in the multiple regression model:

1. The ith least squares estimator $\hat{\beta}_i$ is an unbiased estimator of $\beta_i$, $i = 1, 2, \ldots, k$, i.e.,

$$E(\hat{\beta}_i) = \beta_i; \quad i = 1, 2, \ldots, k \qquad \ldots (1)$$

2. We define the variance-covariance matrix of $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k)$ as:

$$V(\hat{\beta}) = \sigma^2 (X'X)^{-1} = \sigma^2 S \qquad \ldots (2)$$

where

$$S = (S_{ij})_{(k+1) \times (k+1)} = \begin{bmatrix} S_{00} & S_{01} & \ldots & S_{0k} \\ S_{10} & S_{11} & \ldots & S_{1k} \\ \vdots & \vdots & & \vdots \\ S_{k0} & S_{k1} & \ldots & S_{kk} \end{bmatrix} = (X'X)^{-1}$$

with i, j = 0, 1, 2, ..., k.

Here, $\sigma^2 S$ is also known as the variance-covariance matrix of the estimated regression coefficients. Its dimension is (k + 1) × (k + 1).

Thus, the variance of $\hat{\beta}_i$ is given by

$$V(\hat{\beta}_i) = \sigma^2 S_{ii}; \quad i = 0, 1, 2, \ldots, k \qquad \ldots (3)$$

Hence, the standard error of $\hat{\beta}_i$ can be defined as:

$$SE(\hat{\beta}_i) = \sigma \sqrt{S_{ii}} \qquad \ldots (4)$$

The covariance of $\hat{\beta}_i$ and $\hat{\beta}_j$ is given by

$$\mathrm{cov}(\hat{\beta}_i, \hat{\beta}_j) = \sigma^2 S_{ij}; \quad i \neq j = 1, 2, \ldots, k \qquad \ldots (5)$$

Generally, the value of $\sigma^2$ is unknown, so we estimate it by $\hat{\sigma}^2$ as in equation (35) of Unit 7:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} r_i^2}{n - k - 1} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k - 1} \qquad \ldots (6)$$

Thus, we define the estimated variance-covariance matrix of $\hat{\beta}$ as:

$$\hat{V}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1} = \hat{\sigma}^2 S \qquad \ldots (7)$$

When $\sigma^2$ is unknown, the standard error of $\hat{\beta}_i$ is therefore defined as:

$$SE(\hat{\beta}_i) = \sqrt{\hat{\sigma}^2 S_{ii}}; \quad i = 0, 1, 2, \ldots, k \qquad \ldots (8)$$
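These matrix formulas translate directly into a few lines of linear algebra. The following is a minimal sketch (assuming Python with numpy; the function and variable names are illustrative, not from the text) that computes $\hat{\beta}$, $\hat{\sigma}^2$ and the estimated variance-covariance matrix for any design matrix whose first column is ones:

```python
import numpy as np

def coef_covariance(X, y):
    """Estimate beta, sigma^2 and the variance-covariance matrix.

    X : (n, k+1) design matrix whose first column is all ones.
    y : (n,) response vector.
    """
    n, p = X.shape                      # p = k + 1 estimated parameters
    S = np.linalg.inv(X.T @ X)          # S = (X'X)^{-1} as in equation (2)
    beta_hat = S @ X.T @ y              # least squares estimates
    residuals = y - X @ beta_hat
    sigma2_hat = residuals @ residuals / (n - p)   # equation (6)
    cov = sigma2_hat * S                # estimated V(beta_hat), equation (7)
    return beta_hat, sigma2_hat, cov

# Standard errors, equation (8), are the square roots of the diagonal:
# se = np.sqrt(np.diag(cov))
```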

You may like to study the following examples to learn this concept.
Example 1: For the data of systolic blood pressure considering age and weight as regressor variables, given in Example 3 of Unit 7, determine the variance-covariance matrix of the regression coefficients. Also compute the standard errors of the regression coefficients.

Solution: In order to compute the variance-covariance matrix of the regression coefficients, we use the following values computed in Example 3 of Unit 7:

$\hat{\sigma}^2 = 1.82415221$ and

$$(X'X)^{-1} = \begin{bmatrix} 2.9230 & -0.0474 & -0.0185 \\ -0.0474 & 0.0059 & -0.0019 \\ -0.0185 & -0.0019 & 0.0011 \end{bmatrix}$$

We obtain the variance-covariance matrix of the regression coefficients as follows:

$$V(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1} = 1.8242 \times \begin{bmatrix} 2.9230 & -0.0474 & -0.0185 \\ -0.0474 & 0.0059 & -0.0019 \\ -0.0185 & -0.0019 & 0.0011 \end{bmatrix} = \begin{bmatrix} 5.3319 & -0.0865 & -0.0337 \\ -0.0865 & 0.0108 & -0.0035 \\ -0.0337 & -0.0035 & 0.0019 \end{bmatrix}$$

Note: All calculations in this block were performed up to 15 fixed decimal places for accuracy; for simplicity, results are shown up to 4 decimal places only. The results may vary slightly if the calculations are carried out with values fixed at other numbers of decimal places.

Thus, the standard errors of the regression coefficients are:

$V(\hat{\beta}_0) = 5.3319 \Rightarrow SE(\hat{\beta}_0) = \sqrt{5.3319} = 2.3091$

$V(\hat{\beta}_1) = 0.0108 \Rightarrow SE(\hat{\beta}_1) = \sqrt{0.0108} = 0.1039$

$V(\hat{\beta}_2) = 0.0019 \Rightarrow SE(\hat{\beta}_2) = \sqrt{0.0019} = 0.0442$

All the off-diagonal elements give the covariance of $\hat{\beta}_i$ and $\hat{\beta}_j$ for i ≠ j = 0, 1, 2. For instance, $\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2) = -0.00346676$ for i = 1 and j = 2.
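The numbers above can be reproduced mechanically. A short check (a sketch assuming numpy; the matrix and $\hat{\sigma}^2$ are the values quoted from Example 3 of Unit 7) simply rescales $(X'X)^{-1}$ by $\hat{\sigma}^2$:

```python
import numpy as np

# Values quoted above from Example 3 of Unit 7 (matrix rounded to 4 d.p.).
sigma2_hat = 1.82415221
XtX_inv = np.array([[ 2.9230, -0.0474, -0.0185],
                    [-0.0474,  0.0059, -0.0019],
                    [-0.0185, -0.0019,  0.0011]])

cov = sigma2_hat * XtX_inv      # equation (7)
se = np.sqrt(np.diag(cov))      # equation (8)
print(np.round(cov, 4))         # close to the matrix in the solution
print(np.round(se, 4))          # close to 2.3091, 0.1039, 0.0442; small
                                # differences come from the 4 d.p. rounding
                                # of (X'X)^{-1}, as the note above warns
```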

Example 2: For the data of systolic blood pressure considering three regressor variables, age, weight and height, given in Example 4 of Unit 7, obtain the variance-covariance matrix of the regression coefficients. Also compute $SE(\hat{\beta}_0)$, $SE(\hat{\beta}_1)$, $SE(\hat{\beta}_2)$ and $SE(\hat{\beta}_3)$.

Solution: From the solution of Example 4 of Unit 7, we have

$\hat{\sigma}^2 = MSS_{Res} = 1.4497$ and

$$(X'X)^{-1} = \begin{bmatrix} 49.4557 & 0.1774 & -0.0092 & -0.3421 \\ 0.1774 & 0.0070 & -0.0019 & 0.0017 \\ -0.0092 & -0.0019 & 0.0012 & -0.0001 \\ -0.3421 & 0.0017 & -0.0001 & 0.0025 \end{bmatrix}$$

The variance-covariance matrix of the regression coefficients is computed as follows:

$$V(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1} = 1.44971426 \times (X'X)^{-1} = \begin{bmatrix} 71.6966 & 0.2572 & -0.0133 & -0.4959 \\ 0.2572 & 0.0102 & -0.0027 & -0.0024 \\ -0.0133 & -0.0027 & 0.0016 & -0.0001 \\ -0.4959 & -0.0024 & -0.0001 & 0.0037 \end{bmatrix}$$

The standard errors of the regression coefficients are:

$V(\hat{\beta}_0) = 71.6966 \Rightarrow SE(\hat{\beta}_0) = \sqrt{71.6966} = 8.4674$

$V(\hat{\beta}_1) = 0.0102 \Rightarrow SE(\hat{\beta}_1) = \sqrt{0.0102} = 0.1008$

$V(\hat{\beta}_2) = 0.0016 \Rightarrow SE(\hat{\beta}_2) = \sqrt{0.0016} = 0.0394$

$V(\hat{\beta}_3) = 0.0037 \Rightarrow SE(\hat{\beta}_3) = \sqrt{0.0037} = 0.0604$

You should solve the following exercises for practice.

E1) Describe the variance-covariance matrix of $\hat{\beta}$ for a multiple regression model with three regressor variables.

E2) For the exercise given in E6 of Unit 7, determine the variance-covariance matrix of the regression coefficients. Also compute the standard errors of the regression coefficients $\beta_0$, $\beta_1$ and $\beta_2$.

E3) For the exercise given in E7 of Unit 7, obtain the variance-covariance matrix of the regression coefficients. Also compute $SE(\hat{\beta}_i)$ for i = 0, 1, 2 and 3.

8.3 TEST OF SIGNIFICANCE OF MULTIPLE REGRESSION MODEL
We use the analysis of variance (ANOVA) approach to test the significance of the fitted multiple regression model. Recall that in the course "MST-05: Statistical Techniques" you learnt that the analysis of variance approach is based on partitioning the total variability into two parts. According to this approach, we can partition the total variability in the response variable Y, i.e., the deviation of the observed value $y_i$ from the overall mean $\bar{y}$, into two parts:

(i) the deviation of the predicted value $\hat{y}_i$ from the overall mean $\bar{y}$; and

(ii) the deviation of the observed value $y_i$ from the predicted value $\hat{y}_i$.

Mathematically, we define

$$(y_i - \bar{y}) = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \qquad \ldots (9)$$

After squaring and summing both sides of equation (9), we obtain

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + 2 \sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$$

The cross-product term is zero, i.e., $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) = 0$, because for a least squares fit with an intercept the residuals sum to zero and are orthogonal to the fitted values.

Therefore, we have

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad \ldots (10)$$

In other words, we can write equation (10) as:

Total Sum of Squares = Sum of Squares due to Regression + Residual (Error) Sum of Squares

Symbolically, we usually write this result as:

$$SS_T = SS_{Reg} + SS_{Res} \qquad \ldots (11)$$

In the same way, we can partition the corresponding degrees of freedom (d.f.) as:

d.f. for Total Sum of Squares = d.f. for Regression Sum of Squares + d.f. for Residual Sum of Squares

or (n − 1) = k + (n − k − 1)

where k is the number of regressor variables used in the regression model. Note that if there is only one regressor variable in the model, i.e., k = 1, the error d.f. will be (n − 2).

Before computing the various measures of variation, we define the null and alternative hypotheses as follows:

Null Hypothesis: H₀: the fitted regression model is not significant, i.e.,

$$H_0 : \beta_1 = \beta_2 = \ldots = \beta_k = 0$$

Alternative Hypothesis: H₁: the fitted regression model is significant, i.e., at least one of the $\beta_i$ (i = 1, 2, ..., k) is not equal to zero.

For applying ANOVA, we first compute these measures of variation as follows:
• Total Sum of Squares (SS_T)

It has (n − 1) degrees of freedom. We calculate the overall variability in the response variable Y and obtain the total sum of squares as:

$$SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 \qquad \ldots (12)$$

In matrix notation, we define

$$SS_T = Y'Y - n\bar{y}^2 \qquad \ldots (13)$$

where Y is an (n × 1) vector of response values.

• Regression Sum of Squares (SS_Reg)

It has k degrees of freedom since there are k regressor variables in the model. Under the assumption of linearity in regression analysis, we obtain the regression sum of squares using the predicted values $\hat{y}_i$ as:

$$SS_{Reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \sum_{i=1}^{n} \hat{y}_i^2 - n\bar{y}^2 \qquad \ldots (14)$$

We can compute $SS_{Reg}$ with the help of matrix notation as:

$$SS_{Reg} = \hat{Y}'\hat{Y} - n\bar{y}^2 \qquad \ldots (15)$$

or

$$SS_{Reg} = \hat{\beta}'X'Y - n\bar{y}^2 \qquad \ldots (16)$$

We can also rewrite it as:

$$SS_{Reg} = \sum_{i=0}^{k} \hat{\beta}_i \sum_{j=1}^{n} y_j x_{ij} - n\bar{y}^2 \qquad \ldots (17)$$

where $x_{0j} = 1$; j = 1, 2, ..., n. (In the case of simple linear regression, this reduces to $SS_{Reg} = \hat{\beta}_0 \sum y_i + \hat{\beta}_1 \sum y_i x_i - n\bar{y}^2$.)
• Residual (Error) Sum of Squares (SS_Res)

Since we have estimated (k + 1) parameters in the model, it has (n − k − 1) degrees of freedom. We obtain the residual sum of squares as follows:

$$SS_{Res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $(y_i - \hat{y}_i) = r_i$ is the ith residual.

Note: However good the model we consider, some points may not lie exactly on the fitted line or plane of the linear regression model. The deviations (or errors) of these points from the fitted model are termed residuals.

Therefore, from equation (10), we can write $SS_{Res}$ as:

$$SS_{Res} = SS_T - SS_{Reg} \qquad \ldots (18)$$

From equations (12) and (14), we write $SS_{Res}$ as:

$$SS_{Res} = \sum_{i=1}^{n} y_i^2 - \sum_{i=1}^{n} \hat{y}_i^2 \qquad \ldots (19)$$

Similarly, using equations (13) and (16), we write $SS_{Res}$ in matrix notation as:
$$SS_{Res} = Y'Y - \hat{\beta}'X'Y \qquad \ldots (20)$$

Next, we define the mean sum of squares as:

$$\text{Mean Sum of Squares} = \frac{\text{Sum of Squares}}{\text{Respective Degrees of Freedom}}$$

Therefore, we have

• Regression Mean Sum of Squares (MSS_Reg)

$$MSS_{Reg} = \frac{SS_{Reg}}{k} \qquad \ldots (21)$$

• Residual (Error) Mean Sum of Squares (MSS_Res)

$$MSS_{Res} = \frac{SS_{Res}}{n - k - 1} \qquad \ldots (22)$$

It is clear from equation (35) of Unit 7 and equation (22) of this unit that $MSS_{Res}$ is equal to the estimated value of $\sigma^2$, i.e., $\hat{\sigma}^2$. Therefore, we may also write

$$\hat{\sigma}^2 = MSS_{Res} \qquad \ldots (23)$$

For testing the significance of the regression model, we define the variance ratio (F) as:

$$F_{cal} = \frac{\text{Mean Sum of Squares due to Regression}}{\text{Mean Sum of Squares due to Error}} = \frac{MSS_{Reg}}{MSS_{Res}} \qquad \ldots (24)$$

which follows the F-distribution with (k, n − k − 1) degrees of freedom. We then determine the tabulated value $F_{(k, n-k-1), \alpha/2}$ with (k, n − k − 1) degrees of freedom at the α% level of significance. These values are tabulated for various degrees of freedom in Table II given at the end of this block.
All the calculations explained above can be summarised in the form of an ANOVA table as shown in Table 1.

Table 1: ANOVA Table

Source of     Degrees of      Sum of      Mean Sum of                    Variance
Variation     Freedom (df)    Squares     Squares (MSS)                  Ratio
Regression    k               SS_Reg      MSS_Reg = SS_Reg / k           F_cal = MSS_Reg / MSS_Res
Residual      n − k − 1       SS_Res      MSS_Res = SS_Res / (n − k − 1)
Total         n − 1           SS_T

If $F_{cal} > F_{(k, n-k-1), \alpha/2}$, we may reject H₀ at the α% level of significance and conclude that the regression sum of squares is significant, i.e., there is some dependence of the response variable Y on the regressor variables in the model. This implies that the regressor variables are explaining the variability in the response variable Y. If $F_{cal} \leq F_{(k, n-k-1), \alpha/2}$, we do not reject H₀, which implies that we do not have enough evidence against H₀.
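The whole ANOVA recipe, from the sums of squares through the decision rule, fits in a short function. Below is a sketch (assuming Python with numpy and scipy; the names are illustrative) that follows equations (12), (18) and (24) and uses the book's $F_{(k, n-k-1), \alpha/2}$ critical value:

```python
import numpy as np
from scipy import stats

def anova_f_test(X, y, alpha=0.05):
    """Overall significance test for the fitted regression model.

    X : (n, k+1) design matrix including the intercept column of ones.
    Returns (F_cal, F_tab, reject_H0).
    """
    n, p = X.shape
    k = p - 1                                      # number of regressors
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ beta_hat
    sst = np.sum((y - y.mean()) ** 2)              # equation (12)
    ss_res = np.sum((y - y_hat) ** 2)              # residual sum of squares
    ss_reg = sst - ss_res                          # equation (18), rearranged
    f_cal = (ss_reg / k) / (ss_res / (n - k - 1))  # equation (24)
    f_tab = stats.f.ppf(1 - alpha / 2, k, n - k - 1)  # F_{(k,n-k-1), alpha/2}
    return f_cal, f_tab, f_cal > f_tab
```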
We now solve a couple of examples so that you may comprehend the ANOVA method. We will test the significance of both simple and multiple linear regression models in these examples.
Example 3: For the systolic blood pressure data given in Example 1 of Unit 5, test the significance of the fitted simple linear regression model using the ANOVA method at the 5% level of significance.

Solution: We define the null and alternative hypotheses as follows:

Null Hypothesis: $H_0 : \beta_1 = 0$, i.e., the fitted simple linear regression model is not significant.

Alternative Hypothesis: $H_1 : \beta_1 \neq 0$, i.e., the fitted simple linear regression model is significant.

From the solution of Example 2 of Unit 5, we have

$n = 15$, $\sum_{i=1}^{15} y_i = 1881$, $\sum_{i=1}^{15} x_i = 474$, $\bar{y} = 125.4$,

$\sum_{i=1}^{15} x_i^2 = 15372$, $\sum_{i=1}^{15} y_i x_i = 59880$, $\sum_{i=1}^{15} y_i^2 = 236403$

We have also computed $\hat{\beta}_0 = 90.0427$ and $\hat{\beta}_1 = 1.1189$.

The total sum of squares using equation (12) is obtained as:

$$SS_T = \sum_{i=1}^{15} y_i^2 - n\bar{y}^2 = 236403 - 15(125.4)^2 = 236403 - 235877.4 = 525.6$$

We now calculate the sum of squares due to regression using equation (17) as:

$$SS_{Reg} = \left( \hat{\beta}_0 \sum_{i=1}^{n} y_i + \hat{\beta}_1 \sum_{i=1}^{n} y_i x_i \right) - n\bar{y}^2$$
$$= (90.0427)(1881) + (1.1189)(59880) - 15(125.4)^2$$
$$= 169370.2866 + 66999.8781 - 235877.4 = 492.7646$$

The sum of squares due to error is calculated using equation (18) as:

$$SS_{Res} = SS_T - SS_{Reg} = 525.6 - 492.7646 = 32.8354$$

The mean sum of squares due to regression is

$$MSS_{Reg} = \frac{SS_{Reg}}{k} = \frac{492.7646}{1} = 492.7646$$

The mean sum of squares due to error is

$$MSS_{Res} = \frac{SS_{Res}}{n - k - 1} = \frac{32.8354}{13} = 2.5258$$

We compute the value of $F_{cal}$ as:

$$F_{cal} = \frac{MSS_{Reg}}{MSS_{Res}} = \frac{492.7646}{2.5258} = 195.0927$$

We summarise the above calculations in the following ANOVA table.

Table 2: ANOVA Table

Source of     Degrees of          Sum of       Mean Sum of      Variance
Variation     Freedom (df)        Squares      Squares (MSS)    Ratio (F_cal)
Regression    1                   492.7646     492.7646         195.0927
Error         15 − 1 − 1 = 13     32.8354      2.5258
Total         15 − 1 = 14         525.6

From Table II given at the end of this block, we have $F_{(1,13), 0.025} = 6.41$.

Since $F_{cal} > F_{(1,13), 0.025} = 6.41$, we may reject our null hypothesis at the 5% level of significance. We conclude that there may be some dependence of SBP on age; hence, the fitted simple linear regression model is significant. If you compare these results with those of Example 6 of Unit 6, you will observe that $t^2 = (13.9676)^2 = 195.0927$, which is equal to $F_{cal}$; this reflects the property of the F-distribution that $t^2_{(\nu)} = F_{(1, \nu)}$. So you can use either the t-test or the ANOVA method for testing the significance of a simple linear regression model.
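As a numerical cross-check, the sketch below (assuming Python; all constants are the worked values quoted above) reproduces the ANOVA quantities of Example 3 from the raw sums. Small discrepancies from the worked solution arise because the coefficients here are rounded to 4 decimal places:

```python
# Summary statistics and estimates quoted in Example 3 (k = 1 regressor).
n, ybar = 15, 125.4
sum_y2, sum_y, sum_xy = 236403, 1881, 59880
b0, b1 = 90.0427, 1.1189

sst = sum_y2 - n * ybar**2                       # equation (12): 525.6
ss_reg = b0 * sum_y + b1 * sum_xy - n * ybar**2  # equation (17): close to 492.7646
ss_res = sst - ss_reg                            # equation (18): close to 32.8354
f_cal = (ss_reg / 1) / (ss_res / (n - 1 - 1))    # equation (24): close to 195.09
print(round(sst, 1), round(ss_reg, 2), round(f_cal, 1))
```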
Example 4: Using the data given in Example 1 of Unit 7, test the significance of the multiple linear regression model at the 5% level of significance.

Solution: In this case,

Null Hypothesis: $H_0 : \beta_1 = \beta_2 = 0$, and Alternative Hypothesis: $H_1$: at least one of $\beta_1$ and $\beta_2$ is not equal to zero.

From Example 2 of Unit 7, we have

$n = 15$, $\sum y_i = 1881$, $\bar{y} = 125.4$, $\sum x_{1i} = 474$, $\sum x_{2i} = 1102$,

$\sum y_i^2 = 236403$, $\sum x_{1i}^2 = 15372$, $\sum x_{2i}^2 = 83140$,

$\sum x_{1i} x_{2i} = 35523$, $\sum y_i x_{1i} = 59880$ and $\sum y_i x_{2i} = 139075$

We also have $\hat{\beta}_0 = 88.1732$, $\hat{\beta}_1 = 0.9266$ and $\hat{\beta}_2 = 0.1082$.

We now compute the total sum of squares using equation (12) as follows:

$$SS_T = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 = 236403 - 15(125.4)^2 = 525.6$$

Next, we compute

$$\sum_{i=0}^{2} \hat{\beta}_i \sum_{j=1}^{n} y_j x_{ij} = \hat{\beta}_0 \sum y_i + \hat{\beta}_1 \sum y_i x_{1i} + \hat{\beta}_2 \sum y_i x_{2i}$$
$$= 88.1732 \times 1881 + 0.9266 \times 59880 + 0.1082 \times 139075 = 236381.1102$$

Using equation (17), we calculate the sum of squares due to regression as:

$$SS_{Reg} = \sum_{i=0}^{2} \hat{\beta}_i \sum_{j=1}^{n} y_j x_{ij} - n\bar{y}^2 = 236381.1102 - 235877.4 = 503.7102$$

The sum of squares due to error is computed using equation (18) as:

$$SS_{Res} = SS_T - SS_{Reg} = 525.6 - 503.7102 = 21.8898$$

We present all these values in the following ANOVA table.

Table 3: ANOVA Table

Source of     df                SS          MSS                        F_cal                         F_tab
Variation
Regression    3 − 1 = 2         503.7102    503.7102/2 = 251.8551      251.8551/1.8242 = 138.0669    F_(2,12), 0.025 = 5.10
Error         15 − 2 − 1 = 12   21.8898     21.8898/12 = 1.8242
Total         15 − 1 = 14       525.6

Since $F_{cal} = 138.0669 > 5.10$, we may reject H₀ at the 5% level of significance. Hence, the fitted regression model can be considered significant, i.e., both regressor variables contribute significantly to the model, and we may conclude that the relationship of Y with X₁ and X₂ is linearly significant.

Note: You can also test the significance of the fitted regression model for Example 4 using the matrix approach. Give it a try and match the results obtained by both methods.
Example 5: Using the data given in Example 4 of Unit 7, test the significance of the multiple linear regression model at the 5% level of significance using the matrix approach.

Solution: We have Null Hypothesis: $H_0 : \beta_1 = \beta_2 = \beta_3 = 0$ and Alternative Hypothesis: $H_1$: at least one of $\beta_1$, $\beta_2$ and $\beta_3$ is not equal to zero.

From the solution of Example 4 of Unit 7, we have

$n = 15$, $\bar{y} = 125.4$, $Y'Y = 236403$,

$$X'Y = \begin{bmatrix} 1881 \\ 59880 \\ 139075 \\ 299065 \end{bmatrix} \quad \text{and} \quad \hat{\beta} = \begin{bmatrix} 71.5436 \\ 0.8462 \\ 0.1048 \\ 0.1223 \end{bmatrix}$$

We now compute the total sum of squares using equation (13) as:

$$SS_T = Y'Y - n\bar{y}^2 = 236403 - 15(125.4)^2 = 525.6$$

Then we obtain

$$\hat{\beta}'X'Y = \begin{bmatrix} 71.5436 & 0.8462 & 0.1048 & 0.1223 \end{bmatrix} \begin{bmatrix} 1881 \\ 59880 \\ 139075 \\ 299065 \end{bmatrix} = 236387.0531$$

Using equation (16), we calculate the sum of squares due to regression as:

$$SS_{Reg} = \hat{\beta}'X'Y - n\bar{y}^2 = 236387.0531 - 235877.4 = 509.6531$$

The sum of squares due to error can be computed using equation (18) as:

$$SS_{Res} = SS_T - SS_{Reg} = 525.6 - 509.6531 = 15.9469$$

We present these calculations in the following ANOVA table.

Table 4: ANOVA Table

Source of     df                SS          MSS                       F_cal                        F_tab
Variation
Regression    4 − 1 = 3         509.6531    509.6531/3 = 169.8844     169.8844/1.4497 = 117.1847   F_(3,11), 0.025 = 4.63
Error         15 − 3 − 1 = 11   15.9469     15.9469/11 = 1.4497
Total         15 − 1 = 14       525.6

Since $F_{cal} = 117.1847 > 4.63$, we reject H₀ at the 5% level of significance. Hence, the fitted regression model can be considered significant, i.e., the regressor variables jointly contribute significantly to the model. We may conclude that the relationship of Y with X₁, X₂ and X₃ is significant.
You may now like to solve the following exercises to assess your
understanding.
E4) Test the significance of the fitted multiple regression model given in E5
of Unit 7 at 1% level of significance.
E5) For the exercise given in E6 of Unit 7, test the significance of the fitted
multiple regression model at 5% level of significance.

8.4 TEST OF SIGNIFICANCE OF INDIVIDUAL REGRESSION COEFFICIENTS
In Sec. 8.3, we tested the significance of the regression model using the ANOVA technique. If the null hypothesis for the significance of the regression model is rejected, we may be interested in determining which regressor variables contribute significantly to the regression model and which do not. In this section, we will evaluate the strength of the relationship between the response variable Y and a regressor variable by testing the significance of each individual regression coefficient $\beta_i$ (i = 0, 1, 2, ..., k). For this, we will apply Student's t-test. In Sec. 6.4 of Unit 6, you learnt how to test the significance of the individual regression coefficients for the simple linear regression model, so in this section we describe, in brief, the procedure for testing an individual regression coefficient in the multiple linear regression model.

We formulate the null and alternative hypotheses for the ith regression coefficient as:

$$H_0 : \beta_i = 0 \quad \text{vs} \quad H_1 : \beta_i \neq 0; \quad i = 0, 1, 2, \ldots, k$$

We define the t-test statistic as:

$$t_i = \frac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)} = \frac{\hat{\beta}_i}{\sqrt{\hat{\sigma}^2 S_{ii}}} \qquad \ldots (25)$$

since $\beta_i = 0$ under the null hypothesis $H_0 : \beta_i = 0$ when testing the significance of the regression coefficient.
The statistic $t_i$ follows the t-distribution with (n − k − 1) degrees of freedom. We then determine the tabulated t-value, i.e., $t_{(n-k-1), \alpha/2}$, with (n − k − 1) degrees of freedom at the α% level of significance. This value is tabulated for various degrees of freedom in Table I given at the end of this block, as explained in Unit 4 of MST-004.

If $|t_i|$ is greater than or equal to the tabulated value, i.e., $|t_i| \geq t_{(n-k-1), \alpha/2}$, for the given degrees of freedom and level of significance, we may reject the null hypothesis and conclude that the ith regression coefficient ($\beta_i$) is significant. We may also conclude that the regressor variable $X_i$ (i = 1, 2, ..., k) is contributing significantly to the model.
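The same computation, done programmatically, is a one-liner per coefficient. Here is a sketch (assuming Python with numpy and scipy; cov is the estimated variance-covariance matrix from Sec. 8.2) implementing equation (25) and the two-sided decision rule:

```python
import numpy as np
from scipy import stats

def t_tests(beta_hat, cov, n, alpha=0.05):
    """t-statistic for H0: beta_i = 0 for each coefficient, equation (25)."""
    p = len(beta_hat)                          # p = k + 1
    se = np.sqrt(np.diag(cov))                 # standard errors, Sec. 8.2
    t = beta_hat / se
    t_tab = stats.t.ppf(1 - alpha / 2, n - p)  # t_{(n-k-1), alpha/2}
    return t, t_tab, np.abs(t) > t_tab         # True => reject H0
```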
In the following examples, we explain the procedure of testing the significance
of individual regression coefficients.
Example 6: For the data of systolic blood pressure considering age and weight
as regressor variables given in Example 3 of Unit 7, test whether the regression
coefficients (i) β0, (ii) β1 and (iii) β2 are significant or not at 5% level of
significance.
Solution: In order to test the significance of the regression coefficients, we use the following values computed in Example 3 of Unit 7:

$\hat{\beta}_0 = 88.1732$, $\hat{\beta}_1 = 0.9266$ and $\hat{\beta}_2 = 0.1082$

From the solution of Example 1, we have

$SE(\hat{\beta}_0) = 2.3091$, $SE(\hat{\beta}_1) = 0.1039$ and $SE(\hat{\beta}_2) = 0.0442$

(i) To test the null hypothesis $H_0 : \beta_0 = 0$ against $H_1 : \beta_0 \neq 0$, we calculate the t-statistic using equation (25) as follows:

$$t_0 = \frac{\hat{\beta}_0}{SE(\hat{\beta}_0)} = \frac{88.1732}{2.3091} = 38.1851$$

From Table I given at the end of this block, we have

$$t_{tab} = t_{(n-k-1), \alpha/2} = t_{12, 0.025} = 2.179$$

Since $|t_0| > 2.179$, we may reject H₀ at the 5% level of significance. Hence, we may conclude that the intercept ($\beta_0$) may be playing an important role in the fitted multiple regression model.

(ii) To test the null hypothesis $H_0 : \beta_1 = 0$ against $H_1 : \beta_1 \neq 0$, we use the t-statistic from equation (25):

$$t_1 = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{0.9266}{0.1039} = 8.9168$$

Since $|t_1| > 2.179$, we may reject H₀ at the 5% level of significance. Hence, we can conclude that age (X₁) is playing an important role in the fitted multiple regression model.

(iii) Similarly, we can test the null hypothesis $H_0 : \beta_2 = 0$ against $H_1 : \beta_2 \neq 0$ using the t-statistic [equation (25)]:

$$t_2 = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = \frac{0.1082}{0.0442} = 2.4496$$

Since $|t_2| > 2.179$, we may reject the null hypothesis $H_0 : \beta_2 = 0$ at the 5% level of significance and conclude that weight (X₂) is also an important variable in the model.

So we may infer that both regressor variables are significant and important in constructing the fitted multiple regression model at the 5% level of significance.
Example 7: For the data of systolic blood pressure considering three regressor variables, age, weight and height, given in Example 4 of Unit 7, test whether the regression coefficients (i) β₀, (ii) β₁, (iii) β₂ and (iv) β₃ are significant at the 5% level of significance.

Solution: In order to test the significance of the regression coefficients, we use the following values computed in Example 4 of Unit 7:

$\hat{\beta}_0 = 71.5436$, $\hat{\beta}_1 = 0.8462$, $\hat{\beta}_2 = 0.1048$ and $\hat{\beta}_3 = 0.12225275$

From the solution of Example 2, we have

$SE(\hat{\beta}_0) = 8.4674$, $SE(\hat{\beta}_1) = 0.1008$, $SE(\hat{\beta}_2) = 0.0394$ and $SE(\hat{\beta}_3) = 0.0604$

(i) To test the null hypothesis $H_0 : \beta_0 = 0$ against $H_1 : \beta_0 \neq 0$, we compute the t-statistic from equation (25):

$$t_0 = \frac{\hat{\beta}_0}{SE(\hat{\beta}_0)} = \frac{71.5436}{8.4674} = 8.4493$$

From Table I given at the end of this block, we have

$$t_{tab} = t_{(n-k-1), \alpha/2} = t_{11, 0.025} = 2.201$$

Since $|t_0| > 2.201$, we may reject H₀ at the 5% level of significance. Hence, we can conclude that the intercept ($\beta_0$) may be considered significant in the fitted multiple regression model.

(ii) To test the null hypothesis $H_0 : \beta_1 = 0$ against $H_1 : \beta_1 \neq 0$, we determine the t-statistic using equation (25):

$$t_1 = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{0.8462}{0.1008} = 8.3965$$

Since $|t_1| > 2.201$, we may reject H₀ at the 5% level of significance. Hence, we may conclude that age (X₁) is playing an important role in the fitted multiple regression model.

(iii) We can test the null hypothesis $H_0 : \beta_2 = 0$ against $H_1 : \beta_2 \neq 0$ using the t-statistic [equation (25)]:

$$t_2 = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = \frac{0.1048}{0.0394} = 2.6608$$

Since $|t_2| > 2.201$, we may reject the null hypothesis $H_0 : \beta_2 = 0$ at the 5% level of significance and conclude that weight (X₂) is also an important variable in the fitted multiple regression model.

(iv) Similarly, we can test the null hypothesis $H_0 : \beta_3 = 0$ against $H_1 : \beta_3 \neq 0$ using the t-statistic:

$$t_3 = \frac{\hat{\beta}_3}{SE(\hat{\beta}_3)} = \frac{0.1223}{0.0604} = 2.0247$$

Since $|t_3| < 2.201$, we may not reject the null hypothesis $H_0 : \beta_3 = 0$ at the 5% level of significance, and we conclude that height (X₃) is an insignificant variable in the model.

Thus, we may infer that only two regressor variables, age and weight, contribute significantly to the construction of the given multiple regression model.
You may now like to solve the following exercises to check your
understanding.

E6) Explain the rejection/acceptance criteria for testing the significance of individual regression coefficients.

E7) For the exercise given in E5 of Unit 7, test whether the individual regression coefficients $\beta_0$, $\beta_1$ and $\beta_2$ are significant at the 1% level of significance.

E8) For the exercise given in E6 of Unit 7, test the null hypothesis H₀: βᵢ = 0 against the alternative hypothesis H₁: βᵢ ≠ 0 for i = 0, 1, 2 and 3 for testing the significance of the individual regression coefficients at the 5% level of significance.

222
8.5 CONFIDENCE INTERVAL OF REGRESSION COEFFICIENTS
In Unit 6, we explained how to determine the confidence interval for simple linear regression with the desired confidence level. In repeated sampling, we expect the (1 − α)100% confidence interval to include the true value of the regression coefficient (1 − α)100% of the time. In the same way, we define the (1 − α)100% lower and upper confidence limits for the jth (j = 0, 1, 2, ..., k) regression coefficient in the multiple linear regression model, when σ is unknown, as:

$$(\hat{\beta}_j)_L = \hat{\beta}_j - t_{(n-k-1), \alpha/2} \, SE(\hat{\beta}_j) \qquad \ldots (26)$$

$$(\hat{\beta}_j)_U = \hat{\beta}_j + t_{(n-k-1), \alpha/2} \, SE(\hat{\beta}_j) \qquad \ldots (27)$$

where $t_{(n-k-1), \alpha/2}$ represents the tabulated value of the t-variate with (n − k − 1) degrees of freedom at the α% level of significance, and $SE(\hat{\beta}_j)$ for j = 0, 1, 2, ..., k can be obtained using the matrix approach explained in Sec. 8.2.

$\left( (\hat{\beta}_j)_L, (\hat{\beta}_j)_U \right)$ is called the (1 − α)100% confidence interval for the jth regression coefficient.
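Equations (26) and (27) can be evaluated for all coefficients at once. The sketch below (assuming Python with numpy and scipy; the inputs are as in the earlier sketches) returns the two confidence limits:

```python
import numpy as np
from scipy import stats

def confidence_intervals(beta_hat, cov, n, alpha=0.05):
    """(1 - alpha)100% confidence limits, equations (26) and (27)."""
    p = len(beta_hat)                          # p = k + 1
    se = np.sqrt(np.diag(cov))
    t_tab = stats.t.ppf(1 - alpha / 2, n - p)  # t_{(n-k-1), alpha/2}
    return beta_hat - t_tab * se, beta_hat + t_tab * se
```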
You may like to study the following examples.

Example 8: Obtain the 95% confidence intervals for (i) β₀, (ii) β₁ and (iii) β₂ for Example 4 given in Sec. 8.3.

Solution: From the solution of Example 4, we have

$\hat{\beta}_0 = 88.1732$, $\hat{\beta}_1 = 0.9266$ and $\hat{\beta}_2 = 0.1082$

$SE(\hat{\beta}_0) = 2.3091$, $SE(\hat{\beta}_1) = 0.1039$ and $SE(\hat{\beta}_2) = 0.0442$

From Table I given at the end of this block, we have

$$t_{(n-k-1), \alpha/2} = t_{12, 0.025} = 2.179$$

From equations (26) and (27), the lower and upper confidence limits of $\hat{\beta}_j$; j = 0, 1, 2, ..., k can be determined as:

$$\hat{\beta}_j \pm t_{(n-k-1), \alpha/2} \, SE(\hat{\beta}_j); \quad j = 0, 1, 2, \ldots, k$$

Note: We can also test the significance of an individual regression coefficient with the help of its confidence interval. If the (1 − α)100% confidence interval contains the value of the respective regression coefficient under the null hypothesis, we do not reject the null hypothesis; otherwise, we may reject it at the α% level of significance.

(i) For j = 0, we obtain the lower and upper confidence limits of β₀ as:

$$\hat{\beta}_{0L} = 88.1732 - 2.179 \times 2.3091 = 88.1732 - 5.0315 = 83.1417$$

$$\hat{\beta}_{0U} = 88.1732 + 2.179 \times 2.3091 = 88.1732 + 5.0315 = 93.2047$$

(ii) For j = 1, the lower and upper confidence limits of $\hat{\beta}_1$ can be determined as:

$$\hat{\beta}_{1L} = 0.9266 - 2.179 \times 0.1039 = 0.9266 - 0.2264 = 0.7002$$

$$\hat{\beta}_{1U} = 0.9266 + 2.179 \times 0.1039 = 0.9266 + 0.2264 = 1.1530$$

(iii) For j = 2, the lower and upper confidence limits of $\hat{\beta}_2$ can be determined as:

$$\hat{\beta}_{2L} = 0.1082 - 2.179 \times 0.0442 = 0.1082 - 0.0962 = 0.0119$$

$$\hat{\beta}_{2U} = 0.1082 + 2.179 \times 0.0442 = 0.1082 + 0.0962 = 0.2044$$

Thus, the 95% confidence intervals for β₀, β₁ and β₂ are (83.1417, 93.2047), (0.7002, 1.1530) and (0.0119, 0.2044), respectively. You have learnt in MST-004 that if we draw repeated samples from the same population and calculate a 95% confidence interval from each sample, we expect about 95% of these intervals to contain the true regression coefficients. Equivalently, if we select 100 different samples from the same population and compute a 95% confidence interval for each, about 95 of the intervals would contain the true value of the regression coefficient.
Example 9: For Example 5 given in Sec. 8.3, obtain the 95% confidence intervals for (i) β₀, (ii) β₁, (iii) β₂ and (iv) β₃.

Solution: From the solution of Example 5, we have

$\hat{\beta}_0 = 71.5436$, $\hat{\beta}_1 = 0.8462$, $\hat{\beta}_2 = 0.1048$ and $\hat{\beta}_3 = 0.12225275$

$SE(\hat{\beta}_0) = 8.4674$, $SE(\hat{\beta}_1) = 0.1008$, $SE(\hat{\beta}_2) = 0.0394$ and $SE(\hat{\beta}_3) = 0.0604$

$$t_{(n-k-1), \alpha/2} = t_{11, 0.025} = 2.201$$

From equations (26) and (27), the lower and upper confidence limits of $\hat{\beta}_j$; j = 0, 1, 2, ..., k can be determined as:

$$\hat{\beta}_j \pm t_{(n-k-1), \alpha/2} \, SE(\hat{\beta}_j); \quad j = 0, 1, 2, \ldots, k$$

(i) For j = 0, we obtain the lower and upper confidence limits of β₀ as:

$$\hat{\beta}_{0L} = 71.5436 - 2.201 \times 8.4674 = 71.5436 - 18.6367 = 52.9069$$

$$\hat{\beta}_{0U} = 71.5436 + 2.201 \times 8.4674 = 71.5436 + 18.6367 = 90.1803$$

(ii) For j = 1, the lower and upper confidence limits of $\hat{\beta}_1$ can be computed as:

$$\hat{\beta}_{1L} = 0.8462 - 2.201 \times 0.1008 = 0.8462 - 0.2218 = 0.6244$$

$$\hat{\beta}_{1U} = 0.8462 + 2.201 \times 0.1008 = 0.8462 + 0.2218 = 1.0681$$

(iii) For j = 2, the lower and upper confidence limits of $\hat{\beta}_2$ can be determined as:

$$\hat{\beta}_{2L} = 0.1048 - 2.201 \times 0.0394 = 0.1048 - 0.0867 = 0.0181$$

$$\hat{\beta}_{2U} = 0.1048 + 2.201 \times 0.0394 = 0.1048 + 0.0867 = 0.1916$$

(iv) For j = 3, the lower and upper confidence limits of $\hat{\beta}_3$ can be obtained as:

$$\hat{\beta}_{3L} = 0.1223 - 2.201 \times 0.0604 = 0.1223 - 0.1329 = -0.0107$$

$$\hat{\beta}_{3U} = 0.1223 + 2.201 \times 0.0604 = 0.1223 + 0.1329 = 0.2552$$

Hence, the 95% confidence intervals for β₀, β₁, β₂ and β₃ are (52.9069, 90.1803), (0.6244, 1.0681), (0.0181, 0.1916) and (−0.0107, 0.2552), respectively.
You may like to solve the following exercises for practice, before studying the
next section.

E9) How can we test the significance of an individual regression coefficient with the help of a confidence interval?

E10) For the exercise given in E7, determine the 99% confidence intervals of $\beta_0$, $\beta_1$ and $\beta_2$.
E11) For the exercise given in E8, compute the 95% confidence intervals of
the regression coefficients.
8.6 R² AND ADJUSTED R²

So far, we have discussed testing the significance of the fitted regression model as well as of the individual regression coefficients using hypothesis testing and confidence intervals. In this section, you will learn another method for checking the adequacy of the fitted regression model. As explained in Sec. 6.7 of Unit 6, we use the coefficient of determination to check or measure the adequacy of the fitted regression model. It describes the strength of the linear relationship between the response variable and the regressor variables. From Sec. 6.7, you know that the coefficient of determination is denoted by $R^2$; its value indicates the proportion of the total variation in the observed y values explained by the fitted regression model.
225
$R^2$ measures the proportion of the total variability about the mean explained by the regression model, i.e., the variation accounted for by the regressor variables. It determines the proportion of variation in Y about its mean which is explained by the fitted regression model. Mathematically, we define $R^2$ as:

$$R^2 = \frac{\text{Variation explained by regression model}}{\text{Total variation in } Y} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad \ldots (28)$$

Using equation (10), we can rewrite $R^2$ as:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad \ldots (29)$$

We can also compute the value of $R^2$ using the values calculated in the ANOVA table discussed in Sec. 8.3:

$$R^2 = \frac{\text{Regression sum of squares}}{\text{Total sum of squares}} = \frac{SS_{Reg}}{SS_T} \qquad \ldots (30)$$

or

$$R^2 = 1 - \frac{SS_{Res}}{SS_T} \qquad \ldots (31)$$

We can rewrite $R^2$ in matrix notation using equations (13) and (16) as:

$$R^2 = \frac{\hat{\beta}'X'Y - n\bar{y}^2}{Y'Y - n\bar{y}^2} \qquad \ldots (32)$$
The value of $R^2$ lies between zero and one, i.e., $0 \leq R^2 \leq 1$. We can use $R^2$ to determine the best fitted model among different models with an equal (or fixed) number of regressor variables: we prefer the model with the largest value of $R^2$. Note, however, that the value of $R^2$ increases as we increase the number of regressor variables (k) in the model. In a multiple regression model, adding a regressor variable can increase the value of $R^2$ even when that variable contributes little to the model. For comparing two values of $R^2$, it is therefore necessary to take into account the number of regressor variables on which each value is based. This is taken care of when calculating the adjusted $R^2$.

So, on the basis of $R^2$ alone, it is difficult to judge the contribution of an additional regressor variable. In the adjusted $R^2$, we consider mean sums of squares instead of sums of squares, which adjusts for the number of regressor variables in the model. Consequently, the adjusted $R^2$ increases only if the added regressor actually reduces the residual mean square, i.e., only if it contributes significantly to the model.

We use the adjusted $R^2$ for comparing two regression models having different numbers of regressor variables, or when the number of regressor variables is the same in both models but they are fitted to different sample sizes. We usually denote the adjusted $R^2$ by $R^2_{Adj}$ or $\bar{R}^2$.

We compute the value of the adjusted $R^2$ as:

$$R^2_{Adj} = 1 - \frac{SS_{Res}/(n-k-1)}{SS_T/(n-1)} \qquad \ldots (33)$$

We can also determine the value of the adjusted $R^2$ using the following formula:

$$R^2_{Adj} = 1 - \frac{(n-1)}{(n-k-1)} (1 - R^2) \qquad \ldots (34)$$

where k is the number of regressor variables in the regression model and n is the sample size. Generally, the value of $R^2_{Adj}$ is smaller than that of $R^2$.
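Both measures follow directly from the ANOVA quantities. Here is a minimal Python sketch implementing equations (31) and (33); the trailing check uses the figures of Example 4, worked out in Example 10 below:

```python
def r_squared(sst, ss_res, n, k):
    """Coefficient of determination and adjusted R^2.

    Implements equations (31) and (33); n is the sample size,
    k the number of regressor variables.
    """
    r2 = 1 - ss_res / sst
    r2_adj = 1 - (ss_res / (n - k - 1)) / (sst / (n - 1))
    return r2, r2_adj

# With the figures of Example 4 (see Example 10 below):
# r_squared(525.6, 21.8898, 15, 2) -> approximately (0.9584, 0.9514)
```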

We now consider examples to illustrate the computation of $R^2$ and $R^2_{Adj}$.

Example 10: For Example 4 given in Sec. 8.3, determine the values of $R^2$ and $R^2_{Adj}$.

Solution: From the solution of Example 4, we have

n = 15, k = 2, $SS_T = 525.6$ and $SS_{Res} = 21.8898$

From equation (31), we calculate the coefficient of determination as:

$$R^2 = 1 - \frac{SS_{Res}}{SS_T} = 1 - \frac{21.8898}{525.6} = 0.9584$$

Note that the fitted multiple regression model explains approximately 95.84% of the variation in Y due to variations in X₁ and X₂, so it can be considered a good model. We compute the value of the adjusted coefficient of determination using equation (33) as follows:

$$R^2_{Adj} = 1 - \frac{SS_{Res}/(n-k-1)}{SS_T/(n-1)} = 1 - \frac{21.8898/12}{525.6/14} = 1 - \frac{1.8242}{37.5429} = 1 - 0.0486 = 0.9514$$

Hence, the model explains approximately 95.14% of the variation in the response variable due to age and weight. Note that $R^2_{Adj}$ is more reliable than $R^2$.

Example 11: For Example 5 given in Sec. 8.3, determine the values of $R^2$ and $R^2_{Adj}$.

Solution: From the solution of Example 5, we have

n = 15, k = 3, $SS_T = 525.6$ and $SS_{Res} = 15.9469$

From equation (31), we calculate the coefficient of determination as:

$$R^2 = 1 - \frac{SS_{Res}}{SS_T} = 1 - \frac{15.9469}{525.6} = 0.9697$$

The fitted multiple regression model explains approximately 96.97% of the variation in Y due to variations in X₁, X₂ and X₃, so it can be considered a good model. We compute the adjusted coefficient of determination using equation (33) as:

$$R^2_{Adj} = 1 - \frac{SS_{Res}/(n-k-1)}{SS_T/(n-1)} = 1 - \frac{1.4497}{37.5429} = 1 - 0.0386 = 0.9614$$

Hence, the model explains approximately 96.14% of the variation in the response variable due to age, weight and height. Note that $R^2_{Adj}$ is more reliable than $R^2$.

You can try the following exercises in order to check your understanding.
E12) Differentiate between coefficient of determination and adjusted
coefficient of determination.
E13) For the exercise given in E4, calculate the coefficient of determination
and adjusted coefficient of determination.
E14) Obtain $R^2$ and $R^2_{Adj}$ for the exercise given in E5.

With this, we end the unit and summarise our discussion.

8.7 SUMMARY

1. The variance of the ith estimated regression coefficient $\hat{\beta}_i$ is given by

$$V(\hat{\beta}_i) = \sigma^2 S_{ii}; \quad i = 0, 1, 2, \ldots, k$$

where $S = (S_{ij})_{(k+1) \times (k+1)} = (X'X)^{-1}$.

2. The standard error of the ith estimated regression coefficient $\hat{\beta}_i$ is defined as:

$$SE(\hat{\beta}_i) = \sigma \sqrt{S_{ii}}$$

3. The covariance between the ith and jth estimated regression coefficients $\hat{\beta}_i$ and $\hat{\beta}_j$ is given by

$$\mathrm{cov}(\hat{\beta}_i, \hat{\beta}_j) = \sigma^2 S_{ij}; \quad i \neq j = 1, 2, \ldots, k$$

4. The significance of the fitted multiple regression model can be tested by the analysis of variance (ANOVA) technique.

5. To calculate the overall variability of the response variable Y, we obtain the total sum of squares as:

$$SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$$

It has (n − 1) degrees of freedom. In matrix notation, $SS_T = Y'Y - n\bar{y}^2$.

6. We obtain the regression sum of squares using the predicted values $\hat{y}_i$ as:

$$SS_{Reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \sum_{i=1}^{n} \hat{y}_i^2 - n\bar{y}^2$$

It has k degrees of freedom. In matrix notation, $SS_{Reg} = \hat{\beta}'X'Y - n\bar{y}^2$.

7. We obtain the residual sum of squares as $SS_{Res} = SS_T - SS_{Reg}$. It has (n − k − 1) degrees of freedom.

8. For testing the significance of the regression model, we define the variance ratio (F) as:

$$F_{cal} = \frac{MSS_{Reg}}{MSS_{Res}} = \frac{SS_{Reg}/k}{SS_{Res}/(n-k-1)}$$

which follows the F-distribution with (k, n − k − 1) degrees of freedom.

9. For testing the significance of the ith regression coefficient, we define the test statistic as:

$$t_i = \frac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)} = \frac{\hat{\beta}_i}{\sqrt{\hat{\sigma}^2 S_{ii}}}; \quad i = 0, 1, 2, \ldots, k$$

where the statistic $t_i$ follows the t-distribution with (n − k − 1) degrees of freedom.

10. We define the (1 − α)100% lower and upper confidence limits for the jth (j = 0, 1, 2, ..., k) regression coefficient of the multiple linear regression model as:

$$(\hat{\beta}_j)_L = \hat{\beta}_j - t_{(n-k-1), \alpha/2} \, SE(\hat{\beta}_j)$$

$$(\hat{\beta}_j)_U = \hat{\beta}_j + t_{(n-k-1), \alpha/2} \, SE(\hat{\beta}_j)$$

11. The coefficient of determination ($R^2$) measures the proportion of variation in Y about its mean which is explained by the fitted regression model. We define $R^2$ as:

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{SS_{Reg}}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}$$

12. In a multiple regression model, adding a regressor variable can increase the value of $R^2$ even when that variable contributes little to the model. For comparing two values of $R^2$, it is necessary to take into account the number of regressor variables on which each value is based. That is why we prefer the adjusted $R^2$ in multiple regression analysis. We compute it as:

$$R^2_{Adj} = 1 - \frac{SS_{Res}/(n-k-1)}{SS_T/(n-1)}$$
8.8 SOLUTIONS AND ANSWERS
E1) Refer to Sec. 8.2.
E2) We have computed the following values in E6 of Unit 7:

$$(X'X)^{-1} = \begin{bmatrix} 1.8333 & 0.0013 & -3.1843 \\ 0.0013 & 0.0021 & -0.1354 \\ -3.1843 & -0.1354 & 14.3753 \end{bmatrix}$$

From the solution of E7 in Unit 7, we have $\hat{\sigma}^2 = 281.3269$.

Now, the variance-covariance matrix is obtained as:

$$V(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1} = 281.3269 \times (X'X)^{-1} = \begin{bmatrix} 515.7464 & 0.3684 & -895.8257 \\ 0.3684 & 0.5768 & -38.0902 \\ -895.8257 & -38.0902 & 4044.1477 \end{bmatrix}$$

Thus, we obtain

$V(\hat{\beta}_0) = 515.7464 \Rightarrow SE(\hat{\beta}_0) = 22.7101$

$V(\hat{\beta}_1) = 0.5768 \Rightarrow SE(\hat{\beta}_1) = 0.7595$

$V(\hat{\beta}_2) = 4044.1477 \Rightarrow SE(\hat{\beta}_2) = 63.5936$

E3) The following values are computed in E7 of Unit 7:

$$(X'X)^{-1} = \begin{bmatrix} 89.4681 & -0.4103 & -0.1805 & -0.4528 \\ -0.4103 & 0.0146 & -0.0147 & 0.00002 \\ -0.1805 & -0.0147 & 0.0288 & 0.0028 \\ -0.4528 & 0.00002 & 0.0028 & 0.0027 \end{bmatrix}$$

From the solution of E8 in Unit 7, we have $\hat{\sigma}^2 = 0.0746$.

We may determine the variance-covariance matrix as:

$$V(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1} = 0.0746 \times (X'X)^{-1} = \begin{bmatrix} 6.6698 & -0.0306 & -0.0135 & -0.0338 \\ -0.0306 & 0.0011 & -0.0011 & 0.000001 \\ -0.0135 & -0.0011 & 0.0022 & 0.0002 \\ -0.0338 & 0.000001 & 0.0002 & 0.0002 \end{bmatrix}$$

Thus, we obtain

$V(\hat{\beta}_0) = 6.6698 \Rightarrow SE(\hat{\beta}_0) = 2.5826$

$V(\hat{\beta}_1) = 0.0011 \Rightarrow SE(\hat{\beta}_1) = 0.0330$

$V(\hat{\beta}_2) = 0.0022 \Rightarrow SE(\hat{\beta}_2) = 0.0464$

$V(\hat{\beta}_3) = 0.0002 \Rightarrow SE(\hat{\beta}_3) = 0.0141$

E4) We define the null and alternative hypotheses as:

$H_0 : \beta_1 = \beta_2 = 0$ vs $H_1$: at least one of $\beta_1$ and $\beta_2$ is not equal to zero.

From E5 of Unit 7, we have

$n = 15$, $\sum_{i=1}^{15} y_i = 5895$, $\bar{y} = 393$, $Y'Y = 2326375$,

$$X'Y = \begin{bmatrix} 5895 \\ 215275 \\ 3332.5 \end{bmatrix} \quad \text{and} \quad \hat{\beta}' = \begin{bmatrix} 477.3269 & -2.0795 & -12.9545 \end{bmatrix}$$

We now compute the total sum of squares as:

$$SS_T = Y'Y - n\bar{y}^2 = 2326375 - 15(393)^2 = 9640$$

$$\hat{\beta}'X'Y = \begin{bmatrix} 477.32692576 & -2.07953754 & -12.95445048 \end{bmatrix} \begin{bmatrix} 5895 \\ 215275 \\ 3332.5 \end{bmatrix} = 2322999.0772$$

We calculate the sum of squares due to regression as:

$$SS_{Reg} = \hat{\beta}'X'Y - n\bar{y}^2 = 2322999.0772 - 2316735 = 6264.0772$$

The sum of squares due to error is computed as:

$$SS_{Res} = SS_T - SS_{Reg} = 9640 - 6264.0772 = 3375.9228$$

ANOVA Table

Source of     df    SS           MSS          F_cal      F_tab
Variation
Regression    2     6264.0772    3132.0386    11.1331    F_(2,12), 0.005 = 8.51
Error         12    3375.9228    281.3269
Total         14    9640

Since $F_{cal} > F_{tab}$ at the 1% level of significance, we reject H₀. Hence, the model can be considered significant.

E5) Null Hypothesis: $H_0 : \beta_1 = \beta_2 = \beta_3 = 0$ and Alternative Hypothesis: $H_1$: at least one of $\beta_1$, $\beta_2$ and $\beta_3$ is not equal to zero.

From E6 of Unit 7, we have

$n = 12$, $\bar{y} = 3$, $Y'Y = 111.88$,

$$X'Y = \begin{bmatrix} 36 \\ 1390.81 \\ 374.05 \\ 5710.3 \end{bmatrix} \quad \text{and} \quad \hat{\beta} = \begin{bmatrix} -2.6916 \\ 0.1266 \\ 0.0317 \\ 0.0036 \end{bmatrix}$$

We now compute the total sum of squares as:

$$SS_T = Y'Y - n\bar{y}^2 = 111.88 - 12(3)^2 = 3.88$$

$$\hat{\beta}'X'Y = 111.2836$$

We calculate the sum of squares due to regression as:

$$SS_{Reg} = \hat{\beta}'X'Y - n\bar{y}^2 = 111.2836 - 12(3)^2 = 3.2836$$

The sum of squares due to error is computed as:

$$SS_{Res} = SS_T - SS_{Reg} = 3.88 - 3.2836 = 0.5964$$

ANOVA Table

Source of     df    SS        MSS       F_cal      F_tab
Variation
Regression    3     3.2836    1.0945    14.6819    F_(3,8), 0.025 = 5.42
Error         8     0.5964    0.0746
Total         11    3.88

Since $F_{cal} > F_{tab}$ at the 5% level of significance, we reject H₀. Hence, the model can be considered significant.
E6) Refer to Sec. 8.4.
E7) We have computed the following values in E2:

$SE(\hat{\beta}_0) = 22.7101$, $SE(\hat{\beta}_1) = 0.7595$ and $SE(\hat{\beta}_2) = 63.5936$

From the solution of E5 in Unit 7, we have

$\hat{\beta}_0 = 477.3269$, $\hat{\beta}_1 = -2.0795$ and $\hat{\beta}_2 = -12.9545$

We now test the null hypothesis $H_0 : \beta_0 = 0$ against $H_1 : \beta_0 \neq 0$. The t-statistic is:

$$t_0 = \frac{\hat{\beta}_0}{SE(\hat{\beta}_0)} = \frac{477.3269}{22.7101} = 21.0183$$

From Table I given at the end of this block, we have

$$t_{tab} = t_{(n-k-1), \alpha/2} = t_{12, 0.005} = 3.055$$

Since $|t_0| > 3.055$, we may reject H₀ at the 1% level of significance.

To test the null hypothesis $H_0 : \beta_1 = 0$ against $H_1 : \beta_1 \neq 0$, we use the t-statistic:

$$t_1 = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{-2.0795}{0.7595} = -2.7380$$

Since $|t_1| < 3.055$, we may not reject H₀ at the 1% level of significance.

Similarly, we can test the null hypothesis $H_0 : \beta_2 = 0$ against $H_1 : \beta_2 \neq 0$ using the t-statistic:

$$t_2 = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = \frac{-12.9545}{63.5936} = -0.2037$$

Since $|t_2| < 3.055$, we may not reject the null hypothesis $H_0 : \beta_2 = 0$ at the 1% level of significance. So we may infer that neither of the two regressor variables plays an important role in constructing the model.
E8) The following values were computed in E3:

$SE(\hat{\beta}_0) = 2.5826$, $SE(\hat{\beta}_1) = 0.0330$, $SE(\hat{\beta}_2) = 0.0464$ and $SE(\hat{\beta}_3) = 0.0141$

From the solution of E6 in Unit 7, we have

$\hat{\beta}_0 = -2.6916$, $\hat{\beta}_1 = 0.1266$, $\hat{\beta}_2 = 0.0317$ and $\hat{\beta}_3 = 0.0036$

To test the null hypothesis $H_0 : \beta_0 = 0$ against $H_1 : \beta_0 \neq 0$, we use the t-statistic:

$$t_0 = \frac{\hat{\beta}_0}{SE(\hat{\beta}_0)} = \frac{-2.6916}{2.5826} = -1.0422$$

From Table I given at the end of this block, we have

$$t_{tab} = t_{(n-k-1), \alpha/2} = t_{8, 0.025} = 2.306$$

For $\beta_1$, the t-statistic is:

$$t_1 = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{0.1266}{0.0330} = 3.8377$$

We can test the null hypothesis $H_0 : \beta_2 = 0$ against $H_1 : \beta_2 \neq 0$ using:

$$t_2 = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = \frac{0.0317}{0.0464} = 0.6844$$

Similarly, we can test the null hypothesis $H_0 : \beta_3 = 0$ against $H_1 : \beta_3 \neq 0$ using:

$$t_3 = \frac{\hat{\beta}_3}{SE(\hat{\beta}_3)} = \frac{0.0036}{0.0141} = 0.2522$$

Since only $|t_1| > 2.306$, we may reject the null hypothesis $H_0 : \beta_1 = 0$ at the 5% level of significance and conclude that X₁ is an important variable in the model. The values of $|t_0|$, $|t_2|$ and $|t_3|$ are not greater than 2.306, so we do not reject their respective null hypotheses at the 5% level of significance. Hence, we may conclude that $\beta_0$, $\beta_2$ and $\beta_3$ are not playing important roles in the model.
E9) Refer to Sec. 8.5.
E10) From the solution of E7, we have

$\hat{\beta}_0 = 477.3269$, $\hat{\beta}_1 = -2.0795$ and $\hat{\beta}_2 = -12.9545$

$SE(\hat{\beta}_0) = 22.7101$, $SE(\hat{\beta}_1) = 0.7595$ and $SE(\hat{\beta}_2) = 63.5936$

$$t_{(n-k-1), \alpha/2} = t_{12, 0.005} = 3.055$$

For j = 0, we obtain the lower and upper confidence limits of β₀ as:

$$\hat{\beta}_{0L} = 477.3269 - 3.055 \times 22.7101 = 407.9477$$

$$\hat{\beta}_{0U} = 477.3269 + 3.055 \times 22.7101 = 546.7061$$

The lower and upper confidence limits of $\hat{\beta}_1$ can be computed as:

$$\hat{\beta}_{1L} = -2.0795 - 3.055 \times 0.7595 = -4.3998$$

$$\hat{\beta}_{1U} = -2.0795 + 3.055 \times 0.7595 = 0.2407$$

The lower and upper confidence limits of $\hat{\beta}_2$ can be determined as:

$$\hat{\beta}_{2L} = -12.9545 - 3.055 \times 63.5936 = -207.2329$$

$$\hat{\beta}_{2U} = -12.9545 + 3.055 \times 63.5936 = 181.3240$$

The 99% confidence intervals for β₀, β₁ and β₂ are tabulated below.

j    β̂_j         SE(β̂_j)    Lower limit    Upper limit    Confidence Interval
0    477.3269    22.7101     407.9477       546.7061       (407.9477, 546.7061)
1    −2.0795     0.7595      −4.3998        0.2407         (−4.3998, 0.2407)
2    −12.9545    63.5936     −207.2329      181.3240       (−207.2329, 181.3240)

E11) From the solution of E8, we have

$\hat{\beta}_0 = -2.6916$, $\hat{\beta}_1 = 0.1266$, $\hat{\beta}_2 = 0.0317$ and $\hat{\beta}_3 = 0.0036$

$SE(\hat{\beta}_0) = 2.5826$, $SE(\hat{\beta}_1) = 0.0330$, $SE(\hat{\beta}_2) = 0.0464$ and $SE(\hat{\beta}_3) = 0.0141$

$$t_{(n-k-1), \alpha/2} = t_{8, 0.025} = 2.306$$

The lower and upper confidence limits of $\hat{\beta}_j$ are determined using $\hat{\beta}_j \pm t_{(n-k-1), \alpha/2} \, SE(\hat{\beta}_j)$; j = 0, 1, 2, 3.

For j = 0:

$$\hat{\beta}_{0L} = -2.6916 - 2.306 \times 2.5826 = -8.6471$$

$$\hat{\beta}_{0U} = -2.6916 + 2.306 \times 2.5826 = 3.2639$$

For j = 1:

$$\hat{\beta}_{1L} = 0.1266 - 2.306 \times 0.0330 = 0.0505$$

$$\hat{\beta}_{1U} = 0.1266 + 2.306 \times 0.0330 = 0.2026$$

For j = 2:

$$\hat{\beta}_{2L} = 0.0317 - 2.306 \times 0.0464 = -0.0752$$

$$\hat{\beta}_{2U} = 0.0317 + 2.306 \times 0.0464 = 0.1386$$

For j = 3:

$$\hat{\beta}_{3L} = 0.0036 - 2.306 \times 0.0141 = -0.0290$$

$$\hat{\beta}_{3U} = 0.0036 + 2.306 \times 0.0141 = 0.0361$$

The 95% confidence intervals for β₀, β₁, β₂ and β₃ are tabulated below.

j    β̂_j       SE(β̂_j)   Lower limit   Upper limit   Confidence Interval
0    −2.6916   2.5826     −8.6471       3.2639        (−8.6471, 3.2639)
1    0.1266    0.0330     0.0505        0.2026        (0.0505, 0.2026)
2    0.0317    0.0464     −0.0752       0.1386        (−0.0752, 0.1386)
3    0.0036    0.0141     −0.0290       0.0361        (−0.0290, 0.0361)
E12) Refer to Sec. 8.6.


E13) From the solution of E4, we have
n = 15, k = 2, SST = 9640 and SSRes = 3375.92283051
SSRes 3375.9228
R2 = 1− = 1− = 0.6498
SST 9640
SSRes (n − k − 1)
Further, R 2Adj = 1 −
SST ( n − 1)
3375.9228 12
= 1− = 0.5914
9640 14
Hence, the model explains only 64.98% and 59.14% of variations in Y
according to the coefficient of determination and the adjusted
coefficient of determination, respectively.
E14) From the solution of E5, we have

n = 12, k = 3, $SS_{Res} = 0.5964$ and $SS_T = 3.88$

$$R^2 = 1 - \frac{SS_{Res}}{SS_T} = 1 - \frac{0.5964}{3.88} = 0.8463$$

Further,

$$R^2_{Adj} = 1 - \frac{SS_{Res}/(n-k-1)}{SS_T/(n-1)} = 1 - \frac{0.5964/8}{3.88/11} = 0.7887$$

Hence, the model explains only 84.63% and 78.87% of the variation in Y according to the coefficient of determination and the adjusted coefficient of determination, respectively.
