POLYTECHNIC UNIVERSITY OF THE PHILIPPINES
COLLEGE OF ENGINEERING
DEPARTMENT OF INDUSTRIAL ENGINEERING
MODULE 6
MULTIPLE LINEAR REGRESSION
In Industrial Engineering, many applications of regression
analysis involve situations wherein there is more than one
regressor. A regression model that contains more than one
regressor is called a multiple regression model.
I. The Multiple Linear Regression Model
Suppose that the effective life of a cutting tool (Y) depends on
the cutting speed (x_1) and the tool angle (x_2). A multiple
regression model that might describe this relationship is:
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon
Specifically, the model stated above is a multiple linear
regression model with two regressors because it is a linear
function of the unknown parameters \beta_0, \beta_1, and \beta_2.
Models that are more complex in structure may often still be
analyzed by multiple linear regression techniques.
Consider the polynomial model in one regressor variable:
Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \epsilon
If we let x_1 = x, x_2 = x^2, and x_3 = x^3, then:
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon
Models that include interaction effects may also be analyzed by
multiple linear regression methods.
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \epsilon
If we let x_3 = x_1 x_2 and \beta_3 = \beta_{12}, we will have:
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon
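As a concrete illustration, here is a minimal Python/NumPy sketch (the data values are made up purely for demonstration) showing how the polynomial and interaction models above reduce to multiple linear regression simply by constructing new regressor columns:

```python
import numpy as np

# Illustrative regressor values (hypothetical data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # original regressor
z = np.array([0.5, 1.5, 2.5, 3.5, 4.5])   # a second regressor

# Cubic polynomial model: let x1 = x, x2 = x^2, x3 = x^3
X_poly = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Interaction model: let x3 = x1 * x2
X_inter = np.column_stack([np.ones_like(x), x, z, x * z])

# Both design matrices can now be handed to any linear least squares
# routine (e.g. np.linalg.lstsq), because each model is still linear
# in its unknown coefficients.
```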
II. Least Squares Estimation of the Parameters
Similar to the simple linear regression model, the least squares
estimation methodology may be used to estimate the regression
parameters.
Using the same derivation method as in simple linear regression
and letting x_{ij} denote the ith observation of variable x_j, we
have the following least squares normal equations:
n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{n} x_{i2} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik} = \sum_{i=1}^{n} y_i
\hat{\beta}_0 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1}^2 + \hat{\beta}_2 \sum_{i=1}^{n} x_{i1} x_{i2} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{i1} x_{ik} = \sum_{i=1}^{n} x_{i1} y_i
\vdots
\hat{\beta}_0 \sum_{i=1}^{n} x_{ik} + \hat{\beta}_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{n} x_{ik} x_{i2} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik}^2 = \sum_{i=1}^{n} x_{ik} y_i
EXAMPLE:
You are given the wire bond data in the succeeding table. Fit a
multiple linear regression model for the given data.
Observation Number    Pull Strength (y)    Wire Length (x1)    Die Height (x2)
1 9.95 2 50
2 24.45 8 110
3 31.75 11 120
4 35.00 10 550
5 25.02 8 295
6 16.86 4 200
7 14.38 2 375
8 9.60 2 52
9 24.35 9 100
10 27.50 8 300
11 17.08 4 412
12 37.00 11 400
13 41.95 12 500
14 11.66 2 360
15 21.65 4 205
16 17.89 4 400
17 69.00 20 600
18 10.30 1 585
19 34.93 10 540
20 46.59 15 250
21 44.88 15 290
22 54.12 16 510
23 56.63 17 590
24 22.13 6 100
25 21.15 5 400
Given the data above, we arrive at the following values:
n = 25, \sum_{i=1}^{25} y_i = 725.82, \sum_{i=1}^{25} x_{i1} = 206, \sum_{i=1}^{25} x_{i2} = 8,294
\sum_{i=1}^{25} x_{i1}^2 = 2,396, \sum_{i=1}^{25} x_{i2}^2 = 3,531,848, \sum_{i=1}^{25} x_{i1} x_{i2} = 77,177
\sum_{i=1}^{25} x_{i1} y_i = 8,008.37, \sum_{i=1}^{25} x_{i2} y_i = 274,811.31
For this particular case, the normal equations are:
25\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{25} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{25} x_{i2} = \sum_{i=1}^{25} y_i
\hat{\beta}_0 \sum_{i=1}^{25} x_{i1} + \hat{\beta}_1 \sum_{i=1}^{25} x_{i1}^2 + \hat{\beta}_2 \sum_{i=1}^{25} x_{i1} x_{i2} = \sum_{i=1}^{25} x_{i1} y_i
\hat{\beta}_0 \sum_{i=1}^{25} x_{i2} + \hat{\beta}_1 \sum_{i=1}^{25} x_{i2} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{25} x_{i2}^2 = \sum_{i=1}^{25} x_{i2} y_i
Inserting the computed values into the normal equations, we
have:
25\hat{\beta}_0 + 206\hat{\beta}_1 + 8,294\hat{\beta}_2 = 725.82
206\hat{\beta}_0 + 2,396\hat{\beta}_1 + 77,177\hat{\beta}_2 = 8,008.37
8,294\hat{\beta}_0 + 77,177\hat{\beta}_1 + 3,531,848\hat{\beta}_2 = 274,811.31
We have three unknowns, \hat{\beta}_0, \hat{\beta}_1, and \hat{\beta}_2, and three equations.
Solving this system of equations, the values of the three
unknowns are as follows:
\hat{\beta}_0 = 2.264, \hat{\beta}_1 = 2.744, \hat{\beta}_2 = 0.013
Therefore, the fitted (multiple linear) regression equation is:
\hat{y} = 2.264 + 2.744 x_1 + 0.013 x_2
The equation above can now be used to predict pull strength for
pairs of values of the regressor variables wire length and die
height.
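As a quick check on the hand computation, the same normal equations can be solved numerically. A minimal Python/NumPy sketch (the prediction inputs at the end are arbitrary example values) is:

```python
import numpy as np

# Normal equations A @ beta = b, built from the sums of the wire bond data
A = np.array([
    [25,    206,    8294],
    [206,   2396,   77177],
    [8294,  77177,  3531848],
], dtype=float)
b = np.array([725.82, 8008.37, 274811.31])

beta = np.linalg.solve(A, b)
print(beta)  # approximately [2.264, 2.744, 0.013]

# Predicted pull strength for an arbitrary example: wire length 8, die height 275
y_hat = beta[0] + beta[1] * 8 + beta[2] * 275
```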
III. Matrix Approach to Multiple Linear Regression
In fitting a multiple regression model, it is much more
convenient to express the mathematical operations using matrix
notation. Suppose that there are k regressor variables and
n observations, and that the model relating the regressors to the
response variable y is:
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i,   i = 1, 2, \ldots, n
This model is a system of n equations that can be expressed in
matrix notation as
y = X\beta + \epsilon
where
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
To solve for the values of \beta, we use the matrix equation:
\hat{\beta} = (X'X)^{-1} X'y
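In code, this formula can be applied directly. A minimal Python/NumPy sketch (the function name fit_mlr is arbitrary) is:

```python
import numpy as np

def fit_mlr(X, y):
    """Least squares estimates beta_hat = (X'X)^(-1) X'y.

    X: n x (k+1) design matrix whose first column is all 1s.
    y: length-n response vector.
    """
    # Solving the linear system is numerically preferable to explicitly
    # inverting X'X, but it yields the same estimate.
    return np.linalg.solve(X.T @ X, X.T @ y)
```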
EXAMPLE # 1: Let us use the matrix approach in coming up
with the fitted regression model using the data in the previous
example.
With a leading column of 1s for the intercept, followed by wire
length (x_1) and die height (x_2), the X matrix and the y vector
(pull strength) are:
      X               y
1    2    50        9.95
1    8   110       24.45
1   11   120       31.75
1   10   550       35.00
1    8   295       25.02
1    4   200       16.86
1    2   375       14.38
1    2    52        9.60
1    9   100       24.35
1    8   300       27.50
1    4   412       17.08
1   11   400       37.00
1   12   500       41.95
1    2   360       11.66
1    4   205       21.65
1    4   400       17.89
1   20   600       69.00
1    1   585       10.30
1   10   540       34.93
1   15   250       46.59
1   15   290       44.88
1   16   510       54.12
1   17   590       56.63
1    6   100       22.13
1    5   400       21.15
The X'X matrix is (please check!):
X'X = \begin{bmatrix} 25 & 206 & 8,294 \\ 206 & 2,396 & 77,177 \\ 8,294 & 77,177 & 3,531,848 \end{bmatrix}
and the X'y vector (please check!):
X'y = \begin{bmatrix} 725.82 \\ 8,008.37 \\ 274,811.31 \end{bmatrix}
The least squares estimates are found by:
\hat{\beta} = (X'X)^{-1} X'y
\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} 2.26379143 \\ 2.74426964 \\ 0.01252781 \end{bmatrix}
Therefore, the fitted regression model is:
\hat{y} = 2.264 + 2.744 x_1 + 0.013 x_2
which is the same as what we got previously.
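The same result can be reproduced in a few lines of code. Below is a minimal Python/NumPy sketch that builds X and y from the wire bond data and applies \hat{\beta} = (X'X)^{-1} X'y (equivalent to the fit_mlr sketch shown earlier):

```python
import numpy as np

# Wire bond data: wire length (x1), die height (x2), pull strength (y)
x1 = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12,
               2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5], dtype=float)
x2 = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
               360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400], dtype=float)
y = np.array([9.95, 24.45, 31.75, 35.00, 25.02, 16.86, 14.38, 9.60, 24.35,
              27.50, 17.08, 37.00, 41.95, 11.66, 21.65, 17.89, 69.00, 10.30,
              34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])

X = np.column_stack([np.ones_like(x1), x1, x2])   # 25 x 3 design matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # (X'X)^(-1) X'y
print(beta_hat)  # approximately [2.26379, 2.74427, 0.01253]
```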
Estimating \sigma^2
Just as in simple linear regression, it is important to estimate
\sigma^2 in a multiple regression model. In simple linear regression, the
equation was
\hat{\sigma}^2 = \frac{SS_E}{n - 2}
Remember that the reason why we subtracted 2 from n is
that we estimated two parameters in our regression model,
\beta_0 and \beta_1. However, in multiple linear regression, we estimate
more than two parameters in our regression model. We adjust
the previous equation to take this fact into consideration. Thus, a
more general equation for estimating the variance of the error
term is
\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{SS_E}{n - p}
where p is the number of parameters estimated, i.e., the k
regression coefficients plus the intercept (p = k + 1).
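A minimal Python/NumPy sketch of this estimate (the helper name sigma2_hat is arbitrary; the usage example reuses the first five wire bond observations purely for illustration) is:

```python
import numpy as np

def sigma2_hat(X, y):
    """Estimate of the error variance: SS_E / (n - p), where p is the
    number of estimated parameters (columns of X, including the
    intercept column)."""
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    residuals = y - X @ beta_hat     # e_i = y_i - y_hat_i
    ss_e = residuals @ residuals     # SS_E: sum of squared residuals
    return ss_e / (n - p)

# Illustration with the first five wire bond observations (n = 5, p = 3)
X = np.array([[1, 2, 50], [1, 8, 110], [1, 11, 120],
              [1, 10, 550], [1, 8, 295]], dtype=float)
y = np.array([9.95, 24.45, 31.75, 35.00, 25.02])
print(sigma2_hat(X, y))
```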
EXAMPLE # 2: The electric power consumed each month by a
chemical plant is thought to be related to the average ambient
temperature (x_1), the number of days in the month (x_2), the
average product purity (x_3), and the tons of product produced
(x_4). The past year's historical data are available and are
presented in the Excel file. Use the Data Analysis function in
Excel to come up with a regression analysis of the problem.
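Excel's Data Analysis > Regression tool is the intended approach here; as an alternative sketch only, the Python outline below shows how the same fit could be computed. The file name power_data.xlsx and the column headers are hypothetical, since the layout of the actual Excel file is not reproduced in this module.

```python
import numpy as np
import pandas as pd

# Hypothetical file name and column headers; adjust to match the actual workbook.
df = pd.read_excel("power_data.xlsx")
y = df["power"].to_numpy(dtype=float)
X = np.column_stack([
    np.ones(len(df)),
    df["temperature"].to_numpy(dtype=float),  # x1: average ambient temperature
    df["days"].to_numpy(dtype=float),         # x2: number of days in the month
    df["purity"].to_numpy(dtype=float),       # x3: average product purity
    df["tons"].to_numpy(dtype=float),         # x4: tons of product produced
])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # [b0, b1, b2, b3, b4]
```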
IV. Hypothesis Tests in Multiple Linear Regression
In multiple linear regression problems, there are certain tests of
hypotheses about the model parameters that are useful in
measuring model adequacy. As in the simple linear regression
model, a major assumption for these hypothesis tests is that the
error terms \epsilon_i are NID(0, \sigma^2).