Basic Business Statistics
Multiple Regression Model Building
Linear vs. Nonlinear Fit
(Figure: two scatter plots with fitted curves and their residual plots. A linear fit to nonlinear data does not give random residuals; a nonlinear fit gives random residuals.)
Nonlinear Relationships
The relationship between the dependent variable and an independent variable may not be linear
Review the scatter plot to check for nonlinear relationships
Example: Quadratic model
Yi = β0 + β1X1i + β2X1i² + εi
The second independent variable is the square of the first variable
Quadratic Regression Model
Model form:
Yi = β0 + β1X1i + β2X1i² + εi
where:
β0 = Y intercept
β1 = regression coefficient for linear effect of X on Y
β2 = regression coefficient for quadratic effect on Y
εi = random error in Y for observation i
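As a minimal sketch, this model can be fit by ordinary least squares with the squared term added as an extra column of the design matrix. A Python/numpy example (the data below are hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 7.2, 11.8, 18.1, 26.3, 35.9, 47.6])

# Design matrix: intercept, linear term X, quadratic term X^2
X = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares estimates of b0, b1, b2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"b0 = {b[0]:.3f}, b1 = {b[1]:.3f}, b2 = {b[2]:.3f}")
```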
Quadratic Regression Model
Yi = β0 + β1X1i + β2X1i² + εi
Quadratic models may be considered when the scatter plot takes on one of the following shapes:

(Figure: four scatter-plot shapes, one for each sign combination: β1 > 0, β2 > 0; β1 > 0, β2 < 0; β1 < 0, β2 > 0; β1 < 0, β2 < 0)
β1 = the coefficient of the linear term
β2 = the coefficient of the squared term
Testing for Significance: Quadratic Effect
Testing the Quadratic Effect
Compare the quadratic regression equation
Ŷi = b0 + b1X1i + b2X1i²
with the linear regression equation
Ŷi = b0 + b1X1i
Testing for Significance: Quadratic Effect
(continued)
Testing the Quadratic Effect
Consider the quadratic regression equation
Ŷi = b0 + b1X1i + b2X1i²
Hypotheses
H0: β2 = 0 (The quadratic term does not improve the model)
H1: β2 ≠ 0 (The quadratic term improves the model)
Testing for Significance: Quadratic Effect
(continued)
Testing the Quadratic Effect
Hypotheses
H0: β2 = 0 (The quadratic term does not improve the model)
H1: β2 ≠ 0 (The quadratic term improves the model)
The test statistic is

tSTAT = (b2 − β2) / Sb2

with d.f. = n − 3

where:
b2 = squared-term slope coefficient
β2 = hypothesized slope (zero)
Sb2 = standard error of the slope
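A sketch of this test in Python (scipy assumed available); the coefficient, standard error, and sample size below are placeholders standing in for the output of a fitted quadratic model:

```python
from scipy import stats

# Placeholder values standing in for fitted-model output
b2 = 0.245     # estimated squared-term coefficient
s_b2 = 0.033   # standard error of b2
n = 14         # number of observations

# t statistic for H0: beta2 = 0, with n - 3 degrees of freedom
t_stat = (b2 - 0) / s_b2
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 3)
print(f"tSTAT = {t_stat:.2f}, p-value = {p_value:.4g}")
```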
Testing for Significance: Quadratic Effect
(continued)
Testing the Quadratic Effect
Compare the adjusted r2 from the simple regression model to the adjusted r2 from the quadratic model
If the adjusted r2 from the quadratic model is larger than the adjusted r2 from the simple model, the quadratic model is likely the better model
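A sketch of this comparison, using the standard formula adjusted r² = 1 − (1 − r²)(n − 1)/(n − k − 1), where k is the number of independent variables; the r² inputs below are placeholders:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted r^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 14                # placeholder sample size
r2_simple = 0.969     # placeholder: simple model, k = 1
r2_quadratic = 0.995  # placeholder: quadratic model, k = 2

print(adjusted_r2(r2_simple, n, 1))     # simple linear model
print(adjusted_r2(r2_quadratic, n, 2))  # quadratic model
```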
Example: Quadratic Model
Filter purity increases as filter time increases:

Purity   Time
3        1
7        2
8        3
15       5
22       7
33       8
40       10
54       12
67       13
70       14
78       15
85       15
87       16
99       17

(Figure: scatter plot of Purity vs. Time)
Example: Quadratic Model
(continued)
Simple regression results:
Ŷ = -11.283 + 5.985 Time

            Coefficients   Standard Error   t Stat     P-value
Intercept   -11.28267      3.46805          -3.25332   0.00691
Time        5.98520        0.30966          19.32819   2.078E-10

Regression Statistics
R Square            0.96888
Adjusted R Square   0.96628
Standard Error      6.15997

The t statistic and r2 are both high, but the residuals are not random:

(Figure: Time residual plot showing a curved, non-random pattern)
Example: Quadratic Model in Excel & Minitab
(continued)
Quadratic regression results:
Ŷ = 1.539 + 1.565 Time + 0.245 (Time)²

Excel:
               Coefficients   Standard Error   t Stat    P-value
Intercept      1.53870        2.24465          0.68550   0.50722
Time           1.56496        0.60179          2.60052   0.02467
Time-squared   0.24516        0.03258          7.52406   1.165E-05

Minitab:
The regression equation is
Purity = 1.54 + 1.56 Time + 0.245 Time Squared

Predictor      Coef      SE Coef   T      P
Constant       1.5390    2.24500   0.69   0.507
Time           1.5650    0.60180   2.60   0.025
Time Squared   0.24516   0.03258   7.52   0.000

S = 2.59513   R-Sq = 99.5%   R-Sq(adj) = 99.4%

The quadratic term is statistically significant (p-value very small)
Example: Quadratic Model in Excel & Minitab
(continued)
Quadratic regression results:
Ŷ = 1.539 + 1.565 Time + 0.245 (Time)²

Excel:
Regression Statistics
R Square            0.99494
Adjusted R Square   0.99402
Standard Error      2.59513

Minitab:
The regression equation is
Purity = 1.54 + 1.56 Time + 0.245 Time Squared

Predictor      Coef      SE Coef   T      P
Constant       1.5390    2.24500   0.69   0.507
Time           1.5650    0.60180   2.60   0.025
Time Squared   0.24516   0.03258   7.52   0.000

S = 2.59513   R-Sq = 99.5%   R-Sq(adj) = 99.4%

The adjusted r2 of the quadratic model is higher than the adjusted r2 of the simple regression model. The quadratic model explains 99.4% of the variation in Y.
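As a check, this fit can be reproduced in Python/numpy from the Purity/Time data in the table above; the printed coefficients should agree with the Excel and Minitab output, up to rounding:

```python
import numpy as np

time = np.array([1, 2, 3, 5, 7, 8, 10, 12, 13, 14, 15, 15, 16, 17], dtype=float)
purity = np.array([3, 7, 8, 15, 22, 33, 40, 54, 67, 70, 78, 85, 87, 99], dtype=float)

# Quadratic design matrix: intercept, Time, Time^2
X = np.column_stack([np.ones_like(time), time, time**2])
b, *_ = np.linalg.lstsq(X, purity, rcond=None)
print(f"Purity-hat = {b[0]:.3f} + {b[1]:.3f} Time + {b[2]:.3f} Time^2")
```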
Example: Quadratic Model Residual Plots
(continued)
Quadratic regression results:
Ŷ = 1.539 + 1.565 Time + 0.245 (Time)²

(Figure: residual plots vs. Time and vs. Time-squared, both showing random scatter around zero)
The residuals plotted versus both Time and Time-squared show a random
pattern.
Collinearity
(continued)
Including two highly correlated independent
variables can adversely affect the regression
results
No new information provided
Some Indications of Strong Collinearity
Incorrect signs on the coefficients
Large change in the value of a previous
coefficient when a new variable is added to the
model
A previously significant variable becomes
non-significant when a new independent
variable is added
Detecting Collinearity
(Variance Inflationary Factor)
VIFj is used to measure collinearity:

VIFj = 1 / (1 − R²j)

where R²j is the coefficient of determination from regressing variable Xj on all the other X variables
If VIFj > 5, Xj is highly correlated with
the other independent variables
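A minimal sketch of computing VIFj directly from this definition in Python/numpy (the data below are hypothetical): regress Xj on the other X variables, take the resulting R², and apply the formula.

```python
import numpy as np

def vif(X: np.ndarray, j: int) -> float:
    """VIF for column j of X, where each column of X is one independent variable."""
    y = X[:, j]                        # treat Xj as the response
    others = np.delete(X, j, axis=1)   # all other X variables
    Z = np.column_stack([np.ones(len(y)), others])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

# Hypothetical data: two mildly correlated X variables
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.3 * x1 + rng.normal(size=50)
print(vif(np.column_stack([x1, x2]), 0))
```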
Example: Pie Sales
Week   Pie Sales   Price ($)   Advertising ($100s)
1      350         5.50        3.3
2      460         7.50        3.3
3      350         8.00        3.0
4      430         8.00        4.5
5      350         6.80        3.0
6      380         7.50        4.0
7      430         4.50        3.0
8      470         6.40        3.7
9      450         7.00        3.5
10     490         5.00        4.0
11     340         7.20        3.5
12     300         7.90        3.2
13     440         5.90        4.0
14     450         5.00        3.5
15     300         7.00        2.7

Recall the multiple regression equation of chapter 2:

Sales = b0 + b1 (Price) + b2 (Advertising)
Detecting Collinearity in Excel using PHStat
PHStat / regression / multiple regression …
Check the “variance inflationary factor (VIF)” box
Output for the pie sales example:

Regression Analysis: Price and all other X

Regression Statistics
Multiple R          0.030438
R Square            0.000926
Adjusted R Square   -0.075925
Standard Error      1.21527
Observations        15
VIF                 1.000927

The VIF is < 5: there is no evidence of collinearity between Price and Advertising
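As a check on this output, the VIF for Price can be computed directly from the pie sales data above. With only two X variables, the R² from regressing Price on Advertising is simply their squared correlation; the result should match the PHStat value of about 1.0009:

```python
import numpy as np

price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50,
                  6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0,
                        3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

# With one other X variable, R^2 equals the squared correlation
r = np.corrcoef(price, advertising)[0, 1]
print(f"VIF for Price = {1 / (1 - r**2):.6f}")
```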