ENGINEERING
DATA ANALYSIS
Prepared by: Engr. Marlo Dexie Dale G. Malaluan
Module 9
The Analysis of Variance
MODULE OUTLINE
9.1 Hypothesis Test in Simple Linear Regression
9.1.1 Use of t-tests
9.1.2 Analysis of variance approach to test significance of regression
9.2 Hypothesis Tests in Multiple Linear Regression
9.2.1 Test for significance of regression
9.2.2 Tests on individual regression coefficients & subsets of coefficients
Learning Objectives for Module 9
After careful study of this module, you should be able to do the following:
1. Understand how the analysis of variance is used to analyze the data from these
experiments
2. Test hypotheses and construct confidence intervals on the regression coefficients
Simple Linear Regression
Estimating $\sigma^2$
The error sum of squares is
$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
It can be shown that the expected value of the error sum of squares is
$E(SS_E) = (n - 2)\sigma^2$.
Estimator of Variance
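Because $E(SS_E) = (n - 2)\sigma^2$, dividing $SS_E$ by $n - 2$ gives an unbiased estimator of $\sigma^2$. The sketch below illustrates the calculation for a simple linear regression. The data values and the function names are hypothetical, chosen only for illustration; they are not taken from the module.

```python
def fit_simple_linear_regression(x, y):
    """Return least squares estimates (beta0_hat, beta1_hat) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


def estimate_error_variance(x, y):
    """Return (SS_E, sigma2_hat), where sigma2_hat = SS_E / (n - 2)."""
    n = len(x)
    beta0_hat, beta1_hat = fit_simple_linear_regression(x, y)
    # Residuals e_i = y_i - y_hat_i
    residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]
    ss_e = sum(e * e for e in residuals)
    return ss_e, ss_e / (n - 2)


# Hypothetical data, for illustration only.
x_data = [1.0, 2.0, 3.0, 4.0, 5.0]
y_data = [2.1, 3.9, 6.2, 7.8, 10.1]

ss_e, sigma2_hat = estimate_error_variance(x_data, y_data)
print(f"SS_E = {ss_e:.4f}, sigma^2 estimate = {sigma2_hat:.4f}")
```

The divisor is $n - 2$ rather than $n$ because two parameters, the intercept and the slope, are estimated from the data.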
Properties of the Least Squares Estimators
Analysis of Variance Approach to Test
Significance of Regression
A method called analysis of variance can be used to test for significance of regression. The
procedure partitions the total variability in the response variable into meaningful components
as the basis for the test. The analysis of variance identity is as follows:
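In the usual textbook notation, with $\bar{y}$ the sample mean of the responses and $\hat{y}_i$ the fitted values, the identity can be written as shown here. The labels $SS_T$, $SS_R$, and $SS_E$ match the symbols used in the examples of this module.

```latex
\sum_{i=1}^{n} (y_i - \bar{y})^2
  = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
  + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\qquad\text{that is,}\qquad
SS_T = SS_R + SS_E
```

The total sum of squares has $n - 1$ degrees of freedom; in simple linear regression, $SS_R$ has 1 and $SS_E$ has $n - 2$.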
Example 11.3 | Oxygen Purity ANOVA
• We will use the analysis of variance approach to test for significance of regression using the oxygen
purity data model from Example 11.1. Recall that $SS_T = 173.38$, $\hat{\beta}_1 = 14.947$, $S_{xy} = 10.17744$, and
$n = 20$. The regression sum of squares is
$$SS_R = \hat{\beta}_1 S_{xy} = (14.947)(10.17744) = 152.13$$
and the error sum of squares is $SS_E = SS_T - SS_R = 173.38 - 152.13 = 21.25$.
• The analysis of variance for testing $H_0: \beta_1 = 0$ is summarized in the Minitab output in Table 11.2.
• The test statistic is $f_0 = MS_R/MS_E = 152.13/1.18 = 128.86$, for which we find that the P-value is
$P \cong 1.23 \times 10^{-9}$, so we conclude that $\beta_1$ is not zero.
• There are frequently minor differences in terminology among computer packages. For example,
sometimes the regression sum of squares is called the “model” sum of squares, and the error sum
of squares is called the “residual” sum of squares.
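The arithmetic in Example 11.3 can be retraced with a few lines of code. This is a minimal sketch that uses only the rounded values quoted in the example; the variable names are illustrative. The P-value is not recomputed, since that requires the F distribution.

```python
# Quantities quoted in Example 11.3 (oxygen purity data).
ss_t = 173.38        # total sum of squares, SS_T
beta1_hat = 14.947   # estimated slope (rounded)
s_xy = 10.17744      # S_xy
n = 20               # number of observations

# SS_R = beta1_hat * S_xy. With the rounded slope the product is about 152.12;
# the example reports 152.13, so the reported value is used downstream.
ss_r_from_rounded_inputs = beta1_hat * s_xy
ss_r = 152.13

# Error sum of squares by subtraction.
ss_e = ss_t - ss_r

# Degrees of freedom in simple linear regression: 1 for regression, n - 2 for error.
df_regression = 1
df_error = n - 2

ms_r = ss_r / df_regression
ms_e = ss_e / df_error
f0 = ms_r / ms_e

print(f"SS_E = {ss_e:.2f}, MS_E = {ms_e:.2f}, f0 = {f0:.2f}")
```

Carrying the unrounded mean square error through the division is what yields the reported 128.86; dividing by the rounded 1.18 would give a slightly different figure.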
Multiple Linear Regression Model
• For example, suppose that the gasoline mileage performance of a vehicle depends on the
vehicle weight and the engine displacement. A multiple regression model that might
describe this relationship is
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
where $Y$ = mileage, $x_1$ = weight, and $x_2$ = engine displacement.
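One common way to estimate the coefficients of such a model is ordinary least squares, which solves the normal equations $(X'X)\hat{\beta} = X'y$. The sketch below is illustrative only. The data values, the units (weight in thousands of pounds, displacement in liters), and the function names are all hypothetical. The mileage values are generated without noise from $Y = 45 - 5x_1 - 2x_2$, so the fit should recover those coefficients.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_two_regressor_model(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix X
    p = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical, noise-free data for illustration only.
weight = [2.0, 2.5, 3.0, 3.5, 4.0, 2.2]
displacement = [1.6, 2.0, 3.5, 2.4, 5.0, 3.0]
mileage = [45.0 - 5.0 * w - 2.0 * d for w, d in zip(weight, displacement)]

beta0_hat, beta1_hat, beta2_hat = fit_two_regressor_model(weight, displacement, mileage)
print(beta0_hat, beta1_hat, beta2_hat)
```

Solving the normal equations directly is adequate for a small illustration like this one; statistical software typically uses more numerically stable decompositions.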
Estimator of Variance
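With $n$ observations and $p$ regression parameters (the intercept included), the standard textbook estimator divides the error sum of squares by its $n - p$ degrees of freedom. The symbols $n$ and $p$ are the usual conventions, assumed here rather than defined in the module.

```latex
\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{SS_E}{n - p}
```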
This is an unbiased estimator of $\sigma^2$.
Test for Significance of Regression
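In the usual textbook formulation, the test asks whether the response is linearly related to at least one of the $k$ regressors. The notation below ($k$ regressors, $p = k + 1$ parameters) is the standard convention and is assumed here; it agrees with the degrees of freedom used in Example 12.3.

```latex
H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0
\qquad
H_1: \beta_j \neq 0 \ \text{for at least one } j

F_0 = \frac{SS_R / k}{SS_E / (n - p)} = \frac{MS_R}{MS_E}
```

$H_0$ is rejected at significance level $\alpha$ when the computed value $f_0$ exceeds $f_{\alpha,k,n-p}$.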
Example 12.3 | Wire Bond Strength ANOVA
We will test for significance of regression (with α = 0.05) using the wire
bond pull strength data from Example 12.1. The total sum of squares is
$$SS_T = \mathbf{y}'\mathbf{y} - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 27{,}178.5316 - \frac{(725.82)^2}{25} = 6105.9447$$
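The regression or model sum of squares is computed as follows:
$$SS_R = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 27{,}063.3581 - \frac{(725.82)^2}{25} = 5990.7712$$
and by subtraction
$$SS_E = SS_T - SS_R = 6105.9447 - 5990.7712 = 115.1735$$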
Example 12.3b | Wire Bond Strength ANOVA
• The analysis of variance is shown in Table 12.6. To test $H_0: \beta_1 = \beta_2 = 0$, we
calculate the statistic
$$f_0 = \frac{MS_R}{MS_E} = \frac{2995.3856}{5.2352} = 572.17$$
• Since $f_0 > f_{0.05,2,22} = 3.44$ (or since the P-value is considerably smaller than
α = 0.05), we reject the null hypothesis and conclude that pull strength is
linearly related to either wire length or die height, or both.
• Practical Interpretation: Rejection of H0 does not necessarily imply that the relationship found is an
appropriate model for predicting pull strength as a function of wire length and die height. Further
tests of model adequacy are required before we can be comfortable using this model in practice.
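The sums of squares and the test statistic in Example 12.3 can be retraced from the quantities quoted in the text. This is a minimal sketch with illustrative variable names; the critical value 3.44 is taken from the example rather than computed.

```python
# Quantities quoted in Example 12.3 (wire bond pull strength data).
n = 25                     # observations
k = 2                      # regressors: wire length and die height
p = k + 1                  # parameters, including the intercept
y_prime_y = 27178.5316     # y'y
beta_x_y = 27063.3581      # beta_hat' X' y
sum_y = 725.82             # sum of the responses

# Correction term (sum of y)^2 / n shared by SS_T and SS_R.
correction = sum_y ** 2 / n

ss_t = y_prime_y - correction
ss_r = beta_x_y - correction
ss_e = ss_t - ss_r

ms_r = ss_r / k            # regression mean square, k degrees of freedom
ms_e = ss_e / (n - p)      # error mean square, n - p degrees of freedom
f0 = ms_r / ms_e

f_critical = 3.44          # f_{0.05,2,22} as quoted in the example
reject_h0 = f0 > f_critical

print(f"SS_T = {ss_t:.4f}, SS_R = {ss_r:.4f}, SS_E = {ss_e:.4f}")
print(f"MS_R = {ms_r:.4f}, MS_E = {ms_e:.4f}, f0 = {f0:.2f}, reject H0: {reject_h0}")
```

Note that the correction term cancels in $SS_E$, so the error sum of squares is simply $\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$.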
Example 12.3c | Wire Bond Strength ANOVA
Important Terms and Conditions
• Analysis of variance (ANOVA)
• Blocking
• Completely randomized design (CRD)
• Components of variance model
• Error mean square
• Fisher’s least significant difference (LSD) method
• Fixed-effects model
• Graphical comparison of means
• Least significant difference
• Levels of a factor
• Multiple comparisons methods
• Nuisance factor
• Operating characteristic (OC) curves
• Random factor
• Random-effects model
• Randomization
• Randomized complete block design (RCBD)
• Residual analysis and model checking
• Treatment
• Treatment mean square
• Variance components