Business Statistics
Analysis of Variance (ANOVA)
To find the solutions, apply either the p-value (area) method or the critical
value method.
An Introduction to Experimental Design and Analysis of
Variance
Statistical studies can be classified as being either experimental or observational.
In an experimental study, one or more factors are controlled so that data can be obtained
about how the factors influence the variables of interest.
In an observational study, no attempt is made to control the factors.
Cause-and-effect relationships are easier to establish in experimental studies than in
observational studies.
Analysis of variance (ANOVA) can be used to analyze the data obtained from experimental
or observational studies.
A factor is a variable that the experimenter has selected for investigation.
A treatment is a level of a factor.
Experimental units are the objects of interest in the experiment.
A completely randomized design is an experimental design in which the treatments are
randomly assigned to the experimental units.
Analysis of Variance: A Conceptual Overview
Analysis of Variance (ANOVA) can be used to test for the equality of three or more
population means.
Data obtained from observational or experimental studies can be used for the
analysis.
We want to use the sample results to test the following hypotheses:
H0: μ1 = μ2 = … = μk
Ha: Not all population means are equal
If H0 is rejected, we cannot conclude that all population means are different.
Rejecting H0 means that at least two population means have different values.
Assumptions for Analysis of Variance
1. For each population, the response (dependent) variable is normally distributed.
2. The variance of the response variable, denoted σ², is the same for all of the
populations.
3. The observations must be independent.
Sampling distribution of x̄, given H0 is true.
Sampling distribution of x̄, given H0 is false.
Analysis of Variance and the Completely Randomized
Design
Between-Treatments Estimate of Population Variance
Within-Treatments Estimate of Population Variance
Comparing the Variance Estimates: The F Test
ANOVA Table
One-Way Analysis of Variance (ANOVA)
One-way ANOVA is appropriate under the following conditions:
1. We would like to study the impact of a single treatment (also
known as factor) at different levels (thus forming different
groups) on a continuous response variable (or outcome variable).
In the price discount example (ENZO detergent), the variable ‘price discount’ is
the treatment (or factor) and the 0%, 10%, and 20% price discounts
are its levels (3 levels in this case). Different levels of
discount are likely to have varying impact on the sales of the
product, where sales is the outcome variable. We would like to
understand the impact of different levels of price discount on
the response variable, sales. The term ‘treatment’ is used because
one of the initial applications of ANOVA was to find the impact
of different fertilizer treatments on agricultural yield, as studied
by the British statistician R. A. Fisher (1934).
2. In each group, the population response variable
follows a normal distribution and the sample subjects are
chosen using random sampling.
3. The population variances for the different groups are
assumed to be the same. That is, the variability in the response
variable within the different groups is the same.
Although conditions 2 and 3 are necessary for one-way
ANOVA, the model is robust, and minor violations of the
assumptions may not result in an incorrect decision about
the null hypothesis.
Setting up an Analysis of Variance
Assume that we would like to study the impact of a factor
(such as discount) with k levels on a continuous variable
(such as sales quantity). Then the null and alternative
hypotheses for one-way ANOVA are given by
• H0: μ1 = μ2 = μ3 = … = μk
• HA: Not all μ values are equal
Note that the alternative hypothesis, ‘not all μ values are
equal’, implies that some of them could be equal. The null
hypothesis is equivalent to stating that the factor effects τ1,
τ2, …, τk defined in the model Yij = μ + τi + εij are all
zero.
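Between-Treatments Estimate of Population Variance σ²
The estimate of σ² based on the variation of the sample means is called the
mean square due to treatments and is denoted by MSTR.
\[ \mathrm{MSTR} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1} \]
Here n_j and x̄_j are the size and mean of sample j, and x̄̄ is the overall sample mean.
The numerator is called the sum of squares due to treatments (SSTR).
The denominator is the degrees of freedom associated with SSTR.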
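Within-Treatments Estimate of Population Variance σ²
The estimate of σ² based on the variation of the sample observations within each
sample is called the mean square error and is denoted by MSE.
\[ \mathrm{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k} \]
Here s_j² is the variance of sample j and n_T is the total number of observations.
The numerator is called the sum of squares due to error (SSE).
The denominator is the degrees of freedom associated with SSE.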
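The two variance estimates can be sketched in Python as follows. This is a minimal illustration rather than part of the original slides: the function name `mstr_and_mse` and the toy data are hypothetical, and the sketch assumes MSTR = SSTR/(k − 1) and MSE = SSE/(n_T − k) as in the ANOVA table.

```python
from statistics import mean, variance

def mstr_and_mse(samples):
    """Return (MSTR, MSE) for a list of samples, one list per treatment."""
    k = len(samples)                          # number of treatments
    n_total = sum(len(s) for s in samples)    # n_T, total number of observations
    grand_mean = sum(sum(s) for s in samples) / n_total
    # SSTR: variation of the sample means around the overall mean
    sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # SSE: variation of the observations within each sample
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    return sstr / (k - 1), sse / (n_total - k)

# Hypothetical toy data: three treatments, three observations each
toy = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
mstr, mse = mstr_and_mse(toy)
print(mstr, mse)
```

For the toy data the sample means (2, 3, 7) are far apart relative to the spread inside each sample, so MSTR (21) is much larger than MSE (1).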
Comparing the Variance Estimates: The F Test
If the null hypothesis is true and the ANOVA assumptions are valid, the sampling
distribution of MSTR/MSE is an F distribution with MSTR degrees of freedom equal
to k − 1 and MSE degrees of freedom equal to n_T − k.
• If the means of the k populations are not equal, the value of MSTR/MSE will be
inflated because MSTR overestimates σ².
• Hence, we will reject H0 if the resulting value of MSTR/MSE appears to be too large to
have been selected at random from the appropriate F distribution.
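As a rough sketch of what "too large" means, the right-tail area and the critical value can be computed directly when there are exactly 2 numerator degrees of freedom (k = 3 treatments), because the F distribution then has a simple closed form. The function names and the illustrative F value of 4.5 are hypothetical; for any other numerator degrees of freedom a statistical table or library would be needed instead.

```python
def f_right_tail_2df(f_value, df_error):
    """P(F > f_value) for an F distribution with 2 and df_error degrees of freedom."""
    return (1 + 2 * f_value / df_error) ** (-df_error / 2)

def f_critical_2df(alpha, df_error):
    """F value whose right-tail area is alpha (2 and df_error degrees of freedom)."""
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)

# Hypothetical input: F = 4.5 with 2 and 12 degrees of freedom, alpha = 0.05
alpha = 0.05
p_value = f_right_tail_2df(4.5, 12)
critical = f_critical_2df(alpha, 12)
reject_h0 = p_value <= alpha          # same decision as 4.5 >= critical
print(round(p_value, 4), round(critical, 2), reject_h0)
```

Both approaches give the same decision: the p-value is at most α exactly when the observed F is at least the critical value.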
Figure: Sampling distribution of MSTR/MSE
ANOVA Table for a Completely Randomized Design
SST is partitioned into SSTR and SSE.
SST’s degrees of freedom (df) are partitioned into SSTR’s df and SSE’s df.
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square           F          p-Value
Treatments            SSTR             k − 1                MSTR = SSTR/(k − 1)   MSTR/MSE
Error                 SSE              n_T − k              MSE = SSE/(n_T − k)
Total                 SST              n_T − 1
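SST divided by its degrees of freedom n_T − 1 is the overall sample variance that
would be obtained if we treated the entire set of observations as one data set.
With the entire data set as one sample, the formula for computing the total sum of
squares, SST, is:
\[ \mathrm{SST} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 = \mathrm{SSTR} + \mathrm{SSE} \]
Here x_ij is observation i in sample j and x̄̄ is the overall sample mean.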
ANOVA can be viewed as the process of partitioning the total sum of squares and the
degrees of freedom into their corresponding sources: treatments and error.
Dividing the sum of squares by the appropriate degrees of freedom provides the variance
estimates, the F value and the p-value used to test the hypothesis of equal population
means.
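The partition can be checked numerically. The sketch below uses hypothetical data with unequal sample sizes and a hypothetical helper named `partition`; it only illustrates that the sums of squares and the degrees of freedom add up, and is not taken from the slides.

```python
def partition(samples):
    """Return (SST, SSTR, SSE) for a list of samples, one list per treatment."""
    observations = [x for s in samples for x in s]
    grand_mean = sum(observations) / len(observations)
    # Total variation of all observations around the overall mean
    sst = sum((x - grand_mean) ** 2 for x in observations)
    # Variation explained by differences between the sample means
    sstr = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # Variation left inside each sample
    sse = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
    return sst, sstr, sse

# Hypothetical data with unequal sample sizes
data = [[12, 15, 11], [18, 20, 17, 19], [14, 13]]
sst, sstr, sse = partition(data)
k, n_total = len(data), sum(len(s) for s in data)

assert abs(sst - (sstr + sse)) < 1e-9             # SST = SSTR + SSE
assert (n_total - 1) == (k - 1) + (n_total - k)   # df also partition
```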
Test for the Equality of k Population Means
• Hypotheses
H0: μ1 = μ2 = … = μk
Ha: Not all population means are equal
• Test Statistic
F = MSTR/MSE
Testing for the Equality of k Population Means: A Completely
Randomized Design
AutoShine, Inc. is considering marketing a long-lasting car wax. Three different waxes (Type
1, Type 2, and Type 3) have been developed. In order to test the durability of these waxes, 5
new cars were waxed with Type 1, 5 with Type 2, and 5 with Type 3. Each car was then
repeatedly run through an automatic carwash until the wax coating showed signs of
deterioration.
The number of times each car went through the carwash before its wax deteriorated is shown
on the next slide. AutoShine, Inc. must decide which wax to market. Are the three waxes
equally effective?
Factor . . . Car wax
Treatments . . . Type 1, Type 2, Type 3
Experimental units . . . Cars
Response variable . . . Number of washes
Observation       Wax Type 1   Wax Type 2   Wax Type 3
1                 27           33           29
2                 30           28           28
3                 29           31           30
4                 28           30           32
5                 31           30           31
Sample Mean       29.0         30.4         30.0
Sample Variance   2.5          3.3          2.5
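Mean Square Between Treatments: (Because the sample sizes are all equal)
Overall sample mean = (29.0 + 30.4 + 30.0)/3 = 29.8
SSTR = 5(29.0 − 29.8)² + 5(30.4 − 29.8)² + 5(30.0 − 29.8)² = 5.2
MSTR = 5.2/(3 − 1) = 2.60
Mean Square Error:
SSE = 4(2.5) + 4(3.3) + 4(2.5) = 33.2
MSE = 33.2/(15 − 3) = 2.77
Rejection Rule:
Reject H0 if p-value ≤ α.
Test Statistic:
F = MSTR/MSE = 2.60/2.77 = 0.939
p-value method: read the p-value from the ANOVA table, or compute the right-tail area
with a spreadsheet function such as Excel's F.DIST:
1 − F.DIST(x, numerator df, denominator df, 1) = 1 − F.DIST(0.939, 2, 12, 1) = 0.4179 ≈ 0.42
The p-value is greater than α, so we do not reject the null hypothesis.
Conclusion:
There is insufficient evidence to conclude that the mean number of washes
differs among the three waxes.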
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F       p-Value
Treatments            5.2              2                    2.60          0.939   0.42
Error                 33.2             12                   2.77
Total                 38.4             14
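The ANOVA table for the car wax data can be reproduced from the raw observations. The following Python sketch is an illustration, not the method used on the slides; the variable names are hypothetical, and the p-value line relies on a closed form that holds only because there are 2 numerator degrees of freedom here.

```python
from statistics import mean, variance

# Number of washes before deterioration, from the car wax data table
waxes = {
    "Type 1": [27, 30, 29, 28, 31],
    "Type 2": [33, 28, 31, 30, 30],
    "Type 3": [29, 28, 30, 32, 31],
}
samples = list(waxes.values())
k = len(samples)
n_total = sum(len(s) for s in samples)
grand_mean = sum(sum(s) for s in samples) / n_total

sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
sse = sum((len(s) - 1) * variance(s) for s in samples)
df_tr, df_err = k - 1, n_total - k
mstr, mse = sstr / df_tr, sse / df_err
f_stat = mstr / mse
# Right-tail area of F; this closed form is valid only because df_tr == 2
p_value = (1 + 2 * f_stat / df_err) ** (-df_err / 2)

print(f"Treatments {sstr:6.1f} {df_tr:3d} {mstr:6.2f} {f_stat:6.3f} {p_value:5.2f}")
print(f"Error      {sse:6.1f} {df_err:3d} {mse:6.2f}")
print(f"Total      {sstr + sse:6.1f} {n_total - 1:3d}")
```

The F value prints as 0.940 because it is computed from the unrounded mean squares; the table's 0.939 comes from dividing the rounded values 2.60 and 2.77.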
Example
Ms Rachael Khanna, the brand manager of ENZO
detergent powder at the ‘one stop’ retail, was interested
in understanding whether price discounts have any
impact on the sales quantity of ENZO. To test whether
the price discounts had any impact, price discounts of 0%
(no discount), 10% and 20% were given on randomly
selected days. The quantity (in kilograms) of ENZO sold in
a day under the different discount levels is shown in
the table (next slide). Conduct a one-way ANOVA to check
whether the discount had any significant impact on the sales
quantity at α = 0.05.
Sales of ENZO at different price discounts
No Discount (0% discount)
39 32 25 25 37 28 26 26 40 29
37 34 28 36 38 38 34 31 39 36
34 25 33 26 33 26 26 27 32 40
10% Discount
34 41 45 39 38 33 35 41 47 34
47 44 46 38 42 33 37 45 38 44
38 35 34 34 37 39 34 34 36 41
20% Discount
42 43 44 46 41 52 43 42 50 41
41 47 55 55 47 48 41 42 45 48
40 50 52 43 47 55 49 46 55 42
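One way to carry out the requested one-way ANOVA is sketched below in Python. The slides do not show the worked result for this example, so the code is only a hedged illustration: the variable names are hypothetical, and the p-value uses a closed form that applies only because three discount levels give 2 numerator degrees of freedom.

```python
from statistics import mean, variance

# Daily sales quantity (kg) of ENZO at each discount level, from the table
sales = {
    "0%": [39, 32, 25, 25, 37, 28, 26, 26, 40, 29,
           37, 34, 28, 36, 38, 38, 34, 31, 39, 36,
           34, 25, 33, 26, 33, 26, 26, 27, 32, 40],
    "10%": [34, 41, 45, 39, 38, 33, 35, 41, 47, 34,
            47, 44, 46, 38, 42, 33, 37, 45, 38, 44,
            38, 35, 34, 34, 37, 39, 34, 34, 36, 41],
    "20%": [42, 43, 44, 46, 41, 52, 43, 42, 50, 41,
            41, 47, 55, 55, 47, 48, 41, 42, 45, 48,
            40, 50, 52, 43, 47, 55, 49, 46, 55, 42],
}
alpha = 0.05
samples = list(sales.values())
k = len(samples)
n_total = sum(len(s) for s in samples)
grand_mean = sum(sum(s) for s in samples) / n_total

mstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples) / (k - 1)
mse = sum((len(s) - 1) * variance(s) for s in samples) / (n_total - k)
f_stat = mstr / mse
# Right-tail area of F; closed form valid only for 2 numerator degrees of freedom
p_value = (1 + 2 * f_stat / (n_total - k)) ** (-(n_total - k) / 2)
reject_h0 = p_value <= alpha

for level, s in sales.items():
    print(level, round(mean(s), 2))
print(round(f_stat, 2), p_value, reject_h0)
```

The sample means rise with the discount level (32.0, about 38.8, and 46.4 kg), and the printed decision shows whether that difference is significant at α = 0.05.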
Testing for the Equality of k Population Means: An Observational
Study
Example: Reed Manufacturing
Janet Reed would like to know if there is any significant difference in the mean number of
hours worked per week for the department managers at her three manufacturing plants (in
Buffalo, Pittsburgh, and Detroit). An F test will be conducted using α = 0.05.
A simple random sample of five managers from each of the three plants was taken and the
number of hours worked by each manager in the previous week is shown on the next slide.
Factor . . . Manufacturing plant
Treatments . . . Buffalo, Pittsburgh, Detroit
Experimental units . . . Managers
Response variable . . . Number of hours worked
Observation       Plant 1 Buffalo   Plant 2 Pittsburgh   Plant 3 Detroit
1                 48                73                   51
2                 54                63                   63
3                 57                66                   61
4                 54                64                   54
5                 62                74                   56
Sample Mean       55                68                   57
Sample Variance   26.0              26.5                 24.5
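1. Develop the hypotheses.
H0: μ1 = μ2 = μ3
Ha: Not all the means are equal
where μ1, μ2, and μ3 are the mean number of hours worked per week by the managers at Plants 1, 2, and 3.
2. Specify the level of significance. α = 0.05
3. Compute the value of the test statistic.
Overall sample mean = (55 + 68 + 57)/3 = 60
SSTR = 5(55 − 60)² + 5(68 − 60)² + 5(57 − 60)² = 490
MSTR = 490/(3 − 1) = 245
SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308
MSE = 308/(15 − 3) = 25.667
F = MSTR/MSE = 245/25.667 = 9.55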
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F      p-Value
Treatment             490              2                    245           9.55   .0033
Error                 308              12                   25.667
Total                 798              14
p-value approach
4. Compute the p-value.
With 2 numerator df and 12 denominator df, the p-value is 0.01 for F = 6.93.
Therefore, the p-value is less than 0.01 for F = 9.55.
Using a spreadsheet function such as Excel's F.DIST, p-value = 1 − F.DIST(9.55, 2, 12, 1) = 0.0033,
which matches the ANOVA table.
5. Determine whether to reject H0.
The p-value is less than α = 0.05, so we reject the null hypothesis.
We can conclude that the mean number of hours worked per week by
department managers is not the same at all 3 plants.
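Because the three samples are the same size, the test statistic can also be obtained from the summary statistics alone: MSTR is the sample size times the variance of the sample means, and MSE is the average of the sample variances. The Python sketch below illustrates this shortcut under that equal-sample-size assumption; the variable names are hypothetical, and the p-value again uses the closed form for 2 numerator degrees of freedom.

```python
from statistics import mean, variance

n = 5                                    # managers sampled per plant
sample_means = [55, 68, 57]              # Buffalo, Pittsburgh, Detroit
sample_variances = [26.0, 26.5, 24.5]
alpha = 0.05

k = len(sample_means)
# Equal sample sizes: MSTR is n times the variance of the sample means,
# and MSE is the simple average of the sample variances.
mstr = n * variance(sample_means)
mse = mean(sample_variances)
f_stat = mstr / mse
df_error = k * (n - 1)
# Right-tail area of F; closed form valid only for 2 numerator degrees of freedom
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)
reject_h0 = p_value <= alpha
print(round(mstr, 3), round(mse, 3), round(f_stat, 2), round(p_value, 4), reject_h0)
```

This reproduces the values in the ANOVA table (MSTR = 245, MSE ≈ 25.667, F ≈ 9.55, p-value ≈ 0.0033) and the decision to reject H0.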
Critical Value Approach
4. Determine the critical value and rejection rule.