Estimation, Prediction of Regression Model Residual
Analysis: Validating Model Assumptions - II
Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT STUDIES
1
Agenda
• Understanding different types of residual analysis
• Plotting residual plots using Python
2
Residual Analysis: Validating Model Assumptions
• Residual analysis is the primary tool for determining whether the assumed
regression model is appropriate
3
Assumptions about the error term ‘e’
4
Importance of the Assumptions
• These assumptions provide the theoretical basis for the t test and the F
test used to determine whether the relationship between x and y is
significant, and for the confidence and prediction interval estimates
• If the assumptions about the error term appear questionable, the
hypothesis tests about the significance of the regression relationship and
the interval estimation results may not be valid.
5
Residuals for Ice Cream Parlors
Source: Statistics for Business & Economics, David R. Anderson, Dennis J. Sweeney, Thomas A. Williams, Jeffrey D. Camm, James J. Cochran, Cengage Learning, 2013
6
Residual analysis is based on an examination of graphical plots
• A plot of the residuals against values of the independent variable x
• A plot of residuals against the predicted values of the dependent variable ŷ
• A standardized residual plot
• A normal probability plot
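All four plots start from the residuals e_i = y_i - ŷ_i of the fitted line. The sketch below shows one way to compute them with the Python standard library only. The ten (x, y) pairs are assumed for illustration (they follow the pizza parlor example in the cited textbook and are not necessarily the ice cream data shown on the slides), and the helper name `fit_line` is hypothetical.

```python
from statistics import mean

# Assumed illustrative data: x = student population (1000s), y = sales ($1000s)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]                 # predicted values
residuals = [yi - fi for yi, fi in zip(y, y_hat)]  # e_i = y_i - y_hat_i

for xi, yi, fi, ei in zip(x, y, y_hat, residuals):
    print(f"x={xi:>2}  y={yi:>3}  y_hat={fi:6.1f}  residual={ei:6.1f}")
```

For this assumed data set the fitted line is ŷ = 60 + 5x, and the residuals sum to zero, as the least squares method guarantees.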
7
Residual Plot Against x
8
Residual Plot Against x
9
Assumption: the variance is the same for all values of x
• The residual plot should give an
overall impression of a horizontal
band of points
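A residual plot against x can be drawn along the following lines. This is a minimal sketch, assuming matplotlib is installed; the function names `residuals_against_x` and `draw_residual_plot`, the output file name, and the ten data points are all assumptions made for illustration.

```python
from statistics import mean

def residuals_against_x(xs, ys):
    """Fit a least-squares line and return the residuals, one per x value."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
             / sum((a - x_bar) ** 2 for a in xs))
    intercept = y_bar - slope * x_bar
    return [b - (intercept + slope * a) for a, b in zip(xs, ys)]

def draw_residual_plot(xs, residuals, path="residuals_vs_x.png"):
    """Scatter the residuals against x with a dashed reference line at zero."""
    import matplotlib            # imported lazily: only needed for drawing
    matplotlib.use("Agg")        # non-interactive backend that writes to a file
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(xs, residuals)
    ax.axhline(0, linestyle="--", color="gray")
    ax.set_xlabel("x")
    ax.set_ylabel("residual (y - y_hat)")
    ax.set_title("Residual plot against x")
    fig.savefig(path)
    plt.close(fig)

# Assumed illustrative data (not necessarily the data on the slides)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
res = residuals_against_x(x, y)
```

Calling `draw_residual_plot(x, res)` writes the figure to a PNG file. If the points scatter in a roughly horizontal band around the zero line, the plot gives no reason to doubt the constant-variance assumption.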
10
Violation of Assumption:
The variance of ‘e’ is not the same for all values of x
• The assumption of a constant
variance of ‘e’ is violated
• This occurs, for example, when
variability about the regression
line is greater for larger values of x
11
Assumed regression model is not an adequate
representation
• A curvilinear regression model or a
multiple regression model
should be considered.
12
Residual Plot Against ŷ
• The pattern of this residual plot is the
same as the pattern of the residual plot
against the independent variable x.
• It is not a pattern that would lead us to
question the model assumptions.
13
Residual Plot Against ŷ
• For simple linear regression, both the
residual plot against x and the residual
plot against ŷ provide the same pattern
• For multiple regression analysis, the
residual plot against ŷ is more widely
used because of the presence of more
than one independent variable.
14
Standardized Residuals
• Many of the residual plots provided by computer software packages use a
standardized version of the residuals.
• A random variable is standardized by subtracting its mean and dividing the
result by its standard deviation.
• With the least squares method, the mean of the residuals is zero.
• Thus, simply dividing each residual by its standard deviation provides the
standardized residual
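The computation can be sketched in Python as follows. This is an illustration under stated assumptions: it uses the simple linear regression formula from the cited textbook, in which the standard deviation of residual i is s * sqrt(1 - h_i), with s the standard error of the estimate and h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)² the leverage of observation i. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def standardized_residuals(xs, ys):
    """Residuals of a simple least-squares fit, each divided by its own
    estimated standard deviation s * sqrt(1 - h_i)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [b - (intercept + slope * a) for a, b in zip(xs, ys)]
    # Standard error of the estimate: sqrt(SSE / (n - 2))
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    result = []
    for a, e in zip(xs, residuals):
        h = 1 / n + (a - x_bar) ** 2 / sxx   # leverage of this observation
        result.append(e / (s * sqrt(1 - h)))
    return result

# Assumed illustrative data (not necessarily the data on the slides)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
z = standardized_residuals(x, y)
print([round(v, 4) for v in z])
```

Observations far from the mean of x have higher leverage and therefore a smaller residual standard deviation, which is why each residual gets its own divisor rather than a single common one.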
15
Python Code
16
17
Python Code
18
Standardized Residuals
19
Computation of Standardized Residuals for Ice Cream Parlors
20
Computation of Standardized Residuals for Ice Cream Parlors
21
Plot of the Standardized Residuals Against the Independent
Variable x
22
Plot of the Standardized Residuals Against the Independent
Variable x
23
Studentized residual
• The standardized residual plot can provide insight about the assumption
that the error term ‘e’ has a normal distribution.
• If this assumption is satisfied, the distribution of the standardized
residuals should appear to come from a standard normal probability
distribution.
24
Studentized residual
• Thus, when looking at a standardized residual plot, we should expect to
see approximately 95% of the standardized residuals between -2 and 2.
• We see in the figure that, for the Armand’s example, all standardized
residuals are between -2 and 2.
• Therefore, on the basis of the standardized residuals, this plot gives us no
reason to question the assumption that ‘e’ has a normal distribution.
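The "95% between -2 and 2" check is easy to automate. The sketch below is a hedged illustration: the helper `share_within` is hypothetical, and the ten standardized residuals are assumed values obtained from a least-squares fit to an illustrative ten-point data set, not read from the slides.

```python
def share_within(values, bound=2.0):
    """Fraction of values whose absolute size is at most `bound`."""
    return sum(abs(v) <= bound for v in values) / len(values)

# Assumed standardized residuals for an illustrative ten-point data set
z = [-1.0792, 1.2224, -0.9487, 1.4230, -0.2296,
     -0.2296, -0.2372, 0.7115, -1.7114, 1.0792]

print(f"share within +/-2: {share_within(z):.0%}")
```

With only ten observations the 95% figure is a rough guide: a single value outside the band would already drop the share to 90%, so the check is best read together with the plots.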
25
Normal Probability Plot
• Another approach for determining the validity of the assumption that the
error term has a normal distribution is the normal probability plot.
• To show how a normal probability plot is developed, we introduce the
concept of normal scores.
26
Normal Probability Plot
• Suppose 10 values are selected randomly from a normal probability
distribution with a mean of zero and a standard deviation of one, and that
the sampling process is repeated over and over with the values in each
sample of 10 ordered from smallest to largest.
• For now, let us consider only the smallest value in each sample.
• The random variable representing the smallest value obtained in repeated
sampling is called the first-order statistic.
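The repeated-sampling idea can be imitated with a short simulation. This is a sketch under assumptions (the function name, the number of repetitions, and the seed are arbitrary choices): it draws many samples of 10 standard normal values, sorts each sample, and averages the values position by position. The average of the smallest values estimates the expected value of the first-order statistic.

```python
import random

def simulate_order_statistic_means(n=10, repetitions=20_000, seed=0):
    """Average each ordered position over many standard normal samples of size n."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    totals = [0.0] * n
    for _ in range(repetitions):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        for position, value in enumerate(sample):
            totals[position] += value
    return [total / repetitions for total in totals]

means = simulate_order_statistic_means()
print("mean of the smallest value:", round(means[0], 2))
print("mean of the largest value: ", round(means[-1], 2))
```

The smallest value averages out at roughly -1.5 and the largest at roughly +1.5, which reflects the symmetry of the normal distribution around zero.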
27
Normal Probability Plot
28
Normal Probability Plot
29
Normal scores and ordered standardized residuals for
Armand’s pizza parlors
30
Normal Probability Plot
• If the normality assumption is satisfied, the smallest standardized residual
should be close to the smallest normal score, the next smallest
standardized residual should be close to the next smallest normal score,
and so on.
• If we were to develop a plot with the normal scores on the horizontal axis
and the corresponding standardized residuals on the vertical axis, the
plotted points should cluster closely around a 45-degree line passing
through the origin if the standardized residuals are approximately
normally distributed.
• Such a plot is referred to as a normal probability plot.
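The points of a normal probability plot can be generated as sketched below. Two assumptions are made here: the normal scores are approximated with Blom's formula, Φ⁻¹((i - 0.375) / (n + 0.25)), rather than taken from a table of exact values, and the standardized residuals are illustrative values, not read from the slides. The function names are hypothetical.

```python
from statistics import NormalDist

def approximate_normal_scores(n):
    """Blom's approximation to the expected standard normal order statistics."""
    std_normal = NormalDist()          # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def normal_probability_points(standardized_residuals):
    """Pair each normal score with the ordered standardized residual of the same rank."""
    ordered = sorted(standardized_residuals)
    scores = approximate_normal_scores(len(ordered))
    return list(zip(scores, ordered))

# Assumed standardized residuals for an illustrative ten-point data set
z = [-1.0792, 1.2224, -0.9487, 1.4230, -0.2296,
     -0.2296, -0.2372, 0.7115, -1.7114, 1.0792]

points = normal_probability_points(z)
for score, resid in points:
    print(f"normal score {score:6.2f}   standardized residual {resid:7.4f}")
```

Plotting the first element of each pair on the horizontal axis and the second on the vertical axis gives the normal probability plot; points lying close to the 45-degree line through the origin support the normality assumption.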
31
Normal Probability Plot for Ice Cream Parlors
32
33
Thank You
34