Copyright© Dorling Kindersley India Pvt. Ltd
Correlation and Simple
Linear Regression
Analysis
Learning Objectives
Upon completion of this chapter, you will be able to:
• Use the simple linear regression equation
• Compute the coefficient of correlation and understand its
interpretation
• Understand the concept of measures of variation, coefficient of
determination, and standard error of the estimate
• Understand and use residual analysis for testing the
assumptions of regression
• Understand statistical inference about slope, correlation
coefficient of the regression model, and testing the overall
model
Measures of Association
Measures of association are statistics for measuring the strength of
relationship between two variables.
Correlation measures the degree of association between two
variables.
Karl Pearson’s coefficient of correlation is a quantitative measure
of the degree of relationship between two variables. Suppose these
variables are x and y, then Karl Pearson’s coefficient of correlation
is defined as
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$
The coefficient of correlation lies between –1 and +1.
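As a minimal sketch of this definition, the function below computes Pearson's coefficient as the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the data are hypothetical and chosen only for illustration; they are not the values from Table 15.2.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (NOT the textbook's table).
advertisement = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = pearson_r(advertisement, sales)
print(f"r = {r:.4f}")
```

A perfectly increasing straight-line relationship gives a value of +1 and a perfectly decreasing one gives –1, which matches the stated range.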
Figure 15.1: Interpretation of correlation coefficient
Example 15.1
Table 15.2 shows the sales revenue and advertisement expenses of a
company for the past 10 months. Find the coefficient of correlation
between sales and advertisement.
Table 15.3 : Calculation of correlation coefficient between sales
and advertisement
Figure 15.9: Five examples of correlation coefficient
Using MS Excel, Minitab and SPSS for
Computing Correlation Coefficient
Ch 15 Solved Examples\Excel\Ex 15.1.xls
Ch 15 Solved Examples\Minitab\Ex 15.1.MPJ
Ch 15 Solved Examples\SPSS\Ex 15.1.sav
Ch 15 Solved Examples\SPSS\Output Ex 15.1.spv
Introduction to Simple Linear Regression
Regression analysis is the process of developing a statistical
model that is used to predict the value of a dependent variable
from at least one independent variable.
In simple linear regression analysis, there are two types of
variables. The variable whose value is influenced or is to be
predicted is called the dependent variable, and the variable that
influences the value or is used for prediction is called the
independent variable.
In regression analysis, the independent variable is also known as
the regressor, predictor, or explanatory variable, while the dependent
variable is also known as the regressed or explained variable. In
simple linear regression analysis, only a straight-line relationship
between two variables is examined.
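A minimal sketch of fitting such a straight line is given below. It assumes the usual least-squares estimates (slope equal to the sum of products of deviations divided by the sum of squared deviations of x, and intercept equal to the mean of y minus slope times the mean of x). The function name `fit_simple_linear_regression` and the data are hypothetical, for illustration only.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_linear_regression(x, y)

# Use the fitted line to predict y for a new value of x.
predicted = b0 + b1 * 6
print(f"y = {b0:.2f} + {b1:.2f}x, prediction at x = 6: {predicted:.2f}")
```

The fitted line always passes through the point formed by the two sample means, which is a quick sanity check on any hand calculation.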
Deterministic and Probabilistic Models
The deterministic model is y = β0 + β1x, and the probabilistic model
is y = β0 + β1x + ε.
ε is the error of the regression line in fitting the data points.
If a point is on the regression line, the corresponding value of ε
is equal to zero. If the point is not on the regression line, the
value of ε measures the error.
Note that in the deterministic model, all the points are
assumed to be on the regression line and hence, in all cases, the
random error ε is equal to zero. The probabilistic model includes an
error term, which allows the value of y to vary for any given value of x.
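The difference between the two models can be illustrated with a small simulation. This is only a sketch: the coefficient values, the normal distribution for ε, and the function names are assumptions made for illustration, not something stated in the chapter.

```python
import random

def deterministic_y(x, beta0, beta1):
    """Deterministic model: every point lies exactly on the line."""
    return beta0 + beta1 * x

def probabilistic_y(x, beta0, beta1, sigma, rng):
    """Probabilistic model: the line plus a random error term epsilon.

    Assumption: epsilon is drawn from a normal distribution with mean 0
    and standard deviation sigma.
    """
    epsilon = rng.gauss(0.0, sigma)
    return beta0 + beta1 * x + epsilon, epsilon

rng = random.Random(0)                 # fixed seed so runs are repeatable
beta0, beta1, sigma = 2.0, 0.5, 1.0    # hypothetical parameter values
x_value = 4.0

on_line = deterministic_y(x_value, beta0, beta1)
draws = [probabilistic_y(x_value, beta0, beta1, sigma, rng) for _ in range(5)]

print("deterministic y:", on_line)
for y_obs, eps in draws:
    print(f"observed y = {y_obs:.3f}, error = {eps:.3f}")
```

For the same x, the deterministic model always returns the same y, while the probabilistic model returns a different y on each draw, each differing from the line by exactly its error term.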
Assumptions
• Assumption #1: Your dependent variable should be
measured at the continuous level (i.e., it is either
an interval or ratio variable). Examples of continuous
variables include revision time (measured in hours),
intelligence (measured using IQ score), exam
performance (measured from 0 to 100), weight
(measured in kg), and so forth.
• Assumption #2: Your independent variable should
also be measured at the continuous level (i.e., it is
either an interval or ratio variable). See the bullet
above for examples of continuous variables.
• Assumption #3: There needs to be a linear relationship between
the two variables. Whilst there are a number of
ways to check whether a linear relationship exists
between your two variables, we suggest creating a
scatterplot using SPSS Statistics where you can
plot the dependent variable against your
independent variable and then visually inspect the
scatterplot to check for linearity. Your scatterplot
may look something like one of the following:
• Assumption #4: There should be no significant outliers. An outlier
is an observed data point that has a dependent
variable value that is very different to the value
predicted by the regression equation. As such, an
outlier will be a point on a scatterplot that is
(vertically) far away from the regression line
indicating that it has a large residual, as highlighted
below:
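The idea of a point lying vertically far from the regression line can be sketched numerically. The example below fits a least-squares line, computes each residual, and flags points whose residual exceeds a chosen multiple of the standard error of the estimate. The cut-off of 2 is a rule-of-thumb assumption, not a value given in this chapter, and the data and function name are hypothetical.

```python
import math

def flag_outliers(x, y, threshold=2.0):
    """Return indices of points whose residual is large relative to the
    standard error of the estimate (threshold is an assumed rule of thumb)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual = observed y minus the value predicted by the regression line.
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    std_error = math.sqrt(sse / (n - 2))

    return [i for i, e in enumerate(residuals) if abs(e) / std_error > threshold]

# Hypothetical data: y is roughly 2x, except for one unusual point at x = 5.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 4, 6, 8, 25, 12, 14, 16, 18]
flagged = flag_outliers(x, y)
print("indices of possible outliers:", flagged)
```

A flagged point is only a candidate for inspection; whether to keep, correct, or remove it remains a judgment about the data.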
This table provides the R and R² values. The R value represents the simple
correlation and is 0.873 (the "R" column), which indicates a high degree of
correlation. The R² value (the "R Square" column) indicates how much of the total
variation in the dependent variable, Price, can be explained by the independent
variable, Income. In this case, 76.2% can be explained, which is very large.
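The link between the two reported values can be checked directly: in simple linear regression, R² is the square of R, so 0.873 squared gives about 0.762. The sketch below also computes R² from its definition (one minus unexplained variation over total variation) on hypothetical data, since the Price and Income observations are not reproduced here. The function name is an assumption for illustration.

```python
def r_squared(x, y):
    """Coefficient of determination for the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Total variation in y, and the part left unexplained by the line.
    sst = sum((b - mean_y) ** 2 for b in y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - sse / sst

# Check of the reported figures: R = 0.873 implies R squared of about 0.762.
reported_r = 0.873
reported_r2 = reported_r ** 2
print(f"R squared from reported R: {reported_r2:.3f}")

# Hypothetical data (NOT the Price/Income data behind the table).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)
print(f"R squared on sample data: {r2:.2f}")
```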
This table indicates that the regression model predicts the dependent variable significantly
well. How do we know this? Look at the "Regression" row and go to the "Sig." column. This
indicates the statistical significance of the regression model that was run. Here, p < 0.0005,
which is less than 0.05, and indicates that, overall, the regression model statistically
significantly predicts the outcome variable (i.e., it is a good fit for the data).
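The "Sig." value in the "Regression" row comes from an F test. As a rough sketch, the F statistic for simple linear regression is the regression mean square (explained variation over 1 degree of freedom) divided by the residual mean square (unexplained variation over n − 2 degrees of freedom). The code below computes only the statistic on hypothetical data; turning it into a p-value requires the F distribution, which is left to a statistics package. The function name is an assumption for illustration.

```python
def f_statistic(x, y):
    """F statistic for the overall test of a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    predicted = [intercept + slope * a for a in x]
    ssr = sum((p - mean_y) ** 2 for p in predicted)          # explained variation
    sse = sum((b - p) ** 2 for b, p in zip(y, predicted))    # unexplained variation

    msr = ssr / 1          # one independent variable
    mse = sse / (n - 2)    # n - 2 residual degrees of freedom
    return msr / mse

# Hypothetical data (NOT the data behind the table discussed above).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
f_value = f_statistic(x, y)
print(f"F = {f_value:.2f} with 1 and {len(x) - 2} degrees of freedom")
```

A larger F value corresponds to a smaller "Sig." value, that is, stronger evidence that the independent variable helps predict the dependent variable.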
Figure 15.10: Error in simple regression
Figure 15.11: Summary of the estimation process for simple linear regression.
Example 15.2
A cable wire company has spent heavily on advertisements. The sales
and advertisement expenses (in thousand rupees) for 12 randomly
selected months are given in Table 14.2. Develop a regression model
to predict the impact of advertisement on sales.