_______________________________________________Chapter 3: Correlation & Regression Analysis
CHAPTER 3
CORRELATION & REGRESSION ANALYSIS
3.1 WHAT IS CORRELATION ANALYSIS?
• Correlation analysis is a statistical technique used to measure the direction and strength of the relationship between two variables, commonly referred to as the independent variable (x) and the dependent variable (y).
• Determining which variable is dependent and which variable is independent is a very important step in correlation analysis.
• For example:
In many business situations, managers want to examine the relationship between sales and profits, or between advertising expenditure and sales. Sales would generally be expected to determine profits, so profit is the dependent variable (y) and sales is the independent variable (x).
Similarly, for production cost and the number of units produced, the production cost depends on the number of units produced. Thus, production cost is the dependent variable (y) and the number of production units is the independent variable (x).
3.2 METHODS TO DETERMINE CORRELATION BETWEEN 2 VARIABLES
1) Scatter diagram
→ Graphical method
2) Pearson’s product moment coefficient of correlation, (𝑟)
→ For quantitative variables
3) Spearman’s rank coefficient of correlation, (𝑟ₛ)
→ For qualitative (ranked) and quantitative variables
P r e p a r e d B y : N O R A S L I L Y S A R K A M © | 81
3.3 STRENGTH OF CORRELATION
Value of |𝑟|           Strength
|𝑟| = 1                Perfect
0.80 ≤ |𝑟| ≤ 0.99      Strong
0.50 ≤ |𝑟| ≤ 0.79      Moderate
0 < |𝑟| ≤ 0.49         Weak
𝑟 = 0                  No linear correlation
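• The value of the correlation coefficient 𝑟 ranges from −1 to 1 (−1 ≤ 𝑟 ≤ 1). The sign of 𝑟 gives the direction of the relationship (positive or negative), and |𝑟| gives its strength.
• When interpreting the value of the correlation coefficient, both the strength and the direction of the correlation must be stated.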
• For example:
1) If 𝑟 = 0, there is no linear relationship between x and y.
2) If 𝑟 = 1, there is a perfect positive linear relationship between x and y.
3) If 𝑟 = 0.83, there is a strong positive linear relationship between x and y.
4) If 𝑟 = −0.61, there is a moderate negative linear relationship between x and y.
5) If 𝑟 = −0.23, there is a weak negative linear relationship between x and y.
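The strength table and the examples above amount to a simple decision rule. The Python sketch below applies the table's cut-offs to |𝑟| and uses the sign of 𝑟 for the direction. The function name describe_correlation is hypothetical, and placing values that fall between the table's two-decimal bands (such as 0.795 or 0.995) in the lower band is an assumption, not something the table specifies.

```python
def describe_correlation(r: float) -> str:
    """Describe r using the strength table of Section 3.3.

    Strength is judged from |r|; the sign of r gives the direction.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "no linear relationship"
    # Assumption: values between the table's bands (e.g. 0.795, 0.995) fall into the lower band.
    if size == 1:
        strength = "perfect"
    elif size >= 0.80:
        strength = "strong"
    elif size >= 0.50:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} linear relationship"


# The five examples from the text
for r in (0, 1, 0.83, -0.61, -0.23):
    print(r, "->", describe_correlation(r))
```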
3.4 SCATTER DIAGRAM
• A scatter diagram is a graph used to analyze the relationship between two variables.
• One variable is plotted on the horizontal axis (independent variable, x) and the other on the vertical axis (dependent variable, y).
• The pattern formed by the plotted points shows the relationship between the two variables graphically.
• If the points show no pattern and are randomly scattered, we can conclude that there is no relationship between the two variables.
• Examples of scatter diagrams [figure: panels show strong positive correlation, strong negative correlation, weak positive correlation, weak negative correlation, no correlation, no linear correlation, and perfect positive correlation]
3.5 PEARSON’S PRODUCT MOMENT COEFFICIENT OF CORRELATION
• Pearson’s product moment coefficient of correlation measures both the direction and the strength (degree) of the linear relationship between two quantitative variables.
• The correlation can be described as perfect, strong, moderate, weak, or no correlation.
• Pearson’s product moment coefficient of correlation is given by:
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$
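• The above formula can also be written as:
$$r = \frac{S_{XY}}{\sqrt{S_{XX} \, S_{YY}}}$$
Where:
$$S_{XY} = \sum xy - \frac{\sum x \sum y}{n}$$
$$S_{XX} = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$S_{YY} = \sum y^2 - \frac{(\sum y)^2}{n}$$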
• Note that:
$\sum x^2 \neq (\sum x)^2$
$\sum y^2 \neq (\sum y)^2$
$\sum xy \neq \sum x \sum y$
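As a check on hand calculations, the sketch below computes 𝑟 exactly as in the formula above: it first forms Σx, Σy, Σx², Σy² and Σxy, then S_XY, S_XX and S_YY. The function name pearson_r and the toy data are hypothetical, chosen only for illustration; the comments mark the three pairs of quantities from the note that must not be confused.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's product moment coefficient r = S_XY / sqrt(S_XX * S_YY)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi ** 2 for xi in x)              # sum of x^2, not (sum of x)^2
    sum_y2 = sum(yi ** 2 for yi in y)              # sum of y^2, not (sum of y)^2
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # sum of products, not sum_x * sum_y

    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when all x values or all y values are equal")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```

For these toy values S_XY = 6, S_XX = 10 and S_YY = 6, so 𝑟 = 6/√60 ≈ 0.77, a moderate positive linear relationship.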
EXERCISE 1
A marketing officer in a company wants to know the relationship between annual advertising
expenditures (RM million) and annual sales (RM million) of the company. For the study, he
collected data on advertising expenditures and annual sales of the company for the last 8 years.
Annual Advertising Expenditure (RM million)   2   1   4   3   2   4   5   3
Annual Sales (RM million)                      5   3   6   5   4   7   8   6
a) Determine the independent and dependent variables.
b) Draw a scatter plot to show the relationship between the annual advertising expenditures
and annual sales. What conclusion can be made from the plot?
c) Calculate the Pearson’s product moment coefficient of correlation and explain its meaning.
EXERCISE 2
An economist wants to study the relationship between family income and food expenditure. The following table shows the results of the study, based on 8 randomly chosen families.
Annual Income (RM ‘0000)      8      12     9      24     13     37     19     16
Food Expenditure (RM ‘0000)   2.88   3.00   2.97   3.60   3.64   7.03   3.80   3.52
a) Name the dependent and independent variables used in this study.
b) By calculating the product moment correlation coefficient, determine and explain the
correlation of annual income and food expenditure.
3.6 SPEARMAN’S COEFFICIENT OF CORRELATION
• Spearman’s rank coefficient of correlation is a measure of association between two variables measured on at least an ordinal scale, which makes it suitable for qualitative (ranked) data. It can also be used when both variables are quantitative.
• The Spearman’s rank coefficient of correlation is given by:
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
Where:
n = Number of pairs of observations
d_i = Difference between the two ranks of the i-th pair of observations
• Computing 𝑟ₛ is simple because it does not use the actual data values; instead, it uses the ranks that represent them.
• Rank 1 is usually given to the smallest data value and the highest rank to the largest data value.
• For tied observations, that is, two or more observations receiving the same score on the same variable, each of them is assigned the average of the ranks that would have been assigned had no ties occurred.
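The ranking rules above (rank 1 for the smallest value, average ranks for ties) and the formula for 𝑟ₛ can be sketched in Python as follows. The function names and the data are hypothetical toy values. For qualitative data such as letter grades, the categories would first have to be coded as ordered numbers (for example E = 1 up to A = 5); that coding is an assumption of this sketch. When ties are present, this formula is only an approximation to Pearson’s 𝑟 computed on the ranks, which is the usual general definition of 𝑟ₛ; the sketch follows the formula given in the text.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values with 1 for the smallest; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + 1 + end + 1) / 2  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rs(x: list[float], y: list[float]) -> float:
    """Spearman's rank coefficient r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical toy data with a tie in x
print(average_ranks([10, 20, 20, 40]))                       # [1.0, 2.5, 2.5, 4.0]
print(round(spearman_rs([10, 20, 20, 40], [1, 3, 2, 4]), 4))  # 0.95
```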
EXERCISE 3
The Mathematics and Accounting grades of 10 randomly selected students were recorded to study the relationship between the grades in the two subjects. The following information is based on the grades obtained for the two subjects in an examination.
Mathematics (x) A C D B C A B E B A
Accounting (y) B D D A C A C D B B
Using rank correlation, what conclusion can be made about the relationship between the students’ Mathematics and Accounting grades?
EXERCISE 4
The data below show the marks obtained by 8 students in a Statistics test and an Accounting test. Using rank correlation, is there any relationship between the marks in the two tests?
STUDENT     STATISTICS (x)     ACCOUNTING (y)
Farrish 87 82
Khairina 65 72
Aiman 46 65
Marissa 95 82
Adam 54 61
Athirah 60 68
Farid 79 60
Suri 48 52
3.7 SIMPLE LINEAR REGRESSION
• Regression analysis is a statistical technique for estimating the best-fitting line that shows the relationship between the dependent and independent variables. This best-fitting line is also known as the regression line or regression equation.
• The regression equation is in the form of:
$$y = a + bx$$
Where:
𝑦 - is the dependent variable
𝑎 - is the y-intercept
𝑏 - is the slope of the line
𝑥 - is the independent variable
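• The values of 𝑎 and 𝑏 can be obtained by using the least squares method. Using this method, 𝑎 and 𝑏 are given by the following formulas:
$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} \quad \text{OR} \quad b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$
$$a = \frac{\sum y}{n} - b \, \frac{\sum x}{n}$$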
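A minimal sketch of the least squares formulas above, using the first form of 𝑏 and then 𝑎 = Σy/n − 𝑏Σx/n. The function name least_squares_line and the toy data are hypothetical. Once 𝑎 and 𝑏 are known, a prediction is simply 𝑎 + 𝑏𝑥 for the chosen value of 𝑥.

```python
def least_squares_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (a, b) for the least squares regression line y = a + b x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = sum_y / n - b * sum_x / n                   # y-intercept
    return a, b


# Hypothetical toy data, for illustration only
a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 4), round(b, 4))  # 2.2 0.6
print(round(a + b * 6, 4))       # predicted y when x = 6: 5.8
```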
• INTERPRETATION OF VALUES OF 𝒂 AND 𝒃.
𝒂 → When 𝑥 = 0, the value of 𝑦 is 𝑎.
𝒃 → For every one-unit increase in 𝑥, 𝑦 will increase (if 𝑏 is positive) or decrease (if 𝑏 is negative) by |𝑏| units.
Example: 𝑏 = 32 means that for every one-unit increase in 𝑥, 𝑦 will increase by 32 units.
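Both interpretations follow directly from the equation 𝑦 = 𝑎 + 𝑏𝑥: substituting 𝑥 = 0 gives the intercept, and subtracting the values of 𝑦 at two values of 𝑥 one unit apart leaves only the slope.

```latex
x = 0: \quad y = a + b(0) = a
\left[a + b(x + 1)\right] - \left[a + bx\right] = b
```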
3.8 COEFFICIENT OF DETERMINATION, (R²)
• The coefficient of determination measures the proportion of the variation in the dependent variable (y) that can be explained by the independent variable (x).
• The coefficient of determination is the square of the correlation coefficient 𝑟, and it is usually expressed as a percentage:
$$R^2 = r^2$$
• For example, if 𝑟 = 0.91, then:
$$R^2 = r^2 = (0.91)^2 = 0.8281 \approx 0.83$$
• INTERPRETATION OF R²
R² = 0.83 means that 83% of the total variation in y can be explained by x using the regression line.
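For simple linear regression, the identity R² = 𝑟² can be checked numerically. The sketch below, with a hypothetical function name and toy data, computes R² in two ways: as 𝑟², and as 1 − SSE/SST, where SST = Σ(y − ȳ)² is the total variation in y and SSE is the variation left unexplained by the least squares line (the labels SSE and SST are introduced here for illustration). It uses the deviation forms S_XX = Σ(x − x̄)², S_YY = Σ(y − ȳ)² and S_XY = Σ(x − x̄)(y − ȳ), which are algebraically equal to the computational forms in Section 3.5, and the slope 𝑏 = S_XY/S_XX, which is the second form of 𝑏 in Section 3.7.

```python
import math


def r_squared_two_ways(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (r**2, 1 - SSE/SST) for the least squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)  # SST: total variation in y
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = s_xy / math.sqrt(s_xx * s_yy)

    b = s_xy / s_xx          # least squares slope
    a = mean_y - b * mean_x  # least squares intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    return r ** 2, 1 - sse / s_yy


# Hypothetical toy data
print([round(v, 4) for v in r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])])  # [0.6, 0.6]
```

Here R² = 0.6 would be read as "60% of the total variation in y is explained by x using the regression line".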
EXERCISE 5
A lecturer wants to know the relationship between the number of study hours per week and the GPA obtained by 10 students selected randomly from a class. The data are given below.
Number of Study Hours 9 7 10 6 7 8 12 4 5 6
GPA 3.20 3.00 3.15 2.84 2.98 3.05 3.48 2.01 2.28 2.90
a) State the independent and dependent variables.
b) Find the Pearson’s product moment coefficient of correlation and explain its meaning.
c) Find the regression equation of GPA based on the number of study hours.
d) Explain the meaning of the regression coefficients obtained in (c).
e) Calculate the coefficient of determination and explain its meaning.
f) Estimate the GPA obtained by Lisa if she studies for 11 hours in a week.
EXERCISE 6
A supervisor at a factory that produces electrical appliances finds that there is a relationship between the age of a worker and the number of days absent. He then collected the following data from 10 randomly selected production operators.
Age (Years) 42 27 36 25 22 39 57 19 33 30
No. of Absent Days 2 7 5 9 10 4 4 8 6 5
a) Name the independent and dependent variables.
b) By calculating the product moment coefficient of correlation, determine and explain the
correlation of the age and the number of absent days.
c) Obtain the regression equation of the number of absent days on the age of the workers using the least squares method.
d) If Harez is 28 years old, what would be the expected number of absent days?
e) Calculate the coefficient of determination and explain its meaning.
EXERCISE 7
The following statistics were obtained from a survey:
$n = 19$, $\bar{X} = 1.87$, $\bar{Y} = 80.37$, $\sum XY = 2901.7$, $\sum X^2 = 70.83$, $\sum Y^2 = 124\,561$
a) Determine the strength of correlation between X and Y.
b) Find the least squares equation of Y based on X.
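Exercise 7 gives means instead of sums. Since X̄ = ΣX/n and Ȳ = ΣY/n, the sums are ΣX = nX̄ and ΣY = nȲ, and substituting them into the formulas for S_XX, S_YY and S_XY from Section 3.5 gives forms that use the means directly. The slope 𝑏 = S_XY/S_XX is the second form of 𝑏 in Section 3.7 written in this notation, and 𝑎 = Ȳ − 𝑏X̄ is the formula for 𝑎 with Σy/n and Σx/n replaced by the means.

```latex
S_{XX} = \sum X^2 - \frac{(n\bar{X})^2}{n} = \sum X^2 - n\bar{X}^2
S_{YY} = \sum Y^2 - n\bar{Y}^2, \qquad S_{XY} = \sum XY - \frac{(n\bar{X})(n\bar{Y})}{n} = \sum XY - n\bar{X}\bar{Y}
r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}}, \qquad b = \frac{S_{XY}}{S_{XX}}, \qquad a = \bar{Y} - b\bar{X}
```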
USING A CALCULATOR TO COMPUTE PEARSON’S PRODUCT MOMENT COEFFICIENT OF CORRELATION, REGRESSION INTERCEPT (𝒂) AND SLOPE (𝒃)
Calculator model: Casio fx-570MS
Enter the data into the calculator as pairs in the format 𝒙 , 𝒚.
Press               Function
SHIFT CLR 1 =       Clear all memory
MODE MODE 2 1       Linear regression (REG) mode
Then enter each data pair as x , y and press M+.
SHIFT 1 1 =         Σx²
SHIFT 1 2 =         Σx
SHIFT 1 3 =         n
SHIFT 1 ► 1 =       Σy²
SHIFT 1 ► 2 =       Σy
SHIFT 1 ► 3 =       Σxy
SHIFT 2 ► ► 3 =     𝑟 (Pearson’s product moment coefficient)
SHIFT 2 ► ► 2 =     B (regression slope, 𝑏)
SHIFT 2 ► ► 1 =     A (regression intercept, 𝑎)
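To cross-check the readings from the calculator, the same quantities can be computed in a few lines of Python. This is a sketch with a hypothetical function name and toy data; the dictionary keys simply mirror the labels in the table above (A for the intercept 𝑎, B for the slope 𝑏) and are not part of any calculator interface.

```python
import math


def regression_summary(pairs: list[tuple[float, float]]) -> dict[str, float]:
    """Compute n, the sums, r, B (slope b) and A (intercept a) from (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two (x, y) pairs")
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sx2 = sum(x * x for x, _ in pairs)
    sy2 = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)

    s_xx = sx2 - sx ** 2 / n
    s_yy = sy2 - sy ** 2 / n
    s_xy = sxy - sx * sy / n
    b = s_xy / s_xx
    return {
        "sum_x2": sx2, "sum_x": sx, "n": n,
        "sum_y2": sy2, "sum_y": sy, "sum_xy": sxy,
        "r": s_xy / math.sqrt(s_xx * s_yy),
        "B": b,
        "A": sy / n - b * sx / n,
    }


# Hypothetical toy data entered as (x, y) pairs, like "x , y M+" on the calculator
summary = regression_summary([(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)])
for key, value in summary.items():
    print(key, round(value, 4))
```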
TUTORIAL 3
REVIEW QUESTIONS 6
Textbook page 163
Please answer all the questions listed below and show your calculations clearly.
Questions: 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17, 18, 21, 22 and 25