Understanding Measures of Correlation
LEARNING OUTCOMES
After successfully completing this module, the student should be able to:
1. Identify different measures of correlation.
2. Differentiate Pearson’s r and Spearman’s rho correlation.
3. Demonstrate understanding and skill in using linear regression.
4. Define commonly used terms in regression.
PRE-TEST
Directions: Read the following statements and choose the correct answer from the box below.
____________1. The degree of relationship between the variables under consideration is
measured through correlation analysis.
____________2. It analyzes and recognizes more than two variables but considers only two
variables, keeping the others constant.
____________3. It is a graph of observed plotted points where each point represents the
values of X and Y as a coordinate.
____________4. It is the ratio of change between the two variables, either in the same direction
or in the opposite direction, and the graphical representation of one variable with respect to the
other variable is a straight line.
____________5. It is also called Pearson’s R.
____________6. It is a statistical method used in finance, investing, and other disciplines that
attempts to determine the strength and character of the relationship between one dependent
variable (usually denoted by Y) and a series of other variables (known as independent
variables).
____________7. The graphical representation of the two variables will be a curved line. Such
a relationship between the two variables is termed as _______________.
____________8. It means no relationship between the two variables X and Y.
____________9. It determines the strength and direction of the monotonic
relationship between your two variables rather than the strength and
direction of the linear relationship between your two variables.
____________10. It quantifies the relationship between one or more predictor variable(s) and
one outcome variable.
T R E G R E S S I O N
N J D A F Q J D S O A
E M L S D R N F I J M
I O K ‘S H H V T M U R
C D W N J I A O P T A
I E H O O L L R L C E
F L F S E I U O E U P
F T S R P N E Y K D S
E Q R A A E M E I O K
O O B E S A N D L R J
C Q V P F R Y Z O P Q
Measures of Correlation
The degree of relationship between the variables under consideration is measured through
correlation analysis. The measure of correlation is called the correlation coefficient.
Correlation is a statistical tool that helps to measure and analyze the degree of relationship
between two variables. Correlation analysis deals with the association between two or more
variables.
1.2 Discussion
Types of Correlation
Correlation: Simple, Multiple, Partial, Total
• Simple correlation: Under simple correlation, only two variables are studied.
• Multiple correlation: Under multiple correlation, three or more variables
are studied. Ex. Qd = f(P, PC, PS, t, y)
• Partial correlation: The analysis recognizes more than two variables but considers only
two of them, keeping the others constant.
• Total correlation: It is based on all the relevant variables, which is normally not
feasible.
Example:
X = 1, 2, 3, 4, 5, 6, 7, 8
Y = 5, 7, 9, 11, 13, 15, 17, 19
Y = 3 + 2X
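The example above can be checked numerically. The following is a minimal sketch in Python: it confirms that every pair lies on the line Y = 3 + 2X and then computes the correlation coefficient using the raw-sum form of Pearson’s r that this module introduces later. The helper name `pearson_r` is hypothetical and is not part of the module.

```python
import math

# Data from the example above.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 7, 9, 11, 13, 15, 17, 19]

# Check that every (x, y) pair lies exactly on the line Y = 3 + 2X.
on_line = all(yi == 3 + 2 * xi for xi, yi in zip(x, y))


def pearson_r(xs, ys):
    """Pearson's r from raw sums (hypothetical helper for illustration)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(x, y)
print(on_line, r)
```

Because the points fall exactly on a rising straight line, the coefficient comes out as a perfect positive correlation.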
Types of Correlation
Interpretation of Value
Pearson’s R
Lesson 2
Meaning
A correlation coefficient of 1 means that for every positive increase in one variable,
there is a positive increase of a fixed proportion in the other. For example, shoe sizes
go up in (almost) perfect correlation with foot length.
A correlation coefficient of -1 means that for every positive increase in one variable,
there is a decrease of a fixed proportion in the other. For example, the
amount of gas in a tank decreases in (almost) perfect correlation with speed.
A correlation coefficient of zero means that an increase in one variable is not accompanied
by any consistent increase or decrease in the other. The two variables have no linear
relationship.
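Pearson’s R formula:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where n is the number of pairs of values.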
Example:
Find the value of the Pearson correlation coefficient from the following table:
Subject   Age (x)   Glucose Level (y)
1         43        99
2         21        65
3         25        79
4         42        75
5         57        87
6         59        81
Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99
= 4,257.
Subject   Age (x)   Glucose Level (y)   xy      x²      y²
1         43        99                  4257
2         21        65                  1365
3         25        79                  1975
4         42        75                  3150
5         57        87                  4959
6         59        81                  4779
Step 3: Take the square of the numbers in the x column, and put the result in the x² column.
Step 4: Take the square of the numbers in the y column, and put the result in the y² column.
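Subject   Age (x)   Glucose Level (y)   xy      x²      y²
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561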
Step 5: Add up all of the numbers in the columns and put the result at the bottom of
the column. The Greek letter sigma (Σ) is a short way of saying “sum of” or summation.
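Subject   Age (x)   Glucose Level (y)   xy      x²      y²
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561
Σ         247       486                 20485   11409   40022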
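Step 6: Substitute the totals into the Pearson’s R formula, with n = 6.
$$
\begin{aligned}
r &= \frac{122{,}910 - 120{,}042}{\sqrt{[68{,}454 - 61{,}009]\,[240{,}132 - 236{,}196]}} \\
  &= \frac{2{,}868}{\sqrt{[7{,}445]\,[3{,}936]}} \\
  &= \frac{2{,}868}{\sqrt{29{,}303{,}520}} \\
  &= \frac{2{,}868}{5{,}413.27} \\
  &= 0.5298
\end{aligned}
$$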
Our result is r = 0.5298, which means the variables have a moderate positive
correlation.
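The steps above can be sketched in Python as a cross-check of the hand computation. This is only an illustration under the assumption that the raw-sum form of Pearson’s r is used; the variable names are chosen here and do not come from the module.

```python
import math

# Data from the age / glucose level table above.
age = [43, 21, 25, 42, 57, 59]        # x
glucose = [99, 65, 79, 75, 87, 81]    # y

n = len(age)

# Column totals (Step 5).
sum_x = sum(age)
sum_y = sum(glucose)
sum_xy = sum(x * y for x, y in zip(age, glucose))
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in glucose)

# Substitute the totals into the raw-sum formula (Step 6).
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 4))  # 0.5298
```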
Example No. 2
Find the number of pairs of values, which is denoted by n. Suppose x consists of
3 values: 6, 8, 10. Suppose y consists of the 3 corresponding values: 12, 10, 20.
Here n = 3.
Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
x y xy x² y²
6 12
8 10
10 20
Step 5: Add up all of the numbers in the columns and put the result at the bottom of the
column. The Greek letter sigma (Σ) is a short way of saying “sum of” or summation.
x y xy x² y²
6 12 72 36 144
8 10 80 64 100
10 20 200 100 400
∑ 24 42 352 200 644
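Step 6: Use the following correlation coefficient formula.
$$
\begin{aligned}
r &= \frac{3(352) - (24)(42)}{\sqrt{[3(200) - (24)^2]\,[3(644) - (42)^2]}} \\
  &= \frac{1056 - 1008}{\sqrt{[600 - 576]\,[1932 - 1764]}} \\
  &= \frac{48}{\sqrt{[24]\,[168]}} \\
  &= \frac{48}{\sqrt{4032}} \\
  &= 0.7559
\end{aligned}
$$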
Our result is r = 0.7559, which means the variables have a STRONG
positive correlation.
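As a cross-check, the same coefficient can be computed from deviations about the means instead of raw sums. The sketch below assumes this deviation form, which is algebraically equivalent to the raw-sum formula used in the example; it is not the method shown in the module.

```python
import math

# Data from Example No. 2 above.
x = [6, 8, 10]
y = [12, 10, 20]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Deviations of each value from its mean.
dx = [xi - mean_x for xi in x]
dy = [yi - mean_y for yi in y]

# Sum of cross-products and sums of squared deviations.
cross = sum(a * b for a, b in zip(dx, dy))
ss_x = sum(a * a for a in dx)
ss_y = sum(b * b for b in dy)

r = cross / math.sqrt(ss_x * ss_y)
print(round(r, 4))  # 0.7559
```

Both forms give the same value because the numerator and denominator of the raw-sum formula are each n times the corresponding deviation quantities.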
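Spearman’s rank correlation formula:
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
Here,
n = number of data points of the two variables
dᵢ = difference in ranks of the “ith” element
The Spearman coefficient, ρ, can take a value between -1 and +1, where +1 indicates a
perfect positive association of ranks, 0 indicates no association of ranks, and -1 indicates
a perfect negative association of ranks.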
Example:
The scores for nine students in physics and math are as follows:
Compute the students’ ranks in the two subjects and compute the Spearman rank correlation.
Step 1: Find the ranks for each individual subject. To rank by hand, order the scores from
highest to lowest; assign the rank 1 to the highest score, 2 to the next highest, and so on.
Step 2: Add a third column, d, to your data. The d is the difference between ranks. For
example, the first student’s physics rank is 3 and math rank is 5, so the difference is -2.
In a fourth column, square your d values.
$$
\begin{aligned}
\rho &= 1 - \frac{6(12)}{(9)(9^2 - 1)} \\
     &= 1 - \frac{6(12)}{(9)(81 - 1)} \\
     &= 1 - \frac{72}{(9)(80)} \\
     &= 1 - \frac{72}{720} \\
     &= 1 - 0.10 \\
     &= 0.9
\end{aligned}
$$
The Spearman’s rank correlation for this data is 0.9. As mentioned above, a ρ value
near +1 indicates a nearly perfect positive association of ranks.
Example No.2
Calculate the Spearman’s rank correlation coefficient for the following data
Candidates
Geography 75 40 52 65 60
English 25 42 35 20 33
Step 1: Find the ranks for each individual subject. To rank by hand, order the scores from
highest to lowest; assign the rank 1 to the highest score, 2 to the next highest, and so on.
Geography   Rank (x)   English   Rank (y)
75          1          25        4
40          5          42        1
52          4          35        2
65          2          20        5
60          3          33        3
Step 2: Add a third column, dᵢ, to your data. The dᵢ is the difference between ranks. For
example, the first student’s geography rank is 1 and English rank is 4, so the difference is -3.
In a fourth column, square your dᵢ values.
Geography   Rank (x)   English   Rank (y)   dᵢ    dᵢ²
75          1          25        4          -3    9
40          5          42        1          4     16
52          4          35        2          2     4
65          2          20        5          -3    9
60          3          33        3          0     0
Σ                                                 38
$$
\begin{aligned}
\rho &= 1 - \frac{6(38)}{(5)(5^2 - 1)} \\
     &= 1 - \frac{228}{(5)(24)} \\
     &= 1 - \frac{228}{120} \\
     &= 1 - 1.9 \\
     &= -0.9
\end{aligned}
$$
The Spearman’s rank correlation for this data is -0.9. As mentioned above, a ρ value
near -1 indicates a strong negative association of ranks.
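The ranking and substitution steps of this example can be sketched in Python as follows. The helper `rank_descending` is a hypothetical name introduced here, and it assumes there are no tied scores, which holds for this data.

```python
# Scores from Example No. 2 above.
geography = [75, 40, 52, 65, 60]
english = [25, 42, 35, 20, 33]


def rank_descending(scores):
    """Rank 1 goes to the highest score. Assumes no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


# Step 1: rank each subject separately.
rank_x = rank_descending(geography)
rank_y = rank_descending(english)

# Step 2: rank differences and their squares.
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
sum_d2 = sum(di ** 2 for di in d)

# Substitute into rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
n = len(d)
rho = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print(rank_x, rank_y, d, sum_d2)
print(round(rho, 2))  # -0.9
```

With tied scores this simple ranking would no longer be valid, since ties are usually given averaged ranks.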
Regression
WHAT IS REGRESSION AND LINEAR REGRESSION?
TYPES OF REGRESSION
LINEAR REGRESSION
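Y = a + bX
Where:
a = the intercept.
b = the slope.
This equation is often referred to as THE PREDICTION LINE. The predicted value of Y equals
the Y intercept plus the slope multiplied by the value of X.
Ŷᵢ = b₀ + b₁Xᵢ
Where:
Ŷᵢ = predicted value of Y for observation i
b₀ = sample Y intercept
b₁ = sample slope
Xᵢ = value of X for observation i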
A regression line is the “best fit” line for your data. You basically draw a line that best
represents the data points. It’s like an average of where all the points line up. In linear
regression, the regression line is a perfectly straight line.
Example
X (units of fertilizer used in corn field)   0.3   0.6   0.9   1.2   1.5   1.8   2.1   2.4
Y (corn yield)                               10    15    30    35    25    30    50    45
Solution:
$$b = \frac{\sum x \sum y - n\sum xy}{\left(\sum x\right)^2 - n\sum x^2} \quad\text{and}\quad a = \frac{\sum y - b\sum x}{n}$$
x      y     xy      x²      y²
0.3    10    3.0     0.09    100
0.6    15    9.0     0.36    225
0.9    30    27.0    0.81    900
1.2    35    42.0    1.44    1225
1.5    25    37.5    2.25    625
1.8    30    54.0    3.24    900
2.1    50    105.0   4.41    2500
2.4    45    108.0   5.76    2025
Σ 10.8 240   385.5   18.36   8500
Substituting into the Pearson’s R formula, with n = 8:
$$r = \frac{8(385.5) - 10.8(240)}{\sqrt{8(18.36) - 116.64}\;\sqrt{8(8500) - 57{,}600}}$$
r = 0.877
r² = 0.77
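The fertilizer example can be worked end to end with a short Python sketch. It uses the slope and intercept formulas from the Solution above together with the raw-sum form of r. The slope and intercept values it produces are computed here and are not stated in the module, and the input `x_new = 1.0` used for the prediction is a hypothetical value chosen only for illustration.

```python
import math

# Data from the fertilizer / corn yield example above.
fertilizer = [0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4]   # x
corn_yield = [10, 15, 30, 35, 25, 30, 50, 45]           # y

n = len(fertilizer)
sum_x = sum(fertilizer)
sum_y = sum(corn_yield)
sum_xy = sum(x * y for x, y in zip(fertilizer, corn_yield))
sum_x2 = sum(x * x for x in fertilizer)
sum_y2 = sum(y * y for y in corn_yield)

# Slope b and intercept a of the line Y = a + bX.
b = (sum_x * sum_y - n * sum_xy) / (sum_x ** 2 - n * sum_x2)
a = (sum_y - b * sum_x) / n

# Correlation coefficient r and coefficient of determination r^2.
r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)
r_squared = r ** 2

# Predicted yield for a hypothetical fertilizer amount (illustration only).
x_new = 1.0
y_hat = a + b * x_new

print(round(b, 2), round(a, 2))          # slope about 16.27, intercept about 8.04
print(round(r, 3), round(r_squared, 2))  # 0.877 0.77
```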
YEAR   SALES IN MILLION EUROS   ADVERTISEMENT IN MILLION EUROS
1      651.00                   23.00
2      762.00                   26.00
3      856.00                   30.00
4      1,063.00                 34.00
5      1,190.00                 43.00
6      1,298.00                 48.00
7      1,421.00                 52.00
8      1,440.00                 57.00
9      1,518.00                 58.00
Σ      10,199.00                371.00
[Figure: scatter plot of the data, with sales on the horizontal axis and advertisement on the vertical axis.]
[Figure: the same scatter plot with two example errors/residuals marked, -15.71 and +18.22.]
[Figure: the same scatter plot with the horizontal line Ȳ = 41.22, labeled “Best fit line”, and an error/residual of 18.22 marked.]
Advertisement (y)   Error/Residual (Ȳ − y)   Squared error
23.00               18.22
26.00               15.22
30.00               11.22
34.00               7.22
43.00               -1.78
48.00               -6.78
52.00               -10.78
57.00               -15.78
58.00               -16.78
Σ 371.00            0                        1437.56
The goal of simple linear regression is to create a linear model that minimizes the sum
of squares of the residuals/errors (SSE).
When conducting simple linear regression with TWO variables, we determine how well the
line “fits” the data by comparing it to the model above, the horizontal line at the mean of Y,
where we pretend the second variable does not even exist.
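The comparison described above can be sketched in Python. The sketch first computes the SSE of the mean-only model, then fits a least-squares line and computes its SSE. It assumes that sales is the X variable and advertisement is the Y variable, which is inferred from the axis ranges of the lost figures and is not stated explicitly in the module.

```python
# Data from the sales / advertisement table above.
# Assumption: sales is X and advertisement is Y.
sales = [651, 762, 856, 1063, 1190, 1298, 1421, 1440, 1518]
advertisement = [23, 26, 30, 34, 43, 48, 52, 57, 58]

n = len(advertisement)
mean_x = sum(sales) / n
mean_y = sum(advertisement) / n

# Baseline model: predict the mean of Y for every observation,
# pretending the X variable does not exist.
sse_mean = sum((y - mean_y) ** 2 for y in advertisement)

# Least-squares line Y = a + bX fitted to the same data.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(sales, advertisement)) / sum(
    (x - mean_x) ** 2 for x in sales
)
a = mean_y - b * mean_x
sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(sales, advertisement))

print(round(mean_y, 2), round(sse_mean, 2))  # 41.22 1437.56
print(sse_line < sse_mean)                   # the fitted line has the smaller SSE
```

The smaller the SSE of the fitted line relative to the SSE of the mean-only model, the better the line fits the data.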
BASIS FOR COMPARISON | CORRELATION | REGRESSION
Meaning | Correlation is a statistical measure that determines the association or co-relationship between two variables. | Regression describes how to numerically relate an independent variable to the dependent variable.
Usage | To represent a linear relationship between two variables. | To fit the best line and to estimate one variable based on another.
Dependent and independent variables | No difference | Both variables are different.
I. Directions: Write TRUE if the statement is correct and FALSE if the statement is wrong.
_________1. Pearson’s r is defined in statistics as the measurement of the strength of the
relationship between two variables and their association with each other, and
Spearman’s rho determines the strength and direction of the monotonic
relationship between your two variables rather than the strength and
direction of the linear relationship between your two variables.
_________2. Correlation is a statistical measure that determines the association or co-
relationship between two variables, while regression describes how to
numerically relate an independent variable to the dependent variable.
_________3. Partial correlation analysis recognizes more than two variables but
considers only two variables, keeping the others constant.
_________4. Positive correlation means no relationship between the two variables X and
Y.
_________5. Regression quantifies the relationship between one or more predictor
variable(s) and one outcome variable.
Find the Pearson correlation coefficient from the data below.
The table gives the ages and weights of 6 people. Use it to calculate the value of
Pearson’s R.
Subject   Age   Weight
1         40    78
2         21    70
3         25    60
4         31    55
5         38    80
6         47    66
To calculate a Spearman rank-order correlation on data without any ties, use the
following data:
Marks
English 56 75 45 71 62 64 58 80 76 61
Math 66 70 40 60 65 56 59 77 67 63
Then complete the following table and find the Spearman correlation coefficient.
English   Math   Rank (English)   Rank (Math)   d   d²
56 66
75 70
45 40
71 60
62 65
64 56
58 59
80 77
76 67
61 63