Correlation Analysis
7
INTRODUCTION
So far we have studied problems relating to one variable only. In business we come across a large
number of problems involving the use of two or more than two variables. If two quantities vary in such
a way that movements in one are accompanied by movements in the other, these quantities are said to be
correlated. For example, there exists some relationship between family income and expenditure on luxury
items, price of a commodity and amount demanded, increase in rainfall up to a point and production of
rice, an increase in the number of television licences and number of cinema admissions, etc. The statistical
tool with the help of which these relationships between two or more than two variables are studied is
called correlation*. The measure of correlation called the coefficient of correlation (denoted by the
symbol r) summarizes in one figure the direction and degree of correlation. Thus correlation analysis
refers to the techniques used in measuring the closeness of the relationship between the variables.
A very simple definition of correlation is that given by A.M. Tuttle. He defines correlation as: "An analysis
of the covariation of two or more variables is usually called correlation."
The problem of analysing the relation between different series should be broken down into three steps:
(1) Determining whether a relation exists and, if it does, measuring it;
(2) Testing whether it is significant; and
(3) Establishing the cause-and-effect relation, if any.
In this chapter only the first aspect will be discussed. For the second aspect, a reference may be made to the
chapter on Tests on Hypothesis. The third aspect in the analysis, that of establishing the cause-effect
relation, is beyond the scope of this text. An extremely high and significant correlation between the
increase in smoking and the increase in lung cancer would not prove that smoking causes lung cancer.
It should be noted that the detection and analysis of correlation (i.e., covariation) between two
statistical variables requires a relationship of some sort which associates the observations in pairs, one of
each pair being a value of each of the two variables. In general, the pairing relationship may be of almost
any nature, such as observations at the same time or place, or over a period of time or at different places.
Significance of the Study of Correlation
The study of correlation is of immense use in practical life because of the following reasons:
1. Most of the variables show some kind of relationship, for example, between price and supply, income and
expenditure, etc. With the help of correlation analysis we can measure in one figure the degree of
relationship existing between the variables.
*"When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the
relationship and expressing it in a brief formula is known as correlation." (Croxton and Cowden, Applied General Statistics.)
200 Business Statistics
POSITIVE CORRELATION
X: 10 12 11 18 20
Y: 15 20 22 25 37

X: 80 70 60 40 30
Y: 50 45 30 20 10

NEGATIVE CORRELATION
X: 20 30 40 60 80
Y: 40 30 22 15 16

X: 100 90 60 40 30
Y: 10 20 30 40 50
(ii) Simple, Partial and Multiple Correlation. The distinction between simple, partial and multiple
correlation is based upon the number of variables studied. When only two variables are studied it is
a problem of simple correlation. When three or more variables are studied it is a problem of either
multiple or partial correlation. In multiple correlation three or more variables are studied simultaneously.
For example, when we study the relationship between the yield of rice per acre and both the amount of
rainfall and the amount of fertilisers used, it is a problem of multiple correlation. Similarly, the
relationship of plastic hardness, temperature and pressure is multivariate. In partial correlation we
recognise more than two variables, but consider only two variables to be influencing each other, the
effect of other influencing variables being kept constant. For example, in the rice problem taken above, if
we limit our correlation analysis of yield and rainfall to periods when a certain average daily temperature
existed, it becomes a problem of partial correlation. In this chapter, we shall study problems relating
to simple correlation only.
(iii) Linear and Non-linear (Curvilinear) Correlation. The distinction between linear and non-linear
correlation is based upon the constancy of the ratio of change between the variables. If the amount
of change in one variable tends to bear a constant ratio to the amount of change in the other variable,
then the correlation is said to be linear. For example, observe the following two variables X and Y:
X: 10 20 30 40 50
Y: 70 140 210 280 350
It is clear that the ratio of change between the two variables is the same. If such variables are plotted
on a graph paper, all the plotted points would fall on a straight line.
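As a quick numerical check of this example, the Python sketch below (the helper names are illustrative, not from the text) computes the ratio of change between successive pairs of observations and Karl Pearson's r for the same X and Y values. A constant ratio of change goes together with r = +1.

```python
from math import sqrt

# X and Y values from the linear-correlation example above
x = [10, 20, 30, 40, 50]
y = [70, 140, 210, 280, 350]


def ratios_of_change(x, y):
    """Change in Y divided by change in X for each pair of successive observations."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


def pearson_r(x, y):
    """Karl Pearson's r using deviations from the actual means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


print(ratios_of_change(x, y))  # [7.0, 7.0, 7.0, 7.0]
print(pearson_r(x, y))         # 1.0
```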
Correlation would be called non-linear or curvilinear if the amount of change in one variable does
not bear a constant ratio to the amount of change in the other variable. For example, if we double the
amount of rainfall, the production of rice or wheat, etc., would not necessarily be doubled. It may be
pointed out that in most practical cases we find a non-linear relationship between the variables. However,
since techniques of analysis for measuring non-linear correlation are far more complicated than those for
linear correlation, we generally make an assumption that the relationship between the variables is of the
linear type.
The following two diagrams will illustrate the difference between linear and curvilinear correlation:
[Figure: Positive linear correlation; Curvilinear correlation]
... perfectly negative (i.e., r = -1) (diagram II). If the plotted points fall in a narrow band, there
would be a high degree of correlation between the variables: correlation shall be positive if the points
show a rising tendency from the lower left-hand corner to the upper right-hand corner (diagram III), and
negative if the points show a declining tendency from the upper left-hand corner to the lower right-hand
corner of the diagram (diagram IV). On the other hand, if the points are widely scattered over the diagram,
it indicates a very low degree of relationship between the variables: correlation shall be positive if the
points are rising from the lower left-hand corner to the upper right-hand corner (diagram V), and negative
if the points are running from the upper left-hand side to the lower right-hand side of the diagram
(diagram VI). If the plotted points lie on a straight line parallel to the X-axis, or in a haphazard manner,
it shows the absence of any relationship between the variables (i.e., r = 0), as shown by diagram VII.
[Figure: High degree of positive correlation (III); High degree of negative correlation (IV); Low degree of positive correlation (V); Low degree of negative correlation (VI); No correlation, r = 0 (VII)]
*This method is discussed in detail in Chapter on 'Regression Analysis'.
Illustration 1. Given the following pairs of values:
Capital employed (Rs. crore): 2 3 4 5 7 8 9 11 12
Profits (Rs. lakhs): 3 5 4 8 10 12 14
(a) Make a scatter diagram.
(b) Do you think that there is any correlation between profits and capital employed? Is it positive? Is it high or low?
Solution. By looking at the scatter diagram we can say that the variables, profits and capital employed, are correlated.
Further, correlation is positive because the trend of the points is upward, rising from the lower left-hand corner to the upper
right-hand corner of the diagram. The diagram also indicates that the degree of relationship is high because the plotted points
lie in a narrow band, which shows that it is a case of a high degree of positive correlation.
[Figure: Scatter diagram of profits (Rs. lakhs) against capital employed (Rs. crore)]
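This formula is to be used only where the deviations are taken from actual means and not from
assumed means.
The coefficient of correlation can also be calculated from the original set of observations (i.e.,
without taking deviations from the mean) by applying the following formula:
$$r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\sum X^2 - \frac{(\sum X)^2}{N}}\;\sqrt{\sum Y^2 - \frac{(\sum Y)^2}{N}}}$$
or, equivalently,
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\;\sqrt{N\sum Y^2 - (\sum Y)^2}} \qquad \ldots(iii)$$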
The value of the coefficient of correlation as obtained by the above formula shall always lie between
+1 and -1. When r = +1, it means there is perfect positive correlation between the variables. When r = -1, it
means there is perfect negative correlation between the variables. When r = 0, it means there is no (linear)
relationship between the two variables. However, in practice, such values of r as +1, -1 and 0 are rare.
We normally get values which lie between +1 and -1, such as 0.8, -0.4, etc. The coefficient of correlation
describes not only the magnitude of correlation but also its direction. Thus, +0.8 would mean that
correlation is positive because the sign of r is +ve, and the magnitude of correlation is 0.8.
The following illustration will clarify the procedure of computing the coefficient of correlation:
Illustration 2. Find the correlation coefficient between the sales and expenses from the data given below:
Firm:                   1   2   3   4   5   6   7   8   9   10
Sales (Rs. lakhs):      50  50  55  60  65  65  65  60  60  50
Expenses (Rs. lakhs):   11  13  14  16  16  15  15  14  13  13
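A minimal Python sketch of formula (iii), which works directly from the original observations, is shown below applied to the sales and expenses of Illustration 2. The function name is illustrative; only the raw sums ΣX, ΣY, ΣX², ΣY² and ΣXY are needed, so no deviation columns are built.

```python
from math import sqrt


def pearson_r_raw(x, y):
    """Formula (iii): r from the original observations, without deviations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    numerator = n * sxy - sx * sy
    denominator = sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    return numerator / denominator


# Data of Illustration 2 (Rs. lakhs)
sales = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]
expenses = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]

print(round(pearson_r_raw(sales, expenses), 3))  # 0.787
```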
1
Correlation Analysis
Introduction
Meaning
The term correlation implies a relationship between two variables. Statistically, when with the change
in direction of one variable there is a corresponding change in the direction of the other, the two
variables are said to be correlated. For example, there may be a change in the salary of a person (X)
with the increase in the years of experience (Y). The degree of change is expressed by a coefficient
which ranges between -1 and +1. An absence of correlation is indicated by zero. Correlation has nothing
to do with the units in which the variables are expressed.
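The unit-free property can be checked numerically. The sketch below uses hypothetical salary and experience figures (invented for illustration, not from the text) and shows that expressing experience in months and salary in thousands of rupees leaves r unchanged. This holds for any change of origin or positive change of scale; a negative scale factor would only flip the sign of r.

```python
from math import sqrt


def pearson_r(x, y):
    """Karl Pearson's r using deviations from the actual means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical figures: years of experience and monthly salary in rupees
experience_years = [1, 3, 4, 6, 8, 10]
salary_rs = [21000, 26000, 30000, 33000, 41000, 45000]

r_original = pearson_r(experience_years, salary_rs)

# Same data in other units: experience in months, salary in thousands of rupees
r_rescaled = pearson_r([12 * e for e in experience_years],
                       [s / 1000 for s in salary_rs])

print(abs(r_original - r_rescaled) < 1e-12)  # True
```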
Definitions
Some of the important definitions of correlation are:
According to Croxton and Cowden, "When the relationship is of a quantitative nature, the appropriate
statistical tool for discovering and measuring the relationship and expressing it in a brief formula is
known as correlation."
According to Ya Lun Chou, "Correlation analysis attempts to determine the 'degree of relationship'
between variables."
According to W.I. King, "Correlation means that between two series or groups of data there exists some
causal connection."
According to Wessel and Willet, "When a group of items are recorded with respect to the values of two
distinct variables and it is found that pairs of values tend to be associated, the two variables are said
to be correlated."
According to L.R. Connor, "If two or more quantities vary in sympathy so that movements in the one tend
to be accompanied by corresponding movements in the other(s), then they are said to be correlated."
IMPORTANCE OR SIGNIFICANCE OF CORRELATION OR USES OF CORRELATION
If the corresponding values of two variables with linear correlation are plotted on a
graph, a straight line is obtained. Mathematically, this relationship may be expressed as
Y = a + bX
The correlation is said to be non-linear (curvilinear) if the amount of change in one
variable does not bear a constant ratio to the amount of change in the other related
variable. For example, if we double the use of fertilizers, the agricultural production
would not necessarily be doubled.
Correlation, generally speaking, refers to a linear relationship.
DEGREE OF CORRELATION
The degree or the intensity of relationship between two variables can be determined by computing
the value of the coefficient of correlation. On the basis of the coefficient of correlation, the degree
of correlation may be of three types:
(1) Perfect correlation: If the relationship between two variables is such that with an increase in
the value of one variable, the value of the other variable increases or decreases in a fixed proportion,
correlation between them is said to be perfect. It is of two types:
(a) Perfect positive correlation: If both the series move in the same direction and in the same
proportion, there would be perfect positive correlation between them. The coefficient of correlation in
this case would be +1.
(b) Perfect negative correlation: If both the series move in reverse directions and in the same
proportion, there would be perfect negative correlation between them. The coefficient of correlation in
this case would be -1.
Perfect correlation is obtained when there is complete mutual dependence between the two series.
(2) Limited degree of correlation: Limited degree of correlation is common in economic, business and
social activities and can be very high, high, moderate, low or very low. It is of two types:
(a) Limited positive correlation: If there are unequal changes in two variables in the same direction,
correlation is said to be limited positive. The coefficient of correlation in this case would be between
0 and +1.
(b) Limited negative correlation: If there are unequal changes in two variables in the opposite
direction, the correlation is said to be limited negative. The coefficient of correlation in this case
would be between 0 and -1.
(3) Absence of correlation: If no relationship is observed between the two variables, it is known as
absence of correlation. The coefficient of correlation in this case would be 0 (zero).
The following chart shows the degrees of correlation according to Karl Pearson's formula:
DEGREE OF CORRELATION
Degree                        Positive            Negative
1. Perfect correlation        +1                  -1
2. Limited degree
   (a) Very high              +0.90 to +0.99      -0.90 to -0.99
   (b) Fairly high            +0.75 to +0.90      -0.75 to -0.90
   (c) Moderate               +0.50 to +0.75      -0.50 to -0.75
   (d) Low                    +0.25 to +0.50      -0.25 to -0.50
   (e) Very low               0 to +0.25          0 to -0.25
3. Absence of correlation     0                   0
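The chart can be turned into a small lookup function. This is a sketch only: the chart's ranges share their end points, so the code assumes each lower bound is inclusive, and it treats values strictly between 0.99 and 1 as "very high". The function name is hypothetical.

```python
def degree_of_correlation(r):
    """Classify r using the ranges of the chart (boundary handling is an assumption)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "Absence of correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        label = "Perfect"
    elif size >= 0.90:
        label = "Very high"
    elif size >= 0.75:
        label = "Fairly high"
    elif size >= 0.50:
        label = "Moderate"
    elif size >= 0.25:
        label = "Low"
    else:
        label = "Very low"
    return f"{label} {direction} correlation"


for value in (1, 0.95, -0.80, 0.60, -0.30, 0.10, 0):
    print(value, "->", degree_of_correlation(value))
```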
METHODS OF DETERMINING SIMPLE CORRELATION
Initially, we confine ourselves to simple linear correlation. The different methods of finding simple
linear correlation are as follows:
(I) Graphic Methods
(a) Correlation Graph
(b) Scatter Diagram (or Dotogram)
(II) Algebraic (or Mathematical) Methods
(a) Karl Pearson's Coefficient of Correlation (or Covariance Method)
(b) Spearman's Rank Coefficient of Correlation (or Rank Difference Method)
(c) Concurrent Deviation Method
(d) Least Squares Method (or Method of Least Squares)
(I) CORRELATION GRAPH
Under this method, two curves are drawn on the graph, one for each of the variables under study. The
values of the two related variables are represented on the y-axis and the values of a common reference,
viz. time, place, etc., are represented on the x-axis, i.e., the base line. Such graphs can be drawn
either on a natural scale or on a semi-logarithmic (ratio) scale, depending upon the magnitude of the
data. Further, if the minimum values of the variables are much above zero, a false base line is drawn in
order to avoid unnecessary empty spaces in the graph.
Interpretation of Correlation Graph
After viewing the graph, the inference about the nature and degree of correlation is drawn roughly by
observing the direction and closeness of the two curves:
(1) Perfect positive correlation: If both the curves drawn on the graph are moving in the same
direction (either upward or downward), the correlation is perfect positive.
(2) Perfect negative correlation: If both the curves move in different directions, i.e., one moves
upward and the other downward or vice versa, it would indicate a perfect negative correlation between
the variables.
(3) Absence of correlation: If the curves move criss-cross and show erratic movements, it would
indicate that either there is no correlation or there is a very low degree of correlation between the two
variables under study.
Illustration 1.
From the following data, study the correlation between the two variables of income and expenditure
(Rs. crores) using the graphic method:
Year: 2008 2009 2010 2011 2012 2013 2014 2015
7 10 6 5 7
Income 4 5
5 4 3 2 8
Expenditure 6
Solution :
[Figure: Correlation graph of income and expenditure (in Rs. crore) over the years]
Conclusion: It is clear from the above correlation graph that when the income curve is rising, the
expenditure curve is falling, and vice versa. Thus, there is a negative correlation between the income
and expenditure variables.
Illustration 2.
Prepare a correlation graph on the basis of the following data and comment about the correlation
between age and blood-pressure:
S. No.:            1    2    3    4    5    6    7    8    9
Age (years):       55   40   70   35   60   45   55   50   40
Blood-pressure:    145  125  160  120  150  130  150  145  140
Solution:
[Figure: Correlation graph of age (years) and blood-pressure against S. No.]
Conclusion: As both curves move in the same direction and are also very close to each other, there is
a high degree of positive correlation between age and blood-pressure.
Advantages
1. It is the simplest to use.
2. It can be used for simple as well as multiple correlation.
Disadvantage
We can draw only a rough estimate of the nature of correlation; the exact degree of correlation may not
be known.
(II) SCATTER DIAGRAM (OR DOTOGRAM OR SCATTERGRAM)
Under this method, a diagram is prepared on the basis of corresponding values of two variables. The
values of one of the variables are represented on the x-axis and those of the other variable on the
y-axis, through a natural scale. For each pair of X and Y values, we mark a dot, and we get as many
points on the graph as the number of observations. The diagram of dots so obtained is called a scatter
diagram.
By examining the shape of the plotted dots, the degree of correlation between the variables can be
estimated.
Interpretation of Scatter Diagram
After viewing the scatter diagram, the inference about the nature and degree of correlation is drawn as
follows:
(1) Positive or Negative Correlation: If the trend of the points is upward, rising from the lower left
to the upper right, the correlation is positive, since it shows that the values of the two variables move
in the same direction. On the other hand, if the trend of the points is reverse, from the upper left to the
bottom right, the correlation is negative.
[Figure: Scatter diagrams showing positive correlation, negative correlation, high degree of positive correlation, low degree of positive correlation and no correlation]
Illustration 3.
From the following pair of data, study the correlation through a scatter diagram:
X: 10 20 30 40 50
6 10
4
Y
Solution:
[Figure: Scatter diagram of values of Y against values of X]
[Figure: Scatter diagram of marks in Accountancy against marks in Statistics]
Conclusion: It is clear from the above graph that there is a very high degree of positive correlation
between marks in Statistics and Accountancy.
Advantages
(1) Simple: It is a simple and attractive method for determining the correlation between variables.
(2) Easy to draw and interpret: It is very easy to draw a scatter diagram, and a common man can
understand it easily and make interpretations.
(3) Useful in estimating the value of a missing dependent variable when the value of the independent
variable is given.
(4) Useful in detecting abnormal variations or questionable data.
(5) No effect of extreme values: Values of extreme items do not affect the result. Such points remain
isolated in the diagram.
Disadvantages
(1) It does not provide the precise degree of correlation: It shows only a visual picture of the
relationship between two variables, which indicates the direction of relationship, i.e., positive or
negative, etc.
(2) Not suitable for further mathematical treatment, since it does not provide an exact measure of the
extent of relationship between the variables.
(III) KARL PEARSON'S COEFFICIENT OF CORRELATION
Karl Pearson's method, popularly known as Pearson's coefficient of correlation, is most widely used in
practice. It is denoted by the symbol 'r' and is based on the covariance of the concerned variables.
The formula for computing Pearsonian 'r' can take various alternative forms, depending upon the choice
of the user.
Karl Pearson's Co-variance
Co-variance is the method of finding out joint variations between two variables. It is given by the
formula
$$\text{Cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N} = \frac{\sum dx\,dy}{N}$$
Illustration 7.
Calculate the coefficient of correlation from the following data through Karl Pearson's method:
X: 12  9  8  10  11  13  7
Y: 14  8  6   9  11  12  3
[CCS Univ. BBA, June 2012]
Solution:
CALCULATION OF COEFFICIENT OF CORRELATION
X     dx = X - X̄   dx²    Y     dy = Y - Ȳ   dy²    dxdy
12     2            4      14     5           25     10
9     -1            1      8     -1            1      1
8     -2            4      6     -3            9      6
10     0            0      9      0            0      0
11     1            1      11     2            4      2
13     3            9      12     3            9      9
7     -3            9      3     -6           36     18
ΣX = 70            Σdx² = 28   ΣY = 63      Σdy² = 84   Σdxdy = 46
X̄ = ΣX/N = 70/7 = 10
Ȳ = ΣY/N = 63/7 = 9
$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \times \sum dy^2}} = \frac{46}{\sqrt{28 \times 84}} = \frac{46}{48.497} = 0.95$$
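The same result can be reached through the covariance route, r = Cov(X, Y) / (σx σy). The Python sketch below is illustrative (function names are not from the text) and uses divisor N throughout, as in the covariance formula; using N - 1 consistently in both numerator and denominator would give the same r. It is applied to the data of Illustration 7.

```python
from math import sqrt


def covariance(x, y):
    """Covariance with divisor N: sum of products of deviations from the means / N."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def std_dev(v):
    """Standard deviation with divisor N (the covariance of a series with itself, square-rooted)."""
    return sqrt(covariance(v, v))


def pearson_r(x, y):
    return covariance(x, y) / (std_dev(x) * std_dev(y))


# Data of Illustration 7
x = [12, 9, 8, 10, 11, 13, 7]
y = [14, 8, 6, 9, 11, 12, 3]

print(round(covariance(x, y), 4))  # 6.5714  (that is, 46 / 7)
print(round(pearson_r(x, y), 2))   # 0.95
```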
Illustration 8.
Calculate Karl Pearson's correlation coefficient for the following data:
X: 6   2  10  4  8
Y: 9  11   5  8  7
[CCS Univ. BCA, Dec. 2012]
Solution:
CALCULATION OF COEFFICIENT OF CORRELATION
X     dx = X - X̄   dx²    Y     dy = Y - Ȳ   dy²    dxdy
6      0            0      9      1            1      0
2     -4           16      11     3            9    -12
10     4           16      5     -3            9    -12
4     -2            4      8      0            0      0
8      2            4      7     -1            1     -2
ΣX = 30            Σdx² = 40   ΣY = 40      Σdy² = 20   Σdxdy = -26
X̄ = ΣX/N = 30/5 = 6
Ȳ = ΣY/N = 40/5 = 8
The coefficient of correlation between X and Y is
$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \times \sum dy^2}} = \frac{-26}{\sqrt{40 \times 20}} = \frac{-26}{28.284} = -0.92$$
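The steps of the actual-mean method, building the dx, dx², dy, dy² and dxdy columns and then summing them, can be sketched in Python as below for the X and Y values of Illustration 8 (X: 6, 2, 10, 4, 8; Y: 9, 11, 5, 8, 7). Variable names are illustrative.

```python
from math import sqrt

# Data of Illustration 8
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n  # 6.0 and 8.0

# Build the rows of the calculation table: X, dx, dx², Y, dy, dy², dxdy
rows = []
for xi, yi in zip(x, y):
    dx, dy = xi - x_bar, yi - y_bar
    rows.append((xi, dx, dx * dx, yi, dy, dy * dy, dx * dy))

sum_dx2 = sum(row[2] for row in rows)
sum_dy2 = sum(row[5] for row in rows)
sum_dxdy = sum(row[6] for row in rows)

r = sum_dxdy / sqrt(sum_dx2 * sum_dy2)
print(sum_dx2, sum_dy2, sum_dxdy)  # 40.0 20.0 -26.0
print(round(r, 2))                 # -0.92
```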
(2) Short-cut method: This method may be used when the arithmetic means of both the series are not in
whole numbers. Under this method, the deviations are taken from assumed means. Steps involved are:
(a) Select any number as the assumed mean (from or outside the series), say A for series X and B for
series Y.
(b) Compute deviations from A in the first series, i.e., (X - A), and denote them by dx. Similarly,
compute deviations from B in the second series, i.e., (Y - B), and denote them by dy. Sum them to obtain
Σdx and Σdy.
(c) Multiply the deviations of both the series and sum the products to obtain Σdxdy.
(d) Square the deviations of both the series and obtain the sums of their respective squares of
deviations, i.e., Σdx² and Σdy².
(e) Finally, use the following formula to get the value of the coefficient of correlation:
$$r = \frac{\sum dx\,dy - \frac{\sum dx \sum dy}{N}}{\sqrt{\sum dx^2 - \frac{(\sum dx)^2}{N}}\;\sqrt{\sum dy^2 - \frac{(\sum dy)^2}{N}}}$$
Note: 1. The term Σdx Σdy/N in the numerator, and the terms (Σdx)²/N and (Σdy)²/N in the denominator, are
corrections introduced due to the use of assumed means.
2. The above formula, although it looks large and complex, saves a lot of computational work.
Illustration 9.
Calculate Karl Pearson's coefficient of correlation between exports and imports:
Exports: 42 44 58 55 89 98 66
Imports: 56 49 53 58 65 76 58
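The short-cut method can be sketched in Python as below; the function name and the assumed means a = 55 and b = 58 are illustrative choices, not from the text. Running it on the exports and imports of Illustration 9 with two different pairs of assumed means shows that the correction terms make the result independent of the choice of A and B.

```python
from math import sqrt


def pearson_r_assumed_means(x, y, a, b):
    """Short-cut method: deviations taken from assumed means a (for X) and b (for Y)."""
    n = len(x)
    dx = [xi - a for xi in x]
    dy = [yi - b for yi in y]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    # The correction terms compensate for a and b not being the actual means
    numerator = sum_dxdy - sum_dx * sum_dy / n
    denominator = sqrt(sum_dx2 - sum_dx ** 2 / n) * sqrt(sum_dy2 - sum_dy ** 2 / n)
    return numerator / denominator


# Data of Illustration 9
exports = [42, 44, 58, 55, 89, 98, 66]
imports = [56, 49, 53, 58, 65, 76, 58]

r1 = pearson_r_assumed_means(exports, imports, a=55, b=58)
r2 = pearson_r_assumed_means(exports, imports, a=0, b=0)
print(round(r1, 3), round(r2, 3))  # 0.904 0.904
```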
Illustration 10.
From the following information relating to the Stock Exchange quotations for two shares A and B,
ascertain by using Pearson's coefficient of correlation whether the prices of shares A and B are
correlated:
Price of share A (₹): 160 164 172 182 166 170 178
Price of share B (₹): 292 280 260 234 266 254 230
Solution:
CALCULATION OF COEFFICIENT OF CORRELATION
(Assumed means: A = 170 for share A, B = 260 for share B)
Share A                          Share B
X     dx = X - A    dx²          Y     dy = Y - B    dy²      dxdy
160   -10           100          292    32           1,024    -320
164    -6            36          280    20             400    -120
172     2             4          260     0               0       0
182    12           144          234   -26             676    -312
166    -4            16          266     6              36     -24
170     0             0          254    -6              36       0
178     8            64          230   -30             900    -240
N = 7  Σdx = 2      Σdx² = 364   N = 7  Σdy = -4     Σdy² = 3,072   Σdxdy = -1,016
$$r = \frac{\sum dx\,dy - \frac{\sum dx \sum dy}{N}}{\sqrt{\sum dx^2 - \frac{(\sum dx)^2}{N}}\;\sqrt{\sum dy^2 - \frac{(\sum dy)^2}{N}}} = \frac{-1{,}016 - \frac{(2)(-4)}{7}}{\sqrt{364 - \frac{(2)^2}{7}}\;\sqrt{3{,}072 - \frac{(-4)^2}{7}}} = \frac{-1{,}016 + 1.143}{\sqrt{363.43}\;\sqrt{3{,}069.71}} = \frac{-1{,}014.86}{1{,}056.23} = -0.961$$
Conclusion: There is a very high degree of negative correlation between the prices of shares A and B.
Product Moment Correlation
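$$r = \frac{\sum dx\,dy - \frac{\sum dx \sum dy}{N}}{\sqrt{\sum dx^2 - \frac{(\sum dx)^2}{N}}\;\sqrt{\sum dy^2 - \frac{(\sum dy)^2}{N}}} = \frac{\sum dx\,dy - \frac{(55)(-21)}{10}}{\sqrt{9{,}155 - \frac{(55)^2}{10}}\;\sqrt{4{,}999 - \frac{(-21)^2}{10}}} = \frac{-2{,}935}{\sqrt{8{,}852.5 \times 4{,}954.9}} = \frac{-2{,}935}{6{,}622.93} = -0.443$$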
(5) Preparation of X-series, Y-series or both on the basis of given data: In such cases, the values of
series X or series Y or both are not given clearly. Hence, we identify the characteristics between which
the coefficient of correlation is to be calculated and prepare the series.
Illustration 19.
Find the coefficient of correlation between age and playing habit of the following students:
Age:               15   16   17   18   19   20
No. of students:   250  200  150  120  100  80
No. of players:    200  150  90   48   30   12
[B.Com., Agra, 2003, 2008; Kanpur, 2002, 2009; Garhwal, 2003; CCS Univ., 2006, 2011]
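The series-preparation step for this illustration can be sketched in Python as below (names are illustrative). The Y-series is built as the number of players per 100 students in each age group, after which Pearson's r is computed from deviations about the actual means.

```python
from math import sqrt

# Data of Illustration 19
age = [15, 16, 17, 18, 19, 20]
students = [250, 200, 150, 120, 100, 80]
players = [200, 150, 90, 48, 30, 12]

# Y-series: playing habit = (No. of players / No. of students) x 100
playing_habit = [p * 100 / s for p, s in zip(players, students)]


def pearson_r(x, y):
    """Karl Pearson's r using deviations from the actual means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


print(playing_habit)                          # [80.0, 75.0, 60.0, 40.0, 30.0, 15.0]
print(round(pearson_r(age, playing_habit), 2))  # -0.99
```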
Solution:
Here, age (X-series) will be taken as it is, while playing habit (Y-series) will be calculated by:
Playing habit = (No. of players / No. of students) × 100
CALCULATION OF COEFFICIENT OF CORRELATION
19 1 30 - 10 100 -10
20 2 4 15 -25 625 - 50