Lecture 7 - Correlation


ENV 1205 – Mathematics and Statistics for Environmental Science
Lecture 07: Correlation

Prepared by
ISRAT JAHAN
Assistant Professor
Department of Environmental Science
Faculty of Science and Technology (FST)
Bangladesh University of Professionals (BUP)
Email: [email protected]
TOPICS
• Correlation
• Various types of diagram
• Coefficient of correlation
• Problems
Bivariate data

• Bivariate data is data for which there are two variables for each observation.
• As an example, the following bivariate data show the temperatures on different days and the sales of ice cream on the corresponding days.
Correlation
• Correlation is a statistical technique which measures and analyses the degree or extent to which two or more variables fluctuate with reference to one another.
• Correlation thus denotes the interdependence amongst variates. The degree is expressed by a coefficient which ranges between +1 and -1. The direction of change is indicated by the + or - sign.
Correlation
• If an increase (decrease) in one variable results in a corresponding increase (decrease) in the other, i.e. if the changes are in the same direction, the variables are positively correlated.
• For example, the application of fertilizers and crop production are positively correlated, as are advertising and sales.
Correlation
• If an increase (decrease) in one variable results in a corresponding decrease (increase) in the other, i.e. if the changes are in opposite directions, the variables are negatively correlated.
• For example, TV registrations and cinema attendance are negatively correlated.
Correlation
An absence of correlation is indicated by a coefficient of zero. Correlation thus expresses the relationship through a relative measure of change, and it has nothing to do with the units in which the variables are expressed.
What are the uses of Correlation?
• Economic theory and business studies deal with relationships between variables such as price and quantity demanded, or advertising expenditure and sales promotion measures. Correlation analysis helps in deriving precisely the degree and direction of such relationships.
• In environmental studies, an example of correlation could be the relationship between consumer income and the rate of pollution.
• The concepts of regression are also based upon the measure of correlation.
Scatter Diagram
• A scatter diagram (dot diagram or scattergram) is a simple and attractive method of diagrammatic representation of a bivariate distribution for ascertaining the nature of correlation between the variables.
• Thus, for the bivariate distribution (xi, yi), where i = 1, 2, 3, …, if the values of the variables X and Y are plotted along the x-axis and y-axis respectively in the XY plane, the diagram of dots so obtained is known as a scatter diagram.
• In other words, a scatter plot of two variables shows the values of one variable on the y-axis and the values of the other variable on the x-axis. Scatter plots are well suited for revealing the relationship between two variables.
[Figure: scatter diagram, y plotted against x]
Types of Correlation
Correlation is described or classified in several different ways. Three of the most important are:
i. Positive and negative Correlation
ii. Simple, partial and multiple Correlation
iii. Linear and non-linear Correlation
1. Positive and Negative Correlation
If two variables change in the same direction (i.e. if one increases the other also increases, or if one decreases the other also decreases), then this is called a positive correlation.
Positive Correlation    Positive Correlation
X     Y                 X     Y
10    15                80    50
12    20                70    45
14    22                60    30
18    25                40    20
20    37                30    10
If two variables change in the opposite direction (i.e. if one increases,
the other decreases and vice versa), then the correlation is called a
negative correlation.
For example: TV registrations and cinema attendance.
Negative Correlation    Negative Correlation
X     Y                 X     Y
20    40                100   10
30    30                90    20
40    22                60    30
60    15                40    40
80    12                30    50
2. Simple, Partial and Multiple Correlation
• When only two variables are studied, it is a problem of simple correlation.
• When three or more variables are studied, it is a problem of either multiple or partial correlation. In multiple correlation three or more variables are studied simultaneously.
• For example, when we study the relationship between the yield of rice per acre and both the amount of rainfall and the amount of fertilizers used, it is a problem of multiple correlation.
• Similarly, the relationship between plastic hardness, temperature and pressure is multivariate.
• In partial correlation we recognize more than two variables, but consider only two of them to be influencing each other, the effect of the other influencing variables being kept constant. For example, in the rice problem above, if we limit our correlation analysis of yield and rainfall to periods when a certain average daily temperature existed, it becomes a problem of partial correlation.
3. Linear and non-linear correlation
• The nature of the graph indicates the type of correlation between two variables. If the plotted points lie on a straight line, the correlation is called a “linear correlation”; if they do not lie on a straight line, the correlation is non-linear or curvilinear.
• The distinction between linear and non-linear correlation is based upon the constancy of the ratio of change between the variables.
• If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, then the correlation is said to be linear.
Example: observe the following two variables X and Y. It is clear that
the ratio of change between the two variables is the same. If such
variables are plotted on a graph paper all the plotted points would fall
on a straight line.

X: 10 20 30 40 50
Y: 70 140 210 280 350

[Figure: Scatter Diagram, y plotted against x]
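The constant ratio of change in this example can be checked numerically. The short Python sketch below is illustrative only; the function name `change_ratios` is an assumption introduced here and is not part of the lecture.

```python
def change_ratios(xs, ys):
    """Return (change in y) / (change in x) for each pair of consecutive points."""
    ratios = []
    for i in range(1, len(xs)):
        dx = xs[i] - xs[i - 1]
        dy = ys[i] - ys[i - 1]
        ratios.append(dy / dx)
    return ratios


# Data from the example above
X = [10, 20, 30, 40, 50]
Y = [70, 140, 210, 280, 350]

ratios = change_ratios(X, Y)
# The relationship is linear when every consecutive ratio is the same
is_linear = all(abs(r - ratios[0]) < 1e-9 for r in ratios)
print(ratios, is_linear)
```

Every ratio comes out as 7, so each unit change in X is matched by the same change in Y, which is why the plotted points fall on a straight line.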
Correlation would be called non-linear or curvilinear if the
amount of change in one variable doesn’t bear a constant ratio
to the amount of change in the other variable.
Example: if we double the amount of rainfall, the
production of rice or wheat etc. would not necessarily be
doubled.

[Figure: Scatter Diagram, y plotted against x]
Properties of the Coefficient of Correlation
The following are the important properties of the coefficient of correlation, r:
► The coefficient of correlation lies between -1 and +1: $-1 \le r \le +1$.
► The coefficient of correlation is the geometric mean of the two regression coefficients. Symbolically: $r = \pm\sqrt{b_{xy} \cdot b_{yx}}$, where r takes the common sign of the two regression coefficients.
► If X and Y are independent variables, then the coefficient of correlation is zero.
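The geometric-mean property follows from the standard expressions for the two regression coefficients. The derivation below is a short sketch; the symbols $\sigma_x$ and $\sigma_y$ (the standard deviations of X and Y) are notation assumed here, since the slides do not define them.

```latex
b_{yx} = r\,\frac{\sigma_y}{\sigma_x}, \qquad
b_{xy} = r\,\frac{\sigma_x}{\sigma_y}
\quad\Longrightarrow\quad
b_{xy}\, b_{yx} = r^2
\quad\Longrightarrow\quad
r = \pm\sqrt{b_{xy}\, b_{yx}}
```

Both regression coefficients carry the sign of r, so the sign of the square root is chosen to match them.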
Degrees of Correlation
Through the coefficient of correlation, we can
measure the degree or extent of the correlation
between two variables. On the basis of the
coefficient of correlation we can also determine
whether the correlation is positive or negative and
also its degree or extent.
Perfect correlation: If two variables change in the same direction and in the same proportion, the correlation between the two is perfectly positive. According to Karl Pearson, the coefficient of correlation in this case is +1.
On the other hand, if the variables change in opposite directions and in the same proportion, the correlation is perfectly negative. Its coefficient of correlation is -1.
In practice we rarely come across these types of correlation.
Degrees of Correlation
Absence of correlation: If two series of two variables exhibit no relation between them, or a change in one variable does not lead to a change in the other variable, then we can say that there is no correlation (or absurd correlation) between the two variables. In such a case the coefficient of correlation is 0.
Limited degrees of correlation: If two variables are not perfectly correlated, nor is there a perfect absence of correlation, then we term the correlation limited correlation. It may be positive or negative, but lies within the limits -1 and +1.
Degrees of Correlation
➢ If the points lie in a narrow strip rising upwards, there is a high degree of positive correlation.
➢ If the points lie in a narrow strip falling downwards, there is a high degree of negative correlation.
➢ If the points are spread (scattered) without any specific pattern, correlation is absent, i.e. r = 0.
High degree, moderate degree and low degree are the three categories of this kind of correlation. The following table shows the degree of correlation indicated by the value of the coefficient of correlation.

Degrees of Correlation
Degrees                    Positive          Negative
Absence of correlation     Zero (0)
Perfect correlation        +1                -1
High degree                +0.75 to +1       -0.75 to -1
Moderate degree            +0.25 to +0.75    -0.25 to -0.75
Low degree                 0 to +0.25        0 to -0.25
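The bands for the degrees of correlation can be turned into a small lookup. The Python sketch below is illustrative; the function name `degree_of_correlation` is introduced here, and the handling of the shared boundary values 0.25 and 0.75 (placed in the higher band) is an assumption, because the bands overlap at those points.

```python
def degree_of_correlation(r):
    """Classify a correlation coefficient using the degree bands of the lecture.

    Assumption: a value on a shared boundary (0.25 or 0.75) is placed
    in the higher band; the bands themselves overlap at those values.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "absence of correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return "perfect " + direction
    if size >= 0.75:
        return "high degree " + direction
    if size >= 0.25:
        return "moderate degree " + direction
    return "low degree " + direction


print(degree_of_correlation(0.915))   # high degree positive
print(degree_of_correlation(-0.40))   # moderate degree negative
print(degree_of_correlation(0.10))    # low degree positive
```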
Methods of Determining Correlation
❖ Scatter Plot.
❖ Karl Pearson’s coefficient of correlation.
❖ Spearman’s Rank-correlation coefficient.
❖ Method of Least Squares.
1. Scatter Plot (Scatter diagram or dot diagram)
In this method the values of the two variables are plotted on graph paper. One is taken along the horizontal (x-axis) and the other along the vertical (y-axis). By plotting the data, we get points (dots) on the graph which are generally scattered, hence the name ‘Scatter Plot’. The manner in which these points are scattered suggests the degree and the direction of correlation. The degree of correlation is denoted by ‘r’ and its direction is given by the signs positive and negative.
If all points lie on a rising straight line, the correlation is perfectly positive and r = +1.
If all points lie on a falling straight line, the correlation is perfectly negative and r = -1.

[Figure: Scatter Diagram, y plotted against x]
Example: Given the following pairs of values:
1. Make a scatter diagram.
2. Do you think that there is any correlation between profits and capital employed? Is it positive? Is it high or low?
Inference
• By looking at the scatter diagram we can say that the variables profits and capital employed are correlated.
• Further, the correlation is positive because the trend of the points is upward, rising from the lower left-hand corner to the upper right-hand corner of the diagram.
• The diagram also indicates that the degree of relationship is high because the plotted points lie in a narrow band, which shows that it is a case of a high degree of positive correlation.
2. Karl Pearson’s Coefficient of Correlation
Of the several mathematical methods of measuring correlation, Karl Pearson’s method, popularly known as the Pearsonian coefficient of correlation, is the most widely used in practice. The coefficient of correlation is denoted by the symbol r.
If the two variables under study are X and Y, the following formula suggested by Karl Pearson can be used for measuring the degree of relationship.
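A standard way of writing Karl Pearson's formula is sketched below. The exact form shown on the slide is an assumption; the raw-sum version on the right is included because it uses the quantities $N$, $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$ and $\sum XY$ that a calculator provides, and $\bar{X}$, $\bar{Y}$ denote the means of X and Y.

```latex
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
         {\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}
  = \frac{N \sum XY - \sum X \sum Y}
         {\sqrt{\left[ N \sum X^2 - \left(\sum X\right)^2 \right]
                \left[ N \sum Y^2 - \left(\sum Y\right)^2 \right]}}
```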
The value of the coefficient of correlation obtained by the above formula always lies between -1 and +1.
When r = +1, there is a perfect positive correlation between the variables.
When r = -1, there is a perfect negative correlation between the variables.
When r = 0, there is no linear relationship between the variables.
Example1: Calculate the coefficient of correlation between the heights of fathers and their sons for the following data.
Height of father (cm): 165 166 167 168 167 169 170 172
Height of son (cm):    167 168 165 172 168 172 169 171
Solution:
We know that the coefficient of correlation is given by Karl Pearson’s formula.
Let the height of the father be X and the height of the son be Y.
Using a calculator we get
$N = 8$, $\sum X = 1344$, $\sum Y = 1352$, $\sum X^2 = 225828$, $\sum Y^2 = 228532$, $\sum XY = 227160$
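The calculation can be completed by hand as sketched below. This is a worked illustration using the raw-sum form of Pearson's formula, which is assumed here to be the form the slide intends; summing the squares of the eight father heights gives $\sum X^2 = 225828$.

```latex
\begin{aligned}
N \sum XY - \sum X \sum Y &= 8(227160) - (1344)(1352) = 1817280 - 1817088 = 192 \\
N \sum X^2 - \left(\sum X\right)^2 &= 8(225828) - 1344^2 = 1806624 - 1806336 = 288 \\
N \sum Y^2 - \left(\sum Y\right)^2 &= 8(228532) - 1352^2 = 1828256 - 1827904 = 352 \\
r &= \frac{192}{\sqrt{288 \times 352}} = \frac{192}{\sqrt{101376}} \approx 0.603
\end{aligned}
```

So the heights of fathers and sons in this sample show a moderate degree of positive correlation.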
Example2:
The following data consist of observations for the weights of
10 different automobiles (in 1000 pounds) and the
corresponding fuel consumptions (gallons per 100 miles).
Weight (x) Fuel Consumption (y)
3.4 5.5
3.8 5.9
4.1 6.5
2.2 3.3
2.6 3.6
2.9 4.6
2.0 2.9
2.7 3.6
1.9 3.1
3.4 4.9

We would like to find out how y is correlated to x.
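One way to answer this is to compute Karl Pearson's coefficient for the ten pairs. The Python sketch below is illustrative: `pearson_r` is a helper name introduced here, and it uses the raw-sum form of the formula, which is an assumption about the form the lecture intends.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation computed from raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Weight in 1000 pounds (x) and fuel consumption in gallons per 100 miles (y)
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = pearson_r(weight, fuel)
print(round(r, 3))
```

The coefficient comes out close to 0.98, a high degree of positive correlation: heavier automobiles in this sample consume more fuel per 100 miles.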


3. Spearman’s Rank Correlation
The association between two series of ranks is called rank correlation. The method of ascertaining the coefficient of correlation by ranks was devised by Charles Edward Spearman in 1904. This method is especially useful when the actual magnitudes or item values are not given and only their ranks in the series are known. Spearman’s rank correlation coefficient, usually denoted by ρ (rho), is given by the formula:
$\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n}$
where d stands for the difference between the pair of ranks and n is the number of paired observations.
The value of Spearman’s rank correlation coefficient ranges between -1 and +1.
When ρ = +1, the concordance between the rankings is perfect and the ranks are in the same direction.
When ρ = -1, the agreement is again perfect, but the ranks are in opposite directions.
In rank correlation we may have two types of problems:
A. Where actual ranks are given.
B. Where ranks are not given.
A. Where Actual Ranks are given
Where actual ranks are given, the steps required for computing rank correlation are:
(i) Take the difference of the two ranks, i.e. (R1 - R2), and denote these differences by d.
(ii) Square these differences and obtain the total $\sum d_i^2$.
(iii) Apply the formula $\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n}$
Example1:
Two managers are asked to rank a group of employees in order of potential for eventually becoming top managers. The rankings are as follows:
Employee Ranked by manager I Ranked by Manager II
A 10 9
B 2 4
C 1 2
D 4 3
E 3 1
F 6 5
G 5 6
H 8 8
I 7 7
J 9 10
Compute the coefficient of rank correlation and comment on the value.
Calculation of Rank Correlation Coefficient
Employee   Ranked by Manager I (R1)   Ranked by Manager II (R2)
A          10                         9
B          2                          4
C          1                          2
D          4                          3
E          3                          1
F          6                          5
G          5                          6
H          8                          8
I          7                          7
J          9                          10
Using a calculator, the total of the squared differences is $\sum d_i^2 = \sum (R_1 - R_2)^2 = 14$.
We know that $\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n} = 1 - \frac{6 \times 14}{10^3 - 10} = 0.915$.
Thus we find that there is a high degree of positive correlation in the ranks assigned by the two managers.
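The same result can be reproduced in a few lines of Python. This is a minimal sketch; the function name `spearman_rho` is introduced here for illustration, and the code assumes the ranks contain no ties.

```python
def spearman_rho(ranks_1, ranks_2):
    """Spearman's rank correlation: rho = 1 - 6 * sum(d^2) / (n^3 - n)."""
    n = len(ranks_1)
    sum_d_squared = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d_squared / (n ** 3 - n)


# Ranks of employees A to J as given by the two managers
manager_1 = [10, 2, 1, 4, 3, 6, 5, 8, 7, 9]
manager_2 = [9, 4, 2, 3, 1, 5, 6, 8, 7, 10]

# Squared rank differences for each employee, and their total
d_squared = [(r1 - r2) ** 2 for r1, r2 in zip(manager_1, manager_2)]
print(d_squared, sum(d_squared))
print(round(spearman_rho(manager_1, manager_2), 3))
```

The squared differences total 14 and the coefficient rounds to 0.915, matching the hand calculation.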
B. Where Ranks are not given
When we are given the actual data and not the ranks, it is necessary to assign the ranks.
Ranks can be assigned by taking either the highest value as 1 or the lowest value as 1.
But whether we start with the lowest value or the highest value, we must follow the same method for all the variables.

Example1:
Calculate the rank correlation coefficient for the following data of marks of 2 tests given to candidates for a clerical job:

Preliminary test: 92 89 87 86 83 77 71 63 53 50
Final test:       86 83 91 77 68 85 52 82 37 57
Solution: Calculation of Rank Correlation Coefficient

Thus there is a high degree of positive correlation between preliminary and final test.
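The ranking and the calculation for this example can be sketched in Python as follows. The helper names `assign_ranks` and `spearman_rho` are introduced here for illustration; the highest mark is given rank 1 in both tests, and the code assumes there are no tied marks (which holds for this data).

```python
def assign_ranks(values):
    """Rank values with the highest value as rank 1 (assumes no tied values)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(ranks_1, ranks_2):
    """Spearman's rank correlation: rho = 1 - 6 * sum(d^2) / (n^3 - n)."""
    n = len(ranks_1)
    sum_d_squared = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d_squared / (n ** 3 - n)


preliminary = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
final = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]

# The same ranking rule is applied to both tests
rank_preliminary = assign_ranks(preliminary)
rank_final = assign_ranks(final)

rho = spearman_rho(rank_preliminary, rank_final)
print(rank_preliminary)
print(rank_final)
print(round(rho, 3))
```

With these ranks the squared differences total 44, giving ρ = 1 - 264/990 ≈ 0.733, a positive value just under 0.75.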
Merits and Limitations of the Rank Method
➢ Merits
• This method is simpler to understand and easier to apply than Karl Pearson’s method.
• Where the data are of a qualitative nature, like honesty, efficiency, intelligence, etc., this method can be used with great advantage. For example, the workers of two factories can be ranked in order of efficiency and the degree of correlation established by applying the method.
• This is the only method that can be used where we are given the ranks and not the actual data.
• Even where actual data are given, the rank method can be applied for ascertaining a rough degree of correlation.
➢ Limitations:
• This method cannot be used for finding correlation in a grouped frequency distribution.
• Where the number of observations exceeds 30, the calculations become quite tedious and require a lot of time. Therefore this method should not be applied where n exceeds 30 unless we are given the ranks and not the actual values of the variable.
4. Method of Least Squares
To find the coefficient of correlation by the method of least squares, we have to calculate the values of the two regression coefficients, that of x on y and that of y on x.
The correlation coefficient is the square root of the product of the two regression coefficients.
Symbolically,
$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$
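As an illustration of this method, the Python sketch below computes the two regression coefficients for the father and son heights used in the Karl Pearson example and recovers r from their product. The function name `regression_coefficients` is introduced here, and the least-squares slopes are written in their standard deviation-from-mean form, which is an assumption since the lecture does not spell them out.

```python
from math import sqrt


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy): the least-squares slopes of y on x and of x on y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / s_xx, s_xy / s_yy


father = [165, 166, 167, 168, 167, 169, 170, 172]
son = [167, 168, 165, 172, 168, 172, 169, 171]

b_yx, b_xy = regression_coefficients(father, son)
# r carries the common sign of the two regression coefficients
sign = 1 if b_yx >= 0 else -1
r = sign * sqrt(b_yx * b_xy)
print(round(b_yx, 4), round(b_xy, 4), round(r, 3))
```

Here b_yx = 24/36 and b_xy = 24/44, so r = √(0.6667 × 0.5455) ≈ 0.603.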
Coefficient of Determination
➢ One very convenient and useful way of interpreting the value of the coefficient of correlation between two variables is to use the square of the coefficient of correlation, which is called the coefficient of determination. The coefficient of determination thus equals $r^2$.
➢ If r = 0.9, $r^2$ will be 0.81, and this would mean that 81% of the variation in the dependent variable has been explained by the independent variable.
Reference:
1. Methods of Statistics - K. C. Bhuyan
Thank You
