2.6 Statistics and Queueing Theory
2.2 Karl Pearson’s Correlation Coefficient
Correlation is the study of the relationship between two variables.
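Karl Pearson’s correlation coefficient is
\[ r = r(x, y) = r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \]
where
\[ \operatorname{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} \]
\[ \sigma_x = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2}, \qquad \sigma_y = \sqrt{\frac{\sum y^2}{n} - (\bar{y})^2} \]
\[ \bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n} \]
and n is the number of data points.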
Note: The correlation coefficient always lies between −1 and 1, i.e., −1 ≤ r ≤ 1.
Problem 1 Calculate Karl Pearson’s coefficient of correlation for the following data.
x   65  66  67  67  68  69  70  72
y   67  68  65  68  72  72  69  71
Solution:
  X     Y      X²      Y²      XY
 65    67    4225    4489    4355
 66    68    4356    4624    4488
 67    65    4489    4225    4355
 67    68    4489    4624    4556
 68    72    4624    5184    4896
 69    72    4761    5184    4968
 70    69    4900    4761    4830
 72    71    5184    5041    5112
544   552   37028   38132   37560   (totals)
Dr. P.Sambath, Asst.Professor(SG), SRM IST, KTR
Regression Methods 2.7
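n = 8
\[ \bar{x} = \frac{\sum x}{n} = \frac{544}{8} = 68, \qquad \bar{y} = \frac{\sum y}{n} = \frac{552}{8} = 69 \]
\[ \sigma_x = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2} = \sqrt{\frac{37028}{8} - (68)^2} = 2.12 \]
\[ \sigma_y = \sqrt{\frac{\sum y^2}{n} - (\bar{y})^2} = \sqrt{\frac{38132}{8} - (69)^2} = 2.34 \]
\[ \operatorname{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{37560}{8} - (68 \times 69) = 3 \]
\[ \therefore r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} = \frac{3}{2.12 \times 2.34} = 0.6047 \]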
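The calculation above can be checked with a short Python sketch. It applies the same formulas (dividing by n throughout); the function name `pearson_r` is only an illustrative choice and is not part of these notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation using the formulas of this section (divide by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # cov(x, y) = (sum of xy)/n - mean_x * mean_y
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    # sigma = sqrt((sum of squares)/n - mean^2)
    sd_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sd_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return cov / (sd_x * sd_y)

x = [65, 66, 67, 67, 68, 69, 70, 72]
y = [67, 68, 65, 68, 72, 72, 69, 71]
r = pearson_r(x, y)
print(round(r, 4))
```

Without intermediate rounding the result is about 0.6030. The value 0.6047 in the worked solution comes from rounding σx and σy to two decimals before dividing.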
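2.3 Rank Correlation
Spearman’s rank correlation coefficient is
\[ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]
where $d_i = x_i - y_i$ is the difference between the ranks of the i-th pair and n is the number of pairs.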
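Note: If ranks are repeated (tied), then
\[ \rho = 1 - \frac{6\left[\sum d_i^2 + C.F_1 + C.F_2 + \cdots\right]}{n(n^2 - 1)} \]
where $d_i = x_i - y_i$ is again the difference between the ranks. Each C.F is a correction factor, one for every group of repeated values, calculated as
\[ C.F = \frac{m(m^2 - 1)}{12} \]
where m is the number of times the value is repeated.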
Problem 1 Calculate Spearman’s rank correlation coefficient for the following data.
x   68  64  75  50  64  80  75  40  55  64
y   62  58  68  45  81  60  68  48  50  70
Solution:
 X    Y    Rank of X   Rank of Y   d_i = x_i − y_i   d_i²
68   62       4           5             −1             1
64   58       6           7             −1             1
75   68       2.5         3.5           −1             1
50   45       9          10             −1             1
64   81       6           1              5            25
80   60       1           6             −5            25
75   68       2.5         3.5           −1             1
40   48      10           9              1             1
55   50       8           8              0             0
64   70       6           2              4            16
\[ \sum d_i^2 = 72 \]
Among the values of X:
75 is repeated 2 times and occupies ranks 2 and 3. ∴ the rank of 75 = (2 + 3)/2 = 2.5 and
\[ C.F_1 = \frac{m(m^2 - 1)}{12} = \frac{2(2^2 - 1)}{12} = 0.5 \]
64 is repeated 3 times and occupies ranks 5, 6 and 7. ∴ the rank of 64 = (5 + 6 + 7)/3 = 6 and
\[ C.F_2 = \frac{m(m^2 - 1)}{12} = \frac{3(3^2 - 1)}{12} = 2 \]
Among the values of Y:
68 is repeated 2 times and occupies ranks 3 and 4. ∴ the rank of 68 = (3 + 4)/2 = 3.5 and
\[ C.F_3 = \frac{m(m^2 - 1)}{12} = \frac{2(2^2 - 1)}{12} = 0.5 \]
\[ \therefore \rho = 1 - \frac{6\left[\sum d_i^2 + C.F_1 + C.F_2 + C.F_3\right]}{n(n^2 - 1)} \]
\[ = 1 - \frac{6\,[72 + 0.5 + 2 + 0.5]}{10(10^2 - 1)} \]
\[ = 1 - 0.4545 \]
\[ = 0.5455 \]
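The ranking with ties and the correction factors can be reproduced with the Python sketch below. The helper names (`average_ranks`, `correction_factors`, `spearman_rho`) are illustrative assumptions, and ranks are assigned in descending order (largest value gets rank 1), matching the worked solution.

```python
from collections import Counter

def average_ranks(values):
    """Rank in descending order; repeated values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def correction_factors(values):
    """One m(m^2 - 1)/12 term for each value repeated m > 1 times."""
    return [m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1]

def spearman_rho(xs, ys):
    """Spearman's rho with the tie correction described in this section."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    cf = sum(correction_factors(xs)) + sum(correction_factors(ys))
    return 1 - 6 * (sum_d2 + cf) / (n * (n * n - 1))

x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
rho = spearman_rho(x, y)
print(round(rho, 4))
```

For this data the sketch gives a sum of squared rank differences of 72, correction factors totalling 3, and ρ ≈ 0.5455.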
Exercise
Problem 1 Ten competitors in a musical contest were ranked by three judges x, y and z. Find which pair of judges has the most similar taste in music.
x    1   2   3   4   5   6   7   8   9  10
y   10   6   7   9   5   4   3   2   1   8
z    8  10   9   7   6   5   4   3   2   1
2.4 Regression
Regression is the mathematical study of the average relationship between two variables x and y.
Line of regression of x on y:
\[ (x - \bar{x}) = b_{xy}\,(y - \bar{y}) \]
Line of regression of y on x:
\[ (y - \bar{y}) = b_{yx}\,(x - \bar{x}) \]
where $b_{xy}$ and $b_{yx}$ are the regression coefficients, given by
\[ b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} \quad \text{and} \quad b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \]
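Note:
\[ r = \pm\sqrt{b_{xy}\, b_{yx}} \]
where r takes the common sign of $b_{xy}$ and $b_{yx}$ (the two regression coefficients always have the same sign).
\[ b_{xy} = r\,\frac{\sigma_x}{\sigma_y}, \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \]
The lines of regression of y on x and of x on y intersect at the point $(\bar{x}, \bar{y})$, i.e., at the mean values of x and y.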
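The relation between r and the regression coefficients follows directly from the two expressions for $b_{xy}$ and $b_{yx}$ in terms of r. A short derivation, using only the symbols defined in this section:

```latex
\[
b_{xy}\, b_{yx}
  = \left(r\,\frac{\sigma_x}{\sigma_y}\right)\left(r\,\frac{\sigma_y}{\sigma_x}\right)
  = r^2
\quad\Longrightarrow\quad
|r| = \sqrt{b_{xy}\, b_{yx}}
\]
```

Because $\sigma_x$ and $\sigma_y$ are positive, $b_{xy}$ and $b_{yx}$ both carry the sign of r. So r is negative when the regression coefficients are negative, and $b_{xy}\, b_{yx}$ can never exceed 1.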
Problem 1 From the following data find
1. The two lines of regression
2. The coefficient of correlation between the marks in economics and statistics
3. The most likely marks in statistics when the marks in economics are 30.
Marks in Economics    25  28  35  32  31  36  29  38  34  32
Marks in Statistics   43  46  49  41  36  32  31  30  33  39
Solution: Let x be the marks in Economics and y be the marks in Statistics.
\[ \bar{x} = \frac{\sum x}{n} = \frac{320}{10} = 32 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{380}{10} = 38 \]
  x     y    (x − x̄)   (y − ȳ)   (x − x̄)²   (y − ȳ)²   (x − x̄)(y − ȳ)
 25    43      −7         5         49         25           −35
 28    46      −4         8         16         64           −32
 35    49       3        11          9        121            33
 32    41       0         3          0          9             0
 31    36      −1        −2          1          4             2
 36    32       4        −6         16         36           −24
 29    31      −3        −7          9         49            21
 38    30       6        −8         36         64           −48
 34    33       2        −5          4         25           −10
 32    39       0         1          0          1             0
320   380       0         0        140        398           −93   (totals)
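\[ b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{-93}{398} = -0.2336 \]
and
\[ b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{-93}{140} = -0.6642 \]
Both regression coefficients are negative, so the correlation coefficient is also negative:
\[ r = -\sqrt{b_{xy}\, b_{yx}} = -\sqrt{(-0.2336)(-0.6642)} = -0.3939 \]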
Line of regression of x on y is $(x - \bar{x}) = b_{xy}\,(y - \bar{y})$:
\[ \begin{aligned}
(x - 32) &= -0.2336\,(y - 38) \\
x - 32 &= -0.2336y + 8.8768 \\
x &= -0.2336y + 8.8768 + 32 \\
x &= -0.2336y + 40.8768 \qquad (1)
\end{aligned} \]
Line of regression of y on x is $(y - \bar{y}) = b_{yx}\,(x - \bar{x})$:
\[ \begin{aligned}
(y - 38) &= -0.6642\,(x - 32) \\
y - 38 &= -0.6642x + 21.2544 \\
y &= -0.6642x + 21.2544 + 38 \\
y &= -0.6642x + 59.2544 \qquad (2)
\end{aligned} \]
Now, to find y when x = 30, substitute x = 30 in equation (2):
\[ y = -0.6642(30) + 59.2544 = 39.3284 \]
∴ The most likely marks in Statistics ≈ 39.33
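The regression coefficients, the correlation coefficient and the prediction can be cross-checked with the Python sketch below. The names `regression_coefficients` and `predict_y` are illustrative assumptions. The sketch works with unrounded coefficients, so the last decimals differ slightly from the hand calculation.

```python
import math

def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_xy, b_yx) from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy / syy, sxy / sxx

econ_marks = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
stat_marks = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]

mean_x, mean_y, b_xy, b_yx = regression_coefficients(econ_marks, stat_marks)

# r has magnitude sqrt(b_xy * b_yx) and the sign of the regression coefficients.
r = math.copysign(math.sqrt(b_xy * b_yx), b_yx)

def predict_y(x):
    """Line of regression of y on x: y = mean_y + b_yx * (x - mean_x)."""
    return mean_y + b_yx * (x - mean_x)

print(round(b_xy, 4), round(b_yx, 4), round(r, 4), round(predict_y(30), 4))
```

With unrounded coefficients the predicted Statistics mark at x = 30 is about 39.33 and r is about −0.394, in agreement with the hand calculation up to rounding.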
Problem 2 Two variables x and y have the regression lines 3x + 2y − 26 = 0 and 6x + y − 31 = 0. Find
1. the mean values of x and y
2. the correlation coefficient between x and y
3. the variance of y when the variance of x is 25
Solution:
Given 3x + 2y − 26 = 0 (1)
6x + y − 31 = 0 (2)
1. Mean values of x and y
Both regression lines pass through the point $(\bar{x}, \bar{y})$, so solving (1) and (2) gives x = 4 and y = 7.
∴ $\bar{x} = 4$ and $\bar{y} = 7$
2. Correlation coefficient between x and y
Let 3x + 2y − 26 = 0 be the line of regression of x on y. Then
\[ 3x + 2y - 26 = 0 \Rightarrow 3x = -2y + 26 \Rightarrow x = -\frac{2}{3}y + \frac{26}{3} \]
\[ \therefore b_{xy} = -\frac{2}{3} \]
Let 6x + y − 31 = 0 be the line of regression of y on x. Then
\[ 6x + y - 31 = 0 \Rightarrow y = -6x + 31 \]
\[ \therefore b_{yx} = -6 \]
\[ |r| = \sqrt{b_{xy}\, b_{yx}} = \sqrt{\left(-\frac{2}{3}\right)(-6)} = 2 > 1 \]
Since the magnitude of the correlation coefficient cannot exceed 1, 3x + 2y − 26 = 0 cannot be the line of regression of x on y and 6x + y − 31 = 0 cannot be the line of regression of y on x. ∴ we take 3x + 2y − 26 = 0 as the line of regression of y on x:
\[ 3x + 2y - 26 = 0 \Rightarrow 2y = -3x + 26 \Rightarrow y = -\frac{3}{2}x + 13 \]
\[ \therefore b_{yx} = -\frac{3}{2} \]
and 6x + y − 31 = 0 as the line of regression of x on y:
\[ 6x + y - 31 = 0 \Rightarrow 6x = -y + 31 \Rightarrow x = -\frac{1}{6}y + \frac{31}{6} \]
\[ \therefore b_{xy} = -\frac{1}{6} \]
Both regression coefficients are negative, so r is negative:
\[ r = -\sqrt{b_{xy}\, b_{yx}} = -\sqrt{\left(-\frac{1}{6}\right)\left(-\frac{3}{2}\right)} = -0.5, \qquad |r| < 1 \]
3. The variance of y when the variance of x is 25 ($\sigma_x^2 = 25$)
i.e., $\sigma_x = 5$; we have to find $\sigma_y$.
\[ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \Rightarrow \sigma_y = r\,\frac{\sigma_x}{b_{xy}} = (-0.5)\,\frac{5}{-\frac{1}{6}} = 15 \]
\[ \therefore \sigma_y^2 = 225 \]
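The reasoning in this problem (intersect the lines, then pick the assignment whose product of regression coefficients does not exceed 1) can be sketched in Python. Each line is stored as hypothetical coefficients (a, b, c) of ax + by + c = 0, and the helper names are illustrative only.

```python
from fractions import Fraction
import math

line1 = (3, 2, -26)   # 3x + 2y - 26 = 0
line2 = (6, 1, -31)   # 6x + y - 31 = 0

def intersection(l1, l2):
    """Solve the two lines simultaneously (Cramer's rule); gives the means."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    return Fraction(b1 * c2 - b2 * c1, det), Fraction(a2 * c1 - a1 * c2, det)

def assign_lines(l1, l2):
    """Try both assignments; keep the one with 0 <= b_xy * b_yx <= 1."""
    for x_on_y, y_on_x in ((l1, l2), (l2, l1)):
        b_xy = Fraction(-x_on_y[1], x_on_y[0])   # x = -(b/a) y - c/a
        b_yx = Fraction(-y_on_x[0], y_on_x[1])   # y = -(a/b) x - c/b
        if 0 <= b_xy * b_yx <= 1:
            return b_xy, b_yx
    raise ValueError("no valid assignment of regression lines")

mean_x, mean_y = intersection(line1, line2)
b_xy, b_yx = assign_lines(line1, line2)

# r has magnitude sqrt(b_xy * b_yx) and the sign of the regression coefficients.
r = math.copysign(math.sqrt(float(b_xy * b_yx)), float(b_yx))

sigma_x = 5.0                            # given: variance of x is 25
sigma_y = r * sigma_x / float(b_xy)      # from b_xy = r * sigma_x / sigma_y
print(mean_x, mean_y, b_xy, b_yx, r, round(sigma_y ** 2, 6))
```

Using exact fractions for the slopes avoids rounding when testing whether the product of the regression coefficients exceeds 1.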