Unit Five
MEASURES OF DISPERSION
General Objective
Know the meaning of the types of measures of dispersion and measures of shape (skewness, kurtosis), as well as the advantages of computing such measures.
Know how to compute measures of dispersion such as the range, quartile deviation, mean deviation and standard deviation.
Know how to compute and interpret the coefficients of skewness and kurtosis.
5.1 Introduction
In this unit we shall discuss the most commonly used measures of dispersion, namely the range, quartile deviation, mean deviation, standard deviation and coefficient of variation, as well as measures of shape such as skewness and kurtosis.
We have seen that averages are representatives of a frequency distribution, but they fail to give a complete picture of the distribution. They do not tell anything about the scatter of the observations within the distribution. Suppose that we have the yields (kg per plot) of two paddy varieties from 5 plots each.
Variety 1: 45 42 42 41 40
Variety 2: 54 48 42 33 30
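The mean yields of the two varieties are almost the same (42 kg for Variety 1 and 41.4 kg for Variety 2), but we cannot say that the two varieties perform equally. The first variety may be preferred since it is more consistent in yield: its values lie between 40 and 45 kg, while those of the second variety range from 30 to 54 kg.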
From the above example, it is obvious that a measure of central tendency alone is not sufficient to
describe a frequency distribution.
Example
Consider the following data on the weights of 7 individuals and compute the range of the weights.
Range = largest value − smallest value = 47 − 15 = 32
For grouped data, the range is the difference between the upper class boundary of the last class interval and the lower class boundary of the first class interval.
Properties of range
It is easy to calculate and to understand.
It is affected by extreme values.
It cannot be computed when the distribution has open-ended classes.
It does not take the entire data into account.
It does not tell anything about the distribution of values in the series.
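The two range rules can be checked with a short Python sketch. It assumes integer class limits separated by gaps of one unit, so that each class boundary lies 0.5 beyond its class limit. The helper names `raw_range` and `grouped_range` are hypothetical. The raw data are the paddy yields from the introduction, and the class limits are those of the quartile-deviation example.

```python
def raw_range(values):
    """Range of raw data: largest value minus smallest value."""
    return max(values) - min(values)

def grouped_range(classes):
    """Range of grouped data given (lower_limit, upper_limit) pairs.

    Assumes integer class limits with a gap of 1 between classes, so the
    boundaries lie 0.5 below the first lower limit and 0.5 above the last upper limit.
    """
    lower_boundary = classes[0][0] - 0.5
    upper_boundary = classes[-1][1] + 0.5
    return upper_boundary - lower_boundary

variety_1 = [45, 42, 42, 41, 40]
variety_2 = [54, 48, 42, 33, 30]
print(raw_range(variety_1), raw_range(variety_2))   # 5 24

classes = [(21, 22), (23, 24), (25, 26), (27, 28), (29, 30)]
print(grouped_range(classes))                       # 10.0
```

The much larger range of Variety 2 reflects its less consistent yields, even though the two means are close.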
5.2.2 Quartile Deviation
Interquartile range (I.Q.R): the difference between the third and the first quartiles. It is a better indicator of absolute variability than the range.
$$\text{I.Q.R} = Q_3 - Q_1$$
Quartile deviation (semi-interquartile range): half of the interquartile range.
$$\text{Q.D} = \frac{(Q_3 - Q_2) + (Q_2 - Q_1)}{2} = \frac{Q_3 - Q_1}{2}$$
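Example
Compute the quartile deviation for the following frequency distribution.
Class     Frequency   Cumulative frequency
21 – 22   10          10
23 – 24   22          32
25 – 26   20          52
27 – 28   14          66
29 – 30   14          80
Total     80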
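Solution
The class boundaries are 20.5–22.5, 22.5–24.5, 24.5–26.5, 26.5–28.5 and 28.5–30.5, so the class width is w = 2. For the i-th quartile,
$$Q_i = L_{Q_i} + \frac{\frac{iN}{4} - cf}{f_{Q_i}} \times w$$
where $L_{Q_i}$ is the lower class boundary of the quartile class, $cf$ is the cumulative frequency of the class preceding it and $f_{Q_i}$ is the frequency of the quartile class.
N/4 = 80/4 = 20, so Q1 is the 20th observation and lies in the class 23–24:
$$Q_1 = 22.5 + \frac{20 - 10}{22} \times 2 = 23.41$$
2N/4 = 40, so Q2 is the 40th observation and lies in the class 25–26:
$$Q_2 = 24.5 + \frac{40 - 32}{20} \times 2 = 25.3$$
3N/4 = 60, so Q3 is the 60th observation and lies in the class 27–28:
$$Q_3 = 26.5 + \frac{60 - 52}{14} \times 2 = 27.64$$
Therefore
$$\text{Q.D} = \frac{Q_3 - Q_1}{2} = \frac{27.64 - 23.41}{2} \approx 2.12$$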
The quartile deviation is more stable than the range because it depends on two intermediate values. It is not affected by extreme values, since the lowest and highest quarters of the data are excluded. However, the quartile deviation also fails to take all the observations into account.
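The interpolation formula for grouped quartiles can be sketched in Python. The sketch reproduces Q1, Q2, Q3 and Q.D for the example above. It assumes equal-width classes supplied by their boundaries (20.5, 22.5, 24.5, 26.5, 28.5, 30.5). `grouped_quantile` is a hypothetical helper, not a library function.

```python
def grouped_quantile(boundaries, freqs, k, parts=4):
    """k-th quantile of grouped data by linear interpolation in its class.

    boundaries: lower class boundaries followed by the final upper boundary.
    freqs:      class frequencies, one per class.
    parts=4 gives quartiles (k = 1, 2, 3).
    """
    n = sum(freqs)
    position = k * n / parts            # N/4 for Q1, 2N/4 for Q2, 3N/4 for Q3
    cum = 0                             # cumulative frequency before this class
    for i, f in enumerate(freqs):
        if cum + f >= position:         # the quantile falls in this class
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + (position - cum) / f * width
        cum += f
    raise ValueError("position lies beyond the data")

boundaries = [20.5, 22.5, 24.5, 26.5, 28.5, 30.5]
freqs = [10, 22, 20, 14, 14]
q1, q2, q3 = (grouped_quantile(boundaries, freqs, k) for k in (1, 2, 3))
qd = (q3 - q1) / 2
print(round(q1, 2), round(q2, 2), round(q3, 2), round(qd, 2))  # 23.41 25.3 27.64 2.12
```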
5.2.3. Mean Deviation: The mean deviation is the mean of the absolute deviations of the individual values from their average. The average may be either the mean or the median.
$$\text{M.D} = \frac{\sum |X - A|}{n} \ \text{for raw data}, \qquad \text{M.D} = \frac{\sum f\,|X - A|}{\sum f} \ \text{for grouped data},$$
where A is either the mean or the median.
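Example 1
Consider the following data and compute the mean deviation from the mean.
Xi: 53 56 57 59 63 66
$$\bar{X} = \frac{\sum_{i=1}^{6} X_i}{6} = \frac{354}{6} = 59$$
$|X_i - \bar{X}|$: 6 3 2 0 4 7 (total 22)
$$\text{Mean deviation} = \frac{\sum |X_i - \bar{X}|}{n} = \frac{22}{6} = 3.67$$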
Example 2
Calculate the mean deviation for the following data using both the mean and the median.
Xi: 14, 15, 26, 20, 10. Median = 15, mean = 85/5 = 17.
Xi                   10  14  15  20  26  Total
|di| = |Xi − 15|      5   1   0   5  11   22
|di| = |Xi − 17|      7   3   2   3   9   24
$$\text{M.D from median} = \frac{|10-15| + |14-15| + |15-15| + |20-15| + |26-15|}{5} = \frac{22}{5} = 4.4$$
$$\text{M.D from mean} = \frac{\sum |X_i - \bar{X}|}{n} = \frac{24}{5} = 4.8$$
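Example 3
Calculate the mean deviation from the mean and from the median.
Xi       6   7   8    9   10  11  12  Total
fi       3   6   9   13    8   5   4     48
fi Xi   18  42  72  117   80  55  48    432
Solution
$$\text{Mean} = \frac{\sum f_i X_i}{\sum f_i} = \frac{432}{48} = 9$$
Since n = 48 is even, the median is the average of the 24th and 25th observations, both of which equal 9:
$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{th} + \left(\frac{n}{2} + 1\right)^{th}}{2} = \frac{24^{th} + 25^{th}}{2} = \frac{9 + 9}{2} = 9$$
Xi                  6   7   8   9  10  11  12  Total
fi                  3   6   9  13   8   5   4     48
|di| = |Xi − 9|     3   2   1   0   1   2   3
fi |di|             9  12   9   0   8  10  12     60
$$\text{M.D from median} = \frac{\sum f_i |d_i|}{\sum f_i} = \frac{60}{48} = 1.25$$
Because the mean and the median are both 9 here, the mean deviation from the mean is also 1.25.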
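The three mean-deviation examples can be reproduced with Python's standard `statistics` module (`mean`, `median`) and two small hypothetical helpers. This is a sketch for checking the arithmetic, not a prescribed method.

```python
from statistics import mean, median

def mean_deviation(values, about):
    """Mean of the absolute deviations of raw data from `about` (mean or median)."""
    return sum(abs(x - about) for x in values) / len(values)

def mean_deviation_freq(xs, fs, about):
    """Mean deviation for a frequency distribution: sum f|x - A| / sum f."""
    return sum(f * abs(x - about) for x, f in zip(xs, fs)) / sum(fs)

# Example 1: deviations from the mean (59)
data1 = [53, 56, 57, 59, 63, 66]
print(round(mean_deviation(data1, mean(data1)), 2))                              # 3.67

# Example 2: deviations from the mean (17) and from the median (15)
data2 = [14, 15, 26, 20, 10]
print(mean_deviation(data2, mean(data2)), mean_deviation(data2, median(data2)))  # 4.8 4.4

# Example 3: frequency distribution; the median is taken from the expanded data
xs = [6, 7, 8, 9, 10, 11, 12]
fs = [3, 6, 9, 13, 8, 5, 4]
expanded = [x for x, f in zip(xs, fs) for _ in range(f)]
print(mean_deviation_freq(xs, fs, median(expanded)))                              # 1.25
```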
5.2.4 Variance (S² or σ²)
Variance is the arithmetic mean of the squared deviations about the mean. When the data constitute a sample, the averaging is done by dividing the sum of squared deviations from the mean by n − 1, and the variance is denoted by S². When the data constitute an entire population, the averaging is done by dividing by N, and the variance is denoted by σ². It is a commonly used absolute measure of dispersion.
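Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
A computational (shortcut) form of the sample variance follows from expanding the squares:
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 = \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n},$$
since $\sum x_i = n\bar{x}$. Hence
$$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$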
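As a check on the derivation, this Python sketch computes the sample variance of the Variety 1 yields from the introduction in two ways: from the definition and from the shortcut formula. It then compares both with the standard library's `statistics.variance` (divisor n − 1) and `statistics.pvariance` (divisor N). The helper function names are hypothetical.

```python
from statistics import pvariance, variance

def sample_variance(xs):
    """S^2 = sum (x_i - mean)^2 / (n - 1), computed from the definition."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """S^2 = [sum x_i^2 - (sum x_i)^2 / n] / (n - 1), the computational formula."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def population_variance(xs):
    """sigma^2 = sum (x_i - mu)^2 / N."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n

variety_1 = [45, 42, 42, 41, 40]
print(sample_variance(variety_1), sample_variance_shortcut(variety_1))  # 3.5 3.5
print(population_variance(variety_1))                                   # 2.8
# Cross-check against the standard library.
print(variance(variety_1), pvariance(variety_1))
```

In hand calculation the shortcut form avoids computing every deviation. With very large values it can lose precision in floating point, so the definitional form is safer in code.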
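Variance for a simple frequency distribution
xi: x1 x2 . . . xk
fi: f1 f2 . . . fk
$$S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}, \qquad \text{where } n = \sum_{i=1}^{k} f_i$$
For grouped data with class marks $m_i$, the shortcut form is
$$S^2 = \frac{\sum_{i=1}^{k} f_i m_i^2 - \frac{\left(\sum_{i=1}^{k} f_i m_i\right)^2}{n}}{n - 1}$$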
Activity
frequency 4 1 2 3
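Properties of Variance
2. When each measurement is multiplied by a constant c, the variance is multiplied by c².
Old values: x1, x2, . . ., xn, with $S^2_{old} = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
New values: cx1, cx2, . . ., cxn, whose mean is $c\bar{x}$, so
$$S^2_{new} = \frac{\sum (c x_i - c\bar{x})^2}{n - 1} = \frac{c^2 \sum (x_i - \bar{x})^2}{n - 1} = c^2 S^2_{old}$$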
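3. When a constant c is added to all measurements of the distribution, the variance does not change.
Old values: x1, x2, . . ., xn
New values: x1 + c, x2 + c, . . ., xn + c
$$\bar{x}_{new} = \frac{\sum (x_i + c)}{n} = \frac{\sum x_i}{n} + \frac{nc}{n} = \bar{x} + c$$
$$S^2_{new} = \frac{\sum \left[(x_i + c) - (\bar{x} + c)\right]^2}{n - 1} = \frac{\sum (x_i - \bar{x})^2}{n - 1} = S^2_{old}$$
In particular, if all the measurements equal the same constant c (that is, c, c, . . ., c), then $\bar{x} = c$ and S² = 0.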
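Example
If the mean and variance of x are 10 and 5, respectively, find the mean and variance of y, where y = 10x − 5.
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{\sum (10x_i - 5)}{n} = \frac{10\sum x_i}{n} - \frac{5n}{n} = 10\bar{x} - 5 = 10(10) - 5 = 95$$
$$\operatorname{var}(y) = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{\sum (10x_i - 5 - 95)^2}{n - 1} = \frac{\sum (10x_i - 100)^2}{n - 1} = \frac{10^2 \sum (x_i - 10)^2}{n - 1} = 100 \times 5 = 500$$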
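The properties and the worked example can be checked numerically. The data below are hypothetical, chosen only so that the mean of x is 10 and its variance is 5, as in the example. `statistics.variance` uses the n − 1 divisor, as in this unit.

```python
from statistics import mean, variance

# Hypothetical data with mean(x) = 10 and var(x) = 5, matching the example.
x = [7.0, 9.0, 10.0, 11.0, 13.0]
c = 3.0
print(mean(x), variance(x))                  # 10.0 5.0

# Property 2: multiplying every value by c multiplies the variance by c^2.
scaled = [c * xi for xi in x]
print(variance(scaled))                      # 45.0

# Property 3: adding c shifts the mean by c but leaves the variance unchanged.
shifted = [xi + c for xi in x]
print(mean(shifted), variance(shifted))      # 13.0 5.0

# Identical values c, c, ..., c have zero variance.
print(variance([c] * 5))                     # 0.0

# Worked example: y = 10x - 5.
y = [10 * xi - 5 for xi in x]
print(mean(y), variance(y))                  # 95.0 500.0
```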
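The standard deviation is the positive square root of the variance, that is, the square root of the mean of the squared deviations of the individual values from their mean:
$$\text{S.D} = S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \ \text{for a sample}, \qquad \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \ \text{for a population}$$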
- Its advantage over variance is that it is in the same unit as the variable under consideration.
- It is a measure of average variation in the set of data.
Example 1
Compute the variance and S.D. for the data given below.
xi           32    36     40    44    48  Total
frequency     2     5      8     4     1     20
Solution
fi xi        64   180    320   176    48    788
fi xi²     2048  6480  12800  7744  2304  31376
$$S^2 = \frac{\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}}{n - 1} = \frac{31376 - \frac{788^2}{20}}{19} = \frac{328.8}{19} = 17.31$$
$$S = \sqrt{S^2} = \sqrt{17.31} = 4.16$$
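A short Python sketch of the shortcut formula for a simple frequency distribution, reproducing Example 1. `freq_variance` is a hypothetical helper name.

```python
from math import sqrt

def freq_variance(xs, fs):
    """Sample variance of a frequency distribution by the shortcut formula
    S^2 = [sum f x^2 - (sum f x)^2 / n] / (n - 1), where n = sum f."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

xs = [32, 36, 40, 44, 48]
fs = [2, 5, 8, 4, 1]
s2 = freq_variance(xs, fs)
print(round(s2, 2), round(sqrt(s2), 2))   # 17.31 4.16
```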
- If the S.D. of a set of data is small, the values are concentrated closely about the mean; if it is large, the values are scattered widely about the mean.
S² = 11, S.D = √S² = √11 = 3.316
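Example 2
Compute the variance and S.D. for the following grouped data.
Solution
Class     fi    mi   fi mi²   fi mi
1 – 3      1     2       4       2
3 – 5      9     4     144      36
5 – 7     25     6     900     150
7 – 9     35     8    2240     280
9 – 11    17    10    1700     170
11 – 13   10    12    1440     120
13 – 15    3    14     588      42
Total    100          7016     800
$$S^2 = \frac{\sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{n}}{n - 1} = \frac{7016 - \frac{800^2}{100}}{99} = \frac{616}{99} = 6.22$$
$$S = \sqrt{S^2} = \sqrt{6.22} = 2.49$$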
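The same computation for grouped data, using class marks, can be sketched in Python. The frequencies for the classes 5–7 and 7–9 (25 and 35) are the values implied by the column totals Σf = 100, Σfm = 800 and Σfm² = 7016. `grouped_variance` is a hypothetical helper.

```python
from math import sqrt

def grouped_variance(limits, fs):
    """Sample variance of grouped data using class marks m = (lower + upper) / 2."""
    marks = [(lo + hi) / 2 for lo, hi in limits]
    n = sum(fs)
    sum_fm = sum(f * m for m, f in zip(marks, fs))
    sum_fm2 = sum(f * m * m for m, f in zip(marks, fs))
    return (sum_fm2 - sum_fm ** 2 / n) / (n - 1)

limits = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13), (13, 15)]
# Frequencies 25 and 35 for 5-7 and 7-9 are inferred from the column totals.
fs = [1, 9, 25, 35, 17, 10, 3]
s2 = grouped_variance(limits, fs)
print(round(s2, 2), round(sqrt(s2), 2))   # 6.22 2.49
```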
5.3 Relative Measures of Dispersion
Suppose that the two distributions to be compared are expressed in the same units and their means are equal or nearly equal. Then their variability can be compared directly by using their standard deviations. However, if their means are widely different, or if they are expressed in different units of measurement, we cannot use the standard deviations as such to compare their variability. In such situations we have to use relative measures of dispersion.
5.3.1. Coefficient of Variation (CV): The CV is a unit-free measure. It is always expressed as a percentage.
$$CV = \frac{SD}{\text{Mean}} \times 100\%$$
The CV will be small if the variation is small relative to the mean. Of two groups, the one with the smaller CV is said to be more consistent. The coefficient of variation is unreliable if the mean is near zero. It is also unstable if the measurement scale used is not a ratio scale. The CV is informative when it is given along with the mean and standard deviation; otherwise, it may be misleading.
Example
Consider the distribution of the yields (per plot) of two paddy varieties. For the first variety, the mean and standard deviation are 60 kg and 10 kg, respectively. For the second variety, the mean and standard deviation are 50 kg and 9 kg, respectively. Then we have
$$CV_1 = \frac{10}{60} \times 100\% = 16.67\%, \qquad CV_2 = \frac{9}{50} \times 100\% = 18\%$$
It is apparent that the variability in the first variety is less than that in the second variety. In terms of the standard deviation alone (10 kg versus 9 kg), however, the interpretation would be the reverse.
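The CV comparison reduces to two divisions. The hypothetical helper below multiplies by 100 before dividing, which keeps the second result exact.

```python
def coefficient_of_variation(sd, mean):
    """CV in percent: 100 * SD / mean (unreliable when the mean is near zero)."""
    return 100 * sd / mean

cv_1 = coefficient_of_variation(10, 60)   # first paddy variety
cv_2 = coefficient_of_variation(9, 50)    # second paddy variety
print(round(cv_1, 2), round(cv_2, 2))      # 16.67 18.0
print("first variety more consistent" if cv_1 < cv_2 else "second variety more consistent")
```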
5.3.2. Coefficient of Mean Deviation: The coefficient of mean deviation is found by dividing the mean deviation by the measure of central tendency about which the deviations are computed.
$$\text{C.M.D} = \frac{\text{Mean deviation}}{\text{Mean}} \quad \text{or} \quad \text{C.M.D} = \frac{\text{Mean deviation}}{\text{Median}}$$
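Example 1
The coefficients of mean deviation from the mean and from the median for Example 2 of Section 5.2.3 are:
$$\text{M.D from mean} = \frac{24}{5} = 4.8, \qquad \text{C.M.D} = \frac{4.8}{17} = 0.282$$
$$\text{M.D from median} = \frac{22}{5} = 4.4, \qquad \text{C.M.D} = \frac{4.4}{15} = 0.293$$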
The standard score is denoted by Z and defined as
$$Z = \frac{x_i - \bar{x}}{S}$$
It measures the deviation of an individual observation from the mean of all the observations in units of the standard deviation, and is termed the Z-score. The Z-scores of an individual in different groups (for example, in different subjects) are then added to give a true measure of relative performance.
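Example
Two students, A and B, obtained the following marks:
Student   Economics   Accounting   Total
A         84          75           159
B         74          85           159
The average mark for Economics is 60 with a standard deviation of 13, and the average mark for Accounting is 50 with a standard deviation of 11. Whose performance is better, A's or B's?
Z-scores for A:
$$Z_{\text{Economics}} = \frac{84 - 60}{13} = 1.846, \qquad Z_{\text{Accounting}} = \frac{75 - 50}{11} = 2.273, \qquad \text{Total} = 4.119$$
Z-scores for B:
$$Z_{\text{Economics}} = \frac{74 - 60}{13} = 1.077, \qquad Z_{\text{Accounting}} = \frac{85 - 50}{11} = 3.182, \qquad \text{Total} = 4.259$$
Although both students have the same total mark (159), B's total Z-score is higher, so student B performed better than student A.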
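A Python sketch of the Z-score comparison. It assumes the subject means and standard deviations used in the worked solution (Economics 60 and 13, Accounting 50 and 11). `z_score` is a hypothetical helper.

```python
def z_score(x, mean, sd):
    """Standard score: deviation from the mean in units of standard deviation."""
    return (x - mean) / sd

# (mean, standard deviation) of each subject, as used in the example
subject_stats = {"Economics": (60, 13), "Accounting": (50, 11)}
marks = {"A": {"Economics": 84, "Accounting": 75},
         "B": {"Economics": 74, "Accounting": 85}}

totals = {}
for student, scores in marks.items():
    # Sum the student's Z-scores over all subjects.
    totals[student] = sum(z_score(scores[s], *subject_stats[s]) for s in scores)
    print(student, round(totals[student], 2))
# A 4.12
# B 4.26
```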
We have seen that averages and measures of dispersion can help in describing a frequency distribution. However, they are not sufficient to describe the nature (shape) of the distribution. For this purpose, we use two other concepts, known as skewness and kurtosis.
5.5.1. Skewness: Skewness means lack of symmetry. A distribution is said to be symmetrical when the values are distributed in the same way on both sides of the mean. For example, the following distribution is symmetrical about its mean 3.
Xi : 1 2 3 4 5
fi : 5 9 12 9 5
In a symmetrical distribution the mean, median and mode coincide, that is, $\bar{X} = \tilde{X} = \hat{X}$.
[Figure: symmetrical distribution, $\bar{X} = \tilde{X} = \hat{X}$]
When a distribution is skewed to the right, mean > median > mode. Consider, for example, the distribution of income among families. Income distribution is skewed to the right: a large number of families have relatively low incomes and a small number of families have extremely high incomes. In such a case, the mean is pulled up by the extremely high incomes, and the relation among the three measures is as shown in the figure: mean > median > mode.
When a distribution is skewed to the left, mode > median > mean. This is because the mean is pulled down below the median by extremely low values.
Note that in the figure, income is plotted along the x-axis and the number of families along the y-axis.
[Figure: negatively skewed distribution; positively skewed distribution]
Karl Pearson's measure of skewness: If the distribution is symmetric, we have arithmetic mean = median = mode; if the distribution is skewed, they will not be equal. Therefore the distance between the A.M. and the mode (A.M. − mode) can be used as a measure of skewness. However, since a measure of skewness should be a pure number, we define
$$Sk = \frac{\text{A.M.} - \text{Mode}}{\sigma}$$
where σ is the standard deviation of the distribution.
For distributions which are bell shaped and moderately skewed, there is an approximate relationship between the A.M., median and mode:
$$\text{Mode} \approx 3\,\text{Median} - 2\,\text{A.M.}, \qquad \text{so that} \qquad Sk \approx \frac{3(\text{A.M.} - \text{Median})}{\sigma}$$
For a symmetrical distribution Sk = 0. If the distribution is negatively skewed, the value of Sk is negative, and if it is positively skewed, Sk is positive. The values of Sk range from −3 to 3.
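Both forms of Pearson's coefficient can be sketched as follows. The symmetrical distribution of Section 5.5.1 (values 1 to 5 with frequencies 5, 9, 12, 9, 5) should give Sk = 0. The skewed summary values (mean 12, median 10, mode 6, σ = 4) are hypothetical. They were chosen to satisfy the approximate relation Mode ≈ 3 Median − 2 A.M., so both forms agree.

```python
from statistics import mean, median, mode, stdev

def pearson_sk_mode(am, mo, sd):
    """Karl Pearson's coefficient of skewness: Sk = (A.M. - Mode) / sd."""
    return (am - mo) / sd

def pearson_sk_median(am, med, sd):
    """Approximate form for moderately skewed data: Sk = 3(A.M. - Median) / sd."""
    return 3 * (am - med) / sd

# Symmetrical distribution from Section 5.5.1, expanded into raw observations.
xs = [1, 2, 3, 4, 5]
fs = [5, 9, 12, 9, 5]
data = [x for x, f in zip(xs, fs) for _ in range(f)]
sd = stdev(data)
print(pearson_sk_mode(mean(data), mode(data), sd),
      pearson_sk_median(mean(data), median(data), sd))          # 0.0 0.0

# Hypothetical moderately skewed distribution (positive skew).
print(pearson_sk_mode(12, 6, 4), pearson_sk_median(12, 10, 4))  # 1.5 1.5
```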
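The other measure uses the β (beta) coefficient, which is given by
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$
where µ2 and µ3 are the second and third central moments. The second central moment is nothing but the variance. The sample estimate of this coefficient is
$$b_1 = \frac{m_3^2}{m_2^3}$$
where m2 and m3 are the sample central moments given by
$$m_2 = \frac{\sum (X - \bar{X})^2}{n - 1} \ \text{or} \ \frac{\sum f (X - \bar{X})^2}{n - 1} = \text{variance}, \qquad m_3 = \frac{\sum (X - \bar{X})^3}{n - 1} \ \text{or} \ \frac{\sum f (X - \bar{X})^3}{n - 1}$$
For a symmetrical distribution b1 is zero. Skewness is positive or negative depending on whether m3 is positive or negative. Because β1 is a squared quantity, it is never negative and cannot show this sign. To remove the uncertainty about the sign, we can use another formula, i.e.
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}, \qquad \text{so that } \gamma_1^2 = \beta_1 \text{ and } \gamma_1 \text{ has the sign of } \mu_3$$
If µ3 > 0, we have γ1 > 0 and the distribution is positively (right) skewed. Further, if µ3 < 0, then γ1 < 0 and we have a negatively (left) skewed distribution. Hence the direction of skewness is given by the sign of γ1.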
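Example
The first four moments about the mean of a distribution are 0, 2.5, 0.7 and 18.75. Test the skewness of the distribution.
Solution
Here µ2 = 2.5 and µ3 = 0.7, so
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{0.7}{(1.581)^3} = \frac{0.7}{3.953} = 0.177$$
Since γ1 > 0, the distribution is positively (right) skewed.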
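A sketch computing β1 and γ1 from given central moments, reproducing the example (µ2 = 2.5, µ3 = 0.7). The function names are hypothetical.

```python
def beta1(mu2, mu3):
    """beta_1 = mu3^2 / mu2^3 (never negative, so it cannot show direction)."""
    return mu3 ** 2 / mu2 ** 3

def gamma1(mu2, mu3):
    """gamma_1 = mu3 / mu2^(3/2); its sign gives the direction of skewness."""
    return mu3 / mu2 ** 1.5

mu2, mu3 = 2.5, 0.7   # second and third central moments from the example
g1 = gamma1(mu2, mu3)
print(round(beta1(mu2, mu3), 4), round(g1, 3))   # 0.0314 0.177
print("positively skewed" if g1 > 0 else "negatively skewed" if g1 < 0 else "symmetrical")
```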
5.5.2 Kurtosis
[Figure: three symmetrical curves labelled leptokurtic, mesokurtic and platykurtic]
All three curves are symmetrical about the mean, yet they are not of the same type: they differ in how peaked they are. Curve (1) is known as mesokurtic (the normal curve), curve (2) as leptokurtic (a peaked curve) and curve (3) as platykurtic (a flat curve).
Kurtosis is measured by the coefficient
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$
whose sample estimate is
$$b_2 = \frac{m_4}{m_2^2}, \qquad \text{where } m_4 = \frac{\sum (X - \bar{X})^4}{n - 1} \text{ is the 4th central moment.}$$
The distribution is called mesokurtic if b2 = 3. When b2 is more than 3, the distribution is said to be leptokurtic, and if b2 is less than 3, the distribution is said to be platykurtic.
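Example
Compute the measures of skewness and kurtosis for the data in the following table.
Value (xi)      3   4   5   6   7   8   9  10
Frequency (f)   4   6  10  26  24  15  10   5
Solution
Here n = 100 and $\bar{X} = 670/100 = 6.7$.
$$m_2 = s^2 = \frac{\sum f_i (X_i - \bar{X})^2}{n - 1} = \frac{275}{99} = 2.7778$$
$$m_3 = \frac{\sum f_i (X_i - \bar{X})^3}{n - 1} = \frac{-43.8}{99} = -0.4424$$
$$m_4 = \frac{\sum f_i (X_i - \bar{X})^4}{n - 1} = \frac{2074.13}{99} = 20.9508$$
$$b_1 = \frac{m_3^2}{m_2^3} = \frac{(-0.4424)^2}{(2.7778)^3} = 0.0091, \qquad \gamma_1 = \frac{m_3}{m_2^{3/2}} = \frac{-0.4424}{(2.7778)^{3/2}} = -0.0956$$
Since γ1 is negative, the distribution is slightly negatively (left) skewed.
$$b_2 = \frac{m_4}{m_2^2} = \frac{20.9508}{(2.7778)^2} = 2.715$$
The value of b2 is 2.715, which is less than 3. Hence the distribution is platykurtic.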
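The whole example can be reproduced from the frequency table with a small hypothetical helper. Following this unit's convention, it divides each central moment by n − 1; the `ddof` parameter is an assumption of this sketch, and `ddof=0` divides by n instead.

```python
def central_moment(xs, fs, r, ddof=1):
    """r-th central moment of a frequency distribution, divided by n - ddof.

    ddof=1 follows this unit's n - 1 convention; ddof=0 gives the divisor n.
    """
    n = sum(fs)
    xbar = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - xbar) ** r for x, f in zip(xs, fs)) / (n - ddof)

xs = [3, 4, 5, 6, 7, 8, 9, 10]
fs = [4, 6, 10, 26, 24, 15, 10, 5]
m2, m3, m4 = (central_moment(xs, fs, r) for r in (2, 3, 4))

b1 = m3 ** 2 / m2 ** 3      # estimate of beta_1
g1 = m3 / m2 ** 1.5         # signed skewness: negative means left skew
b2 = m4 / m2 ** 2           # estimate of beta_2: compare with 3

print(round(m2, 4), round(m3, 4), round(m4, 4))   # 2.7778 -0.4424 20.9508
print(round(b1, 4), round(g1, 4), round(b2, 4))   # 0.0091 -0.0956 2.7152
print("platykurtic" if b2 < 3 else "leptokurtic" if b2 > 3 else "mesokurtic")
```

Libraries that compute skewness and kurtosis often divide by n, or apply other bias corrections. Their results can therefore differ slightly from these hand formulas.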
Exercises
Number of students 2 6 8 4 20
Find
i. Range
ii. The first and third quartile
iii. Quartile deviation
iv. Mean and median deviation
v. Variance and standard deviation
2. The final exam of a course consists of two exams, Mathematics and History. A student scored 66 in Mathematics and 80 in History. The average score of all students is 51 with a standard deviation of 12 in Mathematics, and 72 with a standard deviation of 16 in History.
a. In which subject did the student perform better?
b. In which subject do all students have more similar (consistent) results?
3. For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5. If
the coefficient of variation is 20%, find the Pearsonian coefficient of skewness and the
probable mode of the distribution.
4. Some characteristics of the annual family income distributions (in Birr) in two regions are as follows:
Region Mean Median Standard Deviation