Measures of Dispersion Explained

This document discusses measures of dispersion, which quantify how spread out or varied values in a data set are. It defines absolute and relative measures of dispersion and describes some common measures like range, interquartile range, and quartile deviation. Examples are provided to demonstrate calculating these dispersion measures from data sets.

Uploaded by

Belay bekele
Copyright
© © All Rights Reserved

CHAPTER 4: MEASURES OF DISPERSION (VARIATION)

4.1 Introduction
Just as central tendency can be measured by a number in the form of an average, the amount of
variation (dispersion, spread, or scatter) among the values in a data set can also be measured.
Measures of central tendency show that most of the values in a data set concentrate around a
central value, called the average, with the remaining values scattered (distributed) on either
side of it. But these measures do not reveal how the values are dispersed (spread or scattered)
on each side of the central value. The dispersion of values is indicated by the extent to which
they tend to spread over an interval rather than cluster closely around an average.
The term dispersion is generally used in two senses. Firstly, dispersion refers to the variation of
the items among themselves. If all the items of a series have the same value, there is no
variation among them. Secondly, dispersion refers to the variation of the items around an
average. If the differences between the values of the items and the average are large, the
dispersion is high; if these differences are small, the dispersion is low. Thus, dispersion is
defined as the scatteredness or spread of the individual items in a given series.

After studying this chapter, you should be able to:

• Explain the meaning of measures of dispersion.
• Compare two or more sets of data using relative measures of dispersion.
• Apply the Z-score to find out the relative standing of values.
• Explain measures of skewness and kurtosis.

Objectives of measuring variation:
• To judge the reliability of measures of central tendency.
• To control variability itself.
• To compare two or more groups of numbers in terms of their variability.
• To make further statistical analysis.
4.2 Absolute and Relative Measures of Dispersion
Absolute measures of dispersion: An absolute measure is expressed in the same
statistical unit in which the original data are given, such as kilograms, tonnes, etc. These
measures are suitable for comparing the variability of two distributions whose variables are
expressed in the same units and whose averages are of about the same size. They are not
suitable for comparing the variability of two distributions whose variables are expressed in
different units.

[Diagram: Absolute measures of dispersion]
• Based on selected items: range and inter-quartile range
• Based on all items: mean deviation and standard deviation
Relative measures of dispersion: A relative measure of dispersion is the ratio of a measure of
absolute dispersion to an appropriate average or the selected items of the data.

[Diagram: Relative measures of dispersion]
• Based on selected items: coefficient of range and coefficient of quartile deviation
• Based on all items: coefficient of mean deviation and coefficient of standard deviation (coefficient of variation)

4.3 Types of Measures of Variation

4.3.1 The Range and Relative Range

Range is the simplest measure of dispersion. It is defined as the difference between the largest
and smallest values in a given set of data. Its formula is:

R = L − S

where R = range, L = largest value in the data set, and S = smallest value in the data set.

For a continuous grouped distribution, the range may be obtained as:

• The difference between the upper class limit of the last class and the lower class limit of the
first class, or
• The difference between the largest class mark and the smallest class mark, or
• The difference between the upper class boundary of the last class and the lower class
boundary of the first class.

The range is used to describe quantities such as the maximum change in daily temperature or
rainfall. When the sample size is small, it can be an adequate measure of variation. It is
commonly used in quality control.

The relative measure of range, also called the coefficient of range, is defined as

$RR = \frac{L - S}{L + S}$

Example 4.1: Five students obtained the following marks in statistics: 20, 35, 25, 30, 15. Find
the range and relative range.

Solution: Here, L = 35 and S = 15.

Range = L − S = 35 − 15 = 20

$RR = \frac{L - S}{L + S} = \frac{35 - 15}{35 + 15} = \frac{20}{50} = 0.4$

Example 4.2: Find the range and relative range of the following data.

Size        5-10   11-15   16-20   21-25   26-30
Frequency   4      9       15      30      40

Solution: Here, L = upper class limit of the last class = 30 and S = lower class limit of the
first class = 5.

Range = 30 − 5 = 25

$RR = \frac{L - S}{L + S} = \frac{30 - 5}{30 + 5} = \frac{25}{35} \approx 0.714$
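The two range examples above can be checked with a short Python sketch. It assumes the usual definition of the coefficient of range, RR = (L − S)/(L + S); the function names `value_range` and `relative_range` are illustrative and do not come from the original notes.

```python
def value_range(values):
    """Return the range R = L - S of a list of numbers."""
    return max(values) - min(values)


def relative_range(largest, smallest):
    """Return the coefficient of range, (L - S) / (L + S)."""
    return (largest - smallest) / (largest + smallest)


# Example 4.1: raw marks of five students.
marks = [20, 35, 25, 30, 15]
range_41 = value_range(marks)
rr_41 = relative_range(max(marks), min(marks))

# Example 4.2: grouped data, using the class limits L = 30 and S = 5.
range_42 = 30 - 5
rr_42 = relative_range(30, 5)

print(range_41, rr_41)
print(range_42, round(rr_42, 3))
```

For grouped data the result depends on whether class limits, class marks or class boundaries are used for L and S, so the choice should be stated alongside the result.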

Merits of the Range

• It is well-defined, easy to compute and simple to understand.
• It helps in giving an idea about the variation, just by giving the lowest value and the
greatest value of the variable.

Demerits of the Range

• It is not based on all observations of the series.
• It can't be calculated in the case of an open-ended distribution.
• It is affected by sampling fluctuation.
• It is affected by extreme values in the series.

4.3.2 The Quartile Deviation and Coefficient of Quartile Deviation

Inter-quartile range and quartile deviation are other measures of dispersion. The difference
between the upper quartile (Q3) and the lower quartile (Q1) is called the inter-quartile range.
Symbolically,

$IQR = Q_3 - Q_1$

The inter-quartile range covers the dispersion of the middle 50% of the items of the series. Quartile
deviation, also called the semi-inter-quartile range, is half of the difference between the upper and
lower quartiles, that is, half of the inter-quartile range. Its formula is

$QD = \frac{Q_3 - Q_1}{2}$

The relative measure of quartile deviation, also called the coefficient of quartile deviation (CQD),
is defined as:

$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$

Example 4.3: Find the inter-quartile range, quartile deviation and coefficient of quartile
deviation from the following ages of patients.

18, 59, 24, 42, 21, 23, 24, 32

Solution: First arrange the data in ascending order: 18, 21, 23, 24, 24, 32, 42, 59 (n = 8).

$Q_1 = \left(\frac{n+1}{4}\right)^{th} \text{item} = \left(\frac{8+1}{4}\right)^{th} \text{item} = (2.25)^{th} \text{item}$
= 2nd item + 0.25(3rd item − 2nd item) = 21 + 0.25(23 − 21) = 21.5

$Q_3 = 3\left(\frac{n+1}{4}\right)^{th} \text{item} = 3\left(\frac{8+1}{4}\right)^{th} \text{item} = (6.75)^{th} \text{item}$
= 6th item + 0.75(7th item − 6th item) = 32 + 0.75(42 − 32) = 39.5

$IQR = Q_3 - Q_1 = 39.5 - 21.5 = 18$, $QD = \frac{Q_3 - Q_1}{2} = \frac{39.5 - 21.5}{2} = 9$

$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{39.5 - 21.5}{39.5 + 21.5} = \frac{18}{61} = 0.295$
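A minimal Python sketch of the (n + 1)/4 positional rule used in Example 4.3, with linear interpolation between neighbouring items. The function name `quartile` is hypothetical, and other textbooks and software packages may follow slightly different quartile conventions, so results can differ from this rule.

```python
def quartile(values, k):
    """Return the k-th quartile (k = 1, 2 or 3) by the k(n + 1)/4-th item rule."""
    data = sorted(values)
    n = len(data)
    position = k * (n + 1) / 4      # 1-based position, may be fractional
    whole = int(position)           # item just below the position
    fraction = position - whole     # how far to move towards the next item
    if whole < 1:                   # position falls before the first item
        return data[0]
    if whole >= n:                  # position falls at or after the last item
        return data[-1]
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])


ages = [18, 59, 24, 42, 21, 23, 24, 32]
q1, q3 = quartile(ages, 1), quartile(ages, 3)
iqr = q3 - q1                       # inter-quartile range
qd = iqr / 2                        # quartile deviation
cqd = (q3 - q1) / (q3 + q1)         # coefficient of quartile deviation
print(q1, q3, iqr, qd, round(cqd, 3))
```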

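Example 4.4: Find the inter-quartile range, quartile deviation and coefficient of quartile
deviation from the following data.

Marks             2    3    4    5    6    7    8    9
No. of students   10   11   12   13   5    12   7    5

Solution:

Marks   No. of students   CF
2       10                10
3       11                21
4       12                33
5       13                46
6       5                 51
7       12                63
8       7                 70
9       5                 75 = N

$Q_1 = \left(\frac{N+1}{4}\right)^{th} \text{item} = \left(\frac{75+1}{4}\right)^{th} \text{item} = 19^{th} \text{item}$.
The cumulative frequency first reaches 19 at marks 3 (CF = 21), so Q1 = 3.

$Q_3 = 3\left(\frac{N+1}{4}\right)^{th} \text{item} = 3\left(\frac{75+1}{4}\right)^{th} \text{item} = 57^{th} \text{item}$.
The cumulative frequency first reaches 57 at marks 7 (CF = 63), so Q3 = 7.

$IQR = Q_3 - Q_1 = 7 - 3 = 4$

$QD = \frac{Q_3 - Q_1}{2} = \frac{7 - 3}{2} = 2$

$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{7 - 3}{7 + 3} = 0.4$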
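For a discrete frequency distribution such as the one in Example 4.4, the quartile is the first value whose cumulative frequency reaches the k(N + 1)/4-th item. The sketch below illustrates that lookup; `quartile_from_frequencies` is a hypothetical helper, and it deliberately does not interpolate when the position is fractional, which is a simplifying assumption.

```python
from itertools import accumulate


def quartile_from_frequencies(values, freqs, k):
    """Return the k-th quartile of a discrete frequency distribution.

    `values` must be in ascending order. The quartile is the first value
    whose cumulative frequency reaches the k(N + 1)/4-th item.
    """
    total = sum(freqs)
    position = k * (total + 1) / 4
    for value, cum_freq in zip(values, accumulate(freqs)):
        if cum_freq >= position:
            return value
    return values[-1]


marks = [2, 3, 4, 5, 6, 7, 8, 9]
students = [10, 11, 12, 13, 5, 12, 7, 5]

q1 = quartile_from_frequencies(marks, students, 1)
q3 = quartile_from_frequencies(marks, students, 3)
print(q1, q3, q3 - q1, (q3 - q1) / 2, (q3 - q1) / (q3 + q1))
```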

Remark: QD and CQD include only the middle 50% of the observations.

Merits of QD

• It is well-defined, easy to compute and simple to understand.
• It helps in studying the middle 50% of the items in the series.
• It is not affected by the extreme items.
• It is useful in measuring variation in the case of open-ended distributions.

Demerits of QD

• It is not based on all the items (it ignores 50% of the items, i.e., the first 25% and the last
25%).
• It is greatly influenced by sampling fluctuations.
• It is not amenable to algebraic manipulations.

4.3.3 The Mean Deviation and Coefficient of Mean Deviation

The mean deviation (MD) measures the average deviation of a set of observations about their
central value, generally the mean or the median, ignoring the plus/minus sign of the deviations.
In other words, the mean deviation of a set of items is defined as the arithmetic mean of the
absolute deviations from a given average. Depending on the type of average used, we have
different mean deviations.
• The mean deviation of a sample of n observations x1, x2, . . ., xn is given as

$MD = \frac{\sum |X_i - A|}{n}$

where $|X_i - A|$ denotes the absolute value of the deviation and A stands for the average used for
calculating MD. Generally, the arithmetic mean or the median is used, that is,
$A = \tilde{X}$ (median) or $A = \bar{X}$ (mean).

• In case of grouped data, the formula for MD becomes

$MD = \frac{\sum f_i |X_i - A|}{n}$

where $X_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and
$n = \sum f_i$.
1. The mean deviation about the arithmetic mean is, therefore, given by

$MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n}$, for ungrouped data.

$MD(\bar{X}) = \frac{\sum f_i |X_i - \bar{X}|}{n}$, for discrete data arranged in FD and for grouped
frequency distributions, where $X_i$ is the value or class mark of the ith class, $f_i$ is the
frequency of the ith class and $n = \sum f_i$.
Steps to calculate $MD(\bar{X})$:
• Find the arithmetic mean, $\bar{X}$.
• Find the deviation of each reading from $\bar{X}$.
• Find the arithmetic mean of the deviations, ignoring sign.
2. The mean deviation about the median is given by

$MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n}$, for ungrouped data.

$MD(\tilde{X}) = \frac{\sum f_i |X_i - \tilde{X}|}{n}$, for discrete data arranged in FD and for grouped
frequency distributions, where $X_i$ is the value or class mark of the ith class, $f_i$ is the
frequency of the ith class and $n = \sum f_i$.
Steps to calculate $MD(\tilde{X})$:
• Find the median, $\tilde{X}$.
• Find the deviation of each reading from $\tilde{X}$.
• Find the arithmetic mean of the deviations, ignoring sign.
3. The mean deviation about the mode is given by

$MD(\hat{X}) = \frac{\sum |X_i - \hat{X}|}{n}$, for ungrouped data.

$MD(\hat{X}) = \frac{\sum f_i |X_i - \hat{X}|}{n}$, for discrete data arranged in FD and for grouped
frequency distributions, where $X_i$ is the value or class mark of the ith class, $f_i$ is the
frequency of the ith class and $n = \sum f_i$.
Steps to calculate $MD(\hat{X})$:
• Find the mode, $\hat{X}$.
• Find the deviation of each reading from $\hat{X}$.
• Find the arithmetic mean of the deviations, ignoring sign.
Example 4.5: The following are the numbers of visits made by ten mothers to the local doctor's
surgery: 8, 6, 5, 5, 7, 4, 5, 9, 7, 4. Find the mean deviation about the mean, median and mode.
Solution:
First calculate the three averages:
$\bar{X} = 6$, $\tilde{X} = 5.5$, $\hat{X} = 5$
Then take the deviations of each observation from these averages.

x_i              4     4     5     5     5     6     7     7     8     9     Total
|x_i − mean|     2     2     1     1     1     0     1     1     2     3     14
|x_i − median|   1.5   1.5   0.5   0.5   0.5   0.5   1.5   1.5   2.5   3.5   14
|x_i − mode|     1     1     0     0     0     1     2     2     3     4     14

Since the data are ungrouped, the mean deviations about the mean, median and mode are:

$MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n} = \frac{14}{10} = 1.4$

$MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n} = \frac{14}{10} = 1.4$

$MD(\hat{X}) = \frac{\sum |X_i - \hat{X}|}{n} = \frac{14}{10} = 1.4$
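The three mean deviations in Example 4.5 can be reproduced with Python's standard `statistics` module. This is only a sketch for ungrouped data; the helper name `mean_deviation` is an illustrative choice, not something defined in the notes.

```python
from statistics import mean, median, mode


def mean_deviation(values, center):
    """Average absolute deviation of `values` from the given `center`."""
    return sum(abs(x - center) for x in values) / len(values)


visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

md_mean = mean_deviation(visits, mean(visits))
md_median = mean_deviation(visits, median(visits))
md_mode = mean_deviation(visits, mode(visits))
print(md_mean, md_median, md_mode)
```

That all three values coincide is a feature of this particular data set; in general the mean deviation about the median is the smallest of the three.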

Merits of MD

• It is well-defined, easy to compute and simple to understand.
• It is based on all observations.
• It is not greatly affected by the extreme items.
• It can be calculated by using any average.

Demerit of MD

• It does not take into account the signs of the deviations of items from the average.

Remark: Of all the mean deviations taken about different averages or any arbitrary value, the
mean deviation about the median has the smallest value.

Coefficient of mean deviation (CMD):

The relative measure of mean deviation, also called the coefficient of mean deviation, is obtained
by dividing the mean deviation by the particular average used in computing it. Thus,

• CMD about the arithmetic mean is given by:
$CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}}$, where $MD(\bar{X})$ is the mean deviation calculated about
the arithmetic mean.
• CMD about the median is given by:
$CMD(\tilde{X}) = \frac{MD(\tilde{X})}{\tilde{X}}$, where $MD(\tilde{X})$ is the mean deviation calculated
about the median of the observations.
• CMD about the mode is given by:
$CMD(\hat{X}) = \frac{MD(\hat{X})}{\hat{X}}$, where $MD(\hat{X})$ is the mean deviation calculated about
the mode of the observations.

Example 4.6
Calculate the coefficient of mean deviation about the mean, median and mode for the data in
Example 4.5 above.
Solution:
$CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}} = \frac{1.4}{6} = 0.23$

$CMD(\tilde{X}) = \frac{MD(\tilde{X})}{\tilde{X}} = \frac{1.4}{5.5} = 0.25$

$CMD(\hat{X}) = \frac{MD(\hat{X})}{\hat{X}} = \frac{1.4}{5} = 0.28$
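The divisions in Example 4.6 can be checked in a few lines. The sketch takes the mean deviation (1.4) and the three averages (6, 5.5 and 5) from Example 4.5 as given inputs; the function name `coefficient_of_mean_deviation` is illustrative.

```python
def coefficient_of_mean_deviation(mean_dev, average):
    """Return the mean deviation divided by the average it was measured from."""
    return mean_dev / average


md = 1.4                                        # MD about each average (Example 4.5)
averages = {"mean": 6, "median": 5.5, "mode": 5}

cmd = {name: coefficient_of_mean_deviation(md, avg) for name, avg in averages.items()}
for name, value in cmd.items():
    print(f"CMD about the {name}: {value:.2f}")
```

Although the three mean deviations are equal here, the coefficients differ because each is scaled by a different average.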

4.3.4 The Variance, Standard Deviation and Coefficient of Variation

Variance and Standard Deviation

Like the mean deviation, the variance is based on all observations in a set of data. But
the variance is the average of the squared deviations from the mean. Recall that the sum of squared
deviations is a minimum only when taken from the mean. Squared deviations are also easier to
manipulate mathematically than absolute deviations. Thus, if we average the squared deviations
from the mean and take the square root of the result (to compensate for the fact that the deviations
were squared), we obtain the standard deviation. This overcomes the limitation of the mean deviation.

Population Variance (σ²)

If we divide the sum of squared deviations from the mean by the number of values in the population,
we get the population variance. This variance is the "average squared deviation from the mean".
• For ungrouped data

$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{1}{N}\left[\sum_{i=1}^{N} X_i^2 - N\mu^2\right]$

where μ is the population arithmetic mean and N is the total number of observations in the
population.

• For discrete data arranged in FD and for continuous grouped data

$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N} = \frac{1}{N}\left[\sum f_i X_i^2 - N\mu^2\right]$

where μ is the population arithmetic mean, $X_i$ is the value or class mark of the ith class,
$f_i$ is the frequency of the ith class and $N = \sum f_i$.
Sample Variance (S²)
One would expect the sample variance to simply be the population variance with the population
mean replaced by the sample mean. However, one of the major uses of a statistic is to estimate
the corresponding parameter, and a sample variance computed with divisor n tends to
underestimate the population variance. To offset this, the sum of the squares of the deviations
is divided by one less than the sample size.
• For ungrouped data

$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right]$

where $\bar{x}$ is the sample arithmetic mean and n is the total number of observations in the
sample.

• For discrete data arranged in FD and for grouped data, where the values have frequencies
$f_i$ (i = 1, 2, …, m)

$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right]$

where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the value or class mark of the ith class,
$f_i$ is the frequency of the ith class and $n = \sum f_i$.

The Standard Deviation

There is a problem with variances. Recall that the deviations were squared. That means that the
units were also squared. To get the units back to those of the original data values, the square
root must be taken.
• Population Standard Deviation (σ)
$\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.
• Sample Standard Deviation (S)
$S = \sqrt{S^2}$, where $S^2$ is the sample variance.

Example 4.7: Find the sample variance and standard deviation for the frequency distribution of
heights (in cm) of students in AU given below.

Heights in cm        150   152   154   156   158   160   162   164   166
Number of students   28    40    52    100   60    48    32    20    7

Solution: Prepare the following table:

x_i    f_i    f_i x_i   x_i²     f_i x_i²
150    28     4200      22500    630000
152    40     6080      23104    924160
154    52     8008      23716    1233232
156    100    15600     24336    2433600
158    60     9480      24964    1497840
160    48     7680      25600    1228800
162    32     5184      26244    839808
164    20     3280      26896    537920
166    7      1162      27556    192892
Sum    387    60674     224916   9518252

Thus, $n = \sum f_i = 387$, $\sum f_i x_i = 60674$, $\sum f_i x_i^2 = 9518252$, $\sum x_i^2 = 224916$.

$S^2 = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{386}\left[9518252 - 387\left(\frac{60674}{387}\right)^2\right] = \frac{1}{386}(5760.33) = 14.92$, and

$S = \sqrt{14.92} = 3.86$
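The computational (shortcut) form of the sample variance used in Example 4.7 translates directly into code. The following is a sketch only; `sample_variance_from_frequencies` is a hypothetical helper name.

```python
from math import sqrt


def sample_variance_from_frequencies(values, freqs):
    """Sample variance via (sum f x^2 - n * mean^2) / (n - 1)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    return (sum_fx2 - n * mean ** 2) / (n - 1)


heights = [150, 152, 154, 156, 158, 160, 162, 164, 166]
students = [28, 40, 52, 100, 60, 48, 32, 20, 7]

s2 = sample_variance_from_frequencies(heights, students)
s = sqrt(s2)
print(round(s2, 2), round(s, 2))
```

The shortcut form subtracts two large, nearly equal numbers, so with floating-point arithmetic the definitional form (summing squared deviations from the mean) is usually the safer choice for large values.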

Example 4.8: Calculate the sample variance and standard deviation of the blood glucose level,
in milligrams per deciliter, for 60 patients shown below.

Class limit   55 – 63   64 – 72   73 – 81   82 – 90   91 – 99   100 – 108   109 – 117
Frequency     9         5         12        17        7         6           4

Solution: In a continuous F.D., x_i is the class mark representing the ith class.

Class limit   x_i   f_i   f_i x_i   f_i x_i²
55 – 63       59    9     531       31329
64 – 72       68    5     340       23120
73 – 81       77    12    924       71148
82 – 90       86    17    1462      125732
91 – 99       95    7     665       63175
100 – 108     104   6     624       64896
109 – 117     113   4     452       51076
Total               60    4998      430476

Here, $n = \sum f_i = 60$, $\bar{x} = \frac{\sum f_i x_i}{n} = \frac{4998}{60} = 83.3$ and $\sum f_i x_i^2 = 430476$, so that

$S^2 = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{59}\left[430476 - 60(83.3)^2\right] = \frac{14142.6}{59} = 239.71$,

$S = \sqrt{239.71} = 15.48$
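For grouped data such as Example 4.8, each class is represented by its class mark (the midpoint of its limits). The sketch below uses the definitional form of the sample variance rather than the shortcut form; the variable names are illustrative.

```python
from math import sqrt

class_limits = [(55, 63), (64, 72), (73, 81), (82, 90),
                (91, 99), (100, 108), (109, 117)]
frequencies = [9, 5, 12, 17, 7, 6, 4]

# Class mark = midpoint of the lower and upper class limits.
class_marks = [(lower + upper) / 2 for lower, upper in class_limits]

n = sum(frequencies)
mean = sum(f * x for x, f in zip(class_marks, frequencies)) / n

# Definitional form: sum of f_i * (x_i - mean)^2, divided by n - 1.
variance = sum(f * (x - mean) ** 2
               for x, f in zip(class_marks, frequencies)) / (n - 1)
std_dev = sqrt(variance)
print(round(mean, 1), round(variance, 2), round(std_dev, 2))
```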

Properties of Variance & Standard Deviation

1. If a constant is added to (or subtracted from) all the values, the variance remains the
same; i.e., for any constant k, $S^2_{x \pm k} = S^2_x$.

Example 4.9: Consider the 6 sample values x_i: 54, 52, 53, 50, 51, and 52.

The sample variance is $S^2 = 2$. Now, subtract 50 from each value to get
4, 2, 3, 0, 1, 2; the variance of this new series is also 2, i.e., $S^2_{x - 50} = S^2_x = 2$.

2. If each and every value is multiplied by a non-zero constant k, the standard deviation is
multiplied by |k| and the variance is multiplied by k²; i.e., $S_{kx} = |k| S_x$ and
$S^2_{kx} = k^2 S^2_x$.

3. Both the variance and the standard deviation give more weight to extreme values and
less to those which are near the mean.
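The first two properties can be verified numerically on the data of Example 4.9 using the standard library's `statistics.variance` and `statistics.stdev`, which compute the sample (n − 1) versions. The multiplier k = −3 is an arbitrary value chosen here for illustration; it does not appear in the original example.

```python
from statistics import stdev, variance

x = [54, 52, 53, 50, 51, 52]

# Property 1: subtracting a constant leaves the variance unchanged.
shifted = [value - 50 for value in x]

# Property 2: multiplying by k scales the standard deviation by |k|
# and the variance by k squared (k = -3 is an illustrative choice).
k = -3
scaled = [k * value for value in x]

print(variance(x), variance(shifted))
print(variance(scaled), k ** 2 * variance(x))
print(stdev(scaled), abs(k) * stdev(x))
```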
Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).
The standard deviation expresses the variation in the same unit as the original data, but it cannot
be the sole basis for comparing two distributions. For instance, if we have a standard deviation
of 10 and a mean of 5, the values vary by an amount twice as large as the mean itself. If, on the
other hand, we have a standard deviation of 10 and a mean of 5000, the variation relative to the
mean is insignificant. Therefore, we cannot judge the dispersion of a set of data until we know
the standard deviation, the mean, and how the standard deviation compares with the mean.
The coefficient of variation is used in problems where we want to compare the variability of
two or more different series. It is the ratio of the standard deviation to the arithmetic mean,
usually expressed in percent.
$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$
For population data:
$CV = \frac{\sigma}{\mu} \times 100$
where σ is the population standard deviation and μ is the population mean.
For sample data:
$CV = \frac{S}{\bar{x}} \times 100$
where S is the sample standard deviation and $\bar{x}$ is the sample mean.
Remark: A distribution having a smaller coefficient of variation is said to be less variable, or
more consistent, more uniform, or more homogeneous.
Example 4.10: One patient’s blood pressure, measured daily over several weeks, averaged 182
with a standard deviation of 12.6, while that of another patient averaged 124 with a standard
deviation of 9.4. Which patient’s blood pressure is relatively more variable?

Solution:

Given: $S_1 = 12.6$, $\bar{x}_1 = 182$, $S_2 = 9.4$, $\bar{x}_2 = 124$

$CV_1 = \frac{S_1}{\bar{x}_1} \times 100\% = \frac{12.6}{182} \times 100\% = 6.923\%$

$CV_2 = \frac{S_2}{\bar{x}_2} \times 100\% = \frac{9.4}{124} \times 100\% = 7.58\%$

Blood pressure of the second patient is relatively more variable.
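The comparison in Example 4.10 amounts to two divisions, sketched below. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage."""
    return std_dev / mean * 100


cv_first = coefficient_of_variation(12.6, 182)   # first patient
cv_second = coefficient_of_variation(9.4, 124)   # second patient

print(round(cv_first, 3), round(cv_second, 2))
more_variable = "second" if cv_second > cv_first else "first"
print(f"The {more_variable} patient's blood pressure is relatively more variable.")
```

Note that the first patient has the larger standard deviation (12.6 against 9.4) yet the smaller coefficient of variation, which is exactly why a relative measure is needed when the means differ.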

4.4 Standard Scores (Z-Scores)

A standard score for a sample value in a data set is obtained by subtracting the mean of the data
set from the value and dividing the result by the standard deviation of the data set. Basically, the
standard score (z-score) tells us how many standard deviations a specific value is above or below
the mean value of the data set. That is, the z-score is the number of standard deviations the data
value falls above (positive z-score) or below (negative z-score) the mean of the data set.

Z-score computed from the population

$Z = \frac{X - \mu}{\sigma}$

Z-score computed from the sample

$Z = \frac{X - \bar{X}}{S}$

Example 4.11: What is the Z-score for the value of 14 in the following sample data set?

3 8 6 14 4 12 7 10

Solution:

$\bar{X} = 8$ and $S = 3.8173$; thus, $Z = \frac{14 - 8}{3.8173} \approx 1.57$

• The data value of 14 is located 1.57 standard deviations above the mean of 8, because the
z-score is positive.

Example 4.12: Suppose that a student scored 66 in Statistics and 80 in Biology. The summary of
the scores in the two courses is given below.

Course       Average score   Standard deviation of the score
Statistics   51              12
Biology      72              16

In which course did the student score better compared to his classmates?
Solution:
Z-score of the student in Statistics: $Z = \frac{X - \bar{x}}{S} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25$

Z-score of the student in Biology: $Z = \frac{X - \bar{x}}{S} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5$

From these two standard scores, we can conclude that, relative to his classmates, the student
scored better in Statistics than in Biology.
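Both z-score examples above can be reproduced with one small function. This is a sketch only: `z_score` is an illustrative name, and `statistics.stdev` is used because Example 4.11 treats the data as a sample (divisor n − 1).

```python
from statistics import mean, stdev


def z_score(value, center, spread):
    """Number of standard deviations `value` lies above (+) or below (-) `center`."""
    return (value - center) / spread


# Example 4.11: z-score of the value 14 within the sample itself.
data = [3, 8, 6, 14, 4, 12, 7, 10]
z_14 = z_score(14, mean(data), stdev(data))

# Example 4.12: compare one student's standing in two courses.
z_statistics = z_score(66, 51, 12)
z_biology = z_score(80, 72, 16)

print(round(z_14, 2), z_statistics, z_biology)
```

Because z-scores are unit-free, the two course results can be compared directly even though the raw score in Biology (80) is higher than the one in Statistics (66).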
