
Unit Five

Unit Five covers measures of dispersion and shape, including concepts like range, quartile deviation, mean deviation, variance, skewness, and kurtosis. It emphasizes the importance of these measures in understanding the variability and distribution of data beyond just central tendency. The unit provides definitions, properties, and examples for calculating these measures to aid in statistical analysis.

Uploaded by

abenihabib990
Copyright
© © All Rights Reserved

UNIT FIVE

MEASURES OF DISPERSION

General Objective

After completing this unit, you should be able to:

- Know the meaning and types of measures of dispersion and measures of shape (skewness, kurtosis), as well as the advantages of computing such measures.
- Know how to compute measures of dispersion such as the range, quartile deviation, mean deviation and standard deviation.
- Know how to compute and interpret the coefficients of skewness and kurtosis.
5.1 Introduction

In this unit we shall discuss the most commonly used measures of dispersion, namely the range, quartile deviation, mean deviation, standard deviation and coefficient of variation, and measures of shape such as skewness and kurtosis.

We have seen that averages are representative of a frequency distribution, but they fail to give a complete picture of it. They tell us nothing about the scatter of the observations within the distribution. Suppose that we have the yields (kg per plot) of two paddy varieties from 5 plots each.

The distributions are as follows:

Variety 1: 45 42 42 41 40

Variety 2: 54 48 42 33 30

The mean yields of the two varieties are nearly the same (42 kg for Variety 1 and 41.4 kg for Variety 2), but we cannot say that the two varieties perform equally. The first variety may be preferred since it is more consistent in yield performance.
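The comparison of the two varieties can be checked numerically. The short Python sketch below uses only the standard library; the variable names are illustrative and not part of the original notes. It prints the mean, the range and the sample standard deviation of each variety.

```python
from statistics import mean, stdev

# Yields (kg per plot) of the two paddy varieties from the example
variety_1 = [45, 42, 42, 41, 40]
variety_2 = [54, 48, 42, 33, 30]

for name, yields in (("Variety 1", variety_1), ("Variety 2", variety_2)):
    spread = max(yields) - min(yields)  # range = largest value - smallest value
    print(f"{name}: mean={mean(yields):.1f} kg, "
          f"range={spread} kg, sd={stdev(yields):.2f} kg")
```

The averages come out close to each other, while both the range and the standard deviation are much larger for Variety 2, which is what "less consistent" means here.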

From the above example, it is obvious that a measure of central tendency alone is not sufficient to describe a frequency distribution.

In addition to it, we should have a measure of the scatter of the observations. The scatter or variation of observations about their average is called dispersion. There are different measures of dispersion.

Desirable Properties of Measures of Dispersion

1. It should be based on all observations.
2. It should be easy to compute and to understand.
3. It should not be affected much by extreme values.
4. It should not be affected much by sampling fluctuations.
5.2. Absolute measures of dispersion
5.2.1 Range: The simplest measure of dispersion is the range. The range is the difference
between the two extreme values (highest and lowest value) of data. Range takes only
maximum and minimum values into account and not all the values. Hence it is a very
unstable or unreliable indicator of the amount of deviation.
- The major area in which range is applied is statistical quality control.
- It is also applicable in the cases where extreme values are important like maximum rainfall,
temperature, etc.
Range = Xmax - Xmin

Example

Consider the following data on the weights of 6 individuals and compute the range.

24, 25, 30, 15, 47, and 35.

Range = maximum value - minimum value

= 47 - 15 = 32
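As a quick check, the same calculation can be written in a few lines of Python. This is only an illustrative sketch, and the variable names are assumptions.

```python
# Weights listed in the example
weights = [24, 25, 30, 15, 47, 35]

# Range = maximum value - minimum value
value_range = max(weights) - min(weights)
print(value_range)
```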

For grouped data, the range is the difference between the upper class boundary of the last class interval and the lower class boundary of the first class interval.
Properties of range
- It is easy to calculate and to understand.
- It is affected by extreme values.
- It cannot be computed when the distribution has open-ended classes.
- It does not take all the data into account.
- It does not tell anything about the distribution of values in the series.
5.2.2 Quartile deviation:
Inter-Quartile Range: the difference between the 3rd and 1st quartiles. It is a better indicator of absolute variability than the range.

\[ \text{I.Q.R} = Q_3 - Q_1 \]

Quartile Deviation (semi-inter-quartile range) is half of the inter-quartile range:

\[ \text{Q.D} = \frac{(Q_3 - Q_2) + (Q_2 - Q_1)}{2} = \frac{Q_3 - Q_1}{2} \]

Properties of Quartile Deviations

i) The size of the quartile deviation gives an indication of uniformity. If Q.D is small, it denotes high uniformity. Thus, a coefficient of quartile deviation is used for comparing uniformity or variation in different distributions.
ii) Quartile deviation is not a measure of dispersion in the sense that it does not show the scatter around an average but only a distance on a scale. As a result, it is regarded as a measure of partition.
iii) It can be computed when the distribution has open-ended classes, so it is quite suitable for open-ended distributions.
iv) Compared with the range, it is considered a superior measure of dispersion.
v) Since it is not influenced by the extreme values in a distribution, it is particularly suitable for highly skewed or irregular distributions.
vi) Like the range, it fails to cover all items in the distribution.
vii) It is not amenable to further mathematical manipulation.
viii) It varies widely from sample to sample drawn from the same population.

Example

For the following frequency distribution find

a) Inter-quartile range
b) Quartile deviation

Class limit    Frequency
21 – 22        10
23 – 24        22
25 – 26        20
27 – 28        14
29 – 30        14
Total          80
Solution

The cumulative frequencies are 10, 32, 52, 66 and 80, and the class width is w = 2.

N/4 = 80/4 = 20, so Q1 is the 20th ordered observation. The 1st quartile class is 23 – 24, with lower class boundary 22.5 and cumulative frequency cf = 10 below it.

\[ Q_1 = LCB_{Q_1} + \frac{N/4 - cf}{f_{Q_1}}\, w = 22.5 + \frac{(20 - 10) \times 2}{22} = 23.41 \]

2(N/4) = 2(80/4) = 40, so Q2 is the 40th observation. The class interval containing Q2 is 25 – 26. Therefore

\[ Q_2 = LCB_{Q_2} + \frac{2(N/4) - cf}{f_{Q_2}}\, w = 24.5 + \frac{(40 - 32) \times 2}{20} = 25.3 \]

3(N/4) = 60, so Q3 is the 60th observation. The class interval containing Q3 is 27 – 28.

\[ Q_3 = LCB_{Q_3} + \frac{3(N/4) - cf}{f_{Q_3}}\, w = 26.5 + \frac{(60 - 52) \times 2}{14} = 27.64 \]

a) Inter-quartile range = Q3 − Q1 = 27.64 − 23.41 = 4.23

b) Q.D = (Q3 − Q1)/2 = 4.23/2 = 2.115

The quartile deviation is more stable than the range because it depends on two intermediate values. It is not affected by extreme values, since the lowest and highest quarters of the data are excluded. However, the quartile deviation also fails to take all the observations into account.
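The interpolation used for the quartiles of a grouped distribution can be sketched in Python as follows. The function name `grouped_quantile` and its arguments are illustrative assumptions, and the lower class boundaries (20.5, 22.5, ...) are taken half a unit below the class limits, as in the worked solution.

```python
def grouped_quantile(lower_boundaries, frequencies, width, position):
    """Interpolate the value at an ordered position in a grouped table."""
    cumulative = 0  # frequency accumulated below the current class
    for lower, freq in zip(lower_boundaries, frequencies):
        if cumulative + freq >= position:
            # position falls in this class: interpolate within it
            return lower + (position - cumulative) / freq * width
        cumulative += freq
    raise ValueError("position exceeds the total frequency")

# Frequency distribution from the example (classes 21-22, ..., 29-30)
lower_boundaries = [20.5, 22.5, 24.5, 26.5, 28.5]
frequencies = [10, 22, 20, 14, 14]
n = sum(frequencies)

q1 = grouped_quantile(lower_boundaries, frequencies, 2, n / 4)
q2 = grouped_quantile(lower_boundaries, frequencies, 2, 2 * n / 4)
q3 = grouped_quantile(lower_boundaries, frequencies, 2, 3 * n / 4)
iqr = q3 - q1   # inter-quartile range
qd = iqr / 2    # quartile deviation
print(round(q1, 2), round(q2, 2), round(q3, 2), round(iqr, 2), round(qd, 2))
```

Carrying full precision gives a quartile deviation of about 2.12, which agrees with the hand calculation up to rounding.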

5.2.3. Mean Deviation: Mean deviation is the mean of the absolute deviations of individual values from their average. The average may be either the mean or the median.

\[ \text{M.D} = \frac{\sum |X_i - A|}{n} \ \text{ for raw data,} \qquad \text{M.D} = \frac{\sum f_i\,|X_i - A|}{\sum f_i} \ \text{ for grouped data,} \]

where A is either the mean or the median.

Example 1

Consider the following data and compute the mean deviation from the mean.

53, 56, 57, 59, 63, and 66

\[ \bar{X} = \frac{\sum_{i=1}^{6} X_i}{6} = \frac{354}{6} = 59 \]

Xi                              53  56  57  59  63  66
Absolute deviation from mean     6   3   2   0   4   7

\[ \text{Mean deviation} = \frac{\sum |X_i - \bar{X}|}{n} = \frac{22}{6} = 3.67 \]

On average, the data deviate by 3.67 from the arithmetic mean.

Example 2

Calculate the mean deviation for the following data using both the mean and the median.

Xi: 14, 15, 26, 20, 10; median = 15, mean = 17

Xi                       10   14   15   20   26   Total
|di| = |xi − median|      5    1    0    5   11     22
|di| = |xi − mean|        7    3    2    3    9     24

\[ \text{M.D from median} = \frac{\sum |x_i - \text{median}|}{N} = \frac{|10-15| + |14-15| + |15-15| + |20-15| + |26-15|}{5} = \frac{22}{5} = 4.4 \]

The mean deviation from the median is 4.4.

\[ \text{M.D from mean} = \frac{\sum |x_i - \text{mean}|}{N} = \frac{24}{5} = 4.8 \]
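The two mean deviations of Example 2 can be reproduced with a short Python sketch. The helper `mean_deviation` is a hypothetical name introduced only for illustration.

```python
from statistics import mean, median

def mean_deviation(values, center):
    """Average absolute deviation of the values from a chosen center."""
    return sum(abs(x - center) for x in values) / len(values)

data = [14, 15, 26, 20, 10]

md_median = mean_deviation(data, median(data))  # deviations about the median (15)
md_mean = mean_deviation(data, mean(data))      # deviations about the mean (17)
print(md_median, md_mean)
```

For these data the mean deviation about the median is the smaller of the two, as in the hand calculation.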

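Example 3
Calculate the mean deviation from the mean and the median.

Xi      6   7   8   9    10  11  12
fi      3   6   9   13   8   5   4
Xi fi   18  42  72  117  80  55  48

Solution

\[ \text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{432}{48} = 9 \]

\[ \text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} + \left(\frac{n}{2} + 1\right)^{\text{th}}}{2} = \frac{24^{\text{th}} + 25^{\text{th}}}{2} = \frac{9 + 9}{2} = 9 \]

Xi        6   7   8   9    10  11  12  Total
fi        3   6   9   13   8   5   4   48
|di|      3   2   1   0    1   2   3
fi |di|   9   12  9   0    8   10  12  60

where di = Xi − median (or mean). Here the mean and the median are both 9, so the two mean deviations coincide.

\[ \text{M.D from median} = \frac{\sum f_i\,|d_i|}{\sum f_i} = \frac{60}{48} = 1.25 \]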

Property of Mean Deviation

- The mean deviation takes all values into consideration.
- It is fairly stable compared to the range or quartile deviation, but it is not as stable as the standard deviation, since it ignores the signs of the deviations.
- It is not well suited to further statistical investigation.
5.2.4 Variance (S² or σ²)
Variance is the arithmetic mean of the squared deviations about the mean. When our data constitute a sample, the averaging is done by dividing the sum of squared deviations from the mean by n − 1, and the variance is denoted by s². When our data constitute an entire population, the averaging is done by dividing by N, and the variance is denoted by σ². It is a commonly used absolute measure of dispersion.

- Sample variance, an unbiased estimator of the population variance:

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

- Population variance:

\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2 \]

The computing formula for the variance can be simplified to

\[ S^2 = \frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n - 1}, \]

since

\[ \sum (x_i - \bar{x})^2 = \sum \left(x_i^2 - 2 x_i \bar{x} + \bar{x}^2\right) = \sum x_i^2 - 2\bar{x} \sum x_i + n\bar{x}^2 \]

\[ = \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2 \]

\[ = \sum x_i^2 - n\left(\frac{\sum x_i}{n}\right)^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} \]

Therefore

\[ S^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1} \]

- Variance for a simple frequency distribution

xi   x1  x2  . . .  xk
fi   f1  f2  . . .  fk

\[ S^2 = \frac{\sum_{i=1}^{k} f_i\,(x_i - \bar{x})^2}{n - 1}, \quad \text{where } n = \sum f_i \]

- Determination of variance from grouped frequency distributions

\[ s^2 = \frac{\sum_{i=1}^{k} f_i\,(m_i - \bar{x})^2}{n - 1}, \quad \text{where } m_i \text{ is the mid value of the } i\text{-th class} \]

The simplified formula used for computation is

\[ S^2 = \frac{\sum f_i m_i^2 - \left(\sum m_i f_i\right)^2 / n}{\sum_{i=1}^{k} f_i - 1} \]
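The agreement between the definition of the sample variance and its computing (shortcut) form can be checked numerically. The sketch below reuses the raw data of Example 1 of the mean deviation section purely for illustration; the variable names are assumptions.

```python
# Raw data reused from the mean deviation example (illustrative choice)
data = [53, 56, 57, 59, 63, 66]
n = len(data)
x_bar = sum(data) / n

# Definition: sum of squared deviations from the mean, divided by n - 1
definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Computing form: (sum of squares - (sum)^2 / n) / (n - 1)
shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

print(definition, shortcut)
```

Both expressions give the same value, so the shortcut can be used whenever the sums of x and x² are already available.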

Activity

Compute variance for the following frequency distribution

Class interval 1-5 6-10 11-15 16-20

frequency 4 1 2 3

Properties of Variance

1. The variance is always non-negative (S² ≥ 0).
2. If every element in the distribution is multiplied by a constant c, the new variance is

\[ S^2_{new} = c^2 S^2_{old} \]

Old: x1, x2, . . . , xn, with \( S^2_{old} = \sum (x_i - \bar{x})^2 / (n - 1) \)

New: cx1, cx2, . . . , cxn, with

\[ S^2_{new} = \frac{\sum (c x_i - c\bar{x})^2}{n - 1} = \frac{\sum \left[c\,(x_i - \bar{x})\right]^2}{n - 1} = \frac{c^2 \sum (x_i - \bar{x})^2}{n - 1} = c^2 S^2_{old} \]

3. When a constant c is added to all measurements of the distribution, the variance does not change.

xi (old) = x1, x2, . . . , xn

xi (new) = x1 + c, x2 + c, . . . , xn + c

\[ \bar{X}_{new} = \frac{\sum (x_i + c)}{n} = \frac{\sum x_i}{n} + \frac{nc}{n} = \bar{X} + c \]

\[ S^2_{new} = \frac{\sum \left[(x_i + c) - (\bar{x} + c)\right]^2}{n - 1} = \frac{\sum (x_i - \bar{x})^2}{n - 1} = S^2_{old} \]

4. The variance of a constant measured n times is zero, i.e. for c, c, c, . . . , c we have x̄ = c and S² = 0.

Example

If the mean and variance of x are 10 and 5, respectively, find the mean and variance of y, where y = 10x − 5.

\[ \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{\sum (10 x_i - 5)}{n} = \frac{10 \sum x_i}{n} - \frac{5n}{n} = 10\bar{x} - 5 = 10(10) - 5 = 95 \]

\[ \text{var}(y) = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{\sum (10 x_i - 5 - 95)^2}{n - 1} = \frac{10^2 \sum (x_i - 10)^2}{n - 1} = 100\,(5) = 500 \]
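The effect of the transformation y = 10x − 5 can be illustrated with concrete numbers. The data set below is hypothetical: it was chosen only because its mean is 10 and its sample variance is 5, matching the example.

```python
from statistics import mean, variance

# Hypothetical data with mean 10 and sample variance 5
x = [7, 9, 10, 11, 13]
y = [10 * xi - 5 for xi in x]  # y = 10x - 5

print(mean(x), variance(x))  # mean and sample variance of x
print(mean(y), variance(y))  # mean shifts and scales; variance scales by 10**2
```

Subtracting 5 leaves the variance unchanged, while multiplying by 10 multiplies it by 100, in line with properties 2 and 3.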

5.2.5 Standard Deviation

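The standard deviation is defined as the positive square root of the mean of the squared deviations of individual values from their mean, that is, the square root of the variance.

\[ \text{S.D} = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \]

For sample data the divisor n − 1 is used in place of n, as in the sample variance; the examples below use n − 1.

- Its advantage over the variance is that it is in the same unit as the variable under consideration.
- It is a measure of average variation in the set of data.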
Example 1
Compute the variance and S.D. for the data given below.

xi          32  36  40  44  48  Total
frequency    2   5   8   4   1   20

Solution

Xi        32     36     40      44     48     Total
fi        2      5      8       4      1      20
Xi fi     64     180    320     176    48     788
Xi² fi    2,048  6,480  12,800  7,744  2,304  31,376

\[ S^2 = \frac{\sum f_i x_i^2 - \left(\sum x_i f_i\right)^2 / n}{\sum f_i - 1} = \frac{31376 - 788^2 / 20}{19} = \frac{328.8}{19} = 17.31 \]

\[ S = \sqrt{S^2} = \sqrt{17.31} = 4.16 \]

- If the S.D of a set of data is small, then the values are clustered closely about the mean; if it is large, they are scattered widely about the mean.

For instance, if S² = 11, then S.D = √11 = 3.316.
Example 2

Calculate the S.D for the following grouped frequency distribution.

Class intervals   Frequency (fi)
1 – 3             1
3 – 5             9
5 – 7             25
7 – 9             35
9 – 11            17
11 – 13           10
13 – 15           3
Total             100

Solution

Class intervals   Frequency (fi)   mi   mi² fi   mi fi
1 – 3             1                2    4        2
3 – 5             9                4    144      36
5 – 7             25               6    900      150
7 – 9             35               8    2240     280
9 – 11            17               10   1700     170
11 – 13           10               12   1440     120
13 – 15           3                14   588      42
Total             100                   7016     800

\[ S^2 = \frac{\sum f_i m_i^2 - \left(\sum f_i m_i\right)^2 / n}{\sum f_i - 1} = \frac{7016 - 800^2 / 100}{99} = \frac{616}{99} = 6.22 \]

\[ S = \sqrt{S^2} = \sqrt{6.22} = 2.49 \]
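The computing formula for a grouped distribution translates directly into code. The following Python sketch is illustrative only; `grouped_variance` is an assumed helper name, and the class midpoints are used to represent each class as in the solution table.

```python
from math import sqrt

def grouped_variance(midpoints, frequencies):
    """Sample variance from class midpoints and frequencies (shortcut form)."""
    n = sum(frequencies)
    sum_fm = sum(f * m for m, f in zip(midpoints, frequencies))
    sum_fm2 = sum(f * m * m for m, f in zip(midpoints, frequencies))
    return (sum_fm2 - sum_fm ** 2 / n) / (n - 1)

# Grouped distribution from Example 2 (classes 1-3, 3-5, ..., 13-15)
midpoints = [2, 4, 6, 8, 10, 12, 14]
frequencies = [1, 9, 25, 35, 17, 10, 3]

s2 = grouped_variance(midpoints, frequencies)
s = sqrt(s2)
print(round(s2, 2), round(s, 2))
```

The same function applies to a simple frequency distribution if the observed values are passed in place of the midpoints.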

5.3. Relative Measures of Dispersion
Suppose that the two distributions to be compared are expressed in the same units and their means are equal or nearly equal. Then their variability can be compared directly by using their standard deviations. However, if their means are widely different or if they are expressed in different units of measurement, we cannot use the standard deviation as such for comparing their variability. We have to use relative measures of dispersion in such situations.

5.3.1. Coefficient of variation (CV): The CV is a unit-free measure. It is always expressed as a percentage.

\[ \text{CV} = \frac{\text{SD}}{\text{Mean}} \times 100\% \]

The CV will be small if the variation is small. Of the two groups, the one with less CV is said to
be more consistent. The coefficient of variation is unreliable if the mean is near zero. Also it is
unstable if the measurement scale used is not ratio scale. The CV is informative if it is given along
with the mean and standard deviation. Otherwise, it may be misleading.

Example

Consider the distribution of the yields (per plot) of two paddy varieties. For the first variety, the mean and standard deviation are 60 kg and 10 kg, respectively. For the second variety, the mean and standard deviation are 50 kg and 9 kg, respectively. Then we have

CV = (10/60) × 100% = 16.7% for the first variety.

CV = (9/50) × 100% = 18.0% for the second variety.

The relative variability of the first variety is therefore less than that of the second variety, although in terms of the standard deviation alone (10 kg against 9 kg) the interpretation would be reversed.
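The comparison can be written as a small Python sketch. The function name `coefficient_of_variation` is an assumption made for illustration.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation as a percentage: (SD / mean) * 100."""
    return sd / mean * 100

cv_first = coefficient_of_variation(10, 60)   # first variety: SD 10 kg, mean 60 kg
cv_second = coefficient_of_variation(9, 50)   # second variety: SD 9 kg, mean 50 kg
print(f"{cv_first:.1f}%  {cv_second:.1f}%")
```

The variety with the smaller CV is the more consistent one relative to its own mean, even though its standard deviation is larger in absolute terms.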

5.3.2. Coefficient of Mean Deviation: The coefficient of mean deviation is found by dividing the mean deviation by the measure of central tendency about which the deviations are computed.

- It is a relative measure of dispersion and can be computed as

\[ \text{C.M.D} = \frac{\text{Mean deviation}}{\text{Mean}} \quad \text{or} \quad \text{C.M.D} = \frac{\text{Mean deviation}}{\text{Median}} \]

Example 1

The coefficients of mean deviation from the mean and the median for Example 2 of Section 5.2.3 (median = 15, mean = 17) are computed as follows.

Mean deviation from median = 22/5 = 4.4

Mean deviation from mean = 24/5 = 4.8

\[ \text{C.M.D from median} = \frac{\text{mean deviation from median}}{\text{median}} = \frac{4.4}{15} = 0.293 \]

\[ \text{C.M.D from mean} = \frac{\text{mean deviation from mean}}{\text{mean}} = \frac{4.8}{17} = 0.282 \]

5.4 The Standard Score

The standard score is denoted by Z and defined as

\[ Z = \frac{x_i - \bar{x}}{S} \]

where S is the standard deviation of the distribution and xi is each observed value.

- It measures the deviation of an individual observation from the mean of all observations in units of standard deviation, and is termed the Z-score.

The Z-scores of an individual in different groups can then be added to give a true measure of relative performance.

Example

Compare the performance of the following two students.

Candidate   Marks in Economics   Marks in Accounting   Total
A           84                   75                    159
B           74                   85                    159

The average mark for Economics is 60 with a standard deviation of 13, and the average mark for Accounting is 50 with a standard deviation of 11. Whose performance is better, A or B?

Z-scores for A:

\[ \text{Economics: } \frac{84 - 60}{13} = 1.846, \qquad \text{Accounting: } \frac{75 - 50}{11} = 2.273 \]

Total Z-score for A = 1.846 + 2.273 = 4.119

Z-scores for B:

\[ \text{Economics: } \frac{74 - 60}{13} = 1.077, \qquad \text{Accounting: } \frac{85 - 50}{11} = 3.182 \]

Total Z-score for B = 1.077 + 3.182 = 4.259

Since B's total Z-score is higher, student B performed better than student A.
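The comparison of the two candidates can be sketched in Python. The dictionary layout and the name `z_score` are illustrative assumptions; the Economics mean of 60 and standard deviation of 13 are taken from the worked computation.

```python
def z_score(mark, mean, sd):
    """Standard score: deviation from the mean in units of standard deviation."""
    return (mark - mean) / sd

# (mean, standard deviation) of each subject, as used in the worked example
subjects = {"economics": (60, 13), "accounting": (50, 11)}

marks = {
    "A": {"economics": 84, "accounting": 75},
    "B": {"economics": 74, "accounting": 85},
}

# Total Z-score per candidate = sum of the Z-scores over the subjects
totals = {
    student: sum(z_score(mark, *subjects[subject])
                 for subject, mark in student_marks.items())
    for student, student_marks in marks.items()
}
print({student: round(total, 3) for student, total in totals.items()})
```

Although both candidates have the same raw total of 159, the standardized totals differ because the two subjects have different means and spreads.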

5.5. Measure of shape

We have seen that averages and measure of dispersion can help in describing the frequency
distribution. However, they are not sufficient to describe the nature of the distribution. For this
purpose, we use the other concepts known as Skewness and Kurtosis.
5.5.1. Skewness: Skewness means lack of symmetry. When the values are distributed evenly on both sides of the mean, a distribution is said to be symmetrical. For example, the following distribution is symmetrical about its mean 3.

Xi : 1 2 3 4 5

fi : 5 9 12 9 5

In a symmetrical distribution the mean, median and mode coincide, that is, \( \bar{X} = \tilde{X} = \hat{X} \).

When a distribution is skewed to the right, mean > median > mode. Consider the distribution of income among families: saying that the income distribution is skewed to the right means that a large number of families have relatively low incomes and a small number of families have extremely high incomes. In such a case, the mean is pulled up by the extremely high incomes, so that mean > median > mode.

When a distribution is skewed to the left, mode > median > mean. This is because the mean is pulled down below the median by extremely low values.

[Figure: negatively skewed and positively skewed distributions, with income on the x-axis and the number of families on the y-axis.]

Karl Pearson's Measure of Skewness: If the distribution is symmetric, we have

Arithmetic mean = Median = Mode;

they will not be equal if the distribution is skewed.

Therefore the distance between the A.M. and the mode (A.M − Mode) can be used as a measure of skewness. However, since a measure of skewness should be a pure number, we define

\[ S_k = \frac{\text{A.M} - \text{Mode}}{\sigma}, \quad \text{where } \sigma \text{ is the standard deviation of the distribution.} \]

For distributions which are bell shaped and moderately skewed, we have an approximate relationship between the A.M, median and mode:

A.M − Mode = 3 (A.M − Median)

Accordingly, we may define skewness as

\[ S_k = \frac{3\,(\text{A.M} - \text{Median})}{\sigma} \]

For a symmetrical distribution Sk = 0. If the distribution is negatively skewed, then Sk is negative, and if it is positively skewed then Sk is positive. The values of Sk range from −3 to 3.
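A minimal Python sketch of Pearson's coefficient based on the median is given below. The input values (mean 50, median 48, standard deviation 5) are hypothetical and chosen only to illustrate the arithmetic; the function names are likewise assumptions.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's coefficient of skewness: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd

def empirical_mode(mean, median):
    """Mode implied by the relation mean - mode = 3 * (mean - median)."""
    return mean - 3 * (mean - median)

# Hypothetical summary statistics, for illustration only
sk = pearson_skewness(50, 48, 5)
mode = empirical_mode(50, 48)
print(sk, mode)
```

A positive result indicates a right-skewed distribution (mean above median), and a negative result indicates a left-skewed one.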

The other measure uses the β (beta) coefficient, which is given by

\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \quad \text{where } \mu_2 \text{ and } \mu_3 \text{ are the second and third central moments.} \]

The second central moment is nothing but the variance. The sample estimate of this coefficient is

\[ b_1 = \frac{m_3^2}{m_2^3}, \]

where m2 and m3 are sample central moments given by

\[ m_2 = \frac{\sum (X - \bar{X})^2}{n - 1} \ \text{ or } \ \frac{\sum f\,(X - \bar{X})^2}{n - 1} = \text{variance} \]

and

\[ m_3 = \frac{\sum (X - \bar{X})^3}{n - 1} \ \text{ or } \ \frac{\sum f\,(X - \bar{X})^3}{n - 1} \]

For a symmetrical distribution b1 is zero. Skewness is positive or negative depending on whether m3 is positive or negative. Because b1 itself is never negative, we can use another formula that keeps the sign:

\[ \gamma_1 = \pm\sqrt{\beta_1} = \frac{\mu_3}{\sigma^3} \]

γ1 is free of the unit of measurement. If µ3 = 0, then γ1 = 0; this is the situation of symmetry.

If µ3 > 0, then γ1 > 0 and the distribution is positively (right) skewed. If µ3 < 0, then γ1 < 0 and we have a negatively (left) skewed distribution. Hence the direction of skewness is determined by the sign of the 3rd central moment.

Example

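The first four moments about the mean of a distribution are 0, 2.5, 0.7, and 18.75. Test the skewness of the distribution.

Solution

Here µ2 = 2.5, so σ = √2.5 = 1.58, and µ3 = 0.7.

\[ \gamma_1 = \frac{\mu_3}{\sigma^3} = \frac{0.7}{(1.58)^3} \approx 0.177 \]

Since γ1 > 0, the distribution is positively (right) skewed.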

5.5.2 Kurtosis

A measure of the peakedness or convexity of a curve is known as kurtosis.

[Figure: three symmetric curves labelled leptokurtic, mesokurtic and platykurtic.]

All three curves are symmetrical about the mean, yet they are not of the same type: each has a different peak. Curve (1) is known as mesokurtic (normal curve); curve (2) is known as leptokurtic (leaping curve) and curve (3) is known as platykurtic (flat curve).

Kurtosis is measured by Pearson's coefficient, β2. It is given by

\[ \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{\mu_4}{\sigma^4} \]

The sample estimate of this coefficient is

\[ b_2 = \frac{m_4}{m_2^2}, \quad \text{where } m_4 \text{ is the 4th central moment given by } m_4 = \frac{\sum (X - \bar{X})^4}{n - 1} \]

The distribution is called mesokurtic if b2 = 3. When b2 is more than 3 the distribution is said to be leptokurtic, and if b2 is less than 3 the distribution is said to be platykurtic.

Example: Compute the measures of skewness and kurtosis for the data in the table below.

Value (xi)      3  4  5   6   7   8   9   10
Frequency (f)   4  6  10  26  24  15  10  5

Solution

Here n = Σf = 100 and X̄ = Σfx/n = 670/100 = 6.7.

Value (xi)   Frequency (f)   d = X − X̄   f·d²     f·d³
3            4               −3.7        54.76    −202.612
4            6               −2.7        43.74    −118.098
5            10              −1.7        28.90    −49.130
6            26              −0.7        12.74    −8.918
7            24              0.3         2.16     0.648
8            15              1.3         25.35    32.955
9            10              2.3         52.90    121.670
10           5               3.3         54.45    179.685
Total        100                         275.00   −43.800

\[ m_2 = s^2 = \frac{\sum f\,(X_i - \bar{X})^2}{n - 1} = \frac{275}{99} = 2.7778 \]

\[ m_3 = \frac{\sum f\,(X_i - \bar{X})^3}{n - 1} = \frac{-43.8}{99} = -0.4424 \]

\[ m_4 = \frac{\sum f\,(X_i - \bar{X})^4}{n - 1} = \frac{2074.13}{99} = 20.9508 \]

\[ b_1 = \frac{m_3^2}{m_2^3} = \frac{(-0.4424)^2}{(2.7778)^3} = 0.0091 \quad \text{or} \quad \gamma_1 = \frac{m_3}{m_2^{3/2}} = \frac{-0.4424}{(2.7778)^{3/2}} = -0.0956 \]

\[ b_2 = \frac{m_4}{m_2^2} = \frac{20.9508}{(2.7778)^2} = 2.715 \]

Since γ1 = −0.0956, the distribution is only slightly skewed. It is negatively skewed since m3 is negative.

The value of b2 is 2.715, which is less than 3. Hence the distribution is platykurtic.
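The moment calculations for this table can be reproduced with the Python sketch below. It follows the unit's convention of dividing by n − 1, and it computes γ1 as m3 / m2^(3/2), the sample counterpart of µ3/σ³. The helper name `central_moment` is an assumption made for illustration.

```python
values = [3, 4, 5, 6, 7, 8, 9, 10]
freqs = [4, 6, 10, 26, 24, 15, 10, 5]

n = sum(freqs)
x_bar = sum(f * x for x, f in zip(values, freqs)) / n  # weighted mean

def central_moment(order):
    """Sample central moment of the given order, with divisor n - 1."""
    return sum(f * (x - x_bar) ** order for x, f in zip(values, freqs)) / (n - 1)

m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)

b1 = m3 ** 2 / m2 ** 3    # squared skewness coefficient (never negative)
gamma1 = m3 / m2 ** 1.5   # signed skewness coefficient
b2 = m4 / m2 ** 2         # kurtosis coefficient, compared with 3

print(round(m2, 4), round(m3, 4), round(m4, 4))
print(round(b1, 4), round(gamma1, 4), round(b2, 3))
```

A negative `gamma1` indicates left skewness, and a value of `b2` below 3 indicates a platykurtic distribution.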

Exercises

1. Consider the marks of 20 students, out of 20, in a statistics test as follows:

Marks of students    0-5  5-10  10-15  15-20  Total
Number of students   2    6     8      4      20

Find
i. Range
ii. The first and third quartiles
iii. Quartile deviation
iv. Mean deviation about the mean and about the median
v. Variance and standard deviation
2. The final exam of a course consists of two exams, Mathematics and History. A student scored 66 in Mathematics and 80 in History. The average score of all students is 51 with a standard deviation of 12 in Mathematics, and 72 with a standard deviation of 16 in History.
a. In which subject did the student perform better?
b. In which subject are the students' results more consistent?
3. For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5. If
the coefficient of variation is 20%, find the Pearsonian coefficient of skewness and the
probable mode of the distribution.

4. Some characteristics of the annual family income distribution (in Birr) in two regions are as follows:

Region   Mean   Median   Standard Deviation
A        6250   5100     960
B        6980   5500     940

i. Calculate the coefficient of skewness for each region.
ii. For which region is the income distribution more skewed? Give your interpretation for this region.
iii. For which region is the income more consistent?
