Lesson 3.
Measures of Central Tendency and Variability
1. Measures of Central Tendency
2. Measures of Variability
Introduction
Textual, tabular and graphical presentations make it easier to comprehend the
information that a researcher wants to impart. However, these are not enough
for a comprehensive discussion of the data. To completely describe the data,
numerical measures are necessary to give information on the specific
characteristics of the data distribution.
Learning Outcomes:
Upon completion of this unit, you should be able to:
1. Compute the most appropriate measures of central location and
measures of dispersion on a given set of data.
2. Describe and interpret the numerical values obtained for every set of data.
Discussion
Frequency distributions reveal useful behaviour of the data. However, they do
not provide measures that quantitatively summarize the characteristics of the
population. Hence, we need other measurable characteristics of the data to
describe the population.
Quantities that describe statistical data are numerical descriptive measures. They
are quantities computed from a given set of observations and are used to derive
information from data collected by the researcher. There are several descriptive
measures. The most commonly used are the measures of location, dispersion,
skewness, and kurtosis.
Measures of Central Location
A measure of location is a value within the range of the data, which describes its
location or position relative to the entire set of data. The three measures of central
location are the Mean, Median and Mode.
minimum – is the smallest value in the data set.
maximum – is the largest value in the data set.
Measure of Central Tendency
A measure of central tendency describes the "center" of a given set of data. It is
a single value about which the observations tend to cluster. The common
measures of central tendency are the arithmetic mean or simply mean, median
and mode.
a) Arithmetic Mean (or simply, mean)
The sample mean is denoted by the symbol $\bar{X}$ (read as "x-bar"), and the
population mean by the Greek letter $\mu$.
The arithmetic mean is the average of the measurements in a set of data or the
sum of a set of measurements divided by the number of measurements in the set.
For ungrouped data it is computed by:
$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
Illustration 1. The mean of the data set 3, 2, 3, 4, 5, 4, 7, 6, 8 is
$$\bar{X} = \frac{3+2+3+4+5+4+7+6+8}{9} = \frac{42}{9} = 4.666\ldots \approx 4.67$$
Example 1. Consider the following set of measurements:
a) 83, 84, 78, 87, 93, 76, 75, 87, 86, 92
b) 15, 30, 16, 28, 25, 19, 20, 25, 22, 18, 19, 17, 24
In Example 1 a) there are n = 10 measurements and their sum is 841. Therefore,
the arithmetic mean is 841 divided by 10, or 84.1:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{83+84+78+87+93+76+75+87+86+92}{10} = \frac{841}{10} = 84.10$$
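The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch; the function name `arithmetic_mean` and the variable names are illustrative and not part of the lesson.

```python
def arithmetic_mean(values):
    """Return the sum of the measurements divided by their count."""
    return sum(values) / len(values)


# Data taken from Illustration 1 and Example 1 a)
illustration_1 = [3, 2, 3, 4, 5, 4, 7, 6, 8]
example_1a = [83, 84, 78, 87, 93, 76, 75, 87, 86, 92]

mean_illustration_1 = arithmetic_mean(illustration_1)  # 42 / 9
mean_example_1a = arithmetic_mean(example_1a)          # 841 / 10

print(round(mean_illustration_1, 2))
print(round(mean_example_1a, 2))
```

Rounding is applied only when printing, so the stored values keep full precision for any later computation.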
b) Median - the middle value of an array, denoted by $\tilde{X}$
To compute the median of ungrouped data, the set of observations are
arranged in an increasing or decreasing order of magnitude. Then, the point such
that half the observations fall above it and half below it is the median. The median
is the middle value when the number of observations is odd or the average of the
two middle values when the number of observations is even.
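For ungrouped data arranged in increasing or decreasing order:
$$\tilde{X} = \begin{cases} X_{\frac{N+1}{2}} & \text{if } N \text{ is odd} \\[4pt] \dfrac{X_{\frac{N}{2}} + X_{\frac{N}{2}+1}}{2} & \text{if } N \text{ is even} \end{cases}$$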
Illustration 1: the median of 3, 2, 3, 4, 5, 4, 7, 6, 8.
First arrange the data set in increasing or decreasing order:
2, 3, 3, 4, 4, 5, 6, 7, 8
N = 9, therefore $\frac{N+1}{2} = 5$, which means the median is the fifth value, that is, 4.
Illustration 2: the median of 3, 2, 3, 4, 5, 4, 7, 6, 5, 8.
First arrange the data set in increasing or decreasing order:
2, 3, 3, 4, 4, 5, 5, 6, 7, 8
N = 10, therefore the median lies midway between the values in positions
$\frac{N}{2} = 5$ and $\frac{N}{2} + 1 = 6$, that is, at position $\frac{5+6}{2} = 5.5$.
The fifth value is 4 and the sixth value is 5, so the median is $\frac{4+5}{2} = 4.5$.
Consider the following set of measurements:
a) 83, 84, 78, 87, 93, 76, 75, 87, 86, 92
b) 15, 30, 16, 28, 25, 19, 20, 25, 22, 18, 19, 17, 24
To solve for the median, we first need to arrange each set of measurements in
either ascending or descending order. For example 1 a) we have
75, 76, 78, 83, 84, 86, 87, 87, 92, 93
Since n = 10 is even, we consider the average of the two middle values. The two
middle values are 84 and 86 and therefore, the median = (84 + 86)/2 = 85.
In example 1 b), the arrangement of the data is
15, 16, 17, 18, 19, 19, 20, 22, 24, 25, 25, 28, 30
Since n = 13 is odd, the middle (seventh) score, which is 20, is the median.
Median = 20.
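The odd/even rule for the median can be sketched in Python as follows. The function name `median_ungrouped` is an illustrative choice, and the data are those of the two illustrations and of Example 1.

```python
def median_ungrouped(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd N: 1-based position (N + 1) / 2 is 0-based index N // 2.
        return ordered[middle]
    # Even N: average the values at 1-based positions N/2 and N/2 + 1.
    return (ordered[middle - 1] + ordered[middle]) / 2


illustration_1 = [3, 2, 3, 4, 5, 4, 7, 6, 8]
illustration_2 = [3, 2, 3, 4, 5, 4, 7, 6, 5, 8]
example_1a = [83, 84, 78, 87, 93, 76, 75, 87, 86, 92]
example_1b = [15, 30, 16, 28, 25, 19, 20, 25, 22, 18, 19, 17, 24]

for name, data in [("Illustration 1", illustration_1),
                   ("Illustration 2", illustration_2),
                   ("Example 1 a)", example_1a),
                   ("Example 1 b)", example_1b)]:
    print(name, median_ungrouped(data))
```

Sorting inside the function means the caller does not have to arrange the data first.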
c) Mode – the observation which occurs most frequently in the data set,
denoted by $\hat{X}$. The term comes from the French word mode, which means fashion.
Illustration 1: what is the mode of the data set 2, 3, 3, 4, 4, 5, 6, 7, 8? Both 3
and 4 occur twice while every other value occurs once, so the data set has two
modes, 3 and 4.
In example 1 a), note that 87 is the number that appears most frequently (i.e. it
appears twice) while the others appear only once, hence the mode ($\hat{X}$) is 87.
Such a data set is called unimodal: it has one mode.
In example 1 b), there are two values that appear twice in the set of data.
These are 19 and 25, hence Mode = 19, 25. Such a data set is called bimodal:
it has two modes.
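Counting frequencies by hand is error-prone, so a small Python sketch can help verify the modes of Example 1. The helper name `modes` is illustrative; it returns every value that attains the highest frequency, so unimodal and bimodal data sets are handled the same way.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


example_1a = [83, 84, 78, 87, 93, 76, 75, 87, 86, 92]
example_1b = [15, 30, 16, 28, 25, 19, 20, 25, 22, 18, 19, 17, 24]

modes_1a = modes(example_1a)  # one mode: unimodal
modes_1b = modes(example_1b)  # two modes: bimodal

print(modes_1a, modes_1b)
```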
We note and remember that the median and mode respond only to some
changes in the terms, while the mean responds to every change in the terms. It is
for this reason that the mean is the most used measure of central tendency
because it is the average value of the distribution.
Summary of answers on the computation of mean, median & mode for
Ungrouped Data.
a) 83, 84, 78, 87, 93, 76, 75, 87, 86, 92
1. Mean = 84.1
2. Median = 85
3. Mode = 87
b) 15, 30, 16, 28, 25, 19, 20, 25, 22, 18, 19, 17, 24
1. Mean = 21.38 (the sum of the 13 measurements is 278, and 278/13 = 21.38)
2. Median = 20
3. Mode = 19, 25
Computation of the Mean, Median and Mode for Grouped Data
The Mean
The formula for the mean is
$$\bar{X} = \frac{\sum fX}{N}$$
where f is the class frequency, X is the class midpoint and N is the total
number of observations.
Example. Given the data set in Table 1a, find the mean.
classes f
50-54 2
55-59 4
60-64 5
65-69 6
70-74 8
75-79 6
80-84 4
85-89 3
90-94 2
n=40
Table 1 a
The first step is to find the midpoint (X) of each class and the product of f and X.
The next step is to get the sum of fX, then substitute into the formula for
the mean.
classes   f      X (midpoint)   fX
50-54     2      52             2 * 52 = 104
55-59     4      57             4 * 57 = 228
60-64     5      62             5 * 62 = 310
65-69     6      67             6 * 67 = 402
70-74     8      72             8 * 72 = 576
75-79     6      77             6 * 77 = 462
80-84     4      82             4 * 82 = 328
85-89     3      87             3 * 87 = 261
90-94     2      92             2 * 92 = 184
          n=40                  ∑fX = 2855
Table 1 b
$$\bar{X} = \frac{\sum fX}{N} = \frac{2855}{40} = 71.375 \approx 71.38$$
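The grouped-mean computation of Table 1b can be reproduced with a short Python sketch. Representing each class as a `(lower limit, upper limit, frequency)` tuple is an assumption made for illustration, as are the names `table_1a` and `grouped_mean`.

```python
# (lower class limit, upper class limit, frequency) rows of Table 1a
table_1a = [
    (50, 54, 2), (55, 59, 4), (60, 64, 5), (65, 69, 6), (70, 74, 8),
    (75, 79, 6), (80, 84, 4), (85, 89, 3), (90, 94, 2),
]


def grouped_mean(table):
    """Sum of f * midpoint divided by the total frequency."""
    total_f = sum(f for _, _, f in table)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in table)
    return total_fx / total_f


mean_table_1a = grouped_mean(table_1a)
print(mean_table_1a)  # 2855 / 40 = 71.375
```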
The Median
For grouped data the formula is
$$\tilde{X} = L_{me} + \left(\frac{\frac{N}{2} - cf_b}{f_{me}}\right) c_{me}$$
where: $L_{me}$ = lower boundary of the median class
$cf_b$ = cumulative frequency of the interval below the median class
$f_{me}$ = frequency of the median class
$c_{me}$ = size of the median class
Example. Given the data set below, find the median.
classes f
50-54 2
55-59 4
60-64 5
65-69 6
70-74 8
75-79 6
80-84 4
85-89 3
90-94 2
40
Table 1 a
The first thing to do is set up the table by finding the lower and upper class
boundaries; the second is to find the less-than cumulative frequency (<cf).
classes   f     lower boundary   upper boundary   <cf
50-54     2     49.5             54.5             2
55-59     4     54.5             59.5             6
60-64     5     59.5             64.5             11
65-69     6     64.5             69.5             17
70-74     8     69.5             74.5             25
75-79     6     74.5             79.5             31
80-84     4     79.5             84.5             35
85-89     3     84.5             89.5             38
90-94     2     89.5             94.5             40
          40
Table 1c
Step 3. Find the following:
a) $\frac{N}{2} = \frac{40}{2} = 20$
b) Locate the 20th score on the cumulative frequency
The 20th score falls in the class interval 70-74, which is therefore the median class.
c) The lower boundary of the median class, $L_{me}$, is 69.5.
d) The cumulative frequency of the interval below the median class, $cf_b$, is 17.
e) The frequency of the median class, $f_{me}$, is 8.
f) The size of the median class, $c_{me}$, is 5.
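Substituting into the formula, we have
$$\tilde{X} = L_{me} + \left(\frac{\frac{N}{2} - cf_b}{f_{me}}\right) c_{me} = 69.5 + \left(\frac{20 - 17}{8}\right) 5 = 69.5 + \left(\frac{3}{8}\right) 5 = 71.375 \approx 71.38$$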
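The steps for the grouped median (cumulate the frequencies, locate the class holding the N/2-th score, then interpolate) can be sketched in Python. The tuple layout and the function name `grouped_median` are illustrative. The sketch assumes whole-number class limits, so each class boundary lies 0.5 below the lower limit.

```python
# (lower class limit, upper class limit, frequency) rows of Table 1a
table_1a = [
    (50, 54, 2), (55, 59, 4), (60, 64, 5), (65, 69, 6), (70, 74, 8),
    (75, 79, 6), (80, 84, 4), (85, 89, 3), (90, 94, 2),
]


def grouped_median(table):
    """Interpolated median of a frequency table sorted by ascending class."""
    n = sum(f for _, _, f in table)
    target = n / 2                       # position of the median score
    cf_below = 0                         # cumulative frequency below current class
    for lower, upper, f in table:
        if cf_below + f >= target:       # this is the median class
            lower_boundary = lower - 0.5  # assumes integer class limits
            class_size = upper - lower + 1
            return lower_boundary + (target - cf_below) / f * class_size
        cf_below += f
    raise ValueError("empty frequency table")


median_table_1a = grouped_median(table_1a)
print(median_table_1a)  # 69.5 + (3 / 8) * 5 = 71.375
```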
The Mode
The class interval with the highest frequency is the modal class. The modal
value is computed based on the formula:
$$\hat{X} = L_{mo} + \left(\frac{d_1}{d_1 + d_2}\right) c_{mo}$$
where: $L_{mo}$ = lower boundary of the modal class
$d_1$ = difference between the frequency of the modal class and
the frequency of the next lower class
$d_2$ = difference between the frequency of the modal class and
the frequency of the next higher class
$c_{mo}$ = size of the modal class.
Example. Given the data set below, find the mode.
classes f
50-54 2
55-59 4
60-64 5
65-69 6
70-74 8
75-79 6
80-84 4
85-89 3
90-94 2
n=40
Table 1a
The first step is to find the highest frequency in the given data set, which is
8. The modal class is therefore 70-74.
The second step is to find the following:
a) $L_{mo}$ = lower boundary of the modal class = 69.5
b) $d_1$ = difference between the frequency of the modal class and
the frequency of the next lower class = 8 - 6 = 2
c) $d_2$ = difference between the frequency of the modal class and
the frequency of the next higher class = 8 - 6 = 2
d) $c_{mo}$ = size of the modal class = 5
Substituting into the formula, we have
$$\hat{X} = L_{mo} + \left(\frac{d_1}{d_1 + d_2}\right) c_{mo} = 69.5 + \left(\frac{2}{2 + 2}\right) 5 = 72$$
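As a rough check of the modal-value formula, here is a Python sketch. Two choices in it are assumptions that the lesson does not cover: when several classes tie for the highest frequency it uses the first one, and a missing neighbouring class (at either end of the table) is treated as having frequency 0. It also assumes whole-number class limits.

```python
# (lower class limit, upper class limit, frequency) rows of Table 1a
table_1a = [
    (50, 54, 2), (55, 59, 4), (60, 64, 5), (65, 69, 6), (70, 74, 8),
    (75, 79, 6), (80, 84, 4), (85, 89, 3), (90, 94, 2),
]


def grouped_mode(table):
    """Modal value L_mo + d1 / (d1 + d2) * c_mo of a frequency table."""
    freqs = [f for _, _, f in table]
    i = freqs.index(max(freqs))          # first class with the highest frequency
    lower, upper, f_mo = table[i]
    f_lower = freqs[i - 1] if i > 0 else 0               # assumption: 0 if absent
    f_higher = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumption: 0 if absent
    d1 = f_mo - f_lower
    d2 = f_mo - f_higher
    lower_boundary = lower - 0.5         # assumes integer class limits
    class_size = upper - lower + 1
    return lower_boundary + d1 / (d1 + d2) * class_size


mode_table_1a = grouped_mode(table_1a)
print(mode_table_1a)  # 69.5 + (2 / 4) * 5 = 72.0
```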
Quantiles:
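The computation of quantiles is similar to that of the median. The median
divides the data set into two equal parts; the other quantiles divide it into
more parts. There are three types:
a) Quartile – the data set is divided into 4 equal parts
b) Decile – the data set is divided into 10 equal parts
c) Percentile – the data set is divided into 100 equal parts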
Example
Find the sixth decile and 60th percentile of the data from table 1c.
classes f <cf
50-54 2 2
55-59 4 6
60-64 5 11
65-69 6 17
70-74 8 25
75-79 6 31
80-84 4 35
85-89 3 38
90-94 2 40
40
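To find the sixth decile (n = 6):
nN/10 = 6 * 40/10 = 24, so we locate the 24th score, which falls in the class 70-74. Thus
$L_{D_n}$ = 69.5; $cf_b$ = 17; $f_{D_n}$ = 8; c = 5
$$D_n = L_{D_n} + \left(\frac{\frac{nN}{10} - cf_b}{f_{D_n}}\right) c = 69.5 + \left(\frac{24 - 17}{8}\right) 5 = 73.875 \approx 73.88$$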
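To find the 60th percentile (n = 60):
nN/100 = 60 * 40/100 = 24, so we again locate the 24th score, which falls in the class 70-74. Thus
$L_{P_n}$ = 69.5; $cf_b$ = 17; $f_{P_n}$ = 8; c = 5
$$P_n = L_{P_n} + \left(\frac{\frac{nN}{100} - cf_b}{f_{P_n}}\right) c = 69.5 + \left(\frac{24 - 17}{8}\right) 5 = 73.875 \approx 73.88$$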
From the above computation, the sixth decile is the same as the 60th percentile.
Similarly, D1 (first decile) = P10 (tenth percentile), and
Median = fifth decile = 50th percentile.
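Since quartiles, deciles and percentiles all use the same interpolation as the median, one function can cover them. The sketch below is illustrative: the name `grouped_quantile` and its parameters `k` (which quantile) and `parts` (4, 10 or 100) are not from the lesson, and whole-number class limits are assumed.

```python
# (lower class limit, upper class limit, frequency) rows of Table 1c
table_1c = [
    (50, 54, 2), (55, 59, 4), (60, 64, 5), (65, 69, 6), (70, 74, 8),
    (75, 79, 6), (80, 84, 4), (85, 89, 3), (90, 94, 2),
]


def grouped_quantile(table, k, parts):
    """k-th quantile when the data set is divided into `parts` equal parts."""
    n = sum(f for _, _, f in table)
    target = k * n / parts               # e.g. 6 * 40 / 10 = 24 for D6
    cf_below = 0
    for lower, upper, f in table:
        if cf_below + f >= target:       # class containing the target score
            lower_boundary = lower - 0.5  # assumes integer class limits
            class_size = upper - lower + 1
            return lower_boundary + (target - cf_below) / f * class_size
        cf_below += f
    raise ValueError("k must not exceed parts")


d6 = grouped_quantile(table_1c, 6, 10)      # sixth decile
p60 = grouped_quantile(table_1c, 60, 100)   # 60th percentile
median = grouped_quantile(table_1c, 1, 2)   # median as the 1st of 2 parts
d5 = grouped_quantile(table_1c, 5, 10)      # fifth decile
p50 = grouped_quantile(table_1c, 50, 100)   # 50th percentile

print(d6, p60)          # both 69.5 + (7 / 8) * 5 = 73.875
print(median, d5, p50)  # all 71.375
```

The third quartile would be `grouped_quantile(table, 3, 4)` under the same assumptions.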
Measures of Variability
Measures of central tendency are meant to describe the given set of data.
These measures indicate the point where the items are centrally located.
However, they do not show whether the terms in the distribution are far from or
close to each other.
For instance take five sets of observations:
Set A: 15, 15, 17, 18, 20
Set B: 15, 16, 16, 18, 20
Set C: 14, 15, 16, 19, 21
Set D: 11, 13, 18, 18, 25
Set E: 14, 15, 18, 19, 19
The five sets of observations have the same mean of 17, but the mean alone does
not totally describe each of the five sets.
The measures of central position are of little value unless the degree of spread
or variability that occurs about them is given. Hence, the description of the
set of data becomes more meaningful if the degree of clustering about a central
point is measured. Information on how far apart the observations are from each
other in every set will be very useful. Set D is the most variable. In Sets A and B,
we cannot right away see the spread of the values of the items.
All these can be answered through the use of the measures of spread or
variability.
Range is the simplest of the measures of spread or variability. It is the
difference between the highest and the lowest score. In the above illustration,
the range of Sets A and B is 5, while that of C is 7, D is 14 and E is 5.
Although the range is the easiest to compute and easiest to understand, it is
also the least satisfactory, since its value depends only upon the two extremes
and does not consider the scatter of the values in between these two extremes.
For instance, consider the following test scores of two students:
Maria 17 18 7 15 14 13
Ana 18 10 17 11 18 10
If we compare the test scores, we see that Maria’s scores have a higher range
(18 - 7 = 11) than Ana’s (18 - 10 = 8). These ranges tell us that Maria’s scores
are apparently more scattered than Ana’s. But if we look closely at Maria’s
scores, except for 7, they are more consistent or more clustered than Ana’s. Can
we then say that Maria’s scores are more scattered or variable than Ana’s?
The range is not considered a stable measure of variability because its
values can fluctuate greatly with the change in just a single score-either the
highest or the lowest.
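The instability of the range can be seen in a short Python sketch. The function name `value_range` is illustrative; the data are the five sets and the two students' test scores given above.

```python
def value_range(values):
    """Difference between the highest and the lowest score."""
    return max(values) - min(values)


sets = {
    "A": [15, 15, 17, 18, 20],
    "B": [15, 16, 16, 18, 20],
    "C": [14, 15, 16, 19, 21],
    "D": [11, 13, 18, 18, 25],
    "E": [14, 15, 18, 19, 19],
}
ranges = {name: value_range(data) for name, data in sets.items()}

maria = [17, 18, 7, 15, 14, 13]
ana = [18, 10, 17, 11, 18, 10]

# Dropping Maria's single low score of 7 changes her range considerably.
maria_without_7 = [score for score in maria if score != 7]

print(ranges)
print(value_range(maria), value_range(ana), value_range(maria_without_7))
```

Removing one score takes Maria's range from 11 down to 5, below Ana's 8, which is why the range alone can mislead.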
Mean Absolute Deviation
- To arrive at a more reliable indicator of variability or spread in the
distribution, we should consider the value of each individual score and determine
the amount by which each varies from the mean of the distribution.
We consider the extent to which each individual score in a distribution
deviates from the mean.
$$MAD = \frac{\sum |X - \bar{X}|}{N}$$ for ungrouped data
For instance, consider the illustration on Set A:
Values   Absolute deviation $|X - \bar{X}|$
15       2
15       2
17       0
18       1
20       3
$\sum |X - \bar{X}| = 8$
$$MAD = \frac{8}{5} = 1.6$$
This means that, on average, the values deviate from the mean value of 17
by 1.6.
$$MAD = \frac{\sum f|X - \bar{X}|}{N}$$ for grouped data
Note: Although it gives a better approximation of the spread of the distribution
than the range or the quartile deviation it does not lend itself readily to
mathematical treatment for further analysis.
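The ungrouped mean absolute deviation for Set A can be verified with a minimal Python sketch; the function name `mean_absolute_deviation` is illustrative.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


set_a = [15, 15, 17, 18, 20]
absolute_deviations = [abs(x - 17) for x in set_a]  # the mean of Set A is 17
mad_a = mean_absolute_deviation(set_a)

print(absolute_deviations, mad_a)
```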
Standard Deviation
- It is a special form of average deviation from the mean; it is therefore also
affected by all the individual values of the items in the distribution.
- For instance, if the standard deviation of the IQ scores of a class of 50 students
is numerically big, we can say that there is heterogeneity in their intelligence, while
if it is small, we can say there is homogeneity in their intelligence.
$$sd = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$ for ungrouped data
Table 2
         scores            Mean   Range   standard deviation
Set A:   15 15 17 18 20    17     5       2.12
Set B:   15 16 16 18 20    17     5       2.00
Set C:   14 15 16 19 21    17     7       2.92
Set D:   11 13 18 18 25    17     14      5.43
Set E:   14 15 18 19 19    17     5       2.35
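Computation of the standard deviation for ungrouped data
For Set A, we have
$$sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{(15-17)^2}{4} + \frac{(15-17)^2}{4} + \frac{(17-17)^2}{4} + \frac{(18-17)^2}{4} + \frac{(20-17)^2}{4}} = \sqrt{\frac{18}{4}} = 2.12$$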
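The standard deviations of the five sets can be recomputed with the following Python sketch, which divides by n - 1 as in the formula above. The name `sample_sd` is illustrative.

```python
from math import sqrt


def sample_sd(values):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


sets = {
    "A": [15, 15, 17, 18, 20],
    "B": [15, 16, 16, 18, 20],
    "C": [14, 15, 16, 19, 21],
    "D": [11, 13, 18, 18, 25],
    "E": [14, 15, 18, 19, 19],
}
sds = {name: round(sample_sd(data), 2) for name, data in sets.items()}

print(sds)
```

Sets A, B and E share the same mean and the same range, yet their standard deviations differ, which the range alone cannot show.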
$$sd = \sqrt{\frac{\sum f(X - \bar{X})^2}{N - 1}}$$ for grouped data
Example. From the data in the example in finding the mean
X
classes f (midpoint)
50-54 2 52
55-59 4 57
60-64 5 62
65-69 6 67
70-74 8 72
75-79 6 77
80-84 4 82
85-89 3 87
90-94 2 92
40
Table 1a
The mean is 71.38. The next step is to set up the table below.
classes   f    X (midpoint)   $X - \bar{X}$          $(X - \bar{X})^2$   $f(X - \bar{X})^2$
50-54     2    52             52 - 71.38 = -19.38    375.58              2 * 375.58 = 751.17
55-59     4    57             57 - 71.38 = -14.38    206.78              4 * 206.78 = 827.14
60-64     5    62             62 - 71.38 = -9.38     87.98               5 * 87.98 = 439.92
65-69     6    67             67 - 71.38 = -4.38     19.18               6 * 19.18 = 115.11
70-74     8    72             72 - 71.38 = 0.62      0.38                8 * 0.38 = 3.08
75-79     6    77             77 - 71.38 = 5.62      31.58               6 * 31.58 = 189.51
80-84     4    82             82 - 71.38 = 10.62     112.78              4 * 112.78 = 451.14
85-89     3    87             87 - 71.38 = 15.62     243.98              3 * 243.98 = 731.95
90-94     2    92             92 - 71.38 = 20.62     425.18              2 * 425.18 = 850.37
          40                                                             $\sum f(X - \bar{X})^2 = 4359.38$
Table 3
$$sd = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{4359.38}{40 - 1}} = 10.57$$
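The grouped standard deviation can be checked with a Python sketch along the same lines. The tuple layout and the name `grouped_sample_sd` are illustrative. The sketch uses the unrounded mean 71.375 rather than 71.38, so its sum of squares (4359.375) differs slightly from the tabulated 4359.38, but the result agrees to two decimal places.

```python
from math import sqrt

# (lower class limit, upper class limit, frequency) rows of Table 1a
table_1a = [
    (50, 54, 2), (55, 59, 4), (60, 64, 5), (65, 69, 6), (70, 74, 8),
    (75, 79, 6), (80, 84, 4), (85, 89, 3), (90, 94, 2),
]


def grouped_sample_sd(table):
    """sqrt( sum f * (X - mean)^2 / (N - 1) ), with X the class midpoint."""
    rows = [((lower + upper) / 2, f) for lower, upper, f in table]
    n = sum(f for _, f in rows)
    mean = sum(f * x for x, f in rows) / n
    sum_of_squares = sum(f * (x - mean) ** 2 for x, f in rows)
    return sqrt(sum_of_squares / (n - 1))


sd_table_1a = grouped_sample_sd(table_1a)
print(round(sd_table_1a, 2))
```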
Example:
A simple experiment is designed to investigate the effect of a drug on a
cognitive task such as coding. An experimental group of subjects, who receive the
drug, and a control group, who do not receive the drug, are used. Each group
contains 10 subjects. Let us assume the scores on the coding task for the groups
are as follows:
Experimental Group(EG): 5 7 17 31 45 47 68 85 96 99
Control Group(CG): 29 36 37 42 49 58 62 63 69 70
Mean EG = 50.0 Mean CG = 51.5
The investigator might be led to conclude from inspecting the means that
the drug had little or no effect on the performance of the subjects.
sd EG = 35.63 sd CG = 14.86
The experimental group is much more variable in its performance than the
control group. Quite clearly, the treatment appears to be exerting a
substantial influence on the variation in performance, although its effect on
the level of performance is negligible.
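The group means and standard deviations quoted in this example can be reproduced with Python's standard `statistics` module, whose `stdev` function uses the n - 1 divisor. The variable names below are illustrative.

```python
from statistics import mean, stdev

experimental = [5, 7, 17, 31, 45, 47, 68, 85, 96, 99]
control = [29, 36, 37, 42, 49, 58, 62, 63, 69, 70]

mean_eg, mean_cg = mean(experimental), mean(control)
sd_eg, sd_cg = stdev(experimental), stdev(control)  # sample sd (n - 1)

print(mean_eg, mean_cg)
print(round(sd_eg, 2), round(sd_cg, 2))
```

The means differ by only 1.5 points, while the experimental group's standard deviation is more than twice that of the control group.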
Variability of different data sets can be compared. However, the range,
mean absolute deviation and standard deviation are not appropriate measures
to use, especially when the data sets have different means or different units.
This leads to another measure of dispersion.
Coefficient of variation
- is the ratio of the standard deviation to the mean, expressed in percent. It
is a relative measure of dispersion, useful when comparing the dispersion of two
or more data sets with different units.
$$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$$
Example: A random sample of 7 students were taught Mathematics in a
standard classroom situation. Another sample of 9 students taught themselves
using programmed text and consulted the teacher only when they had questions.
At the end of the semester, both groups were given standardized exams.
Standard Experimental
Mean 91.9 142.3
Standard Deviation 20.9 20.8
Coefficient of Variation 22.7% 14.6%
The experimental group is less dispersed than the standard group because
of its smaller coefficient of variation.
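The two coefficients of variation in the table follow directly from the formula, as this small Python sketch shows. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Standard deviation as a percentage of the mean."""
    return standard_deviation / mean * 100


cv_standard = coefficient_of_variation(20.9, 91.9)
cv_experimental = coefficient_of_variation(20.8, 142.3)

print(round(cv_standard, 1), round(cv_experimental, 1))
```

The two standard deviations are nearly equal (20.9 and 20.8), so the difference in relative dispersion comes from the difference in the means.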
Self-assessment:
Find the three measures of central tendency, the third quartile, the seventh
decile, the 65th percentile and the standard deviation of the data in your
assignment number 1, lesson 2.
Assignment:
Find the measures of central tendency and the standard deviation.
1. Given the table below, find the a) mean, b) mode, c) median, d) 75th
percentile and e) standard deviation.
Classes f
99 - 93 3
92 - 86 4
85 - 79 6
78 - 72 7
71 - 65 10
64 - 58 9
57 - 51 7
50 - 44 10
43 - 37 6
36 - 30 5
29 - 23 4
22 - 16 3
15 - 9 1
2. The results of a given assignment in statistics for lesson 2 are given in the
table below. Find the a) mean, b) mode, c) median, d) 75th percentile and
e) standard deviation.
scores f
5.0 3
5.5 4
6.0 6
6.5 7
7.0 11
7.5 9
8.0 8
8.5 11
9.0 7
9.5 5
10.0 4
10.5 3
11.0 2