SSC201: STATISTICAL TECHNIQUES IN ECONOMICS
Department of Economics
Faculty of Social Sciences
Obafemi Awolowo University
Ile-Ife, Nigeria
Topic 4: Measures of Central Tendency/Location
Introduction
A measure of central tendency (or location) is a single value that represents a set of data. The most
commonly used measures are the arithmetic mean (simply called the mean), harmonic mean, geometric mean,
median, and mode. Computing these measures relies on summation notation, which is introduced first.
Summation Notation
Summation notation is a short and convenient way of writing the sum of the terms of a series. For
example, the sum of the n terms X1, X2, X3, X4, ……… Xn can be written as $\sum_{i=1}^{n} X_i$. The symbol Σ
is the Greek capital letter sigma, which corresponds to the letter S. It is known as the summation operator
or sigma operator. The subscript “i” is often called a dummy variable.
Illustration
$\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4$
This operator is useful in many contexts and will be used freely throughout this course.
Examples:
1. $\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4$
2. $\sum_{i=1}^{3} Y_i = Y_1 + Y_2 + Y_3$
3. $\sum_{i=1}^{5} X_i^2 = X_1^2 + X_2^2 + X_3^2 + X_4^2 + X_5^2$
4. $\sum_{i=1}^{n} i = 1 + 2 + 3 + \cdots + (n - 1) + n$
5. $\sum_{i=1}^{4} i^2 = 1^2 + 2^2 + 3^2 + 4^2 = 1 + 4 + 9 + 16 = 30$
Note that “i” need not begin at 1; it starts from whatever lower limit is specified and always increases
in steps of 1. Using the notation, we also have:
1. $\sum_{i=1}^{4} a = a + a + a + a = 4a = na$ (here n = 4)
2. $\sum_{i=1}^{5} (4 + X_i) = (4 + X_1) + (4 + X_2) + (4 + X_3) + (4 + X_4) + (4 + X_5)$
3. $\sum_{i=1}^{4} 2X_i = 2X_1 + 2X_2 + 2X_3 + 2X_4$
Some important properties of the sigma operator
1. $\sum_{i=1}^{n} a = na$, where ‘a’ is a constant
Illustration
$\sum_{i=1}^{4} 2 = 2 + 2 + 2 + 2 = 4(2) = 8$
2. $\sum_{i=1}^{n} aX_i = a\sum_{i=1}^{n} X_i$, where ‘a’ is a constant
Illustration
$\sum_{i=1}^{2} 3X_i = 3X_1 + 3X_2 = 3(X_1 + X_2) = 3\sum_{i=1}^{2} X_i$
3. $\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$
Illustration
$\sum_{i=1}^{2} (X_i + Y_i) = (X_1 + Y_1) + (X_2 + Y_2) = (X_1 + X_2) + (Y_1 + Y_2) = \sum_{i=1}^{2} X_i + \sum_{i=1}^{2} Y_i$
$\sum_{i=1}^{2} (X_i - Y_i) = (X_1 - Y_1) + (X_2 - Y_2) = X_1 + X_2 - Y_1 - Y_2 = (X_1 + X_2) - (Y_1 + Y_2) = \sum_{i=1}^{2} X_i - \sum_{i=1}^{2} Y_i$
4. In general, $\sum XY \neq \sum X \sum Y$
5. In general, $\sum X^2 \neq (\sum X)^2$
6. $\sum (X \pm K) = \sum X \pm nK$, where K is a constant
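The properties above can be checked numerically. The following is a minimal Python sketch; the lists X and Y and the constant a are hypothetical values chosen only for illustration and do not come from the course material.

```python
# Hypothetical sample values, chosen only for illustration.
X = [2, 5, 7, 4]
Y = [1, 3, 6, 8]
a = 3
n = len(X)

# Property 1: summing a constant n times gives n * a.
prop1 = sum(a for _ in range(n)) == n * a

# Property 2: a constant factor can be taken outside the sum.
prop2 = sum(a * x for x in X) == a * sum(X)

# Property 3: the sum of (X_i + Y_i) equals the sum of X plus the sum of Y.
prop3 = sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)

# Property 4: the sum of products generally differs from the product of sums.
sum_xy = sum(x * y for x, y in zip(X, Y))
prod_of_sums = sum(X) * sum(Y)

# Property 5: the sum of squares generally differs from the square of the sum.
sum_sq = sum(x ** 2 for x in X)
sq_of_sum = sum(X) ** 2

print(prop1, prop2, prop3)
print(sum_xy, prod_of_sums)
print(sum_sq, sq_of_sum)
```

For these particular values the first three checks hold, while the two sides of properties 4 and 5 differ.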
Exercise
1. Expand the following:
i. $\sum_{i=1}^{4} 3X_i$
ii. $\sum_{i=1}^{4} 1$
2. Express the following sums in sigma/summation notation as compactly as possible:
i. $aX_1 + aX_2 + aX_3 + aX_4 + aX_5$
ii. $X_1Y_1 + X_2Y_2 + X_3Y_3 + X_4Y_4$
3. Show that $\sum_{i=1}^{n} (a + bX_i) = na + b\sum_{i=1}^{n} X_i$
The Mean
The focus here will be on the arithmetic mean (simply called the mean), harmonic and geometric mean.
Arithmetic Mean
This is the average of all the members in the data and is the most commonly used average. It is the sum of
all members in the data divided by the number of members. We can calculate the arithmetic mean for a set
of numbers, an ungrouped frequency distribution, and a grouped frequency distribution.
Case A: A set of numbers (Series of Individual Observations)
The arithmetic mean for a set of numbers, X1, X2, X3, X4, ……… Xn, is denoted by $\bar{X}$ and can be written as
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{\sum X}{n}$
This formula simply states that we can obtain the mean by summing the observations and dividing the
total by the number of observations.
Illustration
Find the arithmetic mean of the following set of numbers: 1, 2, 3, 5, 7, 9, 8.
Solution
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{\sum X}{n}$
$\bar{X} = \frac{1 + 2 + 3 + 5 + 7 + 9 + 8}{7} = \frac{35}{7} = 5$
Case B: Ungrouped frequency distribution (Discrete Series)
Suppose we are given a set of numbers, X1, X2, X3, X4, ……… Xn with the corresponding frequencies, f1, f2,
f3, f4, ……… fn, then the arithmetic mean is given as:
$\bar{X} = \frac{1}{\sum_{i=1}^{n} f_i}\sum_{i=1}^{n} f_i X_i = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{\sum fX}{\sum f}$
Illustration
Find the mean (arithmetic mean) of the following set of numbers by grouping them into a frequency
distribution:
4 3 6 7 5 5 3 4 9 6
5 5 6 8 3 6 6 3 5 4
7 6 4 1 9 7 8 6 4 6
Solution
X      f      fX
1      1      1
3      4      12
4      5      20
5      5      25
6      8      48
7      3      21
8      2      16
9      2      18
Σf = 30, ΣfX = 161
$\bar{X} = \frac{\sum fX}{\sum f} = \frac{161}{30} = 5.3667 \approx 5.37$
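The tallying and the ΣfX / Σf calculation above can be sketched in Python. This is only an illustrative sketch: it uses `collections.Counter` from the standard library to build the frequency table, and the variable names are arbitrary.

```python
from collections import Counter

# The 30 observations from the illustration above.
data = [4, 3, 6, 7, 5, 5, 3, 4, 9, 6,
        5, 5, 6, 8, 3, 6, 6, 3, 5, 4,
        7, 6, 4, 1, 9, 7, 8, 6, 4, 6]

freq = Counter(data)                           # maps each value X to its frequency f
sum_f = sum(freq.values())                     # Σf
sum_fx = sum(x * f for x, f in freq.items())   # ΣfX
mean = sum_fx / sum_f

# Print the frequency table in ascending order of X, then the totals.
for x in sorted(freq):
    print(x, freq[x], x * freq[x])
print(sum_f, sum_fx, round(mean, 2))
```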
Case C: Grouped frequency distribution (Continuous Series)
For a continuous frequency distribution or a grouped discrete distribution, we cannot use the
previous method directly because each class covers a range of values of X rather than a single distinct
value. Instead, we take the mid-point of each class (the class mark) to represent the X value for that
class and proceed in the usual way.
Illustration
Obtain the arithmetic mean from the information below.
Class interval 5.00-5.49 5.50-5.99 6.00-6.49 6.50-6.99 7.00-7.49
Frequency 12 32 11 8 2
Solution
Class interval    Class marks (X)    Frequency (f)    fX
5.00-5.49         5.25               12               63
5.50-5.99         5.75               32               184
6.00-6.49         6.25               11               68.75
6.50-6.99         6.75               8                54
7.00-7.49         7.25               2                14.5
Σf = 65, ΣfX = 384.25
$\bar{X} = \frac{\sum fX}{\sum f} = \frac{384.25}{65} = 5.9115 \approx 5.91$
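A short Python sketch of the same grouped-data calculation is given below. It assumes the class marks are supplied directly, exactly as listed in the solution table; the variable names are illustrative.

```python
# Class marks (X) and frequencies (f) taken from the solution table.
class_marks = [5.25, 5.75, 6.25, 6.75, 7.25]
freqs = [12, 32, 11, 8, 2]

total_f = sum(freqs)                                        # Σf
total_fx = sum(f * x for f, x in zip(freqs, class_marks))   # ΣfX
grouped_mean = total_fx / total_f

print(total_f, total_fx, round(grouped_mean, 2))
```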
There are other methods for finding the arithmetic mean (mean) of a distribution. These include:
i. Assumed mean method
ii. Step deviation method
i. Calculation of Arithmetic Mean Using an Assumed Mean Method
This is another method for finding the mean of a distribution, whether the data are grouped or ungrouped.
It involves subtracting a specified assumed mean, A, from each value of the variable, X. A is usually taken
as the mid-point of the class with the highest frequency, although any value used as the assumed mean
yields the same result. Mathematically, the arithmetic mean using the assumed mean method is given as:
$\bar{X} = A + \frac{\sum fd}{\sum f}$
where A is the assumed mean and “d” represents the deviation from the assumed mean, that is, d = X - A.
Illustration
Use the method of assumed mean to find the mean of the following distribution.
Class marks (X) 49 52 55 58 61 64
Frequency (f) 3 6 9 5 5 2
Solution
$\bar{X} = A + \frac{\sum fd}{\sum f}$
A = 55 (it has the highest frequency, 9); d = X - A = X - 55
X        f      d = X - 55    fd (A = 55)    d = X - 62    fd (A = 62)
49       3      -6            -18            -13           -39
52       6      -3            -18            -10           -60
55       9      0             0              -7            -63
58       5      3             15             -4            -20
61       5      6             30             -1            -5
64       2      9             18             2             4
Total    Σf = 30              Σfd = 27                     Σfd = -183
$\bar{X} = A + \frac{\sum fd}{\sum f} = 55 + \frac{27}{30} = 55 + 0.9 = 55.9$
Using another assumed mean, A = 62:
$\bar{X} = A + \frac{\sum fd}{\sum f} = 62 + \frac{(-183)}{30} = 62 - 6.1 = 55.9$
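The claim that any assumed mean gives the same result can be checked with a small Python sketch. The function name `assumed_mean` is hypothetical and is introduced only for this illustration.

```python
def assumed_mean(values, freqs, A):
    """Mean by the assumed mean method: A + Σfd / Σf, with d = X - A."""
    sum_f = sum(freqs)
    sum_fd = sum(f * (x - A) for x, f in zip(values, freqs))
    return A + sum_fd / sum_f

X = [49, 52, 55, 58, 61, 64]
f = [3, 6, 9, 5, 5, 2]

m55 = assumed_mean(X, f, 55)   # assumed mean at the value with the highest frequency
m62 = assumed_mean(X, f, 62)   # an arbitrary alternative assumed mean
print(round(m55, 2), round(m62, 2))
```

Both calls return the same mean (up to floating-point rounding), matching the two hand calculations above.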
ii. Calculation of Arithmetic Mean using a Step Deviation Method
Under this method, the mean of a distribution is obtained using an assumed mean and a constant factor. The
step deviation formula is:
$\bar{X} = A + \left(\frac{\sum fd'}{\sum f}\right)c$
where A is the assumed mean, “c” is the class size, and “d'” represents the deviation from the assumed mean
measured in units of the class size, that is, $d' = \frac{d}{c} = \frac{X - A}{c}$.
Note: the step deviation method is only applicable to a continuous series because of the class size, c.
Illustration
Obtain the mean of the distribution below using a step deviation method.
Class interval 10-20 20-30 30-40 40-50 50-60
Frequency 8 6 10 7 9
Solution
Class interval    X         f      d' = (X - 35)/10    fd'
10-20             15        8      -2                  -16
20-30             25        6      -1                  -6
30-40             35 (A)    10     0                   0
40-50             45        7      1                   7
50-60             55        9      2                   18
Σf = 40, Σfd' = 3
$\bar{X} = A + \left(\frac{\sum fd'}{\sum f}\right)c$, where $d' = \frac{d}{c} = \frac{X - A}{c}$
$\bar{X} = 35 + \frac{3}{40}(10)$
$\bar{X} = 35 + 0.75$
$\bar{X} = 35.75$
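The step deviation calculation can be sketched in Python as follows. The function name `step_deviation_mean` is hypothetical, and the sketch assumes all classes share the same class size c.

```python
def step_deviation_mean(class_marks, freqs, A, c):
    """Mean by the step deviation method: A + (Σfd' / Σf) * c, with d' = (X - A) / c."""
    d_prime = [(x - A) / c for x in class_marks]
    sum_fd = sum(f * d for f, d in zip(freqs, d_prime))
    return A + (sum_fd / sum(freqs)) * c

marks = [15, 25, 35, 45, 55]   # class marks of 10-20, 20-30, ..., 50-60
freqs = [8, 6, 10, 7, 9]

step_mean = step_deviation_mean(marks, freqs, A=35, c=10)
print(step_mean)
```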
Activity
Using the example above, check whether the direct (arithmetic mean) method and the assumed mean method
give the same mean.
Properties of Arithmetic Mean
i. The sum of the deviations of all values of X from their arithmetic mean is zero. That is,
$\sum f(X - \bar{X}) = 0$.
To prove:
$\sum f(X - \bar{X}) = \sum fX - \sum f\bar{X} = \sum fX - \bar{X}\sum f$
Substitute $\bar{X} = \frac{\sum fX}{\sum f}$:
$= \sum fX - \frac{\sum fX}{\sum f}\sum f$
$= \sum fX - \sum fX = 0$
ii. The product of the arithmetic mean and the number of items gives the total of all items. That is,
$\bar{X}\sum f = \frac{\sum fX}{\sum f}\sum f = \sum fX$.
iii. If m1 and m2 are the arithmetic means of two samples of sizes n1 and n2, respectively, then the
arithmetic mean, M, of the distribution combining the two samples can be calculated as:
$M = \frac{n_1 m_1 + n_2 m_2}{n_1 + n_2}$
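Properties (i) and (iii) can be illustrated numerically. In the Python sketch below, the two samples are hypothetical values chosen only so that the arithmetic is easy to follow.

```python
# Two hypothetical samples, chosen only for illustration.
sample1 = [2, 4, 6]           # n1 = 3, m1 = 4
sample2 = [10, 20, 30, 40]    # n2 = 4, m2 = 25

n1, n2 = len(sample1), len(sample2)
m1 = sum(sample1) / n1
m2 = sum(sample2) / n2

# Property (iii): combined mean from the two sample means and sizes.
M = (n1 * m1 + n2 * m2) / (n1 + n2)

# Direct mean of the pooled data, for comparison.
pooled = sample1 + sample2
pooled_mean = sum(pooled) / len(pooled)

# Property (i): deviations from the mean sum to zero.
dev_sum = sum(x - pooled_mean for x in pooled)

print(M, pooled_mean, dev_sum)
```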
Harmonic Mean
This is a special measure of location/central tendency.
Case A: Series of Individual Observations
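The harmonic mean, $\bar{X}_H$, of a set of n numbers is defined as:
$\bar{X}_H = \frac{n}{\sum \frac{1}{x}}$
Illustration
The distribution of marks scored by three candidates in a professional examination is 54, 24, and
36. Calculate the harmonic mean of the marks.
Solution
$\bar{X}_H = \frac{n}{\sum \frac{1}{x}}$
x        1/x
54       0.019
24       0.042
36       0.028
Total    0.089
$\bar{X}_H = \frac{3}{0.089} = 33.708$
Note that the reciprocals above are rounded to three decimal places. Carrying more digits gives
Σ(1/x) = 19/216 ≈ 0.08796 and $\bar{X}_H$ = 648/19 ≈ 34.11.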
Case B: Discrete/Continuous Series
The harmonic mean for a discrete/continuous series can be calculated using the formula below:
$\bar{X}_H = \frac{\sum f}{\sum \frac{f}{x}}$
Illustration
Given the series below, obtain the harmonic mean for the distribution.
x    40    42    44    46
f    8     7     10    5
Solution
x        f      f/x
40       8      0.2
42       7      0.167
44       10     0.227
46       5      0.109
Total    30     0.703
$\bar{X}_H = \frac{30}{0.703} = 42.674$
With unrounded ratios, Σ(f/x) ≈ 0.7026 and $\bar{X}_H$ ≈ 42.70.
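Both harmonic mean formulas can be sketched with one small Python function. The name `harmonic_mean` and the optional `freqs` argument are choices made for this illustration; when no frequencies are given, each value is treated as occurring once. The code works at full floating-point precision, so its results differ slightly from hand calculations that round the reciprocals to three decimal places (33.708 and 42.674 in the text).

```python
def harmonic_mean(values, freqs=None):
    """Harmonic mean: Σf / Σ(f/x). With no frequencies, this reduces to n / Σ(1/x)."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

hm_marks = harmonic_mean([54, 24, 36])                       # Case A illustration
hm_series = harmonic_mean([40, 42, 44, 46], [8, 7, 10, 5])   # Case B illustration

print(round(hm_marks, 2), round(hm_series, 2))
```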
Geometric Mean
This is another measure of location/central tendency.
Case A: Series of Individual Observations
The geometric mean, $\bar{X}_G$, of a set of n numbers is defined as:
$\bar{X}_G = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$
At times, the product and the nth root of the numbers may be tedious or difficult to calculate.
When this situation arises, the calculation of $\bar{X}_G$ may be simplified using logarithms as follows:
$\log \bar{X}_G = \frac{\sum \log X}{n}$
$\bar{X}_G = \mathrm{Antilog}\left(\frac{\sum \log X}{n}\right)$
Illustration
Given the set of numbers: 1, 2, 3, 4, 5, and 6, find the geometric mean.
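Solution
$\bar{X}_G = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$
$\bar{X}_G = \sqrt[6]{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6}$
$\bar{X}_G = \sqrt[6]{720}$
$\bar{X}_G = 2.99$
Alternatively,
$\bar{X}_G = \mathrm{Antilog}\left(\frac{\sum \log X}{n}\right)$
X        log X
1        0
2        0.301
3        0.477
4        0.602
5        0.699
6        0.778
Total    2.857
$\bar{X}_G = \mathrm{Antilog}\left(\frac{2.857}{6}\right)$
$\bar{X}_G = \mathrm{Antilog}(0.476)$
$\bar{X}_G = 2.99$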
Case B: Discrete/Continuous Series
The geometric mean for a discrete/continuous series can be calculated using the formula below:
$\bar{X}_G = \mathrm{Antilog}\left(\frac{\sum f \log X}{\sum f}\right)$
Illustration
Obtain the geometric mean of the given distribution.
Class Interval 5-7 8-10 11-13 14-16 17-19
Frequency 15 18 27 10 6
Solution
Class Interval    f      X      log X     f log X
5-7               15     6      0.778     11.67
8-10              18     9      0.954     17.172
11-13             27     12     1.079     29.133
14-16             10     15     1.176     11.76
17-19             6      18     1.255     7.53
Total             76                      77.265
$\bar{X}_G = \mathrm{Antilog}\left(\frac{\sum f \log X}{\sum f}\right)$
$\bar{X}_G = \mathrm{Antilog}\left(\frac{77.265}{76}\right)$
$\bar{X}_G = \mathrm{Antilog}(1.017)$
$\bar{X}_G = 10.399$
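The logarithm route to the geometric mean can be sketched in Python using base-10 logs, as in the tables above. The function name `geometric_mean` and the optional `freqs` argument are illustrative choices. Because the code does not round the logarithms, the grouped result comes out close to, but not exactly equal to, the hand-calculated 10.399.

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean via logs: antilog(Σ f·log10(X) / Σf)."""
    if freqs is None:
        freqs = [1] * len(values)
    mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / sum(freqs)
    return 10 ** mean_log   # the antilog of a base-10 logarithm

gm_set = geometric_mean([1, 2, 3, 4, 5, 6])                             # Case A
gm_grouped = geometric_mean([6, 9, 12, 15, 18], [15, 18, 27, 10, 6])    # Case B class marks

print(round(gm_set, 2), round(gm_grouped, 2))
```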
The Median
The Median of a set of numbers X1, X2, ... , Xn is defined as the middle value in a distribution when
arranged in size order. It divides the data in any distribution into two equal parts. In order to find the
median of a set of numbers (individual series), we first arrange the values in such distribution in either
ascending or descending order and then select the middle value. If the set has an even number of items,
the median is taken as the mean of the two middle values.
Case A: Series of Individual Observations
The median is a measure of central tendency, i.e. the observation that occupies the middle position in an
array of values. Determining the median requires that the data be arranged in either ascending or
descending order. For N ordered observations, the median is the ((N + 1)/2)th item. When N is odd, this is
a single observation; when N is even, Me is the mean of the two middle values.
Illustration
1. Suppose we have the following data: 6, 4, 1, 9, 8, 3, 7, 10, 5
Arranging the data in ascending order:
1, 3, 4, 5, 6, 7, 8, 9, 10
Me = ((N + 1)/2)th item = ((9 + 1)/2)th item = 5th item
The median value (Me) = 6
2. Suppose we have the following set of data: 40, 42, 44, 44, 45, 51, 59, 60.
Me = ((N + 1)/2)th item = ((8 + 1)/2)th item = 4.5th item
The median value is the mean of the 4th and 5th items.
Me = (44 + 45)/2 = 44.5
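The odd and even cases can be sketched in a few lines of Python. The function name `median_of` is a hypothetical helper written for this illustration (Python's standard library also provides `statistics.median`).

```python
def median_of(values):
    """Median of individual observations: middle item, or mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd N: the ((N + 1)/2)th item
    return (ordered[mid - 1] + ordered[mid]) / 2   # even N: mean of the two middle items

me_odd = median_of([6, 4, 1, 9, 8, 3, 7, 10, 5])
me_even = median_of([40, 42, 44, 44, 45, 51, 59, 60])
print(me_odd, me_even)
```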
Case B: Median of an Ungrouped Data (Discrete Series)
For a discrete frequency distribution taking the values X1, X2, X3, ..., Xn with corresponding frequencies
f1, f2, ..., fn, the median is the ((N + 1)/2)th item, where N = Σf. When Σf is large, it is convenient to
include a column of cumulative frequencies when calculating the median of a discrete frequency
distribution.
Illustration
Find the median of the frequency distribution below.
X 0 1 2 3 4 5 6
f 5 5 10 20 30 20 10
Solution
X f Cf
0 5 5
1 5 10
2 10 20
3 20 40
4 30 70
5 20 90
6 10 100
Σf = 100
Me = ((N + 1)/2)th item = ((100 + 1)/2)th item = 50.5th item = 4
The median is 4 because the 50.5th item falls at X = 4: the cumulative frequency reaches 40 at X = 3 and 70
at X = 4, so the 41st through 70th items all equal 4.
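Locating the median item through the cumulative frequency column can be sketched in Python as below. The function name `discrete_median` is hypothetical, and the sketch assumes the X values are supplied in ascending order. When (N + 1)/2 is not a whole number, it averages the two items on either side of that position.

```python
from itertools import accumulate

def discrete_median(values, freqs):
    """Median of a discrete frequency distribution via cumulative frequencies."""
    cum = list(accumulate(freqs))   # cumulative frequency column
    N = cum[-1]

    def item(k):
        # Value of the k-th ordered item (1-indexed): first X whose Cf reaches k.
        for x, cf in zip(values, cum):
            if k <= cf:
                return x

    lo = (N + 1) // 2   # item just below (or at) position (N + 1)/2
    hi = (N + 2) // 2   # item just above (or at) position (N + 1)/2
    return (item(lo) + item(hi)) / 2

X = [0, 1, 2, 3, 4, 5, 6]
f = [5, 5, 10, 20, 30, 20, 10]
me_discrete = discrete_median(X, f)
print(me_discrete)
```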
Case C: Median of a Grouped Data (Continuous Series)
In this case, the median value is obtained by using the formula:
Me = Lc + [(N/2 - CFb)/Fm]c
where: Lc = lower class boundary of the median class
N = Σf = number of observations in the data set, i.e. the total frequency
CFb = sum of the frequencies up to but not including the median class (that is, cumulative frequency before
the median class)
Fm = frequency of the median class
c = class size of the median class
Illustration
Consider the following frequency distribution of students' scores in an examination and obtain the median
score.
Score 60-62 63-65 66-68 69-71 72-74 75-77
No of students 1 2 13 20 11 3
Solution
Class Interval F Class boundary Cum. Freq.
60-62 1 59.5-62.5 1
63-65 2 62.5-65.5 3
66-68 13 65.5-68.5 16
69-71 20 68.5-71.5 36
72-74 11 71.5-74.5 47
75-77 3 74.5-77.5 50
Total 50
First, locate the median class: Me class = (N/2)th item = (50/2)th item = 25th item. Hence, the median is
the score of the 25th student. The class that contains this item is the 69-71 class.
Me = Lc + [(N/2 - CFb)/Fm]c
Lc = 68.5, N = Σf = 50, CFb = 16, Fm = 20 and c = 71.5 - 68.5 = 3
Me = 68.5 + [(25 - 16)/20]3 = 68.5 + 1.35 = 69.85
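The interpolation formula can be sketched in Python as follows. The function name `grouped_median` is hypothetical; the sketch assumes the lower class boundaries are given in ascending order and that every class has the same size c.

```python
def grouped_median(lower_boundaries, freqs, c):
    """Median by interpolation: Lc + ((N/2 - CFb) / Fm) * c."""
    half = sum(freqs) / 2   # position N/2
    cf_before = 0           # CFb: cumulative frequency before the current class
    for Lc, Fm in zip(lower_boundaries, freqs):
        if cf_before + Fm >= half:   # first class whose cumulative frequency reaches N/2
            return Lc + (half - cf_before) / Fm * c
        cf_before += Fm

lower = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]
freqs = [1, 2, 13, 20, 11, 3]
me_grouped = grouped_median(lower, freqs, c=3)
print(round(me_grouped, 2))
```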
The technique discussed above for estimating the median value is called the method of interpolation. The
second method requires operating graphically. The method involves drawing a smooth cumulative
frequency curve for the given data. The cumulative frequency is drawn on the vertical axis against the
upper-class boundaries on the horizontal axis.
Illustration: Obtain the median of the above distribution using the graphical method.
Using the graphical method, we need to: (a) plot the cumulative frequency against the upper-class
boundary (b) find the point on the x-axis that corresponds to the value (N/2)th item (N = Σf) on the
cumulative frequency axis. The table for plotting the graph is given below.
Less than cumulative frequency 1 3 16 36 47 50
More than cumulative frequency 49 47 34 14 3 0
Upper-class boundary 62.5 65.5 68.5 71.5 74.5 77.5
Recall that the median position is the 25th item. Hence, from the graph, Me = 70 (approximately). The
graphical estimate is usually good provided we draw a smooth curve through the plotted points.
[Figure: three panels showing the less-than cumulative frequency curve (LTCF), the more-than cumulative
frequency curve (MTCF), and both curves together (LTCF & MTCF). Vertical axis: cumulative frequency;
horizontal axis: upper-class boundary.]
The Mode
The mode of a set of values is defined as the value that occurs with the greatest frequency. In other
words, the mode is the value with the highest frequency in a set of observations, and it can be found
easily by identifying the value that occurs most often.
Case A: A set of numbers and Ungrouped Data (Series of Individual Observations and Discrete
Series)
Illustration
For the following set of values 2, 3, 3, 1, 3, 2, 4, 5, 8, 3, 2, 4, 4, 3, the mode is 3 since it
occurs most often.
This set of observations can be arranged in a discrete form:
Series 1 2 3 4 5 8
Frequency 1 3 5 3 1 1
The mode is 3 because it has the highest frequency which is 5.
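Counting frequencies and picking the most frequent value can be sketched in Python with `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter

values = [2, 3, 3, 1, 3, 2, 4, 5, 8, 3, 2, 4, 4, 3]

freq = Counter(values)                     # frequency of each distinct value
mode, count = freq.most_common(1)[0]       # value with the highest frequency

# Print the discrete series in ascending order, then the mode.
for x in sorted(freq):
    print(x, freq[x])
print(mode, count)
```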
Case B: Grouped frequency distribution (Continuous Series)
For a continuous grouped frequency distribution, the mode is calculated as follows:
Mode = Lc + [d1/(d1 + d2)]c
where Lc = lower class boundary of the modal class
d1 = frequency of the modal class minus frequency of the class before the modal class
d2 = frequency of the modal class minus frequency of the class after the modal class
c = width of class interval (class size of the modal class).
Note that the quantity d1/(d1 + d2) is always between 0 and 1, ensuring that the mode lies in the
modal class.
Illustration
Estimate the mode of the following distribution.
C.I 9.3-9.7 9.8-10.2 10.3-10.7 10.8-11.2 11.3-11.7 11.8-12.2 12.3-12.7 12.8-13.2
f 2 5 12 18 14 6 4 1
Solution
Class Interval frequency Class Boundary
9.3-9.7 2 9.25-9.75
9.8-10.2 5 9.75-10.25
10.3-10.7 12 10.25-10.75
10.8-11.2 18 10.75-11.25
11.3-11.7 14 11.25-11.75
11.8-12.2 6 11.75-12.25
12.3-12.7 4 12.25-12.75
12.8-13.2 1 12.75-13.25
Modal class = 10.8-11.2 (this class has the highest frequency, 18)
Class boundaries of the modal class = 10.75-11.25
Lc = 10.75, d1 = 18-12 = 6, d2 = 18-14 = 4, c = 11.25-10.75 = 0.5
Mode = Lc + [d1/(d1 + d2)]c
Mode = 10.75 + [6/(6 + 4)]0.5
Mode = 10.75 + 0.3
Mode = 11.05
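The grouped mode formula can be sketched in Python as below. The function name `grouped_mode` is hypothetical. As an assumption made only for this sketch, a missing neighbouring class (when the modal class is the first or last class) is treated as having frequency 0, and equal class sizes are assumed.

```python
def grouped_mode(lower_boundaries, freqs, c):
    """Mode of grouped data: Lc + (d1 / (d1 + d2)) * c."""
    m = freqs.index(max(freqs))                           # position of the modal class
    f_before = freqs[m - 1] if m > 0 else 0               # assumption: 0 if no class before
    f_after = freqs[m + 1] if m < len(freqs) - 1 else 0   # assumption: 0 if no class after
    d1 = freqs[m] - f_before
    d2 = freqs[m] - f_after
    return lower_boundaries[m] + d1 / (d1 + d2) * c

lower = [9.25, 9.75, 10.25, 10.75, 11.25, 11.75, 12.25, 12.75]
freqs = [2, 5, 12, 18, 14, 6, 4, 1]
mode_grouped = grouped_mode(lower, freqs, c=0.5)
print(round(mode_grouped, 2))
```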
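The mode of grouped data can also be obtained graphically from the histogram. Two lines are drawn
within the highest rectangle of the histogram (the modal class), and the mode is the X-value, i.e. the
horizontal coordinate, of the point where the two lines intersect.
[Figure: histogram with frequency on the vertical axis and class boundary on the horizontal axis; the mode
is marked on the horizontal axis.]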
Relationship among mode, mean, and median
For uni-modal frequency curves which are moderately skewed, the following empirical relation holds
approximately:
mean - mode = 3(mean - median).
The shape of a frequency distribution is described by its symmetry or lack of symmetry, that is, its
skewness (Sk).
(i) Sk = 0. This implies that the distribution is perfectly symmetrical about its mean. The normal
distribution is an example of such a distribution. For a symmetrical uni-modal distribution, the mean,
median, and mode are equal, i.e. mean = median = mode.
[Figure: symmetrical curve with the mean, median, and mode at the same central point.]
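(ii) Sk = +ve
A distribution is positively skewed if the right tail is longer. Then, mean > median > mode.
[Figure: positively skewed curve; from left to right on the horizontal axis: mode, median, mean.]
(iii) Sk = -ve
A distribution is negatively skewed if the left tail is longer. Then, mode > median > mean.
[Figure: negatively skewed curve; from left to right on the horizontal axis: mean, median, mode.]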
Illustration
1. Given the mean to be 97.68 and median to be 92.43, find the mode.
Solution
Mean - mode = 3(mean - median)
97.68 - mode = 3(97.68 - 92.43)
97.68 - mode = 15.75
97.68 - 15.75 = mode
Mode = 81.93
The distribution is not symmetrical; it is positively skewed because mean > median > mode
(97.68 > 92.43 > 81.93).
2. Given that mean = 5 and median = 5, find the mode.
Solution
Mean - mode = 3(mean - median)
5 - mode = 3(5 - 5)
5 - mode = 0
5 = mode
Mode = 5
The distribution is symmetrical and has zero skewness because mean = median = mode = 5.
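Rearranging the empirical relation mean - mode = 3(mean - median) gives mode = 3(median) - 2(mean). The Python sketch below applies this rearrangement to the two illustrations; the function names `empirical_mode` and `skew_direction` are hypothetical and introduced only here.

```python
def empirical_mode(mean, median):
    """Mode from the empirical relation: mode = 3 * median - 2 * mean."""
    return 3 * median - 2 * mean

def skew_direction(mean, median):
    """Direction of skewness implied by comparing the mean with the median."""
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "zero"

mode1 = empirical_mode(97.68, 92.43)   # illustration 1
mode2 = empirical_mode(5, 5)           # illustration 2
print(round(mode1, 2), skew_direction(97.68, 92.43))
print(mode2, skew_direction(5, 5))
```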