As per the SEP Syllabus for 2nd Semester B.Com
Bangalore University
BUSINESS DATA ANALYSIS
MODULE 3: MEASURES OF CENTRAL TENDENCY &
DISPERSION
Measures of central tendency are numerical values used to represent
the middle or central value of a large collection of numerical data.
These values are called central values or averages in Statistics.
Measures of Central Tendency Meaning
The representative value of a data set, generally the central
value or the most frequently occurring value, that gives a general
idea of the whole data set is called a Measure of Central Tendency.
Measures of Central Tendency
Some of the most commonly used measures of central
tendency are:
● Mean
● Median
● Mode
Objectives of Measures of Central Tendency
The objectives of measures of central tendency include:
● Summarizing Data: Measures of central tendency provide
a concise summary of the central or typical value within a
dataset, allowing researchers, analysts, and
decision-makers to grasp the overall characteristics of the
data quickly.
● Identifying Typical Values: By indicating the most
common or representative values in the dataset, measures
of central tendency help identify typical observations,
patterns, and trends within the data.
● Facilitating Comparison: Central tendency measures
enable comparisons between different datasets or subsets
of data by providing a common reference point. They help
assess similarities, differences, and variations in the central
values across various groups or time periods.
● Supporting Decision-Making: Central tendency measures
assist in decision-making processes by providing insights
into the central value around which data tends to cluster.
This information is valuable for setting benchmarks,
establishing targets, and making informed judgments.
● Assessing Distribution: Central tendency measures offer
indications of the distributional characteristics of the data,
such as symmetry, skewness, or multimodality. They
complement measures of dispersion by providing context
for understanding the spread of data around the central
value.
● Detecting Outliers: By highlighting the central or typical
value within the dataset, central tendency measures help
identify outliers or extreme values that may significantly
influence the data's overall distribution.
● Interpreting Statistical Analyses: Central tendency
measures serve as essential components of statistical
analyses, aiding in the interpretation of results,
hypothesis testing, and drawing meaningful conclusions
from research findings.
● Communicating Results: Central tendency measures
provide a clear and concise way to communicate the
central value of the data to diverse audiences, including
stakeholders, policymakers, and the general public,
facilitating understanding and interpretation of statistical
information.
Requisites of an Ideal Average
An ideal average should be rigidly defined, easy to understand
and calculate, based on all observations, capable of further
mathematical treatment, and minimally affected by extreme
values or sampling fluctuations.
Here's a more detailed breakdown of the requisites:
● Rigidly Defined:
The average should have a clear and consistent definition,
allowing for precise calculation and interpretation.
● Easy to Understand and Calculate:
The method for calculating the average should be simple
and straightforward, making it accessible to a wide
audience.
● Based on All Observations:
The average should incorporate all data points in the
dataset, providing a representative measure of the central
tendency.
● Suitable for Further Mathematical Treatment:
The average should be amenable to further statistical
analysis and calculations, allowing for deeper insights into
the data.
● Minimally Affected by Extreme Values:
A good average should not be unduly influenced by
outliers or extreme values, ensuring a more accurate
representation of the data.
● Minimally Affected by Sampling Fluctuations:
The average should be relatively stable and consistent
across different samples drawn from the same population.
MEAN
Arithmetic mean is often referred to as the mean or arithmetic
average. It is calculated by adding all the numbers in a given
data set and then dividing it by the total number of items within
that set.
Arithmetic Mean Formula
The general formula to find the arithmetic mean of a given data
is:
Arithmetic mean (x̄) = Sum of all observations / Number of
observations
It is denoted by x̄ (read as "x bar").
For n observations x1, x2, ..., xn, the general formula is:
x̄ = (x1 + x2 + ... + xn) / n = Σx / n
Calculating Arithmetic Mean for Ungrouped Data
The arithmetic mean of ungrouped data is calculated using the
formula:
Mean (x̄) = Sum of all observations / Number of observations
Example 1: Compute the arithmetic mean of the first 6 odd
natural numbers.
Solution: The first 6 odd natural numbers: 1, 3, 5, 7, 9, 11
x̄ = (1 + 3 + 5 + 7 + 9 + 11) / 6 = 36 / 6 = 6.
Thus, the arithmetic mean is 6.
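As a quick check of Example 1, the calculation can be sketched in Python. This is only an illustration; the function name arithmetic_mean is an assumption made for this sketch and is not part of the syllabus material.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# First six odd natural numbers, as in Example 1: 1, 3, 5, 7, 9, 11
first_six_odd = [2 * k + 1 for k in range(6)]
mean_value = arithmetic_mean(first_six_odd)
print(mean_value)  # 6.0
```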
Example 2: A proofreader reads through a 73-page manuscript. The
number of mistakes found on each page is summarized in the
table below. Determine the mean number of mistakes found per
page.
Solution:
The mean number of mistakes is 4.09
Example 3: The following is the distribution of persons according
to different income groups.
Find the average income of the persons.
Solution :
Example 4: The weights assigned to different components in an
examination are given in the table below (columns: Component,
Weightage, Marks scored). Calculate the weighted average score
of the student who scored the marks given in the table.
Solution:
Example 5:
A class consists of 4 boys and 3 girls. The average marks
obtained by the boys and girls are 20 and 30 respectively. Find
the class average.
Solution:
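The combined (class) average weights each group mean by the size of its group. A minimal sketch of that calculation for Example 5 follows; the helper name combined_mean is hypothetical and chosen only for this illustration.

```python
def combined_mean(group_sizes, group_means):
    """Weighted average of group means, each weighted by its group size."""
    total = sum(n * m for n, m in zip(group_sizes, group_means))
    return total / sum(group_sizes)


# Example 5: 4 boys with mean 20 and 3 girls with mean 30
class_average = combined_mean([4, 3], [20, 30])
print(round(class_average, 2))  # (4*20 + 3*30) / 7 = 170 / 7 = 24.29
```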
MEDIAN
Median is the value of the variable which divides the whole set
of data into two equal parts. It is the value such that in a set of
observations, 50% observations are above and 50% observations
are below it. Hence the median is a positional average.
(a) Median for Ungrouped or Raw data:
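In this case, the data is arranged in either ascending or
descending order of magnitude.
Median = value of the ((n + 1) / 2)th observation in the ordered
data array
When n is even, (n + 1) / 2 is not a whole number, and the median
is taken as the average of the two middle observations.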
Example 6:
The numbers of rooms in seven five-star hotels in Chennai
city are 71, 30, 61, 59, 31, 40 and 29. Find the median number of
rooms.
Solution:
Arrange the data in ascending order 29, 30, 31, 40, 59, 61, 71
n = 7 (odd)
Median = (7 + 1) / 2 = 4th positional value
Median = 40 rooms
Example 7:
The export of agricultural products in million dollars from a
country during eight quarters in 1974 and 1975 was recorded as
29.7, 16.6, 2.3, 14.1, 36.6, 18.7, 3.5, 21.3. Find the median of the
given set of values.
Solution:
We arrange the data in descending order
36.6, 29.7, 21.3, 18.7, 16.6, 14.1, 3.5, 2.3
n = 8 (even)
Median = average of the 4th and 5th values
= (18.7 + 16.6) / 2 = 17.65
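The positional rule for raw data can be sketched as follows. The function name median is an illustrative choice. For an even number of observations the sketch averages the two middle values, which is the usual convention for the (n + 1)/2 position falling between two observations.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


rooms = [71, 30, 61, 59, 31, 40, 29]                         # Example 6, n = 7
exports = [29.7, 16.6, 2.3, 14.1, 36.6, 18.7, 3.5, 21.3]     # Example 7, n = 8

median_rooms = median(rooms)
median_exports = median(exports)
print(median_rooms)              # 40
print(round(median_exports, 2))  # 17.65
```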
(b) Median for Continuous grouped data
In this case, the data is given in the form of a frequency table
with class intervals. The following formula is used to calculate
the median:
Median = l + ((N/2 - m) / f) × c
Where
l = Lower limit of the median class
N = Total of all the frequencies
f = Frequency of the median class
m = Cumulative frequency of the class preceding the median
class
c = Width of the class interval of the median class.
From the formula, it is clear that one has to find the median
class first. Median class is that class which corresponds to the
cumulative frequency just greater than N/2.
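A minimal sketch of the grouped-data calculation is given below, assuming the usual interpolation formula Median = l + ((N/2 - m) / f) × c with the symbols defined above. The function name grouped_median is hypothetical. The data are the marks of 50 students used in the ogive examples later in this module. The sketch picks the first class whose cumulative frequency reaches N/2; when a cumulative frequency equals N/2 exactly, this gives the same value as choosing the next class.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median of a continuous frequency distribution with equal class widths."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    cumulative = 0  # m: cumulative frequency before the current class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:
            # This class is the median class: l = lower, f = its frequency
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


# Marks of 50 students: classes 10-20, 20-30, ..., 70-80
lower_limits = [10, 20, 30, 40, 50, 60, 70]
frequencies = [6, 4, 15, 5, 8, 7, 5]

median_marks = grouped_median(lower_limits, 10, frequencies)
print(median_marks)  # 30 + ((25 - 10) / 15) * 10 = 40.0
```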
Example 8:
The following data were obtained from the garden records of a
certain period. Calculate the median weight of the apples.
Solution:
Example 9:
The following table shows age distribution of persons in a
particular region:
Find the median age.
Solution:
We are given upper limits and less than cumulative
frequencies. First find the class intervals and the frequencies.
Since the values increase by 10, the width of each class
interval is 10.
Example 10:
The following are the marks obtained by 140 students in a
college. Find the median marks
Solution:
MODE
According to Croxton and Cowden, ‘The mode of a distribution is
the value at the point around which the items tend to be most
heavily concentrated.’
Mode is defined as the value which occurs most frequently in a
data set.
Computation of mode:
(a) For Ungrouped or Raw Data:
The mode is the value which occurs most frequently in the
data set.
Example 11:
The following are the marks scored by 20 students in the class.
Find the mode.
90, 70, 50, 30, 40, 86, 65, 73, 68, 90, 90, 10, 73, 25,
35, 88, 67, 80, 74, 46
Solution:
Since the mark 90 occurs the maximum number of times, three
times compared with the other numbers, mode is 90.
Example 12:
The sugar levels of 10 patients checked by a doctor are given
below. Find the mode value of the sugar levels.
80, 112, 110, 115, 124, 130, 100, 90, 150, 180
Solution:
Since each value occurs only once, there is no mode.
Example 13:
Compute mode value for the following observations.
2, 7, 10, 12, 10, 19, 2, 11, 3, 12
Solution:
Here, the observations 2, 10 and 12 each occur twice in the data
set, which is the maximum, so the modes are 2, 10 and 12.
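Counting occurrences for Examples 11 to 13 can be sketched with Python's collections.Counter. The function name modes is an illustrative choice, and the sketch follows the convention of Example 12 that a data set in which every value occurs once has no mode. Note that in Example 13 the value 2 also occurs twice, so the count returns 2, 10 and 12.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest count; empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


marks = [90, 70, 50, 30, 40, 86, 65, 73, 68, 90, 90, 10, 73, 25,
         35, 88, 67, 80, 74, 46]                                   # Example 11
sugar = [80, 112, 110, 115, 124, 130, 100, 90, 150, 180]           # Example 12
observations = [2, 7, 10, 12, 10, 19, 2, 11, 3, 12]                # Example 13

print(modes(marks))         # [90]
print(modes(sugar))         # []
print(modes(observations))  # [2, 10, 12]
```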
For discrete frequency distribution, mode is the value of the
variable corresponding to the maximum frequency.
Example 14:
Calculate the mode from the following data
Solution:
Here, 7 is the maximum frequency, and the value of x
corresponding to the frequency 7 is 8.
Therefore, 8 is the mode.
(b) Mode for Continuous data:
The mode or modal value of the distribution is that value of the
variate for which the frequency is maximum. It is the value
around which the items or observations tend to be most heavily
concentrated. The mode is computed by the formula:
Mode = l + ((f1 - f0) / (2f1 - f0 - f2)) × c
The modal class is the class which has the maximum frequency.
l = lower limit of the modal class
f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
c = width of the modal class
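The calculation can be sketched as below, assuming the usual interpolation formula Mode = l + ((f1 - f0) / (2f1 - f0 - f2)) × c, where l (the lower limit of the modal class) is an additional symbol not listed above. The function name grouped_mode is hypothetical, and treating a missing neighbouring class as frequency 0 is an assumption of this sketch. The data are the marks distribution from the equal class interval histogram example later in this module.

```python
def grouped_mode(lower_limits, width, frequencies):
    """Mode of a continuous frequency distribution with equal class widths."""
    i = frequencies.index(max(frequencies))    # position of the modal class
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0    # assumption: no class before -> 0
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width


# Marks 0-10, 10-20, 20-30, 30-40, 40-50 with 16, 36, 70, 50, 28 students
lower_limits = [0, 10, 20, 30, 40]
frequencies = [16, 36, 70, 50, 28]

modal_marks = grouped_mode(lower_limits, 10, frequencies)
print(round(modal_marks, 2))  # 20 + (34 / 54) * 10 = 26.3
```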
Example 15:
The following data relates to the daily income of families in an
urban area. Find the modal income of the families.
Solution:
Determination of Modal class:
For a frequency distribution, the modal class corresponds to the
class with the maximum frequency. But in any one of the following
cases that is not easily possible:
i. If the maximum frequency is repeated.
ii. If the maximum frequency occurs in the beginning or at the
end of the distribution.
iii. If there are irregularities in the distribution.
In such cases, the modal class is determined by the method of
grouping.
Steps for preparing Analysis table:
We prepare a grouping table with 6 columns:
i. In column I, we write down the given frequencies.
ii. Column II is obtained by combining the frequencies two by
two.
iii. Leave the 1st frequency and combine the remaining
frequencies two by two and write in column III.
iv. Column IV is obtained by combining the frequencies three by
three.
v. Leave the 1st frequency and combine the remaining
frequencies three by three and write in column V.
vi. Leave the 1st and 2nd frequencies and combine the
remaining frequencies three by three and write in column VI.
Mark the highest frequency in each column. Then form an
analysis table to find the modal class. After finding the modal
class use the formula to calculate the modal value.
Example 16:
Calculate mode for the following frequency distribution:
Solution:
Analysis Table:
The maximum number of occurrences corresponds to the class
20-25, and hence it is the modal class.
Empirical Relationship among mean, median and mode
A frequency distribution in which the values of the arithmetic
mean, median and mode coincide is known as a symmetrical
distribution. When the values of mean, median and mode are
not equal, the distribution is known as asymmetrical or skewed.
In moderately skewed (asymmetrical) distributions, a very
important relationship exists among the arithmetic mean, median
and mode.
Karl Pearson has expressed this relationship as follows
Mode = 3 Median – 2 Arithmetic Mean
Example 17:
In a moderately asymmetrical frequency distribution, the values
of median and arithmetic mean are 72 and 78 respectively;
estimate the value of the mode.
Solution:
The value of the mode is estimated by applying the following
formula:
Mode = 3 Median – 2 Mean = 3 (72) – 2 (78)
= 216 – 156 = 60
Mode = 60
Example 18:
In a moderately asymmetrical frequency distribution, the values
of mean and mode are 52.3 and 60.3 respectively, Find the
median value.
Solution:
The value of the median is estimated by applying the formula:
Mode = 3 Median – 2 Mean
60.3 = 3 Median – 2 × 52.3
3 Median = 60.3 + 2 × 52.3 = 60.3 + 104.6 = 164.9
Median = 164.9 / 3 = 54.966... ≈ 54.97
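The empirical relationship Mode = 3 Median – 2 Mean can be rearranged to estimate any one of the three values from the other two. A small sketch reproducing Examples 17 and 18 follows; the function names are illustrative only, and the relation is an approximation for moderately skewed distributions, not an exact identity.

```python
def mode_from(mean, median):
    """Estimate the mode from Karl Pearson's empirical relation."""
    return 3 * median - 2 * mean


def median_from(mean, mode):
    """Rearranged relation: 3 Median = Mode + 2 Mean."""
    return (mode + 2 * mean) / 3


estimated_mode = mode_from(mean=78, median=72)          # Example 17
estimated_median = median_from(mean=52.3, mode=60.3)    # Example 18
print(estimated_mode)               # 60
print(round(estimated_median, 2))   # 54.97
```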
GRAPHICAL REPRESENTATION OF MODE - HISTOGRAM
A histogram is a graphical representation of the frequency
distribution of continuous series using rectangles.
The x-axis of the graph represents the class interval, and the
y-axis shows the various frequencies corresponding to
different class intervals.
A histogram is a two-dimensional diagram in which the width
of the rectangles shows the width of the class intervals, and
the length of the rectangles depicts the corresponding
frequency.
There are no gaps between two consecutive rectangles based
on the fact that histograms can be drawn when data are in the
form of the frequency distribution of a continuous series.
A histogram cannot be drawn for a data set in the form of a
discrete series. This distinguishes histograms from bar graphs,
which can be plotted for both discrete and continuous
series.
The major difference between a histogram and a bar graph is
that the former is two-dimensional; i.e., both the width and
length of the rectangles are used for comparison, whereas the
latter is one-dimensional, which means only the length of the
rectangles is used for comparison.
A histogram is used to determine the value of the Mode of a
data set in the form of a continuous series.
Types of Histogram
Histograms of Frequency Distribution are of two types:
1. Histogram of Equal Class Intervals
2. Histogram of Unequal Class Intervals
1. Histogram of Equal Class Intervals
When histograms are drawn based on the data with equal class
intervals, they are known as Histograms of equal class
intervals.
The histogram of equal class intervals includes rectangles with
equal width; however, the length of the rectangles is
proportional to the frequency distribution of the class
intervals.
Example of Histogram of Equal Class Intervals:
Present the following information in the form of a Histogram:
Marks                0-10   10-20   20-30   30-40   40-50
Number of Students   16     36      70      50      28
Solution:
● It is visible that the set of data given is of the equal
class interval; i.e., the difference between the upper
limit and the lower limit of each class interval is 10. So,
drawing a Histogram is feasible.
● The X-axis represents the marks (class intervals), and
Y-axis represents the number of students (frequency
distribution).
2. Histogram of Unequal Class Intervals
When histograms are drawn based on the data with unequal
class intervals, they are known as Histograms of unequal class
intervals.
Histogram of unequal class intervals includes rectangles of
different width sizes. Therefore, before drawing a histogram in
case of unequal class intervals, frequency distribution has to
be adjusted.
Adjustment of frequencies of unequal class intervals:
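1. Determine the class with the smallest interval (the lowest
class interval).
2. Then, calculate the adjustment factor of each class using the
formula:
Adjustment factor = Class interval of the given class ÷ Lowest
class interval
3. Now, adjust the given frequencies using the adjustment
factor:
Adjusted frequency (frequency density) = Given frequency ÷
Adjustment factor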
Example of Histogram of Unequal Class Intervals:
Present the following information in the form of a Histogram:
Wages Number of Workers
10-15 14
15-20 20
20-25 54
25-30 30
30-40 24
40-60 24
60-80 16
Solution:
1. It can be seen clearly that the given class intervals are
unequal. So, before plotting the histogram, the frequencies have
to be adjusted.
2. Determine the class with the smallest interval, i.e., 10-15.
Thus, the lowest class interval in the given frequency
distribution is 5.
3. Formulate the Adjusted Table as shown below:
Wages    Number of    Adjustment     Frequency Density
         Workers      Factor         (Adjusted Frequency)
10-15    14           5 ÷ 5 = 1      14 ÷ 1 = 14
15-20    20           5 ÷ 5 = 1      20 ÷ 1 = 20
20-25    54           5 ÷ 5 = 1      54 ÷ 1 = 54
25-30    30           5 ÷ 5 = 1      30 ÷ 1 = 30
30-40    24           10 ÷ 5 = 2     24 ÷ 2 = 12
40-60    24           20 ÷ 5 = 4     24 ÷ 4 = 6
60-80    16           20 ÷ 5 = 4     16 ÷ 4 = 4
In the above table, the class interval is calculated as the
difference between the upper-class limit and lower-class limit,
i.e.,
15-10=5, 20-15=5, 25-20=5, 30-25=5, 40-30=10, 60-40=20, and
80-60=20.
4. Plotting Histogram
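The frequency adjustment in steps 1 to 3 above can be sketched as follows. The function name frequency_densities is hypothetical. The adjustment factor is taken as the class width divided by the smallest class width, and the adjusted frequency as the given frequency divided by that factor, which matches the figures in the adjusted table.

```python
def frequency_densities(classes):
    """Return (lower, upper, adjustment factor, adjusted frequency) for each class."""
    smallest_width = min(upper - lower for lower, upper, _ in classes)
    rows = []
    for lower, upper, frequency in classes:
        factor = (upper - lower) / smallest_width
        rows.append((lower, upper, factor, frequency / factor))
    return rows


# Wages data: (lower limit, upper limit, number of workers)
wages = [(10, 15, 14), (15, 20, 20), (20, 25, 54), (25, 30, 30),
         (30, 40, 24), (40, 60, 24), (60, 80, 16)]

adjusted = frequency_densities(wages)
for lower, upper, factor, density in adjusted:
    print(f"{lower}-{upper}: factor {factor:g}, adjusted frequency {density:g}")
```

The adjusted frequencies, not the original ones, are then used as the heights of the rectangles.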
Finding the Mode of a Histogram (With Example)
The mode of a dataset represents the value that occurs most
often.
To find the mode in a histogram, we can use the following
steps:
1. Identify the tallest bar.
2. Draw a line from the top-left corner of the tallest bar to the
top-left corner of the bar immediately after it.
3. Draw a line from the top-right corner of the tallest bar to the
top-right corner of the bar immediately before it.
4. Identify the point where the two lines intersect. Then draw a
line straight down to the x-axis. The point where the line hits
the x-axis is our best estimate for the mode.
The following step-by-step example shows how to find the
mode of the following histogram:
Step 1: Identify the Tallest Bar
First, we need to identify the tallest bar in the histogram.
This is the bar with the bin range of 16 to 20:
Step 2: Draw the First Line
Next, we need to draw a line from the left corner of the tallest
bar to the left corner of the bar immediately after it:
Step 3: Draw the Second Line
Next, we need to draw a line from the right corner of the tallest
bar to the right corner of the bar immediately before it:
Step 4: Identify the Intersection Point
Finally, we need to identify the point where the two lines
intersect. Then draw a line straight down to the x-axis:
The point where the line hits the x-axis is our best estimate for
the mode.
In this example, our best estimate for the mode is roughly 17.
Note: Since the data in a histogram is grouped into bins, it’s
not possible to know the exact value of the mode but the
method that we used here allows us to make our best estimate.
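The graphical construction above amounts to finding where two straight lines cross. The sketch below computes that crossing point numerically; the function names are hypothetical, and the lines are assumed to join the top corners of the bars. Since the bar heights of the 16 to 20 example are not listed in the text, the sketch uses the marks data from the equal class interval example (modal class 20-30 with frequency 70, neighbours 36 and 50).

```python
def line_intersection_x(p1, p2, p3, p4):
    """x-coordinate where the line p1-p2 meets the line p3-p4 (non-vertical, non-parallel)."""
    (x1, y1), (x2, y2) = p1, p2
    (x3, y3), (x4, y4) = p3, p4
    slope_a = (y2 - y1) / (x2 - x1)
    slope_b = (y4 - y3) / (x4 - x3)
    # Solve y1 + slope_a * (x - x1) = y3 + slope_b * (x - x3) for x
    return (y3 - y1 + slope_a * x1 - slope_b * x3) / (slope_a - slope_b)


def histogram_mode(lower, width, f_prev, f_modal, f_next):
    """Estimate the mode from the tallest bar and its two neighbours."""
    upper = lower + width
    return line_intersection_x(
        (lower, f_modal), (upper, f_next),   # top-left of tallest bar to top-left of next bar
        (upper, f_modal), (lower, f_prev),   # top-right of tallest bar to top-right of previous bar
    )


estimate = histogram_mode(lower=20, width=10, f_prev=36, f_modal=70, f_next=50)
print(round(estimate, 2))  # 26.3
```

The result agrees with the interpolation formula l + ((f1 - f0) / (2f1 - f0 - f2)) × c, which is the algebraic form of the same construction.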
GRAPHICAL REPRESENTATION OF MEDIAN - OGIVE
An Ogive or Cumulative Frequency Curve is the curve obtained
by plotting the cumulative frequency distribution of a data set
on a graph. As there are two types of cumulative frequency
distribution, i.e., less than cumulative frequencies and more
than cumulative frequencies, the ogives are also of two types:
1. Less than Ogive
2. More than Ogive
Less than Ogive
The steps required to present a less than ogive graph are as
follows:
Step 1: To present a less than ogive graph, add the frequencies
of all the preceding class intervals to the frequency of a class.
Step 2: After that, plot the less than cumulative frequencies on
the Y-axis against the upper limit of the corresponding class
interval on the X-axis.
Step 3: In the last step, join these points by a smooth freehand
curve, which is the resulting less than ogive.
A less than ogive curve is an increasing curve that slopes
upwards from left to right.
Example:
Draw a ‘less than’ ogive curve from the following distribution
of the marks of 50 students in a class.
Marks             10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of Students   6      4      15     5      8      7      5
Solution:
First of all, we have to convert the frequency distribution into a
less than cumulative frequency distribution.
Marks          No. of Students (f)   No. of Students (cf)
Less than 20   6                     6
Less than 30   4                     6 + 4 = 10
Less than 40   15                    6 + 4 + 15 = 25
Less than 50   5                     6 + 4 + 15 + 5 = 30
Less than 60   8                     6 + 4 + 15 + 5 + 8 = 38
Less than 70   7                     6 + 4 + 15 + 5 + 8 + 7 = 45
Less than 80   5                     6 + 4 + 15 + 5 + 8 + 7 + 5 = 50
Now, plot these values of cumulative frequency on a graph.
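The running totals in the table above can be produced with itertools.accumulate. This is a minimal sketch; the variable names are illustrative. Each point to be plotted pairs the upper limit of a class with its less than cumulative frequency.

```python
from itertools import accumulate

upper_limits = [20, 30, 40, 50, 60, 70, 80]
frequencies = [6, 4, 15, 5, 8, 7, 5]

# Running total of the frequencies: the 'less than' cumulative frequencies
less_than_cf = list(accumulate(frequencies))

# Points to plot: (upper limit on the X-axis, cumulative frequency on the Y-axis)
ogive_points = list(zip(upper_limits, less_than_cf))
print(less_than_cf)  # [6, 10, 25, 30, 38, 45, 50]
```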
More than Ogive
The steps required to present a more than ogive graph are as
follows:
Step 1: To present a more than ogive graph, add the frequencies
of all the succeeding class intervals to the frequency of a class.
Step 2: After that, plot the more than cumulative frequencies on
the Y-axis against the lower limit of the corresponding class
interval on the X-axis.
Step 3: In the last step, join these points by a smooth freehand
curve, which is the resulting more than ogive.
A more than ogive curve is a decreasing curve that slopes
downwards from left to right.
Example:
Draw a ‘more than’ ogive curve from the following distribution
of the marks of 50 students in a class.
Marks             10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of Students   6      4      15     5      8      7      5
Solution:
First of all, we have to convert the frequency distribution into a
more than cumulative frequency distribution.
Marks          No. of Students (f)   No. of Students (cf)
More than 10   6                     5 + 7 + 8 + 5 + 15 + 4 + 6 = 50
More than 20   4                     5 + 7 + 8 + 5 + 15 + 4 = 44
More than 30   15                    5 + 7 + 8 + 5 + 15 = 40
More than 40   5                     5 + 7 + 8 + 5 = 25
More than 50   8                     5 + 7 + 8 = 20
More than 60   7                     5 + 7 = 12
More than 70   5                     5
Now, plot these values of cumulative frequency on a graph.
Both ‘Less Than’ and ‘More Than’ Ogives
Both the ‘less than’ and the ‘more than’ ogives can be plotted on
the same graph, and the point at which these two curves
intersect is the median of the given data set.
Example:
Draw both ‘less than’ and ‘more than’ ogive curve from the
following distribution of the marks of 50 students in a class.
Marks             10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of Students   6      4      15     5      8      7      5
Solution:
First of all, we have to convert the frequency distribution into a
less than and more than cumulative frequency distribution.
Marks          No. of Students (cf)   Marks          No. of Students (cf)
Less than 20   6                      More than 10   50
Less than 30   10                     More than 20   44
Less than 40   25                     More than 30   40
Less than 50   30                     More than 40   25
Less than 60   38                     More than 50   20
Less than 70   45                     More than 60   12
Less than 80   50                     More than 70   5
Now, plot these values of less than and more than cumulative frequency
on a graph.
In the above graph, the less than and more than ogive curves
intersect. If a perpendicular is dropped from the intersection
point to the X-axis, the median is located. Hence, the
median is 40 marks.
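The intersection of the two ogives can also be located numerically. The sketch below joins the plotted points with straight segments rather than a freehand curve, which is an assumption of this illustration; the function name ogive_intersection is hypothetical. Summing the frequencies gives 44 for "more than 20".

```python
from itertools import accumulate

boundaries = [10, 20, 30, 40, 50, 60, 70, 80]
frequencies = [6, 4, 15, 5, 8, 7, 5]

# Cumulative frequencies at every class boundary
less_than = [0] + list(accumulate(frequencies))   # 0 below 10, ..., 50 below 80
total = less_than[-1]
more_than = [total - c for c in less_than]        # 50 above 10, ..., 0 above 80


def ogive_intersection(xs, rising, falling):
    """x where the rising and falling curves cross, using linear interpolation."""
    for i in range(len(xs) - 1):
        d0 = rising[i] - falling[i]
        d1 = rising[i + 1] - falling[i + 1]
        if d0 <= 0 <= d1 and d0 != d1:
            return xs[i] + (0 - d0) / (d1 - d0) * (xs[i + 1] - xs[i])
    raise ValueError("the curves do not cross")


median_marks = ogive_intersection(boundaries, less_than, more_than)
print(more_than[:-1])  # [50, 44, 40, 25, 20, 12, 5]
print(median_marks)    # 40.0
```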
Measures of Dispersion
A measure of dispersion indicates the scattering of data. It
explains how the data values differ from one another, giving a
precise view of their distribution. A measure of dispersion
gives us an idea of how the individual items vary around the
central value.
Types of measures of dispersion
The types are Range, Quartile deviation, Mean deviation,
Standard deviation and their relative measures.
STANDARD DEVIATION
Standard deviation is the positive square root of the average
of the squared deviations of all the observations taken from
the mean.
a. Standard Deviation for Ungrouped data
If x1, x2, x3, ..., xn are the ungrouped data, then the
standard deviation is calculated by
σ = √( Σ(x - x̄)² / n )
b. Standard Deviation for Grouped Data (Discrete)
σ = √( Σfd²/N - (Σfd/N)² ), where d = x - A
Where,
f = frequency of each class interval
N = total number of observations (or elements) in the
population
x = mid-value of each class interval
A = an assumed A.M.
c. Standard Deviation for Grouped Data (Continuous)
σ = c × √( Σfd²/N - (Σfd/N)² ), where d = (x - A) / c
Where,
f = frequency of each class interval
N = total number of observations (or elements) in the
population
c = width of class interval
x = mid-value of each class interval
A = an assumed A.M.
Variance: The mean of the squares of the deviations from the
mean is known as the variance.
The square root of the variance is known as the standard
deviation.
Example 19:
The following data gives the number of books taken from a
school library in 7 days. Find the standard deviation of the
number of books taken.
7, 9, 12, 15, 5, 4, 11
Solution:
Actual mean method
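The actual mean method for Example 19 can be sketched as below: take the deviation of each value from the mean, square it, average the squares, and take the square root. The function name population_variance is illustrative, and the divisor is n (not n - 1), following the definition used in this module.

```python
from math import sqrt


def population_variance(values):
    """Mean of the squared deviations from the arithmetic mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


books = [7, 9, 12, 15, 5, 4, 11]         # mean = 63 / 7 = 9
variance = population_variance(books)     # squared deviations sum to 94
standard_deviation = sqrt(variance)
print(round(variance, 2))            # 94 / 7 = 13.43
print(round(standard_deviation, 2))  # 3.66
```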
Example 20:
Weights of children admitted in a hospital are given below.
Calculate the standard deviation of the weights of the children.
13, 15, 12, 19, 10.5, 11.3, 13, 15, 12, 9
Solution:
Example 21:
Find the standard deviation of the first ‘n’ natural numbers.
Solution:
The first n natural numbers are 1, 2, 3, ..., n. The sum and the
sum of squares of these n numbers are
Σx = 1 + 2 + 3 + ... + n = n(n + 1) / 2
Σx² = 1² + 2² + 3² + ... + n² = n(n + 1)(2n + 1) / 6
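Substituting the sum n(n + 1)/2 and the sum of squares n(n + 1)(2n + 1)/6 into σ² = Σx²/n - (Σx/n)² simplifies to σ² = (n² - 1)/12. The sketch below checks this closed form against a direct calculation for several values of n; the function names are illustrative only.

```python
from math import sqrt


def sd_direct(n):
    """Standard deviation of 1, 2, ..., n computed from the definition (divisor n)."""
    values = range(1, n + 1)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_closed_form(n):
    """Closed form: sigma = sqrt((n^2 - 1) / 12)."""
    return sqrt((n * n - 1) / 12)


for n in (1, 5, 10, 100):
    print(n, round(sd_direct(n), 4), round(sd_closed_form(n), 4))
```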
Example 22:
The wholesale price of a commodity for seven consecutive
days in a month is as follows:
Calculate the variance and standard deviation.
Solution:
The computations for variance and standard deviation are
cumbersome when x values are large. So, another method
is used, which will reduce the calculation time. Here we
take the deviations from an assumed mean or arbitrary
value A such that d = x – A
In this example, we take the deviations from an assumed A.M.
= 255. The calculations for the standard deviation will then be
as shown in the table below:
Example 23:
The mean and standard deviation from 18 observations are
14 and 12 respectively. If an additional observation 8 is to be
included, find the corrected mean and standard deviation.
Solution:
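One way to work Example 23 is to recover Σx and Σx² from the given mean and standard deviation, add the new observation, and recompute. A minimal sketch follows; the function name add_observation is hypothetical, and the standard deviation is assumed to use the divisor n as elsewhere in this module.

```python
from math import sqrt


def add_observation(n, mean, sd, new_value):
    """Corrected mean and standard deviation after including one more observation."""
    total = n * mean + new_value                                # new sum of x
    total_squares = n * (sd ** 2 + mean ** 2) + new_value ** 2  # new sum of x^2
    new_n = n + 1
    new_mean = total / new_n
    new_variance = total_squares / new_n - new_mean ** 2
    return new_mean, sqrt(new_variance)


# 18 observations with mean 14 and standard deviation 12; include the value 8
corrected_mean, corrected_sd = add_observation(18, 14, 12, 8)
print(round(corrected_mean, 2))  # 260 / 19 = 13.68
print(round(corrected_sd, 2))    # 11.76
```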
Example 24: A study of 100 engineering companies gives
the following information.
Calculate the standard deviation of the profit earned.
Solution:
Example 25:
From the analysis of monthly wages paid to employees in
two service organizations X and Y, the following results
were obtained
i. Which organization pays a larger amount as monthly
wages?
ii. In which organization is there greater variability in
individual wages of all the wage earners taken together?
Solution:
i. For finding out which organization X or Y pays larger
amount of monthly wages, we have to compare the total
wages:
Total wage bill paid monthly by X and Y is
Organization Y pays a larger amount as monthly wages as
compared to organization X.
ii. For calculating the combined variance, we will first
calculate the combined mean
Coefficient of Variation
The standard deviation is an absolute measure of
dispersion. It is expressed in terms of units in which the
original figures are collected and stated. The standard
deviation of heights of students cannot be compared with
the standard deviation of weights of students, as both are
expressed in different units, i.e., heights in centimetres and
weights in kilograms. Therefore the standard deviation
must be converted into a relative measure of dispersion for
the purpose of comparison. The relative measure is known
as the coefficient of variation (C.V.):
C.V. = (Standard deviation / Mean) × 100
If we want to compare the variability of two or more series,
we can use the C.V. A series or group of data with a greater
C.V. is more variable, less stable, less uniform, less
consistent or less homogeneous. A series or group with a
smaller C.V. is less variable, more stable, more uniform,
more consistent or more homogeneous.
Example 26:
If the coefficient of variation is 50 percent and a standard
deviation is 4, find the mean.
Solution:
Example 27:
The scores of two batsmen, A and B, in ten innings during a
certain season, are as under:
Batsman A: Mean score = 50; Standard deviation = 5
Batsman B: Mean score = 75; Standard deviation = 25
Find which of the batsmen is more consistent in scoring.
Solution:
The batsman with the smaller C.V. is more consistent.
Since the C.V. of batsman A is smaller, he is more
consistent than B.
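Examples 26 and 27 can be checked with a short sketch, assuming the standard definition C.V. = (standard deviation / mean) × 100. The function names are illustrative, and the first pair of figures in Example 27 is taken to belong to batsman A, which is consistent with the stated conclusion.

```python
def coefficient_of_variation(mean, sd):
    """C.V. in percent: standard deviation relative to the mean."""
    return sd * 100 / mean


def mean_from_cv(cv_percent, sd):
    """Rearranged definition: mean = sd * 100 / C.V."""
    return sd * 100 / cv_percent


# Example 26: C.V. = 50 percent, standard deviation = 4
mean_value = mean_from_cv(50, 4)
print(mean_value)  # 8.0

# Example 27: batsman A (mean 50, sd 5) and batsman B (mean 75, sd 25)
cv_a = coefficient_of_variation(50, 5)
cv_b = coefficient_of_variation(75, 25)
print(cv_a, round(cv_b, 2))  # 10.0 33.33
more_consistent = "A" if cv_a < cv_b else "B"
print(more_consistent)       # A
```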
Example 28:
The weekly sales of two products A and B were recorded as
given below
Find out which of the two shows greater fluctuations in
sales.
Solution:
For comparing the fluctuations in sales of two products, we
will prefer to calculate coefficients of variation for both the
products.
Product A: Let A=56 be the assumed mean of sales for
product A.
Product B:
Since the coefficient of variation for product A is more than
that of product B, the fluctuation in sales of
product A is higher than that of product B.