MAT2001 - Statistics for Engineers: Module-1
Dr. Nalliah M
Assistant Professor
Department of Mathematics
School of Advanced Sciences
Vellore Institute of Technology
Vellore, Tamil Nadu, India.
[email protected]
July 24, 2020
Dr. Nalliah M Module-1 July 24, 2020 1 / 40
Mean deviation and Coefficient of Mean Deviation
The range and quartile deviation are not based on all observations. They
are positional measures of dispersion. They do not show any scatter of the
observations from an average.
The mean deviation is a measure of dispersion based on all items in a
distribution.
Definition: Mean deviation
According to Clark and Schekade, mean deviation is the arithmetic mean of
the deviations of a series computed from any measure of central tendency,
i.e., the mean, median or mode; all the deviations are taken as positive,
i.e., signs are ignored.
We usually compute the mean deviation about any one of the three averages:
mean, median or mode.
Sometimes the mode is ill-defined, and so the mean deviation is
computed from the mean or the median.
Between the mean and the median, the median is preferred, since for
ungrouped data the sum of absolute deviations is least when taken about
the median. In general practice, however, because of the wide application
of the mean, the mean deviation is generally computed from the mean.
M.D denotes the mean deviation.
Mean deviation calculated by any measure of central tendency is an
absolute measure.
For the purpose of comparing variation among different series, a relative
mean deviation is required.
The relative mean deviation is obtained by dividing the mean deviation by
the average used for calculating mean deviation.
\[ \text{Coefficient of mean deviation} = \frac{\text{Mean deviation}}{\text{Mean or Median or Mode}} \]
Mean deviation-Ungrouped data
Calculate the average (mean, median or mode) of the series.
Take the deviations of the items from the average, ignoring signs, and
denote these deviations by |D|.
Compute the total of these deviations, i.e., $\sum |D|$.
Divide this total by the number of items n.
\[ \text{M.D} = \frac{\sum |D|}{n}. \]
Problem
Calculate the mean deviation from the mean and the median for the following data:
100, 150, 200, 250, 360, 490, 500, 600, 671. Also, calculate the coefficients of M.D.
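The four steps for ungrouped data can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `mean_deviation` and the variable names are assumptions made here, not part of the module.

```python
from statistics import mean, median

def mean_deviation(values, center):
    """Average of the absolute deviations |D| = |x - center|."""
    return sum(abs(x - center) for x in values) / len(values)

data = [100, 150, 200, 250, 360, 490, 500, 600, 671]

x_bar = mean(data)    # arithmetic mean of the series
med = median(data)    # middle item of the sorted series

md_from_mean = mean_deviation(data, x_bar)
md_from_median = mean_deviation(data, med)

# Coefficient of M.D = M.D divided by the average it was measured from.
coeff_from_mean = md_from_mean / x_bar
coeff_from_median = md_from_median / med

print(f"mean = {x_bar}, median = {med}")
print(f"M.D from mean   = {md_from_mean:.2f}, coefficient = {coeff_from_mean:.3f}")
print(f"M.D from median = {md_from_median:.2f}, coefficient = {coeff_from_median:.3f}")
```

For this series the deviations from the median add up to less than the deviations from the mean, so the M.D from the median is the smaller of the two.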
Mean deviation-Grouped data:Discrete Series
Calculate the average (mean, median or mode) of the series.
Find the deviations of the variable values from the average,
ignoring signs, and denote these deviations by |D|.
Compute $\sum f|D|$.
Divide this total by the total frequency N.
\[ \text{M.D} = \frac{\sum f|D|}{N}. \]
Problem
Calculate the mean deviation from the mean and the median for the following data.
Also, calculate the coefficients of M.D.
Height (cm)    Number of persons
158            15
159            20
160            32
161            35
162            33
163            22
164            20
165            10
166            8
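A possible way to work this discrete series in Python is sketched below. The function names are hypothetical, and the median is taken as the value whose cumulative frequency first reaches (N + 1)/2, which is an assumed convention for discrete series rather than something stated on the slide.

```python
heights = [158, 159, 160, 161, 162, 163, 164, 165, 166]
persons = [15, 20, 32, 35, 33, 22, 20, 10, 8]

def weighted_mean(xs, fs):
    """Mean of a discrete series: sum(f * x) / N."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

def discrete_median(xs, fs):
    """Value whose cumulative frequency first reaches (N + 1) / 2.

    Assumes xs is sorted in ascending order.
    """
    target = (sum(fs) + 1) / 2
    cumulative = 0
    for x, f in zip(xs, fs):
        cumulative += f
        if cumulative >= target:
            return x
    raise ValueError("empty series")

def mean_deviation(xs, fs, center):
    """M.D = sum(f * |x - center|) / N."""
    return sum(f * abs(x - center) for x, f in zip(xs, fs)) / sum(fs)

N = sum(persons)
x_bar = weighted_mean(heights, persons)
med = discrete_median(heights, persons)

md_from_mean = mean_deviation(heights, persons, x_bar)
md_from_median = mean_deviation(heights, persons, med)

print(f"N = {N}, mean = {x_bar:.2f}, median = {med}")
print(f"M.D from mean   = {md_from_mean:.3f}, coefficient = {md_from_mean / x_bar:.4f}")
print(f"M.D from median = {md_from_median:.3f}, coefficient = {md_from_median / med:.4f}")
```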
Mean deviation-Grouped data:continuous Series
Calculate the average (mean, median or mode) of the series.
Compute |D| = |Mid − average|, where Mid is the midpoint of the class
interval.
Compute $\sum f|D|$.
Divide this total by the total frequency N.
\[ \text{M.D} = \frac{\sum f|D|}{N}. \]
Problem
Calculate the mean deviation from the mean and the median for the following data.
Also, calculate the coefficients of M.D.
Marks    Number of students
0-10     20
10-20    25
20-30    32
30-40    40
40-50    42
50-60    35
60-70    10
70-80    8
Solution
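To find the mean and the median:
\[ \bar{x} = \frac{\sum f \cdot \text{Mid}}{N}, \]
where Mid is the midpoint of the class interval and N is the total frequency.
Therefore, $\bar{x} = \frac{7740}{212} \approx 36.5$.
To find the median:
Now, $\frac{N}{2} = \frac{212}{2} = 106$.
Therefore, the median class is 30-40.
\[ \text{Md} = l + \left( \frac{\frac{N}{2} - cf}{f} \right) \times h, \]
where l is the lower limit of the median class, cf is the cumulative
frequency of the class preceding it, f is the frequency of the median class
and h is the class width.
Here l = 30, cf = 77, f = 40, h = 10.
Hence $\text{Md} = 30 + \frac{106 - 77}{40} \times 10 = 37.25$.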
Solution Cont...
Here $|D_1| = |\text{Mid} - \bar{x}|$ and $|D_2| = |\text{Mid} - \text{Md}|$, with $\bar{x} = 36.5$ and Md = 37.25.
Class    Mid   f     f·Mid   c.f   |D1|   f|D1|    |D2|    f|D2|
0-10     5     20    100     20    31.5   630      32.25   645
10-20    15    25    375     45    21.5   537.5    22.25   556.25
20-30    25    32    800     77    11.5   368      12.25   392
30-40    35    40    1400    117   1.5    60       2.25    90
40-50    45    42    1890    159   8.5    357      7.75    325.5
50-60    55    35    1925    194   18.5   647.5    17.75   621.25
60-70    65    10    650     204   28.5   285      27.75   277.5
70-80    75    8     600     212   38.5   308      37.75   302
Total          212   7740                 3193             3209.5
Mean deviation from the mean:
\[ \text{M.D}_1 = \frac{\sum f|D_1|}{N} = \frac{3193}{212} = 15.06 \]
\[ \text{Coefficient of mean deviation from the mean} = \frac{\text{Mean deviation}}{\text{Mean}} = \frac{15.06}{36.5} = 0.41 \]
Mean deviation from the median:
\[ \text{M.D}_2 = \frac{\sum f|D_2|}{N} = \frac{3209.5}{212} = 15.14 \]
\[ \text{Coefficient of mean deviation from the median} = \frac{\text{Mean deviation}}{\text{Median}} = \frac{15.14}{37.25} = 0.41 \]
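The continuous-series calculation can be checked with a short Python sketch. The function `grouped_median` and the variable names are assumptions made for illustration; the code keeps the unrounded mean (7740/212) instead of 36.5, so later digits may differ slightly from a hand calculation while the two-decimal results agree.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [20, 25, 32, 40, 42, 35, 10, 8]

mids = [(lo + hi) / 2 for lo, hi in classes]   # class midpoints
N = sum(freqs)                                  # total frequency
x_bar = sum(f * m for f, m in zip(freqs, mids)) / N

def grouped_median(classes, freqs):
    """Md = l + ((N/2 - cf) / f) * h, located in the first class
    whose cumulative frequency reaches N/2."""
    half = sum(freqs) / 2
    cf = 0  # cumulative frequency before the current class
    for (lo, hi), f in zip(classes, freqs):
        if cf + f >= half:
            return lo + (half - cf) / f * (hi - lo)
        cf += f
    raise ValueError("empty distribution")

def mean_deviation(mids, freqs, center):
    """M.D = sum(f * |Mid - center|) / N."""
    return sum(f * abs(m - center) for f, m in zip(freqs, mids)) / sum(freqs)

med = grouped_median(classes, freqs)
md_mean = mean_deviation(mids, freqs, x_bar)
md_median = mean_deviation(mids, freqs, med)
coeff_mean = md_mean / x_bar
coeff_median = md_median / med

print(f"mean = {x_bar:.2f}, median = {med:.2f}")
print(f"M.D from mean   = {md_mean:.2f}, coefficient = {coeff_mean:.2f}")
print(f"M.D from median = {md_median:.2f}, coefficient = {coeff_median:.2f}")
```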
Standard Deviation and Coefficient of variation
The standard deviation is defined as the positive square root of the
arithmetic mean of the squares of the deviations of the given observations
from their arithmetic mean.
The standard deviation is denoted by σ.
Standard deviation for Ungrouped data
\[ \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}, \] where $\bar{x}$ is the mean.
\[ \sigma = \sqrt{\frac{\sum d^2}{n} - \left( \frac{\sum d}{n} \right)^2}, \] where d = x − A and A is an assumed value.
Problem
Calculate the standard deviation from the following data.
14, 22, 9, 15, 20, 17, 12, 11.
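As a rough check of the two ungrouped formulas, the sketch below evaluates both on this data. The assumed value A = 14 is an arbitrary choice made here for illustration; any A gives the same σ.

```python
from math import sqrt
from statistics import pstdev

data = [14, 22, 9, 15, 20, 17, 12, 11]
n = len(data)
x_bar = sum(data) / n

# Direct formula: root of the mean squared deviation from the mean.
sigma_direct = sqrt(sum((x - x_bar) ** 2 for x in data) / n)

# Assumed-mean formula with d = x - A (A = 14 is an arbitrary assumption).
A = 14
d = [x - A for x in data]
sigma_assumed = sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

# Cross-check against the population standard deviation in the standard library.
sigma_library = pstdev(data)

print(f"mean = {x_bar}, sigma = {sigma_direct:.4f}")
print(f"assumed-mean formula gives {sigma_assumed:.4f}, pstdev gives {sigma_library:.4f}")
```

Both formulas divide by n, so they correspond to `statistics.pstdev` (population form), not `statistics.stdev`.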
Coefficient of Variation
The Standard deviation is an absolute measure of dispersion. It is
expressed in terms of units in which the original figures are collected and
stated. The standard deviation of heights of students cannot be compared
with the standard deviation of weights of students, as both are expressed
in different units, i.e., heights in centimetres and weights in kilograms.
Therefore the standard deviation must be converted into a relative
measure of dispersion for the purpose of comparison. The relative measure
is known as the coefficient of variation.
\[ \text{Coefficient of variation (C.V)} = \frac{\text{Standard deviation}}{\text{Mean}} \times 100 = \frac{\sigma}{\bar{x}} \times 100. \]
Coefficient of Variation
If we want to compare the variability of two or more series, we can use the C.V.
A series or group of data with a greater C.V. is more variable, less stable,
less uniform, less consistent or less homogeneous.
A series or group with a smaller C.V. is less variable, more stable,
more uniform, more consistent or more homogeneous.
Standard deviation for Grouped data-Discrete Series
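\[ \sigma = \sqrt{\frac{\sum f d^2}{N}}, \] where $d = x - \bar{x}$ and N is the total frequency.
\[ \sigma = \sqrt{\frac{\sum f d^2}{N} - \left( \frac{\sum f d}{N} \right)^2}, \] where d = x − A and A is an assumed value.
\[ \sigma = \sqrt{\frac{\sum f d'^2}{N} - \left( \frac{\sum f d'}{N} \right)^2} \times c, \] where $d' = \frac{x - A}{c}$, A is an assumed value
and c is the interval between each value.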
Problem
Calculate Standard deviation from the following data.
X 20 22 25 31 35 40 42 45
f 5 12 15 20 25 14 10 6
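One way to work this problem is sketched below, using the direct formula and the assumed-mean formula side by side. The X values are not equally spaced, so the step-deviation form is not used; A = 31 is an arbitrary assumed value chosen here for illustration.

```python
from math import sqrt

xs = [20, 22, 25, 31, 35, 40, 42, 45]
fs = [5, 12, 15, 20, 25, 14, 10, 6]

N = sum(fs)
x_bar = sum(f * x for x, f in zip(xs, fs)) / N

# Direct formula with d = x - mean.
sigma_direct = sqrt(sum(f * (x - x_bar) ** 2 for x, f in zip(xs, fs)) / N)

# Assumed-mean formula with d = x - A (A = 31 is an arbitrary assumption).
A = 31
sum_fd = sum(f * (x - A) for x, f in zip(xs, fs))
sum_fd2 = sum(f * (x - A) ** 2 for x, f in zip(xs, fs))
sigma_assumed = sqrt(sum_fd2 / N - (sum_fd / N) ** 2)

print(f"N = {N}, mean = {x_bar:.2f}")
print(f"sigma (direct) = {sigma_direct:.3f}, sigma (assumed mean) = {sigma_assumed:.3f}")
```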
Standard deviation for Grouped data- Continuous Series
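\[ \sigma = \sqrt{\frac{\sum f d^2}{N} - \left( \frac{\sum f d}{N} \right)^2} \times c, \]
where $d = \frac{x - A}{c}$, x is the midpoint of the class interval, A is an
assumed value, c is the width of the class interval and N is the total frequency.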
Problem
Calculate Standard Deviation for the following data.
Marks Number of students
0-10 6
10-20 5
20-30 8
30-40 15
40-50 7
50-60 6
60-70 3
Solution
Here A = 35 and c = 10, so $d = \frac{m - 35}{10}$.
x        Mid = m   f       d     fd     fd²
0-10     5         6       -3    -18    54
10-20    15        5       -2    -10    20
20-30    25        8       -1    -8     8
30-40    35        15      0     0      0
40-50    45        7       1     7      7
50-60    55        6       2     12     24
60-70    65        3       3     9      27
Total              N = 50        -8     140
Solution Cont...
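\[
\begin{aligned}
\sigma &= \sqrt{\frac{\sum f d^2}{N} - \left( \frac{\sum f d}{N} \right)^2} \times c \\
&= \sqrt{\frac{140}{50} - \left( \frac{-8}{50} \right)^2} \times 10 \\
&= \sqrt{\frac{140}{50} - \frac{64}{2500}} \times 10 \\
&= \sqrt{\frac{7000}{2500} - \frac{64}{2500}} \times 10 \\
&= \sqrt{\frac{6936}{2500}} \times 10 \\
&= 1.6657 \times 10 \approx 16.66
\end{aligned}
\]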
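The step-deviation calculation for this continuous series can be reproduced with the short sketch below. It takes A = 35 and c = 10 from the solution table; the variable names are assumptions made here.

```python
from math import sqrt

mids = [5, 15, 25, 35, 45, 55, 65]    # class midpoints m
freqs = [6, 5, 8, 15, 7, 6, 3]

A, c = 35, 10                          # assumed value and class width
d = [(m - A) // c for m in mids]       # step deviations, exact integers here

N = sum(freqs)
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))

# The squared mean of the step deviations is subtracted, whatever its sign.
sigma = sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * c

print(f"N = {N}, sum fd = {sum_fd}, sum fd^2 = {sum_fd2}")
print(f"sigma = {sigma:.2f}")
```

Because the term (Σfd/N)² is squared, it is always non-negative and always reduces the quantity under the root, even when Σfd is negative.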
Problem
Find the coefficient of variation for the above problem.
Problem
Prices of a particular commodity in five years in two cities are given below:
Price in City A Price in City B
20 10
22 20
19 18
23 12
16 15
Which city has more stable prices?
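A comparison of this kind can be sketched as follows: compute the C.V. of each series and call the one with the smaller C.V. more stable. The function name `coefficient_of_variation` is an assumption made here, and σ is taken in its population form (divisor n), as in the formulas of this module.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V = (sigma / mean) * 100, with sigma computed using divisor n."""
    return pstdev(values) / mean(values) * 100

city_a = [20, 22, 19, 23, 16]
city_b = [10, 20, 18, 12, 15]

cv_a = coefficient_of_variation(city_a)
cv_b = coefficient_of_variation(city_b)

# The series with the smaller C.V. is the more stable one.
more_stable = "City A" if cv_a < cv_b else "City B"

print(f"C.V of City A = {cv_a:.2f}%")
print(f"C.V of City B = {cv_b:.2f}%")
print(f"More stable prices: {more_stable}")
```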
Moments
Moments can be defined as the arithmetic mean of various powers of
deviations taken from the mean of a distribution. These moments are
known as central moments.
Moments
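The first four moments about the arithmetic mean, or central moments, are
defined below.
Central moment   Individual series                            Discrete series                       Continuous series
$\mu_1$          $\frac{\sum (x-\bar{x})}{n} = 0$             $\frac{\sum f(x-\bar{x})}{N} = 0$     $\frac{\sum f(\text{mid}-\bar{x})}{N} = 0$
$\mu_2$          $\frac{\sum (x-\bar{x})^2}{n} = \sigma^2$    $\frac{\sum f(x-\bar{x})^2}{N}$       $\frac{\sum f(\text{mid}-\bar{x})^2}{N}$
$\mu_3$          $\frac{\sum (x-\bar{x})^3}{n}$               $\frac{\sum f(x-\bar{x})^3}{N}$       $\frac{\sum f(\text{mid}-\bar{x})^3}{N}$
$\mu_4$          $\frac{\sum (x-\bar{x})^4}{n}$               $\frac{\sum f(x-\bar{x})^4}{N}$       $\frac{\sum f(\text{mid}-\bar{x})^4}{N}$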
If the mean is a fractional value, then it becomes a difficult task to work
out the moments. In such cases, we can calculate moments about a
working origin and then change it into moments about the actual mean.
The moments about an origin are known as raw moments.
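The first four raw moments are defined below.
Individual series: $d_1 = X - A$, where A is any origin.
Discrete series: $d_2 = \frac{X - A}{C}$, where C is a common factor.
Continuous series: $d_3 = \frac{\text{mid} - A}{C}$, where C is the class-interval width.
Raw moment   Individual series        Discrete series                      Continuous series
$\mu'_1$     $\frac{\sum d_1}{n}$     $\frac{\sum f d_2}{N} \times C$      $\frac{\sum f d_3}{N} \times C$
$\mu'_2$     $\frac{\sum d_1^2}{n}$   $\frac{\sum f d_2^2}{N} \times C^2$  $\frac{\sum f d_3^2}{N} \times C^2$
$\mu'_3$     $\frac{\sum d_1^3}{n}$   $\frac{\sum f d_2^3}{N} \times C^3$  $\frac{\sum f d_3^3}{N} \times C^3$
$\mu'_4$     $\frac{\sum d_1^4}{n}$   $\frac{\sum f d_2^4}{N} \times C^4$  $\frac{\sum f d_3^4}{N} \times C^4$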
Relationship between Raw Moments and Central moments
The relations between moments about the arithmetic mean and moments about
an origin are given below.
\[ \mu_1 = \mu'_1 - \mu'_1 = 0 \]
\[ \mu_2 = \mu'_2 - (\mu'_1)^2 \]
\[ \mu_3 = \mu'_3 - 3\mu'_1 \mu'_2 + 2(\mu'_1)^3 \]
\[ \mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 (\mu'_1)^2 - 3(\mu'_1)^4 \]
Problem
From the data given below, calculate the first four raw and central
moments
X f
30-33 2
33-36 4
36-39 26
39-42 47
42-45 15
45-48 6
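A sketch of one way to approach this problem: compute the raw moments about a working origin, convert them with the relations between raw and central moments, and cross-check against the central moments computed directly. The origin A = 40.5 (midpoint of the 39-42 class) is an arbitrary choice made here, and the function names are hypothetical.

```python
classes = [(30, 33), (33, 36), (36, 39), (39, 42), (42, 45), (45, 48)]
freqs = [2, 4, 26, 47, 15, 6]

mids = [(lo + hi) / 2 for lo, hi in classes]
N = sum(freqs)

A, c = 40.5, 3                       # assumed origin and class width
d = [(m - A) / c for m in mids]      # step deviations

def raw_moment(r):
    """r-th raw moment about A: (sum f d^r / N) * c^r."""
    return sum(f * di ** r for f, di in zip(freqs, d)) / N * c ** r

m1, m2, m3, m4 = (raw_moment(r) for r in (1, 2, 3, 4))

# Central moments obtained from the raw moments.
mu1 = m1 - m1
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4

# Cross-check: central moments computed directly about the mean.
x_bar = A + m1

def central_moment(r):
    return sum(f * (m - x_bar) ** r for f, m in zip(freqs, mids)) / N

print(f"raw moments:     {m1:.4f}, {m2:.4f}, {m3:.4f}, {m4:.4f}")
print(f"central moments: {mu1:.4f}, {mu2:.4f}, {mu3:.4f}, {mu4:.4f}")
print(f"direct check:    {central_moment(2):.4f}, {central_moment(3):.4f}, {central_moment(4):.4f}")
```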
Skewness
Skewness means ‘lack of symmetry’. We study skewness to get an idea
of the shape of the curve that we can draw with the help of the given
data. If in a distribution mean = median = mode, then that distribution is
known as a symmetrical distribution. If in a distribution mean ≠ median ≠
mode, then it is not a symmetrical distribution; it is called a skewed
distribution, and such a distribution may be either positively skewed or
negatively skewed.
Symmetrical distribution
It is clear from the above diagram that in a symmetrical distribution the
values of mean, median and mode coincide. The spread of the frequencies
is the same on both sides of the center point of the curve.
Positively skewed distribution
It is clear from the above diagram, in a positively skewed distribution, the
value of the mean is maximum and that of the mode is least, the median
lies in between the two. In the positively skewed distribution the
frequencies are spread out over a greater range of values on the right hand
side than they are on the left hand side.
Negatively skewed distribution
It is clear from the above diagram, in a negatively skewed distribution, the
value of the mode is maximum and that of the mean is least. The median
lies in between the two. In the negatively skewed distribution the
frequencies are spread out over a greater range of values on the left hand
side than they are on the right hand side.
Measures of skewness
The important measures of skewness are
Karl Pearson’s coefficient of skewness
Bowley’s coefficient of skewness
Measure of skewness based on moments
Karl Pearson’s Coefficient of skewness
According to Karl Pearson, the absolute measure of skewness = mean −
mode. This measure is not suitable for making valid comparisons of the
skewness in two or more distributions because the unit of measurement
may be different in different series. To avoid this difficulty, use the
relative measure of skewness called Karl Pearson’s coefficient of skewness,
given by:
\[ \text{Karl Pearson's coefficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\sigma}, \]
where σ is the standard deviation.
If the mode is ill-defined, the coefficient can be determined by the
formula:
\[ \text{Coefficient of skewness} = \frac{3(\text{Mean} - \text{Median})}{\sigma}, \]
where σ is the standard deviation.
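Both forms of Karl Pearson's coefficient are easy to sketch in Python. The example reuses the ungrouped series 14, 22, 9, 15, 20, 17, 12, 11 from the standard deviation problem, where no value repeats and the mode is therefore ill-defined; the function names and the numbers passed to the mode-based form are hypothetical.

```python
from statistics import mean, median, pstdev

def pearson_skewness_mode(mean_value, mode_value, sigma):
    """(Mean - Mode) / sigma, for use when the mode is well defined."""
    return (mean_value - mode_value) / sigma

def pearson_skewness_median(values):
    """3 * (Mean - Median) / sigma, for use when the mode is ill-defined."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

data = [14, 22, 9, 15, 20, 17, 12, 11]
sk = pearson_skewness_median(data)

print(f"mean = {mean(data)}, median = {median(data)}, sigma = {pstdev(data):.4f}")
print(f"coefficient of skewness = {sk:.4f}")

# Hypothetical summary values, only to show the mode-based form.
print(pearson_skewness_mode(mean_value=15, mode_value=13, sigma=4))
```

A positive coefficient indicates positive skewness and a negative coefficient indicates negative skewness.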
Bowley’s Coefficient of skewness
In Karl Pearson’s method of measuring skewness, the whole of the series is
needed. Prof. Bowley has suggested a formula based on the relative positions
of the quartiles. In a symmetrical distribution, the quartiles are equidistant
from the median, i.e., Median − Q1 = Q3 − Median. But in a
skewed distribution, the quartiles are not equidistant from the median.
Hence Bowley has suggested the following formula:
\[ \text{Bowley's coefficient of skewness} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}. \]
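Bowley's coefficient can be sketched with the standard library. The example reuses the ungrouped series from the mean deviation problem. Quartile conventions differ between textbooks; this sketch assumes the default `exclusive` method of `statistics.quantiles` (Python 3.8+), which places Q1 and Q3 at the (n + 1)/4-th and 3(n + 1)/4-th items.

```python
from statistics import quantiles

def bowley_skewness(values):
    """(Q3 + Q1 - 2 * Median) / (Q3 - Q1), using (n + 1)-based quartiles."""
    q1, q2, q3 = quantiles(values, n=4)   # q2 is the median
    return (q3 + q1 - 2 * q2) / (q3 - q1)

data = [100, 150, 200, 250, 360, 490, 500, 600, 671]

q1, q2, q3 = quantiles(data, n=4)
bowley = bowley_skewness(data)

print(f"Q1 = {q1}, Median = {q2}, Q3 = {q3}")
print(f"Bowley's coefficient of skewness = {bowley:.4f}")
```

Only the three quartiles enter the formula, so extreme items in the tails of the series have no effect on the result.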
Measure of skewness based on moments
The measure of skewness based on moments is denoted by β1 and is given
by:
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3}. \]
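A small sketch of β1 for an individual series follows, reusing the data 14, 22, 9, 15, 20, 17, 12, 11 from the standard deviation problem. The helper `central_moment` is a name assumed here.

```python
def central_moment(values, r):
    """r-th central moment of an individual series: sum((x - mean)^r) / n."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** r for x in values) / n

data = [14, 22, 9, 15, 20, 17, 12, 11]

mu2 = central_moment(data, 2)
mu3 = central_moment(data, 3)
beta1 = mu3 ** 2 / mu2 ** 3

print(f"mu2 = {mu2}, mu3 = {mu3}, beta1 = {beta1:.4f}")
```

Since μ3 is squared, β1 is never negative; the direction of the skewness has to be read from the sign of μ3 itself.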
Kurtosis
The expression ‘kurtosis’ is used to describe the peakedness of a curve.
The three measures (central tendency, dispersion and skewness) describe
the characteristics of frequency distributions, but on their own they do not
give a complete picture of a distribution.
As far as the measurement of shape is concerned, we have two
characteristics: skewness, which refers to the asymmetry of a series, and
kurtosis, which measures the peakedness of a curve relative to the normal
curve. All frequency curves exhibit different degrees of flatness or
peakedness. This characteristic of a frequency curve is termed kurtosis.
Measures of kurtosis describe the shape of the top of a frequency curve. They
tell us the extent to which a distribution is more peaked or more
flat-topped than the normal curve. The normal curve, which is symmetrical
and bell-shaped, is designated as mesokurtic. If a curve is relatively
narrower and more peaked at the top, it is designated as leptokurtic. If the
frequency curve is flatter than the normal curve, it is designated as
platykurtic.
Measure of Kurtosis
The measure of kurtosis of a frequency distribution based on moments is
denoted by β2 and is given by
\[ \beta_2 = \frac{\mu_4}{\mu_2^2}. \]
If β2 = 3, the distribution is said to be normal and the curve is
mesokurtic.
If β2 > 3, the distribution is said to be more peaked and the curve is
leptokurtic.
If β2 < 3, the distribution is said to be flat topped and the curve is
platykurtic.
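The three cases can be illustrated with a short sketch that computes β2 for the grouped data of the moments problem (classes 30-33 to 45-48) and classifies the curve. The function names and the tolerance used to decide when β2 counts as equal to 3 are assumptions made here.

```python
mids = [31.5, 34.5, 37.5, 40.5, 43.5, 46.5]   # class midpoints
freqs = [2, 4, 26, 47, 15, 6]

def central_moment(mids, freqs, r):
    """r-th central moment of a continuous series: sum(f (mid - mean)^r) / N."""
    N = sum(freqs)
    x_bar = sum(f * m for f, m in zip(freqs, mids)) / N
    return sum(f * (m - x_bar) ** r for f, m in zip(freqs, mids)) / N

def classify_kurtosis(beta2, tol=1e-9):
    """Label the curve by comparing beta2 with 3 (tol is an assumed tolerance)."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"

mu2 = central_moment(mids, freqs, 2)
mu4 = central_moment(mids, freqs, 4)
beta2 = mu4 / mu2 ** 2
shape = classify_kurtosis(beta2)

print(f"mu2 = {mu2:.4f}, mu4 = {mu4:.4f}")
print(f"beta2 = {beta2:.4f} -> {shape}")
```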
Thank you