Chapter-3-Descriptive Statistic
STATISTICS
CHAPTER 3
3. DESCRIPTIVE STATISTICS
Objectives:
To comprehend the data easily.
To facilitate comparison.
To make further statistical analysis.
The expression $\sum_{i=1}^{N} X_i$ is read, "the sum of X sub i from i equals 1 to N." It means "add up all the
numbers."
Lecture notes on Introduction To Statistics Chapter 3: DESCRIPTIVE STATISTICS
Example: Suppose the following were scores made on the first homework assignment for
five students in the class: 5, 7, 7, 6, and 8. In this example set of five numbers, where
N=5, the summation could be written:
$$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 5 + 7 + 7 + 6 + 8 = 33$$
The "i=1" at the bottom of the summation notation tells where to begin the sequence of
summation. If the expression were written with "i=3", the summation would start with the
third number in the set. For example:
$$\sum_{i=3}^{N} X_i = X_3 + X_4 + \dots + X_N$$
In the example set of numbers, this would give the following result:
$$\sum_{i=3}^{5} X_i = 7 + 6 + 8 = 21$$
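The "N" in the upper part of the summation notation tells where to end the sequence of
summation. If there were only three scores, then the summation would be:
$$\sum_{i=1}^{3} X_i = X_1 + X_2 + X_3$$
For example, with the first three scores above:
$$\sum_{i=1}^{3} X_i = 5 + 7 + 7 = 19$$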
PROPERTIES OF SUMMATION
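For reference, the properties of summation that are usually listed at this point are sketched below. This is the common textbook list, with k, a and b taken to be constants; treating it as the list intended here is an assumption.

```latex
\begin{align*}
\sum_{i=1}^{n} k &= nk \\
\sum_{i=1}^{n} k X_i &= k \sum_{i=1}^{n} X_i \\
\sum_{i=1}^{n} (a + b X_i) &= na + b \sum_{i=1}^{n} X_i \\
\sum_{i=1}^{n} (X_i + Y_i) &= \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i
\end{align*}
```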
4.
X Y
5 6
7 7
7 8
6 7
8 8
a)
b)
c)
d)
e)
f)
g)
h)
Solutions:
a)
b)
c)
d)
e)
f)
g)
h)
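As an illustration of summation over paired data, the short Python sketch below computes a few common sums for the X and Y columns of the table above. The particular sums chosen are an assumption and need not correspond to parts a) to h).

```python
# Paired observations from the X/Y table above.
X = [5, 7, 7, 6, 8]
Y = [6, 7, 8, 7, 8]

sum_x = sum(X)                                  # sum of X_i
sum_y = sum(Y)                                  # sum of Y_i
sum_x_plus_y = sum(x + y for x, y in zip(X, Y)) # sum of (X_i + Y_i)
sum_xy = sum(x * y for x, y in zip(X, Y))       # sum of X_i * Y_i
sum_x_sq = sum(x ** 2 for x in X)               # sum of X_i squared

print(sum_x, sum_y, sum_x_plus_y, sum_xy, sum_x_sq)
```

Note that the sum of (X_i + Y_i) equals the sum of the X_i plus the sum of the Y_i, while the sum of squares is not the square of the sum.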
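The arithmetic mean is defined as the sum of the magnitudes of the items divided by the number of
items.
The mean of X1, X2, X3, …, Xn is denoted by A.M, m or $\bar{X}$ and is given by:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
If X1 occurs f1 times, X2 occurs f2 times, …, and Xk occurs fk times, then the mean is given by:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where k is the number of classes and $\sum_{i=1}^{k} f_i = n$.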
Solution:
Xi      fi    Xifi
2       2     4
3       1     3
7       3     21
8       1     8
Total   7     36
$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{36}{7} \approx 5.14$$
If data are given in the shape of a continuous frequency distribution, then the mean is
obtained as follows:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where $X_i$ is the class mark of the i-th class and $f_i$ is the frequency of the i-th class.
Example: Calculate the mean for the following age distribution.
Class Frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6
Solutions:
First find the class marks
Find the product of frequency and class marks
Find mean using the formula.
Class    fi     Xi    Xifi
6-10     35     8     280
11-15    23     13    299
16-20    15     18    270
21-25    12     23    276
26-30    9      28    252
31-35    6      33    198
Total    100          1575
$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{1575}{100} = 15.75$$
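The three steps above (class marks, products, division by the total frequency) can be sketched in Python as follows. The function name `grouped_mean` is illustrative only, and the class mark is taken as the midpoint of the class limits.

```python
def grouped_mean(classes, freqs):
    """Mean of a grouped distribution, using class marks (midpoints)."""
    marks = [(lower + upper) / 2 for lower, upper in classes]
    total = sum(freqs)
    return sum(f * x for f, x in zip(freqs, marks)) / total

# Age distribution from the example above.
age_classes = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
age_freqs = [35, 23, 15, 12, 9, 6]

mean_age = grouped_mean(age_classes, age_freqs)
print(mean_age)
```

The result agrees with the hand calculation, 1575 / 100.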
Exercises:
1. Marks of 75 students are summarized in the following frequency distribution:
Marks    Frequency
40-44    7
45-49    10
50-54    22
55-59    f4
60-64    f5
65-69    6
70-74    3
If 20% of the students have marks between 55 and 59
i. Find the missing frequencies f4 and f5.
ii. Find the mean.
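If the values in a series or the mid values of the classes are large, coding of values is a
good device to simplify the calculations.
For raw data, suppose we have used the following coding system, where A is an assumed mean:
$$d_i = X_i - A, \qquad \bar{X} = A + \frac{\sum d_i}{n}$$
For an ungrouped frequency distribution the same coding gives:
$$\bar{X} = A + \frac{\sum f_i d_i}{n}$$
In both cases the true mean is the assumed mean plus the average of the deviations
from the assumed mean.
Suppose the data are given in the shape of a continuous frequency distribution with a
constant class size of w; then the following coding is appropriate:
$$d_i = \frac{X_i - A}{w}, \qquad \bar{X} = A + w\,\frac{\sum f_i d_i}{n}$$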
Solutions:
a)
2. The sum of the squared deviations of a set of items from their mean is the
minimum, i.e.
$$\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - A)^2, \quad A \neq \bar{X}$$
Solutions:
4. If a wrong figure has been used when calculating the mean, the correct mean can be
obtained without repeating the whole process using:
$$\text{Correct mean} = \text{Wrong mean} + \frac{\text{Correct value} - \text{Wrong value}}{n}$$
Example:
1. The mean of n Tetracycline capsules X1, X2, …, Xn is known to be 12 gm.
A new set of capsules of another drug is obtained by the linear transformation
Yi = 2Xi – 0.5 (i = 1, 2, …, n). What will be the mean of the new set of
capsules?
Solutions:
$$\bar{Y} = 2\bar{X} - 0.5 = 2(12) - 0.5 = 23.5 \text{ gm}$$
Solutions:
Weighted Mean
When different items are to be given different importance, a weighted mean
is appropriate.
Weights are assigned to each item in proportion to its relative importance.
Let X1, X2, …, Xn be the values of the items of a series and W1, W2, …, Wn their
corresponding weights; then the weighted mean, denoted $\bar{X}_w$, is defined as:
$$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$
Example:
A student obtained the following percentages in an examination:
English 60, Biology 75, Mathematics 63, Physics 59, and Chemistry [Link]. Find the
student's weighted arithmetic mean if weights 1, 2, 1, 3, 3 respectively are allotted
to the subjects.
Solutions:
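The geometric mean of a set of n observations is the nth root of their product.
The geometric mean of X1, X2, X3, …, Xn is denoted by G.M and given by:
$$G.M = \sqrt[n]{X_1 X_2 \cdots X_n}$$
The logarithm of the G.M of a set of observations is the arithmetic mean of their
logarithms:
$$\log(G.M) = \frac{1}{n} \sum_{i=1}^{n} \log X_i$$
Example:
Find the G.M of the numbers 2, 4, 8.
Solutions:
$$G.M = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$$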
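A minimal Python sketch of this example is given below. It computes the geometric mean of 2, 4, 8 both with the standard-library function `statistics.geometric_mean` (Python 3.8 or later) and through the logarithm relation, so the two routes can be compared.

```python
import math
import statistics

values = [2, 4, 8]

# Direct computation with the standard library.
gm_direct = statistics.geometric_mean(values)

# Via logarithms: log(G.M) is the arithmetic mean of the logs.
mean_log = sum(math.log(x) for x in values) / len(values)
gm_from_logs = math.exp(mean_log)

print(gm_direct, gm_from_logs)
```

Both values equal 4 up to floating-point rounding.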
Remark:
The Geometric Mean is useful and appropriate for finding averages of ratios.
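The harmonic mean of X1, X2, X3, …, Xn is denoted by H.M and given by:
$$H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
If observations X1, X2, …, Xn have weights W1, W2, …, Wn respectively, then their
harmonic mean is given by:
$$H.M = \frac{\sum_{i=1}^{n} W_i}{\sum_{i=1}^{n} \frac{W_i}{X_i}}$$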
Remark: The Harmonic Mean is useful and appropriate in finding average speeds and
average rates.
Example: A cyclist pedals from his house to his college at a speed of 10 km/hr and back
from the college to his house at 15 km/hr. Find the average speed.
Solution: Here the distance is constant, so the simple H.M is appropriate for this problem.
X1 = 10 km/hr, X2 = 15 km/hr
$$H.M = \frac{2}{\frac{1}{10} + \frac{1}{15}} = 12 \text{ km/hr}$$
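The following sketch checks the cyclist example in Python. It compares `statistics.harmonic_mean` with the definition of average speed as total distance over total time; the one-way distance of 30 km is a hypothetical value chosen only for illustration, since any positive distance gives the same answer.

```python
import math
import statistics

speeds = [10, 15]  # km/hr, outbound and return

hm = statistics.harmonic_mean(speeds)

# Average speed from first principles, with a hypothetical one-way distance.
distance = 30.0                                  # km (illustrative assumption)
total_time = sum(distance / s for s in speeds)   # hours
average_speed = 2 * distance / total_time

print(hm, average_speed)
```

The arithmetic mean, 12.5 km/hr, would overstate the average speed because more time is spent at the slower speed.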
The Mode
- Mode is a value which occurs most frequently in a set of values
- The mode may not exist and even if it does exist, it may not be unique.
- In the case of a discrete distribution, the value having the maximum frequency is the modal
value.
Examples:
1. Find the mode of 5, 3, 5, 8, 9
Mode =5
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5.
The data are bimodal: the modes are 8 and 9.
3. Find the mode of 4, 12, 3, 6, and 7.
No mode for this data.
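The three examples above can be reproduced with `statistics.multimode` (Python 3.8 or later). The helper `modes` below is a hypothetical wrapper, and its rule of reporting no mode when every distinct value occurs equally often is an assumption matching example 3.

```python
from statistics import multimode

def modes(data):
    """Return the list of modes, or [] when all distinct values tie."""
    found = multimode(data)
    distinct = set(data)
    # If every distinct value is equally frequent, treat the data as having no mode.
    if len(distinct) > 1 and len(found) == len(distinct):
        return []
    return found

print(modes([5, 3, 5, 8, 9]))        # one mode
print(modes([8, 9, 9, 7, 8, 2, 5]))  # two modes (bimodal)
print(modes([4, 12, 3, 6, 7]))       # no mode
```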
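- The mode of a set of numbers X1, X2, …, Xn is usually denoted by $\hat{X}$.
If data are given in the shape of a continuous frequency distribution, the mode is defined
as:
$$\hat{X} = L + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$
Where:
L = the lower class boundary of the modal class (the class with the highest frequency)
w = the size of the modal class
$\Delta_1 = f_{mo} - f_1$ and $\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = the frequency of the modal class
$f_1$ = the frequency of the class preceding the modal class
$f_2$ = the frequency of the class following the modal class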
Example: Following is the distribution of the size of certain farms selected at random
from a district. Calculate the mode of the distribution.
Merits:
It is not affected by extreme observations.
Easy to calculate and simple to understand.
It can be calculated for distribution with open end class
Demerits:
It is not rigidly defined.
It is not based on all observations
It is not suitable for further mathematical treatment.
It is not a stable average, i.e. it is affected by fluctuations of sampling to
some extent.
Often its value is not unique.
Note: Being the point of maximum density, the mode is especially useful in finding the most
popular size in studies relating to marketing, trade, business, and industry. It is the
appropriate average to be used to find the ideal size.
The Median
- In a distribution, the median is the value of the variable which divides it into two
equal halves.
- In an ordered series of data, the median is the observation lying exactly in the middle of the
series. It is the middlemost value in the sense that the number of values less than the median is
equal to the number of values greater than it.
- If X1, X2, …, Xn are the observations, then the numbers arranged in ascending order
will be X[1], X[2], …, X[n], where X[i] is the i-th smallest value:
X[1] ≤ X[2] ≤ … ≤ X[n]
- The median is denoted by $\tilde{X}$ and, for ungrouped data, is given by:
$$\tilde{X} = X_{[(n+1)/2]} \text{ if } n \text{ is odd}, \qquad \tilde{X} = \tfrac{1}{2}\left(X_{[n/2]} + X_{[n/2+1]}\right) \text{ if } n \text{ is even}$$
Solutions:
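a) First order the data: 2, 4, 5, 6, 8, 9
Here n = 6, which is even, so the median is the average of the 3rd and 4th ordered values:
(5 + 6) / 2 = 5.5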
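If data are given in the shape of a continuous frequency distribution, the median is defined
as:
$$\tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right)$$
where $L_{med}$ is the lower class boundary of the median class, w is the class size, $f_{med}$ is the frequency of the median class, n is the total number of observations, and c is the cumulative frequency (less than type) of the class preceding the median class.
Remark
The median class is the class with the smallest cumulative frequency (less than type) greater
than or equal to $\frac{n}{2}$.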
Example: Find the median of the following distribution.
Class    Frequency
40-44    7
45-49    10
50-54    22
55-59    15
60-64    12
65-69    6
70-74    3
Solutions:
First find the less than cumulative frequency.
Identify the median class.
Find median using formula.
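The three steps above can be sketched in Python as follows. The function name `grouped_median` is illustrative, and the lower class boundaries are assumed to lie 0.5 below the lower class limits (39.5, 44.5, ...), giving a class size of 5.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median of a grouped distribution by linear interpolation in the median class."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the distribution has no observations")
    cum_before = 0  # less-than cumulative frequency of the preceding classes
    for lower, f in zip(lower_boundaries, freqs):
        # The median class is the first class whose cumulative frequency reaches n/2.
        if cum_before + f >= n / 2:
            return lower + (width / f) * (n / 2 - cum_before)
        cum_before += f

boundaries = [39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5]  # assumed class boundaries
freqs = [7, 10, 22, 15, 12, 6, 3]

median_marks = grouped_median(boundaries, 5, freqs)
print(round(median_marks, 2))
```

With n = 75, n/2 = 37.5; the cumulative frequencies are 7, 17, 39, ..., so the median class is 50-54 and the median is 49.5 + (5/22)(37.5 − 17), about 54.16.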
Merits:
Median is a positional average and hence not influenced by extreme observations.
Can be calculated in the case of open end intervals.
Median can be located even if the data are incomplete.
Demerits:
It is not a good representative of data if the number of items is small.
It is not amenable to further algebraic treatment.
It is susceptible to sampling fluctuations.
Quantiles
When a distribution is arranged in order of magnitude of items, the median is the value of the
middle term. The measures that depend on position in the distribution, namely quartiles, deciles,
and percentiles, are collectively called quantiles.
Quartiles:
- Quartiles are measures that divide the frequency distribution into four equal parts.
- The values of the variable corresponding to these divisions are denoted Q1, Q2, and Q3,
often called the first, the second and the third quartile respectively.
- Q1 is a value which has 25% of the items less than or equal to it. Similarly, Q2 has
50% of the items with values less than or equal to it, and Q3 has 75% of the items with values
less than or equal to it.
- To find Qi (i = 1, 2, 3) we count $\frac{iN}{4}$ of the observations beginning from the lowest class:
$$Q_i = L + \frac{w}{f}\left(\frac{iN}{4} - c\right), \quad i = 1, 2, 3$$
where L, w, and f are the lower class boundary, size, and frequency of the quartile class, N is the total number of observations, and c is the cumulative frequency (less than type) of the class preceding the quartile class.
Remark:
The quartile class (class containing Qi) is the class with the smallest cumulative frequency
(less than type) greater than or equal to $\frac{iN}{4}$.
Deciles:
- Deciles are measures that divide the frequency distribution into ten equal parts.
- The values of the variable corresponding to these divisions are denoted D1, D2, …, D9,
often called the first, the second, …, the ninth decile respectively.
- To find Di (i = 1, 2, …, 9) we count $\frac{iN}{10}$ of the observations beginning from the lowest class:
$$D_i = L + \frac{w}{f}\left(\frac{iN}{10} - c\right), \quad i = 1, 2, \dots, 9$$
where L, w, and f are the lower class boundary, size, and frequency of the decile class, N is the total number of observations, and c is the cumulative frequency (less than type) of the class preceding the decile class.
Remark:
The decile class (class containing Di) is the class with the smallest cumulative frequency
(less than type) greater than or equal to $\frac{iN}{10}$.
Percentiles:
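- Percentiles are measures that divide the frequency distribution into one hundred equal
parts.
- The values of the variable corresponding to these divisions are denoted P1, P2, …, P99,
often called the first, the second, …, the ninety-ninth percentile respectively.
- To find Pi (i = 1, 2, …, 99) we count $\frac{iN}{100}$ of the observations beginning from the lowest class:
$$P_i = L + \frac{w}{f}\left(\frac{iN}{100} - c\right), \quad i = 1, 2, \dots, 99$$
where L, w, and f are the lower class boundary, size, and frequency of the percentile class, N is the total number of observations, and c is the cumulative frequency (less than type) of the class preceding the percentile class.
Remark:
The percentile class (class containing Pi) is the class with the smallest cumulative
frequency (less than type) greater than or equal to $\frac{iN}{100}$.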
Example: Consider the following distribution and calculate:
a) All quartiles.
b) The 7th decile.
c) The 90th percentile.
Values     Frequency
140-150    17
150-160    29
160-170    42
170-180    72
180-190    84
190-200    107
200-210    49
210-220    34
220-230    31
230-240    16
240-250    12
Solutions:
First find the less than cumulative frequency.
Use the formula to calculate the required quantile.
a) Quartiles:
i. Q1
- determine the class containing the first quartile.
ii. Q2
- determine the class containing the second quartile.
iii. Q3
- determine the class containing the third quartile.
b) D7
- determine the class containing the 7th decile.
c) P90
- determine the class containing the 90th percentile.
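The steps above follow one pattern for every quantile: locate the class whose less-than cumulative frequency first reaches the target count, then interpolate inside it. A minimal Python sketch is given below. The name `grouped_quantile` is illustrative, and the data include the 190-200 class with frequency 107 as listed in the quartile deviation example later in these notes.

```python
def grouped_quantile(lower_bounds, width, freqs, i, parts):
    """i-th quantile when the distribution is split into `parts` equal parts."""
    n = sum(freqs)
    target = i * n / parts   # e.g. iN/4 for quartiles
    cum_before = 0           # less-than cumulative frequency of preceding classes
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and cum_before + f >= target:
            return lower + (width / f) * (target - cum_before)
        cum_before += f
    raise ValueError("quantile class not found")

lower_bounds = list(range(140, 250, 10))  # 140, 150, ..., 240
freqs = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]

q1, q2, q3 = (grouped_quantile(lower_bounds, 10, freqs, i, 4) for i in (1, 2, 3))
d7 = grouped_quantile(lower_bounds, 10, freqs, 7, 10)
p90 = grouped_quantile(lower_bounds, 10, freqs, 90, 100)

print([round(v, 2) for v in (q1, q2, q3, d7, p90)])
```

Here N = 493, so the targets are 123.25, 246.5 and 369.75 for the quartiles, 345.1 for D7 and 443.7 for P90.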
Measures of dispersion are statistical measures which provide ways of measuring the
extent to which data are dispersed or spread out.
Objectives of measuring Variation:
To judge the reliability of measures of central tendency
To control variability itself.
To compare two or more groups of numbers in terms of their variability.
To make further statistical analysis.
Absolute and Relative Measures of Dispersion
The measures of dispersion which are expressed in terms of the original unit of a series
are termed as absolute measures. Such measures are not suitable for comparing the
variability of two distributions which are expressed in different units of measurement and
different average size. Relative measures of dispersions are a ratio or percentage of a
measure of absolute dispersion to an appropriate measure of central tendency and are thus
pure numbers independent of the units of measurement. For comparing the variability of
two distributions (even if they are measured in the same unit), we compute the relative
measure of dispersion instead of absolute measures of dispersion.
Types of Measures of Dispersion
Various measures of dispersions are in use. The most commonly used measures of
dispersions are:
1) Range and relative range
2) Quartile deviation and coefficient of Quartile deviation
3) Mean deviation and coefficient of Mean deviation
4) Standard deviation and coefficient of variation.
The range is the largest score minus the smallest score. It is a quick and dirty measure of
variability, although when a test is given back to students they very often wish to know
the range of scores. Because the range is greatly affected by extreme scores, it may give a
distorted picture of the scores. The following two distributions have the same range, 13,
yet appear to differ greatly in the amount of variability.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
For this reason, among others, the range is not the most important measure of variability.
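The point can be checked numerically. The sketch below computes the range of both distributions and, as one possible comparison, their population standard deviation with `statistics.pstdev` (a measure defined later in this chapter); choosing the standard deviation for the comparison is an assumption made for illustration.

```python
import statistics

dist1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
dist2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]

def value_range(data):
    """Largest score minus smallest score."""
    return max(data) - min(data)

range1, range2 = value_range(dist1), value_range(dist2)
sd1, sd2 = statistics.pstdev(dist1), statistics.pstdev(dist2)

print(range1, range2)
print(round(sd1, 2), round(sd2, 2))
```

Both ranges are 13, while the second distribution owes its range to the single extreme score 45 and has the smaller standard deviation.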
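Example:
1. Find the relative range of the above two distributions. (Exercise!)
2. If the range and relative range of a series are 4 and 0.25 respectively, then
what is the value of:
a) Smallest observation
b) Largest observation
Solutions (2):
Let L and S be the largest and smallest observations, with range R = L − S and relative range RR = (L − S)/(L + S).
L − S = 4 and (L − S)/(L + S) = 0.25, so L + S = 16.
Solving the two equations gives L = 10 and S = 6.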
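The interquartile range is the difference between the third and the first
quartiles of a set of items, and the semi-interquartile range (quartile deviation, Q.D) is half of the
interquartile range:
$$Q.D = \frac{Q_3 - Q_1}{2}$$
It gives the average amount by which the two quartiles differ from the
median. The coefficient of quartile deviation is:
$$C.Q.D = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$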
Example: Compute Q.D and its coefficient for the following distribution.
Values Frequency
140- 150 17
150- 160 29
160- 170 42
170- 180 72
180- 190 84
190- 200 107
200- 210 49
210- 220 34
220- 230 31
230- 240 16
240- 250 12
Solutions:
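Earlier in this chapter we obtained the values of all quartiles as:
Q1 = 174.90, Q2 = 190.23, Q3 = 203.83
$$Q.D = \frac{Q_3 - Q_1}{2} = \frac{203.83 - 174.90}{2} = 14.465$$
$$C.Q.D = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{28.93}{378.73} \approx 0.0764$$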
Remark: Q.D or C.Q.D includes only the middle 50% of the observation.
The mean deviation of a set of items is defined as the arithmetic mean of the
absolute deviations from a given average. Depending on the
type of average used we have different mean deviations.
a) Mean Deviation about the mean
Denoted by M.D($\bar{X}$) and given by
$$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$$
Examples:
1. The following are the numbers of visits made by ten mothers to the local
doctor’s surgery: 8, 6, 5, 5, 7, 4, 5, 9, 7, 4.
Find the mean deviation about the mean, the median and the mode.
Solutions:
First calculate the three averages
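A minimal Python sketch of this calculation follows. The helper `mean_deviation` is an illustrative name; it averages the absolute deviations from whichever centre is supplied, and the three averages are taken from the standard `statistics` module.

```python
import statistics

visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

def mean_deviation(data, center):
    """Arithmetic mean of the absolute deviations from `center`."""
    return sum(abs(x - center) for x in data) / len(data)

mean = statistics.mean(visits)
median = statistics.median(visits)
mode = statistics.mode(visits)

md_mean = mean_deviation(visits, mean)
md_median = mean_deviation(visits, median)
md_mode = mean_deviation(visits, mode)

print(mean, median, mode)
print(md_mean, md_median, md_mode)
```

For these data the mean is 6, the median 5.5 and the mode 5, and the sum of absolute deviations happens to be 14 about each of them, so all three mean deviations equal 1.4.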
2. Find mean deviation about mean, median and mode for the following
distributions.(exercise)
Class    Frequency
40-44    7
45-49    10
50-54    22
55-59    15
60-64    12
65-69    6
70-74    3
Example: Calculate the C.M.D about the mean, median and mode for the
data in example 1 above.
Solutions:
The Variance
Population Variance
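If we divide the variation (the sum of squared deviations from the mean) by the number of values in the population, we
get the population variance. This variance is the "average
squared deviation from the mean":
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Sample Variance
For a sample, the sum of squared deviations is divided by n − 1:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$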
Standard Deviation
There is a problem with variances. Recall that the deviations were squared.
That means that the units were also squared. To get the units back the same as the original data
values, the square root must be taken.
The following steps are used to calculate the sample variance:
1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data are a sample, divide the number (from step 4 above) by the
number of observations minus one, i.e., n-1 (where n is equal to the number
of observations in the data set).
Examples: Find the variance and standard deviation of the following sample data.
1. 5, 17, 12, 10.
2. The data are given in the form of a frequency distribution.
Class    Frequency
40-44    7
45-49    10
50-54    22
55-59    15
60-64    12
65-69    6
70-74    3
Solutions:
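1. $\bar{X} = \frac{5 + 10 + 12 + 17}{4} = 11$
Xi            5     10    12    17    Total
(Xi − X̄)²     36    1     1     36    74
$$S^2 = \frac{74}{3} \approx 24.67, \qquad S = \sqrt{24.67} \approx 4.97$$
2. $\bar{X} = \frac{\sum f_i X_i}{n} = \frac{4125}{75} = 55$
Xi (C.M)       42      47     52     57    62     67     72     Total
fi(Xi − X̄)²    1183    640    198    60    588    864    867    4400
$$S^2 = \frac{4400}{74} \approx 59.46, \qquad S = \sqrt{59.46} \approx 7.71$$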
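Both solutions can be checked with a short Python sketch. The raw-data case uses `statistics.variance` and `statistics.stdev`, which divide by n − 1; the grouped case follows the five steps directly, using class marks. The function name `grouped_sample_variance` is illustrative only.

```python
import math
import statistics

# Example 1: raw sample data.
sample = [5, 17, 12, 10]
var1 = statistics.variance(sample)  # sample variance, divisor n - 1
sd1 = statistics.stdev(sample)

# Example 2: grouped data, using class marks and frequencies.
def grouped_sample_variance(marks, freqs):
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    squared_dev = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks))
    return squared_dev / (n - 1)

marks = [42, 47, 52, 57, 62, 67, 72]
freqs = [7, 10, 22, 15, 12, 6, 3]
var2 = grouped_sample_variance(marks, freqs)
sd2 = math.sqrt(var2)

print(round(var1, 2), round(sd1, 2))
print(round(var2, 2), round(sd2, 2))
```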
1.
2. For a normal (symmetric) distribution the following holds.
Approximately 68.27% of the data values fall within one standard
deviation of the mean, i.e. within $(\bar{X} - S, \bar{X} + S)$.
Approximately 95.45% of the data values fall within two standard
deviations of the mean, i.e. within $(\bar{X} - 2S, \bar{X} + 2S)$.
Approximately 99.73% of the data values fall within three standard
deviations of the mean, i.e. within $(\bar{X} - 3S, \bar{X} + 3S)$.
3. Chebyshev's Theorem
For any data set, no matter what the pattern of variation, the proportion
of the values that fall within k standard deviations of the mean, i.e. within
$(\bar{X} - kS, \bar{X} + kS)$, will be at least $1 - \frac{1}{k^2}$, where k is a number greater than 1; i.e.
the proportion of items falling beyond k standard deviations of the mean is
at most $\frac{1}{k^2}$.
Example: Suppose a distribution has mean 50 and standard deviation
[Link] percent of the numbers are:
a) Between 38 and 62
b) Between 32 and 68
c) Less than 38 or more than 62.
d) Less than 32 or more than 68.
Solutions:
a) 38 and 62 are at equal distance from the mean, 50, and this distance is 12.
By applying the above theorem, at least $1 - \frac{1}{k^2}$ of the numbers lie
between 38 and 62, where k is 12 divided by the standard deviation.
b) Similarly done.
c) It is just the complement of a), i.e. at most $\frac{1}{k^2}$ of the numbers
lie less than 38 or more than 62.
d) Similarly done.
Example 2:
The average score of a special test of knowledge of wood refinishing has
a mean of 53 and standard deviation of 6. Find the range of values in which
at least 75% the scores will lie. (Exercise)
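Chebyshev's bound is easy to evaluate in code. The sketch below, with illustrative function names, gives the minimum proportion within k standard deviations and, going the other way, the interval that is guaranteed to hold a given minimum proportion; it is applied to the figures of Example 2 (mean 53, standard deviation 6, at least 75%).

```python
import math

def chebyshev_min_proportion(k):
    """Lower bound on the proportion within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def chebyshev_interval(mean, sd, min_proportion):
    """Interval mean ± k*sd that holds at least `min_proportion` of the values."""
    # Solve 1 - 1/k^2 = p for k.
    k = math.sqrt(1 / (1 - min_proportion))
    return mean - k * sd, mean + k * sd

print(chebyshev_min_proportion(2))
print(chebyshev_min_proportion(3))
low, high = chebyshev_interval(53, 6, 0.75)
print(low, high)
```

With k = 2 the bound is 75% and with k = 3 it is about 88.9%; for Example 2 the sketch returns the interval from 41 to 65.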
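4. If the standard deviation of X1, X2, …, Xn is S, then the standard
deviation of
a) X1 + k, X2 + k, …, Xn + k will also be S
b) kX1, kX2, …, kXn will be |k|S
c) a + kX1, a + kX2, …, a + kXn will be |k|S
Exercise: Verify each of the above relationships, considering k and a as
constants.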
Examples:
1. The mean and standard deviation of n Tetracycline capsules X1, X2, …, Xn
are known to be 12 gm and 3 gm respectively. A new set of
capsules of another drug is obtained by the linear transformation Yi =
2Xi – 0.5 (i = 1, 2, …, n). What will be the standard deviation of the
new set of capsules?
2. The mean and the standard deviation of a set of numbers are respectively
500 and 10.
Solutions:
1. Using c) above, the new standard deviation = |2| × 3 = 6 gm
2. a. They will remain the same.
b. New standard deviation=
Solutions:
Calculate coefficient of variation for both firms.
Since C.V(A) < C.V(B), in firm B there is greater variability in individual wages.
Solutions:
Calculate the standard score of both students.
Solutions:
a) Use coefficient of variation.
taken by child A is only one standard deviation shorter than the average time
taken by group 1.
Skewness
- Skewness is the degree of asymmetry or departure from symmetry of a
distribution.
- A skewed frequency distribution is one that is not symmetrical.
- Skewness is concerned with the shape of the curve not size.
- If the frequency curve (smoothed frequency polygon) of a distribution has
a longer tail to the right of the central maximum than to the left, the
distribution is said to be skewed to the right or said to have positive
skewness. If it has a longer tail to the left of the central maximum than to
the right, it is said to be skewed to the left or said to have negative
skewness.
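- For a moderately skewed distribution, the following relation holds among
the three commonly used measures of central tendency:
Mean − Mode = 3 × (Mean − Median)
Measures of Skewness
- Denoted by $\alpha_3$.
- There are various measures of skewness.
1. The Pearsonian coefficient of skewness
$$\alpha_3 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{\bar{X} - \hat{X}}{S}$$
or, using the relation above, $\alpha_3 = \frac{3(\bar{X} - \tilde{X})}{S}$.
Note:
$\alpha_3 > 0$: the distribution is positively skewed.
$\alpha_3 = 0$: the distribution is symmetric.
$\alpha_3 < 0$: the distribution is negatively skewed.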
Remark:
o In a positively skewed distribution, smaller observations are more
frequent than larger observations. i.e. the majority of the observations
have a value below an average.
o In a negatively skewed distribution, smaller observations are less
frequent than larger observations. i.e. the majority of the observations
have a value above an average.
Examples:
1. Suppose the mean, the mode, and the standard deviation of a certain
distribution are 32, 30.5 and 10 respectively. What is the shape of the
curve representing the distribution?
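Solutions:
Use the Pearsonian coefficient of skewness:
(Mean − Mode) / Standard deviation = (32 − 30.5) / 10 = 0.15
Since the coefficient is positive, the distribution is positively skewed.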
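A small Python sketch of this check is given below. It assumes the usual form of the Pearsonian coefficient, (mean − mode) divided by the standard deviation, and the function names are illustrative only.

```python
import math

def pearson_mode_skewness(mean, mode, sd):
    """Pearsonian coefficient of skewness: (mean - mode) / standard deviation."""
    return (mean - mode) / sd

def describe_shape(coefficient):
    """Classify the shape of the distribution from the sign of the coefficient."""
    if coefficient > 0:
        return "positively skewed"
    if coefficient < 0:
        return "negatively skewed"
    return "symmetric"

alpha3 = pearson_mode_skewness(mean=32, mode=30.5, sd=10)
print(alpha3, describe_shape(alpha3))
```

For the figures in Example 1 the coefficient is 0.15, a mild positive skew.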
Solutions:
Given: Required:
Solutions: (exercise)
4. For a moderately skewed frequency distribution, the mean is 10 and the
median is 8.5. If the coefficient of variation is 20%, find the Pearsonian
coefficient of skewness and the probable mode of the distribution.
(exercise)
5. The sum of fifteen observations, whose mode is 8, was found to be 150,
with a coefficient of variation of 20%.
(a) Calculate the Pearsonian coefficient of skewness and give an
appropriate conclusion.
(b) Are smaller values more or less frequent than bigger values for this
distribution?
(c) If a constant k is added to each observation, what will be the
new Pearsonian coefficient of skewness? Show your steps. What
do you conclude from this?
Solutions: (exercise)
Kurtosis
Solutions:
a)
b)