
Chapter-3-Descriptive Statistic

This document provides information about descriptive statistics.

Uploaded by

Gebre Garmame

Lecture notes on Introduction To Statistics Chapter 3: DESCRIPTIVE STATISTICS

CHAPTER 3

3. DESCRIPTIVE STATISTICS

3.1 Measures of Central Tendency


Introduction
- When we want to compare groups of numbers, it is useful to have a single value that is
a good representative of each group. This single value is called the average of the group.
Averages are also called measures of central tendency.
- An average that is representative is called a typical average; an average that is not
representative and has only a theoretical value is called a descriptive average. A typical
average should possess the following properties:
   - It should be rigidly defined.
   - It should be based on all observations under investigation.
   - It should be affected as little as possible by extreme observations.
   - It should be capable of further algebraic treatment.
   - It should be affected as little as possible by fluctuations of sampling.
   - It should be easy to calculate and simple to understand.

Objectives:
- To comprehend the data easily.
- To facilitate comparison.
- To make further statistical analysis.

The Summation Notation:

- Let $X_1, X_2, X_3, \dots, X_N$ be a set of measurements, where N is the total number of
observations and $X_i$ is the ith observation.
- Very often in statistics an algebraic expression of the form $X_1 + X_2 + X_3 + \dots + X_N$ is
used in a formula to compute a statistic. Writing such an expression out repeatedly is
tedious, so mathematicians have developed a shorthand notation to represent a
sum of scores, called the summation notation.

- The symbol $\sum_{i=1}^{N} X_i$ is a mathematical shorthand for $X_1 + X_2 + X_3 + \dots + X_N$.

The expression is read, "the sum of X sub i from i equals 1 to N." It means "add up all the
numbers."


Example: Suppose the following were scores made on the first homework assignment for
five students in the class: 5, 7, 7, 6, and 8. In this example set of five numbers, where
N=5, the summation could be written:

$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 5 + 7 + 7 + 6 + 8 = 33$

The "i=1" at the bottom of the summation notation tells where to begin the sequence of
summation. If the expression were written with "i=3", the summation would start with the
third number in the set. For example:

$\sum_{i=3}^{N} X_i = X_3 + X_4 + \dots + X_N$

In the example set of numbers, this would give the following result:

$\sum_{i=3}^{5} X_i = X_3 + X_4 + X_5 = 7 + 6 + 8 = 21$

The "N" in the upper part of the summation notation tells where to end the sequence of
summation. If there were only three scores (say the first three scores above), then the
summation and example would be:

$\sum_{i=1}^{3} X_i = X_1 + X_2 + X_3 = 5 + 7 + 7 = 19$

- Sometimes, when the summation notation is used in an expression that must be written
a number of times, as in a proof, a shorthand notation for the shorthand notation is
employed. When the summation sign "$\sum$" is used without additional notation, "i=1"
and "N" are assumed.

For example:

$\sum X = \sum_{i=1}^{N} X_i$

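PROPERTIES OF SUMMATION

1. $\sum_{i=1}^{N} k = Nk$, where k is any constant

2. $\sum_{i=1}^{N} kX_i = k\sum_{i=1}^{N} X_i$, where k is any constant

3. $\sum_{i=1}^{N} (a + bX_i) = Na + b\sum_{i=1}^{N} X_i$, where a and b are any constants

4. $\sum_{i=1}^{N} (X_i + Y_i) = \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} Y_i$

The sum of the product of the two variables could be written:

$\sum_{i=1}^{N} X_iY_i = X_1Y_1 + X_2Y_2 + \dots + X_NY_N$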

Example: Considering the following data, determine the sums in parts a) to h).

X    Y
5    6
7    7
7    8
6    7
8    8

a) to h): [the expressions and their solutions are not legible in the source]
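The summation rules can be checked numerically. The sketch below uses the X and Y columns of the table above; the particular sums it evaluates and the constants k, a and b are illustrative assumptions, not the expressions asked for in parts a) to h).

```python
# Scores from the example table (X and Y columns).
X = [5, 7, 7, 6, 8]
Y = [6, 7, 8, 7, 8]
N = len(X)

sum_x = sum(X)                              # sum of X_i for i = 1..N
sum_y = sum(Y)                              # sum of Y_i for i = 1..N
sum_xy = sum(x * y for x, y in zip(X, Y))   # sum of the products X_i * Y_i
sum_x_from_3 = sum(X[2:])                   # sum of X_i for i = 3..N (Python counts from 0)

# Illustrative constants (assumptions, not taken from the notes).
k, a, b = 10, 2, 3

# Property 1: a constant summed N times equals N*k.
prop1 = sum(k for _ in X) == N * k
# Property 2: a constant factor can be taken outside the sum.
prop2 = sum(k * x for x in X) == k * sum_x
# Property 3: sum of (a + b*X_i) equals N*a + b * (sum of X_i).
prop3 = sum(a + b * x for x in X) == N * a + b * sum_x
# Property 4: the sum of (X_i + Y_i) splits into two sums.
prop4 = sum(x + y for x, y in zip(X, Y)) == sum_x + sum_y

print(sum_x, sum_y, sum_xy, sum_x_from_3)
print(prop1, prop2, prop3, prop4)
```

Note that the sum of products is generally not the product of the sums, which is why it needs its own notation.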

Types of measures of central tendency

There are several different measures of central tendency; each has its advantages and
disadvantages.
- The Mean (Arithmetic, Geometric and Harmonic)
- The Mode
- The Median
- Quantiles (Quartiles, Deciles and Percentiles)
The choice among these averages depends on which best fits the property under discussion.

The Arithmetic Mean

- It is defined as the sum of the magnitudes of the items divided by the number of
items.
- The mean of $X_1, X_2, X_3, \dots, X_n$ is denoted by A.M, m or $\bar{X}$ and is given by:

$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$

- If $X_1$ occurs $f_1$ times, $X_2$ occurs $f_2$ times, ..., and $X_k$ occurs $f_k$ times, then the
mean will be

$\bar{X} = \frac{\sum_{i=1}^{k} f_iX_i}{\sum_{i=1}^{k} f_i}$, where k is the number of classes and $\sum_{i=1}^{k} f_i = n$.

Example: Obtain the mean of the following numbers:

2, 7, 8, 2, 7, 3, 7

Solution:
Xi      fi    Xifi
2       2     4
3       1     3
7       3     21
8       1     8
Total   7     36

$\bar{X} = \frac{\sum f_iX_i}{\sum f_i} = \frac{36}{7} \approx 5.14$
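As a quick check of the frequency-table method, the following minimal sketch computes the mean of the seven numbers in two ways: directly from the raw values, and from the (value, frequency) table. The variable names are illustrative.

```python
from collections import Counter

data = [2, 7, 8, 2, 7, 3, 7]

# Mean from the raw observations: sum of values divided by their count.
mean_raw = sum(data) / len(data)

# Mean from the frequency table: sum of f_i * X_i divided by sum of f_i.
freq = Counter(data)                        # maps each value X_i to its frequency f_i
total_fx = sum(x * f for x, f in freq.items())
total_f = sum(freq.values())
mean_freq = total_fx / total_f

print(total_f, total_fx)    # the "Total" row of the table
print(round(mean_raw, 2), round(mean_freq, 2))
```

Both routes give the same answer because the frequency table is only a compact way of writing the same sum.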

Arithmetic Mean for Grouped Data

If data are given in the form of a continuous frequency distribution, then the mean is
obtained as follows:

$\bar{X} = \frac{\sum_{i=1}^{k} f_iX_i}{\sum_{i=1}^{k} f_i}$, where $X_i$ = the class mark of the ith class and $f_i$ = the frequency of the ith class.

Example: Calculate the mean for the following age distribution.
Class     Frequency
6-10      35
11-15     23
16-20     15
21-25     12
26-30     9
31-35     6

Solutions:
- First find the class marks.
- Find the product of frequency and class mark for each class.
- Find the mean using the formula.
Class     fi     Xi    Xifi
6-10      35     8     280
11-15     23     13    299
16-20     15     18    270
21-25     12     23    276
26-30     9      28    252
31-35     6      33    198
Total     100          1575

$\bar{X} = \frac{1575}{100} = 15.75$
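The three solution steps can be sketched in a few lines of Python. This is only an illustration of the procedure for the age distribution above; the class mark is taken as the midpoint of the class limits.

```python
# Class limits and frequencies of the age distribution.
classes = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
freqs = [35, 23, 15, 12, 9, 6]

# Step 1: class mark = midpoint of the lower and upper class limits.
class_marks = [(lo + hi) / 2 for lo, hi in classes]

# Step 2: product of frequency and class mark for each class.
products = [f * x for f, x in zip(freqs, class_marks)]

# Step 3: grouped mean = sum of f_i * X_i divided by sum of f_i.
n = sum(freqs)
grouped_mean = sum(products) / n

print(class_marks)
print(n, sum(products), grouped_mean)
```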

Exercises:
1. Marks of 75 students are summarized in the following frequency distribution:

Marks     No. of students
40-44     7
45-49     10
50-54     22
55-59     f4
60-64     f5
65-69     6
70-74     3

If 20% of the students have marks between 55 and 59:
i. Find the missing frequencies f4 and f5.
ii. Find the mean.

- If the values in a series or the mid values of the classes are large, coding of values is a
good device to simplify the calculations.
- For raw data, suppose we have used the following coding system:

$d_i = X_i - A$, so that $X_i = A + d_i$ and $\bar{X} = A + \bar{d}$, with $\bar{d} = \frac{\sum d_i}{n}$

where A is an assumed mean and $\bar{d}$ is the mean of the coded data.

- If the data are expressed in terms of an ungrouped frequency distribution:

$\bar{X} = A + \bar{d}$, with $\bar{d} = \frac{\sum f_id_i}{n}$

- In both cases the true mean is the assumed mean plus the average of the deviations
from the assumed mean.
- Suppose the data are given in the form of a continuous frequency distribution with a
constant class size of w; then the following coding is appropriate:

$d_i = \frac{X_i - A}{w}$, so that $X_i = A + wd_i$ and $\bar{X} = A + w\bar{d}$, with $\bar{d} = \frac{\sum f_id_i}{n}$

Where: $X_i$ is the original class mark for the ith class.
$d_i$ is the transformed class mark for the ith class.
A is an assumed mean, usually the mean of the class marks.
(i = 1, 2, ..., k)
Example:
1. Suppose the deviations of the observations from an assumed mean of 7 are:
1, -1, -2, -2, 0, -3, -2, 2, 0, -3.
a) Find the true mean.
b) Find the original observations.

Solutions:

a) $\sum d_i = -10$ and n = 10, so $\bar{d} = \frac{-10}{10} = -1$ and $\bar{X} = A + \bar{d} = 7 + (-1) = 6$.

The true mean is 6.

b) Using $X_i = A + d_i$ we obtain the following original observations:
8, 6, 5, 5, 7, 4, 5, 9, 7, 4.
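The assumed-mean (coding) method for this example can be sketched as follows. The relation used is the one stated in the text: the true mean is the assumed mean plus the average of the deviations.

```python
assumed_mean = 7
deviations = [1, -1, -2, -2, 0, -3, -2, 2, 0, -3]

# a) True mean = assumed mean + average of the deviations.
mean_deviation_from_A = sum(deviations) / len(deviations)
true_mean = assumed_mean + mean_deviation_from_A

# b) Recover each original observation as X_i = A + d_i.
observations = [assumed_mean + d for d in deviations]

# Cross-check: the ordinary mean of the recovered data matches.
direct_mean = sum(observations) / len(observations)

print(true_mean, observations, direct_mean)
```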

Special properties of Arithmetic mean

1. The sum of the deviations of a set of items from their mean is always zero, i.e.

$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$

2. The sum of the squared deviations of a set of items from their mean is the
minimum, i.e.

$\sum_{i=1}^{n} (X_i - \bar{X})^2 \le \sum_{i=1}^{n} (X_i - A)^2$ for any value A

3. If $\bar{X}_1$ is the mean of $n_1$ observations,
if $\bar{X}_2$ is the mean of $n_2$ observations,
...
if $\bar{X}_k$ is the mean of $n_k$ observations,
then the mean of all the observations in all groups, often called the combined mean, is
given by:

$\bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \dots + n_k\bar{X}_k}{n_1 + n_2 + \dots + n_k}$

Example: In a class there are 30 females and 70 males. If the females averaged 60 in an
examination and the males averaged 72, find the mean for the entire class.

Solutions:

$\bar{X}_c = \frac{30(60) + 70(72)}{30 + 70} = \frac{6840}{100} = 68.4$
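A minimal sketch of the combined mean for the class example, assuming the usual definition (each group mean weighted by its group size). The function name `combined_mean` is illustrative.

```python
def combined_mean(sizes, means):
    """Combined mean of several groups: sum(n_i * mean_i) / sum(n_i)."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# 30 females averaging 60 and 70 males averaging 72.
class_mean = combined_mean([30, 70], [60, 72])
print(class_mean)
```

The result lies closer to 72 than to 60 because the male group is larger; simply averaging 60 and 72 would give 66, which ignores the group sizes.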

4. If a wrong figure has been used when calculating the mean, the correct mean can be
obtained without repeating the whole process using:

Correct mean = wrong mean + (correct value - wrong value) / n

where n is the total number of observations.

Example: An average weight of 10 students was calculated to be [value missing in source].
It was then discovered that one weight was misread as 40 instead of 80 kg. Calculate the
correct average weight.
Solutions:

Correct mean = wrong mean + (80 - 40)/10 = wrong mean + 4 kg

5. The effect of transforming the original series on the mean.

a) If a constant k is added to (subtracted from) every observation, then the new
mean will be the old mean plus (minus) k.
b) If every observation is multiplied by a constant k, then the new mean will
be k times the old mean.

Example:
1. The mean of n Tetracycline capsules $X_1, X_2, \dots, X_n$ is known to be 12 gm.
A new set of capsules of another drug is obtained by the linear transformation
$Y_i = 2X_i - 0.5$ (i = 1, 2, ..., n). What will be the mean of the new set of
capsules?
Solutions:

New mean = 2(12) - 0.5 = 23.5 gm

2. The mean of a set of numbers is 500.
a) If 10 is added to each of the numbers in the set, then what will be the mean of the new
set?
b) If each of the numbers in the set is multiplied by -5, then what will be the
mean of the new set?

Solutions:

a) New mean = 500 + 10 = 510
b) New mean = -5(500) = -2500

Weighted Mean
- When different items are to be given different importance, a weighted mean
is appropriate.
- Weights are assigned to each item in proportion to its relative importance.
- Let $X_1, X_2, \dots, X_n$ be the values of the items of a series and $W_1, W_2, \dots, W_n$ their
corresponding weights; then the weighted mean, denoted $\bar{X}_w$, is defined as:

$\bar{X}_w = \frac{\sum_{i=1}^{n} W_iX_i}{\sum_{i=1}^{n} W_i}$

Example:
A student obtained the following percentages in an examination:
English 60, Biology 75, Mathematics 63, Physics 59, and Chemistry [value missing in source].
Find the student's weighted arithmetic mean if weights 1, 2, 1, 3, 3 respectively are allotted
to the subjects.
Solutions:

$\bar{X}_w = \frac{1(60) + 2(75) + 1(63) + 3(59) + 3C}{1 + 2 + 1 + 3 + 3} = \frac{450 + 3C}{10}$, where C is the Chemistry percentage.

Merits and Demerits of Arithmetic Mean

Merits:
- It is rigidly defined.
- It is based on all observations.
- It is suitable for further mathematical treatment.
- It is a stable average, i.e. it is relatively little affected by fluctuations of sampling.
- It is easy to calculate and simple to understand.
Demerits:
- It is affected by extreme observations.
- It cannot be used in the case of open-end classes.
- It cannot be determined by the method of inspection.
- It cannot be used when dealing with qualitative characteristics, such as intelligence,
honesty, beauty.
- It can be a number which does not exist in the series.
- Sometimes it leads to a wrong conclusion if the details of the data from which it is
obtained are not available.
- It gives high weight to high extreme values and less weight to low extreme values.

The Geometric Mean

- The geometric mean of a set of n observations is the nth root of their product.
- The geometric mean of $X_1, X_2, X_3, \dots, X_n$ is denoted by G.M and given by:

$G.M = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$

- Taking the logarithms of both sides:

$\log(G.M) = \frac{1}{n}\sum_{i=1}^{n} \log X_i$

The logarithm of the G.M of a set of observations is the arithmetic mean of their
logarithms.

Example:
Find the G.M of the numbers 2, 4, 8.
Solutions:

$G.M = \sqrt[3]{2 \cdot 4 \cdot 8} = \sqrt[3]{64} = 4$

Remark:
- The Geometric Mean is useful and appropriate for finding averages of ratios.

The Harmonic Mean

The harmonic mean of $X_1, X_2, X_3, \dots, X_n$ is denoted by H.M and given by:

$H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$. This is called the simple harmonic mean.

In the case of a frequency distribution:

$H.M = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{X_i}}$, where $n = \sum_{i=1}^{k} f_i$

If observations $X_1, X_2, \dots, X_n$ have weights $W_1, W_2, \dots, W_n$ respectively, then their
harmonic mean is given by

$H.M = \frac{\sum_{i=1}^{n} W_i}{\sum_{i=1}^{n} \frac{W_i}{X_i}}$. This is called the weighted harmonic mean.

Remark: The Harmonic Mean is useful and appropriate for finding average speeds and
average rates.

Example: A cyclist pedals from his house to his college at a speed of 10 km/hr and back
from the college to his house at 15 km/hr. Find the average speed.
Solution: Here the distance is constant, so the simple H.M is appropriate for this problem.
X1 = 10 km/hr, X2 = 15 km/hr

$H.M = \frac{2}{\frac{1}{10} + \frac{1}{15}} = 12$ km/hr
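The geometric and harmonic means of the two worked examples (the numbers 2, 4, 8 and the cyclist's speeds) can be sketched as below. The geometric mean is computed through logarithms, following the log relation in the text; the function names are illustrative, and Python's standard `statistics` module offers `geometric_mean` and `harmonic_mean` as ready-made alternatives.

```python
import math

def geometric_mean(values):
    """nth root of the product, computed as exp(mean of the logs)."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    """Simple harmonic mean: n divided by the sum of reciprocals."""
    return len(values) / sum(1 / v for v in values)

gm = geometric_mean([2, 4, 8])     # approximately 4
hm = harmonic_mean([10, 15])       # approximately 12 km/hr

# The arithmetic mean of the two speeds overstates the average speed.
am = (10 + 15) / 2

print(round(gm, 6), round(hm, 6), am)
```

For the cyclist, equal distances at 10 and 15 km/hr mean more time is spent at the slower speed, which is why the harmonic mean is below the arithmetic mean of 12.5 km/hr.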

The Mode
- The mode is the value which occurs most frequently in a set of values.
- The mode may not exist, and even if it does exist, it may not be unique.
- In the case of a discrete distribution, the value having the maximum frequency is the
modal value.

Examples:
1. Find the mode of 5, 3, 5, 8, 9.
Mode = 5
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5.
The data are bimodal: the modes are 8 and 9.
3. Find the mode of 4, 12, 3, 6, and 7.
There is no mode for these data.
- The mode of a set of numbers $X_1, X_2, \dots, X_n$ is usually denoted by $\hat{X}$.
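The three mode examples can be reproduced with a small helper. This sketch follows the convention used in the examples above: when no value is repeated, the data are reported as having no mode (an empty list). The function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # every value occurs once: no mode, by the convention above
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5, 3, 5, 8, 9]))          # one mode
print(modes([8, 9, 9, 7, 8, 2, 5]))    # bimodal
print(modes([4, 12, 3, 6, 7]))         # no mode
```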

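Mode for Grouped data

If data are given in the form of a continuous frequency distribution, the mode is defined
as:

$\hat{X} = L + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$

Where:
L = the lower class boundary of the modal class
w = the size of the modal class
$\Delta_1 = f_{mo} - f_1$ and $\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class

Note: The modal class is the class with the highest frequency.

Example: Following is the distribution of the size of certain farms selected at random
from a district. Calculate the mode of the distribution.

Size of farms   No. of farms
5-15            8
15-25           12
25-35           17
35-45           29
45-55           31
55-65           5
65-75           3

Solutions:
The modal class is 45-55, since it has the highest frequency (31). Hence L = 45, w = 10,
$\Delta_1 = 31 - 29 = 2$ and $\Delta_2 = 31 - 5 = 26$.

$\hat{X} = 45 + 10\left(\frac{2}{2 + 26}\right) \approx 45.71$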
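A minimal sketch of the grouped-mode calculation for the farm data, assuming the usual interpolation formula mode = L + w * d1 / (d1 + d2), where d1 and d2 are the differences between the modal frequency and the frequencies of the neighbouring classes. A missing neighbour is treated as frequency 0, which is an assumption of this sketch.

```python
# Lower class boundaries, common class width and frequencies of the farm data.
lower_bounds = [5, 15, 25, 35, 45, 55, 65]
width = 10
freqs = [8, 12, 17, 29, 31, 5, 3]

# The modal class is the class with the highest frequency.
m = freqs.index(max(freqs))

# Frequencies of the classes before and after the modal class.
f_prev = freqs[m - 1] if m > 0 else 0
f_next = freqs[m + 1] if m < len(freqs) - 1 else 0

d1 = freqs[m] - f_prev
d2 = freqs[m] - f_next

mode = lower_bounds[m] + width * d1 / (d1 + d2)
print(lower_bounds[m], d1, d2, round(mode, 2))
```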


Merits and Demerits of Mode

Merits:
- It is not affected by extreme observations.
- It is easy to calculate and simple to understand.
- It can be calculated for distributions with open-end classes.
Demerits:
- It is not rigidly defined.
- It is not based on all observations.
- It is not suitable for further mathematical treatment.
- It is not a stable average, i.e. it is affected by fluctuations of sampling to
some extent.
- Often its value is not unique.
Note: Being the point of maximum density, the mode is especially useful in finding the most
popular size in studies relating to marketing, trade, business, and industry. It is the
appropriate average to be used to find the ideal size.

The Median
- In a distribution, the median is the value of the variable which divides it into two
equal halves.
- In an ordered series of data, the median is the observation lying exactly in the middle of the
series. It is the middlemost value in the sense that the number of values less than the median is
equal to the number of values greater than it.
- If $X_1, X_2, \dots, X_n$ are the observations, then the numbers arranged in ascending order
will be $X_{[1]}, X_{[2]}, \dots, X_{[n]}$, where $X_{[i]}$ is the ith smallest value:
$X_{[1]} \le X_{[2]} \le \dots \le X_{[n]}$
- The median is denoted by $\tilde{X}$.

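Median for ungrouped data

$\tilde{X} = X_{[(n+1)/2]}$ if n is odd, and $\tilde{X} = \frac{1}{2}\left(X_{[n/2]} + X_{[n/2+1]}\right)$ if n is even.

Example: Find the median of the following numbers.

a) 6, 5, 2, 8, 9, 4.
b) 2, 1, 8, 3, 5.

Solutions:
a) First order the data: 2, 4, 5, 6, 8, 9
Here n = 6 (even), so $\tilde{X} = \frac{1}{2}\left(X_{[3]} + X_{[4]}\right) = \frac{5 + 6}{2} = 5.5$

b) Order the data: 1, 2, 3, 5, 8
Here n = 5 (odd), so $\tilde{X} = X_{[3]} = 3$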
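The ordering-and-middle-value procedure can be sketched as a short function, assuming the usual rule: the middle value for odd n, the average of the two middle values for even n. Part b) uses the five values listed in the solution. The helper name `median` is illustrative; Python's `statistics.median` gives the same results.

```python
def median(values):
    """Median of ungrouped data: middle value of the ordered series."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values

median_a = median([6, 5, 2, 8, 9, 4])   # n = 6
median_b = median([2, 1, 8, 3, 5])      # n = 5
print(median_a, median_b)
```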

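Median for grouped data

If data are given in the form of a continuous frequency distribution, the median is defined
as:

$\tilde{X} = L + \frac{w}{f}\left(\frac{n}{2} - c\right)$

where L = lower class boundary of the median class, w = the size of the median class,
f = frequency of the median class, c = cumulative frequency (less than type) of the class
preceding the median class, and n = total number of observations.

Remark
The median class is the class with the smallest cumulative frequency (less than type) greater
than or equal to $\frac{n}{2}$.
Example: Find the median of the following distribution.

Class     Frequency
40-44     7
45-49     10
50-54     22
55-59     15
60-64     12
65-69     6
70-74     3

Solutions:
- First find the less than cumulative frequency.
- Identify the median class.
- Find the median using the formula.

Class     Frequency   Cum. freq. (less than type)
40-44     7           7
45-49     10          17
50-54     22          39
55-59     15          54
60-64     12          66
65-69     6           72
70-74     3           75

Here n = 75, so $\frac{n}{2} = 37.5$. The smallest cumulative frequency greater than or equal to
37.5 is 39, so the median class is 50-54, with L = 49.5, w = 5, f = 22 and c = 17.

$\tilde{X} = 49.5 + \frac{5}{22}(37.5 - 17) \approx 54.16$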
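The three solution steps can be traced in code. This is a sketch under stated assumptions: the interpolation formula median = L + (w / f) * (n/2 - c), and class boundaries taken half a unit below each lower class limit (for example 49.5 for the class 50-54), giving a class width of 5.

```python
from itertools import accumulate

lower_boundaries = [39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5]
width = 5
freqs = [7, 10, 22, 15, 12, 6, 3]

# Step 1: less-than cumulative frequencies.
cum_freqs = list(accumulate(freqs))
n = cum_freqs[-1]

# Step 2: median class = first class whose cumulative frequency reaches n/2.
half = n / 2
m = next(i for i, cf in enumerate(cum_freqs) if cf >= half)

# Step 3: interpolate inside the median class.
c = cum_freqs[m - 1] if m > 0 else 0      # cumulative frequency before the median class
grouped_median = lower_boundaries[m] + width * (half - c) / freqs[m]

print(cum_freqs)
print(lower_boundaries[m], c, round(grouped_median, 2))
```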


Merits and Demerits of Median

Merits:
- The median is a positional average and hence is not influenced by extreme observations.
- It can be calculated in the case of open-end intervals.
- The median can be located even if the data are incomplete.

Demerits:
- It is not a good representative of data if the number of items is small.
- It is not amenable to further algebraic treatment.
- It is susceptible to sampling fluctuations.

Quantiles

When a distribution is arranged in order of magnitude of items, the median is the value of the
middle term. Other measures that depend on their position in the distribution, namely quartiles,
deciles, and percentiles, are collectively called quantiles.

Quartiles:
- Quartiles are measures that divide the frequency distribution into four equal parts.
- The values of the variable corresponding to these divisions are denoted $Q_1$, $Q_2$, and $Q_3$,
often called the first, the second and the third quartile respectively.
- $Q_1$ is a value which has 25% of the items less than or equal to it. Similarly, $Q_2$ has
50% of the items with values less than or equal to it, and $Q_3$ has 75% of the items with values
less than or equal to it.
- To find $Q_i$ (i = 1, 2, 3) we count $\frac{iN}{4}$ of the observations, beginning from the lowest class.

- For grouped data we have the following formula:

$Q_i = L + \frac{w}{f}\left(\frac{iN}{4} - c\right)$, i = 1, 2, 3

where L = lower class boundary of the quartile class, w = the size of the quartile class,
f = frequency of the quartile class, c = cumulative frequency (less than type) of the class
preceding the quartile class, and N = total number of observations.

Remark:
The quartile class (the class containing $Q_i$) is the class with the smallest cumulative frequency
(less than type) greater than or equal to $\frac{iN}{4}$.
Deciles:
- Deciles are measures that divide the frequency distribution into ten equal parts.
- The values of the variable corresponding to these divisions are denoted $D_1, D_2, \dots, D_9$,
often called the first, the second, ..., the ninth decile respectively.
- To find $D_i$ (i = 1, 2, ..., 9) we count $\frac{iN}{10}$ of the observations, beginning from the lowest class.

- For grouped data we have the following formula:

$D_i = L + \frac{w}{f}\left(\frac{iN}{10} - c\right)$, i = 1, 2, ..., 9

where L, w, f and c refer to the decile class in the same way as for the quartiles.

Remark:
The decile class (the class containing $D_i$) is the class with the smallest cumulative frequency
(less than type) greater than or equal to $\frac{iN}{10}$.


Percentiles:
- Percentiles are measures that divide the frequency distribution into a hundred equal
parts.
- The values of the variable corresponding to these divisions are denoted $P_1, P_2, \dots, P_{99}$,
often called the first, the second, ..., the ninety-ninth percentile respectively.
- To find $P_i$ (i = 1, 2, ..., 99) we count $\frac{iN}{100}$ of the observations, beginning from the lowest class.

- For grouped data we have the following formula:

$P_i = L + \frac{w}{f}\left(\frac{iN}{100} - c\right)$, i = 1, 2, ..., 99

where L, w, f and c refer to the percentile class in the same way as for the quartiles.

Remark:

The percentile class (the class containing $P_i$) is the class with the smallest cumulative
frequency (less than type) greater than or equal to $\frac{iN}{100}$.
Example: Considering the following distribution, calculate:
a) All quartiles.
b) The 7th decile.
c) The 90th percentile.

Values     Frequency
140-150    17
150-160    29
160-170    42
170-180    72
180-190    84
190-200    107
200-210    49
210-220    34
220-230    31
230-240    16
240-250    12

Solutions:
- First find the less than cumulative frequency.
- Use the formula to calculate the required quantile.

Values     Frequency   Cum. freq. (less than type)
140-150    17          17
150-160    29          46
160-170    42          88
170-180    72          160
180-190    84          244
190-200    107         351
200-210    49          400
210-220    34          434
220-230    31          465
230-240    16          481
240-250    12          493

Here N = 493 and the class size is w = 10.

a) Quartiles:
i. Q1
- Determine the class containing the first quartile: $\frac{N}{4} = 123.25$, so the first quartile
class is 170-180 (L = 170, f = 72, c = 88).

$Q_1 = 170 + \frac{10}{72}(123.25 - 88) \approx 174.90$

ii. Q2
- Determine the class containing the second quartile: $\frac{2N}{4} = 246.5$, so the second quartile
class is 190-200 (L = 190, f = 107, c = 244).

$Q_2 = 190 + \frac{10}{107}(246.5 - 244) \approx 190.23$

iii. Q3
- Determine the class containing the third quartile: $\frac{3N}{4} = 369.75$, so the third quartile
class is 200-210 (L = 200, f = 49, c = 351).

$Q_3 = 200 + \frac{10}{49}(369.75 - 351) \approx 203.83$

b) D7
- Determine the class containing the 7th decile: $\frac{7N}{10} = 345.1$, so the 7th decile
class is 190-200 (L = 190, f = 107, c = 244).

$D_7 = 190 + \frac{10}{107}(345.1 - 244) \approx 199.45$

c) P90
- Determine the class containing the 90th percentile: $\frac{90N}{100} = 443.7$, so the 90th
percentile class is 220-230 (L = 220, f = 31, c = 434).

$P_{90} = 220 + \frac{10}{31}(443.7 - 434) \approx 223.13$
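Quartiles, deciles and percentiles of grouped data all follow one pattern, so a single helper can cover them. The sketch below assumes the usual interpolation formula, quantile = L + (w / f) * (i*N/parts - c), with parts = 4, 10 or 100; the function name `grouped_quantile` and its parameters are illustrative.

```python
def grouped_quantile(lower_bounds, width, freqs, i, parts):
    """i-th quantile out of `parts` equal parts, for grouped data.

    lower_bounds: lower class boundary of each class (ascending order)
    width:        common class size
    freqs:        class frequencies (assumed positive)
    """
    n = sum(freqs)
    target = i * n / parts          # position i*N/parts counted from the lowest class
    cum = 0                         # cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cum + f >= target:       # first class whose cumulative frequency reaches the target
            return lower + width * (target - cum) / f
        cum += f
    raise ValueError("quantile position lies beyond the data")

lower_bounds = list(range(140, 250, 10))    # 140, 150, ..., 240
freqs = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]

q1 = grouped_quantile(lower_bounds, 10, freqs, 1, 4)
q2 = grouped_quantile(lower_bounds, 10, freqs, 2, 4)
q3 = grouped_quantile(lower_bounds, 10, freqs, 3, 4)
d7 = grouped_quantile(lower_bounds, 10, freqs, 7, 10)
p90 = grouped_quantile(lower_bounds, 10, freqs, 90, 100)

print([round(v, 2) for v in (q1, q2, q3, d7, p90)])
```

The second quartile computed this way is also the grouped median, since both use the position N/2.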


3.2 Measures of Dispersion (Variation)

Introduction and objectives of measuring Variation


The scatter or spread of items of a distribution is known as dispersion or variation. In
other words the degree to which numerical data tend to spread about an average value is
called dispersion or variation of the data.

Measures of dispersion are statistical measures which provide ways of measuring the
extent to which data are dispersed or spread out.
Objectives of measuring Variation:
- To judge the reliability of measures of central tendency.
- To control variability itself.
- To compare two or more groups of numbers in terms of their variability.
- To make further statistical analysis.

Absolute and Relative Measures of Dispersion

The measures of dispersion which are expressed in terms of the original unit of a series
are termed as absolute measures. Such measures are not suitable for comparing the
variability of two distributions which are expressed in different units of measurement and
different average size. Relative measures of dispersions are a ratio or percentage of a
measure of absolute dispersion to an appropriate measure of central tendency and are thus
pure numbers independent of the units of measurement. For comparing the variability of
two distributions (even if they are measured in the same unit), we compute the relative
measure of dispersion instead of absolute measures of dispersion.
Types of Measures of Dispersion

Various measures of dispersions are in use. The most commonly used measures of
dispersions are:
1) Range and relative range
2) Quartile deviation and coefficient of Quartile deviation
3) Mean deviation and coefficient of Mean deviation
4) Standard deviation and coefficient of variation.


The Range (R)

The range is the largest score minus the smallest score. It is a quick and dirty measure of
variability, although when a test is given back to students they very often wish to know
the range of scores. Because the range is greatly affected by extreme scores, it may give a
distorted picture of the scores. The following two distributions have the same range, 13,
yet appear to differ greatly in the amount of variability.

Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45

Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45

For this reason, among others, the range is not the most important measure of variability.

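Range for grouped data:

If data are given in the form of a continuous frequency distribution, the range is computed
as:

$R = UCL_k - LCL_1$, where $UCL_k$ is the upper class limit of the last class and $LCL_1$ is the lower class limit of the first class.

This is sometimes expressed as:

$R = X_k - X_1$, where $X_k$ is the class mark of the last class and $X_1$ is the class mark of the first class.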

Merits and Demerits of range

Merits:
- It is rigidly defined.
- It is easy to calculate and simple to understand.
Demerits:
- It is not based on all observations.
- It is highly affected by extreme observations.
- It is affected by fluctuations in sampling.
- It is not amenable to further algebraic treatment.
- It cannot be computed in the case of open-end distributions.
- It is very sensitive to the size of the sample.


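Relative Range (RR)
It is also sometimes called the coefficient of range and is given by:

$RR = \frac{L - S}{L + S}$, where L is the largest and S the smallest observation.

Example:
1. Find the relative range of the above two distributions. (Exercise)
2. If the range and relative range of a series are 4 and 0.25 respectively, then
what is the value of:
a) the smallest observation?
b) the largest observation?
Solutions (2):

$R = L - S = 4$ and $RR = \frac{L - S}{L + S} = 0.25$, so $L + S = \frac{4}{0.25} = 16$.
Solving the two equations gives L = 10 and S = 6.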

The Quartile Deviation (Semi-inter quartile range), Q.D

The inter quartile range is the difference between the third and the first
quartiles of a set of items, and the semi-inter quartile range is half of the inter
quartile range:

$Q.D = \frac{Q_3 - Q_1}{2}$

- It gives the average amount by which the two quartiles differ from the
median.

Coefficient of Quartile Deviation (C.Q.D)

$C.Q.D = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$


Example: Compute Q.D and its coefficient for the following distribution.

Values     Frequency
140-150    17
150-160    29
160-170    42
170-180    72
180-190    84
190-200    107
200-210    49
210-220    34
220-230    31
230-240    16
240-250    12

Solutions:
In the previous section we obtained the values of all quartiles as:
Q1 = 174.90, Q2 = 190.23, Q3 = 203.83

$Q.D = \frac{Q_3 - Q_1}{2} = \frac{203.83 - 174.90}{2} \approx 14.47$

$C.Q.D = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{28.93}{378.73} \approx 0.0764$

Remark: Q.D or C.Q.D includes only the middle 50% of the observations.

The Mean Deviation (M.D):

The mean deviation of a set of items is defined as the arithmetic mean of the
values of the absolute deviations from a given average. Depending on the
type of average used, we have different mean deviations.
a) Mean Deviation about the mean
- Denoted by M.D($\bar{X}$) and given by

$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$

- For the case of a frequency distribution it is given as:

$M.D(\bar{X}) = \frac{\sum_{i=1}^{k} f_i|X_i - \bar{X}|}{n}$

Steps to calculate M.D($\bar{X}$):
1. Find the arithmetic mean, $\bar{X}$.
2. Find the deviations of each reading from $\bar{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.

b) Mean Deviation about the median
- Denoted by M.D($\tilde{X}$) and given by

$M.D(\tilde{X}) = \frac{\sum_{i=1}^{n} |X_i - \tilde{X}|}{n}$

- For the case of a frequency distribution it is given as:

$M.D(\tilde{X}) = \frac{\sum_{i=1}^{k} f_i|X_i - \tilde{X}|}{n}$

Steps to calculate M.D($\tilde{X}$):
1. Find the median, $\tilde{X}$.
2. Find the deviations of each reading from $\tilde{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.

c) Mean Deviation about the mode
- Denoted by M.D($\hat{X}$) and given by

$M.D(\hat{X}) = \frac{\sum_{i=1}^{n} |X_i - \hat{X}|}{n}$

- For the case of a frequency distribution it is given as:

$M.D(\hat{X}) = \frac{\sum_{i=1}^{k} f_i|X_i - \hat{X}|}{n}$

Steps to calculate M.D($\hat{X}$):
1. Find the mode, $\hat{X}$.
2. Find the deviations of each reading from $\hat{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.

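Examples:
1. The following are the numbers of visits made by ten mothers to the local
doctor's surgery: 8, 6, 5, 5, 7, 4, 5, 9, 7, 4.
Find the mean deviation about the mean, median and mode.
Solutions:
First calculate the three averages:

$\bar{X} = \frac{60}{10} = 6$, $\tilde{X} = \frac{5 + 6}{2} = 5.5$, $\hat{X} = 5$

Then take the deviations of each observation from these averages.

Xi                   4     4     5     5     5     6     7     7     8     9     Total
|Xi - mean|          2     2     1     1     1     0     1     1     2     3     14
|Xi - median|        1.5   1.5   0.5   0.5   0.5   0.5   1.5   1.5   2.5   3.5   14
|Xi - mode|          1     1     0     0     0     1     2     2     3     4     14

$M.D(\bar{X}) = \frac{14}{10} = 1.4$, $M.D(\tilde{X}) = \frac{14}{10} = 1.4$, $M.D(\hat{X}) = \frac{14}{10} = 1.4$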
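The three mean deviations for the surgery-visit data can be computed with one helper that takes the chosen average as a parameter. This is a minimal sketch using Python's standard `statistics` module for the averages; the helper name `mean_deviation` is illustrative.

```python
import statistics

visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

def mean_deviation(data, center):
    """Arithmetic mean of the absolute deviations from `center`."""
    return sum(abs(x - center) for x in data) / len(data)

mean_value = statistics.mean(visits)
median_value = statistics.median(visits)
mode_value = statistics.mode(visits)

md_mean = mean_deviation(visits, mean_value)
md_median = mean_deviation(visits, median_value)
md_mode = mean_deviation(visits, mode_value)

print(mean_value, median_value, mode_value)
print(md_mean, md_median, md_mode)
```

For this data set the three mean deviations happen to coincide; in general the one taken about the median is never larger than the other two.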

2. Find the mean deviation about the mean, median and mode for the following
distribution. (Exercise)

Class     Frequency
40-44     7
45-49     10
50-54     22
55-59     15
60-64     12
65-69     6
70-74     3

Remark: The mean deviation is always minimum about the median.

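Coefficient of Mean Deviation (C.M.D)

$C.M.D = \frac{M.D}{\text{average about which the deviations are taken}}$

Example: Calculate the C.M.D about the mean, median and mode for the
data in example 1 above.

Solutions:

$C.M.D(\bar{X}) = \frac{1.4}{6} \approx 0.233$, $C.M.D(\tilde{X}) = \frac{1.4}{5.5} \approx 0.255$, $C.M.D(\hat{X}) = \frac{1.4}{5} = 0.28$

Exercise: Identify the merits and demerits of Mean Deviation.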


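The Variance
Population Variance
If we divide the variation (the sum of the squared deviations from the mean) by the
number of values in the population, we get the population variance. This variance is
the "average squared deviation from the mean".

$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

For the case of a frequency distribution it is expressed as:

$\sigma^2 = \frac{\sum_{i=1}^{k} f_i(X_i - \mu)^2}{N}$

Sample Variance

One would expect the sample variance to simply be the population
variance with the population mean replaced by the sample mean. However,
one of the major uses of a sample statistic is to estimate the corresponding parameter,
and that formula has the problem that its value tends to underestimate the
parameter. To counteract this, the sum of the squares of the deviations is
divided by one less than the sample size.

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

For the case of a frequency distribution it is expressed as:

$S^2 = \frac{\sum_{i=1}^{k} f_i(X_i - \bar{X})^2}{n - 1}$

We usually use the following shortcut formulas:

$S^2 = \frac{\sum X_i^2 - n\bar{X}^2}{n - 1}$ for raw data, and $S^2 = \frac{\sum f_iX_i^2 - n\bar{X}^2}{n - 1}$ for a frequency distribution.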


Standard Deviation

There is a problem with variances. Recall that the deviations were squared.
That means that the units were also squared. To get the units back to those of the
original data values, the square root must be taken:

$\sigma = \sqrt{\sigma^2}$ (population standard deviation), $S = \sqrt{S^2}$ (sample standard deviation)

The following steps are used to calculate the sample variance:
1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data are a sample, divide the number (from step 4 above) by the
number of observations minus one, i.e., n-1 (where n is equal to the number
of observations in the data set).

Examples: Find the variance and standard deviation of the following sample data.
1. 5, 17, 12, 10.
2. The data are given in the form of a frequency distribution:

Class     Frequency
40-44     7
45-49     10
50-54     22
55-59     15
60-64     12
65-69     6
70-74     3

Solutions:
1. $\bar{X} = \frac{44}{4} = 11$

Xi                5     10    12    17    Total
(Xi - mean)^2     36    1     1     36    74

$S^2 = \frac{74}{3} \approx 24.67$ and $S = \sqrt{24.67} \approx 4.97$

2. $\bar{X} = \frac{4125}{75} = 55$

Xi (C.M)              42     47    52    57    62    67    72    Total
fi(Xi - mean)^2       1183   640   198   60    588   864   867   4400

$S^2 = \frac{4400}{74} \approx 59.46$ and $S = \sqrt{59.46} \approx 7.71$
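The five steps for the sample variance translate directly into code. The sketch below handles both examples, treating raw data as the special case where every frequency is 1; the function name `sample_variance` is illustrative, and the divisor n - 1 is the sample convention described in the text.

```python
import math

def sample_variance(values, freqs=None):
    """Sample variance: sum of f_i * (X_i - mean)^2 divided by (n - 1)."""
    if freqs is None:
        freqs = [1] * len(values)            # raw data: each value counted once
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n                  # step 1
    squared = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))     # steps 2 to 4
    return squared / (n - 1)                                              # step 5

# Example 1: raw data.
var_raw = sample_variance([5, 17, 12, 10])
sd_raw = math.sqrt(var_raw)

# Example 2: class marks and frequencies of the grouped data.
class_marks = [42, 47, 52, 57, 62, 67, 72]
freqs = [7, 10, 22, 15, 12, 6, 3]
var_grouped = sample_variance(class_marks, freqs)
sd_grouped = math.sqrt(var_grouped)

print(round(var_raw, 2), round(sd_raw, 2))
print(round(var_grouped, 2), round(sd_grouped, 2))
```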

Special properties of Standard deviations

1.
2. For a normal (symmetric) distribution the following holds.
- Approximately 68.27% of the data values fall within one standard
deviation of the mean, i.e. within $(\bar{X} - S, \bar{X} + S)$.
- Approximately 95.45% of the data values fall within two standard
deviations of the mean, i.e. within $(\bar{X} - 2S, \bar{X} + 2S)$.
- Approximately 99.73% of the data values fall within three standard
deviations of the mean, i.e. within $(\bar{X} - 3S, \bar{X} + 3S)$.
3. Chebyshev's Theorem
For any data set, no matter what the pattern of variation, the proportion
of the values that fall within k standard deviations of the mean, i.e. within
$(\bar{X} - kS, \bar{X} + kS)$, will be at least $1 - \frac{1}{k^2}$, where k is a number greater than 1. That is,
the proportion of items falling beyond k standard deviations of the mean is
at most $\frac{1}{k^2}$.
Example: Suppose a distribution has mean 50 and standard deviation
[value missing in source]. What percent of the numbers are:
a) Between 38 and 62
b) Between 32 and 68
c) Less than 38 or more than 62.
d) Less than 32 or more than 68.


Solutions:

a) 38 and 62 are at equal distance from the mean, 50, and this distance is 12, so
$k = \frac{12}{S}$, where S is the standard deviation.
By applying the above theorem, at least $1 - \frac{1}{k^2}$ of the numbers lie
between 38 and 62.

b) Done similarly, with a distance of 18 from the mean.
c) It is just the complement of a), i.e. at most $\frac{1}{k^2}$ of the numbers
are less than 38 or more than 62.
d) Done similarly.

Example 2:
The scores on a special test of knowledge of wood refinishing have
a mean of 53 and a standard deviation of 6. Find the range of values in which
at least 75% of the scores will lie. (Exercise)
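Chebyshev's bound is easy to evaluate once k is known. In the sketch below, the standard deviation of the first example is not legible in the text, so `s = 6` is an assumed value used purely for illustration; the mean 53 and standard deviation 6 of Example 2 are taken from the text. The function names are illustrative.

```python
import math

def chebyshev_min_fraction(k):
    """Lower bound on the fraction of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2

def k_for_fraction(p):
    """Smallest k whose Chebyshev bound reaches the fraction p (0 < p < 1)."""
    return 1 / math.sqrt(1 - p)

# First example: mean 50; s = 6 is an ASSUMED value (missing in the text).
mean, s = 50, 6
k_a = (62 - mean) / s                       # 38 and 62 are 12 units from the mean
k_b = (68 - mean) / s                       # 32 and 68 are 18 units from the mean
inside_a = chebyshev_min_fraction(k_a)      # at least this fraction lies in (38, 62)
outside_a = 1 - inside_a                    # at most this fraction lies outside
inside_b = chebyshev_min_fraction(k_b)

# Example 2: mean 53, standard deviation 6, at least 75% of the scores.
k_75 = k_for_fraction(0.75)
interval_75 = (53 - k_75 * 6, 53 + k_75 * 6)

print(k_a, inside_a, outside_a, round(inside_b, 4))
print(k_75, interval_75)
```

The bound holds for any distribution, so it is much weaker than the normal-distribution percentages listed under property 2.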
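4. If the standard deviation of $X_1, X_2, \dots, X_n$ is S, then the standard
deviation of
a) $X_1 + k, X_2 + k, \dots, X_n + k$ will also be S
b) $kX_1, kX_2, \dots, kX_n$ will be $|k|S$
c) $a + kX_1, a + kX_2, \dots, a + kX_n$ will be $|k|S$
Exercise: Verify each of the above relationships, considering k and a as
constants.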

Examples:
1. The mean and standard deviation of n Tetracycline capsules $X_1, X_2, \dots, X_n$
are known to be 12 gm and 3 gm respectively. A new set of
capsules of another drug is obtained by the linear transformation $Y_i = 2X_i - 0.5$
(i = 1, 2, ..., n). What will be the standard deviation of the
new set of capsules?
2. The mean and the standard deviation of a set of numbers are respectively
500 and 10.
a. If 10 is added to each of the numbers in the set, then
what will be the variance and standard deviation of
the new set?
b. If each of the numbers in the set is multiplied by -5,
then what will be the variance and standard deviation
of the new set?

Solutions:
1. Using c) above, the new standard deviation = $|2| \times 3 = 6$ gm
2. a. They will remain the same: the variance is 100 and the standard deviation is 10.
b. New standard deviation = $|-5| \times 10 = 50$, and new variance = $50^2 = 2500$

Coefficient of Variation (C.V)

- It is defined as the ratio of the standard deviation to the mean, usually expressed
as a percentage:

$C.V = \frac{S}{\bar{X}} \times 100$

- The distribution having the smaller C.V is said to be less variable or more
consistent.
Examples:
1. An analysis of the monthly wages paid (in Birr) to workers in two firms
A and B belonging to the same industry gives the following results:

Value          Firm A   Firm B
Mean wage      52.5     47.5
Median wage    50.5     45.5
Variance       100      121

In which firm, A or B, is there greater variability in individual wages?

Solutions:
Calculate the coefficient of variation for both firms.

$C.V_A = \frac{\sqrt{100}}{52.5} \times 100 \approx 19.05\%$

$C.V_B = \frac{\sqrt{121}}{47.5} \times 100 \approx 23.16\%$

Since $C.V_A < C.V_B$, in firm B there is greater variability in individual wages.
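The comparison of the two firms can be sketched as follows. The coefficient of variation is taken as the standard deviation divided by the mean, in percent, and the standard deviation is obtained as the square root of the variance in the table; the function name is illustrative.

```python
import math

def coefficient_of_variation(mean, variance):
    """C.V in percent: (standard deviation / mean) * 100."""
    return math.sqrt(variance) / mean * 100

cv_a = coefficient_of_variation(52.5, 100)   # firm A
cv_b = coefficient_of_variation(47.5, 121)   # firm B

more_variable = "B" if cv_b > cv_a else "A"
print(round(cv_a, 2), round(cv_b, 2), more_variable)
```

The median wages in the table are not needed for this comparison.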


2. A meteorologist interested in the consistency of temperatures in three
cities during a given week collected the following data. The temperatures
for the five days of the week in the three cities were:

City 1    25   24   23   26   17
City 2    22   21   24   22   20
City 3    32   27   35   24   28

Which city has the most consistent temperature, based on these data?
(Exercise)

Standard Scores (Z-scores)

- If X is a measurement from a distribution with mean $\bar{X}$ and
standard deviation S, then its value in standard units is

  $Z = \frac{X - \bar{X}}{S}$

- Z gives the deviation from the mean in units of standard deviation.
- Z gives the number of standard deviations a particular observation
lies above or below the mean.
- It is used to compare two observations coming from different
groups.
Examples:
1. Two sections were given an Introduction to Statistics examination. The
following information was given.

Value                 Section 1    Section 2
Mean                  78           90
Standard deviation    6            5

Student A from section 1 scored 90 and student B from section 2 scored
95. Relatively speaking, who performed better?

Solutions:
Calculate the standard score of both students.

$Z_A = \frac{90 - 78}{6} = 2$

$Z_B = \frac{95 - 90}{5} = 1$


- Student A performed better relative to his section because the score of
student A is two standard deviations above the mean score of his section,
while the score of student B is only one standard deviation above the mean
score of his section.
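
The standard scores in this example can be computed with a small helper. This is a sketch under one assumption: student B's raw score is taken as 95, which is inferred from the statement that B lies one standard deviation above a section mean of 90 with standard deviation 5. The function name `z_score` is introduced here for illustration.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_a = z_score(90, 78, 6)  # student A, section 1
z_b = z_score(95, 90, 5)  # student B, section 2 (score of 95 is an inferred value)

print("Z for student A:", z_a)
print("Z for student B:", z_b)
print("Better relative performance:", "A" if z_a > z_b else "B")
```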
2. Two groups of people were trained to perform a certain task and tested to
find out which group is faster to learn the task. For the two groups the
following information was given:

Value                 Group one    Group two
Mean                  10.4 min     11.9 min
Standard deviation    1.2 min      1.3 min

Relatively speaking:
a) Which group is more consistent in its performance?
b) Suppose person A from group one takes 9.2 minutes while person B
from group two takes 9.3 minutes. Who was faster in performing the task?
Why?

Solutions:
a) Use the coefficient of variation.

$C.V_1 = \frac{S_1}{\bar{X}_1} \times 100\% = \frac{1.2}{10.4} \times 100\% \approx 11.54\%$

$C.V_2 = \frac{S_2}{\bar{X}_2} \times 100\% = \frac{1.3}{11.9} \times 100\% \approx 10.92\%$

Since $C.V_2 < C.V_1$, group 2 is more consistent.

b) Calculate the standard score of A and B.

$Z_A = \frac{9.2 - 10.4}{1.2} = -1$

$Z_B = \frac{9.3 - 11.9}{1.3} = -2$

Person B is faster because the time taken by person B is two standard
deviations shorter than the average time taken by group 2, while the time
taken by person A is only one standard deviation shorter than the average
time taken by group 1.

Skewness and Kurtosis

Skewness
- Skewness is the degree of asymmetry or departure from symmetry of a
distribution.
- A skewed frequency distribution is one that is not symmetrical.
- Skewness is concerned with the shape of the curve not size.
- If the frequency curve (smoothed frequency polygon) of a distribution has
a longer tail to the right of the central maximum than to the left, the
distribution is said to be skewed to the right or said to have positive
skewness. If it has a longer tail to the left of the central maximum than to
the right, it is said to be skewed to the left or said to have negative
skewness.
- For a moderately skewed distribution, the following relation holds among
the three commonly used measures of central tendency:

  $\text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median})$
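
The empirical relation for moderately skewed distributions, Mean - Mode = 3(Mean - Median), can be rearranged to estimate any one of the three averages from the other two. As a worked rearrangement for the mode (an approximation only, valid to the extent that the relation itself holds):

```latex
\begin{align*}
\text{Mean} - \text{Mode} &= 3(\text{Mean} - \text{Median}) \\
\text{Mode} &= \text{Mean} - 3\,\text{Mean} + 3\,\text{Median} \\
\text{Mode} &= 3\,\text{Median} - 2\,\text{Mean}
\end{align*}
```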

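Measures of Skewness
- Denoted by $\alpha_3$.
- There are various measures of skewness.
1. The Pearsonian coefficient of skewness

   $\alpha_3 = \frac{\bar{X} - \text{Mode}}{S}$

   or, using the relation among mean, median and mode for moderately
   skewed distributions, $\alpha_3 = \frac{3(\bar{X} - \text{Median})}{S}$

2. Bowley's coefficient of skewness (coefficient of skewness based on
quartiles)

   $\alpha_3 = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$

3. The moment coefficient of skewness

   $\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{\sigma^3}$

   where $M_2$ and $M_3$ are the second and third central moments and
   $\sigma$ is the standard deviation.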

Note:


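The shape of the curve is determined by the value of the coefficient of
skewness $\alpha_3$:
- If $\alpha_3 > 0$, the distribution is positively skewed.
- If $\alpha_3 = 0$, the distribution is symmetric.
- If $\alpha_3 < 0$, the distribution is negatively skewed.
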
Remark:
o In a positively skewed distribution, smaller observations are more
frequent than larger observations, i.e. the majority of the observations
have a value below the average.
o In a negatively skewed distribution, smaller observations are less
frequent than larger observations, i.e. the majority of the observations
have a value above the average.

Examples:
1. Suppose the mean, the mode, and the standard deviation of a certain
distribution are 32, 30.5 and 10 respectively. What is the shape of the
curve representing the distribution?
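Solutions:
Use the Pearsonian coefficient of skewness.

$\alpha_3 = \frac{\bar{X} - \text{Mode}}{S} = \frac{32 - 30.5}{10} = 0.15$

Since $\alpha_3 > 0$, the distribution is positively skewed.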
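
The Pearsonian coefficient for this example, (mean - mode) / standard deviation, together with the sign rule for the shape of the curve, can be sketched as follows. The function names `pearson_skewness` and `describe_shape` are introduced here for illustration only.

```python
def pearson_skewness(mean, mode, sd):
    """Pearsonian coefficient of skewness: (mean - mode) / standard deviation."""
    return (mean - mode) / sd

def describe_shape(coefficient):
    """Classify a distribution by the sign of its coefficient of skewness."""
    if coefficient > 0:
        return "positively skewed"
    if coefficient < 0:
        return "negatively skewed"
    return "symmetric"

coefficient = pearson_skewness(mean=32, mode=30.5, sd=10)
print(coefficient, "->", describe_shape(coefficient))
```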

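2. In a frequency distribution, the coefficient of skewness based on the
quartiles is given to be 0.5. If the sum of the upper and lower quartiles is
28 and the median is 11, find the values of the upper and lower quartiles.

Solutions:

Given: $\alpha_3 = 0.5$, $Q_1 + Q_3 = 28$, $Q_2 = 11$
Required: $Q_1$, $Q_3$

$\alpha_3 = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = \frac{28 - 2(11)}{Q_3 - Q_1} = 0.5$, so $Q_3 - Q_1 = 12$.

Solving $Q_3 + Q_1 = 28$ and $Q_3 - Q_1 = 12$ gives $Q_3 = 20$ and $Q_1 = 8$.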
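
The same problem can be solved programmatically by rearranging Bowley's formula, (Q3 + Q1 - 2·Q2) / (Q3 - Q1), for the difference Q3 - Q1 and then combining it with the known sum. This is a sketch only: `quartiles_from_bowley` is a name introduced here, and it assumes a nonzero coefficient.

```python
def quartiles_from_bowley(coefficient, quartile_sum, median):
    """Solve Bowley's formula for (Q1, Q3), given Q1 + Q3 and the median Q2.

    Assumes coefficient != 0, since the formula is rearranged by dividing by it.
    """
    quartile_diff = (quartile_sum - 2 * median) / coefficient  # Q3 - Q1
    q3 = (quartile_sum + quartile_diff) / 2
    q1 = (quartile_sum - quartile_diff) / 2
    return q1, q3

q1, q3 = quartiles_from_bowley(coefficient=0.5, quartile_sum=28, median=11)
print("Q1 =", q1, "Q3 =", q3)
```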


3. Some characteristics of the annual family income distribution (in Birr)
in two regions are as follows:

Region    Mean    Median    Standard Deviation
A         6250    5100      960
B         6980    5500      940

a) Calculate the coefficient of skewness for each region.
b) For which region is the income distribution more skewed? Give
your interpretation for this region.
c) For which region is the income more consistent?

Solutions: (exercise)
4. For a moderately skewed frequency distribution, the mean is 10 and the
median is 8.5. If the coefficient of variation is 20%, find the Pearsonian
coefficient of skewness and the probable mode of the distribution.
(exercise)
5. The sum of fifteen observations, whose mode is 8, was found to be 150
with a coefficient of variation of 20%.
(a) Calculate the Pearsonian coefficient of skewness and give an
appropriate conclusion.
(b) Are smaller values more or less frequent than bigger values for this
distribution?
(c) If a constant k was added to each observation, what will be the
new Pearsonian coefficient of skewness? Show your steps. What
do you conclude from this?
Solutions: (exercise)

Kurtosis

Kurtosis is the degree of peakedness of a distribution, usually taken
relative to a normal distribution. A distribution having a relatively high
peak is called leptokurtic. If a curve representing a distribution is flat
topped, it is called platykurtic. The normal distribution, which is neither
very high peaked nor flat topped, is called mesokurtic.
Measures of kurtosis
The moment coefficient of kurtosis:
- Denoted by $\alpha_4$ and given by

  $\alpha_4 = \frac{M_4}{M_2^2} = \frac{M_4}{\sigma^4}$

  where $M_2$ and $M_4$ are the second and fourth central moments and
  $\sigma$ is the standard deviation.


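The peakedness depends on the value of the moment coefficient of kurtosis
$\alpha_4$:
- If $\alpha_4 > 3$, the curve is leptokurtic.
- If $\alpha_4 = 3$, the curve is mesokurtic.
- If $\alpha_4 < 3$, the curve is platykurtic.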
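
The moment coefficients of skewness and kurtosis can be computed from raw data as sketched below. The sample is hypothetical and chosen only for illustration, the function names are introduced here, and the central moments are taken with divisor n (an assumption, since the notes' exact convention is not shown in this part).

```python
def central_moment(data, r):
    """Return the r-th moment about the mean, using divisor n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

def moment_skewness(data):
    """Moment coefficient of skewness: M3 / M2^(3/2)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def moment_kurtosis(data):
    """Moment coefficient of kurtosis: M4 / M2^2."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def kurtosis_type(a4):
    """Classify peakedness relative to the normal curve (a4 = 3)."""
    if a4 > 3:
        return "leptokurtic"
    if a4 < 3:
        return "platykurtic"
    return "mesokurtic"

# Hypothetical sample, symmetric about its mean of 5.
sample = [2, 4, 4, 5, 5, 5, 6, 6, 8]

a3 = moment_skewness(sample)
a4 = moment_kurtosis(sample)
print("skewness:", a3)
print("kurtosis:", round(a4, 4), "->", kurtosis_type(a4))
```

Because the sample is symmetric, its third central moment is zero, so the skewness coefficient is zero regardless of how peaked the data are.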





Examples:
1. If the first four central moments of a distribution are:

a) Compute a measure of skewness


b) Compute a measure of kurtosis and give your
interpretation.

Solutions:

a)

b)
