
Module 1

The document covers the basics of probability and statistics, focusing on data collection, descriptive and inferential statistics, and measures of central tendency including mean, median, and mode. It provides definitions, examples, and calculations for various statistical measures, as well as discussions on geometric and harmonic means. The document serves as an introductory guide for understanding statistical concepts and their applications in decision-making.

Uploaded by

kishore

BMAT202L Probability and Statistics

Prof. S. Vaishnavi
Module -I

Statistics and data analysis;


Measures of central tendency
Measure of Dispersion
Moments-Skewness-Kurtosis
(Concepts only).
Introduction

• Data: a collection of any number of related observations.
• Statistics deals with the collection, presentation, analysis, and interpretation of numerical data.
• Statistics answers questions using data or information about a situation.
• A statistic is a property of the data (e.g., the average).
• Statistics is the art and science of extracting answers from data.
• It helps decision making in an uncertain environment.
• We collect and analyze data to make decisions.
Introduction

• There are two major types of statistics: (a) Descriptive Statistics, (b) Inferential Statistics.
• Descriptive Statistics consists of methods of organizing and summarizing information (includes the construction of graphs, charts, and tables, and the calculation of the mean, median, mode, and measures of variation).
• Inferential Statistics consists of methods of drawing conclusions based on the information (includes estimation and hypothesis testing).
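Example
Let the blood types of 40 persons be as follows:
O, O, A, B, A, O, A, A, A, O, B, O, B, O, O, A, O, O, A, A, A, A, AB, A, B, A, A,
O, O, A, O, O, A, A, A, O, A, O, O, AB.
By using descriptive statistics, we can summarize the data as a frequency table:

Blood type   Frequency   Relative frequency
O            16          40%
A            18          45%
B            4           10%
AB           2           5%
Total        40          100%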
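The tally for this blood-type example can be reproduced with a short Python sketch. It is only an illustration of descriptive summarizing; the variable names are chosen here and do not come from the slides.

```python
from collections import Counter

# The 40 observed blood types from the example above.
blood_types = (
    "O, O, A, B, A, O, A, A, A, O, B, O, B, O, O, A, O, O, A, A, A, A, AB, A, B, A, A, "
    "O, O, A, O, O, A, A, A, O, A, O, O, AB"
).split(", ")

counts = Counter(blood_types)   # frequency of each blood type
total = len(blood_types)        # number of persons

# Print blood type, frequency and relative frequency, most common first.
for group, freq in counts.most_common():
    print(f"{group:>2}  {freq:>2}  {freq / total:.1%}")
```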
Statistics

Population: the complete set of all items that interest an investigator.

Example: the books in our library; the students of VIT.

Sample: a subset of a population, consisting of the items that are actually collected in the course of an investigation.

Example: the collection of Statistics books in our library; the students of our class.
Measures of Central Tendency

Data are classified as


• Individual observations or raw data
• Discrete data
• Continuous data

The Central measures are


• Mean
• Median
• Mode
• Geometric mean
• Harmonic mean
Mean (Individual observations or raw data)

Example: Given the monthly income of 10 employees in an office,

1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950,

find the mean (average) of their monthly income.

$$\bar{X} = \frac{1780 + 1760 + 1690 + 1750 + 1840 + 1920 + 1100 + 1810 + 1050 + 1950}{10} = \frac{16650}{10} = 1665$$
Mean (Short-cut method)

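Example: Given the per-day income of 7 employees in an office,

250, 290, 300, 270, 450, 350, 420.

With an assumed mean $A$ and deviations $d = X - A$, the short-cut formula is $\bar{X} = A + \frac{\sum d}{n}$. For instance, taking $A = 300$ (an illustrative choice), $\sum d = 230$ and

$$\bar{X} = 300 + \frac{230}{7} \approx 332.86$$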
Mean (Discrete data)

where $N = \sum f$

Arithmetic mean = 41
Mean (Continuous Data)
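$$\bar{X} = \frac{\sum fX}{N} = \frac{4 \times 8 + 12 \times 7 + 20 \times 16 + 28 \times 24 + 36 \times 15 + 44 \times 7}{77} = \frac{1956}{77} \approx 25.40$$

(Each product is a class mid-value $X$ times its frequency $f$.)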
Shortcut method

$$\bar{X} = A + \frac{\sum fd}{N} \times h, \qquad d = \frac{X - A}{h}$$

$$\bar{X} = 28 + \frac{-25}{77} \times 8 \approx 25.40$$
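As a check on the two methods above, here is a minimal Python sketch that computes the grouped mean both directly and by the step-deviation shortcut. The mid-values and frequencies are read off the worked sum; the function names are illustrative only.

```python
# Class mid-values and frequencies taken from the worked example (N = 77).
mid_values = [4, 12, 20, 28, 36, 44]
frequencies = [8, 7, 16, 24, 15, 7]


def grouped_mean(mids, freqs):
    """Direct method: sum(f * X) / N."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(mids, freqs)) / n


def step_deviation_mean(mids, freqs, assumed_mean, width):
    """Shortcut method: A + (sum(f * d) / N) * h with d = (X - A) / h."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) / width for x, f in zip(mids, freqs))
    return assumed_mean + (sum_fd / n) * width


direct = grouped_mean(mid_values, frequencies)
shortcut = step_deviation_mean(mid_values, frequencies, assumed_mean=28, width=8)
print(round(direct, 2), round(shortcut, 2))
```

Both methods are algebraically identical, so any assumed mean A gives the same result; A = 28 and h = 8 simply keep the deviations small.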
1. From the following data, compute arithmetic mean

AM=33

2. Compute mean for the following data:


Median
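Arrange the data in ascending or descending order; the median is the middle value.
1. Compute the median for the following data:
1100, 1150, 1080, 1120, 1200, 1160, 1400
In ascending order: 1080, 1100, 1120, 1150, 1160, 1200, 1400. The middle (4th) value gives Median = 1150.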
Pbm 1: Find the median of the following frequency distribution

Daily wages   Number of workers (f)   Cumulative frequency
5             7                       7
10            12                      19
15            37                      56
20            25                      81
25            22                      103
30            11                      114
Total         N = 114

This is a discrete frequency distribution, so the median is given by the A.M. of the two middle observations. Since 114 is even, the median is the arithmetic mean of the $\frac{N}{2}$-th and $\left(\frac{N}{2} + 1\right)$-th observations.

• The two middle items of the 114 items are the 57th and 58th observations.
• Both the 57th and 58th observations correspond to a wage of 20 (its cumulative frequency, 81, is the first to exceed 56).
• Thus, Median $= \frac{20 + 20}{2} = 20$
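The procedure for a discrete frequency distribution can be sketched in Python as follows. This is one possible implementation under the assumption that the values are already sorted; `discrete_median` is a name introduced here for illustration.

```python
def discrete_median(values, freqs):
    """Median of a discrete frequency distribution (values sorted ascending)."""
    n = sum(freqs)
    # 1-based positions of the middle observation(s).
    positions = [(n + 1) // 2] if n % 2 else [n // 2, n // 2 + 1]
    picked = []
    for pos in positions:
        cumulative = 0
        for value, f in zip(values, freqs):
            cumulative += f
            if pos <= cumulative:      # first cumulative frequency reaching pos
                picked.append(value)
                break
    return sum(picked) / len(picked)


wages = [5, 10, 15, 20, 25, 30]
workers = [7, 12, 37, 25, 22, 11]
median_wage = discrete_median(wages, workers)
print(median_wage)
```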
Pbm 3: Find the median wage of the following distribution

Wages (in Rs)   No. of workers (f)   Cumulative frequency
2000-3000       3                    3
3000-4000       5                    8
4000-5000       20                   28
5000-6000       10                   38
6000-7000       5                    43
Total           43

N/2 = 43/2 = 21.5, so the median class is 4000-5000.

L = 4000; c.f. = 8; f = 20; m = N/2 = 21.5; h = 1000

$$\text{Median} = L + \frac{m - c.f.}{f} \times h = 4000 + \frac{21.5 - 8}{20} \times 1000 = 4675 \text{ rupees}$$
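The interpolation formula for the median of a continuous distribution can be sketched in Python like this. It assumes equal class widths and takes the lower class limits as input; the function name is hypothetical.

```python
def grouped_median(lower_limits, width, freqs):
    """Median = L + ((N/2 - c.f.) / f) * h for a grouped distribution."""
    n = sum(freqs)
    half = n / 2
    cumulative = 0                      # c.f. of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("empty distribution")


lower_limits = [2000, 3000, 4000, 5000, 6000]
workers = [3, 5, 20, 10, 5]
median_wage = grouped_median(lower_limits, 1000, workers)
print(round(median_wage, 2))
```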
2. Compute median for the following data:

Median = 1500

3. Calculate the median for the following data:

Median = 39.6
Example 1: Find the mode of the following marks obtained
by 25 students in a mathematics test out of 50.

34, 46, 45, 39, 43, 22, 27, 37, 46, 35, 34, 39, 40, 30, 30,
41, 37, 46, 39, 29, 34, 39, 35, 43, 30
Solution: The data in ascending order:

22, 27, 29, 30, 30, 30, 34, 34, 34, 35, 35, 37, 37, 39, 39, 39,
39, 40, 41, 43, 43, 45, 46, 46, 46

The most frequently occurring value is 39 (it occurs 4 times).

Hence, the mode of the given marks is 39.
Pbm 1: Find the mode of the following frequency distribution

Daily wages   Number of workers (f)
5             7
10            12
15            37
20            25
25            22
30            11
Total         114

Mode: the value with the highest frequency. The highest frequency is 37, so Mode = 15.
Pbm 2: Find the mode wage of the following distribution

Wages (in Rs)   No. of workers (f)
2000-3000       3
3000-4000       5
4000-5000       20   (Modal class)
5000-6000       10
6000-7000       5
Total           43

$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$

$l = 4000$, $f_1 = 20$, $f_0 = 5$, $f_2 = 10$, $h = 1000$

$$\text{Mode} = 4000 + \frac{20 - 5}{2(20) - 5 - 10} \times 1000 = 4600$$
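A minimal Python sketch of the grouped-mode formula, assuming equal class widths and a single modal class; `grouped_mode` is an illustrative name, not something defined in the slides.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # frequency before the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0      # frequency after the modal class
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width


mode_wage = grouped_mode([2000, 3000, 4000, 5000, 6000], 1000, [3, 5, 20, 10, 5])
print(round(mode_wage, 2))
```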
Calculate the mode of the following frequency distribution.
• 2. Find the median and mode of the following distribution:

Class limits   True class limits   Frequency
46-50          45.5-50.5           2
51-55          50.5-55.5           3
56-60          55.5-60.5           5
61-65          60.5-65.5           7
66-70          65.5-70.5           9
71-75          70.5-75.5           11
76-80          75.5-80.5           7
81-85          80.5-85.5           2
86-90          85.5-90.5           3
91-95          90.5-95.5           1

Mode = 72.17, Median = 69.9

Note: Empirical relation between mean, median and mode: Mean − Mode = 3(Mean − Median)
Geometric Mean
❖ The geometric mean is the nth root of the product of the n observations of a distribution.

• A geometric series is $a, ar, ar^2, \ldots, ar^n, \ldots$

1. The geometric mean of the observations $x_1, x_2, \ldots, x_n$ is $GM = (x_1 \cdot x_2 \cdots x_n)^{1/n}$

2. If n is large, $\log GM = \dfrac{\log x_1 + \log x_2 + \cdots + \log x_n}{n}$

3. If $x_1, x_2, x_3, \ldots, x_n$ occur with frequencies $f_1, f_2, \ldots, f_n$ respectively, then
$GM = \left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}$, where $N = f_1 + f_2 + \cdots + f_n$

4. If n is large, $\log GM = \dfrac{f_1 \log x_1 + f_2 \log x_2 + f_3 \log x_3 + \cdots + f_n \log x_n}{N}$

5. Similarly, for grouped data, $\log GM = \dfrac{\sum f \log m}{N}$, where $m$ is the mid-value of a particular class.

6. If $G_1$ is the GM of a group of $n_1$ observations and $G_2$ is the GM of another group of $n_2$ observations, then the GM of the combined group is
$G = \left(G_1^{n_1} \cdot G_2^{n_2}\right)^{1/(n_1 + n_2)}$

• This can be extended to $r$ groups as $G = \left(G_1^{n_1} \cdot G_2^{n_2} \cdots G_r^{n_r}\right)^{1/(n_1 + n_2 + \cdots + n_r)}$, or

$$\log G = \frac{n_1 \log G_1 + \cdots + n_r \log G_r}{n_1 + \cdots + n_r}$$

• Note: GM is more suitable for averaging rates of change, compound interest formulae, discounting, and capitalization.
Harmonic Mean
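1. We know that a harmonic series is $\dfrac{1}{a}, \dfrac{1}{a+d}, \dfrac{1}{a+2d}, \ldots, \dfrac{1}{a+(n-1)d}$.
2. Used most commonly in averaging ratios.
3. The harmonic mean of a set of $n$ observations $x_1, x_2, \ldots, x_n$ is defined by

$$HM = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}$$

• 4. In a frequency distribution,

$$HM = \frac{N}{\dfrac{f_1}{x_1} + \dfrac{f_2}{x_2} + \cdots + \dfrac{f_n}{x_n}}$$

• where $N = f_1 + f_2 + \cdots + f_n$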
Problem 1: Calculate the A.M., G.M., and H.M., of the following
quantities: 3, 6, 24, 48.
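Solution:

$$A.M. = \frac{3 + 6 + 24 + 48}{4} = \frac{81}{4} = 20.25$$

$$G.M. = (3 \times 6 \times 24 \times 48)^{1/4} = (3^4 \times 4^4)^{1/4} = 12$$

$$H.M. = \frac{4}{\dfrac{1}{3} + \dfrac{1}{6} + \dfrac{1}{24} + \dfrac{1}{48}} = \frac{4 \times 48}{27} = \frac{192}{27} = 7.11$$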
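The three means for the quantities 3, 6, 24, 48 can be checked with a few lines of Python. This sketch applies the definitions directly (the G.M. is computed through logarithms, as in item 2 of the Geometric Mean slide).

```python
import math

values = [3, 6, 24, 48]
n = len(values)

am = sum(values) / n                                   # arithmetic mean
gm = math.exp(sum(math.log(x) for x in values) / n)    # geometric mean via logs
hm = n / sum(1 / x for x in values)                    # harmonic mean

print(round(am, 2), round(gm, 2), round(hm, 2))
```

For positive data the three means always satisfy H.M. ≤ G.M. ≤ A.M., which the values here illustrate.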
1. Find the geometric mean for the following distribution:

Marks             0-10   10-20   20-30   30-40   40-50
No. of students   5      7       15      25      8

G = 25.64

2. The following table gives the weights of 31 persons in a sample enquiry. Calculate the mean weight using (1) the geometric mean and (2) the harmonic mean.

Weight           130   135   140   145   146   148   149   150   157
No. of persons   3     4     6     6     3     5     2     1     1

G = 142.5, HM = 142.38
3. Three groups of observations contain 8, 7, and 5 observations. Their geometric means are 8.52, 10.12, and 7.75 respectively. Find the geometric mean of the 20 observations in the single group formed by combining the three groups.

G = 8.837
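The combined geometric mean for the three groups can be sketched in Python using the logarithmic form of the combined-GM formula. The helper name `combined_gm` is an assumption made for this illustration.

```python
import math


def combined_gm(sizes, gms):
    """log G = (n1*log G1 + ... + nr*log Gr) / (n1 + ... + nr)."""
    total = sum(sizes)
    log_g = sum(n * math.log10(g) for n, g in zip(sizes, gms)) / total
    return 10 ** log_g


g = combined_gm([8, 7, 5], [8.52, 10.12, 7.75])
print(round(g, 3))
```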
Some Properties of AM
❑ The sum of deviations from the mean is equal to zero: $\sum (X - \bar{X}) = 0$ or $\sum f(X - \bar{X}) = 0$.

❑ The sum of squared deviations from the mean is smaller than the sum of squared deviations from any other arbitrary value or provisional mean $A$, i.e.,

$$\sum (X - \bar{X})^2 < \sum (X - A)^2, \qquad A \neq \bar{X}$$

❑ If $n_1$ and $n_2$ are the sizes and $\bar{X}_1$ and $\bar{X}_2$ are the respective means of two groups, then the mean of the combined group of size $n_1 + n_2$ is given by

$$\bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$$
Problem
In a factory employing 3,000 persons, 5 percent earn less than Rs. 3 per hour, 580 earn from Rs. 3.01 to Rs. 4.50 per hour, 30 percent earn from Rs. 4.51 to Rs. 6.00 per hour, 500 earn from Rs. 6.01 to Rs. 7.50 per hour, 20 percent earn from Rs. 7.51 to Rs. 9.00 per hour, and the rest earn Rs. 9.01 or more per hour. What is the median wage?
1. Find the mode of the following frequency distribution Ref. Slide 25

Solution

By the method of grouping


Mode is 6
Prob. Find the mode for the following distribution (grouping approach):

C.I.    0-4   4-8   8-10   10-14   14-16   16-22   22-24   24-30   30-32   32-40   40-44   44-48
Freq.   2     7     4      8       5       6       3       2       5       14      8       3

Regrouping into classes of equal width 8, we get

C.I.    0-8   8-16   16-24   24-32   32-40   40-48
Freq.   9     17     9       7       14      11

Note: We can calculate the mean and median for a continuous distribution with unequal class intervals.
Measures of dispersion

• Range
• Quartile Deviation
• Mean Deviation
• Standard Deviation
Range is the difference between the maximum and the minimum observation.
Quartile Deviation

Quartiles: "When the observations are arranged in increasing order, the values that divide the whole data into four (4) equal parts are called quartiles." These values are denoted by $Q_1$, $Q_2$ and $Q_3$. Note that 25% of the data falls below $Q_1$, 50% of the data falls below $Q_2$, and 75% of the data falls below $Q_3$.
1. Find the inter-quartile range, quartile deviation, and coefficient of quartile deviation for the following data:

Class interval   0-15   15-30   30-45   45-60   60-75   75-90   90-105
Frequency        8      26      30      45      20      17      4
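One way to attack this exercise is sketched below in Python. It assumes the usual interpolation formula for grouped quartiles (the same form as the grouped median, with N/4 and 3N/4 in place of N/2) and the usual definitions IQR = Q3 − Q1, Q.D. = (Q3 − Q1)/2 and coefficient of Q.D. = (Q3 − Q1)/(Q3 + Q1). The function name and the printed values are this sketch's own, not results stated in the slides.

```python
def grouped_quantile(lower_limits, widths, freqs, fraction):
    """Interpolated quantile: L + ((fraction*N - c.f.) / f) * h."""
    n = sum(freqs)
    target = fraction * n
    cumulative = 0                      # c.f. of the classes before the current one
    for lower, width, f in zip(lower_limits, widths, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("empty distribution")


lower_limits = [0, 15, 30, 45, 60, 75, 90]
widths = [15] * 7
freqs = [8, 26, 30, 45, 20, 17, 4]

q1 = grouped_quantile(lower_limits, widths, freqs, 0.25)
q3 = grouped_quantile(lower_limits, widths, freqs, 0.75)
iqr = q3 - q1                      # inter-quartile range
qd = iqr / 2                       # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)   # coefficient of quartile deviation
print(q1, q3, iqr, qd, round(coeff_qd, 4))
```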
2. Find the inter-quartile range, quartile deviation, and coefficient of quartile deviation for the following data:

Income in Rs     Less than 50   50-70   70-90   90-110   110-130   130-150   Above 150
No. of persons   54             100     140     300      230       125       51

$Q_1 = 83.714$, $Q_3 = 123.565$, Q.D. = 19.925, Coefficient of Q.D. = 0.1923


3. Find all the quartiles, quartile deviation, and coefficient of quartile
deviation for the following data:
For raw data, mean deviation about the mean $= \dfrac{\sum |X_i - \bar{X}|}{N}$

For discrete or continuous data, mean deviation about the mean $= \dfrac{\sum f_i |X_i - \bar{X}|}{N}$
• Calculate the mean deviation about the mean for the following data

Size        2   4   6   8   10   12   14   16
Frequency   2   2   4   5   3    2    1    1

• Solution:

x       f        fx          |x − x̄|   f|x − x̄|
2       2        4           6          12
4       2        8           4          8
6       4        24          2          8
8       5        40          0          0
10      3        30          2          6
12      2        24          4          8
14      1        14          6          6
16      1        16          8          8
Total   N = 20   Σfx = 160              Σf|x − x̄| = 56

$\bar{x} = 160/20 = 8$

Mean deviation about mean $= \dfrac{\sum f|x - \bar{x}|}{N} = \dfrac{56}{20} = 2.8$
For raw data, MD about median $= \dfrac{\sum |X - \text{Median}|}{n} = \dfrac{\sum |D|}{n}$

For discrete or continuous data, MD about median $= \dfrac{\sum f|X - \text{Median}|}{N} = \dfrac{\sum f|D|}{N}$

• The relative measure corresponding to the mean deviation is called the coefficient of mean deviation, and it is obtained as follows:

• Coefficient of MD about mean $= \dfrac{\text{Mean deviation about mean}}{\text{Mean}}$

• Coefficient of MD about median $= \dfrac{\text{Mean deviation about median}}{\text{Median}}$
• Calculate the mean deviation about the median and its relative measure for the following data

X           15   25   35   45   55   65   75   85
Frequency   12   11   10   15   22   13   18   19

Median = size of the $\frac{N+1}{2}$-th value = size of the 60.5th value = 55

X       f         cf    |D| = |X − 55|   f|D|
15      12        12    40               480
25      11        23    30               330
35      10        33    20               200
45      15        48    10               150
55      22        70    0                0
65      13        83    10               130
75      18        101   20               360
85      19        120   30               570
Total   N = 120                          Σf|D| = 2220

MD about median $= \dfrac{2220}{120} = 18.5$

Coefficient of MD about median $= \dfrac{\text{MD about median}}{\text{Median}} = \dfrac{18.5}{55} = 0.34$
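A minimal Python sketch that recomputes the mean deviation about the median directly from the X and frequency columns. Note that the deviation for the row X = 55 is |55 − 55| = 0, so that row contributes nothing to Σf|D|. The helper name is illustrative, and it assumes the (N+1)/2-th position falls inside a single value, as it does here.

```python
def frequency_table_median(values, freqs):
    """Value at the (N+1)/2-th position of a sorted discrete frequency table."""
    n = sum(freqs)
    position = (n + 1) / 2
    cumulative = 0
    for value, f in zip(values, freqs):
        cumulative += f
        if cumulative >= position:
            return value
    raise ValueError("empty distribution")


x = [15, 25, 35, 45, 55, 65, 75, 85]
f = [12, 11, 10, 15, 22, 13, 18, 19]

n = sum(f)
median = frequency_table_median(x, f)
sum_f_abs_d = sum(fi * abs(xi - median) for xi, fi in zip(x, f))
md_about_median = sum_f_abs_d / n
coefficient = md_about_median / median
print(median, sum_f_abs_d, md_about_median, round(coefficient, 2))
```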

Find the coefficient of mean deviation about the median for the following series

Age in years     0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
No. of persons   20     25      32      40      42      35      10      8

Solution:

X       f         cf
0-10    20        20
10-20   25        45
20-30   32        77
30-40   40        117
40-50   42        159
50-60   35        194
60-70   10        204
70-80   8         212
Total   N = 212

$\dfrac{N}{2} = \dfrac{212}{2} = 106$. The class interval corresponding to cumulative frequency 106 is (30-40).

$l_1 = 30$, $l_2 = 40$, $c = 77$, $f = 40$

$$\text{Median} = l_1 + \frac{\frac{N}{2} - c}{f} \times (l_2 - l_1) = 30 + \frac{106 - 77}{40} \times 10 = 37.25$$

X       Mid-value m   f     |D| = |m − 37.25|   f|D|
0-10    5             20    32.25               645
10-20   15            25    22.25               556.25
20-30   25            32    12.25               392
30-40   35            40    2.25                90
40-50   45            42    7.75                325.5
50-60   55            35    17.75               621.25
60-70   65            10    27.75               277.5
70-80   75            8     37.75               302
Total                 212                       3209.5

• MD about median $= \dfrac{\sum f|D|}{N} = \dfrac{3209.5}{212} = 15.14$
• Coefficient of MD about median $= \dfrac{\text{MD about median}}{\text{Median}} = \dfrac{15.14}{37.25} = 0.41$
Standard Deviation
1. Find the SD of the set of numbers 3, 8, 6, 10, 12, 9, 11, 10, 12, 7

Solution:

$\sum x = 88$

$\bar{X} = \dfrac{\sum x}{n} = \dfrac{88}{10} = 8.8$

$\sum x^2 = 3^2 + 8^2 + 6^2 + 10^2 + 12^2 + 9^2 + 11^2 + 10^2 + 12^2 + 7^2 = 848$

$\sigma^2 = \dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2 = \dfrac{848}{10} - 8.8^2 = 7.36$

$\sigma = \sqrt{7.36} = 2.71$
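The same computation as a short Python sketch, using the formula σ² = Σx²/n − (Σx/n)². This is the population form (divide by n), which is what the worked solution uses.

```python
import math

data = [3, 8, 6, 10, 12, 9, 11, 10, 12, 7]
n = len(data)

mean = sum(data) / n                                  # 88 / 10
variance = sum(x * x for x in data) / n - mean ** 2   # 848 / 10 - 8.8^2
sd = math.sqrt(variance)

print(round(mean, 2), round(variance, 2), round(sd, 2))
```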
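The weekly salaries of a group of employees are given in the following table. Find the mean and standard deviation of the salaries.

Salary (in Rs.)   75   80   85   90   95   100
No. of persons    3    7    18   12   6    4

Taking $A = 85$ and $h = 5$, let $d = \dfrac{x - 85}{5}$.

x       f    d    fd   fd²
75      3    -2   -6   12
80      7    -1   -7   7
85      18   0    0    0
90      12   1    12   12
95      6    2    12   24
100     4    3    12   36
Total   50        23   91

$$\bar{X} = A + \frac{\sum fd}{N} \times h = 85 + \frac{23}{50} \times 5 = \text{Rs. } 87.30$$

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times h = \sqrt{\frac{91}{50} - \left(\frac{23}{50}\right)^2} \times 5 = \text{Rs. } 6.34$$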
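The step-deviation computation for the salary table can be sketched in Python as follows, with A = 85 and h = 5 as in the solution. Variable names are chosen for this illustration.

```python
import math

salaries = [75, 80, 85, 90, 95, 100]
persons = [3, 7, 18, 12, 6, 4]
A, h = 85, 5                                   # assumed mean and class width

N = sum(persons)
d = [(x - A) / h for x in salaries]            # step deviations
sum_fd = sum(f * di for f, di in zip(persons, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(persons, d))

mean = A + sum_fd / N * h
sd = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * h

print(round(mean, 2), round(sd, 2))
```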
5. An analysis of the monthly wages gives the following results.

                                       Firm A    Firm B
Number of workers                      500       600
Average monthly wages                  Rs. 186   Rs. 175
Variance of the distribution of wages  81        100

(i) Which firm has a larger wage bill?
(ii) In which firm (A or B) is there greater variability in individual wages?

Solution:
$N_1 = 500$, $N_2 = 600$, $\bar{X}_1 = 186$, $\bar{X}_2 = 175$, $\sigma_1^2 = 81$, $\sigma_2^2 = 100$
Total monthly wages in Firm A = 186 × 500 = Rs. 93,000
Total monthly wages in Firm B = 175 × 600 = Rs. 1,05,000
Therefore, Firm B has a larger wage bill.

C.V. for A $= \dfrac{\sigma_1}{\bar{X}_1} \times 100 = \dfrac{9}{186} \times 100 = 4.84$

C.V. for B $= \dfrac{\sigma_2}{\bar{X}_2} \times 100 = \dfrac{10}{175} \times 100 = 5.71$

Since the C.V. for B is greater than that for A, Firm B has greater variability in individual wages.
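The comparison of the two firms can be reproduced with a small Python sketch. It takes the coefficient of variation as σ / mean × 100, with σ the square root of the given variance; the function name is introduced here for illustration.

```python
import math


def coefficient_of_variation(mean, variance):
    """C.V. = (standard deviation / mean) * 100."""
    return math.sqrt(variance) / mean * 100


wage_bill_a = 500 * 186     # workers * average monthly wage, Firm A
wage_bill_b = 600 * 175     # workers * average monthly wage, Firm B

cv_a = coefficient_of_variation(186, 81)
cv_b = coefficient_of_variation(175, 100)

print(wage_bill_a, wage_bill_b)
print(round(cv_a, 2), round(cv_b, 2))
```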
Moments
Calculation of central moments using moments about origin
Find the first, second, third and fourth moments about the origin and about the mean for the values 2, 3, 4, 5 and 6.
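A possible way to work this exercise is sketched below in Python. It computes the raw moments (about the origin) and then obtains the central moments from them using the standard relations μ₂ = μ₂′ − μ₁′², μ₃ = μ₃′ − 3μ₂′μ₁′ + 2μ₁′³ and μ₄ = μ₄′ − 4μ₃′μ₁′ + 6μ₂′μ₁′² − 3μ₁′⁴. These relations are assumed here as the content of "central moments using moments about origin"; the function names are illustrative.

```python
data = [2, 3, 4, 5, 6]


def raw_moment(xs, r):
    """r-th moment about the origin: sum(x^r) / n."""
    return sum(x ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """r-th moment about the mean, computed directly."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** r for x in xs) / len(xs)


m1, m2, m3, m4 = (raw_moment(data, r) for r in (1, 2, 3, 4))

# Central moments expressed through the raw moments.
mu1 = 0.0
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4

print(m1, m2, m3, m4)
print(mu1, mu2, mu3, round(mu4, 4))
```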
SKEWNESS AND KURTOSIS
Why Skewness?

❑ The mean is the center of gravity or balance point of a distribution.
❑ The median is the value that divides the distribution into two equal areas.
❑ The mode is the value with the largest frequency. None of these gives any information about the shape of the frequency curve.
❑ Measures of dispersion give some idea of the spread of a variable about its average.
❑ Neither kind of measure indicates whether a distribution is symmetrical or not.
❑ Skewness is a measure used to study this aspect of a statistical distribution.
❑ Skewness: When the mean, median and mode do not have the same value in a distribution, it is known as a skewed distribution. Skewness is the lack of symmetry in a distribution (an unsymmetrical distribution).
Negatively skewed: The frequency distribution is elongated to the left, that is, it has a longer tail to the left (mean < median < mode).
Positively skewed: The frequency distribution is elongated to the right, that is, it has a longer tail to the right (mean > median > mode).
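Measure of Skewness:

❑ Skewness = Mean − Mode

• Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{SD} = \dfrac{3(\text{Mean} - \text{Median})}{SD}$, where the second form follows from the empirical relation Mean − Mode = 3(Mean − Median).

• Skewness based on quartiles:

❑ Skewness $= Q_3 + Q_1 - 2Q_2$

Bowley's coefficient of skewness $= \dfrac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$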
Measure of Kurtosis
▪ The measures of location, dispersion and skewness alone cannot give a complete idea of a distribution. Three distributions can all be symmetrical about the mean,
▪ yet their frequency curves can differ in flatness or peakedness.
▪ Kurtosis is a measure of the flatness or peakedness of a distribution.

✓ Mesokurtic - Normal curve (or bell-shaped curve)

✓ Leptokurtic - Curve which is more peaked than the normal curve

✓ Platykurtic - Curve which is more flat-topped than the normal curve
Measures of Kurtosis
• Kurtosis is measured by the coefficient $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$, or equivalently by $\gamma_2 = \beta_2 - 3$

• For the normal distribution $\gamma_2 = 0$, $\beta_2 = 3$

• The normal distribution is taken as the standard distribution to measure kurtosis
• If $\gamma_2 = 0$, the curve is called mesokurtic
• If $\gamma_2 > 0$, it is called leptokurtic
• If $\gamma_2 < 0$, it is called platykurtic.
Otherwise,
• If $\beta_2 = 3$, the curve is called mesokurtic
• If $\beta_2 > 3$, the curve is called leptokurtic
• If $\beta_2 < 3$, the curve is called platykurtic
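The classification rules above can be sketched in Python. This assumes the standard definitions β₂ = μ₄/μ₂² and γ₂ = β₂ − 3, with μ₂ and μ₄ the second and fourth moments about the mean; the data set 2, 3, 4, 5, 6 is reused from the Moments exercise, and the function names and tolerance are illustrative choices.

```python
def kurtosis_coefficients(xs):
    """Return (beta2, gamma2) with beta2 = mu4 / mu2^2 and gamma2 = beta2 - 3."""
    n = len(xs)
    mean = sum(xs) / n
    mu2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    mu4 = sum((x - mean) ** 4 for x in xs) / n   # fourth central moment
    beta2 = mu4 / mu2 ** 2
    return beta2, beta2 - 3


def classify_kurtosis(gamma2, tol=1e-9):
    """Map gamma2 to the curve type; tol guards against float round-off."""
    if abs(gamma2) <= tol:
        return "mesokurtic"
    return "leptokurtic" if gamma2 > 0 else "platykurtic"


beta2, gamma2 = kurtosis_coefficients([2, 3, 4, 5, 6])
print(round(beta2, 2), round(gamma2, 2), classify_kurtosis(gamma2))
```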
