01 - Intro and Descriptive Statistics
3(2-1)
Theory
Introduction to statistics, Variables, Type of Measurements, Population
and Sample, Descriptive Statistics and decision making with Statistics,
Graphical representation of Data, Bar charts, Pie charts, Stem and leaf
plot, Box plots, Histograms, Frequency curves, Measures of Central
tendency, Measures of dispersion, Moments of frequency distribution,
examples with real life, Use of elementary statistical packages for
exploratory data analysis. Counting techniques, definition of probability
with classical, relative frequency and subjective approaches, sample
space, events, Laws of probability. Conditional probability and Bayes'
theorem with application to random variables (discrete and continuous):
Binomial, Poisson, Geometric, Negative binomial distributions;
Exponential, Gamma and Normal distributions.
Practical
R language
• R is an open-source language used for statistical computing and graphics. It is often used in statistical analysis and data mining, and can be used in analytics to identify patterns and build practical models.
What is Statistics?
Statistics is the art and science of extracting information from data
Statistics: Data → Information
Data: Raw facts and figures, especially numerical facts, collected together for information.
Information: Knowledge communicated concerning some particular facts.
Why study statistics?
1. Data are everywhere
2. Statistical techniques are used to make many decisions that affect
our lives
3. No matter what your career, you will make professional decisions
that involve data. An understanding of statistical methods will help
you make these decisions effectively.
Statistics is the science of collecting, organizing,
summarizing, and analyzing data and making
predictions or decisions about a population based on a
sample, while accounting for uncertainty.
Statistics
Population (has parameters): $\mu$, $\sigma$, $\rho$
Sample (has statistics): $\bar{X}$, $S$, $r$
Branches of Statistics
Descriptive: Involves the organization, summarization, and display of data in tables, graphs and summary numbers such as $\bar{X}$, $S$, $r$, $p$.
Inferential: Uses sample information such as $\bar{X}$, $S$, $r$, $p$ to draw inferences about unknown population parameters.
Variable
Qualitative: A characteristic which varies in quality (not numerically), e.g., eye colour, education level, behaviour, quality, design, performance.
Quantitative:
Discrete, e.g., no. of students, no. of chairs, no. of deaths, no. of births in a hospital, no. of accidents.
Continuous, e.g., height, weight, marks, time, distance, temperature.
Guess the type of Variable!
ID Sex Age Smoke Vitamin VitaminUse Quetelet Calories Fat Fiber Cholesterol
1 Female 64 No 1 Regular 21.4838 1298.8 57 6.3 170.3
2 Female 76 No 1 Regular 23.8763 1032.5 50.1 15.8 75.8
3 Female 38 No 2 Occasional 20.0108 2372.3 83.6 19.1 257.9
4 Female 40 No 3 No 25.1406 2449.5 97.5 26.5 332.6
5 Female 72 No 1 Regular 20.985 1952.1 82.6 16.2 170.8
6 Female 40 No 3 No 27.5214 1366.9 56 9.6 154.6
7 Female 65 No 2 Occasional 22.0115 2213.9 52 28.7 255.1
8 Female 58 No 1 Regular 28.757 1595.6 63.4 10.9 214.1
9 Female 35 No 3 No 23.0766 1800.5 57.8 20.3 233.6
10 Female 55 No 3 No 34.9699 1263.6 39.6 15.5 171.9
11 Female 66 No 1 Regular 20.9465 1460.8 58 18.2 137.4
12 Female 40 No 2 Occasional 36.4316 1638.2 49.3 14.9 130.7
13 Male 57 No 3 No 31.7304 2072.9 106.7 9.6 420
14 Female 66 No 1 Regular 21.7885 987.5 35.6 10.3 254.9
15 Male 66 No 3 No 27.3192 1574.3 75 7.1 361.5
Measurement
Measurement Scales
Nominal
• The nominal scale is the simplest type of measurement. It categorizes data
into distinct groups or categories without any quantitative value or order.
Ordinal
• Ordinal data have a meaningful order, but the gaps between ranks cannot be measured. Is the difference between “Very Good” and “Excellent” the same as the difference between “Good” and “Very Good”? We can’t say.
Example:
Students’ Grades
Class Positions
Cricket teams’ standings in the ICC ranking
Interval
• With interval data, we can add and subtract, but cannot multiply or divide.
Example:
Temperature
Shoe size
IQ scores
Ratio
• Ratio scales tell us the order and the exact difference between units, and they also have a “true zero” point, so ratios of values are meaningful.
Example:
Height, Weight, Speed, Length, Age
Storage Capacity: Measured in bytes, a storage capacity of 0 GB means
no storage. A 500 GB hard drive has twice the capacity of a 250 GB
hard drive.
Memory Usage: Memory measured in MB or GB where 0 means no
memory usage. 8 GB of RAM is twice as much as 4 GB
Just look at some of the Graphs …
[Figure: Line chart with respect to time]
[Figure: Bar chart]
[Figure: Multiple bar chart]
[Figure: Types of bar charts]
A histogram is a type of chart that represents the distribution of a numeric (continuous) variable by grouping the data into bins (intervals) and displaying the frequency or count of data points within each bin. Unlike a bar chart, which typically represents categorical data, a histogram is used for continuous or quantitative data.
Qualitative data
Example 1: Consider the data about Sex of 10 students
Sex M F M M F M F M M M
[Figure: Bar charts of Frequency by Sex (Male, Female); the second chart is split by section (Sec A, Sec B)]
Simple Bar Chart
• A bar chart is a type of chart which shows the values of different categories of data as rectangular bars with different lengths.
Example: Draw a Simple Bar Chart to represent the Population of 5 cities of the province Punjab.
Cities        Population ('000)
Lahore        10,355
Rawalpindi    4,765
Faisalabad    3,675
Sargodha      1,550
[Figure: Bar diagram showing Population of 5 cities of Punjab; x-axis: Cities; y-axis: Population in '000']
Component Bar Chart
Cities        Population ('000)   Components
Rawalpindi    4,765               2,478 + 2,287
Faisalabad    3,675               1,911 + 1,764
Sargodha      1,550               806 + 744
[Figure: Component bar chart of Population by city]
Example:
• The following data represent the number of infected plants in a sample of twenty experimental plots. Your task is to present them in tabular form.
1 2 4 3 0 1 2 3 1 1 0
2 1 0 2 3 0 0 1 3
Discrete Frequency Distribution
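The tabulation asked for in the example can be sketched in Python. This is a minimal sketch, not the course's prescribed method; the names `infected` and `freq_table` are chosen here for illustration.

```python
from collections import Counter

# Number of infected plants in each of the 20 experimental plots (from the example).
infected = [1, 2, 4, 3, 0, 1, 2, 3, 1, 1, 0,
            2, 1, 0, 2, 3, 0, 0, 1, 3]

# Count how often each distinct value occurs, then list the values in order.
counts = Counter(infected)
freq_table = [(x, counts[x]) for x in sorted(counts)]

print("x  f")
for x, f in freq_table:
    print(f"{x}  {f}")
print("Total", sum(f for _, f in freq_table))
```

Each row pairs a value of the variable with its frequency, and the frequencies add up to the number of plots.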
Graphical Representation of Discrete Data
[Figure: Bar chart representing the infected items; x-axis: No. of infected items (0, 1, 2, 3, 4); y-axis: Frequency (5, 6, 4, 4, 1)]
Pie Chart
• A pie chart is a type of graph in which a circle is divided into sectors
that each represent a proportion of the whole.
Example: The blood groups of 70 students were tested and the following results were obtained.
Blood group   No. of students
A             8
B             30
O             20
AB            12
[Figure: Pie chart of blood groups: A 11%, B 43%, O 29%, AB 17%]
Pie Chart
Blood Groups   No. of Students (f)   Relative frequency (rf)   Percent frequency   Angle (rf x 360)
A              8                     8/70 = 0.11               0.11*100 = 11       39.6
B              30                    0.43                      43                  154.8
O              20                    0.29                      29                  104.4
AB             12                    0.17                      17                  61.2
Total          70                    1.00                      100                 360
Divide the total angle of the circle (360°) into four segments as calculated.
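The relative frequencies, percentages and angles can be checked with a short Python sketch. The names `groups` and `pie_rows` are illustrative. The sketch keeps the relative frequencies unrounded, so its angles differ slightly from a table that rounds rf to two decimals first (for group A, about 41.1° instead of 39.6°).

```python
# Blood-group counts for the 70 students (from the example).
groups = {"A": 8, "B": 30, "O": 20, "AB": 12}
total = sum(groups.values())

pie_rows = []
for group, f in groups.items():
    rf = f / total                      # relative frequency
    pie_rows.append((group, f, rf, 100 * rf, 360 * rf))

for group, f, rf, pct, angle in pie_rows:
    print(f"{group:>2}  f={f:>2}  rf={rf:.2f}  %={pct:.0f}  angle={angle:.1f}")
```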
Simple Bar Chart
[Figure: Simple bar chart of Blood Groups against No. of Students (f); bar heights A = 8, B = 30, O = 20, AB = 12]
Simple Bar Chart
Year   Turnover in Rs.
2004   44,000
2005   49,000
2006   60,000
2007   64,000
[Figure: Simple bar chart of Turnover in Rs. by Years, 2002 to 2007]
Obtaining Data
• Published source: books, journals, newspapers, published reports
• Designed experiment: the researcher exerts strict control over the units
• Survey: a group of people are surveyed and their responses are recorded
• Administrative records
We will be dealing with various techniques for summarizing and describing qualitative data.
Qualitative
Univariate: Frequency Table, Percentages, Pie Chart, Bar Chart
Bivariate: Frequency Table, Component Bar Chart, Multiple Bar Chart
We will begin with the univariate situation, and will proceed to the bivariate situation.
Frequency Distribution & Histogram
The following data represent the plant height (cm) of a sample of 30 plants.
87 91 89 88 89 91 87 92 90 98
95 97 96 100 101 96 98 99 98 100
102 99 101 105 103 107 105 106 107 112
Classes    Frequency (f)   c.f.   r.f.    % freq
86–90      6               6      0.200   20.0
91–95      4               10     0.133   13.3
96–100     10              20     0.333   33.3
101–105    6               26     0.200   20.0
106–110    3               29     0.100   10.0
111–115    1               30     0.033   3.3
Total      30                     1.000   100.0
[Figure: Histogram; x-axis: Class Boundaries (85.5–90.5, 90.5–95.5, 95.5–100.5, 100.5–105.5, 105.5–110.5, 110.5–115.5); y-axis: Frequency (6, 4, 10, 6, 3, 1)]
Frequency Distribution
Some definitions
Class Limits
• The class limits are the values of the variable that are used to separate one class from the next. Sometimes classes are written as 20--25, 25--30, etc. In such a case, the class limits mean "20 but less than 25", "25 but less than 30", etc.
Class marks or midpoints
• The class mark or midpoint is the value which divides a class into two equal parts. It is obtained by dividing the sum of the lower and upper class limits (or class boundaries) of a class by 2.
Class interval
• The difference between either two successive lower class limits or two successive upper class limits, OR
• The difference between two successive midpoints.
• Denoted by "h".
Example
• The following data represent the heights of 30 wheat plants taken from the
experimental area. Construct a frequency distribution and appropriate
graphs to explain the distribution of the data:
87 91 89 88 89 91 87 92 90 98 95
97 96 100 101 96 98 99 98 100 102 99
101 105 103 107 105 106 107 112
Construction of a frequency distribution
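One way to build the grouped table for the 30 plant heights is sketched below in Python. The starting limit of 86, the class width of 5 and the six classes are taken from the classes 86–90, ..., 111–115 used for this data set; the variable names are illustrative only.

```python
# Heights (cm) of the 30 wheat plants from the example.
heights = [87, 91, 89, 88, 89, 91, 87, 92, 90, 98, 95,
           97, 96, 100, 101, 96, 98, 99, 98, 100, 102, 99,
           101, 105, 103, 107, 105, 106, 107, 112]

lower, width, n_classes = 86, 5, 6   # classes 86-90, 91-95, ..., 111-115

rows = []
cumulative = 0
for k in range(n_classes):
    lo = lower + k * width
    hi = lo + width - 1
    f = sum(lo <= h <= hi for h in heights)   # class frequency
    cumulative += f                           # cumulative frequency (c.f.)
    rows.append((f"{lo}-{hi}", f, cumulative, round(f / len(heights), 3)))

print("Class      f   c.f.  r.f.")
for label, f, cf, rf in rows:
    print(f"{label:<9} {f:>2}  {cf:>4}  {rf:.3f}")
```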
Class Boundaries
• Subtract any upper class limit from the lower class limit of the next class and divide the difference by 2; this gives the continuity correction factor.
• Subtract this factor from all lower class limits and add it to all upper class limits.
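The two steps can be sketched in Python for the plant-height classes 86–90, ..., 111–115. The names `class_limits`, `correction`, `boundaries` and `midpoints` are illustrative, and the midpoints are added here only as a by-product.

```python
# Class limits used for the plant-height data.
class_limits = [(86, 90), (91, 95), (96, 100), (101, 105), (106, 110), (111, 115)]

# Continuity correction factor: half the gap between one class's upper limit
# and the next class's lower limit.
correction = (class_limits[1][0] - class_limits[0][1]) / 2

# Lower limits move down and upper limits move up by the correction factor.
boundaries = [(lo - correction, hi + correction) for lo, hi in class_limits]

# Class marks (midpoints) are the average of the two limits.
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]

print(correction)
print(boundaries)
print(midpoints)
```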
Histogram
[Figure: Histogram of height of 30 plants; x-axis: Class Boundaries (85.5–90.5, 90.5–95.5, 95.5–100.5, 100.5–105.5, 105.5–110.5, 110.5–115.5); y-axis: Frequency (6, 4, 10, 6, 3, 1)]
Frequency Polygon
[Figure: Frequency polygon; x-axis: Mid Points (88, 93, 98, 103, 108, 113); y-axis: Frequency]
Cumulative Frequency Polygon / Ogive
[Figure: Ogive; x-axis: Upper Class Boundaries (90.5, 95.5, 100.5, 105.5, 110.5, 115.5); y-axis: Cumulative Frequency]
Stem & Leaf Display
Example
Use the data below to make a stem-and-leaf plot by taking 10 as a unit.
85 115 126 92 104
85 116 100 121 123
79 90 110 129 108
107 78 131 114 92
131 88 97 99 116
93 84 75 70 132
Stem   Leaf
7      0589
8      4558
9      022379
10     0478
11     04566
12     1369
13     112
In the first row (stem 7, leaves 0589), the values are 70, 75, 78 and 79.
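A stem-and-leaf display with a stem unit of 10 can be produced with a few lines of Python. This is a sketch: `divmod(value, 10)` splits each value into its stem (tens and above) and its leaf (last digit), and the names used are illustrative.

```python
from collections import defaultdict

# The 30 observations from the example.
data = [85, 115, 126, 92, 104, 85, 116, 100, 121, 123,
        79, 90, 110, 129, 108, 107, 78, 131, 114, 92,
        131, 88, 97, 99, 116, 93, 84, 75, 70, 132]

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)   # e.g. 126 -> stem 12, leaf 6
    stems[stem].append(leaf)

# Join the ordered leaves of each stem into one string.
stem_leaf = {s: "".join(str(leaf) for leaf in leaves)
             for s, leaves in sorted(stems.items())}

for s, leaves in stem_leaf.items():
    print(f"{s:>2} | {leaves}")
```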
Example of Stem & Leaf display – Class Width = 5
Stem   Leaf
8*     -
8.     79897
9*     1120
9.     857668989
10*    010213
10.    57567
11*    2
11.    -
* indicates leaves 0–4 and . indicates leaves 5–9; (*, .) are called placeholders.
Back to Back Stem and Leaf display
• Two data sets can be compared using a back-to-back stem and leaf
display. In this case a single stem is constructed; the values of one
data set are placed on the left of the stem and the values of the second
data set are placed on the right.
Data 1) 32, 45, 38, 41, 49, 36, 52, 56, 51, 62, 63, 59, 68
Data 2) 23, 58, 26, 57, 55, 65, 29, 36, 59, 69, 60
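Stem unit = 10, Width of class = 10
Key: in Data 1, leaf 2 on stem 3 represents 32 and leaf 8 on stem 6 represents 68; in Data 2, leaf 3 on stem 2 represents 23 and leaf 9 on stem 6 represents 69.
Data 1 (#13)          Data 2 (#11)
Leaf          Stem    Leaf
              2       369
682           3       6
915           4
9162          5       8759
832           6       590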
Measures of Central Tendency
Arithmetic mean / mean
• Most common measure of the center
• Obtained by dividing the SUM of all the observations by the total number of observations
Population Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Properties of Arithmetic mean
1. The mean of a constant is equal to that constant.
2. The sum of the deviations of the observations from their mean is equal to zero, i.e., $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$
$\bar{X} = \frac{\sum X}{n} = \frac{548}{8} = 68.5$
Properties of Arithmetic mean
4. If $X_1, X_2, \ldots, X_n$ have mean $\bar{X}$, then the mean after multiplying each observation by a constant ‘a’ is the original mean multiplied by that constant.
$\bar{X}^* = \frac{\sum_{i=1}^{n} aX_i}{n} = a \times \bar{X}$
5. If a constant ‘a’ is added to each of the observations $X_1, X_2, \ldots, X_n$ having mean $\bar{X}$, then the mean increases by that constant.
$\bar{X}^* = \frac{\sum_{i=1}^{n} (a + X_i)}{n} = \bar{X} + a$
Weighted arithmetic mean
Computations:
For ‘n’ observations $X_1, X_2, \ldots, X_n$ of a data set with corresponding weights $W_1, W_2, \ldots, W_n$, the weighted arithmetic mean is defined as:
$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{\sum WX}{\sum W}$
Weighted A.M - Calculations
Subjects      Marks (Xi)   Weights (Wi)   WiXi
Statistics    80           20             1600
Mathematics   75           10             750
Chemistry     50           40             2000
English       60           30             1800
Total         265          100            6150
Simple mean: $\bar{X} = \frac{\sum X}{n} = \frac{265}{4} = 66.25$
Weighted mean: $\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{6150}{100} = 61.5$
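The two means in this calculation can be reproduced with a short Python sketch. The dictionary `marks` and the other names are illustrative; each entry holds a (mark, weight) pair from the table.

```python
# (mark, weight) for each subject, from the calculation table.
marks = {
    "Statistics": (80, 20),
    "Mathematics": (75, 10),
    "Chemistry": (50, 40),
    "English": (60, 30),
}

# Simple mean ignores the weights.
simple_mean = sum(x for x, _ in marks.values()) / len(marks)

# Weighted mean: sum of W*X divided by the sum of W.
weighted_mean = (sum(x * w for x, w in marks.values())
                 / sum(w for _, w in marks.values()))

print(simple_mean, weighted_mean)
```

The weighted mean is lower than the simple mean here because the largest weight (40) is attached to the lowest mark (50).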
Combined Mean
Example: The mean heights and the number of students in three sections of a statistics class are given below. Calculate the overall (or combined) mean height of the students.
Sections   Number of students   Mean height (inches)
A          40                   62
B          37                   58
C          43                   61
Solution:
Note that we have $n_1 = 40$, $n_2 = 37$, $n_3 = 43$ and $\bar{x}_1 = 62$, $\bar{x}_2 = 58$, $\bar{x}_3 = 61$. So the combined mean is:
$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{n_1 + n_2 + n_3} = \frac{7249}{120} = 60.4$
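The combined mean is a weighted mean of the section means, with the section sizes as weights. A minimal Python sketch (the list `sections` is an illustrative name) is:

```python
# (number of students, mean height in inches) for sections A, B and C.
sections = [(40, 62), (37, 58), (43, 61)]

total_height = sum(n * mean for n, mean in sections)   # 2480 + 2146 + 2623
total_students = sum(n for n, _ in sections)

combined_mean = total_height / total_students
print(round(combined_mean, 1))
```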
Tasks
1. The mean weight of 10 students is 50 kg. When two students leave the class, the mean weight becomes 48 kg. Find the mean weight of the students who left the class. Answer = 58
2. There are 30 students in total in a class. On Thursday, 18 students took a math test and their mean mark was 80. The remaining 12 students took the test on Friday and their mean mark was 90. Find the mean mark of the entire class. Answer = 84
3. Ali took five math tests during the semester and the mean of his test scores was 85. If his mean after the first three tests was 83, what was the mean of his 4th and 5th tests? Answer = 88
Geometric mean & harmonic mean
• The Geometric Mean (G.M) of a set of n positive values $x_1, x_2, \ldots, x_n$ is the positive nth root of the product of the values.
$G.M = \left( \prod_{i=1}^{n} X_i \right)^{1/n}$
$G.M = \text{Antilog}\left( \frac{\sum_{i=1}^{n} \log X_i}{n} \right)$
• The Harmonic Mean (H.M) of a set of n values $x_1, x_2, \ldots, x_n$ is defined as the reciprocal of the arithmetic mean of the reciprocals of the values.
$H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$
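Example
Find the Geometric Mean and Harmonic Mean from the following data.
X       log X    1/X
3       0.477    0.333
5       0.699    0.200
6       0.778    0.167
6       0.778    0.167
7       0.845    0.143
10      1.000    0.100
12      1.079    0.083
Total   49       5.6567   1.1929
$G.M = \text{Antilog}\left( \frac{5.6567}{7} \right) = \text{Antilog}(0.8081) = 6.43$
$H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} = \frac{7}{1.1929} = 5.87$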
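Both means can be checked in Python. The sketch assumes the seven observations 3, 5, 6, 6, 7, 10, 12, which is what the column totals (49, 5.6567 and 1.1929) imply; the variable names are illustrative.

```python
import math

# Assumed data set: its sum (49), sum of logs and sum of reciprocals
# agree with the totals row of the example table.
x = [3, 5, 6, 6, 7, 10, 12]
n = len(x)

# Geometric mean as the nth root of the product.
gm_product = math.prod(x) ** (1 / n)

# Geometric mean through base-10 logs (the "antilog" form).
gm_logs = 10 ** (sum(math.log10(v) for v in x) / n)

# Harmonic mean: n divided by the sum of reciprocals.
hm = n / sum(1 / v for v in x)

print(round(gm_product, 2), round(gm_logs, 2), round(hm, 2))
```

The two geometric-mean routes agree, and the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean of 7.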
Mode
[Figure: Two dot plots; on the scale 0 to 14, Mode = 9; on the scale 0 to 6, No Mode]
Median
$\text{Median} = \text{size of } \left( \frac{n+1}{2} \right)\text{th observation}$
Quartiles
▪ Divide an array into four equal parts, each part having 25% of the distribution of the data values, denoted by $Q_j$
▪ 25% of the observations are below the 1st quartile.
▪ The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile, which is also the median; and the 3rd quartile is the 75th percentile.
$Q_j = \text{size of } j\left( \frac{n+1}{4} \right)\text{th observation}$, where j = 1, 2, 3
Deciles
▪ Divide an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by $D_j$
▪ 10 percent of the total observations fall below $D_1$ and the remaining 90% are above it.
▪ The 5th decile is equal to $Q_2$ and the median.
$D_j = \text{size of } j\left( \frac{n+1}{10} \right)\text{th observation}$, where j = 1, 2, 3, …, 9
Percentiles
▪ Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
▪ The jth percentile, denoted as $P_j$, is the data value that separates the bottom j% of the data from the top (100-j)%.
$P_j = \text{size of } j\left( \frac{n+1}{100} \right)\text{th observation}$, where j = 1, 2, 3, …, 99
Example
▪ Suppose Ali was told that, relative to the other scores on an NTS test, his score was the 95th percentile, i.e., his percentile score is 95. How do we interpret it?
➔ This means that 95% of those who took the test had scores less than or equal to Ali’s score, while 5% had scores higher than Ali’s.
Exercise
Sr. No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
18 20 28 29 30 36 37 39 42 53 54 55 58 61 68 70 74 82 93 94
Median & Quartiles
• $\text{Median} = \text{size of } \left( \frac{n+1}{2} \right)\text{th observation}$
= Size of 10.5th observation
= 10th observation + 0.5 (11th observation – 10th observation)
= 53 + 0.5 (54 – 53)
= 53.5
• $Q_3 = \text{size of } 3\left( \frac{n+1}{4} \right)\text{th observation}$
= Size of 15.75th observation
= 15th observation + 0.75 (16th observation – 15th observation)
= 68 + 0.75 (70 – 68)
= 69.5
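The position rule j(n+1)/k with linear interpolation between neighbouring observations can be sketched as a small Python function. The function name `position_quantile` is hypothetical, and clamping positions that fall outside the data to the smallest or largest value is an assumption made here, since the slides do not cover that case. Software packages often use other quantile rules, so their results can differ.

```python
def position_quantile(values, j, parts):
    """j-th quantile when the data are cut into `parts` parts, using the
    j*(n+1)/parts position rule with linear interpolation."""
    data = sorted(values)
    n = len(data)
    pos = j * (n + 1) / parts          # 1-based position, possibly fractional
    k = int(pos)                       # whole part of the position
    frac = pos - k                     # fractional part of the position
    if k < 1:                          # assumption: clamp below the first value
        return data[0]
    if k >= n:                         # assumption: clamp above the last value
        return data[-1]
    return data[k - 1] + frac * (data[k] - data[k - 1])

# The 20 ordered observations from the exercise.
scores = [18, 20, 28, 29, 30, 36, 37, 39, 42, 53,
          54, 55, 58, 61, 68, 70, 74, 82, 93, 94]

median = position_quantile(scores, 1, 2)   # 10.5th observation
q1 = position_quantile(scores, 1, 4)       # 5.25th observation
q3 = position_quantile(scores, 3, 4)       # 15.75th observation
print(q1, median, q3)
```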
Example
Minimum = 20
Q1 = 36.25
Median = 54.5
Q3 = 73
Maximum = 94
Measures of Variation/ Dispersion
• In Statistics, dispersion (also called variability, scatter, or spread) denotes how stretched or squeezed a distribution is.
• Variability is the extent to which data points in a statistical distribution or data set diverge from the average, or mean, value, as well as the extent to which these data points differ from each other.
• The following are the commonly used measures of variability:
• Variance
• Standard Deviation
• Range
• Inter Quartile Range
• Semi Inter Quartile Range
• Mean Deviation
Variance
Sample Variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Standard Deviation
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample Standard Deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
Example 1
• Consider the following data of height (cm) of 5 plants: 2, 4, 6, 8, 10
• Find the average, variance and the standard deviation of the height.
$X$     $(X - \bar{X})$   $(X - \bar{X})^2$
2       -4                16
4       -2                4
6       0                 0
8       2                 4
10      4                 16
Total   30     0          40
$\bar{X} = \frac{\sum X}{n} = \frac{30}{5} = 6$
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{40}{5-1} = 10$
$S = \sqrt{10} = 3.16$
Example 2
• Consider the following data of yield of wheat (in kgs) from 8 experimental plots: 65, 71, 67, 75, 63, 69, 75, 63
• Find the average, variance and the standard deviation of the yield.
$X$     $(X - 68.5)$   $(X - 68.5)^2$
65      -3.5           12.25
71      2.5            6.25
67      -1.5           2.25
75      6.5            42.25
63      -5.5           30.25
69      0.5            0.25
75      6.5            42.25
63      -5.5           30.25
Total   548    0       166
$\bar{X} = \frac{\sum X}{n} = \frac{548}{8} = 68.5$
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{166}{8-1} = 23.71$
$S = \sqrt{23.71} = 4.87$
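The hand calculation for the wheat-yield data can be cross-checked in Python. This sketch computes the sum of squared deviations directly and compares the result with `statistics.variance`, which also divides by n − 1; the variable names are illustrative.

```python
import statistics

# Yield of wheat (kg) from the 8 experimental plots.
yields = [65, 71, 67, 75, 63, 69, 75, 63]

mean = statistics.mean(yields)
squared_deviations = sum((x - mean) ** 2 for x in yields)

variance = squared_deviations / (len(yields) - 1)   # sample variance
sd = variance ** 0.5                                # sample standard deviation

print(mean, squared_deviations, round(variance, 2), round(sd, 2))
```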
The Range & Coefficient of Range
• The Range R is defined as the difference between the largest and the smallest observations in a data set, i.e.,
$R = X_{max} - X_{min}$
$\text{Coeff. of Range} = \frac{X_{max} - X_{min}}{X_{max} + X_{min}}$
Example
Semi Inter Quartile Range / Quartile Deviation
• The inter quartile range (IQR) is a measure of dispersion, defined as
$IQR = Q_3 - Q_1$
• The Semi Inter Quartile Range or Quartile Deviation (QD) is defined as
$QD = \frac{Q_3 - Q_1}{2}$
• The Co-efficient of Quartile Deviation is defined as
$\text{Coeff. of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
The Mean Deviation OR Average Deviation
Co-efficient of Variation (CV)
Example
The following data represent the prices in Rs. of a certain commodity: 8, 13, 18, 23, 30
Sol: $\bar{X} = 18.4$ Rs., $S_x = 8.56$ Rs., $C.V = 46.5\%$
The following data represent the life of a car battery in hours: 130, 150, 180, 250, 345
Sol: $\bar{Y} = 211$ Hrs., $S_y = 87.63$ Hrs., $C.V = 41.5\%$
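The two coefficients of variation can be reproduced in Python, assuming the usual definition C.V = (S / mean) × 100 with the sample standard deviation, which matches the figures in the example. The function name `coefficient_of_variation` is illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

prices = [8, 13, 18, 23, 30]              # prices in Rs.
battery_life = [130, 150, 180, 250, 345]  # battery life in hours

cv_prices = coefficient_of_variation(prices)
cv_battery = coefficient_of_variation(battery_life)
print(round(cv_prices, 1), round(cv_battery, 1))
```

Because the C.V is unit-free, it allows the two data sets to be compared even though one is in rupees and the other in hours; the prices are relatively more variable.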
Types of Distribution
Measures of Skewness
Measures of Kurtosis
• Describes the extent of peakedness or flatness of the distribution of the data.
• Measured by the Coefficient of Kurtosis (K), computed as
$K = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{n S^4} - 3$
Interpretation
K = 0: mesokurtic
K > 0: leptokurtic
K < 0: platykurtic
Example
Consider the following data: 25, 27, 36, 31, 33, 35, 37
Find the Mean, Variance, Coefficient of Skewness and Coefficient of Kurtosis and interpret the results.
Mean                 32
Standard Error       1.73
Median               33
Standard Deviation   4.58
Sample Variance      21
Kurtosis             -1.65
Skewness             -0.39
Range                12
Minimum              25
Maximum              37
Sum                  224
Count                7
How to do it…
$X$     $X - \bar{X}$   $(X - \bar{X})^2$   $(X - \bar{X})^3$   $(X - \bar{X})^4$
25      -7              49                  -343                2401
27      -5              25                  -125                625
36      4               16                  64                  256
31      -1              1                   -1                  1
33      1               1                   1                   1
35      3               9                   27                  81
37      5               25                  125                 625
Total   224    0        126                 -252                3990
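The column totals of this working table, and the coefficient of kurtosis $K = \frac{\sum (X_i - \bar{X})^4}{n S^4} - 3$ with the sample variance $S^2$, can be computed in Python as sketched below. The names are illustrative. Statistical packages often apply small-sample adjustments to kurtosis and skewness, so the value printed here (about -1.71) need not equal a package's summary output.

```python
# Data from the example.
data = [25, 27, 36, 31, 33, 35, 37]
n = len(data)
mean = sum(data) / n

# Deviations from the mean and the sums of their 1st to 4th powers
# (the column totals of the working table).
deviations = [x - mean for x in data]
power_sums = {p: sum(d ** p for d in deviations) for p in (1, 2, 3, 4)}

sample_variance = power_sums[2] / (n - 1)            # S^2
K = power_sums[4] / (n * sample_variance ** 2) - 3   # S^4 = (S^2)^2

print(power_sums)
print(sample_variance, round(K, 2))
```

A negative K indicates a flatter (platykurtic) distribution under the interpretation rule for K.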
Five Number Summary
Boxplot / Box & Whisker plot
53 74 82 42 39 28 20 81 68 58
54 93 70 30 61 55 36 37 29 94
Construct Boxplot of the data and interpret it.
Minimum = 20
Q1 = 36.25
Median = 54.5
Q3 = 73
Maximum = 94
Question
The breaking strength of 20 test pieces of a certain alloy is given as:
95, 97, 96, 73, 78, 95, 89, 68, 82, 79, 69, 67, 83, 94, 87, 93, 103, 108, 117, 130
Calculate the average breaking strength of the alloy and the standard deviation.
Calculate the percentage of observations lying within the limits:
(i) Mean ± S
(ii) Mean ± 2S
(iii) Mean ± 3S
Mean = 90.15
Variance $S^2$ = 269.08
SD S = 16.4
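The three percentages can be counted directly in Python. This is a sketch with illustrative names; it uses the sample standard deviation from `statistics.stdev`, which agrees with the S given for this question.

```python
import statistics

# Breaking strength of the 20 test pieces.
strength = [95, 97, 96, 73, 78, 95, 89, 68, 82, 79, 69, 67, 83,
            94, 87, 93, 103, 108, 117, 130]

mean = statistics.mean(strength)
s = statistics.stdev(strength)

def percent_within(k):
    """Percentage of observations inside mean ± k*S."""
    lo, hi = mean - k * s, mean + k * s
    inside = sum(lo <= x <= hi for x in strength)
    return 100 * inside / len(strength)

percentages = {k: percent_within(k) for k in (1, 2, 3)}
print(round(mean, 2), round(s, 1), percentages)
```

The observed shares (65%, 95% and 100%) are close to the roughly 68%, 95% and 99.7% expected for a normal curve.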
Characteristics of Normal Curve: About 68 percent of the observations fall
within one SD of the mean; about 95 percent fall within two SD of the mean;
and about 99.7 percent fall within three SD of the mean.
Standardized Variable
Solution
$Z_i = \frac{X_i - \bar{X}}{S}$
$X_i$   $X_i - \bar{X}$   $(X_i - \bar{X})^2$   $Z_i$
25      -13               169                   -0.8905
26      -12               144                   -0.8220
23      -15               225                   -1.0275
25      -13               169                   -0.8905
45      7                 49                    0.4795
45      7                 49                    0.4795
58      20                400                   1.3700
58      20                400                   1.3700
50      12                144                   0.8220
25      -13               169                   -0.8905
Total   380    0          1918                  0.00
$\bar{X} = 38$, $S_X = 14.598$
$\bar{Z} = 0$, $Var(Z) = 1$, $S_Z = 1$
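Standardization can be verified with a short Python sketch: subtract the mean from each value, divide by the sample standard deviation, and check that the resulting Z values have mean 0 and standard deviation 1. The names are illustrative.

```python
import statistics

# Data from the solution table.
x = [25, 26, 23, 25, 45, 45, 58, 58, 50, 25]

mean = statistics.mean(x)
s = statistics.stdev(x)          # sample standard deviation (n - 1 divisor)

# Standardized values Z_i = (X_i - mean) / S.
z = [(value - mean) / s for value in x]

print(mean, round(s, 3))
print([round(v, 4) for v in z])
print(round(statistics.mean(z), 10), round(statistics.stdev(z), 10))
```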