DATA SCIENCE-08
(25/01/2024)
By- Ms. Pallavi Mishra
(Faculty Associate)
Topics to be covered:
• Percentiles and Quartiles
• Five number summary and Box plot (Whisker Plot)
Overview
Statistics is the science of collecting, organizing, summarizing, and analysing
information to draw conclusions or answer questions.
Types of Statistics
• Descriptive Statistics: describing and summarizing a population or sample
• Inferential Statistics: making inferences about a population based on a sample of data
Topics shown on the overview chart
• Central Tendency: Mean, Median, Mode
• Measures of Dispersion: Range, Variance, Standard Deviation, Coefficient of variation
• Five number summary: Minimum, Q1, Q2 (Median), Q3, Maximum
• Distribution: Normal, Uniform, Skewness, Kurtosis
• Charts and other tools: Histogram, Box plot, Scatter Plot, Correlation Plot, Line chart,
Bar Chart, Pie Chart, Bubble Chart, Cross Tabulations, Decision Tree
Percentiles and Quartiles:
(prerequisite knowledge for the five number summary)
First rule before calculating percentile ranking
• When calculating the percentile of a set of data, arrange the values in
ascending order, starting with the lowest value and ending with
the highest.
Concept of Percentile
• A percentile is a value below which a given percentage of the scores
fall. For example, the 60th percentile is the value below which 60% of
the scores lie.
Questions to practice for percentile calculation
• Example 1: The scores obtained by 10 students are 38, 47, 49, 58,
60, 65, 70, 79, 80, 92. Using the percentile formula, calculate the
percentile for the score 70.
Solution 01:
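The percentile formula itself is not reproduced in these notes. A minimal sketch of Example 1, assuming the common definition Percentile = (number of values below x / total number of values) × 100, might look like this. The function name `percentile_rank` is illustrative, not taken from the lecture.

```python
def percentile_rank(scores, value):
    """Percentage of scores strictly below `value` (assumed definition)."""
    ordered = sorted(scores)                      # first rule: sort ascending
    below = sum(1 for s in ordered if s < value)  # count the scores under `value`
    return 100 * below / len(ordered)

scores = [38, 47, 49, 58, 60, 65, 70, 79, 80, 92]
result = percentile_rank(scores, 70)
print(result)  # 6 of the 10 scores lie below 70, so this prints 60.0
```

Under this definition the score 70 sits at the 60th percentile. Other textbooks count values "at or below" x, which would give a different answer for the same data.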
Percentile Calculation
• Example 2: What is the percentile of the value 10 in the given dataset?
Solution 02:
Five Number Summary
Quartiles
First Quartile (Q1)
• Median of the bottom half of the sorted data
Second Quartile (Q2)
• Also known as the median: the middle value of the sorted data
Third Quartile (Q3)
• Median of the top half of the sorted data
Summary of Quartiles
• Quartiles split a set of data into four equal parts. The first quartile,
Q1, separates the smallest 25% of the values from the other 75% that
are larger.
• The second quartile, Q2, is the median: 50% of the values are
smaller than the median and 50% are larger.
• The third quartile, Q3, separates the smallest 75% of the values from
the largest 25%.
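The "median of each half" idea can be sketched in a few lines of Python. The dataset below is hypothetical, and leaving the middle value out of both halves when the count is odd is an assumption; textbooks differ on that point.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]  # bottom half
    # Assumption: for an odd count, the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]  # hypothetical dataset
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # 6.0 12 16.0
```

Here Q1 is the median of 3, 5, 7, 8 and Q3 is the median of 13, 14, 18, 21, so a quarter of the values lie below Q1 and a quarter lie above Q3.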
How to detect outliers in a dataset using the five number summary and box plot (whisker plot)
Let's understand with a simple example:
Calculate the five number summary for the following dataset, then interpret
and analyze the results.
Solution:
Step 01:
• Calculate the median (second quartile, Q2): sort the dataset and find the
median
Step 02:
• Calculate the first quartile (Q1) using the formula
Step 03:
• Calculate the third quartile (Q3) using the formula
Step 04: Determine the inter-quartile range (IQR)
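The formulas used in Steps 02 to 04 are not reproduced in these notes. One common textbook convention, assumed here rather than taken from the lecture, locates the quartiles by position in the sorted data of n values:

```latex
Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ term}, \qquad
Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{ term}, \qquad
\mathrm{IQR} = Q_3 - Q_1
```

When a position is fractional (for example the 2.5th term), the value is usually taken by interpolating between the two neighbouring terms. Other conventions exist and can give slightly different quartiles for the same data.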
Step 05: Determine the range (min and max values) in order to detect the
presence of any outliers
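A minimal sketch of this step, assuming the widely used 1.5 × IQR rule: any value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR is flagged as an outlier. The multiplier 1.5, the dataset, and the function name `outlier_fences` are assumptions for illustration. The quartiles come from `statistics.quantiles` with `method="exclusive"`, which places Q1 at the (n + 1)/4-th position.

```python
from statistics import quantiles

def outlier_fences(values, k=1.5):
    """Return (lower, upper) limits; values outside them count as outliers."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1                    # inter-quartile range
    return q1 - k * iqr, q3 + k * iqr

data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 60]  # hypothetical dataset
lower, upper = outlier_fences(data)
outliers = [v for v in data if v < lower or v > upper]
print(lower, upper)  # -11.875 37.125
print(outliers)      # [60]
```

For this data Q1 = 6.5 and Q3 = 18.75, so IQR = 12.25, and only the value 60 falls outside the limits.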
Five Number summary values
Step 06: Construct the box plot (whisker plot)
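The numbers a box plot displays can be collected before any drawing is done. The sketch below assumes the 1.5 × IQR convention for whiskers: each whisker ends at the most extreme data point still inside the limits, and points beyond them are drawn individually. The dataset and the function name `box_plot_stats` are hypothetical.

```python
from statistics import quantiles

def box_plot_stats(values, k=1.5):
    """Collect the values a box plot draws (assumed 1.5 x IQR whiskers)."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in ordered if low <= v <= high]
    return {
        "whislo": inside[0],    # end of the lower whisker
        "q1": q1,               # bottom edge of the box
        "med": med,             # line inside the box
        "q3": q3,               # top edge of the box
        "whishi": inside[-1],   # end of the upper whisker
        "fliers": [v for v in ordered if v < low or v > high],  # outliers
    }

data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 60]  # hypothetical dataset
stats = box_plot_stats(data)
print(stats)
```

The box spans Q1 to Q3, the whiskers reach 3 and 21, and 60 appears as a separate point. The dictionary keys were chosen to match those that matplotlib's `Axes.bxp` is understood to accept, so the result could probably be drawn with `ax.bxp([stats])`; that is an assumption to check against the matplotlib documentation.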