
Probability and Statistics Lecture 07, 08

This document covers the fundamentals of descriptive statistics, focusing on numerical measures such as central tendency, spread, and shape of data distributions. It explains sample statistics versus population parameters, types of averages including arithmetic, geometric, and harmonic means, and provides examples for calculating means from both ungrouped and grouped data. Additionally, it highlights the characteristics of a good measure of central tendency and includes practical examples to illustrate the concepts.

Uploaded by

ajwaqadir121
Copyright
© © All Rights Reserved

Federal Urdu University of Arts, Science and Technology, Islamabad

Lecture No. 07, 08 BS-AI

Probability and Statistics

Instructor: Ms. Mehr Shabab Sundas


Lecturer Statistics (Gold Medalist)
Descriptive Statistics: Numerical Measures

• Descriptive Statistics summarize and describe the main features of a dataset.

• Numerical Measures are quantitative summaries that describe:

  • Center of the data distribution

  • Spread (dispersion) of the data

  • Shape of the data distribution


Center of the data distribution:

• Mean

• Median

• Mode
Spread (dispersion) of the data:
• Range

• Variance

• Standard deviation

• Interquartile range (IQR)


Shape of the data distribution

• Skewness (symmetry or asymmetry of data)

• Kurtosis (peakedness or flatness of data)

• Distribution type (e.g., normal, uniform, skewed)


Sample Statistics and Population Parameters

• If the measures are computed for data from a sample, they are called sample
statistics.

• If the measures are computed for data from a population, they are called
population parameters.

• A sample statistic is referred to as the point estimator of the corresponding
population parameter.
Measures of Location/ Measures of Central Tendency
Measures of Central Tendency (Averages)
• A data set can be summarized in a single value, usually located near the center, representing
the entire data set.

• This value indicates where the data have a tendency to concentrate.

• Two important points to note:

• A measure of central tendency should lie within the range of the data.

• It should remain unchanged when the observations are rearranged in a different order.

• The measures of central tendency or location are generally known as Averages.


Characteristics of a Good Measure of Central Tendency:

• It should be easy to calculate and simple to understand.

• It should be clearly defined by a mathematical formula.

• It should not be affected by extreme values (outliers).

• It should be based on all observations in the dataset.

• It should be capable of further mathematical treatment.

• It should have sample stability (remain consistent across samples).


Types of Averages:
• The most common types of averages are:
• Arithmetic Mean (or simply Mean)

• Geometric Mean

• Harmonic Mean

• Median

• Mode

• Explanation:

• The first three types (Mean, Geometric Mean, Harmonic Mean) are mathematical in character and indicate the
magnitude of observed values.

• The fourth type (Median) indicates the middle position of the data.

• The fifth type (Mode) provides information about the most frequent value in the data set.
Mean:

• The mean is perhaps the most important measure of location.

• It provides a measure of central location in a data set.

• The mean is calculated as the average of all data values.

• The sample mean ($\bar{x}$) is the point estimator of the population mean ($\mu$).
Types of Mean:

1. Arithmetic Mean

2. Geometric Mean

3. Harmonic Mean
The Arithmetic Mean/ Simply Mean/ Average

• The arithmetic mean is the most commonly used average.

Definition:

• A value obtained by dividing the sum of all observations by their number, that is

$$\text{Mean} = \frac{\text{Sum of all the observations}}{\text{Number of the observations}}$$
The Arithmetic Mean

• The population mean is a fixed quantity.

• The sample mean is a variable because different samples from the same
population tend to have different means.
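The contrast between a fixed population mean and a varying sample mean can be illustrated with a small simulation. The sketch below is only an illustration: the population values, the sample size of 3, the number of samples, and the random seed are all assumptions chosen for the example, not data from the lecture.

```python
import random

# Hypothetical finite population (values chosen only for illustration).
population = [4, 6, 5, 7, 8, 10, 12, 3, 9, 6]

# The population mean is a single fixed number.
population_mean = sum(population) / len(population)

random.seed(0)  # fixed seed so the run is repeatable

# Draw several samples of size 3 (without replacement) and record each mean.
sample_means = []
for _ in range(20):
    sample = random.sample(population, 3)
    sample_means.append(sum(sample) / len(sample))

print("Population mean:", population_mean)
print("Sample means:", [round(m, 2) for m in sample_means])
```

Each sample mean is an estimate of the same population mean, but the estimates differ from sample to sample.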
The Arithmetic Mean (Ungrouped Data)
A population mean is traditionally denoted by $\mu$ (the Greek letter mu). Thus, the
population mean of a set of $N$ observations $x_1, x_2, x_3, \ldots, x_N$ drawn from a population is
given as:

$$\mu = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$$

where $\Sigma$, the Greek capital sigma, is a convenient symbol for summation.


The Arithmetic Mean (Ungrouped Data)
A sample mean is usually denoted by placing a bar over the symbol used to
represent the observations or the variable.
The mean of a set of $n$ observations $x_1, x_2, x_3, \ldots, x_n$ drawn from a sample is
defined as:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $\bar{x}$ is the mean of a sample of size $n$.
The Arithmetic Mean
(Example of Ungrouped Population Mean)

Example:
The number of AI models deployed by 5 different tech companies are 4, 6, 5, 7, and 8. Treating the data as a
population, find the mean number of AI models deployed.
Solution:
Since the data are considered as a finite population,

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{4 + 6 + 5 + 7 + 8}{5} = \frac{30}{5} = 6$$
Interpretation:
On average, each company has deployed 6 AI models.
This means that the central tendency or typical level of AI model deployment among these companies is 6
models per company.
The Arithmetic Mean
(Example of Ungrouped Population Mean)
Example:
The number of AI research papers published by 5 universities in a year are 10, 15, 20, 25, and 30. Treating the data as a
population, find the mean number of AI research papers published.
Solution:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{10 + 15 + 20 + 25 + 30}{5} = \frac{100}{5} = 20$$
Interpretation:
On average, each university published 20 AI research papers during the year.
This indicates that the typical research output in AI among these universities is about 20 papers per year.
The Arithmetic Mean
(Example of Ungrouped Sample Mean)

Example:
A data scientist tests an AI chatbot’s response time (in seconds) on 7 randomly selected queries.
The response times recorded are: 2.8, 3.1, 2.5, 3.4, 3.0, 2.9, 3.3.
Calculate the sample mean response time.
Solution:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{2.8 + 3.1 + 2.5 + 3.4 + 3.0 + 2.9 + 3.3}{7} = \frac{21.0}{7} = 3.0$$
Interpretation:
The average (sample mean) response time of the AI chatbot is 3.0 seconds.
This means that, on average, the chatbot takes 3 seconds to respond to a query based on the selected sample — giving a good
estimate of its typical performance speed.
The Arithmetic Mean
(Example of Ungrouped Sample Mean)
Example:
A robotics researcher measures the battery life (in hours) of an AI-powered delivery robot in five test runs.
The results are: 12, 14, 11, 13, 15 hours.
Calculate the sample mean battery life.
Solution:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{12 + 14 + 11 + 13 + 15}{5} = \frac{65}{5} = 13$$

Interpretation:
The average (sample mean) battery life of the AI-powered robot is 13 hours.
This means that, based on the sample tests, the robot typically operates for about 13 hours before needing a recharge.
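The four ungrouped examples above all use the same arithmetic: add the observations and divide by their count. A minimal sketch follows, assuming a helper named `arithmetic_mean` (a name introduced here only for illustration). Whether the result is read as $\mu$ or $\bar{x}$ depends on whether the data are treated as a population or a sample, not on the calculation.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their number."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)

# Data taken from the four ungrouped examples above.
models_deployed = [4, 6, 5, 7, 8]                      # treated as a population
papers_published = [10, 15, 20, 25, 30]                # treated as a population
response_times = [2.8, 3.1, 2.5, 3.4, 3.0, 2.9, 3.3]   # a sample, in seconds
battery_life = [12, 14, 11, 13, 15]                    # a sample, in hours

examples = [
    ("AI models deployed", models_deployed),
    ("AI papers published", papers_published),
    ("Chatbot response time (s)", response_times),
    ("Robot battery life (h)", battery_life),
]
for label, data in examples:
    print(f"{label}: {arithmetic_mean(data):.1f}")
```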
The Arithmetic Mean (Grouped Data)
• To calculate the mean of grouped data, follow these steps:
• Determine the midpoint (𝒙ᵢ) of each class interval.
• Multiply each midpoint (𝒙ᵢ) by its corresponding frequency (𝒇ᵢ).
• Find the sum of all 𝒇ᵢ𝒙ᵢ values.
• Divide this sum by the total frequency (𝚺𝒇ᵢ).
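Mathematically,

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}, \quad i = 1, 2, 3, \ldots, n \qquad \text{(Sample Mean of Grouped Data)}$$

$$\mu = \frac{\sum f_i x_i}{\sum f_i}, \quad i = 1, 2, 3, \ldots, N \qquad \text{(Population Mean of Grouped Data)}$$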
The Arithmetic Mean
(Example of Population Mean for Grouped Data)
Example:
The following table shows training time (hours) required by an AI model
across different runs.
Find the population mean training time.
Training Time (hrs) | Frequency (f_i)
0.5 − 9.5   | 3
10.5 − 19.5 | 10
20.5 − 29.5 | 6
30.5 − 39.5 | 4
40.5 − 49.5 | 2
The Arithmetic Mean
(Example of Population Mean for Grouped Data)
Solution:

Class Interval | Class Boundaries | Midpoint (x_i) | Frequency (f_i) | f_i x_i
0.5 − 9.5   | 0 – 10  | (0 + 10)/2 = 5   | 3  | 5 × 3 = 15
10.5 − 19.5 | 10 – 20 | (10 + 20)/2 = 15 | 10 | 15 × 10 = 150
20.5 − 29.5 | 20 – 30 | (20 + 30)/2 = 25 | 6  | 25 × 6 = 150
30.5 − 39.5 | 30 – 40 | (30 + 40)/2 = 35 | 4  | 35 × 4 = 140
40.5 − 49.5 | 40 – 50 | (40 + 50)/2 = 45 | 2  | 45 × 2 = 90
Total       |         |                  | Σf_i = 25 | Σf_i x_i = 545
The Arithmetic Mean
(Example of Population Mean for Grouped Data)

Solution:

$$\mu = \frac{\sum f_i x_i}{\sum f_i} = \frac{545}{25} = 21.8$$

Interpretation:
The population mean training time is 21.8 hours — i.e., on average each training run took about 21.8 hours.
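The grouped-data steps (midpoints, products $f_i x_i$, then the ratio of the two sums) can be sketched in a few lines. The function name `grouped_mean` and the representation of class boundaries as (lower, upper) pairs are assumptions made for this illustration.

```python
def grouped_mean(class_boundaries, frequencies):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i), with x_i the class midpoints."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_boundaries]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    total_f = sum(frequencies)
    return total_fx / total_f

# Class boundaries and frequencies from the training-time example above.
boundaries = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [3, 10, 6, 4, 2]

mu = grouped_mean(boundaries, frequencies)
print(f"Population mean training time: {mu:.1f} hours")
```

Because every observation in a class is represented by the class midpoint, the result is an approximation to the mean of the underlying raw data.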
The Arithmetic Mean
(Example of Sample Mean for Grouped Data)
Example (Sample Mean for Discrete Data):
The table shows the number of training epochs and the number of AI models
trained using those epochs. Find the sample mean number of epochs.

Number of Epochs (x_i) | Frequency (f_i)
5  | 3
10 | 6
15 | 8
20 | 5
25 | 3
The Arithmetic Mean
(Example of Sample Mean for Grouped Data)
Solution:

Number of Epochs (x_i) | Frequency (f_i) | f_i x_i
5     | 3  | 15
10    | 6  | 60
15    | 8  | 120
20    | 5  | 100
25    | 3  | 75
Total | Σf_i = 25 | Σf_i x_i = 370
The Arithmetic Mean
(Example of Sample Mean for Grouped Data)

Solution:

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{370}{25} = 14.8$$

Interpretation:
On average, the AI models were trained for about 15 epochs each.
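For discrete data such as the epochs table, the frequency-weighted formula gives exactly the same answer as listing every observation individually and taking the ordinary mean. The sketch below checks this for the example data; the variable names are chosen here for illustration.

```python
# Values and frequencies from the epochs example above.
epochs = [5, 10, 15, 20, 25]
frequencies = [3, 6, 8, 5, 3]

# Frequency-weighted mean: sum(f_i * x_i) / sum(f_i).
weighted_mean = sum(f * x for f, x in zip(frequencies, epochs)) / sum(frequencies)

# Expand the table back into one value per model, then take the plain mean.
expanded = [x for x, f in zip(epochs, frequencies) for _ in range(f)]
plain_mean = sum(expanded) / len(expanded)

print(weighted_mean, plain_mean)
```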
The Geometric Mean
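The geometric mean, G.M., of a set of $n$ positive values $x_1, x_2, x_3, \ldots, x_n$ is defined as the
positive $n$th root of their product, i.e.

$$G.M. = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}, \quad \text{where } n > 0$$

Alternative Formula:

$$G.M. = \operatorname{antilog}\left[\frac{1}{n} \sum \log x_i\right] \quad \text{for Ungrouped Data}$$

$$G.M. = \operatorname{antilog}\left[\frac{1}{\sum f_i} \sum f_i \log x_i\right] \quad \text{for Grouped Data}$$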

The geometric mean is appropriate for averaging ratios and rates of change.


The Geometric Mean
(Ungrouped Data)
Example:

The following data represent the accuracy scores (in decimal form) of an AI model over 5 consecutive experiments:
0.80, 0.85, 0.90, 0.88, 0.92

The Geometric Mean (G.M.) is given by:

$$G.M. = \sqrt[n]{x_1 \times x_2 \times x_3 \times \cdots \times x_n}$$

$$G.M. = \sqrt[5]{0.80 \times 0.85 \times 0.90 \times 0.88 \times 0.92}$$

$$G.M. = \sqrt[5]{0.4955}$$

$$G.M. \approx 0.87$$

Interpretation:
The geometric mean accuracy of the AI model across 5 experiments is 0.87 (or 87%), representing its average multiplicative
performance over time.
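Both forms of the geometric mean, the $n$th root of the product and the antilog of the mean logarithm, can be checked numerically on the accuracy scores. This is a small sketch using only Python's standard `math` module; the variable names are illustrative. The product of the five scores evaluates to about 0.4955.

```python
import math

# Accuracy scores from the ungrouped example above.
accuracies = [0.80, 0.85, 0.90, 0.88, 0.92]
n = len(accuracies)

# Form 1: positive n-th root of the product.
product = math.prod(accuracies)
gm_root = product ** (1 / n)

# Form 2: antilog of the mean of the base-10 logarithms.
mean_log = sum(math.log10(x) for x in accuracies) / n
gm_log = 10 ** mean_log

print(f"product = {product:.4f}")
print(f"G.M. (root form) = {gm_root:.4f}")
print(f"G.M. (log form)  = {gm_log:.4f}")
```

The logarithmic form is usually preferred for long lists, since the raw product of many small numbers can underflow.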
The Geometric Mean
(Grouped Data)
Example:
The following table shows the training accuracy ranges (%) of an AI model along with the number of
experiments (frequency) in each range. Find the geometric mean accuracy.

Accuracy Range (%) | 60 − 69 | 70 − 79 | 80 − 89 | 90 − 99
f                  | 9       | 10      | 17      | 10
The Geometric Mean
(Grouped Data)
Solution:

Accuracy Range (%) | Midpoint (x_i) | f_i | log x_i | f_i log x_i
60 − 69 | 64.5 | 9  | 1.8096 | 16.2864
70 − 79 | 74.5 | 10 | 1.8722 | 18.7220
80 − 89 | 84.5 | 17 | 1.9269 | 32.7573
90 − 99 | 94.5 | 10 | 1.9754 | 19.7540
Σ       | --   | Σf_i = 46 | -- | Σf_i log x_i = 87.5197
The Geometric Mean
(Grouped Data)
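Solution:

$$G = \operatorname{antilog}\left[\frac{1}{\sum f_i} \sum f_i \log x_i\right]$$

$$= \operatorname{antilog}\left[\frac{1}{46}(87.5197)\right]$$

$$= \operatorname{antilog}(1.9026)$$

$$= 79.9098$$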
Interpretation:
The average multiplicative (geometric) accuracy across all AI model experiments is approximately 79.91%, which reflects a
balanced measure of overall training performance across varying accuracy levels.
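The grouped calculation can be reproduced with base-10 logarithms, as in the table. The sketch below assumes class limits stored as (lower, upper) pairs, a layout chosen only for illustration. Because it uses unrounded logarithms rather than four-figure table values, its result may differ from the tabulated one in the third decimal place.

```python
import math

# Class limits and frequencies from the grouped accuracy example above.
class_limits = [(60, 69), (70, 79), (80, 89), (90, 99)]
frequencies = [9, 10, 17, 10]

# Midpoint of each class.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

sum_f = sum(frequencies)
sum_f_log = sum(f * math.log10(x) for f, x in zip(frequencies, midpoints))

# Antilog (base 10) of the frequency-weighted mean logarithm.
gm = 10 ** (sum_f_log / sum_f)

print(f"sum f = {sum_f}, sum f*log10(x) = {sum_f_log:.4f}")
print(f"Grouped G.M. = {gm:.2f}")
```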
The Harmonic Mean
(Ungrouped Data)
The harmonic mean, H, of a set of $n$ values $x_1, x_2, x_3, \ldots, x_n$ is the reciprocal of the arithmetic mean of the
reciprocals of the values.

$$H = \text{Reciprocal of } \frac{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}}{n}, \quad x_i \neq 0$$

Hence the harmonic mean, H, is given by

$$H = \frac{n}{\sum \frac{1}{x_i}}$$

The harmonic mean is appropriate for rate-type data, such as:


• Speed (distance per time)
• Efficiency (work per unit time)
• Accuracy per iteration in AI models
The Harmonic Mean
(Ungrouped Data)
Example:
The following are the training speeds (samples per second) of an AI model in 5 runs:
$x_i = 20, 25, 30, 40, 50$

$$H = \frac{n}{\sum \frac{1}{x_i}} = \frac{5}{\frac{1}{20} + \frac{1}{25} + \frac{1}{30} + \frac{1}{40} + \frac{1}{50}}$$

$$H = \frac{5}{0.050 + 0.040 + 0.0333 + 0.025 + 0.020} = \frac{5}{0.1683} \approx 29.7$$
Interpretation:
The harmonic mean speed of the AI model across these 5 runs is approximately 29.7 samples per second.
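The harmonic mean of the training speeds is the number of runs, $n = 5$, divided by the sum of the reciprocals (about 0.1683). The sketch below computes it directly from the formula and, as a cross-check, with `statistics.harmonic_mean` from Python's standard library; the variable names are illustrative.

```python
import statistics

# Training speeds (samples per second) from the example above.
speeds = [20, 25, 30, 40, 50]
n = len(speeds)

# H = n / sum(1 / x_i)
sum_reciprocals = sum(1 / x for x in speeds)
h = n / sum_reciprocals

# Cross-check against the standard-library implementation.
h_check = statistics.harmonic_mean(speeds)

print(f"sum of reciprocals = {sum_reciprocals:.4f}")
print(f"H = {h:.1f} samples per second")
```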
The Harmonic Mean
(Grouped Data)
The harmonic mean for grouped data is calculated by dividing the total frequency ($\sum f$)
by the sum of the reciprocals of the observations multiplied by their respective
frequencies ($\sum \frac{f}{x}$).
Formula:

$$H.M. = \frac{\sum f}{\sum \frac{f}{x}}$$

where $\sum f$ is the total number of observations, also called $n$.


The Harmonic Mean
(Grouped Data)
Example:
The following table shows the training accuracy ranges (%) of an AI model along with the number of
experiments (frequency) in each range. Find the harmonic mean accuracy.

Accuracy Range (%) | 60 − 69 | 70 − 79 | 80 − 89 | 90 − 99
f                  | 9       | 10      | 17      | 10
The Harmonic Mean
(Grouped Data)
Solution:

Accuracy Range (%) | Midpoint (x_i) | f_i | f_i / x_i
60 − 69 | 64.5 | 9  | 9/64.5 = 0.14
70 − 79 | 74.5 | 10 | 0.13
80 − 89 | 84.5 | 17 | 0.20
90 − 99 | 94.5 | 10 | 0.11
Σ       | --   | Σf_i = 46 | Σ(f_i / x_i) = 0.58
The Harmonic Mean
(Grouped Data)
Solution:

$$H.M. = \frac{\sum f_i}{\sum \frac{f_i}{x_i}} = \frac{46}{0.58} = 79.31$$

Interpretation:
The harmonic mean accuracy across all AI model experiments is approximately 79.31%.
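The grouped harmonic mean is sensitive to how the terms $f_i / x_i$ are rounded. The sketch below, with illustrative variable names, computes the result twice: once with unrounded terms, and once with each term rounded to two decimals as in the table. The rounded version reproduces 79.31, while the unrounded version comes out slightly lower, at about 79.2.

```python
# Class limits and frequencies from the grouped accuracy example above.
class_limits = [(60, 69), (70, 79), (80, 89), (90, 99)]
frequencies = [9, 10, 17, 10]

midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
sum_f = sum(frequencies)

# H.M. = sum(f) / sum(f / x), using unrounded terms.
sum_f_over_x = sum(f / x for f, x in zip(frequencies, midpoints))
hm_exact = sum_f / sum_f_over_x

# Same formula with each f/x term rounded to two decimals, as in the table.
sum_rounded = sum(round(f / x, 2) for f, x in zip(frequencies, midpoints))
hm_rounded = sum_f / sum_rounded

print(f"H.M. with unrounded terms: {hm_exact:.2f}")
print(f"H.M. with terms rounded to 2 decimals: {hm_rounded:.2f}")
```

Carrying more decimal places in the intermediate column reduces this rounding effect.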
Relation between the Arithmetic Mean, the
Geometric Mean and the Harmonic Mean

1. For positive values that are not all equal, the general relation between A.M., G.M. and H.M. is:

$$A.M. > G.M. > H.M.$$

2. When all the values are equal, then:

$$A.M. = G.M. = H.M.$$
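The ordering of the three means can be checked numerically. The sketch below defines three small helper functions (names chosen here for illustration) and applies them to the training speeds used earlier and to a hypothetical list of identical values.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # Antilog of the mean natural logarithm; requires positive values.
    return math.exp(sum(math.log(x) for x in values) / len(values))

def harmonic_mean(values):
    # n divided by the sum of reciprocals; requires non-zero values.
    return len(values) / sum(1 / x for x in values)

unequal = [20, 25, 30, 40, 50]   # training speeds from the earlier example
equal = [4, 4, 4, 4]             # hypothetical data with identical values

am, gm, hm = arithmetic_mean(unequal), geometric_mean(unequal), harmonic_mean(unequal)
print(f"Unequal values: A.M. = {am:.2f}, G.M. = {gm:.2f}, H.M. = {hm:.2f}")

am_eq, gm_eq, hm_eq = arithmetic_mean(equal), geometric_mean(equal), harmonic_mean(equal)
print(f"Equal values:   A.M. = {am_eq:.2f}, G.M. = {gm_eq:.2f}, H.M. = {hm_eq:.2f}")
```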
Thank You
