Module 5

University of Mumbai

Program – Bachelor of Engineering in Computer Science and Engineering (Artificial Intelligence and Machine Learning)

Class – T.E.
Course Code – CSDLO5011
Course Name – Statistics for Artificial Intelligence & Data Science

By
Prof. A. V. Phanse
The Analysis of Variance
 ANOVA (Analysis of Variance) is a statistical method used to compare the
means of three or more groups to determine if at least one of the group means
is significantly different from the others.

 It is commonly used in experiments where researchers are interested in
understanding how different treatments, conditions, or factors influence a
dependent variable.

Why Use ANOVA?

 When comparing the means of two groups, a t-test is appropriate.


 When comparing three or more groups, performing multiple t-tests increases
the chance of Type I errors (false positives).
 ANOVA solves this by using a single test to compare all groups at once.
Types of ANOVA:

One-way ANOVA: Compares means across groups defined by a single factor
(independent variable).
Example: Comparing the average test scores of students taught by three different
teaching methods.

Two-way ANOVA: Examines the influence of two different factors on the
dependent variable, and also tests for interaction between the two factors.
Example: Studying the effect of both teaching method (factor 1) and student
gender (factor 2) on test scores.

Repeated Measures ANOVA: Used when the same subjects are measured under
different conditions or at different time points.
Example: Studying the effect of three different teaching methods on students'
performance, where the same group of students takes three tests, each
corresponding to one teaching method.
[Figure: repeated measures design — a population of people with high blood
pressure is measured before medication and after medication, and the difference
in measurement is analyzed.]
Assumptions in ANOVA:
 Independence: observations are independent within and across groups.
 Normality: the data in each group are approximately normally distributed.
 Homogeneity of variance: all groups have approximately equal variances.
Components of ANOVA:
ANOVA partitions the total variability in the data into two types:
Between-group variability: The variation caused by differences between the
group means.
Within-group variability: The variation due to differences within each group
(random variation or noise).
F-Statistic:

 The F-statistic compares the variance between groups to the variance within
groups:

F = (between-group variability) / (within-group variability) = MSB / MSW

 If the between-group variability is significantly larger than the within-group
variability, it suggests that the group means are not all the same.
Consider data on smartphone usage (in hours) for 3 groups.

Null Hypothesis : There is no difference in the average smartphone usage of the 3 groups.

Alternate Hypothesis : At least one group's average smartphone usage differs from the others.
 For 2 degrees of freedom between the groups and 21 degrees of freedom
within the groups, Ftab = 3.47

 Fcal is compared with 3.47 (Ftab), and then the decision is taken whether or
not to reject the null hypothesis.

 As Fcal (35.39) is greater than 3.47 (Ftab), the null hypothesis is rejected.
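The calculation in this example can be sketched in plain Python. The slide's smartphone-usage table is an image, so the numbers below are hypothetical and the resulting F differs from the slide's 35.39 — only the method is the same.

```python
# One-way ANOVA by hand (hypothetical data; the slide's table is an image).

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    k = len(groups)                        # number of groups
    n_total = sum(len(g) for g in groups)  # total observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: variation of group means around grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: variation inside each group
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    df_between, df_within = k - 1, n_total - k
    f = (ssb / df_between) / (ssw / df_within)
    return f, df_between, df_within

usage = [  # hours of smartphone usage per day, 3 hypothetical groups
    [2.1, 2.5, 1.8, 2.3, 2.0],
    [3.9, 4.2, 3.5, 4.0, 3.8],
    [5.1, 4.8, 5.5, 5.0, 5.2],
]
f, dfb, dfw = one_way_anova(usage)
print(round(f, 2), dfb, dfw)
```

With these made-up groups the between-group variability dominates, so F is large and the null hypothesis would be rejected at the usual critical values.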
Practice Problem from University Exam
Problem of multiple comparisons

 The problem of multiple comparisons arises when a statistical analysis involves
testing several hypotheses simultaneously.
 As the number of comparisons increases, so does the likelihood of obtaining a
significant result purely by chance.
 This increases the risk of Type I errors (false positives), where we incorrectly
reject a true null hypothesis.

Example of the Problem:

 Suppose you are comparing the effectiveness of four different medications on
lowering blood pressure.
 You could perform t-tests to compare each pair of medications (A vs B, A vs C,
etc.), resulting in a total of six pairwise comparisons.
 If each test is performed at a significance level of 0.05, there's a 5% chance of
incorrectly finding a significant difference for each test.
 Across six tests, the overall probability of making at least one Type I error is
greater than 5%.
Family-Wise Error Rate (FWER)

 The family-wise error rate (FWER) is the probability of making at least one
Type I error across all the tests in a family of comparisons.

 The problem of multiple comparisons is about controlling the FWER so that the
chance of making false discoveries does not increase with the number of tests.

 If we perform m independent tests, each at a significance level of α (e.g. 0.05),
the probability of making at least one Type I error can be calculated as:

FWER = 1 − (1 − α)^m

For m = 6 comparisons at α = 0.05, the probability of making at least one Type I
error is:

FWER = 1 − (1 − 0.05)^6 = 1 − (0.95)^6 ≈ 0.265

This means that there's a 26.5% chance of making a false discovery with six
comparisons, which is much higher than the desired 5%.
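The FWER calculation above can be checked with a quick sketch (the formula assumes the m tests are independent, which real pairwise comparisons usually are not exactly):

```python
# Family-wise error rate for m independent tests at level alpha.
alpha = 0.05
m = 6  # pairwise comparisons among 4 medications: C(4, 2) = 6
fwer = 1 - (1 - alpha) ** m
print(round(fwer, 3))  # ≈ 0.265
```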
Solutions to the Multiple Comparisons Problem:
Bonferroni Correction
 One of the simplest and most widely used methods is the Bonferroni
correction. It controls the FWER by adjusting the significance level.
 The Bonferroni correction divides the original significance level (α) by the
number of comparisons (m). The new significance level for each test is:

α_adjusted = α / m

So, if you are performing 6 tests and want to maintain an overall significance level
of 0.05, the adjusted level for each test is:

α_adjusted = 0.05 / 6 ≈ 0.0083

This means that for each individual comparison, you would reject the null
hypothesis only if the p-value is less than 0.0083.
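A minimal sketch of applying the Bonferroni rule; the p-values below are made-up for illustration:

```python
# Bonferroni correction: test each comparison at alpha / m.
alpha, m = 0.05, 6
alpha_adj = alpha / m  # ≈ 0.0083

p_values = [0.001, 0.020, 0.004, 0.300, 0.049, 0.012]  # hypothetical
rejected = [p < alpha_adj for p in p_values]
print(round(alpha_adj, 4), rejected)
```

Note that 0.020 and 0.049 would have been "significant" at the raw 0.05 level but survive neither the adjusted threshold — exactly the false discoveries Bonferroni guards against.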
A Nonparametric Method—The Kruskal-Wallis Test

 The Kruskal-Wallis test is a nonparametric alternative to the one-way ANOVA.

 It is used to determine if there are significant differences between the medians
of three or more independent groups.

 Since it is nonparametric, it does not require the assumption of normality and
can be used with ordinal or continuous data that do not meet the normality
assumption required by ANOVA.

When to Use the Kruskal-Wallis Test:

 When the assumptions of ANOVA (e.g., normal distribution, homogeneity of
variance) are not met.

 When you have three or more independent groups.

 When the data is ordinal, not normally distributed, or has outliers.


Hypotheses:

Null Hypothesis (H₀): The medians of all groups are equal (there is no difference
between the groups).

Alternative Hypothesis (H₁): At least one group has a different median.


How the Kruskal-Wallis Test Works:

1. Rank the Data: The test works by converting the data into ranks across all
groups. Each observation is assigned a rank, with the smallest value receiving
rank 1, the next smallest rank 2, and so on.

2. Sum of Ranks for Each Group: The ranks are then summed within each group,
and the test statistic is based on comparing these rank sums between groups.

3. H-Statistic: The Kruskal-Wallis test produces an H-statistic, which is a function
of the sum of ranks, group sizes, and total sample size. Under the null
hypothesis, H approximately follows a chi-square distribution.

The formula for the H-statistic is:

H = [12 / (N(N + 1))] × Σ (Rᵢ² / nᵢ) − 3(N + 1)

where N is the total sample size, Rᵢ is the sum of ranks in group i, and nᵢ is the
size of group i, summed over the g groups.

The H-statistic is compared to a chi-square critical value with (g − 1) degrees of freedom.
Example : With the following data on content (in ml) of potassium per bottle in
brands of a medicine, determine if there is a significant difference in the potassium
content between brands.

Solution :

Rank the Data: The test works by converting the data into ranks across all groups.

Brand  Content  Rank
A      4.7      2
A      3.2      1
A      5.1      4
A      5.2      5
A      5.0      3
B      5.3      6
B      6.4      9
B      7.3      14
B      6.8      11
B      7.2      13
C      6.3      8
C      8.2      15
C      6.2      7
C      7.1      12
C      6.6      10

Rank sums: R1 = 15 (Brand A), R2 = 53 (Brand B), R3 = 52 (Brand C).
Finally note that n1 = n2 = n3 = 5 and N = 15.
For a 5% level of significance and 2 (g − 1) degrees of freedom, the chi-square
critical value is 5.991.

The H-statistic is:

H = [12 / (15 × 16)] × (15²/5 + 53²/5 + 52²/5) − 3 × 16 = 57.38 − 48 = 9.38

Conclusion :
As the calculated H statistic value is greater than the chi square critical value, the
null hypothesis will be rejected.

Interpretation :

The potassium content of at least one of the brands is different. Since R1 is far less
than the rank sums of the other two brands, we know that Brand A is different
before we do any kind of post hoc testing.
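The worked example above can be reproduced in plain Python (this data set has no ties, so simple integer ranks suffice; a full implementation would average tied ranks):

```python
# Kruskal-Wallis H for the potassium-content example (no ties in this data).

def kruskal_wallis_h(groups):
    """Return (H, rank sums) for a list of independent samples without ties."""
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}  # rank 1 = smallest value
    n_total = len(pooled)
    rank_sums = [sum(rank[x] for x in g) for g in groups]
    h = 12 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    return h, rank_sums

brand_a = [4.7, 3.2, 5.1, 5.2, 5.0]
brand_b = [5.3, 6.4, 7.3, 6.8, 7.2]
brand_c = [6.3, 8.2, 6.2, 7.1, 6.6]
h, rank_sums = kruskal_wallis_h([brand_a, brand_b, brand_c])
print(round(h, 2), rank_sums)  # 9.38 [15, 53, 52]
```

Since 9.38 exceeds the chi-square critical value 5.991 (df = 2, α = 0.05), this reproduces the slide's rejection of the null hypothesis.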
Example :
A researcher wants to test whether three different teaching methods lead to
different exam scores. The exam scores (out of 100) for students under each
method are as follows:
Method A: 70, 62, 78, 65, 80
Method B: 68, 75, 60, 85, 72
Method C: 90, 85, 88, 92, 84

Solution :
State the Hypotheses:

Null Hypothesis (H₀): The distributions of exam scores for all three methods are
the same.

Alternative Hypothesis (H₁): At least one method has a different distribution of
exam scores.
The degrees of freedom (df) for the Kruskal-Wallis test is the number of groups
minus one (g - 1). Here, we have 3 groups, so:
Using a chi-square distribution table and a significance level (α) of 0.05, the critical
value for χ² with 2 degrees of freedom is 5.991.

 If the test statistic H = 8.35 is greater than the critical value of 5.991, we
reject the null hypothesis.
 Since 8.35 > 5.991, we reject the null
hypothesis.
 This suggests that at least one of the
teaching methods leads to a
significantly different distribution of
exam scores.
Practice Problem from University Exam
Two way ANOVA

 A Two-Way ANOVA is a statistical method used to examine the effect of two
independent factors on a dependent variable.
 It also tests for interactions between these two factors. This type of ANOVA is
useful when the data is categorized by two factors, and you want to see how
these factors, individually and in combination, affect the outcome.

Collection of Data

Two-way ANOVA answers the following three questions:
1. Does factor 1 have a significant effect on the dependent variable?
2. Does factor 2 have a significant effect on the dependent variable?
3. Is there a significant interaction between the two factors?

Using the F distribution table, the critical values of F are found and then compared
with the calculated values of F.
A Nonparametric Method—Friedman‘s Test
 Friedman's Test is a nonparametric statistical test used to detect differences in
treatments across multiple test attempts.
 It is often used when the assumptions of parametric tests, such as normality
and homogeneity of variance, are not met.
 This test is particularly useful for analyzing randomized block designs with
repeated measures or matched samples.
Steps to Conduct Friedman's Test

Data Arrangement: Organize the data in a matrix format where rows represent
blocks (subjects) and columns represent treatments.

Rank the Data: Within each block (row), rank the treatment responses. If there are
ties, assign the average rank to tied values.

Calculate Test Statistic: For each treatment (column), sum the ranks across blocks.
Calculate the Friedman statistic Q using the formula:

Q = [12 / (n k(k + 1))] × Σ Rⱼ² − 3n(k + 1)

where n is the number of blocks, k is the number of treatments, and Rⱼ is the sum
of ranks for treatment j.
Determine the Degrees of Freedom: The degrees of freedom for Friedman's test
is k−1, where k is the number of treatments.
Critical Value:

Compare the calculated Q statistic to the critical value from the Chi-squared
distribution table with k−1 degrees of freedom at the desired significance level
(e.g., α=0.05).

Make a Decision:

 If Q exceeds the critical value, reject the null hypothesis, indicating that there
are significant differences among the treatments.

 If Q does not exceed the critical value, fail to reject the null hypothesis.
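The steps above can be sketched as follows. The slide's call-time table is an image, so the block data here are made-up; this simple sketch also does not average tied ranks.

```python
# Friedman's Q statistic (hypothetical data; ties not handled in this sketch).

def friedman_q(data):
    """Return Friedman's Q for rows = blocks (subjects), columns = treatments."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        # Rank the treatment responses within each block (1 = smallest).
        order = sorted(range(k), key=lambda j: row[j])
        for pos, j in enumerate(order):
            rank_sums[j] += pos + 1
    return 12.0 * sum(r ** 2 for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)

# 3 blocks, 3 treatments; treatment 3 is consistently ranked highest.
data = [[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]]
q = friedman_q(data)
print(q)  # ≈ 6.0, which exceeds the chi-square critical value 5.991 at df = 2
```

Because every block ranks the treatments the same way, the rank sums are as unequal as possible and Q just clears the 5.991 critical value, so the null hypothesis would be rejected for this toy data.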
Consider the call time of 7 people at 3 different time zones.

As the calculated Q (2.57) is less than the chi-square critical value (5.991), the
null hypothesis will not be rejected.
Thank You…
