Machine Learning
Performance metrics
University of Bergamo
Outline
1. Metrics
2. Precision and recall
3. Receiver Operating Characteristic (ROC) curves
4. Worked example
Metrics
It is extremely important to use quantitative metrics for evaluating a machine learning model.
Until now, we relied on the cost function value for regression and classification.
Other metrics can be used to better evaluate and understand the model.
For classification: Accuracy/Precision/Recall/F1-score, ROC curves, …
For regression: Normalized RMSE, Normalized Mean Absolute Error (NMAE), …
Accuracy
Accuracy is a measure of how close the predictions of our model are to their true values.
If a classifier makes 10 predictions and 9 of them are correct, the accuracy is 90%.
Accuracy is a measure of how well a binary classifier correctly identifies or excludes a condition.
It is the proportion of correct predictions among the total number of cases examined.
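In terms of the confusion-matrix counts introduced later in these slides, this corresponds to the standard formula (stated here for reference, not on the original slide):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
so that, for instance, 9 correct predictions out of 10 give an accuracy of $9/10 = 90\%$.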
Classification case: metrics for skewed classes
Disease dichotomous classification example:
Train a logistic regression model $h_\theta(x)$, with $y = 1$ if disease, $y = 0$ otherwise.
Suppose you find a very low error on the test set (almost all diagnoses are correct).
However, only a small fraction of the patients actually have the disease: the $y = 1$ class has very few samples with respect to the $y = 0$ class.
Then a classifier that always assigns the observations to the $y = 0$ class also achieves a very high accuracy!
For skewed classes, the accuracy metric can be deceptive.
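As an illustration, a minimal Matlab sketch (not part of the original slides; the 1000 patients and 5 positives are hypothetical numbers):

% Always predicting the majority class gives a high accuracy on skewed data
N = 1000;                      % hypothetical number of patients
y = zeros(N, 1); y(1:5) = 1;   % assume only 5 patients actually have the disease
y_hat = zeros(N, 1);           % a "classifier" that always predicts y = 0
accuracy = mean(y_hat == y)    % 0.995, even though no sick patient is detected
recall = sum(y_hat & y) / sum(y)   % 0: the classifier never finds the disease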
Outline
1. Metrics
2. Precision and recall
3. Receiver Operating Characteristic (ROC) curves
4. Worked example
Precision and recall
Suppose that $y = 1$ corresponds to a rare class that we want to detect.

Precision (how precise we are in the detection): of all patients for which we predicted $y = 1$, what fraction actually has the disease?
Precision $= TP / (TP + FP)$

Recall (how good we are at detecting): of all patients that actually have the disease, what fraction did we correctly detect as having the disease?
Recall $= TP / (TP + FN)$

Confusion matrix                     Actual class
                                    1 (p)                   0 (n)
Estimated     1 (Y)      True positive (TP)      False positive (FP)
class         0 (N)      False negative (FN)     True negative (TN)
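A minimal Matlab sketch of these two quantities, computed from vectors of true labels y and predicted labels y_hat (illustrative only, not from the slides):

TP = sum(y_hat == 1 & y == 1);   % predicted positive and actually positive
FP = sum(y_hat == 1 & y == 0);   % predicted positive but actually negative
FN = sum(y_hat == 0 & y == 1);   % missed positives
precision = TP / (TP + FP)
recall    = TP / (TP + FN)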
F1-score
It is usually better to compare models by means of one number only. The $F_1$ score can be used to combine precision and recall:
$F_1 = \frac{2 \, P \, R}{P + R}$

                 Precision (P)   Recall (R)   Average   F1 score
Algorithm 1      0.5             0.4          0.45      0.444
Algorithm 2      0.7             0.1          0.4       0.175
Algorithm 3      0.02            1.0          0.51      0.0392

Algorithm 3 always classifies $y = 1$. The simple average says (incorrectly) that Algorithm 3 is the best; according to the $F_1$ score, the best is Algorithm 1.
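As a quick check (worked out here, not shown explicitly on the slide), for Algorithm 1:
$$F_1 = \frac{2 \cdot 0.5 \cdot 0.4}{0.5 + 0.4} = \frac{0.4}{0.9} \approx 0.444$$
which matches the table, while the same computation for Algorithm 3 gives $2 \cdot 0.02 \cdot 1.0 / 1.02 \approx 0.0392$.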
Summaries of the confusion matrix
Different metrics can be computed from the confusion matrix, depending on the class of
interest (https://en.wikipedia.org/wiki/Precision_and_recall)
Outline
1. Metrics
2. Precision and recall
3. Receiver Operating Characteristic (ROC) curves
4. Worked example
Ranking instead of classifying
Classifiers such as logistic regression can output a probability of belonging to a class (or something similar).
We can use this to rank the different instances and take action on the cases at the top of the list.
We may have a budget, so we have to target the most promising individuals (see the sketch below).
Ranking enables the use of different techniques for visualizing model performance.
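A minimal Matlab sketch of this ranking step (my own illustration; the score vector and the budget K are hypothetical):

[~, order] = sort(scores, 'descend');   % highest predicted probability first
topK = order(1:K);                      % the K most promising instances to target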
Ranking instead of classifying
[Figure: 200 test instances, 100 positive (p) and 100 negative (n), listed in decreasing order of their classifier score (0.99, 0.98, 0.96, 0.90, 0.88, 0.87, 0.85, 0.80, 0.70, …). Lowering the classification threshold moves the cut point down the ranked list and yields a different confusion matrix at each threshold, e.g. (TP, FP) = (0, 0), (1, 0), (2, 0), (2, 1), (6, 4).]
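A minimal Matlab sketch of this threshold sweep (my own illustration, assuming vectors of scores and true labels y):

thresholds = sort(unique(scores), 'descend');
for t = thresholds'                      % one confusion matrix per threshold
    y_hat = scores >= t;                 % classify as positive above the threshold
    TP = sum(y_hat == 1 & y == 1);
    FP = sum(y_hat == 1 & y == 0);
    fprintf('threshold %.2f: TP = %d, FP = %d\n', t, TP, FP);
end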
Ranking instead of classifying
ROC curves are a very general way to represent and compare the performance of different models (on a binary classification task).

[Figure: ROC plane with Recall (True Positive Rate) on the vertical axis and 1 – specificity (False Positive Rate) on the horizontal axis. The point (0, 0) corresponds to classifying always negative, (1, 1) to classifying always positive, the top-left corner to perfection, and the diagonal to random guessing. A better classifier lies above the diagonal (better than random), a worse classifier below it (worse than random).]

Different classifiers can be compared.
Area Under the Curve (AUC): the probability that a randomly chosen positive instance will be ranked ahead of a randomly chosen negative instance.
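For reference (standard definitions, not spelled out on the slide), the two axes of the ROC plane are:
$$\text{TPR} = \frac{TP}{TP + FN}, \qquad \text{FPR} = \frac{FP}{FP + TN} = 1 - \text{specificity}$$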
Outline
1. Metrics
2. Precision and recall
3. Receiver Operating Characteristic (ROC) curves
4. Worked examples
Breast cancer detection
Breast cancer is the most common cancer amongst women in the world.
It accounts for 25% of all cancer cases, and affected over 2.1 million people in 2015 alone.
It starts when cells in the breast begin to grow out of control. These cells usually form tumors that can be seen via X-ray or felt as lumps in the breast area.
The key challenge in its detection is how to classify tumors as malignant (cancerous) or benign (non-cancerous).
Goal: classify these tumors using machine learning and the Breast Cancer Wisconsin (Diagnostic) dataset.
Breast cancer Wisconsin dataset
This dataset has been taken from Kaggle.
Output: Class 4 stands for malignant cancer, Class 2 stands for benign cancer.

id_num    Clump      Uniformity    Uniformity     Marginal   Single Epithelial   Bare     Bland       Normal     Mitoses   Class
          Thickness  of Cell Size  of Cell Shape  Adhesion   Cell Size           Nuclei   Chromatin   Nucleoli
1041801   5          3             3              3          2                   3        4           4          1         4
1043999   1          1             1              1          2                   3        3           1          1         2
1044572   8          7             5              10         7                   9        5           5          4         4
1047630   7          4             6              4          6                   1        4           3          1         4
1048672   4          1             1              1          2                   1        2           1          1         2
1049815   4          1             1              1          2                   1        3           1          1         2
1050670   10         7             7              6          4                   10       4           1          2         4
…         …          …             …              …          …                   …        …           …          …         …
Breast cancer detection
We will use the dataset to compare different logistic regression models by means of the ROC curve associated with each of them.
To this aim we will work with 4 different feature sets (plus an extra one):
1. Case 1: the whole dataset
2. Case 2: the first group of 5 features
3. Case 3: the second group of 5 features
4. Case 4: only the first two features
Extra: after learning the model of Case 1, take only the features with the smallest p-values.
Matlab code

Class 4 stands for malignant cancer and it is for us the positive output: we set it to 1.
Class 2 stands for benign cancer and it is for us the negative output: we set it to 0.
perfcurve computes the points of the ROC curve as well as the AUC.

%% Load and clean data
data = readtable('breast_cancer_w.xlsx');  % load our data as a table
Phi = table2array(data(:,1:end-1));
y   = table2array(data(:,end));
y(y==4) = 1;  % in the original data 4 stands for malignant cancer
y(y==2) = 0;  % in the original data 2 stands for benign cancer
% Set up the data matrix appropriately, and add ones for the intercept term
[N, d] = size(Phi);
Phi = [ones(N, 1) Phi];  % add intercept term

%% Train and test data
mdl = fitglm(Phi, y, 'Distribution', 'binomial', 'Link', 'logit')

%% ============ Part 2: Compute the ROC curve ============
scores = mdl.Fitted.Probability;
[X, Y, T, AUC] = perfcurve(y, scores, 1);

% Plot the ROC curve
figure
plot(X, Y)
xlabel('False positive rate')
ylabel('True positive rate')
title('ROC for Classification by Logistic Regression')
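Cases 2–4 can be reproduced by refitting the same model on subsets of the columns of Phi and comparing the resulting ROC curves; a sketch under my own assumptions (the column indices are illustrative, not from the slides):

% Case 4: intercept column plus the first two features only
Phi4 = Phi(:, 1:3);
mdl4 = fitglm(Phi4, y, 'Distribution', 'binomial', 'Link', 'logit');
[X4, Y4, ~, AUC4] = perfcurve(y, mdl4.Fitted.Probability, 1);
% Overlay the ROC curves of Case 1 and Case 4
figure, plot(X, Y, X4, Y4)
legend(sprintf('Case 1 (AUC = %.3f)', AUC), sprintf('Case 4 (AUC = %.3f)', AUC4))
% For the extra case, the per-feature p-values of the full model are in mdl.Coefficients.pValue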
Results
Comparison of Cases 1, 2, 3 and 4.
Using only the first 2 features is not a smart choice.
Results
Comparison of Case 1, Case 4 and the best-features model.
Using only the best features provides a model that performs almost as well as using all the features.
Pneumonia detection
Suppose we have at our disposal X-ray images of lungs: healthy people and COVID-19 patients.
Acknowledgments
The COVID-19 X-ray images are curated by Dr. Joseph Cohen, a postdoctoral fellow at the University of Montreal, see https://josephpcohen.com/w/public-covid19-dataset/
The previous data contain only X-ray images of people with a disease. To collect images of healthy people, we can download another X-ray dataset from Kaggle: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
The analysis is inspired by a tutorial by Adrian Rosebrock: https://www.pyimagesearch.com/2020/03/16/detecting-covid-19-in-x-ray-images-with-keras-tensorflow-and-deep-learning/
Pneumonia detection
We want to use a classifier to perform the classification:
Healthy patients: class 0
Patients with a disease: class 1
The input data are directly the X-ray images.
For these computer vision tasks, the state-of-the-art algorithms are Convolutional Neural Networks: we can use them to classify the images into healthy and disease.
Pneumonia detection
[Figure: example test X-ray images grouped by their true label, with the estimated label (COVID-19 or healthy) shown for each image.]
Pneumonia detection
Classification results on the test set:

Confusion matrix                      Actual class
                                     1 (p)                        0 (n)
Estimated     1 (Y)      True positive (TP) = 11      False positive (FP) = 0
class         0 (N)      False negative (FN) = 1      True negative (TN) = 11

Sensitivity (recall, true positive rate) = TP / (TP + FN) = 11/12 ≈ 0.92
Specificity (true negative rate) = TN / (TN + FP) = 11/11 = 1.00
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 22/23 ≈ 0.96
Pneumonia detection
Classification results on the test set:
Sensitivity (recall, true positive rate): of the patients that do have COVID-19 (i.e., true positives), we could accurately identify them as "COVID-19 positive" 92% of the time using our model.
Specificity (true negative rate): of the patients that do not have COVID-19 (i.e., true negatives), we could accurately identify them as "COVID-19 negative" 100% of the time using our model.
Pneumonia detection
Classification results on the test set:
Being able to detect healthy patients with 100% accuracy is great: we do not want to quarantine someone for nothing.
…but we don't want to classify someone as «healthy» when they are «COVID-19 positive», since they could infect other people without knowing it.
Summary
Balancing sensitivity and specificity is incredibly challenging when it comes to medical applications.
The results should always be validated on another pool of people.
Furthermore, we need to be concerned about what the model is actually learning:
Do the results align with the medical knowledge?
Was the dataset representative of the population, or was there selection bias?
Summary
Furthermore, we need to be concerned about what the model is actually learning:
Did we account for all external factors (confounders) that could interfere with the response?