Classification as a Machine Learning Problem
Overview
Classification is a canonical problem in Machine Learning
Classifiers can be measured using accuracy, precision, and recall
Traditional ML models for classification include SVM and Naive Bayes
Neural networks perform very well on classification problems
Classification and Classifiers
Machine Learning: work with a huge maze of data, find patterns, and make intelligent decisions
Example: emails on a server - Spam or Ham? Trash or Inbox?
Types of Machine Learning Problems
Classification Regression Clustering Rule-extraction
Whales: Fish or Mammals?
Whales are mammals - members of the infraorder Cetacea - yet they look like fish, swim like fish, and move with fish.
ML-based Classifier
Training: feed in a large corpus of correctly classified data
Prediction: use the trained classifier to classify new instances it has not seen before
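As a minimal sketch of this training/prediction split (an assumed toy spam/ham example using scikit-learn, not from the slides):

```python
# A minimal sketch (assumed toy data): train on a labeled corpus, then
# predict labels for new, unseen instances.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Training: feed in a corpus of correctly classified data
corpus = ["win money now", "meeting at noon", "cheap pills offer", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(corpus)

model = MultinomialNB()
model.fit(X_train, labels)

# Prediction: classify new instances the model has not seen before
new_emails = ["cheap money offer", "see you at lunch"]
print(model.predict(vectorizer.transform(new_emails)))  # e.g. ['spam' 'ham']
```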
Training the ML-based Classifier
Corpus → ML-based Classifier → Classification
Feedback from a loss function (or cost function) improves the model parameters
An algorithm might have high accuracy but still be a poor machine learning model: its predictions can be useless.
Accuracy, Precision, Recall
All-is-well Binary Classifier
Medical reports: always classify as "normal" (No Cancer)
Here, accuracy for a rare cancer may be 99.9999%, but…
Accuracy
Some labels may be much more common or rare than others
Such a dataset is said to be skewed
Accuracy is a poor evaluation metric here
Confusion Matrix
                     Predicted: Cancer    Predicted: No Cancer
Actual: Cancer       10 instances         4 instances
Actual: No Cancer    5 instances          1000 instances
True Positive
Actual Label = Predicted Label = Cancer
The 10 instances predicted as Cancer where the patient actually has cancer are True Positives (TP).
False Positive
Actual Label ≠ Predicted Label
The 5 instances predicted as Cancer where the patient actually has no cancer are False Positives (FP).
True Negative
Actual Label = Predicted Label = No Cancer
The 1000 instances predicted as No Cancer where the patient actually has no cancer are True Negatives (TN).
False Negative
Actual Label ≠ Predicted Label
The 4 instances predicted as No Cancer where the patient actually has cancer are False Negatives (FN).
Confusion Matrix
                     Predicted: Cancer    Predicted: No Cancer
Actual: Cancer       10 (TP)              4 (FN)
Actual: No Cancer    5 (FP)               1000 (TN)
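As a sketch (assumed toy labels, not the slides' 1019 instances), the four cells can be counted directly from paired actual and predicted labels:

```python
# Count TP, FN, FP, TN for a binary Cancer / No Cancer classifier.
actual    = ["Cancer", "Cancer", "No Cancer", "No Cancer", "No Cancer"]
predicted = ["Cancer", "No Cancer", "Cancer", "No Cancer", "No Cancer"]

pairs = list(zip(actual, predicted))
TP = sum(a == "Cancer"    and p == "Cancer"    for a, p in pairs)
FN = sum(a == "Cancer"    and p == "No Cancer" for a, p in pairs)
FP = sum(a == "No Cancer" and p == "Cancer"    for a, p in pairs)
TN = sum(a == "No Cancer" and p == "No Cancer" for a, p in pairs)
print(TP, FN, FP, TN)  # 1 1 1 2 for this toy data
```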
Accuracy
Accuracy counts the instances where Actual Label = Predicted Label (the TP and TN cells).
Accuracy = (TP + TN) / Num Instances = (10 + 1000) / 1019 = 99.12%
Accuracy = 99.12%: the classifier gets it right 99.12% of the time.
But…
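A quick check of that number in code (a sketch using the slides' confusion matrix counts):

```python
# Accuracy from the slides' confusion matrix
TP, FN, FP, TN = 10, 4, 5, 1000
accuracy = (TP + TN) / (TP + FN + FP + TN)
print(f"Accuracy: {accuracy:.2%}")  # Accuracy: 99.12%
```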
The 5 false positives put people on chemotherapy or radiation when it is not required.
The 4 false negatives mean cancer is not detected and no treatment is prescribed.
Accuracy is not a good metric to evaluate whether this model performs well.
Precision
Precision is the accuracy when the classifier flags cancer, i.e. over the instances predicted as Cancer.
Precision = TP / (TP + FP) = 10 / 15 = 66.67%
1 in 3 cancer diagnoses is incorrect
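The same calculation in code (a sketch using the slides' counts):

```python
# Precision: of all instances flagged as Cancer, how many actually are Cancer
TP, FP = 10, 5
precision = TP / (TP + FP)
print(f"Precision: {precision:.2%}")  # Precision: 66.67%
```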
Recall
Recall is the accuracy when cancer is actually present, i.e. over the instances whose actual label is Cancer.
Recall = TP / (TP + FN) = 10 / 14 = 71.43%
2 in 7 cancer cases missed
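Again in code (a sketch using the slides' counts):

```python
# Recall: of all actual Cancer cases, how many the classifier caught
TP, FN = 10, 4
recall = TP / (TP + FN)
print(f"Recall: {recall:.2%}")  # Recall: 71.43%
```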
Choosing a Machine Learning Model
An ML-based binary classifier, trained on a corpus, sees an animal that breathes like a mammal and gives birth like a mammal.
A hard classifier outputs a label directly: Mammal.
A probabilistic classifier outputs a probability instead, e.g. P(fish) = 0.45.
Applying Logistic Regression
The model outputs the probability of an animal being a fish, for example:
Lives in water, breathes with gills, lays eggs: 95%
Lives in water, breathes with lungs, does not lay eggs: 60%
Lives on land, breathes with lungs, does not lay eggs: 5%
Whales: fish or mammals?
Choosing a Decision Threshold
Pick a threshold on the probability of the animal being a fish, e.g. Pthreshold = 80%.
If probability < Pthreshold, it's a mammal
If probability > Pthreshold, it’s a fish
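Applying the threshold is a simple comparison; here is a sketch with assumed example probabilities like those on the chart:

```python
# Turn P(fish) into a hard label using a decision threshold of 0.8
p_fish = [0.95, 0.60, 0.40, 0.20, 0.05]
P_THRESHOLD = 0.80

labels = ["fish" if p > P_THRESHOLD else "mammal" for p in p_fish]
print(labels)  # ['fish', 'mammal', 'mammal', 'mammal', 'mammal']
```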
Pthreshold = 1: the "Always Negative" classifier

                     Predicted: Cancer    Predicted: No Cancer
Actual: Cancer       0 (TP)               14 (FN)
Actual: No Cancer    0 (FP)               1005 (TN)

- Recall = 0%
- Precision = 0 / 0 is undefined: the classifier never makes a positive prediction
- The classifier is too conservative
Precision vs. "Conservativeness"
[Plot: precision (0 to 1.0) against the "conservativeness" of the decision threshold; precision rises as the threshold becomes more conservative]
Pthreshold = 0: the "Always Positive" classifier

                     Predicted: Cancer    Predicted: No Cancer
Actual: Cancer       14 (TP)              0 (FN)
Actual: No Cancer    1005 (FP)            0 (TN)

- Recall = 100%
- Precision = 14 / 1019 = 1.37%
- The classifier is not conservative enough
Recall vs. "Conservativeness"
[Plot: recall (0 to 1.0) against the "conservativeness" of the decision threshold; recall falls as the threshold becomes more conservative]
Precision-Recall Tradeoff
[Plot: precision rises while recall falls as the "conservativeness" of the decision threshold increases; improving one generally worsens the other]
Heuristics to Choose a Model
ROC Curve: plot a curve to maximize true positives and minimize false positives
F1 Score: the harmonic mean of precision and recall

F1 = 2 × (Precision × Recall) / (Precision + Recall)

The F1 score is the harmonic mean of precision and recall: it sits closer to the lower of the two, so it favors an even tradeoff between them.
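With the precision and recall computed earlier (a sketch using the slides' counts):

```python
# F1 score from the slides' precision (10/15) and recall (10/14)
precision, recall = 10 / 15, 10 / 14
f1 = 2 * precision * recall / (precision + recall)
print(f"F1: {f1:.2%}")  # F1: 68.97%
```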
Choosing Pthreshold with the F1 Score
Tweak threshold values: run training with a different threshold value for each execution
Calculate precision and recall for each training run
Calculate the F1 score: each training run produces a model; compute the F1 score for each model
A higher F1 score is better: choose the threshold that results in the highest F1 score
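A minimal sketch of that procedure (assumed toy labels and probabilities; in practice one trained model's predicted probabilities can simply be re-thresholded rather than retraining for every candidate):

```python
import numpy as np
from sklearn.metrics import f1_score

def choose_threshold(y_true, y_prob, candidates=np.arange(0.1, 1.0, 0.1)):
    """Return the candidate threshold with the highest F1 score on the given data."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        y_pred = (y_prob >= t).astype(int)            # apply the candidate threshold
        f1 = f1_score(y_true, y_pred, zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

y_true = np.array([1, 0, 1, 1, 0, 0])                 # toy actual labels
y_prob = np.array([0.9, 0.4, 0.7, 0.3, 0.2, 0.6])     # toy predicted P(positive)
print(choose_threshold(y_true, y_prob))               # best threshold ≈ 0.7 for this toy data
```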
Choosing Pthreshold with the ROC Curve
The ROC (Receiver Operating Characteristic) curve plots the True Positive Rate against the False Positive Rate.
The True Positive Rate should be as high as possible; the False Positive Rate should be as low as possible.
Each value of Pthreshold gives one point on the plot; trying different values is a form of hyperparameter tuning.
Fit the ROC curve through the points obtained from the different values of Pthreshold.
Pick the point nearest the top-left corner as Pthreshold. Why? It maximises the True Positive Rate and minimises the False Positive Rate.
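A sketch of that selection with scikit-learn (assumed toy labels and probabilities):

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # toy actual labels
y_prob = np.array([0.9, 0.4, 0.7, 0.3, 0.2, 0.6, 0.8, 0.1])   # toy predicted probabilities

# One (FPR, TPR) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# Pick the threshold whose point is closest to the top-left corner (FPR = 0, TPR = 1)
best = np.argmin(fpr**2 + (1 - tpr)**2)
print("Chosen Pthreshold:", thresholds[best])                 # ≈ 0.7 for this toy data
```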
ROC of a Perfect Classifier
[Plot: the curve reaches the top-left corner, where the True Positive Rate = 100% and the False Positive Rate = 0%]
ROC of a Random Classifier
[Plot: the diagonal line where the True Positive Rate equals the False Positive Rate]