HW 2 notes
Data splitting, RoC
Loose ends from HW2
• A majority class baseline
• Powerful if one class dominates. Often happens in real life
• Recognizer becomes biased towards the majority class (the
prior term)
• How to deal with this?
• Zero probability in the estimation
• Hyperparameter
• RoC
Splitting the data
• We want to estimate the performance of the model
• Need a test set
• Train-test split
• Leave-one-out splitting
• Bootstrapping
• Cross-validation splitting
Splitting data
Simple train-test split: e.g. 80% training / 20% test (the ratio can be varied as appropriate).
• 1,000,000 samples → 800,000 training samples, 200,000 test samples
• 500 samples → 400 training samples, 100 test samples
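A minimal sketch of the simple split using scikit-learn's train_test_split (the data below is synthetic, just to make the snippet runnable):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in data: 500 samples, 3 features, binary labels.
X = np.random.rand(500, 3)
y = np.random.randint(0, 2, size=500)

# 80/20 split; change test_size to vary the ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))  # 400 100
```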
Stratified splitting – tries to keep the class distribution in the training and test sets the same
sklearn.model_selection.StratifiedShuffleSplit
sklearn.model_selection.StratifiedKFold
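A short sketch of a stratified 80/20 split with the StratifiedShuffleSplit class named above (toy data again):

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

X = np.random.rand(500, 3)
y = np.random.randint(0, 2, size=500)

# One stratified split: class proportions in train and test stay
# (approximately) the same as in the full dataset.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
for train_idx, test_idx in sss.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
```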
Leave-one-out
The test set is a single sample; all remaining samples form the training set (repeated so that each sample is held out once).
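In scikit-learn this corresponds to LeaveOneOut; a tiny sketch:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

X = np.arange(10).reshape(-1, 1)

# Each iteration holds out exactly one sample as the test set.
for train_idx, test_idx in LeaveOneOut().split(X):
    pass  # train on X[train_idx], evaluate on the single sample X[test_idx]
```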
Multiple train-test splits
Estimate the true (expected) performance: repeat (random train-test split → train model → predict), e.g. ×10. Each run fits a model such as ŷ = 2.3·x1 − 3.4·x2 + 4.2.
This is sometimes called bootstrapping (in statistics).
This can also be used to estimate the variance of your method's performance.
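A sketch of this repeat-the-split idea, assuming a linear-regression model on synthetic data shaped like the example above; the mean of the scores estimates the expected performance and their variance estimates its spread:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = 2.3 * X[:, 0] - 3.4 * X[:, 1] + 4.2 + rng.normal(scale=0.5, size=1000)

scores = []
for seed in range(10):                      # x 10 random splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = LinearRegression().fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))  # R^2 on the held-out split

print(np.mean(scores), np.var(scores))      # expected performance, variance
```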
Cross-validation (CV)
3-fold cross-validation: split the data into 3 folds.
• Fold 1 as test set; folds 2 and 3 as training set
• Fold 2 as test set; folds 1 and 3 as training set
• Fold 3 as test set; folds 1 and 2 as training set
Similar idea to bootstrapping, but there is no overlap between the test splits.
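A minimal 3-fold CV sketch with scikit-learn's KFold (toy data):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(-1, 1)

# Each sample lands in exactly one test fold, so the test splits never overlap.
kf = KFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: test indices {test_idx}")
```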
Hyperparameter vs Parameters
• Parameters - something the models learn from data
• Hyperparameter – something we pick for the model via
trial and error
Model – Parameters – Hyperparameters
• Linear regression – weights – loss (L1/L2), polynomial degree, features used, …
• Naïve Bayes – distribution parameters – type of distribution used
• GMM – means, covariances, mixture weights – number of mixtures, type of covariance matrix
• Histogram distribution – histogram heights – number of bins or size of bins
• K-means – centroids – K
• ML model – model weights – type of ML model
Picking hyper-parameters
Split the data into a training set, a validation set, and a test set.
You decide on the model and hyperparameters based on the validation performance.
Make sure that you are not optimizing on the test set.
Make the test set a good proxy for estimating real-world performance.
Don’t cheat!
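One common way to get such a split in code is two chained train_test_split calls; a sketch assuming an 80/10/10 split on toy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 4)
y = np.random.randint(0, 2, size=1000)

# First carve off the test set, then split the remainder into train/validation.
X_trval, X_test, y_trval, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trval, y_trval, test_size=1/9, random_state=0)  # 1/9 of 90% ≈ 10%
```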
Splitting a validation set
If the test set is fixed, we can do CV on the training set to get the validation set: split the training data into folds (e.g. 3) and let each fold take a turn as the validation set while the remaining folds are used for training. The test set stays untouched.
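A sketch of this recipe using GridSearchCV to run the CV on the training portion while the test set stays frozen; the polynomial-degree example and the synthetic data are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=1.0, size=300)

# Freeze a test set first; CV happens only on the training portion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

pipe = make_pipeline(PolynomialFeatures(), LinearRegression())
search = GridSearchCV(pipe,
                      {"polynomialfeatures__degree": [1, 2, 3, 4, 5]},
                      cv=3)
search.fit(X_train, y_train)

print("best hyperparameter:", search.best_params_)
print("CV score:", search.best_score_)
print("test score:", search.score(X_test, y_test))  # touched only once
```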
Estimating true performance of our pipeline
Freeze the test set. Touch the test set as little as possible: the more often you see it, the more you cheat.
Example (fixed test set, 3-fold CV on the training data):
• Validation = fold 1: best degree = 3, CV accuracy = 80, test accuracy = 75
• Validation = fold 2: best degree = 2, CV accuracy = 78, test accuracy = 80
• Validation = fold 3: best degree = 2, CV accuracy = 78, test accuracy = 85
Estimate of the accuracy using our procedure = 80% (the average of the three test accuracies).
Question: which model do we deploy?
Nested CV
If the fixed test set is too small to give a reliable estimate, use nested CV, or a mixture of techniques, e.g. leave-one-out CV with a validation set.
In nested CV the test split rotates as well: for each outer test split (Test split 1, Test split 2, …), CV on the remaining data provides the validation sets.
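A nested-CV sketch along these lines (same assumed polynomial-degree example): the inner GridSearchCV picks the degree, and the outer loop estimates the performance of the whole selection procedure:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=1.0, size=300)

# Inner loop: hyperparameter selection by CV.
inner = GridSearchCV(make_pipeline(PolynomialFeatures(), LinearRegression()),
                     {"polynomialfeatures__degree": [1, 2, 3, 4, 5]},
                     cv=KFold(n_splits=3, shuffle=True, random_state=0))

# Outer loop: the test split rotates, giving several test-set estimates.
outer_scores = cross_val_score(inner, X, y,
                               cv=KFold(n_splits=3, shuffle=True, random_state=1))
print(outer_scores.mean(), outer_scores.std())
```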
Size of split
• Train/validation/test split
• 80/10/10, 90/5/5, 5-fold CV, leave one out CV, etc. for academia
• For real applications, get dev and test sets that represent
your users.
• Reflects the data you want to do well on.
• There can be a mismatch between train and dev data. But avoid a
mismatch between dev and test data.
• If no users, recruit friends to pretend to be the users.
• Example: Cat classifier.
• Should you use ImageNet cat pictures as train/dev/test?
• Go pretend you’re a user and take cat pictures for the dev/test set.
Val and test set sizes
• Val – tune hyperparameters, select features, and make other decisions regarding the learning algorithm.
• Test – evaluate the performance of the algorithm, but not to make any decisions about which learning algorithm or parameters to use.
• Val – big enough to notice differences between algorithms (if you care about a 0.1% difference, make sure the dev set is large enough to spot it).
• Test – large enough to give confidence that your model will do well on the real task.
Congratulations on your first attempt at (almost) re-implementing a research paper!
Another trick to reduce 0 bins in histograms
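The specific trick isn't reproduced in these notes; as one common option (an assumption here, not necessarily the one shown in class), additive smoothing adds a small pseudo-count to every bin so no bin ends up with zero probability:

```python
import numpy as np

# Histogram of a small sample: with few data points, many bins are empty.
counts, edges = np.histogram(np.random.randn(50), bins=20)

alpha = 1.0  # pseudo-count added to every bin (assumed value)
probs = (counts + alpha) / (counts.sum() + alpha * len(counts))

assert np.all(probs > 0) and np.isclose(probs.sum(), 1.0)
```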
Prediction and thresholds
Beer Grass Rice Flood Prediction
100 3 3 Yes 0.8
20 1 1 Yes 0.3
80 3 2 No 0.6
40 1 1 No 0.2
40 1 1 No 0.1
What happens if I set my threshold at 0.5?
Prediction and thresholds
Beer Grass Rice Flood Prediction Metric
100 3 3 Yes 0.8 TP
20 1 1 Yes 0.3 FN
80 3 2 No 0.6 FA
40 1 1 No 0.2 TN
40 1 1 No 0.1 TN
What happens if I set my threshold at 0.5?
True positive rate = ½
False alarm rate = ⅓
Precision = ½
Recall = ½
Prediction and thresholds
Beer Grass Rice Flood Prediction Metric
100 3 3 Yes 0.8 TP
20 1 1 Yes 0.3 TP
80 3 2 No 0.6 FA
40 1 1 No 0.2 FA
40 1 1 No 0.1 TN
What happens if I set my threshold at 0.15?
True positive rate = 1
False alarm rate = 2/3
Precision = 2/4
Recall = 2/2
Prediction and thresholds
Comparing the two thresholds:
• Threshold at 0.5: TPR = 1/2, FAR = 1/3, precision = 1/2, recall = 1/2
• Threshold at 0.15: TPR = 1, FAR = 2/3, precision = 2/4, recall = 2/2
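A small sketch that reproduces these numbers from the flood table above (Yes = 1, No = 0):

```python
import numpy as np

labels = np.array([1, 1, 0, 0, 0])            # Flood column
scores = np.array([0.8, 0.3, 0.6, 0.2, 0.1])  # Prediction column

def metrics_at(threshold):
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))   # false alarms
    fn = np.sum(~pred & (labels == 1))  # misses
    tn = np.sum(~pred & (labels == 0))
    return {"TPR": tp / (tp + fn), "FAR": fp / (fp + tn),
            "precision": tp / (tp + fp), "recall": tp / (tp + fn)}

print(metrics_at(0.5))   # TPR 1/2, FAR 1/3, precision 1/2, recall 1/2
print(metrics_at(0.15))  # TPR 1,   FAR 2/3, precision 2/4, recall 2/2
```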
Receiver Operating Characteristic (RoC) curve
• What if we change the threshold?
• FA vs TP is a tradeoff. This is why we need to think of the application when choosing metrics.
• Plot the FA rate and TP rate as the threshold changes.
[Plot: ROC curve, TPR on the y-axis vs FAR on the x-axis]
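scikit-learn's roc_curve does the threshold sweep for us; a sketch on the same toy labels/scores:

```python
import numpy as np
from sklearn.metrics import roc_curve

labels = np.array([1, 1, 0, 0, 0])
scores = np.array([0.8, 0.3, 0.6, 0.2, 0.1])

# fpr is the false alarm rate, tpr the true positive rate, one row per threshold.
fpr, tpr, thresholds = roc_curve(labels, scores)
print(np.column_stack([thresholds, fpr, tpr]))
# Plotting fpr on the x-axis against tpr on the y-axis gives the ROC curve.
```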
Comparing detectors
• Which is better?
[Figure: ROC curves of two detectors compared]
Selecting the threshold
• Select based on the application.
• Trade off between TP and FA. Know your application, know your users.
• A miss is as bad as a false alarm: FAR = 1 − TPR, i.e. the line x = 1 − y on the ROC plot. This line has a special name: it defines the Equal Error Rate (EER).
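A rough way to read off the EER from a discrete ROC curve (a sketch; with few points it is only approximate):

```python
import numpy as np
from sklearn.metrics import roc_curve

labels = np.array([1, 1, 0, 0, 0])
scores = np.array([0.8, 0.3, 0.6, 0.2, 0.1])
fpr, tpr, thresholds = roc_curve(labels, scores)

# EER: the operating point closest to the line FAR = 1 - TPR.
i = np.argmin(np.abs(fpr - (1 - tpr)))
eer = (fpr[i] + (1 - tpr[i])) / 2
print("EER ~", eer, "at threshold", thresholds[i])
```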
Selecting the threshold
• Select based on the application.
• Trade off between TP and FA. Know your application, know your users. Is the application about safety?
• A miss is 1000 times more costly than a false alarm: FAR = 1000·(1 − TPR), i.e. the line x = 1000 − 1000y on the ROC plot.
Churn prediction
Predict whether a customer will stop their subscription, so we can send a promotional ad.
• Usual subscription fee: 50
• Cost of calling the customer: 5
• Promotional subscription fee: 25
Describe the strategy to pick the threshold.
Churn prediction
Strategy to pick the threshold:
• Cost of a miss = 50 (the lost subscription fee).
• Cost of a false alarm = 5 + 25 = 30 (the call plus the discount given to a customer who would not have left).
• Pick the operating point using the line 30·FAR = 50·(1 − TPR) on the ROC plot.
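In practice one can also just sweep candidate thresholds and keep the one with the lowest total cost; a sketch on hypothetical churn scores (the data here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)  # 1 = will churn, 0 = will stay
scores = np.clip(0.3 * labels + rng.normal(0.35, 0.25, size=200), 0, 1)

cost_miss = 50    # lost subscription fee
cost_fa = 5 + 25  # call cost + discount given to a customer who would have stayed

best_t, best_cost = None, np.inf
for t in np.unique(scores):
    misses = np.sum((scores < t) & (labels == 1))
    false_alarms = np.sum((scores >= t) & (labels == 0))
    total = cost_miss * misses + cost_fa * false_alarms
    if total < best_cost:
        best_t, best_cost = t, total
print("pick threshold", best_t, "with total cost", best_cost)
```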
Selecting the threshold
• Select based on the application.
• Trade off between TP and FA.
• Regulation or hard threshold: e.g. cannot exceed 1 false alarm per year.
• If 1 decision is made every day, FAR = 1/365 (the vertical line x = 1/365 on the ROC plot).
Notes about RoC
• Ways to compress the RoC curve into a single number for easier comparison – use with care!
• EER
• Area under the curve (AUC)
• F score
• Other similar curve: Detection Error Tradeoff (DET) curve – plots false alarm rate vs miss rate.
• Other similar curve: PR curve (precision-recall curve).
• These can be plotted on a log scale for clarity.
Summary
• Train-validation-test
• Hyperparameter vs parameter
• RoC