
Clase 10-11

The document discusses underfitting and overfitting in machine learning models. It provides examples of how increasing complexity in decision trees can lead to overfitting when the training data is insufficient. The key ways to address overfitting are pre-pruning and post-pruning of decision trees, which aim to find the optimal complexity that minimizes test error. Performance of models is evaluated using metrics like accuracy and the confusion matrix, though accuracy alone can be misleading if classes are imbalanced.


26-10-2020

Evaluation

Underfitting and Overfitting


Underfitting: when the model is too simple, both training and test errors are large
Overfitting: when the model is too complex, the training error is small but the test error is large

Overfitting due to Noise

The decision boundary is distorted by a noise point


Overfitting due to Insufficient Examples

Lack of data points in the lower half of the diagram makes it difficult to predict the class labels of that region correctly.
- An insufficient number of training records in the region causes the decision tree to predict the test examples using other training records that are irrelevant to the classification task.

Underfitting and Overfitting (Example)

Two-class problem:

+ : 5200 instances
  • 5000 instances generated from a Gaussian centered at (10,10)
  • 200 noisy instances added

o : 5200 instances
  • Generated from a uniform distribution

10% of the data is used for training and 90% for testing.

Increasing number of nodes in Decision Trees
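A minimal sketch of this experiment, assuming NumPy and scikit-learn. The Gaussian spread, the noise distribution, the data range, and the use of max_leaf_nodes as a stand-in for the number of nodes are illustrative assumptions, not details given on the slide:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    # '+' class: 5000 Gaussian points centered at (10, 10) plus 200 noisy points
    # (the noise and spread are assumptions; the slide only gives the counts and the center).
    X_pos = np.vstack([rng.normal(loc=10.0, scale=1.0, size=(5000, 2)),
                       rng.uniform(low=0.0, high=20.0, size=(200, 2))])
    # 'o' class: 5200 points from a uniform distribution over the same square.
    X_neg = rng.uniform(low=0.0, high=20.0, size=(5200, 2))

    X = np.vstack([X_pos, X_neg])
    y = np.array([1] * len(X_pos) + [0] * len(X_neg))

    # 10% of the data for training, 90% for testing, as on the slide.
    idx = rng.permutation(len(X))
    n_train = int(0.1 * len(X))
    train, test = idx[:n_train], idx[n_train:]

    # Grow trees of increasing size and compare training vs. test error.
    for n_leaves in (4, 50, 200):
        tree = DecisionTreeClassifier(max_leaf_nodes=n_leaves, random_state=0)
        tree.fit(X[train], y[train])
        print(n_leaves,
              1 - tree.score(X[train], y[train]),   # training error
              1 - tree.score(X[test], y[test]))     # test error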


Decision Tree with 4 nodes
(Figure: the decision tree and its decision boundaries on the training data.)

Decision Tree with 50 nodes
(Figure: the decision tree and its decision boundaries on the training data.)

Which tree is better? The decision tree with 4 nodes or the decision tree with 50 nodes?


Model Overfitting

Underfitting: when the model is too simple, both training and test errors are large
Overfitting: when the model is too complex, the training error is small but the test error is large


Model Overfitting

Using twice the number of data instances

• If the training data is under-representative, test error increases and training error decreases as the number of nodes increases
• Increasing the size of the training data reduces the difference between training and test errors at a given number of nodes


Model Overfitting

(Figure: decision tree with 50 nodes vs. decision tree with 50 nodes, using twice the number of data instances.)


Notes on Overfitting

• Overfitting results in decision trees that are more complex than necessary

• Training error no longer provides a good estimate of how well the tree will
perform on previously unseen records

• Need new ways for estimating errors


How to Address Overfitting

• Pre-Pruning (Early Stopping Rule)


– Stop the algorithm before it becomes a fully-grown tree
– Typical stopping conditions for a node:
• Stop if all instances belong to the same class
• Stop if all the attribute values are the same
– More restrictive conditions:
• Stop if the number of instances is less than some user-specified threshold
• Stop if the class distribution of instances is independent of the available features (e.g., using the χ² test)
• Stop if expanding the current node does not improve impurity measures (e.g., Gini or information gain)
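In scikit-learn, taken here as an assumed example implementation, several of these pre-pruning conditions map onto constructor hyperparameters of DecisionTreeClassifier (the χ² independence test has no direct counterpart there). A minimal sketch with illustrative threshold values:

    from sklearn.tree import DecisionTreeClassifier

    # Pre-pruning (early stopping) expressed as hyperparameters; the thresholds
    # below are illustrative, not recommended values.
    pre_pruned = DecisionTreeClassifier(
        criterion="gini",           # impurity measure (Gini; "entropy" for information gain)
        max_depth=5,                # stop growing beyond a fixed depth
        min_samples_split=20,       # stop if a node has fewer instances than this threshold
        min_impurity_decrease=1e-3, # stop if a split does not improve impurity enough
    )
    # pre_pruned.fit(X_train, y_train)  # X_train / y_train are hypothetical data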


Model Selection for Decision Trees

• Post-pruning
– Grow the decision tree in its entirety
– Subtree replacement
• Trim the nodes of the decision tree in a bottom-up fashion
• If generalization error improves after trimming, replace sub-tree
by a leaf node
• Class label of leaf node is determined from majority class of
instances in the sub-tree
– Subtree raising
• Replace subtree with most frequently used branch
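scikit-learn, assumed here, does not expose subtree replacement or subtree raising directly; its closest built-in form of post-pruning is minimal cost-complexity pruning, which likewise grows the full tree first and then prunes it back using a generalization signal. A hedged sketch of that related approach on synthetic stand-in data:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in data; in the lecture's setting this would be the real training set.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    # Post-pruning: grow the full tree first, then compute candidate pruning levels.
    full_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    alphas = full_tree.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas

    # Keep the pruned tree whose generalization (validation) accuracy is best.
    best = max(
        (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr) for a in alphas),
        key=lambda t: t.score(X_val, y_val),
    )
    print(best.get_n_leaves(), best.score(X_val, y_val))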



Examples of Post-pruning


Model Evaluation

• Metrics for Performance Evaluation
  – How to evaluate the performance of a model?
• Methods for Performance Evaluation
  – How to obtain reliable estimates?
• Methods for Model Comparison
  – How to compare the relative performance among competing models?



Metrics for Performance Evaluation

• Focus on the predictive capability of a model
  – rather than how fast it classifies or builds models, scalability, etc.
• Confusion Matrix:

                              PREDICTED CLASS
                              Class=Yes     Class=No
    ACTUAL     Class=Yes      a (TP)        b (FN)
    CLASS      Class=No       c (FP)        d (TN)

  a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)



Metrics for Performance Evaluation…


                              PREDICTED CLASS
                              Class=Yes     Class=No
    ACTUAL     Class=Yes      a (TP)        b (FN)
    CLASS      Class=No       c (FP)        d (TN)

• Most widely-used metric:

  Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
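A minimal sketch, assuming scikit-learn, that computes the confusion matrix and accuracy on small hypothetical label vectors:

    from sklearn.metrics import accuracy_score, confusion_matrix

    # Hypothetical actual and predicted class labels (1 = Yes, 0 = No).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

    # confusion_matrix uses the given label order for rows (actual) and columns
    # (predicted), so labels=[1, 0] matches the slide layout: Yes first, then No.
    (a, b), (c, d) = confusion_matrix(y_true, y_pred, labels=[1, 0])
    print("TP, FN, FP, TN =", a, b, c, d)
    print("accuracy =", (a + d) / (a + b + c + d))
    print("accuracy =", accuracy_score(y_true, y_pred))   # same value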


Limitation of Accuracy

• Consider a 2-class problem


– Number of Class 0 examples = 9990
– Number of Class 1 examples = 10

• If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%
  – Accuracy is misleading because the model does not detect any class 1 example
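A small sketch, assuming scikit-learn, that reproduces this number: a classifier that always predicts the majority class reaches 99.9% accuracy while detecting no class 1 example.

    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import accuracy_score, confusion_matrix

    # 9990 examples of class 0 and 10 examples of class 1, as on the slide.
    y = np.array([0] * 9990 + [1] * 10)
    X = np.zeros((10000, 1))                      # features are irrelevant here

    # A classifier that always predicts the most frequent class (class 0).
    majority = DummyClassifier(strategy="most_frequent").fit(X, y)
    y_pred = majority.predict(X)

    print(accuracy_score(y, y_pred))              # 0.999
    print(confusion_matrix(y, y_pred))            # no class 1 example is detected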


Performance metrics

                              PREDICTED CLASS
                              Class=Yes     Class=No
    ACTUAL     Class=Yes      a (TP)        b (FN)
    CLASS      Class=No       c (FP)        d (TN)


Cost Matrix

                              PREDICTED CLASS
    C(i|j)                    Class=Yes       Class=No
    ACTUAL     Class=Yes      C(Yes|Yes)      C(No|Yes)
    CLASS      Class=No       C(Yes|No)       C(No|No)

C(i|j): cost of misclassifying a class j example as class i


Computing Cost of Classification


Cost matrix:
    C(i|j)          PREDICTED +    PREDICTED -
    ACTUAL +            -1             100
    ACTUAL -             1               0

Model M1:
                    PREDICTED +    PREDICTED -
    ACTUAL +           150             40
    ACTUAL -            60            250
    Accuracy = 80%, Cost = 3910

Model M2:
                    PREDICTED +    PREDICTED -
    ACTUAL +           250             45
    ACTUAL -             5            200
    Accuracy = 90%, Cost = 4255
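A short sketch, assuming NumPy, that reproduces the accuracy and cost figures for M1 and M2 from the matrices above:

    import numpy as np

    # Cost matrix C(i|j): rows = actual class (+, -), columns = predicted class (+, -).
    cost = np.array([[-1, 100],
                     [ 1,   0]])

    # Confusion matrices for the two models, same row/column order as above.
    m1 = np.array([[150,  40],
                   [ 60, 250]])
    m2 = np.array([[250,  45],
                   [  5, 200]])

    for name, cm in (("M1", m1), ("M2", m2)):
        accuracy = np.trace(cm) / cm.sum()      # correct predictions / N
        total_cost = (cm * cost).sum()          # cost-weighted counts, summed
        print(name, accuracy, total_cost)       # M1: 0.8, 3910   M2: 0.9, 4255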


Cost vs Accuracy
Count:
                              PREDICTED CLASS
                              Class=Yes     Class=No
    ACTUAL     Class=Yes      a             b
    CLASS      Class=No       c             d

    N = a + b + c + d
    Accuracy = (a + d) / N

Cost:
                              PREDICTED CLASS
                              Class=Yes     Class=No
    ACTUAL     Class=Yes      p             q
    CLASS      Class=No       q             p

Accuracy is proportional to cost if
    1. C(Yes|No) = C(No|Yes) = q
    2. C(Yes|Yes) = C(No|No) = p

    Cost = p (a + d) + q (b + c)
         = p (a + d) + q (N - a - d)
         = q N - (q - p)(a + d)
         = N [q - (q - p) × Accuracy]
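A quick numeric check of this identity, with illustrative counts and symmetric costs (pure Python, values made up):

    # Arbitrary confusion-matrix counts and symmetric costs (illustrative values).
    a, b, c, d = 40, 10, 5, 45          # TP, FN, FP, TN
    p, q = 0.0, 1.0                     # C(Yes|Yes) = C(No|No) = p,  C(No|Yes) = C(Yes|No) = q

    N = a + b + c + d
    accuracy = (a + d) / N

    cost_direct = p * (a + d) + q * (b + c)
    cost_via_accuracy = N * (q - (q - p) * accuracy)
    print(cost_direct, cost_via_accuracy)   # both equal 15.0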


Model Evaluation

• Metrics for Performance Evaluation
  – How to evaluate the performance of a model?
• Methods for Performance Evaluation
  – How to obtain reliable estimates?
• Methods for Model Comparison
  – How to compare the relative performance among competing models?

Model Evaluation
• Purpose:
– To estimate performance of classifier on previously
unseen data (test set)

• Holdout
– Reserve k% for training and (100-k)% for testing
– Random subsampling: repeated holdout
• Cross validation
– Partition data into k disjoint subsets
– k-fold: train on k-1 partitions, test on the remaining
one
– Leave-one-out: k=n
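A sketch of these three evaluation methods, assuming scikit-learn and using synthetic stand-in data (make_classification and the 70/30 split are illustrative choices):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, random_state=0)   # stand-in data
    clf = DecisionTreeClassifier(random_state=0)

    # Holdout: reserve k% for training, (100-k)% for testing (here k = 70).
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    holdout_acc = clf.fit(X_train, y_train).score(X_test, y_test)

    # k-fold cross-validation: k disjoint partitions; train on k-1, test on the remaining one.
    kfold_acc = cross_val_score(clf, X, y, cv=10).mean()

    # Leave-one-out: k = n, one test instance per fold (expensive for large n).
    loo_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()

    print(holdout_acc, kfold_acc, loo_acc)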


Cross-validation Example

• 3-fold cross-validation
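The 3-fold split can be generated explicitly with scikit-learn's KFold, assumed here; each instance lands in the test fold exactly once (the six toy instances are made up):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(12).reshape(6, 2)       # six toy instances
    for fold, (train_idx, test_idx) in enumerate(KFold(n_splits=3).split(X)):
        print(f"fold {fold}: train on {train_idx}, test on {test_idx}")
    # fold 0: train on [2 3 4 5], test on [0 1]
    # fold 1: train on [0 1 4 5], test on [2 3]
    # fold 2: train on [0 1 2 3], test on [4 5]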


Model Evaluation

• Metrics for Performance Evaluation
  – How to evaluate the performance of a model?
• Methods for Performance Evaluation
  – How to obtain reliable estimates?
• Methods for Model Comparison
  – How to compare the relative performance among competing models?

ROC curves

• ROC = Receiver Operating Characteristic


• Started in electronic signal detection theory (1940s-1950s)
• Has become very popular in machine learning applications to assess classifiers



(Figures: distributions of the test result for the two classes, with a decision threshold on the "Test Result" axis. Results below the threshold are called "negatives" and results above it are called "positives"; the areas of the two distributions on either side of the threshold give the true positives, false positives, true negatives, and false negatives, and these counts change as the threshold is moved toward '-' or '+'.)
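A sketch of the idea behind these figures, assuming NumPy: thresholding a continuous test result yields the four outcome counts (the score distributions and the threshold are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical continuous test results: actual positives tend to score higher.
    scores_pos = rng.normal(2.0, 1.0, size=1000)   # actual positives
    scores_neg = rng.normal(0.0, 1.0, size=1000)   # actual negatives

    threshold = 1.0                                 # call results above this "positive"
    tp = np.sum(scores_pos >  threshold)            # true positives
    fn = np.sum(scores_pos <= threshold)            # false negatives
    fp = np.sum(scores_neg >  threshold)            # false positives
    tn = np.sum(scores_neg <= threshold)            # true negatives

    print(tp, fn, fp, tn)
    print("TPR =", tp / (tp + fn), "FPR =", fp / (fp + tn))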

ROC curve
(Figure: ROC curve plotting the True Positive Rate (sensitivity), from 0% to 100%, against the False Positive Rate (1 - specificity), from 0% to 100%.)
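A sketch, assuming scikit-learn, that computes the ROC curve points, i.e. the (FPR, TPR) pair at every threshold, for the same kind of made-up scores:

    import numpy as np
    from sklearn.metrics import roc_curve

    rng = np.random.default_rng(0)
    scores_pos = rng.normal(2.0, 1.0, size=1000)    # scores of actual positives
    scores_neg = rng.normal(0.0, 1.0, size=1000)    # scores of actual negatives

    y_true = np.concatenate([np.ones(1000), np.zeros(1000)])
    y_score = np.concatenate([scores_pos, scores_neg])

    # fpr = 1 - specificity (x-axis), tpr = sensitivity (y-axis), one point per threshold.
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print(len(thresholds), fpr[:3], tpr[:3])
    # The curve itself can be drawn with e.g. matplotlib: plt.plot(fpr, tpr)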

A good test vs. a poor test:

(Figure: two ROC plots of True Positive Rate against False Positive Rate. A good test's curve rises toward the upper-left corner; a poor test's curve stays close to the diagonal.)


Best test vs. worst test:

(Figure: two ROC plots of True Positive Rate against False Positive Rate. For the best test, the two score distributions don't overlap at all; for the worst test, the distributions overlap completely.)

Area under ROC curve (AUC)

• Overall measure of test performance


• Comparisons between two tests can be based on differences between their (estimated) AUCs
• For continuous data, the AUC is equivalent to the Mann-Whitney U-statistic (a nonparametric test of difference in location between two populations)
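A sketch, assuming scikit-learn and SciPy, illustrating that for continuous scores the AUC equals the Mann-Whitney U statistic divided by n_pos × n_neg, i.e. the probability that a randomly chosen positive scores higher than a randomly chosen negative (scores are made up):

    import numpy as np
    from scipy.stats import mannwhitneyu
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    scores_pos = rng.normal(2.0, 1.0, size=500)    # scores of actual positives
    scores_neg = rng.normal(0.0, 1.0, size=500)    # scores of actual negatives

    y_true = np.concatenate([np.ones(500), np.zeros(500)])
    y_score = np.concatenate([scores_pos, scores_neg])

    auc = roc_auc_score(y_true, y_score)

    # The statistic is U for the first sample (scores_pos); U / (n_pos * n_neg)
    # equals the AUC when the scores are continuous (no ties).
    u_stat, p_value = mannwhitneyu(scores_pos, scores_neg, alternative="greater")
    print(auc, u_stat / (len(scores_pos) * len(scores_neg)))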


(Figure: four ROC plots of True Positive Rate against False Positive Rate, with AUC = 100%, AUC = 50%, AUC = 90%, and AUC = 65%.)


RECEIVER OPERATING CHARACTERISTIC

(Figures: ROC curve examples.)

Significance Tests

• t-test between the performances of two models
• ANOVA between the performances of two or more models
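A sketch, assuming SciPy and scikit-learn and using synthetic stand-in data: a paired t-test between the cross-validation scores of two models and a one-way ANOVA across three (the second and third models are arbitrary illustrative choices):

    from scipy.stats import f_oneway, ttest_rel
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, random_state=0)   # stand-in data

    # The same 10 folds are used for every model, so the scores are paired.
    scores_tree = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
    scores_nb = cross_val_score(GaussianNB(), X, y, cv=10)
    scores_stump = cross_val_score(DecisionTreeClassifier(max_depth=1, random_state=0), X, y, cv=10)

    # t-test between two performance estimates (paired, because the folds match).
    t_stat, p_ttest = ttest_rel(scores_tree, scores_nb)

    # ANOVA between two or more performance estimates.
    f_stat, p_anova = f_oneway(scores_tree, scores_nb, scores_stump)

    print(p_ttest, p_anova)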

