Lecture 15: Evaluating Classifiers (Part II)

ENGR:3110
Introduction to AI and Machine Learning in Engineering
Today’s topics
• Brief review of confusion matrices and sensitivity/specificity
• Introduction to ROC curves and the AUC metric
• Lec15 assignment (due Monday by 11:59 p.m.)
• Before class starts:
• Start JupyterLab, download/open lec15_EvaluatingClassifiersPartII.ipynb, and (if not downloaded already) download biomechanics_train.csv, DRIVE_fundus.png, and DRIVE_target.png. Also NEWLY download DRIVE_prob_prediction_map.png.
• Submit your completed Jupyter notebook file (lec15_EvaluatingClassifiersPartII.ipynb) to ICON (assignment lec15)
Brief review of confusion matrices and sensitivity/specificity
What does it mean to say that a rapid flu
test has a sensitivity of 50-70%?
A. The test is expected to report a ‘positive’ result (i.e., say that someone
has the flu) on 50-70% of patients who actually have the flu.
B. Of all patients who test ‘positive’ on the test (i.e., of the patients for
which the test reports that they have the flu), 50-70% of them are
expected to actually have the flu.
C. The test is expected to report a ‘negative’ result (i.e., say that
someone does NOT have the flu) on 50-70% of patients who actually
do NOT have the flu.
D. Of all patients who test ‘negative’ on the test (i.e., of the patients for
which the test reports that they do NOT have the flu), 50-70% of them
are expected to actually NOT have the flu.

Also see [Link]


What does it mean to say that a rapid flu test has a sensitivity of 50-70%?
Answer: A. The test is expected to report a ‘positive’ result (i.e., say that someone has the flu) on 50-70% of patients who actually have the flu.

Also see [Link]
Review of sensitivity and specificity (from the confusion matrix entries)
A. The test is expected to report a ‘positive’ result (i.e., say that someone has the flu) on 50-70% of patients who actually have the flu.
→ sensitivity: TP/(TP+FN), the proportion of ACTUAL positives that test positive.
B. Of all patients who test ‘positive’ on the test (i.e., of the patients for which the test reports that they have the flu), 50-70% of them are expected to actually have the flu.
→ the positive predictive value, TP/(TP+FP) (not sensitivity).
C. The test is expected to report a ‘negative’ result (i.e., say that someone does NOT have the flu) on 50-70% of patients who actually do NOT have the flu.
→ specificity: TN/(FP+TN), the proportion of ACTUAL negatives that test negative.
D. Of all patients who test ‘negative’ on the test (i.e., of the patients for which the test reports that they do NOT have the flu), 50-70% of them are expected to actually NOT have the flu.
→ the negative predictive value, TN/(TN+FN) (not specificity).

For comparison, accuracy is the proportion correct overall: (TP + TN) / (TP + FN + FP + TN).

Also see lec13 notebook for a review of computing the confusion matrix, and sensitivities/specificities.
What does it mean to say that a rapid flu
test has a specificity of 90-95%?
A. The test is expected to report a ‘positive’ result (i.e., say that someone
has the flu) on 90-95% of patients who actually have the flu.
B. Of all patients who test ‘positive’ on the test (i.e., of the patients for
which the test reports that they have the flu), 90-95% of them are
expected to actually have the flu.
C. The test is expected to report a ‘negative’ result (i.e., say that
someone does NOT have the flu) on 90-95% of patients who actually
do NOT have the flu.
D. Of all patients who test ‘negative’ on the test (i.e., of the patients for
which the test reports that they do NOT have the flu), 90-95% of them
are expected to actually NOT have the flu.

Also see [Link]


What does it mean to say that a rapid flu test has a specificity of 90-95%?
Answer: C. The test is expected to report a ‘negative’ result (i.e., say that someone does NOT have the flu) on 90-95% of patients who actually do NOT have the flu.

Also see [Link]
Review of sensitivity and specificity (from the confusion matrix entries)
sensitivity → true positive rate (with respect to ACTUAL positives): TP/(TP+FN)
specificity → true negative rate (with respect to ACTUAL negatives): TN/(FP+TN)
accuracy → proportion correct: (TP + TN) / (TP + FN + FP + TN)

Also see lec13 notebook for a review of computing the confusion matrix, and sensitivities/specificities.
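A minimal sketch of computing these three metrics with scikit-learn (the labels and predictions below are made up for illustration; they are not the lecture's data):

import numpy as np
from sklearn import metrics

y_true = np.array([1, 1, 1, 0, 0, 0])  # made-up actual labels (1 = positive)
y_pred = np.array([1, 0, 1, 0, 0, 1])  # made-up predicted labels

# For binary labels, ravel() unpacks the 2x2 confusion matrix as TN, FP, FN, TP
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)                # TP/(TP+FN)
specificity = tn / (fp + tn)                # TN/(FP+TN)
accuracy = (tp + tn) / (tp + fn + fp + tn)  # proportion correct
print(sensitivity, specificity, accuracy)   # 0.667 0.667 0.667 here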
Introduction to ROC curves and the AUC metric
Classifiers often have an intermediate numerical result before determining the final classification
Example: In k-NN classification, in the process of determining the ‘majority class’ of the closest k neighbors, one can also keep track of the proportions of each class.
• Recall the iris classification problem. In the example on the right (using two features, with petal_length = 5.00 and petal_width = 1.65), with k=3, 2 of the closest points to the red star are versicolor examples and the remaining point is virginica. Thus, we could assign the ‘probability’ of versicolor to be 0.67 and the ‘probability’ of virginica to be 0.33.
• In sklearn, after training, you can obtain these ‘probability-like’ values using predict_proba rather than predict (see lec13 notebook for biomechanics example).
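A minimal sketch of this idea, assuming scikit-learn's bundled iris data and only the two petal features (the variable names here are illustrative, not taken from the lecture notebook):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, 2:4]  # petal length and petal width only
y = iris.target        # 0 = setosa, 1 = versicolor, 2 = virginica

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

query = [[5.00, 1.65]]           # the 'red star' point from the slide
print(knn.predict_proba(query))  # neighbor-vote proportions per class, e.g. [[0. 0.67 0.33]]
print(knn.predict(query))        # the majority class among the k=3 neighbors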
We can vary the ‘threshold’ of these numerical results to obtain different classification results (also see lec13 notebook)

Biomechanics data…

Suppose the ‘threshold’ is 0.5. Then we will classify anything with an abnormal probability value >= 0.5 as abnormal (this is also what [Link] would do).
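A small runnable sketch of varying the threshold (the probabilities below are made up; in practice they would come from predict_proba as above):

import numpy as np

prob_abnormal = np.array([0.9, 0.55, 0.5, 0.3, 0.25, 0.0])  # made-up 'abnormal' probabilities

for threshold in (0.0, 0.3, 0.5, 0.7, 1.01):
    pred_abnormal = prob_abnormal >= threshold  # True where classified abnormal
    print(threshold, int(pred_abnormal.sum()), "classified abnormal")
# Lowering the threshold classifies more examples as abnormal; at 0.0 everything
# is abnormal, and at 1.01 nothing is.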
We can vary the ‘threshold’ of these numerical results to obtain different classification results (also see lec13 notebook)

Suppose the ‘threshold’ is 0.3. Then we will classify anything with an abnormal probability value >= 0.3 as abnormal.

Note -- I colored the "Abnormal" rows yellow, but left the text label "Normal" -- so you can see the progression.
We can vary the ‘threshold’ of these numerical results to obtain different classification results (also see lec13 notebook)

Suppose the ‘threshold’ is 0.0. Then we will classify anything with an abnormal probability value >= 0.0 as abnormal.
We can vary the ‘threshold’ of these numerical results to obtain different classification results (also see lec13 notebook)

Suppose the ‘threshold’ is 0.0. Then we will classify anything with an abnormal probability value >= 0.0 as abnormal.

Question: Even though we are not showing the actual target values in this example, what would the sensitivity be in the case that the threshold is 0.0? What would the specificity be?

Recall that sensitivity is TP/(TP+FN) or TP/(all actual positives). Recall that specificity is TN/(FP+TN) or TN/(all actual negatives).
We can vary the ‘threshold’ of these numerical results to obtain different classification results (also see lec13 notebook)

Answer: With a threshold of 0.0, everything is classified as abnormal (positive), so FN = 0 and TN = 0.
Sensitivity = TP/(TP+FN) = TP/TP = 100%
Specificity = TN/(FP+TN) = 0/(FP+0) = 0%
We can vary the ‘threshold’ of these numerical results to obtain different classification results (also see lec13 notebook)

Suppose the ‘threshold’ is 0.5. Then we will classify anything with an abnormal probability value >= 0.5 as abnormal (this is also what [Link] would do).
We can vary the ‘threshold’ of these numerical results to obtain different classification results (also see lec13 notebook)

Suppose the ‘threshold’ is 0.7. Then we will classify anything with an abnormal probability value >= 0.7 as abnormal.
We can vary the ‘threshold’ of these numerical results to obtain different classification results (also see lec13 notebook)

Suppose the ‘threshold’ is 1.01. Then we will classify anything with an abnormal probability value >= 1.01 as abnormal.
We can vary the ‘threshold’ of these numerical results to obtain different classification results (also see lec13 notebook)

Suppose the ‘threshold’ is 1.01. Then we will classify anything with an abnormal probability value >= 1.01 as abnormal.

Question: Even though we are not showing the actual target values in this example, what would the sensitivity be in the case that the threshold is 1.01? What would the specificity be?

Recall that sensitivity is TP/(TP+FN) or TP/(all actual positives). Recall that specificity is TN/(FP+TN) or TN/(all actual negatives).
We can vary the ‘threshold’ of these numerical results to obtain different classification results (also see lec13 notebook)

Answer: With a threshold of 1.01, nothing is classified as abnormal (positive), so TP = 0 and FP = 0.
Sensitivity = TP/(TP+FN) = 0/(0+FN) = 0%
Specificity = TN/(FP+TN) = TN/TN = 100%
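A small runnable check of both extreme thresholds (made-up labels and probabilities, purely for illustration):

import numpy as np

y_true = np.array([1, 1, 0, 0, 1])           # made-up actual labels (1 = abnormal)
prob = np.array([0.9, 0.4, 0.6, 0.1, 0.75])  # made-up 'abnormal' probabilities

for threshold in (0.0, 1.01):
    pred = prob >= threshold
    tp = np.sum(pred & (y_true == 1)); fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0)); fp = np.sum(pred & (y_true == 0))
    print(threshold, tp / (tp + fn), tn / (fp + tn))
# threshold 0.0  -> sensitivity 1.0, specificity 0.0 (everything positive)
# threshold 1.01 -> sensitivity 0.0, specificity 1.0 (everything negative)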
A Receiver Operating Characteristic (ROC) curve plots sensitivity versus 1-specificity (across the various threshold values)

[ROC curve figures: from biomechanics example; from image-segmentation example]
A Receiver Operating Characteristic (ROC) curve plots sensitivity versus 1-specificity (across the various threshold values)

Question: What threshold produces the marked point at the top-right corner of each curve? Answer: Threshold = 0 (everything is classified positive, so sensitivity = 100% and specificity = 0%).

[ROC curve figures with the Threshold = 0 point marked: from biomechanics example; from image-segmentation example]
AUC: Area-Under-the-(ROC)-Curve

AUC = 0.96 (biomechanics example); AUC = 0.99 (image-segmentation example)

auc = metrics.roc_auc_score(actual_abnormal, predictions_abnormal_prob)
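The slide's roc_auc_score call computes the AUC directly from labels and scores; equivalently, one can integrate the ROC curve itself. A sketch using made-up placeholder arrays (not the lecture's data):

import numpy as np
from sklearn import metrics

y_true = np.array([1, 1, 0, 0, 1, 0])               # made-up actual labels
scores = np.array([0.9, 0.4, 0.6, 0.1, 0.75, 0.3])  # made-up 'abnormal' probabilities

auc = metrics.roc_auc_score(y_true, scores)  # directly, as on the slide
fpr, tpr, _ = metrics.roc_curve(y_true, scores)
print(auc, metrics.auc(fpr, tpr))            # integrating the curve gives the same value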
Threshold

In sklearn, a low threshold means moving the PredictionBar all the way to the right of the confusion matrix: we make everything positive. This corresponds to the top-right corner of the ROC graph.

A high threshold means we make many things negative: our PredictionBar is slammed all the way left in the confusion matrix. This corresponds to the bottom-left corner of the ROC graph.
Threshold versus Confusion Matrix

Moving the threshold to the right (on a traditional number line)…
Threshold versus Confusion Matrix

             Predicted
              P    N
Actual   P   TP   FN
         N   FP   TN

Moving the threshold to the right (on a traditional number line)… is analogous to sliding the division to the LEFT on the CM (higher threshold, fewer TPs).
Threshold versus Confusion Matrix

             Predicted
              P    N
Actual   P   TP   FN
         N   FP   TN

Moving the threshold to the right (on a traditional number line)… is analogous to sliding the division to the LEFT on the CM (higher threshold, fewer TPs)… and moving to the LEFT on the ROC curve (higher threshold, fewer TPs).

[Threshold > 1 point marked on the ROC curve, from image-segmentation example]
Vice-versa

             Predicted
              P    N
Actual   P   TP   FN
         N   FP   TN

Moving the threshold to the left (on a traditional number line)… is analogous to sliding the division to the RIGHT on the CM (lower threshold, more TPs)… and moving to the RIGHT on the ROC curve (lower threshold, more TPs).

[Threshold = 0 point marked on the ROC curve, from image-segmentation example]
