Lecture 15: Evaluating
Classifiers (Part II)
ENGR:3110
Introduction to AI and Machine Learning in Engineering
Today’s topics
• Brief review of confusion matrices and sensitivity/specificity
• Introduction to ROC curves and the AUC metric
• Lec15 assignment (due Monday by 11:59 p.m.)
• Before class starts:
• Start JupyterLab, download/open lec15_EvaluatingClassifiersPartII.ipynb, and (if not
downloaded already) download biomechanics_train.csv, DRIVE_fundus.png, and
DRIVE_target.png. Also NEWLY download DRIVE_prob_prediction_map.png.
• Submit your completed Jupyter notebook file
(lec15_EvaluatingClassifiersPartII.ipynb) to ICON (assignment lec15)
Brief review of confusion
matrices and
sensitivity/specificity
What does it mean to say that a rapid flu
test has a sensitivity of 50-70%?
A. The test is expected to report a ‘positive’ result (i.e., say that someone
has the flu) on 50-70% of patients who actually have the flu.
B. Of all patients who test ‘positive’ on the test (i.e., of the patients for
which the test reports that they have the flu), 50-70% of them are
expected to actually have the flu.
C. The test is expected to report a ‘negative’ result (i.e., say that
someone does NOT have the flu) on 50-70% of patients who actually
do NOT have the flu.
D. Of all patients who test ‘negative’ on the test (i.e., of the patients for
which the test reports that they do NOT have the flu), 50-70% of them
are expected to actually NOT have the flu.
Also see [Link]
What does it mean to say that a rapid flu
test has a sensitivity of 50-70%?
Answer: A. The test is expected to report a ‘positive’ result (i.e., say that
someone has the flu) on 50-70% of patients who actually have the flu.
Also see [Link]
Review of sensitivity and specificity (from
the confusion matrix entries)
A. The test is expected to report a ‘positive’ result (i.e.,
say that someone has the flu) on 50-70% of patients
who actually have the flu.
→ This statement describes sensitivity: TP / (TP + FN)
Also see lec13 notebook for a review of computing
the confusion matrix, and sensitivities/specificities
Review of sensitivity and specificity (from
the confusion matrix entries)
B. Of all patients who test ‘positive’ on the test (i.e., of the
patients for which the test reports that they have the flu),
50-70% of them are expected to actually have the flu.
→ This statement describes the positive predictive value
(precision), not sensitivity: TP / (TP + FP)
Also see lec13 notebook for a review of computing
the confusion matrix, and sensitivities/specificities
Review of sensitivity and specificity (from
the confusion matrix entries)
C. The test is expected to report a ‘negative’ result (i.e.,
say that someone does NOT have the flu) on 50-70% of
patients who actually do NOT have the flu.
→ This statement describes specificity: TN / (FP + TN)
Also see lec13 notebook for a review of computing
the confusion matrix, and sensitivities/specificities
Review of sensitivity and specificity (from
the confusion matrix entries)
D Of all patients who test ‘negative’ on the test (i.e., of the
patients for which the test reports that they do NOT have
the flu), 50-70% of them are expected to actually NOT
have the flu.
(TP + TN) / (TP + FN + FP + TN)
accuracy
Also see lec13 notebook for a review of computing
the confusion matrix, and sensitivities/specificities 9
What does it mean to say that a rapid flu
test has a specificity of 90-95%?
A. The test is expected to report a ‘positive’ result (i.e., say that someone
has the flu) on 90-95% of patients who actually have the flu.
B. Of all patients who test ‘positive’ on the test (i.e., of the patients for
which the test reports that they have the flu), 90-95% of them are
expected to actually have the flu.
C. The test is expected to report a ‘negative’ result (i.e., say that
someone does NOT have the flu) on 90-95% of patients who actually
do NOT have the flu.
D. Of all patients who test ‘negative’ on the test (i.e., of the patients for
which the test reports that they do NOT have the flu), 90-95% of them
are expected to actually NOT have the flu.
Also see [Link]
What does it mean to say that a rapid flu
test has a specificity of 90-95%?
Answer: C. The test is expected to report a ‘negative’ result (i.e., say that
someone does NOT have the flu) on 90-95% of patients who actually do
NOT have the flu.
Also see [Link]
Review of sensitivity and specificity (from
the confusion matrix entries)
sensitivity → true positive rate (with respect to ACTUAL positives): TP / (TP + FN)
specificity → true negative rate (with respect to ACTUAL negatives): TN / (FP + TN)
accuracy → fraction correct: (TP + TN) / (TP + FN + FP + TN)
Also see lec13 notebook for a review of computing
the confusion matrix, and sensitivities/specificities
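As a quick numeric check of these three formulas, here is a minimal sketch with made-up confusion-matrix counts (the numbers are illustrative, not from the biomechanics data):

# Hypothetical confusion-matrix counts (illustrative only)
TP, FN, FP, TN = 35, 15, 5, 45

sensitivity = TP / (TP + FN)                 # 35/50 = 0.70 (true positive rate)
specificity = TN / (FP + TN)                 # 45/50 = 0.90 (true negative rate)
accuracy = (TP + TN) / (TP + FN + FP + TN)   # 80/100 = 0.80 (fraction correct)

print(sensitivity, specificity, accuracy)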
Introduction to ROC curves
and the AUC metric
Classifiers often have an intermediate
numerical result before determining the final
classification
Example: In k-NN classification, in the process of
determining the ‘majority class’ of the closest k
neighbors, one can also keep track of the
proportions of each class.
• Recall the iris classification problem. In the
example on the right (using two features), with
k=3, 2 of the closest points to the red star are
versicolor examples and the remaining point is
virginica. Thus, we could assign the ‘probability’ of
versicolor to be 0.67 and the ‘probability’ of
virginica to be 0.33.
• In sklearn, after training, you can obtain these
‘probability-like’ values using predict_proba rather
than predict (see lec13 notebook for
biomechanics example).
[Figure: iris scatter plot (petal_length vs. petal_width) with the query point marked at petal_length = 5.00, petal_width = 1.65]
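As a sketch of what this looks like in code (assuming the standard sklearn iris data and the same two petal features as in the figure; the exact neighbor proportions depend on the training data used):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, 2:4]   # petal length and petal width, as in the figure
y = iris.target

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

query = [[5.00, 1.65]]            # the red-star point from the figure
print(iris.target_names)          # column order of the probabilities below
print(knn.predict_proba(query))   # proportions among the k=3 neighbors, e.g. ~[0, 0.67, 0.33]
print(knn.predict(query))         # the majority class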
We can vary the ‘threshold’ of these numerical results to
obtain different classification results (also see lec13
notebook)
Biomechanics data…
Suppose the ‘threshold’ is 0.5. Then we will classify
anything with an abnormal probability value >= 0.5 as
abnormal (this is also what [Link] would do).
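Here is a minimal, self-contained sketch of thresholding predict_proba output at different cutoffs. It uses a synthetic dataset standing in for the biomechanics data, so the counts it prints are illustrative only:

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the biomechanics data; label 1 plays the role of 'abnormal'
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Column order of predict_proba follows clf.classes_; column 1 is class 1 here
prob_abnormal = clf.predict_proba(X)[:, 1]

for threshold in [0.0, 0.3, 0.5, 0.7, 1.01]:
    pred_abnormal = prob_abnormal >= threshold
    print(f"threshold={threshold}: {int(pred_abnormal.sum())} of {len(y)} classified abnormal")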
We can vary the ‘threshold’ of these numerical results to
obtain different classification results (also see lec13
notebook)
Suppose that the ‘threshold’ is 0.3. Then we will classify
anything with an abnormal probability value >= 0.3 as
abnormal.
Note: I colored the "Abnormal" rows yellow but left the
text label "Normal", so you can see the progression.
We can vary the ‘threshold’ of these numerical results to
obtain different classification results (also see lec13
notebook)
Suppose the ‘threshold’ is 0.0. Then we will classify
anything with an abnormal probability value >= 0.0 as
abnormal.
We can vary the ‘threshold’ of these numerical results to
obtain different classification results (also see lec13
notebook)
Suppose the ‘threshold’ is 0.0. Then we will classify
anything with an abnormal probability value >= 0.0 as
abnormal.
Question: Even though we are not showing the
actual target values in this example, what would
the sensitivity be in the case that the threshold is
0.0? What would the specificity be?
Recall that sensitivity is TP/(TP+FN) or TP/(all
actual positives).
Recall that specificity is TN/(FP+TN) or TN/(all
actual negatives).
We can vary the ‘threshold’ of these numerical results to
obtain different classification results (also see lec13
notebook)
Suppose the ‘threshold’ is 0.0. Then we will classify
anything with an abnormal probability value >= 0.0 as
abnormal.
Question: Even though we are not showing the
actual target values in this example, what would
the sensitivity be in the case that the threshold is
0.0? What would the specificity be?
Recall that sensitivity is TP/(TP+FN) or TP/(all
actual positives).
Recall that specificity is TN/(FP+TN) or TN/(all
actual negatives).
Sensitivity = 100%
Specificity = 0%
(With a threshold of 0.0, every sample is classified abnormal: every
actual positive is a TP, so sensitivity = 100%, and there are no TNs,
so specificity = 0%.)
We can vary the ‘threshold’ of these numerical results to
obtain different classification results (also see lec13
notebook)
Suppose the ‘threshold’ is 0.5. Then we will classify
anything with an abnormal probability value >= 0.5 as
abnormal (this is also what [Link] would do).
We can vary the ‘threshold’ of these numerical results to
obtain different classification results (also see lec13
notebook)
Suppose the ‘threshold’ is 0.7. Then we will classify
anything with an abnormal probability value >= 0.7 as
abnormal.
We can vary the ‘threshold’ of these numerical results to
obtain different classification results (also see lec13
notebook)
Suppose the ‘threshold’ is 1.01. Then we will classify
anything with an abnormal probability value >= 1.01 as
abnormal.
We can vary the ‘threshold’ of these numerical results to
obtain different classification results (also see lec13
notebook)
Suppose the ‘threshold’ is 1.01. Then we will classify
anything with an abnormal probability value >= 1.01 as
abnormal.
Question: Even though we are not showing the
actual target values in this example, what would
the sensitivity be in the case that the threshold is
1.01? What would the specificity be?
Recall that sensitivity is TP/(TP+FN) or TP/(all
actual positives).
Recall that specificity is TN/(FP+TN) or TN/(all
actual negatives).
We can vary the ‘threshold’ of these numerical results to
obtain different classification results (also see lec13
notebook)
Suppose the ‘threshold’ is 1.01. Then we will classify
anything with an abnormal probability value >= 1.01 as
abnormal.
Question: Even though we are not showing the
actual target values in this example, what would
the sensitivity be in the case that the threshold is
1.01? What would the specificity be?
Recall that sensitivity is TP/(TP+FN) or TP/(all
actual positives). Recall that specificity is
TN/(FP+TN) or TN/(all actual negatives).
Sensitivity = 0%
Specificity = 100%
(Nothing is classified abnormal, so there are no TPs, giving
sensitivity = 0%, and every actual negative is a TN, giving
specificity = 100%.)
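Continuing the synthetic sketch from above, we can confirm both extreme cases (and the 0.5 default) by computing the confusion matrix at each threshold:

from sklearn.metrics import confusion_matrix

for threshold in [0.0, 0.5, 1.01]:
    pred = (prob_abnormal >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)   # 100% at threshold 0.0; 0% at 1.01
    specificity = tn / (fp + tn)   # 0% at threshold 0.0; 100% at 1.01
    print(f"threshold={threshold}: sensitivity={sensitivity:.0%}, specificity={specificity:.0%}")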
A Receiver Operating Characteristic (ROC) curve
plots sensitivity versus 1-specificity (across the
various threshold values)
[ROC curves: biomechanics example (left) and image-segmentation example (right)]
A Receiver Operating Characteristic (ROC) curve
plots sensitivity versus 1-specificity (across the
various threshold values)
[The same two ROC curves (biomechanics and image-segmentation examples), each with one operating point marked: Threshold = ?]
A Receiver Operating Characteristic (ROC) curve
plots sensitivity versus 1-specificity (across the
various threshold values)
[The same two ROC curves (biomechanics and image-segmentation examples); the marked operating point on each is Threshold = 0]
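In sklearn, metrics.roc_curve does the threshold sweep for us, returning the (1-specificity, sensitivity) pairs directly. A minimal sketch, continuing the synthetic example from above:

import matplotlib.pyplot as plt
from sklearn import metrics

# fpr is 1 - specificity (false positive rate); tpr is sensitivity (true positive rate)
fpr, tpr, thresholds = metrics.roc_curve(y, prob_abnormal)

plt.plot(fpr, tpr)
plt.xlabel("1 - specificity (false positive rate)")
plt.ylabel("sensitivity (true positive rate)")
plt.title("ROC curve")
plt.show()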
AUC: Area-Under-the-(ROC)-Curve
[ROC curves: AUC = 0.96 for the biomechanics example; AUC = 0.99 for the image-segmentation example]
from sklearn import metrics

# AUC from the true labels and the predicted ‘abnormal’ probabilities
auc = metrics.roc_auc_score(actual_abnormal, predictions_abnormal_prob)
Threshold
In sklearn, a low threshold means moving the PredictionBar all the way to
the right of the confusion matrix: we make everything positive. This occurs
in the top-right corner of the ROC graph.
A high threshold means we make many things negative—our PredictionBar
is slammed all the way left in the confusion matrix. This is what happens in
the bottom-left corner of the ROC graph.
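Continuing the sketch above, we can check that the arrays returned by roc_curve end at exactly these two corners:

# roc_curve orders its output from the highest threshold to the lowest, so the
# first point is 'everything negative' and the last is 'everything positive'
print(fpr[0], tpr[0])    # 0.0 0.0 -> bottom-left corner (very high threshold)
print(fpr[-1], tpr[-1])  # 1.0 1.0 -> top-right corner (very low threshold)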
Threshold versus Confusion Matrix
Moving the threshold to the right
(on a traditional number line)…
Threshold versus Confusion Matrix

Confusion matrix:
            Predicted
             P    N
Actual  P   TP   FN
        N   FP   TN

Moving the threshold to the right (on a traditional number line)…
… is analogous to sliding the division to the left on the CM
(higher threshold, fewer TPs).
Threshold versus Confusion Matrix

Confusion matrix:
            Predicted
             P    N
Actual  P   TP   FN
        N   FP   TN

Moving the threshold to the right (increasing it, on a traditional
number line)…
… is analogous to sliding the division to the LEFT on the CM
(higher threshold, fewer TPs)
… and to moving to the LEFT on the ROC curve (higher threshold,
fewer TPs).
[ROC curve from the image-segmentation example, with the
Threshold > 1 point marked at the bottom-left corner]
Vice-versa

Confusion matrix:
            Predicted
             P    N
Actual  P   TP   FN
        N   FP   TN

Moving the threshold to the left (decreasing it, on a traditional
number line)…
… is analogous to sliding the division to the RIGHT on the CM
(lower threshold, more TPs)
… and to moving to the RIGHT on the ROC curve (lower threshold,
more TPs).
[ROC curve from the image-segmentation example, with the
Threshold = 0 point marked at the top-right corner]
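One more check on the synthetic sketch from earlier: as the threshold increases, the predicted-positive column of the confusion matrix shrinks (fewer TPs and FPs), which is exactly the LEFT-ward slide on the CM and on the ROC curve described above:

for threshold in [0.0, 0.25, 0.5, 0.75, 1.01]:
    pred = (prob_abnormal >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    print(f"threshold={threshold}: TP={tp}, FP={fp}, FN={fn}, TN={tn}")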