Multiclass Classification
The City College of New York
CSc 59929 – Introduction to Machine Learning
Spring 2020 – Erik K. Grimmelmann, Ph.D.
Definitions
• Binary Classification is the problem of classifying
instances into one of two classes.
• Multiclass or Multinomial Classification is the problem of
classifying instances into one of more than two classes.
Multiclass classification should not be confused with multi-
label classification, where multiple labels are to be predicted
for each instance.
Ref.: Wikipedia
The City College of New York
CSc 59929 – Introduction to Machine Learning 2
Spring 2020 – Erik K. Grimmelmann, Ph.D.
Binary Classifiers
While some classification algorithms naturally permit the use
of more than two classes, others are by nature binary
algorithms; these can, however, be turned into multinomial
classifiers by a variety of strategies.
Ref.: Wikipedia
The City College of New York
CSc 59929 – Introduction to Machine Learning 3
Spring 2020 – Erik K. Grimmelmann, Ph.D.
Using binary classifiers
for multiclass classification
• One-vs.-Rest (OvR)
• One-vs.-All (OvA)
• One-against-All (OaA)
• One-vs.-One (OvO)
Ref.: Wikipedia
The City College of New York
CSc 59929 – Introduction to Machine Learning 4
Spring 2020 – Erik K. Grimmelmann, Ph.D.
Comparison of OvR and OvO
The City College of New York
CSc 59929 – Introduction to Machine Learning 5
Spring 2020 – Erik K. Grimmelmann, Ph.D.
Training OvR
fA fB fC fD
A A Not B Not C Not D
B Not A B Not C Not D
C Not A Not B C Not D
D Not A Not B Not C D
Ref.: Wikipedia
The City College of New York
CSc 59929 – Introduction to Machine Learning 6
Spring 2020 – Erik K. Grimmelmann, Ph.D.
Predicting with OvR
A
fA ( x)
B
fB ( x) Choose the prediction that
has the highest cofidence.
C
fC ( x )
D fD ( x)
Ref.: Wikipedia
The City College of New York
CSc 59929 – Introduction to Machine Learning 7
Spring 2020 – Erik K. Grimmelmann, Ph.D.
Training OvO f AD
f AB f AC
A A A A
B B C D
C f BC f BD fCD
B B C
D
C D D
The City College of New York
CSc 59929 – Introduction to Machine Learning 8
Spring 2020 – Erik K. Grimmelmann, Ph.D.
Predicting with OvO
A
f AB ( x )
f AC ( x )
B f AD ( x ) Choose the result that is
predicted most often.
f BC ( x )
C
f BD ( x )
D fCD ( x )
Ref.: Wikipedia
The City College of New York
CSc 59929 – Introduction to Machine Learning 9
Spring 2020 – Erik K. Grimmelmann, Ph.D.
One-vs.-Rest (OvR)
Inputs:
L, a learner (a training algorithm for binary classifiers)
X , a sample
y, labels where yi ∈ {1,..., K }
Output:
{ f k } , a set of classifiers, for k ∈{1,..., K }
Procedure:
For each k in {1,..., K }, construct a new label vector, z ,
where
= zi 1= if yi k and = zi 0 otherwise.
Apply L to ( X , z ) to obtain f k
Making a decision consists of applying all classifiers to a sample and
predicting the label i for which the corresponding classifier reports
the highest confidence score, yˆ = argmax f i ( x) Ref.: Wikipedia
i∈{1…K }
The City College of New York
CSc 59929 – Introduction to Machine Learning 10
Spring 2020 – Erik K. Grimmelmann, Ph.D.
One-vs.-One (OvO)
Inputs:
L, a learner (a training algorithm for binary classifiers)
X , a sample
y, labels where yi ∈ {1,..., K }
Output:
{ f } , a set of classifiers, for j,k ∈{1,..., K } and j < k.
jk
Procedure:
For each j ,k in {1,..., K } with j < k , form subsets X jk of X
such that xi ∈ X jk if yi =j or yi =k .
Apply L to each ( X jk ) to obtain f jk
Making a decision consists of applying all classifiers to a sample and
predicting the label i for which the most classifiers report a
result of i. Ref.: Wikipedia
The City College of New York
CSc 59929 – Introduction to Machine Learning 11
Spring 2020 – Erik K. Grimmelmann, Ph.D.