Multiclass classification
• What is multiclass classification?
• Which classifiers do we use in multiclass
classification?
• How and when do we use these classifiers?
• Are multiclass and multi-label classification
similar?
Multiclass classification
• Classification involving more than two classes
• Each data point can belong to exactly one class
Are multiclass and multi-label
classification similar?
• No. In multiclass classification each instance belongs to
exactly one class, whereas in multi-label classification an
instance can carry several labels at once.
• There are mainly two types of multiclass
classification techniques:
– One vs. All (one-vs-rest)
– One vs. One
Binary Classification
• Only two classes are present in the dataset.
• It requires only one classifier model.
• The confusion matrix is easy to derive and understand.
• Examples: checking whether an email is spam, predicting gender based on height and weight.
Multi-class Classification
• Multiple class labels are present in the dataset.
• The number of classifier models depends on the classification technique we apply.
• One vs. All: for N classes, N binary classifier models
• One vs. One: for N classes, N*(N-1)/2 binary classifier models
• The confusion matrix is easy to derive but more complex to interpret.
• Example: checking whether a fruit is an apple, a banana, or an orange.
Method 1. OvR — One vs Rest
• OvR stands for “One vs Rest” and, as the name suggests, is a method to
evaluate multiclass models by comparing each class against all the
others.
• In this scenario, we take one class and treat it as our “positive” class,
while all the others (the rest) are treated as the “negative” class.
• By doing this, we reduce the multiclass classification output to a
binary one, so all the known binary classification metrics can be
used to evaluate this scenario.
• We must repeat this for each class present in the data, so for a 3-class
dataset we get 3 different OvR scores. In the end, we can average them
(simple or weighted average) to obtain a final OvR model score.
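As a rough sketch, the averaging step might look like this in Python (the per-class OvR scores and the class counts below are made-up example values, not from the slides):

```python
import numpy as np

# Hypothetical per-class one-vs-rest scores for a 3-class problem
# (each score is the binary metric obtained by treating that class
# as "positive" and the rest as "negative").
ovr_scores = np.array([0.90, 0.75, 0.80])   # assumed example values
class_counts = np.array([20, 30, 50])       # assumed class frequencies

simple_avg = ovr_scores.mean()                          # simple (macro) average
weighted_avg = np.average(ovr_scores, weights=class_counts)  # weighted average

print(round(simple_avg, 3))    # → 0.817
print(round(weighted_avg, 3))  # → 0.805
```

The weighted average gives larger classes more influence on the final score.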
Method 1. One-vs-Rest (OvR) or
One-vs-All (OvA)
• In One-vs-All classification, for an N-class dataset, we have to
generate N binary classifier models.
• The number of class labels present in the dataset and the number of
generated binary classifiers must be the same.
For example, suppose we have three classes:
type 1 for Green,
type 2 for Blue, and
type 3 for Red.
Method 1. One-vs-Rest (OvR) or
One-vs-All (OvA)
• Learn one classifier at a time
• One-vs-rest
• Given m classes, train m classifiers: one for each class
• Classifier i: treat tuples in class i as positive and all others as
negative
• To classify a tuple X, choose the classifier with the maximum
output value
• Uses a “winner-takes-all” strategy.
• Generate the same number of classifiers as there
are class labels in the dataset, so we have to
create three classifiers here, one for each of the
three classes.
– Classifier 1:- [Green] vs [Red, Blue]
– Classifier 2:- [Blue] vs [Green, Red]
– Classifier 3:- [Red] vs [Blue, Green]
• Now, to train these three classifiers, we need
to create three training datasets. Let’s say
our primary dataset is as follows:
• You can see that there are three class labels, Green, Blue, and Red, present in the
dataset. Now we have to create a training dataset for each class.
• Here, we created the training datasets by putting +1 in the class column for each
row whose feature values belong to that particular class. For the rest of the
rows, we put -1 in the class column.
Training dataset for the Green class
Consider the primary dataset. In the first row, we have the feature values x1, x2, x3, and the
corresponding class value is G, which means these feature values belong to the Green class. So we put
+1 in the class column for the Green correspondence. We then apply the same to the x10, x11, x12 input
training rows.
For the remaining rows, whose feature values do not correspond to the Green class, we put -1
in their class column.
• Training datasets for the Blue class and the Red class are built the same way.
Once we create a training dataset for each classifier, we provide it to our
classifier model and train the model by applying a learning algorithm.
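A minimal sketch of the +1/-1 relabelling described above, building one training dataset per class (the feature values and labels are made up for illustration):

```python
import numpy as np

# Toy primary dataset: feature rows and their class labels
# (G = Green, B = Blue, R = Red; values are illustrative).
X = np.array([[1.0, 2.0], [1.2, 1.8], [4.0, 0.5],
              [4.2, 0.7], [0.5, 5.0], [0.7, 5.2]])
y = np.array(["G", "G", "B", "B", "R", "R"])

training_sets = {}
for cls in np.unique(y):
    # +1 for rows of this class, -1 for the rest (as on the slides)
    y_binary = np.where(y == cls, 1, -1)
    training_sets[cls] = (X, y_binary)

print(sorted(training_sets))           # one relabelled dataset per class
print(training_sets["G"][1].tolist())  # → [1, 1, -1, -1, -1, -1]
```

Each of the three relabelled datasets would then be used to train one binary classifier.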
By comparing the probability scores, we
predict the class with the maximum
probability score as the result.
Example:
• Consider three test feature values, y1, y2, and y3, respectively.
• We pass the test data to the classifier models.
• Let's say we got the outcomes:
• Green class classifier -> positive with a probability score of 0.9
• Blue class classifier -> positive with a probability score of 0.4
• Red class classifier -> negative with a probability score of 0.5
• Hence, based on the positive responses and the decisive probability
scores, we can say that our test input belongs to the Green class.
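Picking the winner from such probability scores is just an argmax; a minimal sketch using the hypothetical scores from the example above:

```python
# Hypothetical per-class probability scores from the three
# one-vs-rest classifiers (values from the example above).
scores = {"Green": 0.9, "Blue": 0.4, "Red": 0.5}

# Winner-takes-all: pick the class with the maximum score.
predicted = max(scores, key=scores.get)
print(predicted)  # → Green
```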
• The benefit of the OvA scheme is that we only have
to train m classifiers.
• However, we have to deal with highly imbalanced
training data for each binary classifier.
[Figure: a balanced dataset vs. an imbalanced dataset]
Classification of Class-Imbalanced Data Sets
• The class-imbalance problem:
• rare positive examples but numerous negative ones, e.g.,
medical diagnosis, fraud detection, fault identification.
• Solutions:
– Oversampling: re-sampling data from the positive
class
– Undersampling: randomly eliminating tuples from the
negative class
– Synthesizing new data points for the minority class
1-2. Illustration of Oversampling and Undersampling
• Oversampling randomly replicates minority instances to increase their population.
• Undersampling randomly downsamples the majority class.
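A minimal sketch of both ideas on a toy label array (the 90/10 class counts are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced labels: 90 negatives, 10 positives (assumed counts).
y = np.array([0] * 90 + [1] * 10)
minority_idx = np.where(y == 1)[0]
majority_idx = np.where(y == 0)[0]

# Oversampling: replicate minority indices (with replacement)
# until they match the majority count.
over_idx = rng.choice(minority_idx, size=len(majority_idx), replace=True)

# Undersampling: randomly keep only as many majority indices
# as there are minority instances.
under_idx = rng.choice(majority_idx, size=len(minority_idx), replace=False)

print(len(over_idx), len(under_idx))  # → 90 10
```

In practice the sampled indices would be used to select rows of the feature matrix as well.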
3. Synthesizing new examples (e.g., SMOTE)
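A simplified sketch of the idea behind synthesis methods such as SMOTE: create new minority-class points by interpolating between existing minority samples. This illustrates only the interpolation step, not the full SMOTE algorithm, and the sample points are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up minority-class samples in a 2-D feature space.
minority = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.2]])

def synthesize(points, n_new, rng):
    """Create n_new synthetic points on segments between sample pairs."""
    new_points = []
    for _ in range(n_new):
        i, j = rng.choice(len(points), size=2, replace=False)
        alpha = rng.random()  # random position along the segment
        new_points.append(points[i] + alpha * (points[j] - points[i]))
    return np.array(new_points)

synthetic = synthesize(minority, n_new=5, rng=rng)
print(synthetic.shape)  # → (5, 2)
```

Full SMOTE restricts interpolation to each point's k nearest minority neighbours; the imbalanced-learn library provides a complete implementation.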
Method 2. One-vs-One (OvO)
• OvO stands for “One vs One” and is very similar to OvR,
but instead of comparing each class with the rest, we
compare all possible two-class combinations in the
dataset.
• Let’s say we have a 3-class scenario and we choose the
combination “Class1 vs Class2” as the first one. The first
step is to get a copy of the dataset that contains only these
two classes and discard all the others.
Method 2. One-vs-One (OvO)
• Learn a classifier for each pair of classes
• Given m classes, construct m(m-1)/2 binary classifiers
• Each classifier is trained using the tuples of its two classes
• To classify a tuple x, each classifier votes; x is assigned to
the class with the maximal vote
• One classifier distinguishes each pair of classes i and j. Let f_ij
be the classifier where examples of class i are positive and
examples of class j are negative.
• In One-vs-One classification, for an N-class dataset, we have to generate
N*(N-1)/2 binary classifier models. Using this classification approach, we
split the primary dataset into one dataset for each pair of classes.
• Suppose we have a classification problem with three classes: Green, Blue, and Red (N = 3).
• We divide this problem into N*(N-1)/2 = 3 binary classifier problems:
• Classifier 1: Green vs. Blue
• Classifier 2: Green vs. Red
• Classifier 3: Blue vs. Red
• Each binary classifier predicts one class label. When we input the test data to the
classifiers, the class with the majority of the votes is the final result.
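A minimal sketch of the majority-vote step, assuming hypothetical outcomes for the three pairwise classifiers on one test point:

```python
from itertools import combinations

classes = ["Green", "Blue", "Red"]

# Hypothetical pairwise decisions: each binary classifier outputs the
# winner of its pair for one test point (these outcomes are made up).
pair_winners = {
    ("Green", "Blue"): "Green",
    ("Green", "Red"): "Green",
    ("Blue", "Red"): "Red",
}

# N*(N-1)/2 classifiers, each casting one vote.
votes = {c: 0 for c in classes}
for pair in combinations(classes, 2):
    votes[pair_winners[pair]] += 1

predicted = max(votes, key=votes.get)
print(votes, predicted)  # → {'Green': 2, 'Blue': 0, 'Red': 1} Green
```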
Assessing Multi-class classification
Performance
• If we have k classes, performance of a classifier can be
assessed using a k-by-k contingency table.
• Classifier’s accuracy: sum of the main diagonal of the
contingency table, divided by the number of test instances.
• Ex: For the given confusion matrix, calculate
the accuracy, precision, and recall.
• Accuracy: (15 + 15 + 45)/100 = 0.75
• Per-class precision: 15/24 = 0.63 for the first class, 15/20 = 0.75
for the second class, 45/56 = 0.80 for the third class
• Per-class recall: 15/20 = 0.75 for the first class, 15/30 = 0.50 for the
second class, 45/50 = 0.90 for the third class
• Average these numbers to obtain single precision and recall
values for the whole classifier,
• or take a weighted average that accounts for the proportion of each
class.
• For instance, the weighted average precision is
0.20·0.63 + 0.30·0.75 + 0.50·0.80 = 0.75.
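The worked example can be reproduced in code. Note the slides give only the diagonal and the row/column totals, so the off-diagonal split in the matrix below is an assumption consistent with those totals:

```python
import numpy as np

# Confusion matrix (rows = true class, columns = predicted class).
# Diagonal and marginals match the slides; off-diagonals are assumed.
cm = np.array([[15, 3, 2],
               [6, 15, 9],
               [3, 2, 45]])

diag = np.diag(cm)
precision = diag / cm.sum(axis=0)  # per class: TP / column total
recall = diag / cm.sum(axis=1)     # per class: TP / row total
accuracy = diag.sum() / cm.sum()

weights = cm.sum(axis=1) / cm.sum()           # class proportions
weighted_precision = np.average(precision, weights=weights)

print(round(accuracy, 2))            # → 0.75
print(recall.tolist())               # → [0.75, 0.5, 0.9]
print(round(weighted_precision, 2))  # → 0.75
```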
Problem Solving
Performance of multi-class classifiers
[Figure: worked exercise identifying the true positive, false positive, true negative, and false negative regions for classes A, B, and D in a multi-class confusion matrix, and computing the accuracy]
Precision-Recall Curves
• Precision-recall is a useful measure of success
for prediction when the classes are
imbalanced.
• Precision is a measure of the ability of a
classification model to identify only the
relevant data points, while recall is a measure
of the ability of a model to find all the relevant
cases within a data set.
• The precision-recall curve shows the trade-off
between precision and recall for different
thresholds.
• A high area under the curve represents both
high recall and high precision, where high
precision relates to a low false positive rate,
and high recall relates to a low false negative
rate.
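A minimal sketch of how precision-recall pairs arise at different thresholds (the labels and scores are made-up illustrative values; in practice scikit-learn's `precision_recall_curve` computes this directly):

```python
import numpy as np

# Made-up binary labels and classifier scores for illustration.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.3])

points = []
for t in np.sort(np.unique(y_score)):
    pred = (y_score >= t).astype(int)       # positive above the threshold
    tp = np.sum((pred == 1) & (y_true == 1))
    fp = np.sum((pred == 1) & (y_true == 0))
    fn = np.sum((pred == 0) & (y_true == 1))
    points.append((float(t), tp / (tp + fp), tp / (tp + fn)))

for t, p, r in points:
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Raising the threshold trades recall for precision, which is exactly the trade-off the curve visualizes.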
The F1 score is a good metric when data is imbalanced
• The F1 score is the harmonic mean of precision and recall:
F1 = 2 · (precision · recall) / (precision + recall)
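A one-line illustration of the harmonic mean, with assumed example values for precision and recall:

```python
# Assumed example values for precision and recall.
precision, recall = 0.75, 0.60

# F1 is the harmonic mean of the two.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # → 0.667
```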