Classification
Classification Algorithm:
● Supervised Learning technique
● Results: discrete labels such as Yes or No, 0 or 1, etc.
● A classification algorithm maps an input variable (x) to a discrete output (y):
y = f(x), where y = categorical output
● Identifies the category that a given observation belongs to
Types of Classifications:
● Binary Classifier
Example: YES or NO, MALE or FEMALE, etc.
● Multi-class Classifier
Example: Classification of types of animals
Learners in Classification Problems
● Lazy Learners
Example: K-NN algorithm
● Eager Learners
Example: Decision Trees
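To make the lazy-learner idea concrete, here is a minimal K-NN sketch in plain Python. The class name KNNClassifier, the toy points and the choice of k = 3 are all assumptions made for illustration, not part of the notes. The point to notice is that fit() only stores the data, and all the work happens at prediction time.

```python
import math
from collections import Counter


class KNNClassifier:
    """Illustrative k-nearest-neighbours classifier (hypothetical helper)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # Lazy learner: "training" only stores the examples.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Compute the Euclidean distance from the query to every stored point.
        distances = sorted(
            (math.dist(point, query), label)
            for point, label in zip(self.points, self.labels)
        )
        # Majority vote among the k closest points.
        nearest = [label for _, label in distances[: self.k]]
        return Counter(nearest).most_common(1)[0][0]


# Toy data: two well-separated groups (made up for this example).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

model = KNNClassifier(k=3).fit(points, labels)
pred_low = model.predict((1.2, 1.4))
pred_high = model.predict((6.4, 6.6))
print(pred_low, pred_high)
```

An eager learner such as a decision tree would instead build its model inside fit() and keep only that model for prediction.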
Evaluating a Classification model
1. Log Loss or Cross-Entropy Loss:
● It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1.
● For a good binary classification model, the value of log loss should be close to 0.
● The value of log loss increases as the predicted probability deviates from the actual label.
● A lower log loss indicates a better model, since its predicted probabilities are closer to the actual labels.
● For binary classification, cross-entropy can be calculated as:
Binary Cross-Entropy = -(y*log(p) + (1-y)*log(1-p))
Cross-entropy(D) = -log(p) when y = 1
Cross-entropy(D) = -log(1-p) when y = 0
Where y = actual output (0 or 1), p = predicted probability that y = 1.
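The binary cross-entropy formula can be sketched in plain Python as follows. The function name binary_cross_entropy, the clipping constant eps and the sample values are assumptions chosen for illustration. Clipping is a common safeguard so that log() is never called on 0.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy (log loss) over a list of examples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip p away from 0 and 1 so that log() never receives 0.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Made-up labels and predicted probabilities.
y_true = [1, 0, 1, 0]
close_predictions = [0.9, 0.1, 0.8, 0.2]  # probabilities near the true labels
far_predictions = [0.4, 0.6, 0.3, 0.7]    # probabilities far from the true labels

loss_close = binary_cross_entropy(y_true, close_predictions)
loss_far = binary_cross_entropy(y_true, far_predictions)
print(loss_close, loss_far)
```

The predictions that sit closer to the actual labels give the smaller loss, which matches the statement that log loss grows as predictions deviate from the actual values.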
2. Confusion Matrix:
● The confusion matrix is a table that summarizes the performance of the model.
● It is also known as the error matrix.
● The matrix presents the prediction results in summarized form, showing the number of correct and incorrect predictions.
● For a classifier with 2 prediction classes the matrix is a 2x2 table, for 3 classes it is a 3x3 table, and so on.
● The matrix has two dimensions, predicted values and actual values, and its cells add up to the total number of predictions.
● Predicted values are the values predicted by the model, and actual values are the true values for the given observations.
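As a rough illustration of the points above, a confusion matrix can be built by counting (actual, predicted) pairs. The helper name confusion_matrix, the row/column orientation (rows = actual, columns = predicted) and the sample labels are assumptions made for this sketch.

```python
def confusion_matrix(actual, predicted, labels):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Made-up true labels and model predictions for 8 observations.
actual = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
labels = ["no", "yes"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

With two classes the result is a 2x2 table. The diagonal cells hold the correct predictions and the off-diagonal cells hold the two kinds of error.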
Need for Confusion Matrix in Machine learning
● It evaluates the performance of classification models when they make predictions on test data, and tells how good the classification model is.
● It shows not only how many errors the classifier makes but also the type of error: type-I (false positive) or type-II (false negative).
● With the help of the confusion matrix, we can calculate different metrics for the model, such as accuracy, precision, etc.
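Classification Accuracy: (TP + TN) / (TP + TN + FP + FN)
Misclassification rate: (FP + FN) / (TP + TN + FP + FN)
Precision: TP / (TP + FP)
Recall: TP / (TP + FN)
Where TP = true positives, TN = true negatives, FP = false positives, FN = false negatives.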
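The four metrics named above can be computed directly from the counts in a 2x2 confusion matrix. The sketch below uses their standard definitions. The function name classification_metrics and the counts passed to it are hypothetical values chosen for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from true/false positive and negative counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        # Guard against division by zero when a class is never predicted or never present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts: 100 predictions in total.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and misclassification rate always sum to 1. Precision and recall differ whenever the numbers of false positives and false negatives differ.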
Types of ML Classification Algorithms
● Linear Models
○ Logistic Regression
○ Support Vector Machines
● Non-linear Models
○ K-Nearest Neighbours
○ Kernel SVM
○ Decision Tree Classification
Use cases of Classification:
● Email Spam Detection
● Speech Recognition
● Identification of cancer tumor cells
● Drugs Classification
● Biometric Identification, etc.