Performance Metrics in Machine Learning
Evaluating the performance of a machine learning model is one of the most important steps in
building an effective ML model. To evaluate the performance or quality of the model, different
metrics are used, and these metrics are known as performance metrics or evaluation metrics.
Performance metrics help us understand how well the model has performed on the given data,
and they also guide hyperparameter tuning to improve the model's performance. Every ML model
aims to generalize well to unseen/new data, and performance metrics help determine how well
the model generalizes to a new dataset.
1. Performance Metrics for Classification
In a classification problem, the category or class of the data is identified based on training data.
The model learns from the given dataset and then classifies the new data into classes or groups
based on the training. It predicts class labels as the output, such as Yes or No, 0 or 1, Spam or
Not Spam, etc. To evaluate the performance of a classification model, different metrics are
used, and some of them are as follows:
Accuracy
Confusion Matrix
Precision
Recall
F-Score
AUC-ROC (Area Under the ROC Curve)
I. Accuracy
The accuracy metric is one of the simplest classification metrics to implement, and it is defined
as the ratio of the number of correct predictions to the total number of predictions.
It can be formulated as:
Accuracy = Number of correct predictions / Total number of predictions
When to Use Accuracy?
It is good to use the Accuracy metric when the target variable classes in the data are approximately
balanced. For example, if 60% of the images in a fruit dataset are of apples and 40% are of
mangoes, the classes are roughly balanced, so accuracy gives a fair picture of how often the
model correctly tells apples from mangoes.
When not to use Accuracy?
It is recommended not to use the Accuracy measure when the target variable mostly belongs
to one class. For example, suppose there is a disease-prediction model in which, out of
100 people, only five people have the disease and 95 don't. In this case, if the model simply
predicts that no one has the disease (a useless model), the accuracy will still be 95%, which is
misleading. A sketch of this pitfall is shown below.
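As a minimal sketch (assuming scikit-learn, which the text itself doesn't name, and made-up labels that mirror the disease example above), accuracy can be computed like this:

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels: 5 patients with the disease (1), 95 without (0)
y_true = [1] * 5 + [0] * 95
# A model that blindly predicts "no disease" for everyone
y_pred = [0] * 100

# Accuracy = correct predictions / total predictions
print(accuracy_score(y_true, y_pred))  # 0.95, even though every diseased patient is missed
```

This is exactly why accuracy alone is misleading on imbalanced data.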
II. Confusion Matrix
A confusion matrix is a tabular representation of prediction outcomes of any binary classifier,
which is used to describe the performance of the classification model on a set of test data when
true values are known.
The confusion matrix is simple to implement, but the terminologies used in this matrix might be
confusing for beginners.
A typical confusion matrix for a binary classifier has the layout shown below (it can be extended
to classifiers with more than two classes):

                Predicted: No          Predicted: Yes
Actual: No      True Negative (TN)     False Positive (FP)
Actual: Yes     False Negative (FN)    True Positive (TP)
We can determine the following from the matrix:
In the matrix, the columns hold the predicted values and the rows hold the actual values. Both
the actual and the predicted outcomes take two possible classes, Yes or No. So, if we are
predicting the presence of a disease in a patient, a prediction of Yes means the patient has the
disease, and No means the patient doesn't have the disease.
For example, suppose the matrix summarizes 165 predictions, out of which the model predicted
Yes 110 times and No 55 times. In reality, there are 60 cases in which the patients don't have
the disease and 105 cases in which they do.
True Positive (TP) signifies how many positive class samples your model
predicted correctly.
True Negative (TN) signifies how many negative class samples your model
predicted correctly.
False Positive (FP) signifies how many negative class samples your model
incorrectly predicted as positive. In statistical nomenclature this represents a
Type-I error (its exact position in the confusion matrix depends on how the
null hypothesis is framed).
False Negative (FN) signifies how many positive class samples your model
incorrectly predicted as negative. This corresponds to a Type-II error.
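A minimal sketch of computing a confusion matrix with scikit-learn (the labels below are made up for illustration; scikit-learn is an assumption, not something the text names):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels (1 = disease, 0 = no disease)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# For binary labels [0, 1], rows are actual classes and columns are predicted classes;
# ravel() flattens the 2x2 matrix into TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```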
III. Precision
The precision metric is used to overcome the limitation of Accuracy. Precision
determines the proportion of positive predictions that were actually correct. It
is calculated as the number of true positives divided by the total number of
positive predictions (true positives plus false positives):
Precision = TP / (TP + FP)
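A small sketch (again assuming scikit-learn and made-up labels) showing the same calculation in code:

```python
from sklearn.metrics import precision_score

# Hypothetical binary labels (1 = positive class)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

# Precision = TP / (TP + FP)
print(precision_score(y_true, y_pred))
```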
IV. Recall or Sensitivity
It is similar to the Precision metric; however, it aims to calculate the
proportion of actual positives that were identified correctly. It is calculated
as the number of true positives divided by the total number of actual
positives, i.e., those correctly predicted as positive plus those incorrectly
predicted as negative (true positives plus false negatives).
The formula for calculating Recall is given below:
Recall = TP / (TP + FN)
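And a matching sketch for recall, using the same hypothetical labels as above:

```python
from sklearn.metrics import recall_score

# Hypothetical binary labels (1 = positive class)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

# Recall = TP / (TP + FN)
print(recall_score(y_true, y_pred))
```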
V. F-Score
F-score or F1 Score is a metric to evaluate a binary classification model on
the basis of the predictions made for the positive class. It is calculated from
Precision and Recall and condenses both into a single score: the F1 Score is
the harmonic mean of precision and recall, assigning equal weight to each of
them.
The formula for calculating the F1 score is given below:
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
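A short sketch (same assumptions as in the earlier snippets) computing the F1 score directly:

```python
from sklearn.metrics import f1_score

# Hypothetical binary labels (1 = positive class)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

# F1 = 2 * (precision * recall) / (precision + recall)
print(f1_score(y_true, y_pred))
```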
VI. AUC-ROC
Sometimes we need to visualize the performance of the classification model
on charts; then, we can use the AUC-ROC curve. It is one of the popular and
important metrics for evaluating the performance of the classification model.
First, let's understand the ROC (Receiver Operating Characteristic) curve.
The ROC curve is a graph that shows the performance of a classification
model at different threshold levels. The curve is plotted between two
parameters, which are:
o True Positive Rate
o False Positive Rate
TPR or True Positive Rate is a synonym for Recall and can be calculated as:
TPR = TP / (TP + FN)
FPR or False Positive Rate can be calculated as:
FPR = FP / (FP + TN)
AUC (Area Under the Curve) measures the area under the ROC curve and summarizes, in a single
number between 0 and 1, how well the model separates the two classes across all thresholds.
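A minimal sketch (assuming scikit-learn and made-up scores) of computing the ROC curve points and the AUC:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

# FPR and TPR at each candidate threshold (the two axes of the ROC curve)
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(fpr, tpr)

# AUC summarizes the whole curve as a single number between 0 and 1
print(roc_auc_score(y_true, y_scores))
```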
2. Performance Metrics for Regression
Regression is a supervised learning technique that aims to find the
relationships between the dependent and independent variables. A
predictive regression model predicts a numeric or continuous value. The metrics
used for regression are different from the classification metrics.
Mean Absolute Error
Mean Squared Error
R2 Score
I. Mean Absolute Error (MAE)
Mean Absolute Error or MAE is one of the simplest metrics; it measures the
average absolute difference between the actual and predicted values, where
absolute means the sign of the difference is ignored (the difference is always
taken as positive).
To understand MAE, let's take an example of Linear Regression, where the
model draws a best fit line between dependent and independent variables.
To measure the MAE or error in prediction, we need to calculate the
difference between actual values and predicted values.
The below formula is used to calculate MAE:
MAE = (1/N) × Σ |Y − Y'|
Here,
Y is the Actual outcome, Y' is the predicted outcome, and N is the total number of data points.
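A quick sketch (assuming scikit-learn and a few made-up values) of the same calculation:

```python
from sklearn.metrics import mean_absolute_error

# Hypothetical actual (Y) and predicted (Y') values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# MAE = (1/N) * sum(|Y - Y'|)
print(mean_absolute_error(y_true, y_pred))  # 0.5
```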
II. Mean Squared Error
Mean Squared Error or MSE is one of the most suitable metrics for regression evaluation. It
measures the average of the squared differences between the predicted values and the actual
values.
Since the errors are squared in MSE, the metric can never be negative; it is zero only when
every prediction exactly matches the actual value.
The formula for calculating MSE is given below:
MSE = (1/N) × Σ (Y − Y')²
Here,
Y is the Actual outcome, Y' is the predicted outcome, and N is the total number of data points.
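The same sketch for MSE, with the same hypothetical values:

```python
from sklearn.metrics import mean_squared_error

# Hypothetical actual (Y) and predicted (Y') values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# MSE = (1/N) * sum((Y - Y')^2)
print(mean_squared_error(y_true, y_pred))  # 0.375
```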
III. R Squared Score
R squared, also known as the Coefficient of Determination, is another popular metric used for
regression model evaluation. It measures the proportion of the variance in the dependent
variable that is explained by the model.
The R squared score is always less than or equal to 1, regardless of how large or small the
target values are; a score close to 1 indicates a good fit.
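And a final sketch for the R squared score, again with the same hypothetical values:

```python
from sklearn.metrics import r2_score

# Hypothetical actual (Y) and predicted (Y') values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# R^2 = 1 - (sum of squared residuals / total sum of squares)
print(r2_score(y_true, y_pred))  # close to 1 here, indicating a good fit
```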