A classification report is a crucial tool for evaluating the performance of a classification model.
It provides a summary of key metrics, helping you understand how well your model is
classifying different classes. Here's a breakdown:
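As a concrete starting point, here is a minimal sketch of how such a report is typically produced with scikit-learn's classification_report; the label lists below are made-up toy data, not output from a real model:

from sklearn.metrics import classification_report

# Toy ground-truth labels and model predictions (illustrative only).
y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]

# Prints per-class precision, recall, F1-score and support,
# plus the accuracy, macro average and weighted average rows.
print(classification_report(y_true, y_pred))

Each of the metrics in this output is explained below.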
What it Shows:
The report typically includes the following metrics for each class, as well as overall metrics:
Precision:
Precision measures the proportion of predicted positive cases that were actually positive.
It answers the question: "Of all the instances the model labeled as positive, how many were
actually positive?"
Formula: Precision = True Positives / (True Positives + False Positives)
Recall (Sensitivity or True Positive Rate):
Recall measures the proportion of actual positive cases that were correctly identified by the
model.
It answers the question: "Of all the actual positive instances, how many did the model correctly
label as positive?"
Formula: Recall = True Positives / (True Positives + False Negatives)
F1-Score:
The F1-score is the harmonic mean of precision and recall.
It combines the two into a single score, which is useful when you need to balance false
positives against false negatives, especially on imbalanced datasets.
Formula: F1-score = 2 * (Precision * Recall) / (Precision + Recall)
Support:
Support is the number of actual occurrences of each class in the dataset, i.e., how many
samples truly belong to that class.
It does not measure model quality itself; it gives context for interpreting the other per-class
metrics.
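To make these formulas concrete, here is a small sketch that computes the per-class metrics for one class from hypothetical true positive, false positive, and false negative counts; the numbers are invented purely for illustration:

# Hypothetical counts for a single class (illustrative only).
true_positives = 40
false_positives = 10
false_negatives = 20

precision = true_positives / (true_positives + false_positives)  # 40 / 50 = 0.80
recall = true_positives / (true_positives + false_negatives)     # 40 / 60 ~= 0.67
f1 = 2 * (precision * recall) / (precision + recall)             # ~= 0.73
support = true_positives + false_negatives                       # 60 actual samples of this class

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} support={support}")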
Overall Metrics:
Accuracy:
Accuracy measures the overall proportion of correctly classified instances.
Formula: Accuracy = Correct Predictions / Total Instances (for a binary problem, (True
Positives + True Negatives) / Total Instances)
However, accuracy can be misleading on imbalanced datasets: a model that always predicts the
majority class in a 95/5 split reaches 95% accuracy while never identifying the minority class.
Macro Average:
The macro average calculates the average of precision, recall, and F1-score across all classes,
giving equal weight to each class.
Useful when you want to evaluate the overall performance of the model across all classes,
regardless of their support.
Weighted Average:
The weighted average calculates the average of precision, recall, and F1-score across all classes,
weighting each class by its support (the number of true instances for each label).
Useful when dealing with imbalanced datasets, as it accounts for the relative frequency of each
class; note that, like accuracy, it can be dominated by the majority classes.
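The difference between the macro and weighted averages (and their relationship to accuracy) is easiest to see on an imbalanced example; the sketch below uses scikit-learn's accuracy_score and f1_score on fabricated labels where one class dominates:

from sklearn.metrics import accuracy_score, f1_score

# Imbalanced toy data: class 0 dominates (illustrative only).
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 90 + [0] * 8 + [1] * 2  # the minority class is mostly missed

print("accuracy:   ", accuracy_score(y_true, y_pred))                # 0.92, looks strong
print("macro F1:   ", f1_score(y_true, y_pred, average="macro"))     # ~0.65, exposes class 1
print("weighted F1:", f1_score(y_true, y_pred, average="weighted"))  # ~0.89, dominated by class 0

Here the macro average drops sharply because the minority class is handled poorly, while accuracy and the weighted average remain high.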
When to Use It:
When you need a detailed evaluation of a classification model's performance.
When you want to understand how well the model is classifying each class individually.
When you are dealing with imbalanced datasets, where accuracy alone is not a reliable metric.
When you need to compare different models.
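For the model-comparison case, the report can also be returned as a dictionary via output_dict=True, which makes it easy to compare summary metrics programmatically. The sketch below trains two simple classifiers on synthetic data purely for illustration; the dataset and model choices are arbitrary assumptions:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data, split into train and test sets.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Build one report per model as a nested dictionary.
report_a = classification_report(
    y_test, LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test),
    output_dict=True)
report_b = classification_report(
    y_test, DecisionTreeClassifier(random_state=0).fit(X_train, y_train).predict(X_test),
    output_dict=True)

# Compare on a class-balanced summary metric rather than accuracy alone.
print("logistic regression macro F1:", report_a["macro avg"]["f1-score"])
print("decision tree macro F1:      ", report_b["macro avg"]["f1-score"])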