Evaluation Metrics: Formulas
• Understanding key evaluation metrics: ROUGE, BLEU, Precision, Recall, and the F1 Score.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
• Formula: ROUGE-N = (∑ overlapping n-grams) / (∑ n-grams in the reference)
• Measures recall-based similarity, most commonly for text summarization (see the sketch below).
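A minimal Python sketch of ROUGE-N as defined above, assuming simple whitespace tokenization (real toolkits such as rouge-score apply their own tokenization and optional stemming):

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N: matching n-grams divided by total n-grams in the reference."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Clip each match by its candidate count so repeated n-grams are not over-credited.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

print(rouge_n("the cat sat on the mat", "the cat is on the mat"))  # 5/6 ≈ 0.833
```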
BLEU (Bilingual Evaluation Understudy)
• Formula: BLEU = BP * exp(∑ log(Pn) / N)
• Where BP is the Brevity Penalty, Pn is the modified n-gram precision, and N is the maximum n-gram order (typically 4).
• Measures n-gram overlap between generated and reference text (see the sketch below).
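A simplified single-sentence, single-reference sketch of the formula above in Python; production scorers such as sacreBLEU add smoothing and corpus-level aggregation, so treat this as illustrative only:

```python
import math
from collections import Counter

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """BLEU = BP * exp(∑ log(Pn) / N), with modified (clipped) n-gram precision."""
    cand, ref = candidate.split(), reference.split()

    def mod_precision(n: int) -> float:
        c = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clip candidate n-gram counts by their reference counts.
        clipped = sum(min(count, r[gram]) for gram, count in c.items())
        return clipped / max(sum(c.values()), 1)

    precisions = [mod_precision(n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:   # no smoothing here: any zero precision zeroes the score
        return 0.0
    # Brevity penalty: BP = 1 if the candidate is longer than the reference,
    # otherwise exp(1 - reference_length / candidate_length).
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat is on the mat", "the cat is on the mat"))  # 1.0 (exact match)
```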
Precision
• Formula: Precision = TP / (TP + FP)
• Where TP = True Positives, FP = False Positives.
• Measures how many retrieved results are relevant (see the sketch below).
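A set-based sketch of the precision formula (the document IDs are hypothetical):

```python
def precision(retrieved: set, relevant: set) -> float:
    """Precision = TP / (TP + FP): fraction of retrieved items that are relevant."""
    tp = len(retrieved & relevant)  # true positives: retrieved AND relevant
    return tp / len(retrieved) if retrieved else 0.0

print(precision({"d1", "d2", "d3"}, {"d2", "d3", "d4"}))  # 2/3 ≈ 0.667
```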
Recall
• Formula: Recall = TP / (TP + FN)
• Where TP = True Positives, FN = False Negatives.
• Measures how many relevant results were retrieved (see the sketch below).
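The mirror-image sketch for recall, under the same set-based assumptions:

```python
def recall(retrieved: set, relevant: set) -> float:
    """Recall = TP / (TP + FN): fraction of relevant items that were retrieved."""
    tp = len(retrieved & relevant)  # true positives: retrieved AND relevant
    return tp / len(relevant) if relevant else 0.0

print(recall({"d1", "d2", "d3"}, {"d2", "d3", "d4"}))  # 2/3 ≈ 0.667
```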
F1 Score
• Formula: F1 = 2 * (Precision * Recall) / (Precision + Recall)
• The harmonic mean of precision and recall; balances both into a single overall score (see the sketch below).
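Combining the two, with the same hypothetical document IDs:

```python
def f1_score(p: float, r: float) -> float:
    """F1 = 2 * (P * R) / (P + R): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

retrieved, relevant = {"d1", "d2", "d3"}, {"d2", "d3", "d4", "d5"}
p = len(retrieved & relevant) / len(retrieved)  # 2/3
r = len(retrieved & relevant) / len(relevant)   # 2/4
print(f1_score(p, r))                           # 4/7 ≈ 0.571
```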
Conclusion
• No single metric tells the whole story; using these metrics in combination gives a more reliable picture of AI model quality.
Q&A
• Open floor for discussion and questions.