CH 3 Evaluating Models


EVALUATION

Model evaluation is the process of using different evaluation metrics to understand a machine learning model's performance.

An AI model gets better with constructive feedback: you build a model, get feedback from metrics, make improvements, and repeat the cycle until you achieve a desirable accuracy.

NEED FOR MODEL EVALUATION

● Model evaluation is like giving your AI model a report card: it helps you understand its strengths, weaknesses, and suitability for the task.
● This feedback loop is essential for building trustworthy and reliable AI systems.

WHAT IS TRAIN-TEST SPLIT?

▪ The train-test split is a technique for evaluating the performance of a machine learning algorithm.
▪ It can be used for any supervised learning algorithm.
▪ The procedure involves taking a dataset and dividing it into two subsets: the training dataset and the testing dataset.
▪ The train-test procedure is appropriate when a sufficiently large dataset is available.

WHY DO WE NEED TO DO TRAIN-TEST SPLIT?


▪ The train dataset is used to make the model learn.
▪ The input elements of the test dataset are provided to the trained model. The model makes predictions, and the predicted values are compared to the expected values.
▪ The objective is to estimate the performance of the machine learning model on new data: data not used to train the model (see the sketch below).
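As an illustration, here is a minimal Python sketch of a train-test split using scikit-learn's train_test_split. The toy data and the 80/20 split ratio are assumptions for demonstration, not fixed rules:

from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: X holds input features, y holds class labels
X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Reserve 20% of the rows for testing; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(len(X_train), "training examples,", len(X_test), "testing examples")

The model would then be fitted on X_train/y_train and scored on X_test/y_test, so the reported performance reflects data the model has not seen.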

ACCURACY : Accuracy is an evaluation metric that measures the proportion of predictions a model gets right.

The accuracy of the model and its performance are directly proportional, so a better-performing model will have more accurate
predictions. The goal is to maximize accuracy.

ERROR : Error is the extent to which a model's prediction is inaccurate or wrong.

In machine learning, error is used to see how accurately a model can predict new, unseen data.

Error refers to the difference between a model's prediction and the actual outcome, and it quantifies how often the model
makes mistakes. The goal is to minimize error.
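A hypothetical worked example: if a classifier answers 85 of 100 test questions correctly, its accuracy is 85/100 = 0.85 (85%) and its error rate is 15/100 = 0.15 (15%). For classification, error rate = 1 − accuracy, so minimizing one maximizes the other.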

WHAT IS CLASSIFICATION? : Classification is a problem where a specific class label is the result to be predicted from a given input. Example: a vegetable-grocery classifier model that predicts whether an item is a vegetable or a grocery item.

CLASSIFICATION METRICS (4)

1) Confusion Matrix:
The confusion matrix is a table used to visualize the performance of a classification model. It helps you understand where the model is succeeding and where it is making errors, and it is a handy summary of a model's correct and incorrect predictions. Its four cells are described below (see the sketch after the list).

● True Positive (TP): The model correctly predicted the positive class. For example, a spam filter correctly identifies a spam email as "spam."
● True Negative (TN): The model correctly predicted the negative class. For example, the spam filter correctly identifies a legitimate email as "not spam."
● False Positive (FP): The model incorrectly predicted the positive class. This is also known as a Type I Error. For example, the spam filter incorrectly flags a legitimate email as "spam" (a "false alarm").
● False Negative (FN): The model incorrectly predicted the negative class. This is also known as a Type II Error. For example, the spam filter incorrectly flags a spam email as "not spam" (a "missed case").
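A minimal sketch of computing these four counts with scikit-learn's confusion_matrix, using invented spam-filter labels (1 = spam is the positive class, 0 = not spam; the data below is hypothetical):

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # what the emails actually are
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]  # what the model predicted

# scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)  # TP: 3 TN: 3 FP: 1 FN: 1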
2) Accuracy :
Accuracy is the most intuitive and widely used classification metric. It measures the proportion of correct predictions (both true
positives and true negatives) out of the total number of predictions made by the model.

Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)

Key limitation: accuracy is suitable only when there is a roughly equal number of observations in each class. When dealing with unbalanced datasets, where the number of observations in each class is not equal, other classification metrics like precision, recall, and F1 score are recommended.
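A hypothetical worked example (these spam-filter counts are invented for illustration and are reused for the metrics below): with TP = 40, TN = 45, FP = 5 and FN = 10, Accuracy = (40 + 45) / (40 + 45 + 5 + 10) = 85/100 = 0.85, i.e. 85% of the 100 emails were classified correctly.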

3) Precision :
It is the ratio of the number of correctly classified positive examples (TP) to the total number of examples predicted as positive (TP + FP).
It focuses on the quality of positive predictions. It is generally used for unbalanced datasets.

Formula: Precision = TP / (TP + FP)

Example: In a medical test for a rare disease, high precision is crucial. A high precision means that when the test says a patient
has the disease, they are very likely to actually have it, avoiding unnecessary stress and treatment for healthy people.
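Using the same hypothetical counts as above (TP = 40, FP = 5): Precision = 40 / (40 + 5) = 40/45 ≈ 0.89, meaning about 89% of the items the model labelled positive really were positive.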

4) Recall :
It is the measure of how well the model identifies true positives: the ratio of correctly classified positive examples (TP) to the total number of actual positive examples (TP + FN). It focuses on the model's ability to find all positive instances.
It is generally used for unbalanced datasets.

Formula: Recall = TP / (TP + FN)

Example: In a system for detecting fraud, high recall is critical. A high recall means the system is able to catch most of the
fraudulent transactions, even if it flags a few legitimate ones by mistake.
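Using the same hypothetical counts (TP = 40, FN = 10): Recall = 40 / (40 + 10) = 40/50 = 0.80, meaning the model found 80% of all actual positives.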

◉ F1 Score :
The F1 Score is the harmonic mean of Precision and Recall. It provides a single score that balances both metrics.

Why F1 is often a better selection among evaluation metrics: when the data is unbalanced and we cannot decide whether false positives or false negatives matter more, the F1 score is the suitable metric.

Formula: F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

Example: An email spam filter needs to be both precise (not flagging legitimate emails as spam) and have high recall (not
letting spam into the inbox). The F1 score helps evaluate if the model is doing a good job on both fronts.
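Completing the hypothetical example (Precision ≈ 0.89, Recall = 0.80): F1 Score = 2 * (0.89 * 0.80) / (0.89 + 0.80) ≈ 0.84, a single score reflecting that the model does reasonably well on both fronts.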

ETHICAL CONCERNS IN MODEL EVALUATION :

1) Bias :
Ensure that the chosen evaluation metrics do not result in any kind of bias.

2) Transparency :
Give an honest explanation of how the chosen evaluation metrics work and produce results, without keeping any information hidden.

3) Accountability :
Take responsibility for your choice of metrics and evaluation methodology in case any user faces a disadvantage because of it.
