What is evaluation?
Model evaluation is the process of using different evaluation metrics to understand a
machine learning model’s performance. An AI model improves with constructive
feedback: you build a model, get feedback from the metrics, make improvements, and
continue until you achieve the desired accuracy.
What is model evaluation?
The evaluation process uses different evaluation metrics to understand a machine
learning model’s performance, strengths, and weaknesses. Evaluation means judging how
reliable an AI model is by feeding a test dataset into the model and comparing its
outputs with the actual answers. Different evaluation techniques can be used, depending
on the type and purpose of the model.
Why do we need model evaluation?
Evaluation methods help us assess and choose the best model during the modeling
process. Model evaluation is like giving your AI model a report card: it helps you
understand its strengths, weaknesses, and suitability for the task at hand. This
feedback loop is essential for building trustworthy and reliable AI systems.
Splitting the dataset for evaluation
The train-test split is a technique for evaluating the performance of a machine learning
algorithm. It can be used for any supervised learning algorithm. The technique divides
the dataset into a training set and a testing set.
Why do we need a train-test split?
The training dataset is used to make the model learn; the input elements of the test
dataset are then provided to the trained model. The model makes predictions, and the
predicted values are compared to the expected values. The objective is to estimate the
performance of the machine learning model on new data: data not used to train the
model.
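Below is a minimal sketch of a train-test split, assuming the scikit-learn library; the example dataset, the 80/20 split ratio, and the choice of model are illustrative assumptions, not part of the original text.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Example dataset (assumed for illustration).
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data for testing; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Any supervised learning algorithm could be used here.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)          # learn only from the training set

# Compare predictions on the unseen test set with the expected values.
predictions = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, predictions))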
Accuracy and Error
In AI model evaluation, accuracy and error are key metrics that help us understand
how well a model performs and identify areas for improvement. Higher accuracy means a
better model, while lower error indicates fewer mistakes.
Accuracy – Accuracy is an evaluation metric that measures how many predictions a model
gets right. Accuracy and model performance are directly proportional: the better the
model performs, the more accurate its predictions.
Error – Error can be described as a prediction that is inaccurate or wrong. In Machine
Learning, error measures how far the model’s predictions are from the actual values on
new, unseen data. Based on the error, we choose the machine learning model that
performs best for a particular dataset.
How to find the accuracy of an AI model
To find the accuracy of an AI model, we first calculate the percentage of correct
predictions made by the model on the testing dataset. The formulas to find the accuracy are:
Error = |Actual – Predicted|
Error Rate = Error / Actual value
Accuracy = 1 – Error Rate
Accuracy in percentage = Accuracy * 100
Given values:
Predicted House Price = 391k
Actual House Price = 402k
Step 1: Calculate Absolute Error
Error: 402k−391k = 11k
Step 2: Calculate Error Rate
Error Rate: 11k / 402k = 0.0274
Step 3: Calculate Accuracy
Accuracy: 1 – 0.0274 = 0.9726
Step 4: Convert to Percentage
Accuracy in percentage: 0.9726 × 100 ≈ 97.3%
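The same calculation can be written as a short Python sketch; the values are the house prices from the example above.

actual = 402       # actual house price (in thousands)
predicted = 391    # predicted house price (in thousands)

error = abs(actual - predicted)   # Step 1: absolute error = 11
error_rate = error / actual       # Step 2: 11 / 402 ≈ 0.0274
accuracy = 1 - error_rate         # Step 3: ≈ 0.9726
print(f"Accuracy: {accuracy * 100:.1f}%")   # Step 4: ≈ 97.3%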
Evaluation metrics for classification
What is Classification?
In artificial intelligence, classification is a technique that organizes data into categories.
It is a type of machine learning that uses algorithms to sort data into predefined classes.
Classification metrics
Classification metrics are used to evaluate the performance of a classification model in
machine learning; in other words, they are performance measures used to judge the
effectiveness of the model.
Popular metrics used for classification models:
Confusion matrix
Classification accuracy
Precision
Recall
F1 Score
1. What is a confusion matrix?
The confusion matrix is a handy presentation of the accuracy of a model with two or
more classes. The comparison between prediction and reality is recorded in what we call
the confusion matrix, which allows us to understand the prediction results.
It consists of four values:
True Positive (TP): Correctly predicted positive cases.
False Negative (FN): Model predicted negative, but it was actually positive.
False Positive (FP): Model predicted positive, but it was actually negative.
True Negative (TN): Correctly predicted negative cases.
Prediction and Reality can be easily mapped together with the help of this confusion
matrix.
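As a hedged illustration, the sketch below builds a confusion matrix with scikit-learn; the actual and predicted labels are made up purely for this example (1 = positive, 0 = negative).

from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only.
actual    = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# With labels=[1, 0], rows are the actual classes and columns the predicted classes.
cm = confusion_matrix(actual, predicted, labels=[1, 0])
tp, fn = cm[0]   # actual positive: predicted positive (TP) or negative (FN)
fp, tn = cm[1]   # actual negative: predicted positive (FP) or negative (TN)
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")   # TP=3, FN=1, FP=1, TN=3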
2. Classification accuracy
Classification accuracy counts the total number of correct predictions made by a model,
i.e. how many of the model’s predictions were accurate. Accuracy considers both the
True Positives and the True Negatives:
Accuracy = (TP + TN) / Total observations
Here, total observations cover all the possible cases of prediction: True Positive (TP),
True Negative (TN), False Positive (FP) and False Negative (FN).
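For example, using some hypothetical counts (TP = 3, TN = 3, FP = 1, FN = 1), the calculation in Python would be:

tp, tn, fp, fn = 3, 3, 1, 1   # hypothetical counts, for illustration only
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Classification accuracy = {accuracy:.2f}")   # 6 / 8 = 0.75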
3. Precision
Precision is defined as the percentage of true positive cases out of all the cases where
the prediction is positive. That is, it takes into account the True Positives and the
False Positives:
Precision = TP / (TP + FP)
4. Recall
Recall can be described as the percentage of actual positive cases that the model
correctly detected. It heavily weighs the scenarios where a fire actually existed in
reality, whether or not the machine recognized it. That is, it takes into account both
the False Negatives (there was a forest fire but the model did not predict it) and the
True Positives (there was a forest fire in reality and the model predicted a forest fire):
Recall = TP / (TP + FN)
5. F1 Score
The F1 score can be defined as a measure of the balance between precision and recall;
it provides a way to combine precision and recall into a single measure that captures
both properties:
F1 Score = 2 * Precision * Recall / (Precision + Recall)
Take a look at the formula and think: when can we get a perfect F1 score?
An ideal situation would be when we have a value of 1 (that is, 100%) for both Precision
and Recall. In that case, the F1 score would also be an ideal 1 (100%), which is known as
the perfect F1 score. As the values of both Precision and Recall range from 0 to 1, the
F1 score also ranges from 0 to 1.
Let us explore the variations we can have in the F1 Score:
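The short Python sketch below computes the F1 score for a few assumed precision/recall combinations (the values are chosen purely for illustration) to show how the score behaves.

def f1_score(precision, recall):
    # F1 = 2 * P * R / (P + R); taken as 0 when both are 0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Assumed precision/recall combinations, for illustration only.
for p, r in [(1.0, 1.0), (0.9, 0.9), (1.0, 0.1), (0.5, 0.5)]:
    print(f"Precision={p}, Recall={r} -> F1={f1_score(p, r):.2f}")

# Output: 1.00, 0.90, 0.18, 0.50. The F1 score is high only when
# both Precision and Recall are high; one low value pulls it down.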
Let us look at one question.
Draw the confusion matrix for the following data:
1. The number of True Positives (TP) = 100
2. The number of True Negatives (TN) = 47
3. The number of False Positives (FP) = 62
4. The number of False Negatives (FN) = 290
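Using one common layout (reality along the rows and prediction along the columns), the confusion matrix would be:

                   Predicted: Yes    Predicted: No
Reality: Yes       TP = 100          FN = 290
Reality: No        FP = 62           TN = 47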
Question: An AI model made the following sales predictions for a new mobile phone that
was recently launched:
Answer:
(i) The total number of wrong predictions made by the model is the sum of the False
Positives and the False Negatives.
FP + FN = 0 + 100 = 100
(ii) Before calculating, we will first recall the formulas for precision, recall, and F1 score.
Precision = TP / (TP + FP)
= 900 / (900 + 0)
= 900 / 900
= 1.0
Recall = TP / (TP + FN)
= 900 / (900 + 100)
= 900 / 1000
= 0.9
F1 Score = 2 * Precision * Recall / (Precision + Recall)
= 2 * 1.0 * 0.9 / (1.0 + 0.9)
= 1.8 / 1.9
= 0.947
So the F1 score of the model is 0.947, i.e. 94.7%.
Question: An AI model made the following sales predictions for a new mobile phone that
was recently launched:
Answer:
(i) The total number of wrong predictions made by the model is the sum of the False
Positives and the False Negatives.
FP + FN = 40 + 12 = 52
(ii) Before calculating, we will first recall the formulas for precision, recall, and F1 score.
Precision = TP / (TP + FP)
= 50 / (50 + 40)
= 50 / 90
= 0.56
Recall = TP / (TP + FN)
= 50 / (50 + 12)
= 50 / 62
= 0.81
F1 Score = 2 * Precision * Recall / (Precision + Recall)
= 2 * 0.56 * 0.81 / (0.56 + 0.81)
= 0.907 / 1.37
≈ 0.66
So the F1 score of the model is approximately 0.66, i.e. 66%.
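As a check, the same metrics can be computed with a small Python sketch using the counts from this question (TP = 50, FP = 40, FN = 12):

tp, fp, fn = 50, 40, 12   # counts taken from the question above

precision = tp / (tp + fp)                          # 50 / 90 ≈ 0.56
recall = tp / (tp + fn)                             # 50 / 62 ≈ 0.81
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.66

print(f"Precision = {precision:.2f}")
print(f"Recall    = {recall:.2f}")
print(f"F1 Score  = {f1:.2f}")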
Which metric is appropriate to evaluate the AI model?
Let us compare the metrics to see which one is most appropriate for evaluating a model.
The choice depends on the use case: recall matters more when missing a positive case is
costly (for example, failing to detect a forest fire), while precision matters more when
false alarms are costly; the F1 score is useful when we want a balance between the two.
Ethical concerns around model evaluation
Ethical concerns around model evaluation primarily focus on three aspects: bias,
transparency, and accuracy. Nowadays, we are moving from the Information era to the
Artificial Intelligence era: we no longer use raw data or information alone, but the
intelligence derived from that data, to build solutions. We need to keep ethical
practices in mind while developing solutions using AI. Let us understand some of these
ethical concerns in detail.
Bias – Bias occurs when a model generates unfair or discriminatory results. This
can happen due to the model favoring certain groups, or due to the algorithm itself.
For example, if an AI application such as Amazon’s favors male candidates only, then
most of its suggestions will be made only for male candidates, which is unfair and
will ultimately decrease the company’s profit.
Transparency – The AI decision-making process should be transparent, so that people
can easily understand and interpret the results. If there is a lack of transparency,
people will not trust the model. For example, if a person applies for a loan and the
AI model denies the application, then the AI system should make clear to the applicant
why the loan application was rejected.
Accuracy – The AI model should predict the correct result. An accurate model gives
error-free and reliable results. For example, in medicine, an AI model should diagnose
and generate accurate predictions; otherwise, a wrong diagnosis can lead to serious
harm to patients.