UNIT-I
1 The Ingredients of Machine Learning:
1.1 Introduction to Machine Learning
1.2 Types of Machine Learning Models
1.3 The output of Machine Learning
2 Binary Classification and related tasks:
2.1 Classification
2.2 Calculating accuracy in classification.
1.1 Introduction to Machine Learning
What is Machine Learning?
• Machine learning is an application of artificial intelligence in which algorithms analyse data and make
decisions automatically, without explicit human intervention.
• It describes how computers perform tasks on their own by learning from previous experience.
• In machine learning, therefore, intelligent behaviour is generated on the basis of
experience.
How Machine Learning Works
Consider a system with input data that contains photos of various kinds of fruits. You want the system to
group the data according to the different types of fruits.
First, the system will analyze the input data. Next, it tries to find patterns, like shapes, size, and color.
Based on these patterns, the system will try to predict the different types of fruit and segregate them.
Finally, it keeps track of all the decisions it made during the process to ensure it is learning. The next
time you ask the same system to predict and segregate the different types of fruits, it won't have to go
through the entire process again. That's how machine learning works.
1.2 Types of Machine Learning
Supervised machine learning: You supervise the machine while training it to
work on its own. This requires labeled training data.
Unsupervised learning: There is training data, but it won't be labeled.
Reinforcement learning: The system learns on its own through trial and error, using feedback (rewards and penalties) on its actions.
Supervised Learning :
To understand how supervised learning works, look at the example below, where you
have to train a model or system to recognize an apple.
First, you have to provide a data set that contains pictures of a kind of fruit, e.g.,
apples.
Then, provide another data set that lets the model know that these are pictures of
apples. This completes the training phase.
Next, provide a new set of data that only contains pictures of apples. At this point, the
system can recognize what fruit it is and will remember it.
That's how supervised learning works. You are training the model to perform a specific
operation on its own. This kind of model is often used in filtering spam mail from your
email accounts.
Here are some different supervised learning algorithms:
Decision trees: Uses labeled data to train and predict on unlabeled data
Random forest: Combines multiple decision trees to create a more accurate prediction
Support vector machines (SVM): A well-known supervised learning algorithm with
many applications and variations
Logistic regression: A classification algorithm that uses a model to determine the
probability of an event
Linear regression: Learns a linear relationship between inputs and a continuous output
from training data with correct outputs
Naive Bayes: A supervised learning technique based on Bayes' Theorem
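As a minimal illustration of supervised learning, the sketch below trains a 1-nearest-neighbour classifier (the simplest member of the k-Nearest Neighbors family listed above) on a few labeled fruit measurements. The feature values (weight in grams, diameter in cm) and labels are invented for illustration, not real data.

```python
# A minimal 1-nearest-neighbour classifier: predict the label of the
# closest training example (squared Euclidean distance).
def predict_1nn(train, x):
    """train: list of ((features), label) pairs; x: a feature tuple."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = min(train, key=lambda pair: dist(pair[0], x))
    return nearest[1]

# Labeled training data -- this is the "supervision".
train = [((150, 7), "apple"), ((160, 8), "apple"),
         ((300, 12), "mango"), ((320, 13), "mango")]

print(predict_1nn(train, (155, 7)))   # closest to the apples
print(predict_1nn(train, (310, 12)))  # closest to the mangoes
```

The labels in the training set play exactly the role of the "second data set" in the apple example above: they tell the model what each picture (here, each measurement) is.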
Unsupervised Learning
Consider an unlabeled dataset: a collection of pictures of different fruits. You feed this data
to the model, and the model analyzes it to find patterns. In the end, the
machine categorizes the photos into groups based on their
similarities. Flipkart uses this kind of model to find and recommend products that are well suited
for you.
Types of Unsupervised Learning:
2.1 Clustering
2.2 Dimensionality Reduction
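A rough sketch of what clustering does, using a tiny hand-rolled k-means on one-dimensional data; the values are invented and the initialisation is deliberately simple (it only works for k = 2).

```python
# Tiny k-means for 1-D data: repeatedly assign each point to the
# nearest centre, then move each centre to the mean of its points.
def kmeans_1d(points, k=2, iters=10):
    centers = [min(points), max(points)]  # simple initialisation for k=2
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two obvious groups of "fruit sizes"; the algorithm finds them
# without being given any labels.
centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```

Note that, unlike the supervised example, no labels were supplied: the grouping emerges purely from the similarity of the data points.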
Reinforcement Learning
You provide a machine with a data set and ask it to identify a particular kind of fruit (in
this case, an apple). The machine tells you that it's a mango, but that's the wrong answer.
As feedback, you tell the system that it's wrong; it's not a mango, it's an apple. The
machine then learns from the feedback and keeps that in mind. The next time you ask
the same question, the system gives you the right answer; it is able to tell you that it’s
an apple. That is a reinforced response.
That's how reinforcement learning works; the system learns from its mistakes and
experiences. This model is used in games like Prince of Persia, Assassin’s Creed, and
FIFA, wherein the level of difficulty increases as you get better with the games.
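The feedback loop described above can be caricatured in a few lines: a toy "agent" keeps a score for each possible answer, guesses the answer with the highest score, and is rewarded or penalised until it settles on the right one. This is only a sketch of the reward idea, not a real reinforcement learning algorithm.

```python
# Toy reward loop: the agent's belief starts out wrong (it favours
# "mango"), and negative feedback steers it towards "apple".
scores = {"mango": 1, "apple": 0}    # the agent initially prefers mango
true_label = "apple"

for step in range(5):
    guess = max(scores, key=scores.get)        # act on current belief
    reward = 1 if guess == true_label else -1  # feedback from the teacher
    scores[guess] += reward                    # learn from the feedback

print(max(scores, key=scores.get))  # the agent now answers "apple"
```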
Output of machine learning:
The output of machine learning is a machine learning model, which is a computer
program that contains data and guidelines for making predictions. The model is created
by a machine learning algorithm that analyzes data to find patterns and make
predictions.
The output of a machine learning model depends on the type of learning:
Supervised learning: The model output is a predicted target value for a given input.
Unsupervised learning: The model output may include cluster assignments or other
learned patterns in the data.
Machine learning models can be used to perform classification and prediction tasks on
various types of data, including documents, images, and numbers. For example, a
financial institution might use a machine learning model to classify transactions as
fraudulent or genuine.
Machine learning models can be very accurate, but they are only as accurate as the
data used to train them. The data should be clean, unbiased, and representative of
different scenarios.
CLASSIFICATION:
Definition of Classification
In machine learning, Classification, as the name suggests, assigns data into different parts/classes/groups. It is
used to predict which class the input data belongs to.
Classification is the process of assigning new input variables (X) to the class they most likely belong to, based on a
classification model constructed from previously labeled training data.
Data with labels is used to train a classifier so that it can perform well on data without labels (not yet labeled). This
process of repeated classification into previously known classes trains the machine. The classes in classification are
discrete; if the target variable is continuous, the task is regression rather than classification.
Types of Classification
There are two types of classifications
Binary classification
Multi-class classification
Binary Classification
It is a classification task in which the given data is classified into two classes. It is basically a kind
of prediction about which of two groups a thing belongs to.
Examples include:
Email spam detection (spam or not).
Churn prediction (churn or not). [Churn is a measure of the percentage of accounts that cancel or choose
not to renew their subscriptions.]
Conversion prediction (buy or not).
Typically, binary classification tasks involve one class that is the normal state and another class that is the abnormal
state.
For example “not spam” is the normal state and “spam” is the abnormal state. Another example is “cancer not
detected” is the normal state of a task that involves a medical test and “cancer detected” is the abnormal state.
The class for the normal state is assigned the class label 0 and the class with the abnormal state is assigned the
class label 1.
Let us suppose, two emails are sent to you, one is sent by an insurance company that keeps sending their ads, and
the other is from your bank regarding your credit card bill. The email service provider will classify the two emails, the
first one will be sent to the spam folder and the second one will be kept in the primary one.
This process is known as binary classification, as there are two discrete classes, one is spam and the other is
primary. So, this is a problem of binary classification.
Binary classification uses some algorithms to do the task, some of the most common
algorithms used by binary classification are
Logistic Regression
k-Nearest Neighbors
Decision Trees
Support Vector Machine
Naive Bayes
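To make the list concrete, here is a hand-rolled one-feature Gaussian Naive Bayes (one of the algorithms above) applied to an invented spam signal, the number of exclamation marks in an email. The training numbers are illustrative only, not real data.

```python
import math

# One-feature Gaussian Naive Bayes: model each class (0 = not spam,
# 1 = spam) as a normal distribution over the feature, then predict
# the class with the higher posterior log-likelihood.
def fit(xs, ys):
    params = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals) or 1e-9
        params[c] = (mu, var, len(vals) / len(xs))  # mean, variance, prior
    return params

def predict(params, x):
    def log_posterior(c):
        mu, var, prior = params[c]
        return (math.log(prior)
                - 0.5 * math.log(2 * math.pi * var)
                - (x - mu) ** 2 / (2 * var))
    return max(params, key=log_posterior)

# Hypothetical training data: exclamation-mark counts and spam labels.
params = fit([0, 1, 1, 8, 9, 10], [0, 0, 0, 1, 1, 1])
print(predict(params, 7))  # many exclamation marks -> spam (1)
print(predict(params, 2))  # few -> not spam (0)
```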
Binary classification vs. Multi-class classification

No. of classes:
Binary classification: It is a classification into two groups, i.e. it classifies objects into at most two classes.
Multi-class classification: There can be any number of classes, i.e. it classifies the object into more than two classes.

Algorithms used:
Binary classification: Logistic Regression, k-Nearest Neighbors, Decision Trees, Support Vector Machine, Naive Bayes.
Multi-class classification: k-Nearest Neighbors, Decision Trees, Naive Bayes, Random Forest, Gradient Boosting.

Examples:
Binary classification: email spam detection (spam or not), churn prediction (churn or not), conversion prediction (buy or not).
Multi-class classification: face classification, plant species classification, optical character recognition.
Evaluation of binary classifiers
Consider a binary classifier that predicts whether patients are diseased (positive) or healthy (negative).
If the model correctly predicts a diseased patient as positive, this case is called True Positive (TP).
If the model correctly predicts a healthy patient as negative, this is called True Negative (TN).
The binary classifier may misdiagnose some patients as well. If a diseased patient is classified as healthy by a
negative test result, this error is called False Negative (FN).
Similarly, if a healthy patient is classified as diseased by a positive test result, this error is called False Positive (FP).
We can evaluate a binary classifier based on the following parameters:
True Positive (TP): The patient is diseased and the model predicts "diseased"
False Positive (FP): The patient is healthy but the model predicts "diseased"
True Negative (TN): The patient is healthy and the model predicts "healthy"
False Negative (FN): The patient is diseased and the model predicts "healthy"
These four outcomes are usually arranged in a confusion matrix, with the actual classes as rows and the predicted classes as columns.
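The four counts can be computed directly from a list of actual and predicted labels; the labels below are a made-up diagnosis example (1 = diseased/positive, 0 = healthy/negative).

```python
# Count TP, FP, TN, FN by comparing actual and predicted labels pairwise.
def confusion_counts(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn

actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))  # (3, 1, 3, 1)
```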
CALCULATING ACCURACY IN CLASSIFICATION
Accuracy is perhaps the best-known Machine Learning model validation method used in classification
problems. One reason for its popularity is its relative simplicity. It is easy to understand and easy to
implement. Accuracy is a good metric to assess model performance for simple cases.
What is Accuracy?
Accuracy is a metric used in classification problems to tell the percentage of accurate predictions.
We calculate it by dividing the number of correct predictions by the total number of predictions.
In the binary classification case, we can express accuracy in terms of the True Positive / False Positive /
True Negative / False Negative counts:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Where
TP : True Positives
FP : False Positives
TN : True Negatives
FN : False Negatives
What is Precision?
Precision is defined as the ratio of correctly classified positive samples (True Positives) to the total
number of samples classified as positive (whether correctly or incorrectly).
or
The percentage of correct predictions for the positive class
Precision = True Positive / (True Positive + False Positive)
Precision = TP / (TP + FP)   (or)
Precision = TP / Predicted(positive)
Hence, precision tells us how reliable the machine learning model is when it
classifies a sample as positive.
What is Recall?
Recall is calculated as the ratio of the number of positive samples correctly classified as
positive to the total number of positive samples. Recall measures the model's ability to
detect positive samples: the higher the recall, the more positive samples are detected.
Or
The percentage of actual positive class samples that were identified by the model
Recall = True Positive / (True Positive + False Negative)
Recall = TP / (TP + FN)   (or)
Recall = TP / Actual(positive)
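The two formulas translate directly into code; the counts below are arbitrary illustrative numbers, not taken from any real model.

```python
# Precision: of everything predicted positive, how much was right.
# Recall: of everything actually positive, how much was found.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Say the model made 8 positive predictions, 6 of them correct,
# and missed 2 actual positives.
print(round(precision(tp=6, fp=2), 2))  # 6 / 8 -> 0.75
print(round(recall(tp=6, fn=2), 2))     # 6 / 8 -> 0.75
```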
EXAMPLE:1
Total = 165        Predicted Negative    Predicted Positive
Actual Negative    TN = 50               FP = 10               60
Actual Positive    FN = 5                TP = 100              105
                   55                    110
ACCURACY = (TN + TP) / TOTAL = (50 + 100) / 165 = 0.91
ERROR RATE = 1 - ACCURACY or (FP + FN) / TOTAL = 0.09
PRECISION = TP / PREDICTED(POSITIVE) = 100/110 = 0.91
RECALL = TP / ACTUAL(POSITIVE) = 100/105 = 0.95 or RECALL = TPR
TPR = TP / (TP + FN) = 100/105 = 0.95 (SENSITIVITY)
FPR = FP / (FP + TN) or 1 - TNR = 10/60 = 0.17
FNR = FN / (FN + TP) or 1 - TPR = 5/105 = 0.05
TNR = TN / (TN + FP) = 50/60 = 0.83 (SPECIFICITY)
The F-Measure or F1-score is a way of combining the precision and recall of the model, and it is
defined as the harmonic mean of the model’s precision and recall.
F-MEASURE or F1-SCORE = 2PR / (P + R) = 1.729 / 1.86 = 0.93
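The figures from Example 1 can be checked with a few lines of arithmetic (TP = 100, TN = 50, FP = 10, FN = 5, total = 165):

```python
# Recompute the Example 1 metrics from the four confusion-matrix counts.
tp, tn, fp, fn = 100, 50, 10, 5
total = tp + tn + fp + fn            # 165

accuracy  = (tp + tn) / total        # 150 / 165
precision = tp / (tp + fp)           # 100 / 110
recall    = tp / (tp + fn)           # 100 / 105
f1        = 2 * precision * recall / (precision + recall)

print(round(accuracy, 2), round(precision, 2),
      round(recall, 2), round(f1, 2))
# 0.91 0.91 0.95 0.93
```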
Difference between Precision and Recall in Machine Learning
Precision:
It helps us to measure the ability of the model to classify positive samples correctly.
While calculating precision, we consider every sample the model classified as positive, whether it is actually positive or negative.
A model that classifies most of the positive samples correctly but also produces many false positives is said to be a high recall, low precision model.
Precision depends on both the negative and the positive samples.
Precision considers all samples that are classified as positive, whether correctly or incorrectly.

Recall:
It helps us to measure how many of the actual positive samples were correctly classified by the ML model.
While calculating recall, we only need the actual positive samples; all negative samples are neglected.
A model that classifies only a few samples as positive, but classifies them correctly, is said to be a high precision, low recall model.
Recall depends only on the positive samples and is independent of the negative samples.
Recall cares about correctly classifying all positive samples; it does not consider whether any negative sample is classified as positive.
Why use Precision and Recall in Machine Learning models?
This question is very common among machine learning engineers and data researchers. The use of
Precision and Recall varies according to the type of problem being solved.
o If it matters that the samples the model labels as positive really are positive, i.e. false positives
are costly, then use Precision.
o On the other hand, if the goal is to detect all positive samples, i.e. missing a positive (a false
negative) is costly, then use Recall. Here, we do not care whether negative samples are
classified correctly or incorrectly.
https://www.kaggle.com/datasets/chepkoyallan/datapreprocessing