Machine Learning Basics and Classification

The document provides an overview of machine learning, including its definition, types (supervised, unsupervised, and reinforcement learning), and how it works through examples. It also discusses binary classification, its algorithms, evaluation metrics like accuracy, precision, and recall, and their importance in assessing model performance. Additionally, it highlights the differences between precision and recall in the context of machine learning models.

UNIT-I

1 The Ingredients of Machine Learning:


1.1 Introduction to Machine Learning
1.2 Types of Machine Learning Models
1.3 The output of Machine Learning
2 Binary Classification and related tasks:
2.1 Classification
2.2 Calculating accuracy in classification.

1.1 Introduction to Machine Learning


What is Machine Learning?
• Machine learning is an application of artificial intelligence that uses algorithms and data to
automatically analyse information and make decisions with little or no human intervention.

• It describes how computers perform tasks on their own by learning from previous experience.

• Therefore we can say that in machine learning, intelligent behaviour is produced on the basis of
experience.

How Machine Learning Works

Consider a system with input data that contains photos of various kinds of fruits. You want the system to
group the data according to the different types of fruits.
First, the system will analyze the input data. Next, it tries to find patterns, such as shape, size, and color.
Based on these patterns, the system will try to predict the different types of fruit and segregate them.
Finally, it keeps track of all the decisions it made during the process to make sure it is learning. The next
time you ask the same system to predict and segregate the different types of fruits, it won't have to go
through the entire process again. That's how machine learning works.
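For illustration, here is a minimal sketch of this idea, assuming simple numeric features (a size and a colour score) have already been extracted from the fruit photos; the data values, feature names, and the choice of scikit-learn's KMeans are assumptions made for this example, not part of the original description.

# Minimal sketch: the system finds groups in fruit data once, then reuses what it learned.
from sklearn.cluster import KMeans

# Hypothetical features per photo: [size_cm, colour_score]; three kinds of fruit are mixed together.
fruit_features = [
    [7.5, 0.9], [7.8, 0.8],    # e.g. apples
    [12.0, 0.3], [11.5, 0.2],  # e.g. bananas
    [4.0, 0.6], [4.2, 0.7],    # e.g. plums
]

# Analyse the data once and find three groups (patterns).
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(fruit_features)

# Later, a new photo's features can be assigned to a group without re-learning from scratch.
print(model.predict([[7.6, 0.85]]))  # should land in the same group as the apples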

1.2 Types of Machine Learning

Supervised learning: You supervise the machine while training it so that it can later work on its own. This requires labeled training data.
Unsupervised learning: There is training data, but it is not labeled.
Reinforcement learning: The system learns on its own from feedback (rewards and penalties).

Supervised Learning :

To understand how supervised learning works, look at the example below, where you
have to train a model or system to recognize an apple.
First, you have to provide a data set that contains pictures of a kind of fruit, e.g.,
apples.

Then, provide another data set that lets the model know that these are pictures of
apples. This completes the training phase.

Next, provide a new set of data that only contains pictures of apples. At this point, the
system can recognize what fruit it is and will remember it.

That's how supervised learning works. You are training the model to perform a specific
operation on its own. This kind of model is often used in filtering spam mail from your
email accounts.

Here are some common supervised learning algorithms (a short training sketch follows this list):

Decision trees: Use labeled data to learn a tree of decision rules and predict labels for unseen data

Random forest: Combines multiple decision trees to create a more accurate prediction

Support vector machines (SVM): A well-known supervised learning algorithm with many applications and variations

Logistic regression: A classification algorithm that uses a model to estimate the probability of an event

Linear regression: The algorithm is trained on a set of training data with known correct (numeric) outputs

Naive Bayes: A supervised learning technique based on Bayes' Theorem
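As a concrete illustration of the supervised workflow described above, here is a minimal sketch using a decision tree from scikit-learn; the feature values (weight and colour score) and labels below are made up purely for illustration.

# Minimal supervised-learning sketch: train on labeled fruit measurements, then predict.
from sklearn.tree import DecisionTreeClassifier

# Training phase: hypothetical features [weight_g, colour_score] with their known labels.
X_train = [[150, 0.9], [160, 0.8], [120, 0.3], [110, 0.2]]
y_train = ["apple", "apple", "mango", "mango"]

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Prediction phase: a new, unlabeled measurement.
print(clf.predict([[155, 0.85]]))  # expected to come out as "apple"

The same pattern (fit on labeled data, then predict on new data) applies to the other algorithms in the list, such as logistic regression or Naive Bayes.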

Unsupervised Learning

Consider a cluttered, unlabeled dataset: a collection of pictures of different fruits. You feed this data
to the model, and the model analyzes it to recognize any patterns. In the end, the
machine categorizes the photos into groups based on their similarities, without ever being told
what the fruits are. E-commerce sites such as Flipkart use this kind of model to find and recommend
products that are well suited for you.
Types of Unsupervised Learning:

 Clustering

 Dimensionality Reduction
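Clustering was sketched earlier with the fruit-grouping example; dimensionality reduction can be sketched just as briefly. The 3-dimensional data points below and the use of scikit-learn's PCA are assumptions made for illustration only.

# Minimal dimensionality-reduction sketch: project 3-D points down to 2-D with PCA.
from sklearn.decomposition import PCA

X = [
    [7.5, 0.9, 150], [7.8, 0.8, 160],
    [12.0, 0.3, 120], [11.5, 0.2, 110],
]

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (4, 2): same samples, fewer features
print(pca.explained_variance_ratio_)  # how much of the variation each new axis keeps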

Reinforcement Learning
You provide a machine with a data set and ask it to identify a particular kind of fruit (in
this case, an apple). The machine tells you that it's a mango, but that's the wrong answer.
As feedback, you tell the system that it's wrong; it's not a mango, it's an apple. The
machine then learns from the feedback and keeps that in mind. The next time you ask
the same question, the system gives you the right answer; it is able to tell you that it's
an apple. That is a reinforced response.

That's how reinforcement learning works: the system learns from its mistakes and
experiences. This model is used in games like Prince of Persia, Assassin's Creed, and
FIFA, wherein the level of difficulty increases as you get better at the game.
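The feedback loop described above can be sketched as a very small trial-and-error learner. This is only an illustrative toy, not how the games mentioned implement difficulty; the actions, rewards, and update rule are assumptions made for this example.

# Toy reinforcement-style loop: guess a fruit label, get +1 for a correct guess and 0
# otherwise, and gradually prefer the guess that has earned the most reward.
import random

actions = ["mango", "banana", "apple"]
value = {a: 0.0 for a in actions}   # estimated value of each possible guess
counts = {a: 0 for a in actions}

def feedback(guess, truth="apple"):
    return 1.0 if guess == truth else 0.0   # the human's right/wrong feedback

for step in range(200):
    # Explore occasionally; otherwise exploit the best guess found so far.
    if random.random() < 0.1:
        guess = random.choice(actions)
    else:
        guess = max(actions, key=lambda a: value[a])
    reward = feedback(guess)
    counts[guess] += 1
    value[guess] += (reward - value[guess]) / counts[guess]  # running average of reward

print(max(actions, key=lambda a: value[a]))  # almost always "apple" after enough feedback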

Output of machine learning:


The output of machine learning is a machine learning model, which is a computer
program that contains data and guidelines for making predictions. The model is created
by a machine learning algorithm that analyzes data to find patterns and make
predictions.

The output of a machine learning model depends on the type of learning:

Supervised learning: The model output is a predicted target value for a given input.

Unsupervised learning: The model output may include cluster assignments or other
learned patterns in the data.

Machine learning models can be used to perform classification and prediction tasks on
various types of data, including documents, images, and numbers. For example, a
financial institution might use a machine learning model to classify transactions as
fraudulent or genuine.

Machine learning models can be very accurate, but they are only as accurate as the
data used to train them. The data should be clean, unbiased, and representative of
different scenarios.

CLASSIFICATION:
Definition of Classification
In machine learning, classification, as the name suggests, classifies data into different parts/classes/groups. It is
used to predict which class a given input belongs to.

Classification is the process of assigning new input variables (X) to the class they most likely belong to, based on a
classification model constructed from previously labeled training data.

Data with labels is used to train a classifier so that it can later perform well on data without labels (not yet labeled).
This process of repeatedly classifying examples of previously known classes is what trains the machine. Classification
assumes the classes are discrete; if the target is continuous, the task is regression rather than classification.

Types of Classification

There are two types of classifications

 Binary classification

 Multi-class classification

Binary Classification

It is a classification task in which the given data is classified into one of two classes. It is basically a
prediction of which of two groups an item belongs to.
Examples include:

 Email spam detection (spam or not).


 Churn prediction (churn or not). [Churn is a measurement of the percentage of accounts that cancel or choose not to renew their subscriptions.]

 Conversion prediction (buy or not).

Typically, binary classification tasks involve one class that is the normal state and another class that is the abnormal state.

For example, "not spam" is the normal state and "spam" is the abnormal state. Another example: "cancer not detected" is the normal state of a task that involves a medical test, and "cancer detected" is the abnormal state.

The class for the normal state is assigned the class label 0, and the class with the abnormal state is assigned the class label 1.

Let us suppose two emails are sent to you: one from an insurance company that keeps sending its ads, and the other from your bank regarding your credit card bill. The email service provider will classify the two emails: the first one will be sent to the spam folder and the second one will be kept in the primary inbox.

This process is known as binary classification, as there are two discrete classes: one is spam and the other is primary (not spam). So, this is a problem of binary classification.

Binary classification uses various algorithms to do the task; some of the most common
algorithms used for binary classification are listed below (a minimal spam-filter sketch follows this list):
 Logistic Regression
 k-Nearest Neighbors
 Decision Trees
 Support Vector Machine
 Naive Bayes
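A minimal sketch of one of these algorithms on the spam-detection task is shown below; the example emails, their labels, and the choice of a Naive Bayes text classifier from scikit-learn are assumptions made purely for illustration.

# Minimal binary classification sketch: spam (1, abnormal state) vs. not spam (0, normal state).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free insurance offer now",       # spam
    "exclusive ad: claim your prize",       # spam
    "your credit card bill is due friday",  # not spam
    "bank statement for your account",      # not spam
]
labels = [1, 1, 0, 0]

# Turn words into counts, then fit a Naive Bayes classifier on the labeled emails.
model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(emails, labels)

print(model.predict(["free prize offer from the insurance company"]))  # likely [1] (spam)
print(model.predict(["credit card bill from your bank"]))              # likely [0] (primary)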

Comparison of binary and multi-class classification:

No. of classes:
Binary classification: classifies objects into at most two classes.
Multi-class classification: there can be any number of classes; it classifies the object into more than two classes.

Algorithms used:
Binary classification: Logistic Regression, k-Nearest Neighbors, Decision Trees, Support Vector Machine, Naive Bayes.
Multi-class classification: k-Nearest Neighbors, Decision Trees, Naive Bayes, Random Forest, Gradient Boosting.

Examples:
Binary classification: email spam detection (spam or not), churn prediction (churn or not), conversion prediction (buy or not).
Multi-class classification: face classification, plant species classification, optical character recognition.

Evaluation of binary classifiers


Consider a medical test that classifies each patient as diseased (positive) or healthy (negative).

If the model correctly predicts a diseased patient as positive, the case is called a True Positive (TP).
If the model correctly predicts a healthy patient as negative, the case is called a True Negative (TN).

The binary classifier may also misdiagnose some patients. If a diseased patient is classified as healthy by a
negative test result, this error is called a False Negative (FN).

Similarly, if a healthy patient is classified as diseased by a positive test result, this error is called a False Positive (FP).

We can evaluate a binary classifier based on the following parameters:


 True Positive (TP): The patient is diseased and the model predicts "diseased"
 False Positive (FP): The patient is healthy but the model predicts "diseased"
 True Negative (TN): The patient is healthy and the model predicts "healthy"
 False Negative (FN): The patient is diseased and the model predicts "healthy"

 The following confusion matrix represents the above parameters:

                      Predicted "diseased"   Predicted "healthy"
  Actually diseased   TP                     FN
  Actually healthy    FP                     TN
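In practice the four counts are obtained by comparing the true labels with the model's predictions; a minimal sketch with scikit-learn is shown below. The two label vectors are made up for illustration (1 = diseased, 0 = healthy).

# Minimal sketch: computing TP, FP, TN, FN from true vs. predicted labels.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # what the patients really are
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # what the model predicted

# With labels=[0, 1], the matrix is laid out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")  # TP=3 FP=1 TN=3 FN=1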


CALCULATING ACCURACY IN CLASSIFICATION
Accuracy is perhaps the best-known Machine Learning model validation method used in classification
problems. One reason for its popularity is its relative simplicity. It is easy to understand and easy to
implement. Accuracy is a good metric to assess model performance for simple cases.

What is Accuracy?

Accuracy is a metric used in classification problems to tell the percentage of correct predictions.
We calculate it by dividing the number of correct predictions by the total number of predictions.

In the binary classification case, we can express accuracy in terms of the True Positive, False Positive,
True Negative and False Negative counts:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where

 TP : True Positives

 FP : False Positives

 TN : True Negatives

 FN : False Negatives
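The formula translates directly into code; a minimal sketch, using the counts from Example 1 below as placeholders:

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=100, tn=50, fp=10, fn=5))  # 150 / 165 = 0.909...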

What is Precision?

Precision is defined as the ratio of correctly classified positive samples (True Positives) to the total
number of samples classified as positive (whether correctly or incorrectly).

or

The percentage of correct predictions for the positive class.

Precision = True Positives / (True Positives + False Positives)

Precision = TP / (TP + FP)    (or)

Precision = TP / Predicted(positive)

Hence, precision helps us to gauge the reliability of the machine learning model when it
classifies a sample as positive.
What is Recall?

Recall is calculated as the ratio between the number of positive samples correctly classified as
positive and the total number of positive samples. Recall measures the model's ability to
detect positive samples: the higher the recall, the more positive samples are detected.

Or

The percentage of actual positive class samples that were identified by the model.

Recall = True Positives / (True Positives + False Negatives)

Recall = TP / (TP + FN)    (or)

Recall = TP / Actual(positive)
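Both definitions translate directly into code; a minimal sketch, again using the counts from Example 1 below:

# Precision = TP / (TP + FP);  Recall = TP / (TP + FN)
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

print(precision(tp=100, fp=10))  # 100 / 110 = 0.909...
print(recall(tp=100, fn=5))      # 100 / 105 = 0.952...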

EXAMPLE 1:

Total = 165         Predicted Negative   Predicted Positive   Total
Actual Negative     TN = 50              FP = 10              60
Actual Positive     FN = 5               TP = 100             105
Total               55                   110                  165

ACCURACY = (TN + TP) / TOTAL = (50 + 100) / 165 = 0.91

ERROR RATE = 1 - ACCURACY = (FP + FN) / TOTAL = 15 / 165 = 0.09

PRECISION = TP / PREDICTED(POSITIVE) = 100 / 110 = 0.91

RECALL = TP / ACTUAL(POSITIVE) = 100 / 105 = 0.95 (RECALL = TPR)

TPR = TP / (TP + FN) = 100 / 105 = 0.95 (SENSITIVITY)

FPR = FP / (FP + TN) = 1 - TNR = 10 / 60 = 0.17

FNR = FN / (FN + TP) = 1 - TPR = 5 / 105 = 0.05

TNR = TN / (TN + FP) = 50 / 60 = 0.83 (SPECIFICITY)

The F-Measure or F1-score is a way of combining the precision and recall of the model, and it is
defined as the harmonic mean of the model's precision and recall.

F-MEASURE or F1-SCORE = 2PR / (P + R) = (2 × 0.91 × 0.95) / (0.91 + 0.95) = 1.729 / 1.86 = 0.93
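The numbers above can be double-checked with scikit-learn by rebuilding label vectors that match the confusion matrix (TN = 50, FP = 10, FN = 5, TP = 100); this is only a check of the arithmetic, not part of the original example.

# Rebuild y_true / y_pred so the confusion matrix has TN=50, FP=10, FN=5, TP=100,
# then let scikit-learn recompute the metrics from Example 1.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0] * 50 + [0] * 10 + [1] * 5 + [1] * 100
y_pred = [0] * 50 + [1] * 10 + [0] * 5 + [1] * 100

print(round(accuracy_score(y_true, y_pred), 2))   # 0.91
print(round(precision_score(y_true, y_pred), 2))  # 0.91
print(round(recall_score(y_true, y_pred), 2))     # 0.95
print(round(f1_score(y_true, y_pred), 2))         # 0.93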


Difference between Precision and Recall in Machine Learning

What it measures:
Precision: the model's ability to correctly classify the samples it labels as positive.
Recall: how many of the actual positive samples were correctly classified by the model.

Samples used in the calculation:
Precision: considers the samples that are classified as positive, whether correctly or incorrectly (so both actual positives and actual negatives can be involved).
Recall: needs only the actual positive samples; all negative samples are neglected.

High/low combinations:
A model that detects most of the positive samples but also produces many false positives is a high-recall, low-precision model.
A model that is usually right when it predicts positive but detects only a few of the positive samples is a high-precision, low-recall model.

Dependence on the classes:
Precision: dependent on both the negative and the positive samples.
Recall: dependent on the positive samples and independent of the negative samples.

What it cares about:
Precision: all samples classified as positive, whether correctly or incorrectly.
Recall: correctly classifying all positive samples; it does not consider whether any negative sample is classified as positive.

Why use Precision and Recall in Machine Learning models?


This question is very common among machine learning engineers and data scientists. Which metric to
emphasise varies according to the type of problem being solved.

o If it is important that the samples the model classifies as positive really are positive, i.e. false
positives are costly (for example, a genuine email being sent to the spam folder), then focus on
Precision.
o Further, on the other end, if the goal is to detect as many of the actual positive samples as
possible, i.e. false negatives are costly (for example, failing to detect cancer), then focus on
Recall. Here, we care less about how the negative samples are classified.

https://www.kaggle.com/datasets/chepkoyallan/datapreprocessing
