CHAPTER 7
Supervised Learning:
Classification and Regression
WHAT IS SUPERVISED
LEARNING
Supervised learning is the type of machine
learning in which machines are trained using well
"labelled" training data, and on the basis of that
data, machines predict the output. Labelled data
means that the input data is already tagged with
the correct output.
Its two main problem types are classification
and regression, described below.
Supervised learning is a process of providing
input data as well as correct output data to the
machine learning model. The aim of a supervised
learning algorithm is to find a mapping
function that maps the input variables (x) to
the output variables (y).
In the real world, supervised learning can be used
for risk assessment, image classification,
fraud detection, spam filtering, etc.
HOW DOES SUPERVISED LEARNING WORK?
In supervised learning, models are trained
using a labelled dataset, where the model
learns about each type of data. Once the
training process is completed, the model is
tested on test data (a held-out set kept
separate from the training data), and then it
predicts the output.
TYPES OF SUPERVISED MACHINE LEARNING ALGORITHMS:
1. REGRESSION
Regression algorithms are used if there is a
relationship between the input variable and
the output variable. It is used for the
prediction of continuous variables, such as
Weather forecasting, Market Trends, etc.
Below are some popular Regression
algorithms which come under supervised
learning:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
2. CLASSIFICATION
Classification algorithms are used when
the output variable is categorical, i.e., it
takes one of a finite set of classes such as
Yes-No, Male-Female, True-False, etc.
Spam filtering is a typical example. Below are
some popular classification algorithms which
come under supervised learning:
• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
CLASSIFICATION—A TWO-
STEP PROCESS
Model construction: describing a set of predetermined
classes
• Each tuple/sample is assumed to belong to a predefined
class, as determined by the class label attribute
• The set of tuples used for model construction is the
training set
• The model is represented as classification rules, decision
trees, or mathematical formulae
Model usage: for classifying future or unknown objects
• Estimate the accuracy of the model: the known label of
each test sample is compared with the classified result
from the model
• Accuracy rate is the percentage of test-set samples that
are correctly classified by the model
• The test set is independent of the training set (otherwise
overfitting results)
• If the accuracy is acceptable, use the model to classify new
data
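The two-step process maps directly onto library code. Below is a minimal sketch, assuming scikit-learn is available; the iris dataset and the decision tree classifier are illustrative choices, not part of the definition above.

```python
# A minimal sketch of the two-step process, assuming scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Keep the test set independent of the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Step 1 -- model construction: learn from the labelled training set.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2 -- model usage: compare known test labels with predictions.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy rate: {accuracy:.2%}")
```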
PROCESS (1): MODEL
CONSTRUCTION
PROCESS (2): USING THE
MODEL IN PREDICTION
ADVANTAGES OF SUPERVISED LEARNING
• With the help of supervised learning, the
model can predict the output on the
basis of prior experiences.
• In supervised learning, we can have an
exact idea about the classes of objects.
• Supervised learning models help us
solve various real-world problems such
as fraud detection, spam filtering,
etc.
LEARNING STEPS
DISADVANTAGES OF SUPERVISED LEARNING:
• Supervised learning models are not
suitable for handling very complex tasks.
• Supervised learning cannot predict the
correct output if the test data is
different from the training dataset.
• Training requires a lot of computation
time.
• In supervised learning, we need enough
knowledge about the classes of objects.
DECISION TREE
Decision Tree is a Supervised learning
technique that can be used for both
classification and Regression problems, but
mostly it is preferred for solving Classification
problems.
It is a tree-structured classifier, where internal
nodes represent the features of a dataset,
branches represent the decision rules and each
leaf node represents the outcome.
In a Decision tree, there are two types of nodes:
the Decision Node and the Leaf Node. Decision nodes
are used to make a decision and have multiple
branches, whereas leaf nodes are the outputs of those
decisions and do not contain any further branches.
The decisions or tests are performed on the basis of
features of the given dataset.
Definition: It is a graphical representation for getting all
the possible solutions to a problem/decision based on
given conditions.
In order to build a tree, we use the CART
algorithm, which stands for Classification And
Regression Tree algorithm.
A decision tree simply asks a question and, based on
the answer (Yes/No), further splits the tree into
sub-trees.
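As a hedged illustration, scikit-learn's DecisionTreeClassifier implements an optimized variant of CART; the dataset and depth below are chosen only for this sketch. Printing the learned rules makes the decision nodes, branches, and leaves visible:

```python
# Illustrative CART-style tree; criterion selects the split measure.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X, y)

# Internal nodes test features, branches are the answers, leaves are classes.
print(export_text(tree, feature_names=["sepal length", "sepal width",
                                       "petal length", "petal width"]))
```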
DECISION TREE TERMINOLOGIES
Root Node: Root node is from where the decision tree
starts. It represents the entire dataset, which further gets
divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes, and
the tree cannot be segregated further after reaching a
leaf node.
Splitting: Splitting is the process of dividing the decision
node/root node into sub-nodes according to the given
conditions.
Branch/Sub Tree: A sub-tree formed by splitting a node of the tree.
Pruning: Pruning is the process of removing the
unwanted branches from the tree.
Parent/Child node: A node that splits into sub-nodes is called
the parent node of those sub-nodes, and the sub-nodes are called
its child nodes; the root node is the topmost parent.
ALGORITHM FOR
DECISION TREE
Basic algorithm (a greedy algorithm):
• The tree is constructed in a top-down, recursive,
divide-and-conquer manner
• At the start, all the training examples are at the root
• Attributes are categorical (if continuous-valued, they are
discretized in advance)
• Examples are partitioned recursively based on selected
attributes
• Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
Conditions for stopping partitioning:
• All samples for a given node belong to the same class
• There are no remaining attributes for further partitioning
– majority voting is employed for classifying the leaf
• There are no samples left
KEY TERMS AND CONCEPTS
(A) Entropy: It measures the randomness or impurity of
a set of datapoints in a decision tree. For a node whose
classes occur with proportions p1, p2, …, pk:
Entropy = - Σ pi log2(pi)
For two classes, entropy lies between 0 and 1.
A lower value of entropy signifies a more
homogeneous dataset with less randomness,
hence better predictions.
A high entropy indicates high disorder.
Entropy can also be more than 1, but only when there
are more than two classes; it then signifies that the
datapoints are spread across many classes, making a
good prediction classification model harder to build.
Example: Find Entropy of the following distribution:
Gender Count
Male 9
Female 5
p(Male)= 9/14
p(Female)= 5/14
So based on the formula, the entropy is:
Entropy = - (5/14 log2(5/14) + 9/14 log2(9/14))
= - (-0.53 - 0.41)
= 0.94
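A small sketch of this calculation in Python; the entropy helper is written here for illustration, not taken from any library:

```python
# Entropy of a class distribution given raw class counts.
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 2))  # 0.94, matching the Male/Female example
```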
Example:
Fruit Colour Taste Count
Yellow Sweet 10
Red Sweet 5
Green Sour 15
Orange Sour 5
Example:
Fruit Colour Count
Yellow 10
Red 5
Green 15
Orange 5
INFORMATION GAIN
Definition: It is a measure of the purity gained by
splitting on an attribute.
IG = Entropy(before split) - Weighted Entropy(after the
split, over all subsets)
Question: Find Entropy and Information Gain
Credit Rating Accommodation Loan Approved
Above 600 Own Yes
Above 600 Own Yes
Above 600 Own Yes
Above 600 Own Yes
Above 600 Own Yes
Above 600 Own Yes
Above 600 Own Yes
Above 600 Rent Yes
Above 600 Rent Yes
Above 600 Rent Yes
Above 600 Rent Yes
Above 600 Other Yes
Below 600 Other Yes
Below 600 Other Yes
Below 600 Other Yes
Below 600 Other Yes
Above 600 Own No
Below 600 Rent No
Below 600 Rent No
Below 600 Rent No
Below 600 Rent No
Below 600 Rent No
Below 600 Rent No
Below 600 Other No
Below 600 Other No
Below 600 Other No
Below 600 Other No
Below 600 Other No
Below 600 Other No
Below 600 Other No
Answer:
First calculate the entropy of the root node,
which has 30 datapoints: 16 with loan approved
and 14 with loan not approved.
p(loan approved) = 16/30
p(loan not approved) = 14/30
Entropy(parent node) = - ((16/30) log2(16/30) + (14/30) log2(14/30))
= 0.997
Taking Credit Rating as the attribute of
split:
Entropy(Above 600) = - ((12/13) log2(12/13) + (1/13) log2(1/13)) = 0.391
Entropy(Below 600) = - ((4/17) log2(4/17) + (13/17) log2(13/17)) = 0.787
Weighted Entropy(Credit Rating) = 13/30 * Entropy(Above 600)
+ 17/30 * Entropy(Below 600)
= 0.170 + 0.446 = 0.616
Information Gain(Credit Rating), IG = Entropy(Parent node) -
Weighted Entropy(Credit Rating)
= 0.997 - 0.616 = 0.381
Taking Accommodation as the attribute
of split:
Entropy(Own) = - ((7/8) log2(7/8) + (1/8) log2(1/8)) = 0.544
Entropy(Rent) = - ((4/10) log2(4/10) + (6/10) log2(6/10)) = 0.971
Entropy(Other) = - ((5/12) log2(5/12) + (7/12) log2(7/12)) = 0.980
Weighted Entropy(Accommodation) = 8/30 * Entropy(Own) +
10/30 * Entropy(Rent) + 12/30 * Entropy(Other)
= 0.145 + 0.324 + 0.392 = 0.861
Information Gain(Accommodation), IG =
Entropy(Parent node) - Weighted
Entropy(Accommodation)
= 0.997 - 0.861 = 0.136
IG(Credit Rating) is almost three times
IG(Accommodation). Hence, Credit Rating is the
better choice for splitting this decision node.
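The whole worked example can be checked with a few lines of Python; the helper functions below are written for this sketch, and the group counts mirror the loan table above:

```python
# Illustrative check of the worked example above.
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def weighted_entropy(groups):
    total = sum(sum(g) for g in groups)
    return sum(sum(g) / total * entropy(g) for g in groups)

parent = entropy([16, 14])                           # ~0.997
credit = weighted_entropy([[12, 1], [4, 13]])        # Above 600, Below 600
accom = weighted_entropy([[7, 1], [4, 6], [5, 7]])   # Own, Rent, Other

print(round(parent - credit, 2))   # IG(Credit Rating)  -> 0.38
print(round(parent - accom, 2))    # IG(Accommodation)  -> 0.14
```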
K-NEAREST NEIGHBOR(KNN) ALGORITHM
• K-Nearest Neighbour is one of the simplest
Machine Learning algorithms based on
Supervised Learning technique.
• The K-NN algorithm assumes similarity
between the new case/data and the available
cases, and puts the new case into the category
most similar to the available categories.
• The K-NN algorithm stores all the available data
and classifies a new data point based on
similarity. This means that when new data
appears, it can easily be classified into a
well-suited category using the K-NN algorithm.
• K-NN algorithm can be used for Regression as well
as for Classification but mostly it is used for the
Classification problems.
• K-NN is a non-parametric algorithm, which
means it does not make any assumption on
underlying data.
• It is also called a lazy learner algorithm because
it does not learn from the training set immediately
instead it stores the dataset and at the time of
classification, it performs an action on the dataset.
• At the training phase, the KNN algorithm just
stores the dataset; when it gets new data, it
classifies that data into the category most
similar to the new data.
EXAMPLE:
Suppose we have an image of a creature
that looks similar to both a cat and a dog,
and we want to know whether it is a cat or
a dog. For this identification we can use the
KNN algorithm, as it works on a similarity
measure. Our KNN model will find the
features of the new image that are most
similar to those of the cat and dog images
and, based on the most similar features,
will put it in either the cat or the dog category.
WHY DO WE NEED A K-NN ALGORITHM?
Suppose there are two categories,
Category A and Category B, and we have a
new data point x1. In which of these
categories will this data point lie? To solve
this type of problem, we need a K-NN algorithm.
With the help of K-NN, we can easily identify
the category or class of a particular
data point. Consider the below diagram:
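A minimal sketch of this scenario, assuming scikit-learn; the coordinates for Categories A and B are invented for illustration:

```python
# k-NN assigns the new point x1 to the class of its k nearest neighbours.
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 2], [2, 1], [2, 3],    # Category A
     [6, 5], [7, 7], [8, 6]]    # Category B
y = ["A", "A", "A", "B", "B", "B"]

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3
knn.fit(X, y)                               # "lazy": fit just stores the data
print(knn.predict([[3, 3]]))                # -> ['A']; its 3 nearest are all A
```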
KNN ALGORITHM IN
DETAIL
Refer to the below link for KNN
explanation and problem sums:
https://www.slideshare.net/Simplilearn/knearest-neighbor-classification-algorithm-how-knn-algorithm-works-knn-algorithm-simplilearn
ADVANTAGES OF KNN
ALGORITHM:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training
data is large.
DISADVANTAGES OF KNN ALGORITHM:
• We always need to determine the value of
K, which may be complex at times.
• The computation cost is high because of
calculating the distance between the new
data point and all the training samples.
SUPPORT VECTOR
MACHINE (SVM)
Support Vector Machine or SVM is one of the most
popular Supervised Learning algorithms, which is used
for Classification as well as Regression problems.
The goal of the SVM algorithm is to create the best
line or decision boundary that can segregate
n-dimensional space into classes so that we can easily
put new data points in the correct category in the
future. This best decision boundary is called a
hyperplane.
SVM chooses the extreme points/vectors that help in
creating the hyperplane. These extreme cases are
called support vectors, and hence the algorithm is
termed a Support Vector Machine.
SVM algorithm can be used for Face detection,
image classification, text categorization, etc.
EXAMPLE FOR SVM
Suppose we see a strange cat that also has some
features of dogs. If we want a model that can
accurately identify whether it is a cat or a dog, such a
model can be created using the SVM algorithm.
We will first train our model with lots of images of cats
and dogs so that it can learn their different features,
and then we test it with this strange creature.
The support vector machine creates a decision boundary
between the two classes (cat and dog) and chooses the
extreme cases (support vectors), so it will see the extreme
cases of cats and dogs. On the basis of the support vectors,
it will classify the new creature as a cat. Consider the below diagram:
HYPERPLANE IN THE SVM ALGORITHM:
Hyperplane: There can be multiple
lines/decision boundaries to segregate the
classes in n-dimensional space, but we need to
find out the best decision boundary that helps to
classify the data points. This best boundary is
known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the
number of features present in the dataset: if there
are 2 features (as shown in the image), the
hyperplane will be a straight line, and if there are
3 features, the hyperplane will be a 2-dimensional
plane.
We always create the hyperplane that has the
maximum margin, which means the maximum
distance between the hyperplane and the nearest
data points of either class.
SUPPORT VECTORS IN THE
SVM ALGORITHM
The data points or vectors that are closest
to the hyperplane and which affect the
position of the hyperplane are termed
support vectors. Since these vectors
support the hyperplane, they are called
support vectors.
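As an illustrative sketch with scikit-learn (toy points invented for this example), SVC with a linear kernel finds the maximum-margin hyperplane and exposes the fitted support vectors directly:

```python
# Linear SVM: the support vectors are the points that fix the hyperplane.
from sklearn.svm import SVC

X = [[1, 2], [2, 1], [2, 3],    # class 0
     [6, 5], [7, 7], [8, 6]]    # class 1
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)     # the extreme points closest to the boundary
print(clf.predict([[5, 4]]))    # classify a new point
```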
NON-LINEAR SVM
Nonlinear SVM (Support Vector Machine) is necessary
when the data cannot be effectively separated by a
linear decision boundary in the original feature
space.
Nonlinear SVM addresses this limitation by utilizing
kernel functions to map the data from the original
low-dimensional feature space into a higher-dimensional
space where linear separation becomes possible.
The kernel function computes the similarity between
data points, allowing SVM to capture complex
patterns and nonlinear relationships between
features.
By leveraging the kernel trick, nonlinear SVM
provides a powerful tool for solving classification
problems where linear separation is insufficient,
extending its applicability to a wide range of real-
world scenarios.
If data is linearly arranged, then we can separate it by
using a straight line, but for non-linear data, we cannot
draw a single straight line. Consider the below image:
So to separate these data points, we
need to add one more dimension. For
linear data we have used the two
dimensions x and y, so for non-linear
data we will add a third dimension z. A
standard choice for this example is:
z = x² + y²
By adding the third dimension, the
sample space will become as below
image:
So now, SVM will divide the datasets
into classes in the following way.
Consider the below image:
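In code, a sketch of both views of this idea, assuming scikit-learn and NumPy; the blob-and-ring data are generated only for illustration:

```python
# Class 0 is a blob at the origin, class 1 a ring around it: not linearly
# separable in (x, y).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
angles = np.linspace(0, 2 * np.pi, 20, endpoint=False)
inner = rng.normal(0, 0.4, (20, 2))                    # class 0: blob at origin
outer = np.c_[3 * np.cos(angles), 3 * np.sin(angles)]  # class 1: ring, radius 3
X = np.vstack([inner, outer])
y = np.array([0] * 20 + [1] * 20)

# View 1: add the explicit third dimension z = x^2 + y^2, then a linear SVM.
X3 = np.c_[X, (X ** 2).sum(axis=1)]
print(SVC(kernel="linear").fit(X3, y).score(X3, y))    # separable in 3-D

# View 2: the RBF kernel does the mapping implicitly (the kernel trick).
print(SVC(kernel="rbf").fit(X, y).score(X, y))         # separable, no z needed
```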
RANDOM FOREST
ALGORITHM
Random Forest is a popular machine learning
algorithm that belongs to the supervised learning
technique. It can be used for both Classification and
Regression problems in ML. It is based on the
concept of ensemble learning, which is a process
of combining multiple classifiers to solve a complex
problem and to improve the performance of the
model.
As the name suggests, "Random Forest is a
classifier that contains a number of decision
trees on various subsets of the given dataset
and takes the average to improve the
predictive accuracy of that dataset." Instead of
relying on one decision tree, the random forest
takes the prediction from each tree and, based on
the majority vote of those predictions, predicts the
final output.
A greater number of trees in the
forest leads to higher accuracy and
helps prevent the problem of overfitting.
ASSUMPTIONS FOR RANDOM FOREST
Since the random forest combines multiple
trees to predict the class of the dataset, it
is possible that some decision trees may
predict the correct output, while others
may not. But together, all the trees predict
the correct output. Therefore, below are
two assumptions for a better Random
forest classifier:
1. There should be some actual values in
the feature variables of the dataset, so that
the classifier can predict accurate results
rather than guessed results.
2. The predictions from each tree must have
very low correlation with one another.
WHY USE RANDOM FOREST?
• It takes less training time as compared
to other algorithms.
• It predicts output with high accuracy,
even for the large dataset it runs
efficiently.
• It can also maintain accuracy when a
large proportion of data is missing.
HOW DOES RANDOM FOREST
ALGORITHM WORK?
Random Forest works in two phases: the first
is to create the random forest by
combining N decision trees, and the second
is to make predictions using each tree
created in the first phase.
The working process can be explained
in the below steps and diagram:
Step-1: Select random K data points
from the training set.
Step-2: Build the decision trees
associated with the selected data points
(Subsets).
Step-3: Choose the number N for decision trees
that you want to build.
Step-4: Repeat Step 1 & 2.
Step-5: For new data points, find the predictions
of each decision tree, and assign the new data
points to the category that wins the majority
votes.
The working of the algorithm can be better
understood by the below example:
Example: Suppose there is a dataset that
contains multiple fruit images. So, this dataset is
given to the Random forest classifier. The dataset
is divided into subsets and given to each decision
tree. During the training phase, each decision tree
produces a prediction result, and when a new
data point occurs, then based on the majority of
results, the Random Forest classifier predicts the
final decision. Consider the below image:
HOW DOES RANDOM FOREST
ALGORITHM WORK? (CONTI.)
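A hedged sketch with scikit-learn (dataset chosen only for illustration): n_estimators plays the role of N above, each tree is fitted on a bootstrap subset of the data, and prediction is a majority vote across trees.

```python
# Random forest: an ensemble of decision trees voting on the output.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:3]))    # majority vote across the 100 trees
```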
LINEAR REGRESSION
Linear regression is one of the easiest and most
popular Machine Learning algorithms. It is a
statistical method that is used for predictive
analysis. Linear regression makes predictions
for continuous/real or numeric variables such
as sales, salary, age, product price, etc.
The linear regression algorithm shows a linear
relationship between a dependent variable (y) and
one or more independent variables (x), hence the
name linear regression. In other words, it finds
how the value of the dependent variable changes
according to the value of the independent
variable.
The linear regression model provides a
sloped straight line representing the
relationship between the variables.
Consider the below image:
Mathematically, we can represent a
linear regression as:
y = a0 + a1x + ε
where:
Y = Dependent Variable (Target Variable)
X = Independent Variable (Predictor Variable)
a0 = intercept of the line (gives an additional
degree of freedom)
a1 = linear regression coefficient (scale
factor applied to each input value)
ε = random error
The values of the x and y variables are the
training dataset used for the linear regression
model representation.
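A minimal sketch, assuming scikit-learn; the experience/salary numbers are invented for illustration:

```python
# Fit y = a0 + a1*x on toy data and predict for a new input.
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [2], [3], [4], [5]])   # years of experience
y = np.array([30, 35, 41, 44, 50])        # salary (in thousands)

reg = LinearRegression().fit(x, y)
print(reg.intercept_, reg.coef_[0])       # estimates of a0 and a1
print(reg.predict([[6]]))                 # prediction for a new input
```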
Multiple Linear Regression:
This involves more than one
independent variable and one
dependent variable. The equation for
multiple linear regression is:
Y = β0 + β1X1 + β2X2 + … + βnXn
where:
• Y is the dependent variable
• X1, X2, …, Xn are the independent
variables
• β0 is the intercept
• β1, β2, …, βn are the slopes
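The same scikit-learn API handles the multiple-variable case: each column of X gets its own slope. A short sketch with invented data:

```python
# Multiple linear regression: two predictors, one target.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1200], [3, 1500], [5, 1100], [7, 2000]])  # two predictors
y = np.array([32, 45, 50, 68])

reg = LinearRegression().fit(X, y)
print(reg.intercept_, reg.coef_)   # beta0 and [beta1, beta2]
```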
TYPES OF LINEAR REGRESSION
• Simple Linear Regression:
If a single independent variable is used
to predict the value of a numerical
dependent variable, then such a Linear
Regression algorithm is called Simple
Linear Regression.
• Multiple Linear regression:
If more than one independent variable is
used to predict the value of a numerical
dependent variable, then such a Linear
Regression algorithm is called Multiple
Linear Regression.
LINEAR REGRESSION LINE
• Positive Linear Relationship:
If the dependent variable increases on
the Y-axis as the independent variable
increases on the X-axis, then such a
relationship is termed a positive
linear relationship.
LINEAR REGRESSION
LINE(CONT.)
• Negative Linear Relationship:
If the dependent variable decreases on
the Y-axis as the independent variable
increases on the X-axis, then such a
relationship is called a negative linear
relationship.
LOGISTIC REGRESSION
• Logistic regression is one of the most popular
Machine Learning algorithms, which comes under
the Supervised Learning technique. It is used for
predicting the categorical dependent variable
using a given set of independent variables.
• Logistic regression predicts the output of a
categorical dependent variable. Therefore, the
outcome must be a categorical or discrete value.
It can be Yes or No, 0 or 1, True or False,
etc.; but instead of giving the exact values 0
and 1, it gives probabilistic values which
lie between 0 and 1.
• Linear Regression is used for solving Regression
problems, whereas Logistic regression is used
for solving the classification problems.
• In Logistic regression, instead of fitting a
regression line, we fit an "S" shaped
logistic function, which predicts two
maximum values (0 or 1).
• The curve from the logistic function
indicates the likelihood of something such
as whether the cells are cancerous or not,
a mouse is obese or not based on its
weight, etc.
• Logistic Regression is a significant
machine learning algorithm because it
has the ability to provide probabilities and
classify new data using continuous and
discrete datasets.
• Logistic Regression can be used to
classify the observations using different
types of data and can easily determine
the most effective variables used for the
classification. The below image shows
the logistic function:
LOGISTIC FUNCTION
(SIGMOID FUNCTION):
• The sigmoid function is a mathematical function
used to map the predicted values to probabilities.
• It maps any real value into another value within a
range of 0 and 1.
• The output of logistic regression must be
between 0 and 1 and cannot go beyond this
limit, so it forms a curve like the "S" form. The
S-form curve is called the sigmoid function or the
logistic function.
• In logistic regression, we use the concept of a
threshold value, which defines the cut-off between
0 and 1: values above the threshold tend to 1,
and values below the threshold tend to 0.
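A tiny sketch of the sigmoid and the threshold rule described above, in plain Python; the 0.5 threshold is a common default, not the only choice:

```python
# Sigmoid maps any real value into (0, 1); a threshold turns it into a label.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

prob = sigmoid(0.8)
label = 1 if prob >= 0.5 else 0     # values above the threshold tend to 1
print(round(prob, 2), label)        # 0.69 1
```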
LOGISTIC REGRESSION EQUATION:
The logistic regression equation can be obtained
from the linear regression equation by taking the
log of the odds:
log( y / (1 − y) ) = b0 + b1x1 + b2x2 + … + bnxn
TYPE OF LOGISTIC REGRESSION:
On the basis of the categories, Logistic
Regression can be classified into three
types:
• Binomial: In binomial Logistic
regression, there can be only two
possible types of the dependent
variables, such as 0 or 1, Pass or Fail,
etc.
• Multinomial: In multinomial Logistic
regression, there can be 3 or more
possible unordered types of the
dependent variable, such as "cat",
"dogs", or "sheep".
• Ordinal: In ordinal Logistic regression,
there can be 3 or more possible ordered
types of the dependent variable, such as
"low", "medium", or "high".
THANK YOU