Decision Trees
➢ A decision tree is a non-parametric supervised learning algorithm.
➢ It has a hierarchical, tree structure, which consists of a root node, branches, internal
nodes and leaf nodes.
➢ A decision tree is a flowchart-like structure in which each internal node represents a
test on a feature (e.g. whether a coin flip comes up heads or tails), each leaf node
represents a class label (decision taken after computing all features) and branches
represent conjunctions of features that lead to those class labels.
➢ Root Node: The initial node at the beginning of a decision tree, where the
entire population or dataset starts dividing based on various features or
conditions.
➢ Decision Nodes: Nodes obtained by splitting the root node (or another
internal node) are known as decision nodes. These nodes represent
intermediate decisions or conditions within the tree.
➢ Leaf Nodes: Nodes where further splitting is not possible, often
indicating the final classification or outcome. Leaf nodes are also referred
to as terminal nodes.
➢ Pruning: The process of removing or cutting down specific nodes in a
decision tree to prevent overfitting and simplify the model.
➢ Branch / Sub-Tree: A subsection of the entire decision tree is referred to
as a branch or sub-tree. It represents a specific path of decisions and
outcomes within the tree.
➢ Parent and Child Node: In a decision tree, a node that is divided into
sub-nodes is known as a parent node, and the sub-nodes emerging from
it are referred to as child nodes. The parent node represents a decision or
condition, while the child nodes represent the potential outcomes or
further decisions based on that condition.
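To make these terms concrete, here is a minimal scikit-learn sketch (not part of the original notes; the Iris dataset and the depth limit are only illustrative assumptions) that fits a small tree and prints its structure, so the root node, decision nodes and leaf nodes can be seen directly:

```python
# Minimal sketch: fit a shallow decision tree and print its structure so that
# the root node, decision (internal) nodes and leaf nodes are visible.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
# max_depth=2 keeps the tree tiny: one root node, a few decision nodes, a few leaves.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# Indented lines are child nodes; lines ending in "class: ..." are leaf nodes.
print(export_text(clf, feature_names=data.feature_names))
```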
Types of Decision Trees
Hunt’s algorithm, which was developed in the 1960s to model human learning in Psychology, forms the
foundation of many popular decision tree algorithms.
The most popular of these are ID3, C4.5 and CART.
Example of Decision Tree
➢ Decision trees are drawn upside down, with the root at the top.
➢ Decision trees are nothing but a bunch of if-else statements.
➢ In the diagram below, the tree first asks: what is the weather?
Did you notice anything in the above flowchart? We see that if the weather is cloudy, the decision is to play. Why
didn't it split further? Why did it stop there?
To answer this question, we need to know about a few more concepts such as entropy, information gain, and the Gini index.
But in simple terms, we can say here that the output for the training dataset is always "yes" for cloudy weather; since there
is no disorder here, we don't need to split the node further.
The goal of machine learning is to decrease uncertainty or disorder in the dataset, and for this we use decision
trees; a small if-else sketch of the tree above follows.
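As a rough illustration of the "bunch of if-else statements" idea, here is a sketch of the weather tree in Python. Only the cloudy branch is taken from the flowchart described above; the sunny and rainy sub-splits are placeholders borrowed from the worked example later in these notes, not a definitive rule set.

```python
# A decision tree is essentially nested if/else checks on feature values.
def will_play(weather: str, humidity: str, wind: str) -> str:
    if weather == "cloudy":
        # Pure node: in the training data every cloudy day was "yes",
        # so there is no disorder left and the tree stops splitting here.
        return "yes"
    elif weather == "sunny":
        # Mixed node: the tree asks a further question (placeholder split).
        return "yes" if humidity == "normal" else "no"
    else:  # rainy
        return "yes" if wind == "weak" else "no"

print(will_play("cloudy", "high", "strong"))  # -> "yes"
```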
Entropy
• Entropy is nothing but the uncertainty in our dataset or measure of disorder.
• Entropy is a measure of the randomness in the information being processed.
• Suppose a group of friends is deciding which of two movies to watch together on Sunday, and the votes are
split equally between the two. This is exactly what we call disorder: with an equal number of votes for each
movie, we can't really decide which one to watch. It would have been much easier if most of the votes had
gone to one movie.
In a decision tree, the output is mostly "yes" or "no".
The formula for Entropy is shown below. Mathematically, Entropy for one attribute is represented as:
E(S) = -p+ log2(p+) - p- log2(p-)
(all logarithms are to base 2; they are written simply as "log" in the calculations below)
Here,
• p+ is the probability of the positive class
• p- is the probability of the negative class
• S is the subset of the training examples
Entropy with respect to an attribute (i.e. after a split) is the weighted sum of the entropies of the branches:
E(T, X) = Σ over c in X of P(c) E(c)
where T → current state and X → selected attribute, and the sum runs over the values c of X.
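As a quick sanity check of the formula, here is a small Python helper (an illustrative sketch, not taken from the original notes) that computes the entropy of a node from its class counts:

```python
import math

def entropy(counts):
    """Entropy in bits of a node, given its class counts, e.g. [8, 4]."""
    n = sum(counts)
    # A zero count contributes 0 to the sum, by the usual convention.
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

print(entropy([5, 5]))   # equal votes -> maximum disorder: 1.0
print(entropy([10, 0]))  # pure node   -> no disorder:      0.0
```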
How do Decision Trees use Entropy?
Entropy basically measures the impurity of a node. Impurity is the degree of randomness; it tells how random
our data is. A pure sub-split means that you should be getting either all "yes" or all "no".
Suppose a node initially has 8 "yes" and 4 "no" instances.
In order to make a decision tree, we need to calculate the impurity of each candidate split, and when the purity is 100%,
we make the node a leaf node.
To check the impurity of the splits on feature 2 and feature 3, we take the help of the entropy formula.
[Figure: entropy calculations for the feature 2 split and the feature 3 split.]
➢ We can clearly see from the tree itself that the left node has lower entropy (more purity) than the right node, since the left
node has a greater proportion of "yes" instances, so it is easier to make a decision there.
➢ Always remember that the higher the Entropy, the lower will be the purity and the higher will be the impurity.
The goal of machine learning is to decrease the uncertainty or impurity in the dataset. Entropy gives us the impurity of a
particular node, but on its own it does not tell us whether the entropy of the parent node has actually decreased after a split.
For this, we bring in a new metric called "information gain", which tells us how much the parent entropy has decreased
after splitting on some feature.
Information Gain
Information gain measures the reduction of uncertainty (entropy) achieved by splitting on some feature, and it is also the
deciding factor for which attribute should be selected as a decision node or the root node.
Information gain can be defined as the amount of information gained about a random variable or signal from observing
another random variable. It can be considered as the difference between the entropy of parent node and weighted
average entropy of child nodes.
It is simply the entropy of the full dataset minus the (weighted) entropy of the dataset given some feature.
It computes the difference between entropy before split and average entropy after split of the dataset based on given
attribute values. ID3 (Iterative Dichotomiser) decision tree algorithm uses information gain.
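The same idea in code: a hedged sketch of information gain as the parent entropy minus the weighted average entropy of the child nodes. The split counts used in the quick check at the bottom are made up purely for illustration.

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    """Entropy of the parent minus the weighted average entropy of its children."""
    n = sum(parent_counts)
    weighted = sum((sum(child) / n) * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted

# Illustrative only: split a [8 yes, 4 no] node into two child nodes.
print(information_gain([8, 4], [[6, 1], [2, 3]]))
```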
To understand this better, let's consider an example. Suppose our entire population has a total of 30 instances, and the
dataset is used to predict whether a person will go to the gym or not. Let's say 16 people go to the gym and 14 people
don't.
Now we have two features to predict whether he/she will go to the gym or not.
Feature 1 is “Energy” which takes two values “high” and “low”
Feature 2 is “Motivation” which takes 3 values “No motivation”, “Neutral” and “Highly motivated”.
Let’s see how our decision tree will be made using these 2 features. We’ll use information gain to decide which
feature should be the root node and which feature should be placed after the split.
Let's calculate the entropy of the parent node:
E(Parent) = -(16/30) log(16/30) - (14/30) log(14/30) ≈ 0.99
To get the weighted average entropy of the child nodes of the "Energy" split, we weight the entropy of each branch
("high" and "low") by the fraction of instances it receives (the per-branch counts were shown in the accompanying figure):
E(Parent | Energy) = (weight of "high") × E(high) + (weight of "low") × E(low)
Now that we have E(Parent) and E(Parent | Energy), the information gain is:
Information Gain = E(Parent) - E(Parent | Energy) ≈ 0.37
Our parent entropy was near 0.99, and looking at this value of information gain we can say that the
entropy of the dataset will decrease by about 0.37 if we make "Energy" our root node.
Similarly, we will do this with the other feature “Motivation” and calculate its information gain
Let's calculate the entropy here. The parent entropy is the same as before, E(Parent) ≈ 0.99.
To get the weighted average entropy of each child node of the "Motivation" split, we proceed in the same way over the
three branches "No motivation", "Neutral" and "Highly motivated" (the per-branch counts were shown in the accompanying figure):
E(Parent | Motivation) = Σ (weight of branch) × E(branch)
Now that we have E(Parent) and E(Parent | Motivation), the information gain is:
Information Gain = E(Parent) - E(Parent | Motivation)
We now see that the "Energy" feature gives a larger reduction (0.37) than the "Motivation" feature. Hence
we select the feature with the highest information gain and then split the node based on that feature.
In this example "Energy" will be our root node, and we repeat the procedure for the sub-nodes. Here we can see that when
the energy is "high" the entropy is low, so we can say a person will almost certainly go to the gym if they have
high energy; but what if the energy is low? In that case we split the node again, based on the remaining feature,
"Motivation".
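A short numerical check of the figures quoted in this example. Only the parent counts (16 vs 14) are given in the text; the per-branch counts for "Energy" were in a figure that is not reproduced here, so the sketch below only verifies the parent entropy and the weighted child entropy implied by a gain of 0.37.

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

e_parent = entropy([16, 14])      # 16 people go to the gym, 14 do not
print(round(e_parent, 3))         # ~0.997, i.e. "near 0.99" as stated above

# If Gain(Energy) is about 0.37, the weighted child entropy must be roughly:
print(round(e_parent - 0.37, 2))  # ~0.63 = E(Parent | Energy)
```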
Let us now work through the classic "Play Tennis" example: a dataset of 14 instances with the attributes Outlook,
Temperature, Humidity and Wind, where the class variable Play is "Yes" for 9 instances and "No" for 5.

Find the entropy of the class variable:
E(S) = -[(9/14)log(9/14) + (5/14)log(5/14)] = 0.94

From the data we can easily arrive at the following table for Outlook:

              Play
Outlook    Yes   No   Total
Sunny        2    3     5
Overcast     4    0     4
Rain         3    2     5
Total        9    5    14

Now we have to calculate the average weighted entropy, i.e. the sum over the values of the feature of the weight of
each value multiplied by its entropy:

E(2,3) = Entropy(S_Sunny)    = -2/5 log(2/5) - 3/5 log(3/5) = 0.971
E(4,0) = Entropy(S_Overcast) = -4/4 log(4/4) - 0/4 log(0/4) = 0
E(3,2) = Entropy(S_Rain)     = -3/5 log(3/5) - 2/5 log(2/5) = 0.971

The next step is to find the information gain. It is the difference between the parent entropy and the average
weighted entropy we found above:

Gain(S, Outlook) = Entropy(S) - 5/14 Entropy(S_Sunny) - 4/14 Entropy(S_Overcast) - 5/14 Entropy(S_Rain)
                 = 0.94 - 5/14 × 0.971 - 4/14 × 0 - 5/14 × 0.971 = 0.2464
Similarly, find the information gain for Temperature [9+, 5-]: Hot, Mild and Cool.

E(S) = -[(9/14)log(9/14) + (5/14)log(5/14)] = 0.94

              Play
Temp       Yes   No   Total
Hot          2    2     4
Mild         4    2     6
Cool         3    1     4
Total        9    5    14

E(2,2) = Entropy(S_Hot)  = -2/4 log(2/4) - 2/4 log(2/4) = 1.0
E(4,2) = Entropy(S_Mild) = -4/6 log(4/6) - 2/6 log(2/6) = 0.9183
E(3,1) = Entropy(S_Cool) = -3/4 log(3/4) - 1/4 log(1/4) = 0.8113

Gain(S, Temperature) = Entropy(S) - 4/14 Entropy(S_Hot) - 6/14 Entropy(S_Mild) - 4/14 Entropy(S_Cool)
                     = 0.94 - 4/14 × 1.0 - 6/14 × 0.9183 - 4/14 × 0.8113 = 0.0289
Similarly, find the information gain for Humidity [9+, 5-]: High and Normal.

E(S) = -[(9/14)log(9/14) + (5/14)log(5/14)] = 0.94

              Play
Humidity   Yes   No   Total
High         3    4     7
Normal       6    1     7
Total        9    5    14

E(3,4) = Entropy(S_High)   = -3/7 log(3/7) - 4/7 log(4/7) = 0.9852
E(6,1) = Entropy(S_Normal) = -6/7 log(6/7) - 1/7 log(1/7) = 0.5916

Gain(S, Humidity) = Entropy(S) - 7/14 Entropy(S_High) - 7/14 Entropy(S_Normal)
                  = 0.94 - 7/14 × 0.9852 - 7/14 × 0.5916 = 0.1516
Similarly, find the information gain for Wind [9+, 5-]: Weak and Strong.

E(S) = -[(9/14)log(9/14) + (5/14)log(5/14)] = 0.94

              Play
Wind       Yes   No   Total
Weak         6    2     8
Strong       3    3     6
Total        9    5    14

E(6,2) = Entropy(S_Weak)   = -6/8 log(6/8) - 2/8 log(2/8) = 0.8113
E(3,3) = Entropy(S_Strong) = -3/6 log(3/6) - 3/6 log(3/6) = 1.0

Gain(S, Wind) = Entropy(S) - 8/14 Entropy(S_Weak) - 6/14 Entropy(S_Strong)
              = 0.94 - 8/14 × 0.8113 - 6/14 × 1.0 = 0.0478
Gain(S, Outlook) = 0.2464
Gain(S, Temperature) = 0.0289
Gain(S, Humidity) = 0.1516
Gain(S, Wind) = 0.0478
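The four gains above can be reproduced from the contingency tables with a few lines of Python (a sketch using the counts given in this section; small differences in the last decimal place are only rounding):

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def information_gain(parent, children):
    n = sum(parent)
    return entropy(parent) - sum((sum(c) / n) * entropy(c) for c in children)

parent = [9, 5]                                   # 9 "Yes", 5 "No"
splits = {
    "Outlook":     [[2, 3], [4, 0], [3, 2]],      # Sunny, Overcast, Rain
    "Temperature": [[2, 2], [4, 2], [3, 1]],      # Hot, Mild, Cool
    "Humidity":    [[3, 4], [6, 1]],              # High, Normal
    "Wind":        [[6, 2], [3, 3]],              # Weak, Strong
}
for name, children in splits.items():
    print(name, round(information_gain(parent, children), 4))
# Outlook has the largest gain, so it is chosen as the root node.
```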
Outlook has the highest information gain, so it is selected as the root node. The Overcast branch is already pure
(4 "Yes", 0 "No") and becomes a leaf node. The next step is to find the next node in our decision tree: we will now
find the node under Sunny, and we have to determine which of Temperature, Humidity or Wind has the highest
information gain there.

Calculate the parent entropy of the Sunny branch, E(S_Sunny) [2+, 3-]:
E(S_Sunny) = -(2/5)log(2/5) - (3/5)log(3/5) = 0.971
Find the information gain for Temperature under Sunny [2+, 3-]: Hot, Mild and Cool.

E(S_Sunny) = -(2/5)log(2/5) - (3/5)log(3/5) = 0.971

              Play
Temp       Yes   No   Total
Hot          0    2     2
Mild         1    1     2
Cool         1    0     1
Total        2    3     5

E(0,2) = Entropy(S_Hot)  = -0/2 log(0/2) - 2/2 log(2/2) = 0.0
E(1,1) = Entropy(S_Mild) = -1/2 log(1/2) - 1/2 log(1/2) = 1.0
E(1,0) = Entropy(S_Cool) = -1/1 log(1/1) - 0/1 log(0/1) = 0.0

Gain(S_Sunny, Temperature) = E(S_Sunny) - 2/5 Entropy(S_Hot) - 2/5 Entropy(S_Mild) - 1/5 Entropy(S_Cool)
                           = 0.971 - 2/5 × 0.0 - 2/5 × 1.0 - 1/5 × 0.0 = 0.571
Find the information gain for Humidity under Sunny [2+, 3-]: High and Normal.

E(S_Sunny) = -(2/5)log(2/5) - (3/5)log(3/5) = 0.971

              Play
Humidity   Yes   No   Total
High         0    3     3
Normal       2    0     2
Total        2    3     5

E(0,3) = Entropy(S_High)   = -0/3 log(0/3) - 3/3 log(3/3) = 0.0
E(2,0) = Entropy(S_Normal) = -2/2 log(2/2) - 0/2 log(0/2) = 0.0

Gain(S_Sunny, Humidity) = E(S_Sunny) - 3/5 Entropy(S_High) - 2/5 Entropy(S_Normal)
                        = 0.971 - 3/5 × 0.0 - 2/5 × 0.0 = 0.971
Find the information gain for Wind under Sunny [2+, 3-]: Weak and Strong.

E(S_Sunny) = -(2/5)log(2/5) - (3/5)log(3/5) = 0.971

              Play
Wind       Yes   No   Total
Weak         1    1     2
Strong       1    2     3
Total        2    3     5

E(1,1) = Entropy(S_Weak)   = -1/2 log(1/2) - 1/2 log(1/2) = 1.0
E(1,2) = Entropy(S_Strong) = -1/3 log(1/3) - 2/3 log(2/3) = 0.9183

Gain(S_Sunny, Wind) = E(S_Sunny) - 2/5 Entropy(S_Weak) - 3/5 Entropy(S_Strong)
                    = 0.971 - 2/5 × 1.0 - 3/5 × 0.9183 = 0.020

Gain(S_Sunny, Temperature) = 0.571
Gain(S_Sunny, Humidity) = 0.971
Gain(S_Sunny, Wind) = 0.020

Humidity has the highest information gain under Sunny, so Humidity becomes the decision node on the Sunny branch;
both of its children (High → "No", Normal → "Yes") are pure and become leaf nodes.
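The same helper can be applied recursively to the Sunny subset to check the three gains just computed (counts taken from the tables above; again only a sketch, and tiny rounding differences are expected):

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def information_gain(parent, children):
    n = sum(parent)
    return entropy(parent) - sum((sum(c) / n) * entropy(c) for c in children)

sunny_parent = [2, 3]                                            # 2 "Yes", 3 "No" under Sunny
print(information_gain(sunny_parent, [[0, 2], [1, 1], [1, 0]]))  # Temperature ~0.571
print(information_gain(sunny_parent, [[0, 3], [2, 0]]))          # Humidity    ~0.971
print(information_gain(sunny_parent, [[1, 1], [1, 2]]))          # Wind        ~0.020
```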
The next step is to find the next node in our decision tree. Now we will find the node under Rain. We have to
determine which of Temperature, Humidity or Wind has the highest information gain there.
Find the information gain for Temperature under Rain [3+, 2-]: Hot, Mild and Cool.

E(S_Rain) = -(3/5)log(3/5) - (2/5)log(2/5) = 0.971

              Play
Temp       Yes   No   Total
Hot          0    0     0
Mild         2    1     3
Cool         1    1     2
Total        3    2     5

E(0,0) = Entropy(S_Hot)  = 0.0 (no Rain instances are Hot, so this branch contributes nothing)
E(2,1) = Entropy(S_Mild) = -2/3 log(2/3) - 1/3 log(1/3) = 0.9183
E(1,1) = Entropy(S_Cool) = -1/2 log(1/2) - 1/2 log(1/2) = 1.0

Gain(S_Rain, Temperature) = E(S_Rain) - 0/5 Entropy(S_Hot) - 3/5 Entropy(S_Mild) - 2/5 Entropy(S_Cool)
                          = 0.971 - 0/5 × 0.0 - 3/5 × 0.9183 - 2/5 × 1.0 = 0.020
Find the information gain for Humidity under Rain [3+, 2-]: High and Normal.

E(S_Rain) = -(3/5)log(3/5) - (2/5)log(2/5) = 0.971

              Play
Humidity   Yes   No   Total
High         1    1     2
Normal       2    1     3
Total        3    2     5

E(1,1) = Entropy(S_High)   = -1/2 log(1/2) - 1/2 log(1/2) = 1.0
E(2,1) = Entropy(S_Normal) = -2/3 log(2/3) - 1/3 log(1/3) = 0.9183

Gain(S_Rain, Humidity) = E(S_Rain) - 2/5 Entropy(S_High) - 3/5 Entropy(S_Normal)
                       = 0.971 - 2/5 × 1.0 - 3/5 × 0.9183 = 0.020
Find the information gain for Wind under Rain [3+, 2-]: Weak and Strong.

E(S_Rain) = -(3/5)log(3/5) - (2/5)log(2/5) = 0.971

              Play
Wind       Yes   No   Total
Weak         3    0     3
Strong       0    2     2
Total        3    2     5

E(3,0) = Entropy(S_Weak)   = -3/3 log(3/3) - 0/3 log(0/3) = 0.0
E(0,2) = Entropy(S_Strong) = -0/2 log(0/2) - 2/2 log(2/2) = 0.0

Gain(S_Rain, Wind) = E(S_Rain) - 3/5 Entropy(S_Weak) - 2/5 Entropy(S_Strong)
                   = 0.971 - 3/5 × 0.0 - 2/5 × 0.0 = 0.971

Gain(S_Rain, Temperature) = 0.020
Gain(S_Rain, Humidity) = 0.020
Gain(S_Rain, Wind) = 0.971

Wind has the highest information gain under Rain, so Wind becomes the decision node on the Rain branch; both of its
children (Weak → "Yes", Strong → "No") are pure leaf nodes, and the tree is complete.
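Putting the whole ID3 result together, the finished tree can be written as the nested if/else it really is. The branch outcomes follow directly from the tables above (Overcast is pure "Yes", Humidity decides under Sunny, Wind decides under Rain); the function name is just for illustration.

```python
def play_tennis(outlook: str, humidity: str, wind: str) -> str:
    """Classify one day using the tree grown above (a sketch of the result)."""
    if outlook == "Overcast":
        return "Yes"                                     # pure leaf: 4 Yes, 0 No
    elif outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"   # High: 0 Yes, 3 No
    else:                                                # Rain
        return "Yes" if wind == "Weak" else "No"         # Strong: 0 Yes, 2 No

print(play_tennis("Sunny", "High", "Weak"))   # -> "No"
print(play_tennis("Rain", "High", "Weak"))    # -> "Yes"
```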
When to Stop Splitting?
You may be asking yourself: when do we stop growing the tree? Real-world datasets usually have a large number of
features, which results in a large number of splits and, in turn, a huge tree. Such trees take time to build and can
lead to overfitting: the tree gives very good accuracy on the training dataset but poor accuracy on the test data.
There are many ways to tackle this problem through hyperparameter tuning. We can set the maximum depth of
our decision tree using the max_depth parameter. The larger the value of max_depth, the more complex the tree
will be. The training error will of course decrease if we increase max_depth, but when our test data
comes into the picture, we will get very bad accuracy. Hence we need a value that neither overfits nor underfits
our data, and for this we can tune max_depth, for example with cross-validation.
Another way is to set the minimum number of samples for each split, denoted by min_samples_split. Here we
specify the minimum number of samples required to perform a split. For example, we can require a minimum of 10 samples
to split a node: if a node has fewer than 10 samples, then using this parameter we stop further splitting of the
node and make it a leaf node.
There are more hyperparameters, such as the following (a scikit-learn sketch follows this list):
• min_samples_leaf – the minimum number of samples required to be in a leaf node. Increasing this value
constrains the tree and reduces the risk of overfitting.
• max_features – it helps us decide how many features to consider when looking for the best split.
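These stopping-related hyperparameters map directly onto scikit-learn's DecisionTreeClassifier. The sketch below is only illustrative: the dataset and the particular values (depth 4, 10 samples per split, 5 per leaf) are assumptions, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(
    max_depth=4,            # cap the depth of the tree
    min_samples_split=10,   # a node needs at least 10 samples to be split
    min_samples_leaf=5,     # every leaf must contain at least 5 samples
    max_features="sqrt",    # number of features considered at each split
    random_state=0,
)
clf.fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy: ", clf.score(X_test, y_test))
```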
Pruning
Pruning is another method that can help us avoid overfitting. It improves the performance
of the tree by cutting out the nodes or sub-nodes which are not significant, removing the
branches which have very low importance.
There are mainly 2 ways for pruning:
• Pre-pruning – we stop growing the tree early, which means we prune/remove a node of low
importance while the tree is being grown.
• Post-pruning – once the tree is built to its full depth, we start pruning nodes based on their
significance (see the cost-complexity pruning sketch below).
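As a hedged example of post-pruning, scikit-learn implements cost-complexity pruning through the ccp_alpha parameter (pre-pruning corresponds to limits like max_depth set before training). The dataset and the step used to thin the list of candidate alphas are illustrative assumptions only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate pruning strengths for this training set (a full tree is grown internally).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Larger ccp_alpha -> more aggressive pruning -> smaller tree.
for alpha in path.ccp_alphas[::10]:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  "
          f"test accuracy={pruned.score(X_test, y_test):.3f}")
```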