6th Semester
MACHINE LEARNING
(BCS602)
Module – 3
Chapter 6: Decision Tree Learning
By,
Prof. Samatha R Swamy
ISE Department
Module – 4
Chapter 6: DECISION TREE LEARNING
● Chapter 6 throws light on the concept of
Decision Tree Learning.
● The concept of information theory,
entropy, and information gain are
discussed in this chapter.
● The basics of tree construction algorithms
such as ID3, C4.5, CART, and Regression
Trees, along with their illustrations, are included in
this chapter.
● The decision tree evaluation is also
introduced here.
Dept of Information Science & Engineering Prof. Samatha R S 2
Module – 4
Chapter 6: DECISION TREE LEARNING
Textbook: Chapter 6 - 6.1 to 6.2
● 6.1 Introduction to Decision Tree
Learning Model
○ 6.1.1 Structure of a Decision Tree
○ 6.1.2 Fundamentals of Entropy
● 6.2 Decision Tree Induction Algorithms
○ 6.2.1 ID3 Tree Construction
○ 6.2.2 C4.5 Construction
○ 6.2.3 Classification and Regression
Trees Construction
○ 6.2.4 Regression Trees
Dept of Information Science & Engineering Prof. Samatha R S 3
Module – 4
Chapter 6: DECISION TREE LEARNING
○ The Decision Tree model basically
represents logical rules that predict
the value of a target variable by
inferring from data features.
○ This chapter provides a keen insight
into how to construct a decision tree
and infer knowledge from the tree.
Dept of Information Science & Engineering Prof. Samatha R S 4
Module – 4
Chapter 6: DECISION TREE LEARNING
● Understand the structure of the Decision Tree.
● Know about the fundamentals of Entropy.
● Learn and understand popular
1. Univariate Decision Tree Induction algorithms such as ID3 and C4.5.
2. Multivariate decision tree algorithm such as CART.
● Deal with continuous attributes using improved C4.5 algorithm.
Dept of Information Science & Engineering Prof. Samatha R S 5
Module – 4
Chapter 6: DECISION TREE LEARNING
● Construct a Classification and Regression Tree (CART) for
classifying both categorical and continuous-valued target variables.
● Construct Regression Trees where the target feature is a continuous-valued
variable.
● Understand the basics of validating and pruning of decision trees.
Dept of Information Science & Engineering Prof. Samatha R S 6
Module – 4
Chapter 6.1 Introduction to Decision Tree
Learning Model
● Popular Supervised Predictive Learning Models.
● Classifies data instances with high accuracy and
consistency.
● The model performs an inductive inference that reaches a
general conclusion from observed examples.
● The model is widely used for solving complex classification
applications.
● Decision Tree is a concept tree which summarizes the
information contained in the training dataset in the form of
a tree structure.
● Once the concept model is built, test data can be easily
classified.
Dept of Information Science & Engineering Prof. Samatha R S 7
Module – 4
Chapter 6.1 Introduction to Decision Tree Learning Model
● Tree models where the target variable
can take a discrete set of values are
called Classification trees.
● Decision trees where the target
variable can take continuous values
(typically real numbers) are called
Regression trees.
● Classification And Regression Tree
(CART) is the general term used for both.
Dept of Information Science & Engineering Prof. Samatha R S 8
Module – 4
Chapter 6.1 Introduction to Decision Tree Learning Model
● Examples of Tree-Based Learning
algorithms:
○ Decision Trees
○ Random Forest
○ Gradient Boosting
Dept of Information Science & Engineering Prof. Samatha R S 9
Module – 4
Chapter 6.1 Introduction to Decision Tree Learning Model
● Random forest is a commonly-used
machine learning algorithm
trademarked by Leo Breiman and
Adele Cutler, which combines the
output of multiple decision trees to
reach a single result.
● Its ease of use and flexibility have
fueled its adoption, as it handles
both classification and regression
problems.
Dept of Information Science & Engineering Prof. Samatha R S 10
Module – 4
Chapter 6.1 Introduction to Decision Tree Learning Model
● Gradient Boosting is a popular boosting
algorithm in machine learning used for
classification and regression tasks.
● Boosting is a kind of ensemble
learning method that trains models
sequentially, with each new model
trying to correct the errors of the
previous one.
● It combines several weak learners into
strong learners.
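As a concrete illustration of these two ensemble methods, the following is a minimal scikit-learn sketch (assuming scikit-learn is available; the synthetic dataset and parameter values are illustrative, not from the slides):

# Minimal sketch: random forest vs. gradient boosting on the same data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: many trees trained independently, predictions combined into a single result.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Gradient boosting: trees trained sequentially, each new tree correcting the previous ones.
gb = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("Random forest accuracy:", rf.score(X_test, y_test))
print("Gradient boosting accuracy:", gb.score(X_test, y_test))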
Dept of Information Science & Engineering Prof. Samatha R S 11
Module – 4
Chapter 6.1 Introduction to Decision Tree Learning Model
Example: A decision tree is a flowchart-like
structure in which each
● Internal Node represents a test on a feature
(e.g. whether a coin flip comes up heads or tails)
● Each Leaf Node represents a class label
(decision taken after computing all features) and
● Branches represent conjunctions of features
that lead to those class labels.
● The Paths from root to leaf represent
classification rules.
Dept of Information Science & Engineering Prof. Samatha R S 12
Module – 4
Chapter 6.1 Introduction to Decision Tree Learning Model
● The input to the model is a set of data instances (or objects), each described by a set
of features or attributes that can be discrete or continuous.
● The output of the model is a decision tree that predicts or classifies the target class
for a test data object.
● In statistical terms, the input data are called independent variables.
● The target feature or target class is called the response variable.
Dept of Information Science & Engineering Prof. Samatha R S 14
Module – 4
Chapter 6.1 Introduction to Decision Tree Learning Model
● Given a training dataset X, the tree model computes a hypothesis function f(X) in the
form of a decision tree.
● The Decision Tree Learning model generates a complete hypothesis space in the form
of a tree structure for the given training dataset and lets us search through the
possible set of hypotheses, preferring smaller decision trees as we walk through
the tree. This kind of search bias is called preference bias.
Dept of Information Science & Engineering Prof. Samatha R S 16
Module – 4
6.1.1 Structure of a Decision Tree
1. Root Node: Represents topmost node in the tree.
2. Splitting: A process of dividing a node into two or more
sub-nodes.
3. Decision Node: Internal nodes are the Test nodes also called
Decision Nodes.
4. Leaf/Terminal Node: Nodes that do not split further.
5. Pruning: Process of removing sub-nodes of a decision node.
6. Branch: A subtree/subsection of the entire tree.
7. Parent Node: A node divided into subnodes.
8. Child Node: A node derived from a parent node.
Dept of Information Science & Engineering Prof. Samatha R S 17
Module – 4
6.1.1 Structure of a Decision Tree
❖ Every path from root to a leaf node represents a logical rule that corresponds to
a conjunction of test attributes and the whole tree represents a disjunction of
these conjunctions.
❖ The decision tree model, in general represents a collection of logical rules of
classification in the form of a tree structure.
❖ Decision networks, also called influence diagrams, have a directed
graph structure with nodes and links.
❖ They are an extension of Bayesian Belief Networks.
Dept of Information Science & Engineering Prof. Samatha R S 18
Module – 4
6.1.1 Structure of a Decision Tree
• Decision trees represent a disjunction of conjunctions of constraints on the
attribute values of instances.
• Each path from the tree root to a leaf corresponds to a conjunction of attribute tests,
and the tree itself to a disjunction of these conjunctions
For example,
The decision tree shown in the figure above corresponds to the expression:
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
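This disjunction of conjunctions can also be written directly as a rule-based classifier. The sketch below is an illustrative Python helper (not from the textbook) that returns True exactly when one of the three root-to-leaf paths above is satisfied:

# Sketch of the PlayTennis rule set above as a Python predicate (illustrative helper).
def play_tennis(outlook: str, humidity: str, wind: str) -> bool:
    return (
        (outlook == "Sunny" and humidity == "Normal")    # path 1
        or (outlook == "Overcast")                       # path 2
        or (outlook == "Rain" and wind == "Weak")        # path 3
    )

print(play_tennis("Sunny", "Normal", "Strong"))  # True  (first conjunction holds)
print(play_tennis("Rain", "High", "Strong"))     # False (no path is satisfied)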
Dept of Information Science & Engineering Prof. Samatha R S 19
Module – 4
6.1.1 Structure of a Decision Tree
Nodes in a Decision Tree
Root Node
Decision Node
Leaf Node
Dept of Information Science & Engineering Prof. Samatha R S 20
Module – 4
6.1.1 Structure of a Decision Tree
Building the Tree
1. Goal : Construct a decision tree with the given training dataset.
1. Output : Decision Tree representing the complete hypothesis space.
Dept of Information Science & Engineering Prof. Samatha R S 21
Module – 4
6.1.1 Structure of a Decision Tree
Knowledge Inference or Classification
1. Goal : Given a test instance, infer the target class it belongs to.
2. Classification : Inferring the target class for the test instance or object is
based on inductive inference on the constructed decision tree.
3. Output : Target label of the test instance.
Dept of Information Science & Engineering Prof. Samatha R S 22
Module – 4
6.1.1 Structure of a Decision Tree
Classification
• Decision trees classify instances by sorting them down the tree from the root to
some leaf node, which provides the classification of the instance.
• Each node in the tree specifies a test of some attribute of the instance, and each
branch descending from that node corresponds to one of the possible values for this
attribute.
• An instance is classified by starting at the root node of the tree, testing the attribute
specified by this node, then moving down the tree branch corresponding to the
value of the attribute in the given example. This process is then repeated for the
subtree rooted at the new node.
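A minimal Python sketch of this top-down classification procedure, assuming the tree is stored as nested dictionaries keyed by the attribute tested at each node (the representation and the PlayTennis tree below are illustrative):

# Classify an instance by walking from the root to a leaf (illustrative representation).
def classify(node, instance):
    # A leaf is any non-dict value; return it as the class label.
    if not isinstance(node, dict):
        return node
    attribute = node["attribute"]                    # attribute tested at this node
    branch = node["branches"][instance[attribute]]   # follow the branch for the instance's value
    return classify(branch, instance)                # repeat on the subtree rooted at that node

# Example tree for the PlayTennis concept discussed earlier.
tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity",
                  "branches": {"Normal": "Yes", "High": "No"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind",
                 "branches": {"Weak": "Yes", "Strong": "No"}},
    },
}

print(classify(tree, {"Outlook": "Rain", "Humidity": "High", "Wind": "Weak"}))  # Yes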
Dept of Information Science & Engineering Prof. Samatha R S 23
Module – 4
6.1.1 Structure of a Decision Tree
Advantages of Decision Tree
1. Easy to model and interpret.
2. Simple to understand.
3. The input and output attributes can be discrete or continuous predictor
variables.
4. Can model a high degree of nonlinearity in the relationship between the
target variables and the predictor variables.
5. Quick to train.
Dept of Information Science & Engineering Prof. Samatha R S 24
Module – 4
6.1.1 Structure of a Decision Tree
Disadvantages of Decision Tree
1. It is difficult to determine how deeply a decision tree can be grown or when to stop growing it.
2. If the training data has errors or missing attribute values, then the decision tree constructed may
become unstable or biased.
3. If the training data has continuous-valued attributes, handling them is computationally complex and
they have to be discretized.
4. A complex decision tree may also over-fit the training data.
5. Decision tree learning is not well suited for classifying multiple output classes.
6. Learning an optimal decision tree is known to be NP-complete.
Dept of Information Science & Engineering Prof. Samatha R S 25
Module – 4
6.1.1 Structure of a Decision Tree
FIGURE:
A decision tree for the concept
PlayTennis.
An example is classified by
sorting it through the tree to the
appropriate leaf node, then
returning the classification
associated with this leaf
Dept of Information Science & Engineering Prof. Samatha R S 27
Module – 4
6.1.1 Structure of a Decision Tree
An Example of Students in Coaching Class and Non-Coaching Classes
Dept of Information Science & Engineering Prof. Samatha R S 28
Module – 4
6.1.1 Structure of a Decision Tree
A decision tree is not always a binary tree.
It is a tree which can have more than two branches.
Dept of Information Science & Engineering Prof. Samatha R S 29
Module – 4
6.1.1 Structure of a Decision Tree
Example 6.1: How to draw a decision tree to predict a student’s academic performance based on the
given information such as class attendance, class assignments, home-work assignments, tests,
participation in competitions or other events, group activities such as projects and presentations, etc.
ATTRIBUTES VALUES
Class attendance Good, Average, Poor
Class assignments Good, Moderate, Poor
Home-work assignments Yes, No
Assessment Good, Moderate, Poor
Participation in competitions or other events Yes, No
Group activities such as projects and presentations Yes, No
Exam Results Pass, Fail
Dept of Information Science & Engineering Prof. Samatha R S 30
Module – 4
6.1.1 Structure of a Decision Tree
The target feature is the student performance in the final examination whether the student will ‘pass’
or ‘fail’ in the examination. The decision nodes are test nodes which check for conditions. The leaf
nodes represent the outcomes, that is, either ‘pass’ or ‘fail’.
ATTRIBUTES VALUES
Class attendance Good, Average, Poor
Class assignments Good, Moderate, Poor
Home-work assignments Yes, No
Assessment Good, Moderate, Poor
Participation in competitions or other events Yes, No
Group activities such as projects and presentations Yes, No
Exam Results Pass, Fail
Dept of Information Science & Engineering Prof. Samatha R S 31
Module – 4
6.1.1 Structure of a Decision Tree
Example 6.2: Predict student’s academic performance of whether he will pass or fail based on the given
information such as ‘Assessment’ and ‘Assignment’. Independent Variables are ‘Assessment’ and
‘Assignment’. Target Variable is ‘Exam Results’.
ATTRIBUTES VALUES
Assessment >=50, <50
Assignment Yes, No
Exam Results Pass, Fail
Dept of Information Science & Engineering Prof. Samatha R S 32
Module – 4
6.1.1 Structure of a Decision Tree
DECISION TREE PREDICTION: If an instance is given, such as a student has scored 42 marks in his
assessment and has submitted his assignment, then it is predicted with the decision tree that his
exam result is ‘Fail’
[Decision tree: 'Assessment' is the root node with branches '>=50' and '<50'; 'Assignment' is a decision node with branches 'Yes' and 'No'; the leaf nodes are labelled 'Pass' or 'Fail'.]
Dept of Information Science & Engineering Prof. Samatha R S 33
Module – 4
6.1.2 Fundamentals of Entropy
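The entropy slides in this subsection are presented as figures; for reference, Entropy(S) = -Σ p_i log2(p_i) over the class proportions p_i, and Gain(S, A) = Entropy(S) - Σ_v (|S_v|/|S|) * Entropy(S_v). A minimal Python sketch of these two measures (the function names and the list-of-dictionaries dataset representation are illustrative):

import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum(p * log2(p)) over the class proportions."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(dataset, attribute, target):
    """Expected reduction in entropy of `target` obtained by splitting on `attribute`."""
    total = len(dataset)
    base = entropy([row[target] for row in dataset])
    weighted = 0.0
    for value in {row[attribute] for row in dataset}:
        subset = [row[target] for row in dataset if row[attribute] == value]
        weighted += (len(subset) / total) * entropy(subset)
    return base - weighted

# Example: the 'Job Offer' target used later in this chapter has 7 Yes and 3 No instances.
print(round(entropy(["Yes"] * 7 + ["No"] * 3), 4))  # 0.8813 (the slides round this to 0.8807)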
Module – 4
6.1.2 Fundamentals of Entropy
General Algorithm for Decision Trees:
1. Find the best attribute from the training dataset using an attribute
selection measure and place it at the root of the tree.
2. Split the training dataset into subsets based on the outcomes of the
test attribute and each subset in a branch contains the data instances
or tuples with the same value for the selected test attribute.
3. Repeat step 1 and step 2 on each subset until we end up in leaf nodes
in all the branches of the tree.
4. This splitting process is recursive until the stopping criterion is reached.
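A compact recursive sketch of the general procedure above, reusing the entropy/information-gain helpers sketched in Section 6.1.2 (the tree representation, parameter names and default thresholds are illustrative, not prescribed by the textbook):

from collections import Counter

def majority_class(rows, target):
    """Most common value of the target attribute among the given rows."""
    return Counter(row[target] for row in rows).most_common(1)[0][0]

def build_tree(rows, attributes, target, max_depth=5, min_samples=2, depth=0):
    # Assumes information_gain(...) from the earlier sketch is in scope.
    labels = [row[target] for row in rows]
    # Stopping criteria: pure node, no attributes left, too few instances, or maximum depth reached.
    if len(set(labels)) == 1 or not attributes or len(rows) < min_samples or depth >= max_depth:
        return majority_class(rows, target)
    # Step 1: choose the attribute with the highest information gain as the test attribute.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    node = {"attribute": best, "branches": {}}
    # Step 2: split the training data into subsets, one per value of the chosen attribute.
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        # Step 3: repeat recursively on each subset until a stopping criterion is met.
        node["branches"][value] = build_tree(subset, remaining, target,
                                             max_depth, min_samples, depth + 1)
    return node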
Dept of Information Science & Engineering Prof. Samatha R S 39
Module – 4
6.1.2 Fundamentals of Entropy
Stopping Criteria:
The following are some of the common stopping conditions:
1. The data instances are homogeneous, which means all belong to the
same class C, and hence the node's entropy is 0.
2. A node becomes a leaf when it contains fewer than some defined minimum number of
data instances (typically between 0.25% and 1.00%
of the full training dataset).
3. The maximum tree depth is reached, so further splitting is not done
and the node becomes a leaf node.
Dept of Information Science & Engineering Prof. Samatha R S 40
Module – 4
Chapter 6.2: Decision Tree Induction Algorithms
• Most algorithms that have been developed for learning decision trees are
variations on a core algorithm that employs a top-down, greedy search through
the space of possible decision trees.
• This approach is exemplified by the ID3 algorithm and its successor C4.5
Dept of Information Science & Engineering Prof. Samatha R S 41
Module – 4
Chapter 6.2.1: ID3 Tree Construction
• ID3 stands for Iterative Dichotomiser 3
• ID3 is a precursor to the C4.5 Algorithm.
• The ID3 algorithm was invented by Ross Quinlan in 1975
• Used to generate a decision tree from a given data set by employing a top-down,
greedy search, to test each attribute at every node of the tree.
• The resulting tree is used to classify future samples.
Dept of Information Science & Engineering Prof. Samatha R S
42
Module – 4
Chapter 6.2.1: ID3 Tree Construction
ID3(Examples, Target_attribute, Attributes):
∙ Create a Root node for the tree
∙ If all Examples are positive, Return the single-node tree Root with label = +
∙ If all Examples are negative, Return the single-node tree Root with label = −
∙ If Attributes is empty, Return the single-node tree Root with label = most common
value of Target_attribute in Examples
∙ Otherwise Begin
∙ A ← the attribute from Attributes that best* classifies Examples
∙ The decision attribute for Root ← A
∙ For each possible value, vi, of A,
∙ Add a new tree branch below Root, corresponding to the test A = vi
∙ Let Examples vi be the subset of Examples that have value vi for A
∙ If Examples vi is empty
∙ Then below this new branch add a leaf node with label = most common
value of Target_attribute in Examples
∙ Else below this new branch add the subtree
ID3(Examples vi, Target_attribute, Attributes – {A})
∙ End
∙ Return Root
* The best attribute is the one with highest information gain
Dept of Information Science & Engineering Prof. Samatha R S 44
Module – 4
Chapter 6.2.1: ID3 Tree Construction
• The central choice in the ID3 algorithm is selecting which attribute to test at each
node in the tree.
• A statistical property called information gain measures how well a given attribute
separates the training examples according to their target classification.
• ID3 uses information gain measure to select among the candidate attributes at each step while growing the
tree.
Dept of Information Science & Engineering Prof. Samatha R S 45
Module – 4
Chapter 6.2.1: Entropy and Information Gain in Decision Tree Learning
Information gain measures the expected reduction in entropy; attributes are ranked for splitting in descending order of (largest) information gain.
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Example 6.3:
Assess a student’s performance during his course of study and
predict whether a student will get a job offer or not in his final year of the
course. The training dataset T consists of 10 data instances with attributes such
as ‘CGPA’, ‘Interactiveness’, ‘Practical Knowledge’ and ‘Communication Skills’.
The target class attribute is the ‘Job Offer’.
Dept of Information Science & Engineering Prof. Samatha R S 70
Module – 4
Chapter 6.2.1: ID3 Tree Construction
S. No CGPA Interactiveness Practical Knowledge Communication Skills Job Offer
1 >=9 Yes Very good Good Yes
2 >=8 No Good Moderate Yes
3 >=9 No Average Poor No
4 <8 No Average Good No
5 >=8 Yes Good Moderate Yes
6 >=9 Yes Good Moderate Yes
7 <8 Yes Good Poor No
8 >=9 No Very good Good Yes
9 >=8 Yes Good Good Yes
10 >=8 Yes Average Good Yes
Dept of Information Science & Engineering Prof. Samatha R S 71
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Step 1:
Calculate the Entropy for the target class ‘Job Offer’
Entropy_Info(Target Attribute = Job Offer) = Entropy_Info(7,3)
=> -[7/10 * log2(7/10) + 3/10 * log2(3/10)]
=> -(-0.3599 + -0.5208) = 0.8807
Dept of Information Science & Engineering Prof. Samatha R S 72
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Iteration 1:
Step 2:
Calculate the Entropy_Info and Gain(Information_Gain) for
each of the attribute in the training dataset.
Dept of Information Science & Engineering Prof. Samatha R S
31\ 73
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Table below shows the number of data instances classified with Job
Offer as Yes or No for the attributes CGPA.
Table below shows Entropy Information for CGPA.
CGPA Job Offer = Yes Job Offer = No Total Entropy
>=9 3 1 4 0.8108
>=8 4 0 4 0
<8 0 2 2 0
Dept of Information Science & Engineering Prof. Samatha R S 74
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Entropy_Info(T, CGPA) = (4/10) * (0.3111 + 0.4997) + 0 + 0
= 0.3243
Gain(CGPA) = 0.8807 - 0.3243
= 0.5564
Dept of Information Science & Engineering Prof. Samatha R S 75
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Table below shows the number of data instances classified with Job
Offer as Yes or No for the attributes Interactiveness.
Table below shows Entropy Information for Interactiveness.
Interactiveness Job Offer = Yes Job Offer = No Total Entropy
YES 5 1 6 0.6497
NO 2 2 4 0.9994
Dept of Information Science & Engineering Prof. Samatha R S 76
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Entropy_Info(T, Interactiveness)
= (6/10) * (0.2191 + 0.4306) + (4/10)(0.4997 + 0.4997)
= 0.3898 + 0.3998
= 0.7896
Gain(Interactiveness) = 0.8807 - 0.7896
= 0.0911
Dept of Information Science & Engineering Prof. Samatha R S 77
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Table 6.6 shows the number of data instances classified with Job
Offer as Yes or No for the attributes Practical Knowledge.
Table below shows Entropy Information for Practical Knowledge.
Practical Knowledge Job Offer = Yes Job Offer = No Total Entropy
Very Good 2 0 2 0
Average 1 2 3 0.9177
Good 4 1 5 0.7215
Dept of Information Science & Engineering Prof. Samatha R S 78
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Entropy_Info(T, Practical Knowledge)
= (2/10) * (0) + (3/10)(0.5280 + 0.3897) + 5/10 (0.2574 + 0.4641)
= 0 + 0.2753 + 0.3608
= 0.6361
Gain(Practical Knowledge) = 0.8807 - 0.6361
= 0.2446
Dept of Information Science & Engineering Prof. Samatha R S 79
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Table below shows the number of data instances classified with Job
Offer as Yes or No for the attributes Communication Skills.
Table below shows Entropy Information for Communication Skills.
Communication Skills Job Offer = Yes Job Offer = No Total Entropy
Good 4 1 5 0.7215
Moderate 3 0 3 0
Poor 0 2 2 0
Dept of Information Science & Engineering Prof. Samatha R S 80
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Entropy_Info(T, Communication Skills)
= (5/10) * (0.2574 + 0.4641) + (3/10) * 0 + (2/10) * 0
= 0.3609
Gain(Communication Skills) = 0.8807 - 0.36096
= 0.5203
Dept of Information Science & Engineering Prof. Samatha R S 81
Module – 4
Chapter 6.2.1: ID3 Tree Construction
The Gain calculated for all the attributes is shown in Table below
Attributes Gain
CGPA 0.5564
Interactiveness 0.0911
Practical Knowledge 0.2446
Communication Skills 0.5203
Dept of Information Science & Engineering Prof. Samatha R S 82
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Step 3:
From Gain table, choose the attribute for which entropy is minimum and
therefore the gain is maximum as the best split attribute.
The best split attribute is CGPA since it has the maximum gain. So, we choose
CGPA as the root node. There are 3 distinct values for CGPA with outcomes >=9,
>=8 and <8.
The entropy value is 0 for >=8 and <8, with all instances classified as Job Offer =
Yes for >=8 and Job Offer = No for <8. Hence, both >=8 and <8 end up in leaf
nodes.
Dept of Information Science & Engineering Prof. Samatha R S 83
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Decision Tree after Iteration 1:
CGPA (root node)
  CGPA >=8 → Job Offer = Yes (leaf node)
  CGPA <8  → Job Offer = No (leaf node)
  CGPA >=9 → subset of 4 instances to be split further:
    Interactiveness  Practical Knowledge  Communication Skills  Job Offer
    Yes              Very good            Good                  Yes
    No               Average              Poor                  No
    Yes              Good                 Moderate              Yes
    No               Very good            Good                  Yes
Dept of Information Science & Engineering Prof. Samatha R S 84
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Continue the same process for the subset of data instances
branched with CGPA >=9.
In this iteration, the same process of computing the Entropy_Info
and Gain are repeated with the subset of training set. The subset
consists of 4 data instances.
Dept of Information Science & Engineering Prof. Samatha R S 85
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Iteration 2:
Calculate the Entropy for the target class ‘Job Offer’
Entropy_Info(Target Attribute = Job Offer) = Entropy_Info(3,1)
=> -[3/4 * log2(3/4) + 1/4 * log2(1/4)]
=> -(-0.3111 + -0.4997) = 0.8108
Dept of Information Science & Engineering Prof. Samatha R S 86
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Entropy_Info(T, Interactiveness)
= (2/4) * 0 + (2/4) * (0.4997 + 0.4997)
= 0.4997
Gain(Interactiveness) = 0.8108 - 0.4997
= 0.3111
Dept of Information Science & Engineering Prof. Samatha R S 87
Module – 4
Chapter 6.2.1: ID3 Tree Construction
Entropy_Info(T, Practical Knowledge) = 0
Gain(Practical Knowledge) = 0.8108
Entropy_Info(T, Communication Skills) = 0
Gain(Communication Skills) = 0.8108
Dept of Information Science & Engineering Prof. Samatha R S 88
Module – 4
Chapter 6.2.1: ID3 Tree Construction
The Gain calculated for all the attributes is shown in Table below
Attributes Gain
Interactiveness 0.3111
Practical Knowledge 0.8108
Communication Skills 0.8108
Both the attributes ‘Practical Knowledge’ and ‘Communication Skills’ have the
same Gain. So we can either construct the decision tree using ‘Practical
Knowledge’ or ‘Communication Skills’.
Dept of Information Science & Engineering Prof. Samatha R S 89
Module – 4
Chapter 6.2.1: ID3 Tree Construction
>=8 CGPA
<8
Job Offer = Yes
>=9 Job Offer = Yes
PK
Good Very
Average Good
Job Offer = Yes
Job Offer = Yes Job Offer = Yes
Dept of Information Science & Engineering Prof. Samatha R S 90
Module – 4
6.2.2 C4.5 Construction - Algorithm Description and Simplified Algorithm
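The key difference from ID3 is that C4.5 ranks attributes by gain ratio rather than raw information gain: Gain_Ratio(A) = Gain(A) / Split_Info(A), where Split_Info(A) = -Σ_v (|S_v|/|S|) log2(|S_v|/|S|). A minimal sketch, reusing the information_gain helper from Section 6.1.2 (names are illustrative):

import math
from collections import Counter

def split_info(dataset, attribute):
    """Entropy of the attribute's own value distribution; penalises many-valued attributes."""
    total = len(dataset)
    counts = Counter(row[attribute] for row in dataset)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def gain_ratio(dataset, attribute, target):
    """C4.5's splitting criterion: information gain normalised by split information."""
    si = split_info(dataset, attribute)
    if si == 0:  # the attribute takes a single value, so it cannot produce a useful split
        return 0.0
    # information_gain(...) is the helper sketched in Section 6.1.2.
    return information_gain(dataset, attribute, target) / si

C4.5 also handles continuous attributes by sorting their values and evaluating candidate thresholds between adjacent values, and it can tolerate missing attribute values.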
Module – 3
Chapter 6.2.3 Classification and Regression Trees Construction - CART (Working of CART; CART Algorithm)
Module – 4
Chapter 6.2.3: Classification Tree Construction
S. No CGPA Interactiveness Practical Knowledge Communication Skills Job Offer
1 >=9 Yes Very good Good Yes
2 >=8 No Good Moderate Yes
3 >=9 No Average Poor No
4 <8 No Average Good No
5 >=8 Yes Good Moderate Yes
6 >=9 Yes Good Moderate Yes
7 <8 Yes Good Poor No
8 >=9 No Very good Good Yes
9 >=8 Yes Good Good Yes
10 >=8 Yes Average Good Yes
Dept of Information Science & Engineering Prof. Samatha R S 106
Module – 4
Chapter 6.2.3: Classification Tree Construction
CGPA Job Offer = YES Job Offer = NO
>=9 3 1
>=8 4 0
<8 0 2
Dept of Information Science & Engineering Prof. Samatha R S 107
Module – 4
Chapter 6.2.3: Classification Tree Construction
Candidate binary splits of CGPA and their Gini_Index values:
Split 1: {>=9, >=8} vs {<8}
  Gini(>=9, >=8) = 0.2194, Gini(<8) = 0, weighted Gini_Index = 0.17552
Split 2: {>=9, <8} vs {>=8}
  Gini(>=9, <8) = 0.5, Gini(>=8) = 0, weighted Gini_Index = 0.3
Split 3: {>=8, <8} vs {>=9}
  Gini(>=8, <8) = 0.445, Gini(>=9) = 0.375, weighted Gini_Index = 0.417
Dept of Information Science & Engineering Prof. Samatha R S 108
Module – 4
Chapter 6.2.3: Classification Tree Construction
Combination Gini_Index (choose the combination with the minimum Gini_Index)
1 ((>=9,>=8), <8) 0.17552
2 ((>=9,<8), >=8) 0.3
3 ((>=8, <8), >=9) 0.417
Dept of Information Science & Engineering Prof. Samatha R S 110
Module – 4
Chapter 6.2.3: Classification Tree Construction
Gini_Index reduction for the best split of CGPA: ΔGini(CGPA) = Gini(T) − Gini_Index(T, CGPA) = 0.42 − 0.17552 ≈ 0.2445
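A minimal Python sketch of the Gini computation behind these numbers (function names are illustrative); with the Job Offer counts from the CGPA table it gives a weighted Gini_Index of about 0.175 for the split {(>=9, >=8)} vs {<8} and a reduction of about 0.245 from Gini(T) = 0.42, matching the slide's 0.17552 and 0.2445 up to rounding:

def gini(counts):
    """Gini index 1 - sum(p_i^2) for a list of per-class counts."""
    total = sum(counts)
    return 1.0 - sum((n / total) ** 2 for n in counts)

def weighted_gini(groups):
    """Weighted Gini index of a split; `groups` is a list of per-group class-count lists."""
    total = sum(sum(g) for g in groups)
    return sum((sum(g) / total) * gini(g) for g in groups)

gini_t = gini([7, 3])                     # full dataset: 7 Yes, 3 No  -> 0.42
split = weighted_gini([[7, 1], [0, 2]])   # {(>=9, >=8)}: 7 Yes / 1 No; {<8}: 0 Yes / 2 No
print(round(gini_t, 4), round(split, 4), round(gini_t - split, 4))  # 0.42 0.175 0.245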
Dept of Information Science & Engineering Prof. Samatha R S 111
Module – 4
Chapter 6.2.3: Classification Tree Construction
Interactiveness Job_Offer = YES Job_Offer = NO
YES 5 1
NO 2 2
Dept of Information Science & Engineering Prof. Samatha R S 112
Module – 4
Chapter 6.2.3: Classification Tree Construction
Candidate split of Interactiveness and its Gini_Index values:
Gini(Yes) = 0.28, Gini(No) = 0.5
Weighted Gini_Index for the split {Yes} vs {No} = 0.368
Dept of Information Science & Engineering Prof. Samatha R S 113
Module – 4
Chapter 6.2.3: Classification Tree Construction
Gini_Index reduction for the best split of Interactiveness: ΔGini(Interactiveness) = 0.42 − 0.368 = 0.052
Dept of Information Science & Engineering Prof. Samatha R S 114
Module – 4
Chapter 6.2.3: Classification Tree Construction
Practical Knowledge Job_Offer = YES Job_Offer = NO
Very Good 2 0
Good 4 1
Average 1 2
Dept of Information Science & Engineering Prof. Samatha R S 115
Module – 4
Chapter 6.2.3: Classification Tree Construction
Candidate binary splits of Practical Knowledge and their Gini_Index values:
Split 1: {Very Good, Good} vs {Average}
  Gini(Very Good, Good) = 0.2456, Gini(Average) = 0.445, weighted Gini_Index = 0.3054
Split 2: {Very Good, Average} vs {Good}
  Gini(Very Good, Average) = 0.48, Gini(Good) = 0.32, weighted Gini_Index = 0.40
Split 3: {Good, Average} vs {Very Good}
  Gini(Good, Average) = 0.4688, Gini(Very Good) = 0, weighted Gini_Index = 0.3750
Dept of Information Science & Engineering Prof. Samatha R S 116
Module – 4
Chapter 6.2.3: Classification Tree Construction
Gini_Index reduction for the best split of Practical Knowledge: ΔGini(Practical Knowledge) = 0.42 − 0.3054 = 0.1146
Dept of Information Science & Engineering Prof. Samatha R S 117
Module – 4
6.2.4 Regression Trees
Decision Tree Algorithm
● The core algorithm for building decision trees, called ID3 and developed by J. R. Quinlan,
employs a top-down, greedy search through the space of possible branches with no
backtracking.
● The ID3 algorithm can be used to construct a decision tree for regression by replacing
Information Gain with Standard Deviation Reduction.
Dept of Information Science & Engineering Prof. Samatha R S 118
Module – 4
6.2.4 Regression Trees
● Regression trees are a variant of decision trees where the target
feature is a continuous valued variable.
● Regression trees can be constructed using an algorithm called
reduction in variance which uses standard deviation to choose the
best splitting attribute.
Dept of Information Science & Engineering Prof. Samatha R S 119
Module – 4
6.2.4 Regression Trees
Algorithm 6.5: Procedure for Constructing Regression Trees
1. Compute the standard deviation of the target attribute for the entire training dataset.
2. Compute the standard deviation of the target for the data instances of each distinct value
of an attribute.
3. Compute the weighted standard deviation for each attribute.
4. Compute the standard deviation reduction by subtracting the weighted standard deviation
of each attribute from the standard deviation of the entire dataset.
Dept of Information Science & Engineering Prof. Samatha R S 120
Module – 4
6.2.4 Regression Trees
Algorithm 6.5: Procedure for Constructing Regression Trees
5. Choose the attribute with a higher standard deviation reduction as the best split
attribute.
6. The best split attribute is placed as the root node.
7. The root node is branched into subtrees with each subtree as an outcome of the test
condition of the root node attribute. Accordingly, the training dataset ia also split into
different subsets.
8. Recursively apply the same operation for the subset of the training set with the
remaining attributes until a leaf node is derived or no more training instances are
available in the subset.
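A minimal Python sketch of the splitting measure used in this procedure, the standard deviation reduction (SDR); it uses the sample standard deviation, which is what the worked example below also uses, and the function names and dictionary-row representation are illustrative:

import statistics

def sd_reduction(rows, attribute, target):
    """Standard deviation of the target minus the weighted standard deviation after splitting."""
    total = len(rows)
    sd_all = statistics.stdev(row[target] for row in rows)  # sample standard deviation of the target
    weighted = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        sd = statistics.stdev(subset) if len(subset) > 1 else 0.0
        weighted += (len(subset) / total) * sd
    return sd_all - weighted

# With the 10-instance dataset in the example that follows,
# sd_reduction(data, "Assessment", "Result") evaluates to about 4.97, matching the slides.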
Dept of Information Science & Engineering Prof. Samatha R S 121
Module – 4
6.2.4 Regression Trees
S. No Assessment Assignment Project Result(%)
1 Good Yes Yes 95
2 Average Yes No 70
3 Good No Yes 75
4 Poor No No 45
5 Good Yes Yes 98
6 Average No Yes 80
7 Good No No 75
8 Poor Yes Yes 65
9 Average No No 58
10 Good Yes Yes 89
Dept of Information Science & Engineering Prof. Samatha R S 122
Module – 4
6.2.4 Regression Trees
Average = 75
Standard Deviation = 16.55
Dept of Information Science & Engineering Prof. Samatha R S 124
Module – 4
6.2.4 Regression Trees
Assessment = GOOD
S. No Assessment Assignment Project Result(%)
1 Good Yes Yes 95
2 Good No Yes 75
3 Good Yes Yes 98
4 Good No No 75
5 Good Yes Yes 89
Dept of Information Science & Engineering Prof. Samatha R S 125
Module – 4
6.2.4 Regression Trees
Average = 86.4
Standard Deviation = 10.9
Dept of Information Science & Engineering Prof. Samatha R S 126
Module – 4
6.2.4 Regression Trees
Assessment = AVERAGE
S. No Assessment Assignment Project Result(%)
1 Average Yes No 70
2 Average No Yes 80
3 Average No No 58
Dept of Information Science & Engineering Prof. Samatha R S 127
Module – 4
6.2.4 Regression Trees
Average = 69.3
Standard Deviation = 11.01
Dept of Information Science & Engineering Prof. Samatha R S 128
Module – 4
6.2.4 Regression Trees
Assessment = POOR
S. No Assessment Assignment Project Result(%)
1 Poor No No 45
2 Poor Yes Yes 65
Dept of Information Science & Engineering Prof. Samatha R S 129
Module – 4
6.2.4 Regression Trees
Average = 55
Standard Deviation = 14.14
Dept of Information Science & Engineering Prof. Samatha R S 130
Module – 4
6.2.4 Regression Trees
Weighted Standard Deviation for Assessment = 11.58
Standard Deviation Reduction for Assessment = 16.55 - 11.58 = 4.97
Dept of Information Science & Engineering Prof. Samatha R S 131
Module – 4
6.2.4 Regression Trees
Standard Deviation for Assessment
Assessment Standard Deviation Data Instances
Good 10.9 5
Average 11.01 3
Poor 14.14 2
Dept of Information Science & Engineering Prof. Samatha R S 132
Module – 4
6.2.4 Regression Trees
S. No Assessment Assignment Project Result(%)
1 Good Yes Yes 95
2 Average Yes No 70
3 Good No Yes 75
4 Poor No No 45
5 Good Yes Yes 98
6 Average No Yes 80
7 Good No No 75
8 Poor Yes Yes 65
9 Average No No 58
10 Good Yes Yes 89
Dept of Information Science & Engineering Prof. Samatha R S 133
Module – 4
6.2.4 Regression Trees
Assignment = YES
S. No Assessment Assignment Project Result(%)
1 Good Yes Yes 95
2 Average Yes No 70
3 Good Yes Yes 98
4 Poor Yes Yes 65
5 Good Yes Yes 89
Dept of Information Science & Engineering Prof. Samatha R S 134
Module – 4
6.2.4 Regression Trees
Average = 83.4
Standard Deviation = 14.98
Dept of Information Science & Engineering Prof. Samatha R S 135
Module – 4
6.2.4 Regression Trees
Assignment = NO
S. No Assessment Assignment Project Result(%)
1 Good No Yes 75
2 Poor No No 45
3 Average No Yes 80
4 Good No No 75
5 Average No No 58
Dept of Information Science & Engineering Prof. Samatha R S 136
Module – 4
6.2.4 Regression Trees
Average = 66.6
Standard Deviation = 14.7
Dept of Information Science & Engineering Prof. Samatha R S 137
Module – 4
6.2.4 Regression Trees
Standard Deviation for Assignment
Assignment Standard Deviation Data Instances
Yes 14.98 5
No 14.7 5
Dept of Information Science & Engineering Prof. Samatha R S 138
Module – 4
6.2.4 Regression Trees
Weighted Standard Deviation for Assignment = 14.84
Standard Deviation Reduction for Assignment = 16.55 - 14.84 = 1.71
Dept of Information Science & Engineering Prof. Samatha R S 139
Module – 4
6.2.4 Regression Trees
S. No Assessment Assignment Project Result(%)
1 Good Yes Yes 95
2 Average Yes No 70
3 Good No Yes 75
4 Poor No No 45
5 Good Yes Yes 98
6 Average No Yes 80
7 Good No No 75
8 Poor Yes Yes 65
9 Average No No 58
10 Good Yes Yes 89
Dept of Information Science & Engineering Prof. Samatha R S 140
Module – 4
6.2.4 Regression Trees
Project = YES
S. No Assessment Assignment Project Result(%)
1 Good Yes Yes 95
2 Good No Yes 75
3 Good Yes Yes 98
4 Average No Yes 80
5 Poor Yes Yes 65
6 Good Yes Yes 89
Dept of Information Science & Engineering Prof. Samatha R S 141
Module – 4
6.2.4 Regression Trees
Average = 83.7
Standard Deviation = 12.6
Dept of Information Science & Engineering Prof. Samatha R S 142
Module – 4
6.2.4 Regression Trees
Project = NO
S. No Assessment Assignment Project Result(%)
1 Average Yes No 70
2 Poor No No 45
3 Good No No 75
4 Average No No 58
Dept of Information Science & Engineering Prof. Samatha R S 143
Module – 4
6.2.4 Regression Trees
Average = 62
Standard Deviation = 13.39
Dept of Information Science & Engineering Prof. Samatha R S 144
Module – 4
6.2.4 Regression Trees
Standard Deviation for Project
Project Standard Deviation Data Instances
Yes 12.6 6
No 13.39 4
Dept of Information Science & Engineering Prof. Samatha R S 145
Module – 4
6.2.4 Regression Trees
Weighted Standard Deviation for Project = 12.92
Standard Deviation Reduction for Project = 16.55 - 12.92 = 3.63
Dept of Information Science & Engineering Prof. Samatha R S 146
Module – 4
6.2.4 Regression Trees
Standard Deviation Reduction for each attribute in the training dataset.
Attributes Standard Deviation Reduction
Assessment 4.97
Assignment 1.71
Project 3.63
The attribute ‘Assessment’ has the maximum Standard Deviation
Reduction and hence it is chosen as the best splitting attribute.
Dept of Information Science & Engineering Prof. Samatha R S 147
Module – 4
6.2.4 Regression Trees
The training dataset is split into subsets based on the attribute “Assessment” and
the process is continued until the entire tree is constructed.
Assessment (root node), with branches GOOD, AVERAGE and POOR.
Dept of Information Science & Engineering Prof. Samatha R S 148
Module – 4
6.2.4 Regression Trees
A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing values for the
attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a
tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical
data.
Dept of Information Science & Engineering Prof. Samatha R S 149
Thank You
Any Queries:
Mail -
[email protected]