Module 3 - Chapter 6 - Decision Tree Learning

Chapter 6 of the document focuses on Decision Tree Learning, covering concepts such as information theory, entropy, and various tree construction algorithms like ID3, C4.5, and CART. It explains the structure of decision trees, their evaluation, and practical applications in classification and regression tasks. The chapter also discusses the advantages and disadvantages of decision trees, providing insights into their use in machine learning.

6th Semester

MACHINE LEARNING
(BCS602)

Module – 3

Chapter 6: Decision Tree Learning

By,
Prof. Samatha R Swamy
ISE Department
Module – 4

Chapter 6: DECISION TREE LEARNING


● Chapter 6 throws light on the concept of Decision Tree Learning.

● The concepts of information theory, entropy, and information gain are discussed in this chapter.

● The basics of tree construction algorithms like ID3, C4.5, CART, and Regression Trees, along with their illustration, are included in this chapter.

● Decision tree evaluation is also introduced here.
Dept of Information Science & Engineering Prof. Samatha R S 2
Module – 4

Chapter 6: DECISION TREE LEARNING


Textbook: Chapter 6 - 6.1 to 6.2

● 6.1 Introduction to Decision Tree Learning Model
○ 6.1.1 Structure of a Decision Tree
○ 6.1.2 Fundamentals of Entropy
● 6.2 Decision Tree Induction Algorithms
○ 6.2.1 ID3 Tree Construction
○ 6.2.2 C4.5 Construction
○ 6.2.3 Classification and Regression Trees Construction
○ 6.2.4 Regression Trees

Dept of Information Science & Engineering Prof. Samatha R S 3


Module – 4

Chapter 6: DECISION TREE LEARNING

○ The Decision Tree model basically represents logical rules that predict the value of a target variable by inferring from data features.

○ This chapter provides a keen insight into how to construct a decision tree and infer knowledge from the tree.

Dept of Information Science & Engineering Prof. Samatha R S 4


Module – 4

Chapter 6: DECISION TREE LEARNING

● Understand the structure of the Decision Tree.

● Know about the fundamentals of Entropy.

● Learn and understand popular


1. Univariate Decision Tree Induction algorithms such as ID3 and C4.5.
2. Multivariate decision tree algorithm such as CART.

● Deal with continuous attributes using improved C4.5 algorithm.


Dept of Information Science & Engineering Prof. Samatha R S 5
Module – 4

Chapter 6: DECISION TREE LEARNING

● Construct Classification and Regression Trees (CART) for classifying both categorical and continuous-valued target variables.

● Construct Regression Trees where the target feature is a continuous-valued variable.

● Understand the basics of validating and pruning of decision trees.

Dept of Information Science & Engineering Prof. Samatha R S 6


Module – 4

Chapter 6.1 Introduction to Decision Tree


Learning Model

● Decision tree learning is a popular supervised predictive learning model.
● It classifies data instances with high accuracy and consistency.
● The model performs an inductive inference that reaches a general conclusion from observed examples.
● The model is invariably used for solving complex classification applications.
● A Decision Tree is a concept tree which summarizes the information contained in the training dataset in the form of a tree structure.
● Once the concept model is built, test data can be easily classified.
Dept of Information Science & Engineering Prof. Samatha R S 7
Module – 4

Chapter 6.1 Introduction to Decision Tree Learning Model

● Tree models where the target variable can take a discrete set of values are called Classification Trees.
● Decision trees where the target variable can take continuous values (typically real numbers) are called Regression Trees.
● Classification And Regression Tree (CART) is the general term used for both.

Dept of Information Science & Engineering Prof. Samatha R S 8


Module – 4

Chapter 6.1 Introduction to Decision Tree Learning Model

● Examples of Tree-Based Learning


algorithms:
○ Decision Trees
○ Random Forest
○ Gradient Boosting

Dept of Information Science & Engineering Prof. Samatha R S 9


Module – 4

Chapter 6.1 Introduction to Decision Tree Learning Model

● Random forest is a commonly-used


machine learning algorithm
trademarked by Leo Breiman and
Adele Cutler, which combines the
output of multiple decision trees to
reach a single result.
● Its ease of use and flexibility have
fueled its adoption, as it handles
both classification and regression
problems.

Dept of Information Science & Engineering Prof. Samatha R S 10


Module – 4

Chapter 6.1 Introduction to Decision Tree Learning Model

● Gradient Boosting is a popular boosting


algorithm in machine learning used for
classification and regression tasks.
● Boosting is one kind of ensemble
Learning method which trains the
model sequentially and each new
model tries to correct the previous
model.
● It combines several weak learners into
strong learners.

Dept of Information Science & Engineering Prof. Samatha R S 11


Module – 4

Chapter 6.1 Introduction to Decision Tree Learning Model

Example: A decision tree is a flowchart-like structure in which:

● Each Internal Node represents a test on a feature (e.g., whether a coin flip comes up heads or tails).
● Each Leaf Node represents a class label (the decision taken after computing all features).
● Branches represent conjunctions of features that lead to those class labels.
● Paths from root to leaf represent classification rules.
Dept of Information Science & Engineering Prof. Samatha R S 12
Module – 4

Chapter 6.1 Introduction to Decision Tree Learning Model

Dept of Information Science & Engineering Prof. Samatha R S 13


Module – 4

Chapter 6.1 Introduction to Decision Tree Learning Model

● Input to the model: data instances or objects with a set of features or attributes, which can be discrete or continuous. The output of the model is a decision tree which predicts or classifies the target class for the test data object.

● In statistical terms, the input data are called Independent variables.

● The target feature or target class is called the Response variable.

Dept of Information Science & Engineering Prof. Samatha R S 14


Module – 4

Chapter 6.1 Introduction to Decision Tree Learning Model

Dept of Information Science & Engineering Prof. Samatha R S 15


Module – 4

Chapter 6.1 Introduction to Decision Tree Learning Model

● Given a training dataset X, the tree model computes a hypothesis function f(X) in the form of a decision tree.

● The Decision Tree Learning model generates a complete hypothesis space in the form of a tree structure from the given training dataset and allows us to search through the possible set of hypotheses, preferring smaller decision trees as we walk through the tree. This kind of search bias is called Preference Bias.

Dept of Information Science & Engineering Prof. Samatha R S 16


Module – 4

6.1.1 Structure of a Decision Tree


1. Root Node: Represents topmost node in the tree.
2. Splitting: A process of dividing a node into two or more
sub-nodes.
3. Decision Node: Internal nodes are the Test nodes also called
Decision Nodes.
4. Leaf/Terminal Node: Nodes that do not split further.
5. Pruning: Process of removing sub-nodes of a decision node.
6. Branch: A subtree/subsection of the entire tree.
7. Parent Node: A node divided into subnodes.
8. Child Node: A node derived from a parent node.
Dept of Information Science & Engineering Prof. Samatha R S 17
Module – 4

6.1.1 Structure of a Decision Tree


❖ Every path from root to a leaf node represents a logical rule that corresponds to
a conjunction of test attributes and the whole tree represents a disjunction of
these conjunctions.
❖ The decision tree model, in general represents a collection of logical rules of
classification in the form of a tree structure.
❖ Decision networks, otherwise called influence diagrams, have a directed graph structure with nodes and links.
❖ They are an extension of Bayesian Belief Networks.
Dept of Information Science & Engineering Prof. Samatha R S 18
Module – 4

6.1.1 Structure of a Decision Tree

• Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances.
• Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself to a disjunction of these conjunctions.

For example, the decision tree shown in the above figure corresponds to the expression

(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
Dept of Information Science & Engineering Prof. Samatha R S 19
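Each root-to-leaf path of this tree is one classification rule, so the expression above can be written directly as code. A minimal Python sketch (the function name and value strings are illustrative, following the PlayTennis example):

```python
def play_tennis(outlook, humidity, wind):
    """Return 'Yes' if the disjunctive expression above is satisfied, else 'No'."""
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"   # path 1: Sunny AND Normal humidity
    if outlook == "Overcast":
        return "Yes"                                      # path 2: Overcast
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"          # path 3: Rain AND Weak wind
    return "No"

print(play_tennis("Sunny", "Normal", "Strong"))  # Yes
```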
Module – 4

6.1.1 Structure of a Decision Tree


Nodes in a Decision Tree (figure): Root Node, Decision Node, Leaf Node.

Dept of Information Science & Engineering Prof. Samatha R S 20


Module – 4

6.1.1 Structure of a Decision Tree


Building the Tree

1. Goal: Construct a decision tree with the given training dataset.

2. Output: Decision Tree representing the complete hypothesis space.

Dept of Information Science & Engineering Prof. Samatha R S 21


Module – 4

6.1.1 Structure of a Decision Tree


Knowledge Inference or Classification
1. Goal: Given a test instance, infer the target class it belongs to.

2. Classification: Inferring the target class for the test instance or object is based on inductive inference on the constructed decision tree.

3. Output: Target label of the test instance.

Dept of Information Science & Engineering Prof. Samatha R S 22


Module – 4

6.1.1 Structure of a Decision Tree

Classification
• Decision trees classify instances by sorting them down the tree from the root to
some leaf node, which provides the classification of the instance.
• Each node in the tree specifies a test of some attribute of the instance, and each
branch descending from that node corresponds to one of the possible values for this
attribute.
• An instance is classified by starting at the root node of the tree, testing the attribute
specified by this node, then moving down the tree branch corresponding to the
value of the attribute in the given example. This process is then repeated for the
subtree rooted at the new node.

Dept of Information Science & Engineering Prof. Samatha R S 23


Module – 4

6.1.1 Structure of a Decision Tree


Advantages of Decision Tree
1. Easy to model and interpret.
2. Simple to understand.
3. The input and output attributes can be discrete or continuous predictor
variables.
4. Can model a high degree of nonlinearity in the relationship between the
target variables and the predictor variables.
5. Quick to train.
Dept of Information Science & Engineering Prof. Samatha R S 24
Module – 4

6.1.1 Structure of a Decision Tree


Disadvantages of Decision Trees
1. It is difficult to determine how deeply a decision tree can be grown or when to stop growing it.
2. If the training data has errors or missing attribute values, the decision tree constructed may become unstable or biased.
3. If the training data has continuous-valued attributes, handling them is computationally complex; they have to be discretized.
4. A complex decision tree may also overfit the training data.
5. Decision tree learning is not well suited for classifying multiple output classes.
6. Learning an optimal decision tree is known to be NP-complete.

Dept of Information Science & Engineering Prof. Samatha R S 25




Module – 4

6.1.1 Structure of a Decision Tree

FIGURE: A decision tree for the concept PlayTennis. An example is classified by sorting it through the tree to the appropriate leaf node, then returning the classification associated with this leaf.

Dept of Information Science & Engineering Prof. Samatha R S 27


Module – 4

6.1.1 Structure of a Decision Tree

An Example of Students in Coaching Class and Non-Coaching Classes

Dept of Information Science & Engineering Prof. Samatha R S 28


Module – 4

6.1.1 Structure of a Decision Tree

A decision tree is not always a binary tree.


It is a tree which can have more than two branches.

Dept of Information Science & Engineering Prof. Samatha R S 29


Module – 4

6.1.1 Structure of a Decision Tree


Example 6.1: How to draw a decision tree to predict a student’s academic performance based on the
given information such as class attendance, class assignments, home-work assignments, tests,
participation in competitions or other events, group activities such as projects and presentations, etc.
ATTRIBUTES VALUES

Class attendance Good, Average, Poor

Class assignments Good, Moderate, Poor

Home-work assignments Yes, No

Assessment Good, Moderate, Poor

Participation in competitions or other events Yes, No

Group activities such as projects and presentations Yes, No

Exam Results Pass, Fail

Dept of Information Science & Engineering Prof. Samatha R S 30


Module – 4

6.1.1 Structure of a Decision Tree


The target feature is the student performance in the final examination whether the student will ‘pass’
or ‘fail’ in the examination. The decision nodes are test nodes which check for conditions. The leaf
nodes represent the outcomes, that is, either ‘pass’ or ‘fail’.

Dept of Information Science & Engineering Prof. Samatha R S 31


Module – 4

6.1.1 Structure of a Decision Tree


Example 6.2: Predict student’s academic performance of whether he will pass or fail based on the given
information such as ‘Assessment’ and ‘Assignment’. Independent Variables are ‘Assessment’ and
‘Assignment’. Target Variable is ‘Exam Results’.

ATTRIBUTES VALUES
Assessment >=50, <50

Assignment Yes, No

Exam Results Pass, Fail

Dept of Information Science & Engineering Prof. Samatha R S 32


Module – 4

6.1.1 Structure of a Decision Tree


DECISION TREE PREDICTION: If an instance is given, such as a student has scored 42 marks in his
assessment and has submitted his assignment, then it is predicted with the decision tree that his
exam result is ‘Fail’

[Decision tree figure: root node 'Assessment' with branches '>=50' and '<50'; a decision node 'Assignment' with branches 'Yes' and 'No'; leaf nodes 'Pass' and 'Fail'.]

Dept of Information Science & Engineering Prof. Samatha R S 33


Module – 4

6.1.2 Fundamentals of Entropy
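For a dataset T whose data instances belong to k classes, with p_i denoting the proportion of instances in class i, the entropy used throughout this chapter is (a standard definition, written in the same notation as the worked examples that follow):

Entropy_Info(T) = -[p_1 * log2(p_1) + p_2 * log2(p_2) + ... + p_k * log2(p_k)]

Entropy is 0 when all instances belong to a single class (a pure, homogeneous node) and is maximum (1 for two equally frequent classes) when the classes are evenly mixed.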

General Algorithm for Decision Trees:

1. Find the best attribute from the training dataset using an attribute
selection measure and place it at the root of the tree.
2. Split the training dataset into subsets based on the outcomes of the
test attribute and each subset in a branch contains the data instances
or tuples with the same value for the selected test attribute.
3. Repeat step 1 and step 2 on each subset until we end up in leaf nodes
in all the branches of the tree.
4. This splitting process is recursive until the stopping criterion is reached.
Dept of Information Science & Engineering Prof. Samatha R S 39
Module – 4

6.1.2 Fundamentals of Entropy

Stopping Criteria:
The following are some of the common stopping conditions:

1. The data instances are homogeneous, which means all belong to the same class C, and hence the node's entropy is 0.
2. A node with fewer than some defined minimum number of data instances becomes a leaf (typically when the number of data instances in a node falls between 0.25% and 1.00% of the full training dataset).
3. The maximum tree depth is reached, so further splitting is not done and the node becomes a leaf node.
Dept of Information Science & Engineering Prof. Samatha R S 40
Module – 4

Chapter 6.2: Decision Tree Induction Algorithms

• Most algorithms that have been developed for learning decision trees are variations on a core algorithm that employs a top-down, greedy search through the space of possible decision trees.

• This approach is exemplified by the ID3 algorithm and its successor C4.5.

Dept of Information Science & Engineering Prof. Samatha R S 41


Module – 4

Chapter 6.2.1: ID3 Tree Construction

• ID3 stands for Iterative Dichotomiser 3


• ID3 is a precursor to the C4.5 Algorithm.
• The ID3 algorithm was invented by Ross Quinlan in 1975
• Used to generate a decision tree from a given data set by employing a top-down,
greedy search, to test each attribute at every node of the tree.
• The resulting tree is used to classify future samples.



∙ A ← the attribute from Attributes that best* classifies Examples
∙ The decision attribute for Root ← A
∙ For each possible value, vi, of A,
∙ Add a new tree branch below Root, corresponding to the test A = vi
∙ Let Examples vi, be the subset of Examples that have value vi for A
∙ If Examples vi , is empty
∙ Then below this new branch add a leaf node with label = most common
value of Target_attribute in Examples
∙ Else below this new branch add the subtree
ID3(Examples vi, Targe_tattribute, Attributes
– {A}))
∙ End
∙ Return Root
* The best attribute is the one with highest information gain
Dept of Information Science & Engineering Prof. Samatha R S 44
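To make the recursion concrete, below is a compact Python sketch of the ID3 procedure above, using information gain to pick the split attribute. It is a minimal illustration (the function and variable names are my own, not from the textbook), assuming each example is given as a dictionary of attribute values:

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the class distribution of 'target' over the examples."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(examples, attr, target):
    """Information gain obtained by splitting the examples on attribute 'attr'."""
    total = len(examples)
    weighted = 0.0
    for value in set(ex[attr] for ex in examples):
        subset = [ex for ex in examples if ex[attr] == value]
        weighted += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - weighted

def id3(examples, attributes, target):
    """Return a nested-dict decision tree: {attribute: {value: subtree-or-class-label}}."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:            # all examples belong to one class: leaf node
        return labels[0]
    if not attributes:                   # no attributes left: majority class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a, target))
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree
```

Running id3() on the Example 6.3 dataset below selects CGPA at the root, matching the hand computation; ties, such as the one between 'Practical Knowledge' and 'Communication Skills' in Iteration 2, are broken arbitrarily by max().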
Module – 4

Chapter 6.2.1: ID3 Tree Construction


• The central choice in the ID3 algorithm is selecting which attribute to test at each node in the tree.

• A statistical property called information gain measures how well a given attribute separates the training examples according to their target classification.

• ID3 uses the information gain measure to select among the candidate attributes at each step while growing the tree.
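In the notation of the worked example that follows, information gain is computed as the reduction in entropy produced by splitting the training dataset T on an attribute A with values v_1, ..., v_m (a standard formulation, consistent with the calculations below):

Entropy_Info(T, A) = (|T_1|/|T|) * Entropy_Info(T_1) + ... + (|T_m|/|T|) * Entropy_Info(T_m)

Gain(A) = Entropy_Info(T) - Entropy_Info(T, A)

where T_j is the subset of T for which A = v_j. The attribute with the maximum Gain (equivalently, the minimum weighted entropy) is chosen as the split attribute.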

Dept of Information Science & Engineering Prof. Samatha R S 45


Chapter 6.2.1: Entropy and Information Gain in Decision Tree Learning — information gain measures the expected reduction in entropy, and ID3 classifies by splitting on the attribute with the largest (descending) information gain at each step.
Module – 4

Chapter 6.2.1: ID3 Tree Construction

Example 6.3:

Assess a student’s performance during his course of study and


predict whether a student will get a job offer or not in his final year of the
course. The training dataset T consists of 10 data instances with attributes such
as ‘CGPA’, ‘Interactiveness’, ‘Practical Knowledge’ and ‘Communication Skills’.
The target class attribute is the ‘Job Offer’.

Dept of Information Science & Engineering Prof. Samatha R S 70


Module – 4
Chapter 6.2.1: ID3 Tree Construction
S. No CGPA Interactiveness Practical Knowledge Communication Skills Job Offer

1 >=9 Yes Very good Good Yes

2 >=8 No Good Moderate Yes

3 >=9 No Average Poor No

4 <8 No Average Good No

5 >=8 Yes Good Moderate Yes

6 >=9 Yes Good Moderate Yes

7 <8 Yes Good Poor No

8 >=9 No Very good Good Yes

9 >=8 Yes Good Good Yes

10 >=8 Yes Average Good Yes


Dept of Information Science & Engineering Prof. Samatha R S 71
Module – 4

Chapter 6.2.1: ID3 Tree Construction

Step 1:

Calculate the Entropy for the target class ‘Job Offer’

Entropy_Info(Target Attribute = Job Offer) = Entropy_Info(7,3)

=> -[7/10 * log2(7/10) + 3/10 * log2(3/10)]

=> -(-0.3599 + -0.5208) = 0.8807
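This value can be checked with a few lines of Python (a quick verification, not from the textbook):

```python
import math

def entropy(p_yes, p_no):
    return -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))

print(round(entropy(7/10, 3/10), 4))  # 0.8813 (the slides round the intermediate terms, giving 0.8807)
```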

Dept of Information Science & Engineering Prof. Samatha R S 72


Module – 4

Chapter 6.2.1: ID3 Tree Construction

Iteration 1:

Step 2:
Calculate the Entropy_Info and Gain (Information Gain) for each of the attributes in the training dataset.

Dept of Information Science & Engineering Prof. Samatha R S


Module – 4

Chapter 6.2.1: ID3 Tree Construction

Table below shows the number of data instances classified with Job
Offer as Yes or No for the attributes CGPA.
Table below shows Entropy Information for CGPA.

CGPA Job Offer = Yes Job Offer = No Total Entropy

>=9 3 1 4 0.8108

>=8 4 0 4 0

<8 0 2 2 0

Dept of Information Science & Engineering Prof. Samatha R S 74


Module – 4

Chapter 6.2.1: ID3 Tree Construction

Entropy_Info(T, CGPA) = (4/10) * (0.3111 + 0.4997) + 0 + 0

= 0.3243

Gain(CGPA) = 0.8807 - 0.3243


= 0.5564
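The same computation in Python, using the counts from the table above (a small check, with my own helper names):

```python
import math

def entropy_from_counts(yes, no):
    total = yes + no
    result = 0.0
    for c in (yes, no):
        if c:                       # 0 * log2(0) is taken as 0
            p = c / total
            result -= p * math.log2(p)
    return result

counts = {">=9": (3, 1), ">=8": (4, 0), "<8": (0, 2)}   # Job Offer = Yes / No per CGPA value
total = sum(y + n for y, n in counts.values())

entropy_T = entropy_from_counts(7, 3)                               # ≈ 0.881
entropy_T_cgpa = sum(((y + n) / total) * entropy_from_counts(y, n)
                     for y, n in counts.values())                   # ≈ 0.3245 (slides: 0.3243)
print(round(entropy_T - entropy_T_cgpa, 3))                         # Gain(CGPA) ≈ 0.557 (slides: 0.5564)
```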

Dept of Information Science & Engineering Prof. Samatha R S 75


Module – 4

Chapter 6.2.1: ID3 Tree Construction

Table below shows the number of data instances classified with Job
Offer as Yes or No for the attributes Interactiveness.
Table below shows Entropy Information for Interactiveness.

Interactiveness Job Offer = Yes Job Offer = No Total Entropy

YES 5 1 6 0.6497

NO 2 2 4 0.9994

Dept of Information Science & Engineering Prof. Samatha R S 76


Module – 4

Chapter 6.2.1: ID3 Tree Construction

Entropy_Info(T, Interactiveness)

= (6/10) * (0.2191 + 0.4306) + (4/10)(0.4997 + 0.4997)

= 0.3898 + 0.3998

= 0.7896

Gain(Interactiveness) = 0.8807 - 0.7896


= 0.0911
Dept of Information Science & Engineering Prof. Samatha R S 77
Module – 4

Chapter 6.2.1: ID3 Tree Construction

Table 6.6 shows the number of data instances classified with Job
Offer as Yes or No for the attributes Practical Knowledge.
Table below shows Entropy Information for Practical Knowledge.

Practical Knowledge Job Offer = Yes Job Offer = No Total Entropy

Very Good 2 0 2 0

Average 1 2 3 0.9177

Good 4 1 5 0.7215

Dept of Information Science & Engineering Prof. Samatha R S 78


Module – 4

Chapter 6.2.1: ID3 Tree Construction

Entropy_Info(T, Practical Knowledge)

= (2/10) * (0) + (3/10)(0.5280 + 0.3897) + 5/10 (0.2574 + 0.4641)

= 0 + 0.2753 + 0.3608

= 0.6361

Gain(Practical Knowledge) = 0.8807 - 0.6361

= 0.2446
Dept of Information Science & Engineering Prof. Samatha R S 79
Module – 4

Chapter 6.2.1: ID3 Tree Construction

Table below shows the number of data instances classified with Job
Offer as Yes or No for the attributes Communication Skills.
Table below shows Entropy Information for Communication Skills.

Communication Skills Job Offer = Yes Job Offer = No Total

Good 4 1 5

Moderate 3 0 3

Poor 0 2 2

Dept of Information Science & Engineering Prof. Samatha R S 80


Module – 4

Chapter 6.2.1: ID3 Tree Construction

Entropy_Info(T, Communication Skills)

= (5/10) * (0.2574 + 0.4641) + (3/10) * (0) + (2/10) * (0)

= 0.3609

Gain(Communication Skills) = 0.8807 - 0.36096


= 0.5203

Dept of Information Science & Engineering Prof. Samatha R S 81


Module – 4

Chapter 6.2.1: ID3 Tree Construction

The Gain calculated for all the attributes is shown in Table below

Attributes Gain

CGPA 0.5564

Interactiveness 0.0911

Practical Knowledge 0.2446

Communication Skills 0.5203

Dept of Information Science & Engineering Prof. Samatha R S 82


Module – 4

Chapter 6.2.1: ID3 Tree Construction


Step 3:
From Gain table, choose the attribute for which entropy is minimum and
therefore the gain is maximum as the best split attribute.

The best split attribute is CGPA since it has the maximum gain. So, we choose CGPA as the root node. There are 3 distinct values for CGPA with outcomes >=9, >=8 and <8.

The entropy value is 0 for >=8 and <8 with all instances classified as Job Offer =
Yes for >=8 and Job Offer = No for <8. Hence, both >=8 and <8 end up in a leaf
node.
Dept of Information Science & Engineering Prof. Samatha R S 83
Module – 4

Chapter 6.2.1: ID3 Tree Construction

Decision Tree after Iteration 1

CGPA is placed at the root with three branches:
● CGPA >=8 → leaf node: Job Offer = Yes
● CGPA <8 → leaf node: Job Offer = No
● CGPA >=9 → subtree still to be grown; this branch carries the 4 data instances with CGPA >=9 (instances 1, 3, 6 and 8), which are used in Iteration 2.
Dept of Information Science & Engineering Prof. Samatha R S 84
Module – 4

Chapter 6.2.1: ID3 Tree Construction

Continue the same process for the subset of data instances


branched with CGPA >=9.

In this iteration, the same process of computing the Entropy_Info


and Gain are repeated with the subset of training set. The subset
consists of 4 data instances.

Dept of Information Science & Engineering Prof. Samatha R S 85


Module – 4

Chapter 6.2.1: ID3 Tree Construction

Iteration 2:

Calculate the Entropy for the target class ‘Job Offer’

Entropy_Info(Target Attribute = Job Offer) = Entropy_Info(3,1)

=> -[3/4 * log2(3/4) + 1/4 * log2(1/4)]

=> -(-0.3111 + -0.4997) = 0.8108

Dept of Information Science & Engineering Prof. Samatha R S 86


Module – 4

Chapter 6.2.1: ID3 Tree Construction

Entropy_Info(T, Interactiveness)

= (2/4) * (0) + (2/4) * (0.4997 + 0.4997)

= 0.4997

Gain(Interactiveness) = 0.8108 - 0.4997

= 0.3111

Dept of Information Science & Engineering Prof. Samatha R S 87


Module – 4

Chapter 6.2.1: ID3 Tree Construction

Entropy_Info(T, Practical Knowledge) = 0

Gain(Practical Knowledge) = 0.8108

Entropy_Info(T, Communication Skills) = 0

Gain(Communication Skills) = 0.8108

Dept of Information Science & Engineering Prof. Samatha R S 88


Module – 4

Chapter 6.2.1: ID3 Tree Construction

The Gain calculated for all the attributes is shown in Table below

Attributes Gain

Interactiveness 0.3111

Practical Knowledge 0.8108

Communication Skills 0.8108

Both the attributes ‘Practical Knowledge’ and ‘Communication Skills’ have the
same Gain. So we can either construct the decision tree using ‘Practical
Knowledge’ or ‘Communication Skills’.

Dept of Information Science & Engineering Prof. Samatha R S 89


Module – 4

Chapter 6.2.1: ID3 Tree Construction

Final decision tree constructed with ID3 for Example 6.3:

CGPA (root node)
● CGPA >=8 → Job Offer = Yes
● CGPA <8 → Job Offer = No
● CGPA >=9 → Practical Knowledge (PK)
○ Very Good → Job Offer = Yes
○ Good → Job Offer = Yes
○ Average → Job Offer = No
Dept of Information Science & Engineering Prof. Samatha R S 90
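Reading the tree above as rules, a small illustrative Python function (names are mine, not from the textbook) classifies a new test instance:

```python
def job_offer(cgpa, practical_knowledge):
    """Classify a student using the ID3 tree built in Example 6.3."""
    if cgpa == ">=9":
        # branch grown in Iteration 2 using Practical Knowledge
        return "No" if practical_knowledge == "Average" else "Yes"
    if cgpa == ">=8":
        return "Yes"
    return "No"   # CGPA < 8

print(job_offer(">=9", "Good"))   # Yes
```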


6.2.2 C4.5 Construction
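C4.5 improves on ID3 by using the Gain Ratio in place of plain information gain, which reduces the bias of information gain towards attributes with many distinct values, and by supporting continuous-valued attributes through threshold-based splits. A standard formulation of the gain ratio, written in the same notation as above (background summary, not a transcription of the algorithm slides):

Split_Info(T, A) = -[ (|T_1|/|T|) * log2(|T_1|/|T|) + ... + (|T_m|/|T|) * log2(|T_m|/|T|) ]

Gain_Ratio(A) = Gain(A) / Split_Info(T, A)

where T_1, ..., T_m are the subsets of T produced by the m distinct values of attribute A; the attribute with the highest gain ratio is chosen as the split attribute.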


Chapter 6.2.3 Classification and Regression Trees Construction - CART

CART uses the Gini Index as the impurity measure and produces binary splits, evaluating the possible combinations (subsets) of attribute values to find the best split, as illustrated in the following example.


Module – 4
Chapter 6.2.3: Classification Tree Construction
S. No CGPA Interactiveness Practical Knowledge Communication Skills Job Offer

1 >=9 Yes Very good Good Yes

2 >=8 No Good Moderate Yes

3 >=9 No Average Poor No

4 <8 No Average Good No

5 >=8 Yes Good Moderate Yes

6 >=9 Yes Good Moderate Yes

7 <8 Yes Good Poor No

8 >=9 No Very good Good Yes

9 >=8 Yes Good Good Yes

10 >=8 Yes Average Good Yes


Dept of Information Science & Engineering Prof. Samatha R S 106
Module – 4
Chapter 6.2.3: Classification Tree Construction

CGPA Job Offer = YES Job Offer = NO

>=9 3 1

>=8 4 0

<8 0 2

Dept of Information Science & Engineering Prof. Samatha R S 107


Module – 4
Chapter 6.2.3: Classification Tree Construction
Combination Gini_Index

1 >=9, >=8 0.2194

2 <8 0

3 ((>=9,>=8), <8) 0.17552

4 (>=9, <8) 0.5

5 (>=8) 0

6 ((>=9,<8), >=8) 0.3

7 (>=8, <8) 0.445

8 >=9 0.375

9 ((>=8, <8), >=9) 0.417

Dept of Information Science & Engineering Prof. Samatha R S 108




Module – 4
Chapter 6.2.3: Classification Tree Construction

Combination Gini_Index Choose


Sub_Sets(Minimum Index)

1 ((>=9,>=8), <8) 0.17552

2 ((>=9,<8), >=8) 0.3

3 ((>=8, <8), >=9) 0.417

Dept of Information Science & Engineering Prof. Samatha R S 110


Module – 4
Chapter 6.2.3: Classification Tree Construction

ΔGini(CGPA) = Gini(T) - Gini(T, CGPA) = 0.42 - 0.1755 = 0.2445, using the best binary split ((>=9, >=8), <8).
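A quick Python check of these Gini values (helper names are mine):

```python
def gini(yes, no):
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

gini_T = gini(7, 3)                                   # 0.42 for the full dataset (7 Yes, 3 No)
# Best binary split on CGPA: {>=9, >=8} (7 Yes, 1 No) versus {<8} (0 Yes, 2 No)
gini_split = (8 / 10) * gini(7, 1) + (2 / 10) * gini(0, 2)
print(round(gini_T, 4), round(gini_split, 4), round(gini_T - gini_split, 4))
# 0.42 0.175 0.245  (the slides, with intermediate rounding, report 0.1755 and 0.2445)
```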

Dept of Information Science & Engineering Prof. Samatha R S 111


Module – 4
Chapter 6.2.3: Classification Tree Construction

Interactiveness Job_Offer = YES Job_Offer = NO

YES 5 1
NO 2 2

Dept of Information Science & Engineering Prof. Samatha R S 112


Module – 4
Chapter 6.2.3: Classification Tree Construction

Combination Gini_Index Choose Sub_Sets


1 Yes 0.28
2 No 0.5
3 Yes, No 0.368

Dept of Information Science & Engineering Prof. Samatha R S 113


Module – 4
Chapter 6.2.3: Classification Tree Construction

ΔGini(Interactiveness) = 0.42 - 0.368 = 0.052

Dept of Information Science & Engineering Prof. Samatha R S 114


Module – 4
Chapter 6.2.3: Classification Tree Construction

Practical Knowledge Job_Offer = YES Job_Offer = NO
Very Good 2 0
Good 4 1
Average 1 2

Dept of Information Science & Engineering Prof. Samatha R S 115


Module – 4
Chapter 6.2.3: Classification Tree Construction
Combination Gini_Index Choose Sub_Sets

1 Very Good, Good 0.2456

2 Average 0.445

3 ((Very Good, Good), Average) 0.3054

4 (Very Good, Average) 0.48

5 Good 0.32

6 ((Very Good, Average), Good) 0.40

7 (Good, Average) 0.4688

8 Very Good 0

9 ((Good, Average), Very Good) 0.3750

Dept of Information Science & Engineering Prof. Samatha R S 116


Module – 4
Chapter 6.2.3: Classification Tree Construction

ΔGini(Practical Knowledge) = 0.42 - 0.3054 = 0.1146. Among the attributes evaluated here, CGPA gives the largest ΔGini (0.2445), so CGPA is chosen as the best split attribute for the classification tree.

Dept of Information Science & Engineering Prof. Samatha R S 117


Module – 4

6.2.4 Regression Trees

Decision Tree Algorithm

● The core algorithm for building decision trees, ID3 by J. R. Quinlan, employs a top-down, greedy search through the space of possible branches with no backtracking.

● The ID3 algorithm can be used to construct a decision tree for regression by replacing Information Gain with Standard Deviation Reduction.

Dept of Information Science & Engineering Prof. Samatha R S 118


Module – 4

6.2.4 Regression Trees

● Regression trees are a variant of decision trees where the target


feature is a continuous valued variable.

● Regression trees can be constructed using an algorithm called


reduction in variance which uses standard deviation to choose the
best splitting attribute.

Dept of Information Science & Engineering Prof. Samatha R S 119


Module – 4

6.2.4 Regression Trees

Algorithm 6.5: Procedure for Constructing Regression Trees

1. Compute the standard deviation of the target attribute for the entire training dataset.

2. Compute the standard deviation of the target attribute for the subset of data instances with each distinct value of an attribute.

3. Compute the weighted standard deviation for each attribute.

4. Compute the standard deviation reduction for each attribute by subtracting its weighted standard deviation from the standard deviation of the entire dataset.

Dept of Information Science & Engineering Prof. Samatha R S 120


Module – 4

6.2.4 Regression Trees


Algorithm 6.5: Procedure for Constructing Regression Trees

5. Choose the attribute with the highest standard deviation reduction as the best split attribute.

6. The best split attribute is placed as the root node.

7. The root node is branched into subtrees, with each subtree as an outcome of the test condition of the root node attribute. Accordingly, the training dataset is also split into different subsets.

8. Recursively apply the same operation to each subset of the training set with the remaining attributes, until a leaf node is derived or no more training instances are available in the subset.
Dept of Information Science & Engineering Prof. Samatha R S 121
Module – 4
6.2.4 Regression Trees
S. No Assessment Assignment Project Result(%)

1 Good Yes Yes 95

2 Average Yes No 70

3 Good No Yes 75

4 Poor No No 45

5 Good Yes Yes 98

6 Average No Yes 80

7 Good No No 75

8 Poor Yes Yes 65

9 Average No No 58

10 Good Yes Yes 89

Dept of Information Science & Engineering Prof. Samatha R S 122




Module – 4
6.2.4 Regression Trees

Average = 75

Standard Deviation = 16.55
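These two numbers are the mean and the (sample) standard deviation of the Result(%) column over all 10 instances; a quick check in Python:

```python
import statistics

results = [95, 70, 75, 45, 98, 80, 75, 65, 58, 89]   # Result(%) column of the dataset
print(statistics.mean(results))             # 75
print(round(statistics.stdev(results), 2))  # 16.55 (sample standard deviation)
```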

Dept of Information Science & Engineering Prof. Samatha R S 124



Module – 4
6.2.4 Regression Trees

Assessment = GOOD

S. No Assessment Assignment Project Result(%)

1 Good Yes Yes 95

2 Good No Yes 75

3 Good Yes Yes 98

4 Good No No 75

5 Good Yes Yes 89

Dept of Information Science & Engineering Prof. Samatha R S 125


Module – 4
6.2.4 Regression Trees

Average = 86.4

Standard Deviation = 10.9

Dept of Information Science & Engineering Prof. Samatha R S 126


Module – 4
6.2.4 Regression Trees

Assessment = AVERAGE

S. No Assessment Assignment Project Result(%)

1 Average Yes No 70

2 Average No Yes 80

3 Average No No 58

Dept of Information Science & Engineering Prof. Samatha R S 127


Module – 4
6.2.4 Regression Trees

Average = 69.3

Standard Deviation = 11.01

Dept of Information Science & Engineering Prof. Samatha R S 128


Module – 4
6.2.4 Regression Trees

Assessment = POOR

S. No Assessment Assignment Project Result(%)

1 Poor No No 45

2 Poor Yes Yes 65

Dept of Information Science & Engineering Prof. Samatha R S 129


Module – 4
6.2.4 Regression Trees

Average = 55

Standard Deviation = 14.14

Dept of Information Science & Engineering Prof. Samatha R S 130


Module – 4
6.2.4 Regression Trees

Weighted Standard Deviation for Assessment = 11.58

Standard Deviation Reduction for Assessment = 16.55 - 11.58 = 4.97

Dept of Information Science & Engineering Prof. Samatha R S 131


Module – 4
6.2.4 Regression Trees

Standard Deviation for Assessment

Assessment Standard Deviation Data Instances


Good 10.9 5

Average 11.01 3

Poor 14.14 2

Dept of Information Science & Engineering Prof. Samatha R S 132




Module – 4

6.2.4 Regression Trees

Assignment = YES

S. No Assessment Assignment Project Result(%)

1 Good Yes Yes 95

2 Average Yes No 70

3 Good Yes Yes 98

4 Poor Yes Yes 65

5 Good Yes Yes 89

Dept of Information Science & Engineering Prof. Samatha R S 134


Module – 4
6.2.4 Regression Trees

Average = 83.4

Standard Deviation = 14.98

Dept of Information Science & Engineering Prof. Samatha R S 135


Module – 4
6.2.4 Regression Trees

Assignment = NO

S. No Assessment Assignment Project Result(%)

1 Good No Yes 75

2 Poor No No 45

3 Average No Yes 80

4 Good No No 75

5 Average No No 58

Dept of Information Science & Engineering Prof. Samatha R S 136


Module – 4
6.2.4 Regression Trees

Average = 66.6

Standard Deviation = 14.7

Dept of Information Science & Engineering Prof. Samatha R S 137


Module – 4
6.2.4 Regression Trees

Standard Deviation for Assignment

Assignment Standard Deviation Data Instances


Yes 14.98 5

No 14.7 5

Dept of Information Science & Engineering Prof. Samatha R S 138


Module – 4
6.2.4 Regression Trees

Weighted Standard Deviation for Assignment = 14.84

Standard Deviation Reduction for Assignment = 16.55 - 14.84 = 1.71

Dept of Information Science & Engineering Prof. Samatha R S 139




Module – 4
6.2.4 Regression Trees

Project = YES

S. No Assessment Assignment Project Result(%)

1 Good Yes Yes 95

2 Good No Yes 75

3 Good Yes Yes 98

4 Average No Yes 80

5 Poor Yes Yes 65

6 Good Yes Yes 89

Dept of Information Science & Engineering Prof. Samatha R S 141


Module – 4
6.2.4 Regression Trees

Average = 83.7

Standard Deviation = 12.6

Dept of Information Science & Engineering Prof. Samatha R S 142


Module – 4
6.2.4 Regression Trees

Project = NO

S. No Assessment Assignment Project Result(%)

1 Average Yes No 70

2 Poor No No 45

3 Good No No 75

4 Average No No 58

Dept of Information Science & Engineering Prof. Samatha R S 143


Module – 4
6.2.4 Regression Trees

Average = 62

Standard Deviation = 13.39

Dept of Information Science & Engineering Prof. Samatha R S 144


Module – 4
6.2.4 Regression Trees

Standard Deviation for Project

Project Standard Deviation Data Instances


Yes 12.6 6

No 13.39 4

Dept of Information Science & Engineering Prof. Samatha R S 145


Module – 4
6.2.4 Regression Trees

Weighted Standard Deviation for Project = 12.92

Standard Deviation Reduction for Project = 16.55 - 12.92 = 3.63

Dept of Information Science & Engineering Prof. Samatha R S 146


Module – 4
6.2.4 Regression Trees

Standard Deviation Reduction for each attribute in the training dataset.

Attributes Standard Deviation Reduction


Assessment 4.97

Assignment 1.71

Project 3.63

The attribute ‘Assessment’ has the maximum Standard Deviation


Reduction and hence it is chosen as the best splitting attribute.
Dept of Information Science & Engineering Prof. Samatha R S 147
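The whole table can be reproduced with a short script that computes the weighted standard deviation and the reduction for each attribute (a sketch with my own helper names, using the sample standard deviation as in the slides):

```python
import statistics

data = [
    ("Good", "Yes", "Yes", 95), ("Average", "Yes", "No", 70), ("Good", "No", "Yes", 75),
    ("Poor", "No", "No", 45),   ("Good", "Yes", "Yes", 98),   ("Average", "No", "Yes", 80),
    ("Good", "No", "No", 75),   ("Poor", "Yes", "Yes", 65),   ("Average", "No", "No", 58),
    ("Good", "Yes", "Yes", 89),
]
columns = {"Assessment": 0, "Assignment": 1, "Project": 2}
results = [row[3] for row in data]
sd_total = statistics.stdev(results)                      # ≈ 16.55

for name, col in columns.items():
    weighted = 0.0
    for value in set(row[col] for row in data):
        subset = [row[3] for row in data if row[col] == value]
        weighted += (len(subset) / len(data)) * statistics.stdev(subset)
    print(name, round(sd_total - weighted, 2))
# Prints Assessment 4.96, Assignment 1.72, Project 3.6; the slides round intermediate
# values and report 4.97, 1.71 and 3.63, leading to the same ranking of attributes.
```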
Module – 4
6.2.4 Regression Trees
The training dataset is split into subsets based on the attribute “Assessment” and
the process is continued until the entire tree is constructed.

[Tree figure: root node 'Assessment' with branches GOOD, AVERAGE and POOR.]

Dept of Information Science & Engineering Prof. Samatha R S 148


Module – 4

6.2.4 Regression Trees


A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing values for the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.

Dept of Information Science & Engineering Prof. Samatha R S 149


Thank You
Any Queries:
Mail - [email protected]
