Decision Trees
Overview
◼ What is a Decision Tree
◼ Sample Decision Trees
◼ How to Construct a Decision Tree
◼ Problems with Decision Trees
What is a Decision Tree?
◼ An inductive learning task
Use particular facts to make more generalized
conclusions
◼ A predictive model based on a branching
series of Boolean tests
These smaller Boolean tests are less complex
than a one-stage classifier
◼ Let’s look at a sample decision tree…
Predicting Commute Time
Leave At If we leave at
10 AM 9 AM 10 AM and
8 AM
there are no
Stall? Accident?
cars stalled on
No Yes Long No Yes
the road, what
will our
Short Long Medium Long commute time
be?
Inductive Learning
◼ In this decision tree, we made a series of
Boolean decisions and followed the
corresponding branch
Did we leave at 10 AM?
Did a car stall on the road?
Is there an accident on the road?
◼ By answering each of these yes/no
questions, we then came to a conclusion on
how long our commute might take
Decision Trees as Rules
◼ We did not have represent this tree
graphically
◼ We could have represented as a set of
rules. However, this may be much
harder to read…
Decision Tree as a Rule Set
if hour == 8am ◼ Notice that all attributes to
commute time = long not have to be used in each
else if hour == 9am path of the decision.
if accident == yes
commute time = long ◼ As we will see, all attributes
may not even appear in the
else tree.
commute time =
medium
else if hour == 10am
if stall == yes
commute time = long
else
commute time = short
How to Create a Decision Tree
◼ We first make a list of attributes that
we can measure
These attributes (for now) must be
discrete
◼ We then choose a target attribute that
we want to predict
◼ Then create an experience table that
lists what we have seen in the past
Sample Experience Table
Example Attributes Target
Hour Weather Accident Stall Commute
D1 8 AM Sunny No No Long
D2 8 AM Cloudy No Yes Long
D3 10 AM Sunny No No Short
D4 9 AM Rainy Yes No Long
D5 9 AM Sunny Yes Yes Long
D6 10 AM Sunny No No Short
D7 10 AM Cloudy No No Short
D8 9 AM Rainy No No Medium
D9 9 AM Sunny Yes No Long
D10 10 AM Cloudy Yes Yes Long
D11 10 AM Rainy No No Short
D12 8 AM Cloudy Yes No Long
D13 9 AM Sunny No No Medium
Choosing Attributes
◼ The previous experience decision
table showed 4 attributes: hour,
weather, accident and stall
◼ But the decision tree only showed 3
attributes: hour, accident and stall
◼ Why is that?
Choosing Attributes
◼ Methods for selecting attributes show
that weather is not a discriminating
attribute
Choosing Attributes
◼ The basic structure of creating a
decision tree is the same for most
decision tree algorithms
◼ The difference lies in how we select
the attributes for the tree
Decision Tree Algorithms
◼ The basic idea behind any decision tree
algorithm is as follows:
Choose the best attribute(s) to split the
remaining instances and make that attribute a
decision node
Repeat this process for recursively for each child
Stop when:
◼ All the instances have the same target attribute value
◼ There are no more attributes
◼ There are no more instances
Identifying the Best Attributes
◼ Refer back to our original decision tree
Leave At
10 AM 9 AM
8 AM
Stall? Accident?
Long
No Yes No Yes
Short Long Medium Long
◼ How did we know to split on leave at
and then on stall and accident and not
weather?
Problems with Decision Trees
◼ While decision trees classify quickly,
the time for building a tree may be
higher than another type of classifier
◼ Decision trees suffer from a problem of
errors propagating throughout a tree
A very serious problem as the number of
classes increases
Error Propagation
◼ Since decision trees work by a series
of local decisions, what happens when
one of these local decisions is wrong?
Every decision from that point on may be
wrong
We may never return to the correct path
of the tree
Rotational Coolent(Yes Surface
Exp No Speed /No) Treatment(Yes/No) Surface Finish
1 High Yes Yes Good
2 High No Yes Moderate
3 Slow Yes Yes Good
4 Medium Yes No Good
5 Medium No Yes No Change
6 Medium Yes No Good
7 Slow No No Good
8 Slow No Yes Minimal Effect
9 Slow Yes Yes Good
10 High No Yes Moderate
GINI
There are four possible output variables, Good, Moderate, No
Change and Minimal
There are six instances of Good, There are two instances of
Moderate, One Instance of No Change and One Instance of
Minimal Effect
GINI=1-{(6/10)2+(2/10)2+(1/10)2+(1/10)2} = 0.58
Computation of GINI Index for Surface Treatment
It has two possible values, either Yes or No
There are seven instances of Yes and three instances of NO
For surface treatment = No, there are three instances of Good
Surface Finish
GINI=1-{(3/3)2} = 0
For surface treatment = Yes, there are two instances of Moderate
Surface Finish, three instances of Good Surface Finish, one instance
with No Effect and One instance with Minimal surface finish
GINI=1-{(2/7)2+(3/7)2+(1/7)2+(1/7)2} = 0.694
Weighted Average of Surface Treatment=0*(3/10)+0.694*(7/10)=0.486
Rotational Coolent(Yes Surface
Exp No Speed /No) Treatment(Yes/No) Surface Finish
1 High Yes Yes Good
2 High No Yes Moderate
3 Slow Yes Yes Good
4 Medium Yes No Good
5 Medium No Yes No Change
6 Medium Yes No Good
7 Slow No No Good
8 Slow No Yes Minimal Effect
9 Slow Yes Yes Good
10 High No Yes Moderate
Computation of GINI Index for Coolant
It has two possible values, either Yes or No
Both Yes and No occur 5 times
For coolent = Yes, there are 5 instances and all of them have
produced good surface finish
GINI=1-{(5/5)2} = 0
For Coolent = No, there are two instances which provides Moderate
surface finish, one instance which provides Good surface finish, one
provides Minimal surface finish and another provides No Change in the
surface finish
Gini(Coolent=No)=1-[(2/5)2+(1/5)2+(1/5)2+(1/5)2]=0.72
Weighted Average of Surface Treatment= 0*(5/10)+0.72*(5/10)=0.36
Rotational Coolent(Yes Surface
Exp No Speed /No) Treatment(Yes/No) Surface Finish
1 High Yes Yes Good
2 High No Yes Moderate
3 Slow Yes Yes Good
4 Medium Yes No Good
5 Medium No Yes No Change
6 Medium Yes No Good
7 Slow No No Good
8 Slow No Yes Minimal Effect
9 Slow Yes Yes Good
10 High No Yes Moderate
Computation of GINI Index for Rotational Speed
It has three possible values, three instances of High,
three instances of Medium, and four instances of Slow
For Rotational Speed=High, there are one instance of Good Surface
Finish and Two instances of Moderate Surface Finish
Gini(Rotational Speed=High)=1-[(2/3)2+(1/3)2]=0.444
For Rotational Speed=Medium, there are two instances of Good Surface
Finish and one instance of No Change Surface Finish
Gini(Rotational Speed=Medium)=1-[(2/3)2+(1/3)2]=0.444
For Rotational Speed=Slow, there are three instances of Good Surface
Finish and One instance of Minimal Effect Surface Finish
Gini(Rotational Speed=Slow)=1-[(3/4)2+(1/4)2]=0.375
Weighted Average of Surface Treatment=
0.444*(3/10)+0.444*(3/10)+0.375*(4/10)=0.416
Computation of GINI Index
For Coolent, GINI = 0.36
For Rotational Speed, GINI = 0.416
For Surface Treatment, GINI=0.486
Coolant is having the minimum GINI index. So it is taken as
the root of the Tree
Coolant
YES
Surface
Exp Rotational Coolant Treatment(Yes/N Surface
No Speed (Yes/No) o) Finish NO
1 High Yes Yes Good
3 Slow Yes Yes Good
4 Medium Yes No Good
6 Medium Yes No Good
9 Slow Yes Yes Good
For Coolant = Yes, the surface finish is good for all the
conditions. So, no more decision is needed Surface
Coolent(Ye Treatment(
Exp No Rotational Speed s/No) Yes/No) Surface Finish
2 High No Yes Moderate
5 Medium No Yes No Change
7 Slow No No Good
8 Slow No Yes Minimal Effect
10 High No Yes Moderate
Computation of GINI Index for Coolant = No & Rotational Speeds
For Coolant = No three are three possible Rotational Speed Conditions
For Coolent = No and Rotational Speed =High, there are two
instances of surfacefinish viz Moderate
Gini(Coolent= No and Rotational Speed=High)=1-[(2/2)2]=0
For Coolant = No and Rotational Speed =Medium, there is only one
surfacefinish outcome viz No Change
Gini(Coolent= No and Rotational Speed=Medium)=1-[(1/1)2]=0
For Coolant = No and Rotational Speed =Slow, there is only one surface
finish outcome of Good and One Outcome equal to Minimal Effect
Gini(Coolant=No & Rotational Speed=Slow)=1-[(1/2)2+(1/2)2]=0.5
Weighted Average of Coolent and Rotational Speed= 0*(2/5)+0*(1/5)+0.5*(2/5)=0.2
Computation of GINI Index for Coolant = No & Surface Treatment
For Coolant = No three are two possible Surface Treatment Conditions,
Yes and No
For Coolent = No and Surface Treatment =Yes, there are two instances of
surfacefinish viz Moderate, one instance of Surface Finish=Minimal and
another instance of No Change
Gini(Coolent=No and Surface Treatment = Yes)=1-
[(2/4)2+(1/4)2+(1/4)2]=0.625
Computation of GINI Index for Coolant = No & Surface Treatment
For Coolant = No and Surface Treatment =No, there is only one possible
outcome ie Surface Finish=Good
Gini(Coolent=No and Surface Treatment = No)=1-[(1/1)2]=0
Weighted Average of Coolant and Surface Treatment=
0*(1/5)+0.625*(4/5)=0.5
For Coolants = No and Rotational Speed, GINI = 0.2
For Coolants = No and Surface Treatment, GINI = 0.5
Rotational Speed is Chosen as the root as it has the lowest GINI index
For Cholent = No and Rotational Speed = High the
outcome is same(Moderate)
Surface
Rotational Treatment Surface
Exp No Speed Coolant(Yes/No) (Yes/No) Finish
2 High No Yes Moderate
10 High No Yes Moderate
For Cholent = No and Rotational Speed = Medium the
outcome is only one(No Change)
Surface
Rotational Coolant Treatment Surface
Exp No Speed (Yes/No) (Yes/No) Finish
5 Medium No Yes No Change
For Coolant = No and Rotational Speed = Slow the
outcome is two one being Good and another Minimal
Effect
Surface
Rotational Coolant Treatment Surface
Exp No Speed (Yes/No) (Yes/No) Finish
7 Slow No No Good
Minimal
8 Slow No Yes Effect
Decision Tree
Coolant
Yes No
Rotational Speed
Surface Finish=Good
RS=Slow RS=High
RS=Medium
Surface Surface
Treatment No Change Finish=Moderate
Yes
No
Minimal Effect
Good