Decision Tree Exercises
1. Represent the following Boolean functions as decision trees:
a) A ∧ ¬B
b) (A ∧ B) ∨ (C ∧ D)
2. Build the complete set of decision trees from the following data:
A1 A2 Y
0 1 0
1 0 1
0 0 0
1 1 1
3. Suppose that one intends to build decision trees using the ID3 algorithm,
based on the two datasets presented in the following tables:
A1 A2 A3 Y
0 1 1 1
1 0 1 1
0 0 0 0
1 1 1 0
Example A1 A2 Class
1 dark tall -
2 light tall +
3 light short -
4 dark short +
5 dark tall -
6 light tall +
a) For the first table, calculate the expected information (entropy) of the set
of examples and the information gain of attribute A2. Can A2 be the root of the
decision tree obtained by the ID3 algorithm?
b) Present the complete decision tree that the ID3 algorithm would produce for
the first table.
c) For the second table, calculate the expected information (entropy) of the set
of examples and the information gain of attribute A2. Can A2 be the root of the
decision tree obtained by the ID3 algorithm?
d) Present the complete decision tree that the ID3 algorithm would produce for
the second table.
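For reference, the quantities asked for in a) and c) are the standard ID3 ones; this is the textbook formulation, not something specific to this sheet. For a set S with class proportions p_1, ..., p_k:

```latex
H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i
\qquad
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

where S_v denotes the subset of S in which attribute A takes the value v.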
5. Present the complete decision tree that the ID3 algorithm would build from
the following data:
A1 A2 Y
0 0 1
0 0 0
0 0 1
0 1 0
0 1 0
0 1 1
1 0 1
1 0 0
1 0 1
1 1 1
1 1 1
1 1 1
6. Imagine that you want to use the ID3 algorithm to learn a function, and that
the following set of examples and counter-examples is presented:
A1 A2 A3 A4 A5
0 1 1 0 0
1 0 1 0 0
1 1 1 0 1
0 0 0 1 1
1 0 0 1 0
0 1 0 1 0
Present the complete decision tree that would be produced by the ID3 algorithm.
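Several of the exercises above (2, 3, 5, 6) ask for the tree that ID3 would produce. The following is a minimal sketch of the algorithm in Python, useful for checking answers by hand; the data representation (dicts of attribute → value, nested-dict trees) and all function names are my own choices, not part of the exercises.

```python
from collections import Counter
from math import log2

def entropy(examples, target):
    """Entropy of the class distribution of `target` over `examples`."""
    counts = Counter(ex[target] for ex in examples)
    n = len(examples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def gain(examples, attr, target):
    """Information gain of splitting `examples` on `attr`."""
    n = len(examples)
    rem = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex for ex in examples if ex[attr] == value]
        rem += len(subset) / n * entropy(subset, target)
    return entropy(examples, target) - rem

def id3(examples, attrs, target):
    """Return a nested-dict decision tree; leaves are class labels."""
    classes = {ex[target] for ex in examples}
    if len(classes) == 1:            # pure node -> leaf
        return classes.pop()
    if not attrs:                    # no attributes left -> majority-class leaf
        return Counter(ex[target] for ex in examples).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, a, target))
    return {(best, v): id3([ex for ex in examples if ex[best] == v],
                           attrs - {best}, target)
            for v in {ex[best] for ex in examples}}

# Example: the dataset of exercise 2.
data = [{"A1": 0, "A2": 1, "Y": 0}, {"A1": 1, "A2": 0, "Y": 1},
        {"A1": 0, "A2": 0, "Y": 0}, {"A1": 1, "A2": 1, "Y": 1}]
print(id3(data, {"A1", "A2"}, "Y"))   # Y depends only on A1 here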
8. Consider the problem of deciding whether or not to wait for a table at a
restaurant. The objective is to learn a definition for the goal WillWait,
expressed as a decision tree. The following attributes are available to
describe the example situations:
Alternate  Bar  Fri/Sat  Hungry  Patrons  Price  Raining  Reservation  Type     WaitEstimate  WillWait
Yes        No   No       Yes     Some     $$$    No       Yes          French   0-10          Yes
Yes        No   No       Yes     Full     $      No       No           Thai     30-60         No
No         Yes  No       No      Some     $      No       No           Burger   0-10          Yes
Yes        No   Yes      Yes     Full     $      No       No           Thai     10-30         Yes
Yes        No   Yes      No      Full     $$$    No       Yes          French   >60           No
No         Yes  No       Yes     Some     $$     Yes      Yes          Italian  0-10          Yes
No         Yes  No       No      None     $      Yes      No           Burger   0-10          No
No         No   No       Yes     Some     $$     Yes      Yes          Thai     0-10          Yes
No         Yes  Yes      No      Full     $      Yes      No           Burger   >60           No
Yes        Yes  Yes      Yes     Full     $$$    No       Yes          Italian  10-30         No
No         No   No       No      None     $      No       No           Thai     0-10          No
Yes        Yes  Yes      Yes     Full     $      No       No           Burger   30-60         Yes
a) Use the Weka tool, with the ID3 algorithm, to build the decision tree that
represents these data.
b) Assume that D1 and D2 are decision trees representing Boolean functions, and
that D2 is considered an elaboration of D1 if the ID3 algorithm can extend D1
into D2. Indicate whether the following sentence is true or false: if the tree
D2 is an elaboration of D1, then D1 is more general than D2. If you consider it
true, prove it; if you consider it false, present a counter-example.
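For a), the data must first be put into a format Weka can load. Below is a hypothetical ARFF encoding of the table above (the file name and attribute spellings are my own). Note that Weka's Id3 classifier handles only nominal attributes, which this encoding satisfies, and that in recent Weka releases Id3 is distributed via the optional simpleEducationalLearningSchemes package.

```
% restaurant.arff -- hypothetical encoding of the table above
@relation restaurant

@attribute Alternate    {Yes, No}
@attribute Bar          {Yes, No}
@attribute FriSat       {Yes, No}
@attribute Hungry       {Yes, No}
@attribute Patrons      {None, Some, Full}
@attribute Price        {$, $$, $$$}
@attribute Raining      {Yes, No}
@attribute Reservation  {Yes, No}
@attribute Type         {French, Thai, Burger, Italian}
@attribute WaitEstimate {0-10, 10-30, 30-60, >60}
@attribute WillWait     {Yes, No}

@data
Yes,No,No,Yes,Some,$$$,No,Yes,French,0-10,Yes
Yes,No,No,Yes,Full,$,No,No,Thai,30-60,No
% ... the remaining ten rows follow the table above in the same order
```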
9. The ID3 algorithm finds only one consistent hypothesis, while the
version-space algorithm (also called the candidate-elimination algorithm) finds
all consistent hypotheses. Consider the correspondence between these two
algorithms:
a) Present the results obtained by each of these two algorithms from the
following training examples, for the target concept doesSport:
Training examples:
  Sky   Temp.  Hum.    Forecast  doesSport
0 sun   hot    normal  same      yes
1 sun   hot    high    same      yes
2 rain  cold   high    change    no
3 sun   hot    high    change    yes
b) What is the relationship between the learned decision tree and the version
space obtained? Is the decision tree equivalent to one of the members of the
version space?
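To check a), the candidate-elimination side can be simulated with a short script. The sketch below handles only conjunctive hypotheses over nominal attributes, with '?' meaning "any value"; it keeps S as a single hypothesis (the unique least generalization, which suffices for this hypothesis space) and omits the pruning of non-maximal members of G. All names here are my own.

```python
def matches(h, x):
    """Hypothesis h covers example x if every slot is '?' or equal."""
    return all(hv in ("?", xv) for hv, xv in zip(h, x))

def candidate_elimination(examples, domains):
    n = len(domains)
    S = None                               # most specific hypothesis so far
    G = [tuple("?" for _ in range(n))]     # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = x if S is None else tuple(
                sv if sv == xv else "?" for sv, xv in zip(S, x))
        else:
            new_G = []
            for g in G:
                if not matches(g, x):      # already excludes the negative
                    new_G.append(g)
                    continue
                for i in range(n):         # minimal specializations of g
                    if g[i] != "?":
                        continue
                    for v in domains[i]:
                        consistent = S is None or S[i] in ("?", v)
                        if v != x[i] and consistent:
                            new_G.append(g[:i] + (v,) + g[i + 1:])
            G = new_G
    return S, G

# The training set of a) (attribute order: Sky, Temp., Hum., Forecast).
domains = [("sun", "rain"), ("hot", "cold"),
           ("normal", "high"), ("same", "change")]
examples = [(("sun", "hot", "normal", "same"), True),
            (("sun", "hot", "high", "same"), True),
            (("rain", "cold", "high", "change"), False),
            (("sun", "hot", "high", "change"), True)]
print(candidate_elimination(examples, domains))
```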
10. Use the data from Table 1 as a training set to learn to classify according
to the 3 classes indicated by the attribute 'Burn'.
a. Calculate the entropy (or expected information) of the training set.
Calculate the information gain of each of the attributes and indicate the
attribute that, according to this measure, would be chosen by the ID3 algorithm
as the root of the decision tree. Comment on the result obtained.
b. Now, removing the attribute 'Name' and continuing to use the
information-gain measure, determine the complete decision tree that ID3 would
produce.
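The only new ingredient relative to the earlier exercises is that 'Burn' has three classes, which changes nothing in the entropy formula; the helpers in the ID3 sketch after exercise 6 already handle it. For instance, with a purely hypothetical 4/3/3 class split over 10 examples:

```python
from math import log2

probs = [0.4, 0.3, 0.3]            # hypothetical 3-class proportions
H = -sum(p * log2(p) for p in probs)
print(round(H, 3))                 # 1.571 bits (log2(3) ~ 1.585 is the maximum)
```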
12. Imagine that we have the following dataset, where Y is the target
classification attribute.
A B C Y
0 0 0 0
0 1 0 1
1 0 0 1
1 1 1 0
What decision tree is chosen by the ID3 algorithm? Is there not another tree,
one that tests fewer attributes (and possibly has a smaller depth), capable of
correctly representing the data? What justification do you find for ID3 not
returning that tree?
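As a sanity check for the first question, the root gains can be computed mechanically; below is a small standalone script (variable names are my own):

```python
from collections import Counter
from math import log2

rows = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 0)]  # (A, B, C, Y)

def H(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def root_gain(i):
    g = H([r[3] for r in rows])
    for v in {r[i] for r in rows}:
        sub = [r[3] for r in rows if r[i] == v]
        g -= len(sub) / len(rows) * H(sub)
    return g

for i, name in enumerate("ABC"):
    print(name, round(root_gain(i), 3))   # A and B score 0.0; C scores 0.311
```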
13. Consider the data presented in the grid below, described by two attributes
(the x and y axes, with integer values between 0 and 8) and classified into 3
classes, represented by squares, crosses, or circles. Using the ID3 algorithm
extended to perform binary tests on numerical attributes, build the decision
tree that correctly classifies the data.
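The "extended" test the exercise refers to is the usual one for numeric attributes: binary splits of the form attribute < t, with t ranging over midpoints between consecutive observed values, scored by information gain. A sketch follows; the data points in it are hypothetical, since the grid figure is not reproduced here.

```python
from collections import Counter
from math import log2

def H(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_threshold(points, labels, axis):
    """Best binary test `point[axis] < t` by information gain."""
    values = sorted({p[axis] for p in points})
    best_gain, best_t = 0.0, None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2                 # midpoint candidate threshold
        left = [l for p, l in zip(points, labels) if p[axis] < t]
        right = [l for p, l in zip(points, labels) if p[axis] >= t]
        g = H(labels) - (len(left) / len(labels) * H(left)
                         + len(right) / len(labels) * H(right))
        if g > best_gain:
            best_gain, best_t = g, t
    return best_gain, best_t

points = [(1, 2), (2, 7), (6, 3), (7, 6)]         # hypothetical (x, y) data
labels = ["square", "cross", "cross", "circle"]   # hypothetical classes
print(best_threshold(points, labels, axis=0))     # (gain, threshold) on x
```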
14. The following data table is based on Tolkien's book 'The Lord of the
Rings'. The table presents information about a group of people/entities that
appear in the book.
Name        Race    Weight  Ring-lord  Class
Frodo       Hobbit  Light   Yes        Good
Rosie       Hobbit  Light   No         Good
Bilbo       Hobbit  Light   Yes        Good
Gollum      Hobbit  Light   Yes        Bad
Faramir     Human   Medium  No         Good
Aragorn     Human   Medium  No         Good
Wormtongue  Human   Medium  No         Bad
Celeborn    Elf     Light   No         Good
Galadriel   Elf     Light   Yes        Good
Sharku      Orc     Heavy   No         Bad
Each entity has three attributes (Race, Weight, Ring-lord) and is classified as
being Good or Bad (i.e., whether or not it wants to kill Frodo). The attribute
'Ring-lord' indicates whether the person/entity has ever possessed the magic
ring. Apply the ID3 algorithm to the data in the table, considering that leaf
nodes are classified with the majority class. In addition, introduce a small
variation in which a node is expanded only when doing so yields an improvement
in the information gain.
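One way to realize this variation, expressed as a modification of the ID3 sketch given after exercise 6 (whose gain helper is assumed to be in scope; the function name is my own):

```python
from collections import Counter

def id3_guarded(examples, attrs, target):
    """ID3 variant: expand a node only if the best split improves the gain."""
    majority = Counter(ex[target] for ex in examples).most_common(1)[0][0]
    if not attrs or len({ex[target] for ex in examples}) == 1:
        return majority
    best = max(attrs, key=lambda a: gain(examples, a, target))
    if gain(examples, best, target) <= 0.0:   # no improvement -> majority leaf
        return majority
    return {(best, v): id3_guarded([ex for ex in examples if ex[best] == v],
                                   attrs - {best}, target)
            for v in {ex[best] for ex in examples}}
```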
15. Consider the following dataset, where Y corresponds to the class attribute.
Let us consider ways of pruning the decision tree produced by ID3 that do not
involve the use of a test set.
V W X Y
0 0 0 0
0 1 0 1
1 0 0 1
1 1 0 0
1 1 1 0
a) Present the decision tree that would be constructed by ID3, without pruning.
b) One possible way to prune the tree is to start from the root node and prune
the subtree rooted at a node if the information gain (or another given
criterion) associated with that node is less than a small amount ε. This type
of pruning is called top-down pruning. What is the decision tree returned by
applying this type of pruning with ε = 0.0001? What is the error produced by
this pruned tree on the given training set (% of misclassified examples)?
c) Another possible way to prune the tree is to start from the parents of the
tree's leaves and prune the subtree rooted at a node if the information gain
(or another given criterion) is less than a small amount ε. Under this method,
no ancestor of a node with high information gain is pruned. This type of
pruning is called bottom-up pruning. What is the decision tree returned by
applying this type of pruning with ε = 0.0001? What is the error produced by
this pruned tree on the given training set (% of misclassified examples)?
Discuss the possible advantages and disadvantages of these two types of
pruning, taking into account, for example, the computational complexity
involved and the classification accuracy.
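A sketch of the bottom-up pruning of c), using a deliberately simple tree encoding of my own: internal nodes are tuples (attribute, gain_at_node, majority_class, children) and leaves are plain class labels.

```python
def prune_bottom_up(node, eps=1e-4):
    """Prune low-gain subtrees leaves-upward, so that no ancestor of a
    high-gain node is ever pruned."""
    if not isinstance(node, tuple):                 # leaf: nothing to do
        return node
    attr, g, majority, children = node
    children = {v: prune_bottom_up(c, eps) for v, c in children.items()}
    kept_test_below = any(isinstance(c, tuple) for c in children.values())
    if g < eps and not kept_test_below:             # collapse to majority leaf
        return majority
    return (attr, g, majority, children)

# Hypothetical tree: a near-zero-gain root over a high-gain child test.
tree = ("V", 5e-5, 0, {0: ("W", 0.92, 0, {0: 0, 1: 1}), 1: 1})
print(prune_bottom_up(tree))   # the root survives because its child is kept
```

Under the top-down scheme of b), the same near-zero-gain root would be pruned immediately, taking the high-gain 'W' subtree with it, which illustrates the trade-off the final question asks about.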