A decision tree is a machine-learning model used for classification and regression tasks.
It
works by splitting the dataset into subsets based on the values of the input features, creating
a tree-like structure. The following numerical examples show how such a tree is built for classification problems.
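Before walking through the numbers, it helps to see what a finished tree looks like as code. The sketch below is purely illustrative (a hand-written tree, not one learned from the datasets that follow); the threshold and feature names are assumptions made only for this illustration:

```python
def predict(age: int, income: str) -> str:
    """A hand-written, illustrative decision tree: each `if` is a split node,
    each returned label is a leaf."""
    if age <= 30:                                   # root split on age
        return "Yes" if income == "High" else "No"  # leaves under the left branch
    return "No"                                     # leaf under the right branch
```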
Example 1
Imagine a dataset where we want to predict whether a person will buy a sports car (`Yes` or
`No`) based on age and income level. Here's a small dataset:
| Person | Age | Income | Buys Sports Car |
|--------|-----|--------|-----------------|
| 1 | 25 | High | Yes |
| 2 | 35 | Medium | No |
| 3 | 45 | Low | No |
| 4 | 25 | Medium | Yes |
| 5 | 35 | High | Yes |
| 6 | 45 | Medium | No |
| 7 | 35 | Low | No |
| 8 | 25 | Low | No |
| 9 | 45 | High | No |
Solution:
Step 1: Calculate Entropy of the Target Variable
The first step in building a decision tree is to calculate the entropy of the target variable (`Buys_Sports_Car`). The dataset contains 3 `Yes` and 6 `No` examples, so:

Entropy(S) = −((3/9) log2(3/9) + (6/9) log2(6/9)) = 0.918
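A quick way to check this value is a small helper that implements the entropy formula directly; this is a minimal sketch using only the Python standard library:

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of counts, e.g. [3, 6]."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([3, 6]))  # 3 Yes / 6 No -> ~0.918
```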
Step 2: Calculate Information Gain for Each Feature
Information Gain is calculated for each feature to decide the best attribute for splitting.
For the feature `Age`:
1. Age = 25: 3 samples out of 9 (2 Yes, 1 No)
2. Age = 35: 3 samples out of 9 (1 Yes, 2 No)
3. Age = 45: 3 samples out of 9 (0 Yes, 3 No)
The entropy for each group is calculated as follows:
Entropy(Age = 25) = −((2/3) log2(2/3) + (1/3) log2(1/3)) = 0.918
Entropy(Age = 35) = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.918
Entropy(Age = 45) = −((0/3) log2(0/3) + (3/3) log2(3/3)) = 0

Weighted Entropy = (3/9) × 0.918 + (3/9) × 0.918 + (3/9) × 0 = 0.612
Information Gain for Age:
Gain(S, Age) = Entropy(S) − Weighted Entropy = 0.918 − 0.612 = 0.306
Similarly, we calculate for `Income`:
1. Income = High: 3 samples (2 Yes, 1 No)
2. Income = Medium: 3 samples (1 Yes, 2 No)
3. Income = Low: 3 samples (0 Yes, 3 No)
The entropy for each group is calculated as follows:
Entropy(Income = High) = −((2/3) log2(2/3) + (1/3) log2(1/3)) = 0.918
Entropy(Income = Medium) = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.918
Entropy(Income = Low) = −((0/3) log2(0/3) + (3/3) log2(3/3)) = 0

Weighted Entropy = (3/9) × 0.918 + (3/9) × 0.918 + (3/9) × 0 = 0.612
Information Gain for Income:
Gain(S, Income) = Entropy(S) − Weighted Entropy = 0.918 − 0.612 = 0.306
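The same arithmetic can be scripted so that the gain of every feature is computed in one pass. The sketch below redoes Example 1 with pandas; the DataFrame column names (`Age`, `Income`, `Buys`) are chosen here purely for convenience:

```python
from math import log2
import pandas as pd

def entropy(series):
    """Entropy of a categorical pandas Series."""
    probs = series.value_counts(normalize=True)
    return -sum(p * log2(p) for p in probs if p > 0)

def information_gain(df, feature, target):
    """Gain(S, feature) = Entropy(S) - weighted entropy of the subsets."""
    weighted = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(feature)
    )
    return entropy(df[target]) - weighted

data = pd.DataFrame({
    "Age":    [25, 35, 45, 25, 35, 45, 35, 25, 45],
    "Income": ["High", "Medium", "Low", "Medium", "High", "Medium", "Low", "Low", "High"],
    "Buys":   ["Yes", "No", "No", "Yes", "Yes", "No", "No", "No", "No"],
})

for feature in ("Age", "Income"):
    print(feature, round(information_gain(data, feature, "Buys"), 3))  # both ~0.306
```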
Step 3: Choose the Best Feature to Split
The feature with the highest information gain is chosen to split the node. In this case, `Age` and `Income` have the same information gain (0.306), so the decision tree can split on either `Age` or `Income`.
Step 4: Repeat the Process
Repeat the above steps for each subset of the dataset created by the `Age` split until
stopping criteria are met (e.g., maximum depth, minimum number of samples per node, or
no further information gain).
Conclusion
This numerical example illustrates how a decision tree algorithm selects the best features for
splitting by calculating entropy and information gain, leading to a tree that can be used for
predicting new instances.
Example 2
This example builds a decision tree on a small dataset for a binary classification problem: predicting whether a person will play tennis based on the weather conditions.
Dataset
| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| 1 | Sunny | Hot | High | Weak | No |
| 2 | Sunny | Hot | High | Strong | No |
| 3 | Overcast | Hot | High | Weak | Yes |
| 4 | Rain | Mild | High | Weak | Yes |
| 5 | Rain | Cool | Normal | Weak | Yes |
| 6 | Rain | Cool | Normal | Strong | No |
| 7 | Overcast | Cool | Normal | Strong | Yes |
| 8 | Sunny | Mild | High | Weak | No |
| 9 | Sunny | Cool | Normal | Weak | Yes |
| 10 | Rain | Mild | Normal | Weak | Yes |
| 11 | Sunny | Mild | Normal | Strong | Yes |
| 12 | Overcast | Mild | High | Strong | Yes |
| 13 | Overcast | Hot | Normal | Weak | Yes |
| 14 | Rain | Mild | High | Strong | No |
Step 1: Calculate the Entropy of the Entire Dataset
The dataset contains 9 `Yes` and 5 `No` examples of `PlayTennis`, so:

Entropy(S) = −((9/14) log2(9/14) + (5/14) log2(5/14)) = 0.940
Step 2: Calculate the Entropy for Each Feature
We’ll calculate the entropy for the feature `Outlook` first.
For `Outlook`:
Sunny: 5 samples (2 Yes, 3 No)
Overcast: 4 samples (4 Yes, 0 No)
Rain: 5 samples (3 Yes, 2 No)
Entropy for each subset:

Entropy(Sunny) = −((2/5) log2(2/5) + (3/5) log2(3/5)) = 0.971
Entropy(Overcast) = −((4/4) log2(4/4) + (0/4) log2(0/4)) = 0
Entropy(Rain) = −((3/5) log2(3/5) + (2/5) log2(2/5)) = 0.971

Weighted Entropy = (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 = 0.6935
Information Gain for `Outlook`:
Information Gain (S, Outlook) = Entropy (S) − Weighted Entropy = 0.9409 − 0.6935 =
0.2474
Step 3: Calculate the Information Gain for Other Features
Following the same process, calculate the information gain for `Temperature`, `Humidity`,
and `Wind`.
Information Gain for `Temperature`:
- Hot: 4 samples (2 Yes, 2 No)
- Mild: 6 samples (4 Yes, 2 No)
- Cool: 4 samples (3 Yes, 1 No)
The entropy for each group is calculated as follows:
Entropy(Hot) = −((2/4) log2(2/4) + (2/4) log2(2/4)) = 1
Entropy(Mild) = −((4/6) log2(4/6) + (2/6) log2(2/6)) = 0.918
Entropy(Cool) = −((3/4) log2(3/4) + (1/4) log2(1/4)) = 0.811

Weighted entropy and information gain for `Temperature`:
Weighted Entropy = (4/14) × 1 + (6/14) × 0.918 + (4/14) × 0.811 = 0.910
Information Gain (S, Temperature) = 0.9409 - 0.910 = 0.030
Information Gain for `Humidity`:
High: 7 samples (3 Yes, 4 No)
Normal: 7 samples (6 Yes, 1 No)
The entropy for each group is calculated as follows:
Entropy(High) = −((3/7) log2(3/7) + (4/7) log2(4/7)) = 0.985
Entropy(Normal) = −((6/7) log2(6/7) + (1/7) log2(1/7)) = 0.591

Weighted entropy and information gain for `Humidity`:
Weighted Entropy = (7/14) × 0.985 + (7/14) × 0.591 = 0.788
Information Gain (S, Humidity) = 0.9409 − 0.788 = 0.1529
Information Gain for `Wind`:
Weak: 8 samples (6 Yes, 2 No)
Strong: 6 samples (3 Yes, 3 No)
The entropy for each group is calculated as follows:
Entropy(Weak) = −((6/8) log2(6/8) + (2/8) log2(2/8)) = 0.811
Entropy(Strong) = −((3/6) log2(3/6) + (3/6) log2(3/6)) = 1

Weighted entropy and information gain for `Wind`:
Weighted Entropy = (8/14) × 0.811 + (6/14) × 1 = 0.892
Information Gain (S, Wind) = 0.9409 − 0.892 = 0.0489
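All four gains can be reproduced with the same `entropy`/`information_gain` helpers sketched in Example 1 (repeated here so the snippet is self-contained); the column names are chosen for convenience:

```python
from math import log2
import pandas as pd

def entropy(series):
    probs = series.value_counts(normalize=True)
    return -sum(p * log2(p) for p in probs if p > 0)

def information_gain(df, feature, target):
    weighted = sum((len(g) / len(df)) * entropy(g[target]) for _, g in df.groupby(feature))
    return entropy(df[target]) - weighted

tennis = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                    "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                    "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity":    ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                    "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind":        ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                    "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "PlayTennis":  ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

for feature in ("Outlook", "Temperature", "Humidity", "Wind"):
    print(feature, round(information_gain(tennis, feature, "PlayTennis"), 4))
# Outlook ~0.247, Temperature ~0.029, Humidity ~0.152, Wind ~0.048
# (small differences from the figures above come from rounding intermediate entropies)
```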
Step 4: Choose the Best Feature to Split
Based on the information gain calculations:
Outlook: 0.2474
Temperature: 0.03
Humidity: 0.1529
Wind: 0.0489
The best feature to split on is `Outlook` because it has the highest information gain.
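For comparison, a library implementation arrives at the same root feature. This is a minimal sketch assuming scikit-learn is available; the categorical columns are one-hot encoded because `DecisionTreeClassifier` expects numeric inputs:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
tennis = pd.DataFrame(rows, columns=["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"])

# One-hot encode the categorical features; the tree then splits on 0/1 indicator columns.
X = pd.get_dummies(tennis.drop(columns="PlayTennis"))
y = tennis["PlayTennis"]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))  # the top split is an Outlook indicator
```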
Example 3
Determine the root attribute used to construct a decision tree for a sample dataset using
the ID3 Decision Tree algorithm. Consider the measure of impurity as entropy.
| Weather | Temperature | Humidity | Wind | Play |
|---------|-------------|----------|--------|------|
| Sunny | Hot | High | Weak | No |
| Sunny | Hot | High | Strong | No |
| Rainy | Cold | High | Weak | Yes |
| Rainy | Cold | Low | Strong | Yes |
| Sunny | Cold | Low | Weak | Yes |
Step 1: Calculate Entropy for Target Attribute (Play)
P(Yes) = 3/5 = 0.6
P(No) = 2/5 = 0.4
Entropy(S) = −((3/5) log2(3/5) + (2/5) log2(2/5)) = 0.97095
Step 2: Calculate the Entropy for Each Feature
| Feature | Subset (samples) | Entropy |
|---------|------------------|---------|
| Weather | Sunny: 3 samples (1 Yes, 2 No) | −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.91829 |
| | Rainy: 2 samples (2 Yes, 0 No) | −((2/2) log2(2/2) + (0/2) log2(0/2)) = 0 |
| Temp | Hot: 2 samples (0 Yes, 2 No) | −((0/2) log2(0/2) + (2/2) log2(2/2)) = 0 |
| | Cold: 3 samples (3 Yes, 0 No) | −((3/3) log2(3/3) + (0/3) log2(0/3)) = 0 |
| Humidity | High: 3 samples (1 Yes, 2 No) | −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.91829 |
| | Low: 2 samples (2 Yes, 0 No) | −((2/2) log2(2/2) + (0/2) log2(0/2)) = 0 |
| Wind | Weak: 3 samples (2 Yes, 1 No) | −((2/3) log2(2/3) + (1/3) log2(1/3)) = 0.91829 |
| | Strong: 2 samples (1 Yes, 1 No) | −((1/2) log2(1/2) + (1/2) log2(1/2)) = 1 |
Weighted Entropy of Weather = (3/5) × 0.91829 + (2/5) × 0 = 0.550974
Information Gain (S, Weather) = 0.97095 − 0.550974 = 0.419976

Weighted Entropy of Temp = (2/5) × 0 + (3/5) × 0 = 0
Information Gain (S, Temp) = 0.97095 − 0 = 0.97095

Weighted Entropy of Humidity = (3/5) × 0.91829 + (2/5) × 0 = 0.550974
Information Gain (S, Humidity) = 0.97095 − 0.550974 = 0.419976

Weighted Entropy of Wind = (3/5) × 0.91829 + (2/5) × 1 = 0.950974
Information Gain (S, Wind) = 0.97095 − 0.950974 = 0.019976
Choose the Best Feature to Split
Based on the information gain calculations:
Temperature: 0.97095
Humidity: 0.419976
Weather: 0.419976
Wind: 0.019976
The best feature to split on is `Temperature` because it has the highest information gain.
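The same root choice can be confirmed programmatically; a minimal, library-free sketch over the five rows of this example:

```python
from math import log2
from collections import Counter, defaultdict

rows = [
    ("Sunny", "Hot",  "High", "Weak",   "No"),
    ("Sunny", "Hot",  "High", "Strong", "No"),
    ("Rainy", "Cold", "High", "Weak",   "Yes"),
    ("Rainy", "Cold", "Low",  "Strong", "Yes"),
    ("Sunny", "Cold", "Low",  "Weak",   "Yes"),
]
features = ["Weather", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

target = [r[-1] for r in rows]
for i, feature in enumerate(features):
    groups = defaultdict(list)
    for r in rows:
        groups[r[i]].append(r[-1])          # collect Play labels per feature value
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    print(feature, round(entropy(target) - weighted, 5))
# Temperature has the highest gain (0.97095) and is therefore chosen as the root
```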
Example 4
Determine the root attribute used to construct a decision tree for the given dataset using the
ID3 Decision Tree algorithm. Consider the measure of impurity to be entropy.
| chills | runny nose | headache | fever | Flu? |
|--------|------------|----------|-------|------|
| Y | N | Mild | Y | No |
| Y | Y | No | N | Yes |
| Y | N | Strong | Y | Yes |
| N | Y | Mild | Y | Yes |
| N | N | No | N | No |
| N | Y | Strong | Y | Yes |
| N | Y | Strong | N | No |
| Y | Y | Mild | Y | Yes |
Solution:
ID3 Algorithm
1. Choose the root node: Select the attribute with the highest information gain as the
root node.
2. Create child nodes: For each possible value of the root attribute, create a child node.
3. Recursively repeat: Repeat steps 1 and 2 for each child node until all leaves are pure
(contain only one class) or a stopping criterion is met.
Step 1: Calculate the Entropy of the Target Variable (Flu?)
The entropy measures the impurity or randomness in the dataset.
Entropy of the target (Flu?):
Entropy(S) = − Σ_{i=1}^{c} p_i log2(p_i)

where p_i is the proportion of class i in the dataset and c is the number of classes.
For our dataset:

p(Yes) = 5/8 = 0.625
p(No) = 3/8 = 0.375
Entropy(Flu) = −((5/8) log2(5/8) + (3/8) log2(3/8)) = 0.95443
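If SciPy is available, this value can be checked in one line: `scipy.stats.entropy` normalizes the class counts and accepts a logarithm base:

```python
from scipy.stats import entropy

# 5 "Yes" and 3 "No" examples; base-2 logarithm gives the entropy in bits
print(entropy([5, 3], base=2))  # ~0.9544
```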
Step 2: Calculate information gain for each attribute:
1. For the feature `chills`:
1. chills = Y: 4 samples out of 8 (3 Yes, 1 No)
2. chills = N: 4 samples out of 8 (2 Yes, 2 No)
The entropy for each group is calculated as follows:
Entropy(chills = Y) = −((3/4) log2(3/4) + (1/4) log2(1/4)) = 0.81127
Entropy(chills = N) = −((2/4) log2(2/4) + (2/4) log2(2/4)) = 1

Weighted Entropy = (4/8) × 0.81127 + (4/8) × 1 = 0.905635
Information Gain for chills:
Gain(S, chills) = Entropy(S) − Weighted Entropy = 0.95443 − 0.905635 = 0.048795
2. For the feature `runny nose`:
1. runny nose = Y: 5 samples out of 8 (4 Yes, 1 No)
2. runny nose = N: 3 samples out of 8 (1 Yes, 2 No)
The entropy for each group is calculated as follows:
Entropy(runny nose = Y) = −((4/5) log2(4/5) + (1/5) log2(1/5)) = 0.721928
Entropy(runny nose = N) = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.918295

Weighted Entropy = (5/8) × 0.721928 + (3/8) × 0.918295 = 0.795565
Information Gain for runny nose:
Gain(S, runny nose) = Entropy(S) − Weighted Entropy = 0.95443 − 0.795565 = 0.158865
3. For the feature `headache`:
1. headache = Strong: 3 samples out of 8 (2 Yes, 1 No)
2. headache = Mild: 3 samples out of 8 (2 Yes, 1 No)
3. headache = No: 2 samples out of 8 (1 Yes, 1 No)
The entropy for each group is calculated as follows:
Entropy(headache = Strong) = −((2/3) log2(2/3) + (1/3) log2(1/3)) = 0.918295
Entropy(headache = Mild) = −((2/3) log2(2/3) + (1/3) log2(1/3)) = 0.918295
Entropy(headache = No) = −((1/2) log2(1/2) + (1/2) log2(1/2)) = 1.0

Weighted Entropy = (3/8) × 0.918295 + (3/8) × 0.918295 + (2/8) × 1.0 = 0.938721
Information Gain for headache:
Gain(S, headache) = Entropy(S) − Weighted Entropy = 0.95443 − 0.938721 = 0.015709
4. For the feature `fever`:
1. fever = Y: 5 samples out of 8 (4 Yes, 1 No)
2. fever = N: 3 samples out of 8 (1 Yes, 2 No)
These are the same counts as for `runny nose`, so the entropies and weighted entropy are identical.
The entropy for each group is calculated as follows:
Entropy(fever = Y) = −((4/5) log2(4/5) + (1/5) log2(1/5)) = 0.721928
Entropy(fever = N) = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.918295

Weighted Entropy = (5/8) × 0.721928 + (3/8) × 0.918295 = 0.795565
Information Gain for fever:
Gain(S, fever) = Entropy(S) − Weighted Entropy = 0.95443 − 0.795565 = 0.158865
Step 3: Choose the root node:
| Feature | Entropy of subsets | Weighted entropy | Information gain |
|---------|--------------------|------------------|------------------|
| chills | Y = 0.81127, N = 1 | 0.905635 | 0.048795 |
| runny nose | Y = 0.721928, N = 0.918295 | 0.795565 | 0.158865 |
| headache | Strong = 0.918295, Mild = 0.918295, No = 1 | 0.938721 | 0.015709 |
| fever | Y = 0.721928, N = 0.918295 | 0.795565 | 0.158865 |
The feature with the highest information gain becomes the root node. In our case, `runny nose` and `fever` tie for the highest gain, so either can be chosen as the root.
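The whole gain table can be regenerated with the pandas helpers used in the earlier examples (repeated here so the snippet is self-contained); the column names mirror the dataset above:

```python
from math import log2
import pandas as pd

def entropy(series):
    probs = series.value_counts(normalize=True)
    return -sum(p * log2(p) for p in probs if p > 0)

flu = pd.DataFrame({
    "chills":     ["Y", "Y", "Y", "N", "N", "N", "N", "Y"],
    "runny nose": ["N", "Y", "N", "Y", "N", "Y", "Y", "Y"],
    "headache":   ["Mild", "No", "Strong", "Mild", "No", "Strong", "Strong", "Mild"],
    "fever":      ["Y", "N", "Y", "Y", "N", "Y", "N", "Y"],
    "Flu?":       ["No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

base = entropy(flu["Flu?"])
for feature in ["chills", "runny nose", "headache", "fever"]:
    weighted = sum(len(g) / len(flu) * entropy(g["Flu?"]) for _, g in flu.groupby(feature))
    print(f"{feature:<11} weighted={weighted:.6f} gain={base - weighted:.6f}")
# runny nose and fever tie for the highest gain (~0.158865)
```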
Step 4: Split the data:
- If runny nose = Y: Entropy(Flu?) ≈ 0.721928
- If runny nose = N: Entropy(Flu?) = 0.918295
- If fever = Y: Entropy(Flu?) ≈ 0.721928
- If fever = N: Entropy(Flu?) = 0.918295
Step 5: Build subtrees:
- For runny nose = Y, continue splitting on the remaining attributes.
- For runny nose = N, continue splitting on the remaining attributes.
- For fever = Y, continue splitting on the remaining attributes.
- For fever = N, continue splitting on the remaining attributes.
Final Decision Tree (either candidate can be the root):

    runny nose            fever
      /    \               /    \
     Y      N             Y      N
    ...    ...           ...    ...

where each branch marked `...` is split further on the remaining attributes.
- The algorithm applies the same steps (selecting the attribute with the highest information gain and splitting the data) to each partition created.
- This recursive process continues until a stopping criterion is met. Since no stopping criteria are specified here, splitting continues until the leaves are pure.
Conclusion: The decision tree constructed using the ID3 algorithm with entropy as the impurity measure indicates that `runny nose` and `fever` are the most informative attributes for predicting whether a person has the flu in the given dataset.
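To tie the four examples together, the full ID3 procedure (choose the best attribute, split, recurse) can be sketched in a few dozen lines. This is an illustrative, unoptimized implementation with no pruning and no handling of attribute values unseen during training, shown on the flu dataset:

```python
from math import log2
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def id3(rows, features, target):
    """Return a nested-dict decision tree: {feature: {value: subtree_or_label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                       # pure node -> leaf
        return labels[0]
    if not features:                                # no attributes left -> majority label
        return Counter(labels).most_common(1)[0][0]

    def gain(feature):
        groups = defaultdict(list)
        for r in rows:
            groups[r[feature]].append(r[target])
        weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return entropy(labels) - weighted

    best = max(features, key=gain)                  # attribute with the highest information gain
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        remaining = [f for f in features if f != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree

flu_rows = [
    {"chills": "Y", "runny nose": "N", "headache": "Mild",   "fever": "Y", "Flu?": "No"},
    {"chills": "Y", "runny nose": "Y", "headache": "No",     "fever": "N", "Flu?": "Yes"},
    {"chills": "Y", "runny nose": "N", "headache": "Strong", "fever": "Y", "Flu?": "Yes"},
    {"chills": "N", "runny nose": "Y", "headache": "Mild",   "fever": "Y", "Flu?": "Yes"},
    {"chills": "N", "runny nose": "N", "headache": "No",     "fever": "N", "Flu?": "No"},
    {"chills": "N", "runny nose": "Y", "headache": "Strong", "fever": "Y", "Flu?": "Yes"},
    {"chills": "N", "runny nose": "Y", "headache": "Strong", "fever": "N", "Flu?": "No"},
    {"chills": "Y", "runny nose": "Y", "headache": "Mild",   "fever": "Y", "Flu?": "Yes"},
]
print(id3(flu_rows, ["chills", "runny nose", "headache", "fever"], "Flu?"))
```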