Numerical Decision Tree

A decision tree is a machine-learning model used for classification and regression tasks. It works by splitting the dataset into subsets based on the values of the input features, creating a tree-like structure. The following numerical examples show how such a tree is built for classification problems.

Example 1

Imagine a dataset where we want to predict whether a person will buy a sports car (`Yes` or
`No`) based on age and income level. Here's a small dataset:

Person   Age   Income   Buys Sports Car
1        25    High     Yes
2        35    Medium   No
3        45    Low      No
4        25    Medium   Yes
5        35    High     Yes
6        45    Medium   No
7        35    Low      No
8        25    Low      No
9        45    High     No

Solution:
Step 1: Calculate Entropy of the Target Variable

The first step in building a decision tree is to calculate the entropy of the target variable (`Buys_Sports_Car`). The dataset contains 3 `Yes` and 6 `No` labels:

Entropy(S) = −((3/9)·log₂(3/9) + (6/9)·log₂(6/9)) = 0.918
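As a quick numerical check (not part of the original notes), the same entropy can be computed with a few lines of Python:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Buys Sports Car column for persons 1..9, copied from the table above
buys = ["Yes", "No", "No", "Yes", "Yes", "No", "No", "No", "No"]
print(round(entropy(buys), 3))  # -> 0.918
```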

Step 2: Calculate Information Gain for Each Feature

Information Gain is calculated for each feature to decide the best attribute for splitting.



For the feature `Age`:

1. Age = 25: 3 samples out of 9 (2 Yes, 1 No)
2. Age = 35: 3 samples out of 9 (1 Yes, 2 No)
3. Age = 45: 3 samples out of 9 (0 Yes, 3 No)

The entropy for each group is calculated as follows (using the convention 0·log₂0 = 0):

Entropy(Age = 25) = −((2/3)·log₂(2/3) + (1/3)·log₂(1/3)) = 0.918

Entropy(Age = 35) = −((1/3)·log₂(1/3) + (2/3)·log₂(2/3)) = 0.918

Entropy(Age = 45) = −((0/3)·log₂(0/3) + (3/3)·log₂(3/3)) = 0

Weighted Entropy = (3/9)·0.918 + (3/9)·0.918 + (3/9)·0 = 0.612

Information Gain for Age:

Gain(S, Age) = Entropy(S) − Weighted Entropy = 0.918 − 0.612 = 0.306

Similarly, we calculate for `Income`:

1. Income = High: 3 samples (2 Yes, 1 No)
2. Income = Medium: 3 samples (1 Yes, 2 No)
3. Income = Low: 3 samples (0 Yes, 3 No)

The entropy for each group is calculated as follows:

Entropy(Income = High) = −((2/3)·log₂(2/3) + (1/3)·log₂(1/3)) = 0.918

Entropy(Income = Medium) = −((1/3)·log₂(1/3) + (2/3)·log₂(2/3)) = 0.918

Entropy(Income = Low) = −((0/3)·log₂(0/3) + (3/3)·log₂(3/3)) = 0

Weighted Entropy = (3/9)·0.918 + (3/9)·0.918 + (3/9)·0 = 0.612

Information Gain for Income:

Gain(S, Income) = Entropy(S) − Weighted Entropy = 0.918 − 0.612 = 0.306



Step 3: Choose the Best Feature to Split

The feature with the highest information gain is chosen to split the node. In this case, `Age` (0.306) and `Income` (0.306) have equal information gain, so the decision tree can split on either `Age` or `Income`.
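A minimal sketch, assuming plain Python and the table above, that confirms both gains; the helper names are my own:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, col):
    """Gain(S, A) = Entropy(S) minus the weighted entropy of the split on column `col`."""
    parent = entropy([r[-1] for r in rows])          # last element of each row is the class label
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])  # class labels per feature value
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted

# (Age, Income, Buys Sports Car) rows from the Example 1 table
data = [
    (25, "High", "Yes"), (35, "Medium", "No"), (45, "Low", "No"),
    (25, "Medium", "Yes"), (35, "High", "Yes"), (45, "Medium", "No"),
    (35, "Low", "No"), (25, "Low", "No"), (45, "High", "No"),
]
print(round(information_gain(data, 0), 3))  # Age    -> 0.306
print(round(information_gain(data, 1), 3))  # Income -> 0.306
```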

Step 4: Repeat the Process

Repeat the above steps for each subset of the dataset created by the chosen split (`Age` or `Income`) until stopping criteria are met (e.g., maximum depth, minimum number of samples per node, or no further information gain).

Conclusion

This numerical example illustrates how a decision tree algorithm selects the best features for
splitting by calculating entropy and information gain, leading to a tree that can be used for
predicting new instances.
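To make "predicting new instances" concrete, here is a hedged sketch using scikit-learn; the notes do not prescribe a library, and the ordinal encoding of `Income` is my own choice. Note that scikit-learn grows binary, threshold-based splits rather than ID3-style multiway splits, but it supports the same entropy criterion:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

income_code = {"Low": 0, "Medium": 1, "High": 2}   # hypothetical ordinal encoding
X = [[25, income_code["High"]], [35, income_code["Medium"]], [45, income_code["Low"]],
     [25, income_code["Medium"]], [35, income_code["High"]], [45, income_code["Medium"]],
     [35, income_code["Low"]], [25, income_code["Low"]], [45, income_code["High"]]]
y = ["Yes", "No", "No", "Yes", "Yes", "No", "No", "No", "No"]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)
print(export_text(clf, feature_names=["Age", "Income"]))  # show the learned splits

# classify a new, unseen person: Age 30 with High income
print(clf.predict([[30, income_code["High"]]]))
```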

Example 2

This example builds a decision tree from a small dataset for a binary classification problem: predicting whether a person will play tennis based on the weather conditions.

Dataset
Day   Outlook    Temperature   Humidity   Wind     PlayTennis
1     Sunny      Hot           High       Weak     No
2     Sunny      Hot           High       Strong   No
3     Overcast   Hot           High       Weak     Yes
4     Rain       Mild          High       Weak     Yes
5     Rain       Cool          Normal     Weak     Yes
6     Rain       Cool          Normal     Strong   No
7     Overcast   Cool          Normal     Strong   Yes
8     Sunny      Mild          High       Weak     No
9     Sunny      Cool          Normal     Weak     Yes
10    Rain       Mild          Normal     Weak     Yes
11    Sunny      Mild          Normal     Strong   Yes
12    Overcast   Mild          High       Strong   Yes
13    Overcast   Hot           Normal     Weak     Yes
14    Rain       Mild          High       Strong   No

Step 1: Calculate the Entropy of the Entire Dataset

The dataset contains 9 `Yes` and 5 `No` labels:

Entropy(S) = −((9/14)·log₂(9/14) + (5/14)·log₂(5/14)) = 0.940


Step 2: Calculate the Entropy for Each Feature

We’ll calculate the entropy for the feature `Outlook` first.

For `Outlook`:

Sunny: 5 samples (2 Yes, 3 No)
Overcast: 4 samples (4 Yes, 0 No)
Rain: 5 samples (3 Yes, 2 No)

Entropy for each subset:

Entropy(Sunny) = −((2/5)·log₂(2/5) + (3/5)·log₂(3/5)) = 0.971

Entropy(Overcast) = −((4/4)·log₂(4/4) + (0/4)·log₂(0/4)) = 0

Entropy(Rain) = −((3/5)·log₂(3/5) + (2/5)·log₂(2/5)) = 0.971

Weighted Entropy = (5/14)·0.971 + (4/14)·0 + (5/14)·0.971 = 0.694

Information Gain for `Outlook`:

Information Gain(S, Outlook) = Entropy(S) − Weighted Entropy = 0.940 − 0.694 = 0.246

Step 3: Calculate the Information Gain for Other Features

Following the same process, calculate the information gain for `Temperature`, `Humidity`,
and `Wind`.

Information Gain for `Temperature`:

- Hot: 4 samples (2 Yes, 2 No)
- Mild: 6 samples (4 Yes, 2 No)
- Cool: 4 samples (3 Yes, 1 No)

The entropy for each group is calculated as follows:

Entropy(Hot) = −((2/4)·log₂(2/4) + (2/4)·log₂(2/4)) = 1

Entropy(Mild) = −((4/6)·log₂(4/6) + (2/6)·log₂(2/6)) = 0.918

Entropy(Cool) = −((3/4)·log₂(3/4) + (1/4)·log₂(1/4)) = 0.811

Weighted entropy and information gain for `Temperature`:

Weighted Entropy = (4/14)·1 + (6/14)·0.918 + (4/14)·0.811 = 0.911

Information Gain(S, Temperature) = 0.940 − 0.911 = 0.029

Information Gain for `Humidity`:

- High: 7 samples (3 Yes, 4 No)
- Normal: 7 samples (6 Yes, 1 No)

The entropy for each group is calculated as follows:

Entropy(High) = −((3/7)·log₂(3/7) + (4/7)·log₂(4/7)) = 0.985

Entropy(Normal) = −((6/7)·log₂(6/7) + (1/7)·log₂(1/7)) = 0.592

Weighted entropy and information gain for `Humidity`:

Weighted Entropy = (7/14)·0.985 + (7/14)·0.592 = 0.788

Information Gain(S, Humidity) = 0.940 − 0.788 = 0.152

Information Gain for `Wind`:

- Weak: 8 samples (6 Yes, 2 No)
- Strong: 6 samples (3 Yes, 3 No)

The entropy for each group is calculated as follows:

Entropy(Weak) = −((6/8)·log₂(6/8) + (2/8)·log₂(2/8)) = 0.811

Entropy(Strong) = −((3/6)·log₂(3/6) + (3/6)·log₂(3/6)) = 1

Weighted entropy and information gain for `Wind`:

Weighted Entropy = (8/14)·0.811 + (6/14)·1 = 0.892

Information Gain(S, Wind) = 0.940 − 0.892 = 0.048



Step 4: Choose the Best Feature to Split

Based on the information gain calculations:

Outlook: 0.246
Temperature: 0.029
Humidity: 0.152
Wind: 0.048

The best feature to split on is `Outlook` because it has the highest information gain.
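The four gains can be recomputed in one loop. The sketch below is an illustration only (the helper names are my own, not from the notes); it keeps full floating-point precision, so `Outlook` prints as 0.247 where the hand-rounded subtraction above gives 0.246:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
features = ["Outlook", "Temperature", "Humidity", "Wind"]
target = [r[-1] for r in rows]

for i, name in enumerate(features):
    groups = {}
    for r in rows:
        groups.setdefault(r[i], []).append(r[-1])    # class labels per feature value
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    print(name, round(entropy(target) - weighted, 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
```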

Example 3

Determine the root attribute used to construct a decision tree for a sample dataset using the ID3 Decision Tree algorithm. Use entropy as the measure of impurity.

Weather   Temperature   Humidity   Wind     Play
Sunny     Hot           High       Weak     No
Sunny     Hot           High       Strong   No
Rainy     Cold          High       Weak     Yes
Rainy     Cold          Low        Strong   Yes
Sunny     Cold          Low        Weak     Yes

Step 1: Calculate Entropy for the Target Attribute (Play)

P(Yes) = 3/5
P(No) = 2/5

Entropy(S) = −((3/5)·log₂(3/5) + (2/5)·log₂(2/5)) = 0.97095
Step 2: Calculate the Entropy for Each Feature

Feature    Value and class counts             Entropy
Weather    Sunny: 3 samples (1 Yes, 2 No)     −((1/3)·log₂(1/3) + (2/3)·log₂(2/3)) = 0.91829
           Rainy: 2 samples (2 Yes, 0 No)     −((2/2)·log₂(2/2) + (0/2)·log₂(0/2)) = 0
Temp       Hot: 2 samples (0 Yes, 2 No)       −((0/2)·log₂(0/2) + (2/2)·log₂(2/2)) = 0
           Cold: 3 samples (3 Yes, 0 No)      −((3/3)·log₂(3/3) + (0/3)·log₂(0/3)) = 0
Humidity   High: 3 samples (1 Yes, 2 No)      −((1/3)·log₂(1/3) + (2/3)·log₂(2/3)) = 0.91829
           Low: 2 samples (2 Yes, 0 No)       −((2/2)·log₂(2/2) + (0/2)·log₂(0/2)) = 0
Wind       Weak: 3 samples (2 Yes, 1 No)      −((2/3)·log₂(2/3) + (1/3)·log₂(1/3)) = 0.91829
           Strong: 2 samples (1 Yes, 1 No)    −((1/2)·log₂(1/2) + (1/2)·log₂(1/2)) = 1

Weighted Entropy(Weather) = (3/5)·0.91829 + (2/5)·0 = 0.550974

Information Gain(S, Weather) = 0.97095 − 0.550974 = 0.419976

Weighted Entropy(Temp) = (2/5)·0 + (3/5)·0 = 0

Information Gain(S, Temp) = 0.97095 − 0 = 0.97095

Weighted Entropy(Humidity) = (3/5)·0.91829 + (2/5)·0 = 0.550974

Information Gain(S, Humidity) = 0.97095 − 0.550974 = 0.419976

Weighted Entropy(Wind) = (3/5)·0.91829 + (2/5)·1 = 0.950974

Information Gain(S, Wind) = 0.97095 − 0.950974 = 0.019976

Choose the Best Feature to Split

Based on the information gain calculations:



Temperature: 0.97095
Humidity: 0.419976
Weather: 0.419976
Wind: 0.019976

The best feature to split on is `Temperature` because it has the highest information gain.
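`Temperature` reaches the maximum possible gain (equal to the full entropy of 0.97095) because every one of its values yields a pure subset, so the tree is complete after this single split. A small sketch (plain Python, not from the notes) makes the pure partitions visible:

```python
# Group the Play labels by Temperature value and show that each group is pure.
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Rainy", "Cold", "High", "Weak", "Yes"),
    ("Rainy", "Cold", "Low", "Strong", "Yes"),
    ("Sunny", "Cold", "Low", "Weak", "Yes"),
]
by_temp = {}
for weather, temp, humidity, wind, play in rows:
    by_temp.setdefault(temp, []).append(play)
print(by_temp)  # {'Hot': ['No', 'No'], 'Cold': ['Yes', 'Yes', 'Yes']} -- both leaves are pure
```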

Example 4

Determine the root attribute used to construct a decision tree for the given dataset using the ID3 Decision Tree algorithm. Use entropy as the measure of impurity.

chills   runny nose   headache   fever   Flu?
Y        N            Mild       Y       No
Y        Y            No         N       Yes
Y        N            Strong     Y       Yes
N        Y            Mild       Y       Yes
N        N            No         N       No
N        Y            Strong     Y       Yes
N        Y            Strong     N       No
Y        Y            Mild       Y       Yes

Solution:

ID3 Algorithm

1. Choose the root node: Select the attribute with the highest information gain as the
root node.
2. Create child nodes: For each possible value of the root attribute, create a child node.
3. Recursively repeat: Repeat steps 1 and 2 for each child node until all leaves are pure
(contain only one class) or a stopping criterion is met.

Step 1: Calculate the Entropy of the Target Variable (Flu?)

The entropy measures the impurity or randomness in the dataset.

Entropy of the target (Flu?):

Entropy(S) = − Σ pᵢ·log₂(pᵢ), summed over the c classes,

where pᵢ is the proportion of class i in the dataset and c is the number of classes.

For our dataset:

p(Yes) = 5/8 = 0.625
p(No) = 3/8 = 0.375

Entropy(Flu?) = −((5/8)·log₂(5/8) + (3/8)·log₂(3/8)) = 0.95443

Step 2: Calculate Information Gain for Each Attribute

1. For the feature `chills`:

   1. chills = Y: 4 samples out of 8 (3 Yes, 1 No)
   2. chills = N: 4 samples out of 8 (2 Yes, 2 No)

The entropy for each group is calculated as follows:

Entropy(chills = Y) = −((3/4)·log₂(3/4) + (1/4)·log₂(1/4)) = 0.81127
Entropy(chills = N) = −((2/4)·log₂(2/4) + (2/4)·log₂(2/4)) = 1

Weighted Entropy = (4/8)·0.81127 + (4/8)·1 = 0.905635

Information Gain for chills:

Gain(S, chills) = Entropy(S) − Weighted Entropy = 0.95443 − 0.905635 = 0.048795

2. For the feature `runny nose`:

   1. runny nose = Y: 5 samples out of 8 (4 Yes, 1 No)
   2. runny nose = N: 3 samples out of 8 (1 Yes, 2 No)

The entropy for each group is calculated as follows:

Entropy(runny nose = Y) = −((4/5)·log₂(4/5) + (1/5)·log₂(1/5)) = 0.721928
Entropy(runny nose = N) = −((1/3)·log₂(1/3) + (2/3)·log₂(2/3)) = 0.918295

Weighted Entropy = (5/8)·0.721928 + (3/8)·0.918295 = 0.795565

Information Gain for runny nose:

Gain(S, runny nose) = Entropy(S) − Weighted Entropy = 0.95443 − 0.795565 = 0.158865

3. For the feature `headache`:

   1. headache = Strong: 3 samples out of 8 (2 Yes, 1 No)
   2. headache = Mild: 3 samples out of 8 (2 Yes, 1 No)
   3. headache = No: 2 samples out of 8 (1 Yes, 1 No)

The entropy for each group is calculated as follows:

Entropy(headache = Strong) = −((2/3)·log₂(2/3) + (1/3)·log₂(1/3)) = 0.918295
Entropy(headache = Mild) = −((2/3)·log₂(2/3) + (1/3)·log₂(1/3)) = 0.918295
Entropy(headache = No) = −((1/2)·log₂(1/2) + (1/2)·log₂(1/2)) = 1.0

Weighted Entropy = (3/8)·0.918295 + (3/8)·0.918295 + (2/8)·1.0 = 0.938721

Information Gain for headache:

Gain(S, headache) = Entropy(S) − Weighted Entropy = 0.95443 − 0.938721 = 0.015709

4. For the feature `fever`:

   1. fever = Y: 5 samples out of 8 (4 Yes, 1 No)
   2. fever = N: 3 samples out of 8 (1 Yes, 2 No)

These are the same counts as for `runny nose`.

The entropy for each group is calculated as follows:

Entropy(fever = Y) = −((4/5)·log₂(4/5) + (1/5)·log₂(1/5)) = 0.721928
Entropy(fever = N) = −((1/3)·log₂(1/3) + (2/3)·log₂(2/3)) = 0.918295

Weighted Entropy = (5/8)·0.721928 + (3/8)·0.918295 = 0.795565

Information Gain for fever:

Gain(S, fever) = Entropy(S) − Weighted Entropy = 0.95443 − 0.795565 = 0.158865

Step 3: Choose the root node:

Feature      Entropy per value                   Weighted Entropy   Information Gain
chills       Y = 0.81127, N = 1                  0.905635           0.048795
runny nose   Y = 0.721928, N = 0.918295          0.795565           0.158865
headache     Strong, Mild = 0.918295, No = 1     0.938721           0.015709
fever        Y = 0.721928, N = 0.918295          0.795565           0.158865

The feature with the highest information gain becomes the root node; in our case, either `runny nose` or `fever` can be chosen as the root.

Step 4: Split the data:

- If runny nose = Y: Entropy(Flu?) ≈ 0.721928
- If runny nose = N: Entropy(Flu?) = 0.918295
- If fever = Y: Entropy(Flu?) ≈ 0.721928
- If fever = N: Entropy(Flu?) = 0.918295

Step 5: Build subtrees:

- For runny nose = Y, continue splitting on the remaining attributes.
- For runny nose = N, continue splitting on the remaining attributes.
- For fever = Y, continue splitting on the remaining attributes.
- For fever = N, continue splitting on the remaining attributes.

Final Decision Tree:

Either `runny nose` or `fever` is placed at the root; each of its Y/N branches is then split further on the remaining attributes:

      runny nose                    fever
       /      \          or         /    \
      Y        N                   Y      N
   (split     (split            (split   (split
   further)   further)          further) further)

• The algorithm applies the same steps to select the best attribute and split the data within each partition created.
• This recursive process continues until a stopping criterion is met. Since no stopping criteria are specified here, splitting continues until every leaf is pure, as described in the ID3 steps above (see the sketch below).
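Here is a hedged sketch of that recursion (the helper names are my own, not from the notes): a tiny ID3 that greedily picks the attribute with the highest information gain and stops when a leaf is pure or no attributes remain. The tie between `runny nose` and `fever` at the root is broken by column order:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, attrs):
    """Return the attribute with the highest information gain (first wins on ties)."""
    parent = entropy([r["Flu?"] for r in rows])
    def gain(a):
        groups = {}
        for r in rows:
            groups.setdefault(r[a], []).append(r["Flu?"])
        return parent - sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return max(attrs, key=gain)

def id3(rows, attrs):
    labels = [r["Flu?"] for r in rows]
    if len(set(labels)) == 1:                 # pure leaf: stop
        return labels[0]
    if not attrs:                             # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, attrs)
    tree = {a: {}}
    for value in sorted({r[a] for r in rows}):
        subset = [r for r in rows if r[a] == value]
        tree[a][value] = id3(subset, [x for x in attrs if x != a])
    return tree

header = ["chills", "runny nose", "headache", "fever", "Flu?"]
table = [
    ["Y", "N", "Mild", "Y", "No"],   ["Y", "Y", "No", "N", "Yes"],
    ["Y", "N", "Strong", "Y", "Yes"], ["N", "Y", "Mild", "Y", "Yes"],
    ["N", "N", "No", "N", "No"],     ["N", "Y", "Strong", "Y", "Yes"],
    ["N", "Y", "Strong", "N", "No"], ["Y", "Y", "Mild", "Y", "Yes"],
]
rows = [dict(zip(header, r)) for r in table]
print(id3(rows, header[:-1]))  # the root key is 'runny nose' (tied with 'fever')
```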

Conclusion: The decision tree constructed using the ID3 algorithm with entropy as the impurity measure indicates that the presence or absence of `runny nose` and `fever` are the most significant factors in predicting whether a person has the flu, based on the given dataset.
