
Decision Tree in Machine Learning


A decision tree is a supervised learning algorithm used for both classification and regression tasks. It has a hierarchical tree structure which consists of a root node, branches, internal nodes and leaf nodes. It works like a flowchart, helping to make decisions step by step, where:
• Internal nodes represent attribute tests
• Branches represent attribute values
• Leaf nodes represent final decisions or predictions.
Decision trees are widely used due to their interpretability, flexibility and low
preprocessing needs.

How Does a Decision Tree Work?


A decision tree splits the dataset based on feature values to create pure subsets, where ideally all items in a group belong to the same class. Each leaf node of the tree corresponds to a class label, and the internal nodes are feature-based decision points. Let's understand this with an example.

Decision Tree (figure)

Let's consider a decision tree for predicting whether a customer will buy a product based on age, income and previous purchases. Here's how the decision tree works:
1. Root Node (Income):
First Question: "Is the person’s income greater than $50,000?"
• If Yes, proceed to the next question.
• If No, predict "No Purchase" (leaf node).
2. Internal Node (Age):
If the person’s income is greater than $50,000, ask: "Is the person’s age above 30?"
• If Yes, proceed to the next question.
• If No, predict "No Purchase" (leaf node).
3. Internal Node (Previous Purchases):
• If the person is above 30 and has made previous purchases, predict "Purchase" (leaf node).
• If the person is above 30 and has not made previous purchases, predict "No Purchase" (leaf node).

Decision Making with Two Decision Trees


Example: Predicting Whether a Customer Will Buy a Product Using Two
Decision Trees
Tree 1: Customer Demographics
The first tree asks two questions:
1. "Income > $50,000?"
• If Yes, proceed to the next question.
• If No, predict "No Purchase".
2. "Age > 30?"
• Yes: "Purchase"
• No: "No Purchase"
Tree 2: Previous Purchases
The second tree asks a single question:
"Previous Purchases > 0?"
• Yes: "Purchase"
• No: "No Purchase"
Once we have predictions from both trees, we can combine the results to
make a final prediction. If Tree 1 predicts "Purchase" and Tree 2 predicts "No
Purchase", the final prediction might be "Purchase" or "No Purchase"
depending on the weight or confidence assigned to each tree. This can be
decided based on the problem context.
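As a minimal sketch of this combination step, the snippet below encodes each tree as a small function and takes a weighted vote; the weights w1 and w2 are arbitrary values chosen for illustration, since the text leaves them to the problem context.

def tree_1(income, age):
    """Tree 1: customer demographics."""
    if income <= 50_000:
        return "No Purchase"
    return "Purchase" if age > 30 else "No Purchase"

def tree_2(previous_purchases):
    """Tree 2: previous purchases."""
    return "Purchase" if previous_purchases > 0 else "No Purchase"

def combined_prediction(income, age, previous_purchases, w1=0.6, w2=0.4):
    """Weighted vote of the two trees; the weights are problem-specific."""
    votes = {"Purchase": 0.0, "No Purchase": 0.0}
    votes[tree_1(income, age)] += w1
    votes[tree_2(previous_purchases)] += w2
    return max(votes, key=votes.get)

print(combined_prediction(income=60_000, age=25, previous_purchases=3))
# Tree 1 says "No Purchase", Tree 2 says "Purchase"; with w1 > w2 the vote is "No Purchase".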
Information Gain and Gini Index in Decision Tree
So far we have covered the basic intuition and approach of how a decision tree works, so let's move on to the attribute selection measures of a decision tree. Two popular attribute selection measures are used:
1. Information Gain
Information Gain tells us how useful a question (or feature) is for splitting
data into groups. It measures how much the uncertainty decreases after the
split. A good question will create clearer groups and the feature with the
highest Information Gain is chosen to make the decision.
For example, if we split a dataset of people into "Young" and "Old" based on age, and all young people bought the product while all old people did not, the Information Gain would be high because the split perfectly separates the two groups with no uncertainty left.
• Suppose S is a set of instances, A is an attribute, Sv is the subset of S for which A = v, v represents an individual value that the attribute A can take, and Values(A) is the set of all possible values of A. Then

Gain(S, A) = Entropy(S) − ∑v∈Values(A) (|Sv| / |S|) · Entropy(Sv)
• Entropy: the measure of uncertainty of a random variable; it characterizes the impurity of an arbitrary collection of examples. The higher the entropy, the higher the information content.
For example, if a dataset has an equal number of "Yes" and "No" outcomes (like 3 people who bought a product and 3 who didn’t), the entropy is high because it’s uncertain which outcome to predict. But if all the outcomes are the same (all "Yes" or all "No"), the entropy is 0, meaning there is no uncertainty left in predicting the outcome.
Example:
For the set X = {a,a,a,b,b,b,b,b}
Total instances: 8
Instances of b: 5
Instances of a: 3

Entropy H(X) = −[(3/8) log₂(3/8) + (5/8) log₂(5/8)] = −[0.375 × (−1.415) + 0.625 × (−0.678)] = −(−0.53 − 0.424) = 0.954
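The arithmetic can be verified with a few lines of Python (a standalone check, not part of any library):

import math

p_a, p_b = 3 / 8, 5 / 8
entropy = -(p_a * math.log2(p_a) + p_b * math.log2(p_b))
print(round(entropy, 3))  # 0.954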
Building a Decision Tree Using Information Gain: The Essentials
• Start with all training instances associated with the root node.
• Use information gain to choose which attribute to label each node with.
• Recursively construct each subtree on the subset of training instances that would be classified down that path in the tree.
• If all positive or all negative training instances remain, label that node "yes" or "no" accordingly.
• If no attributes remain, label with a majority vote of the training instances left at that node.
• If no instances remain, label with a majority vote of the parent's training instances.
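A minimal sketch of these steps in Python is shown below. The helper names (entropy, info_gain, build_tree) are made up for this illustration rather than taken from any library, and pruning and numeric attributes are ignored.

import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain from splitting the rows on attribute index attr."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += (len(subset) / len(labels)) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Recursive ID3-style construction following the steps listed above."""
    if len(set(labels)) == 1:                      # all positive or all negative
        return labels[0]
    if not attrs:                                  # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {"attribute": best, "branches": {}}
    for value in set(row[best] for row in rows):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        tree["branches"][value] = build_tree(
            [rows[i] for i in keep],
            [labels[i] for i in keep],
            [a for a in attrs if a != best],
        )
    return tree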
Example: Now let us draw a decision tree for the following data using Information Gain.
Training set: 3 features and 2 classes
X Y Z C
1 1 1 I
1 1 0 I
0 0 1 II
1 0 0 II

Here, we have 3 features and 2 output classes. To build a decision tree using Information Gain, we will take each of the features and calculate the information gain for each feature.
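Using the entropy and info_gain helpers sketched earlier (illustrative code, not a library API), the per-feature calculation for this table looks roughly like this:

# Training set from the table above: columns X, Y, Z and class C.
rows = [(1, 1, 1), (1, 1, 0), (0, 0, 1), (1, 0, 0)]
labels = ["I", "I", "II", "II"]

for name, attr in (("X", 0), ("Y", 1), ("Z", 2)):
    print(name, round(info_gain(rows, labels, attr), 3))
# X 0.311, Y 1.0, Z 0.0: Y separates the classes perfectly,
# so it has the highest gain and would be chosen as the root node.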
