Decision Tree

A decision tree is a supervised classification model used for predicting a discrete target feature through a series of Boolean tests on input features. There are two main types: classification trees for categorical outcomes and regression trees for continuous values. The process of building a decision tree involves selecting the best attribute for splitting the data, recursive splitting, applying stopping criteria, and pruning to avoid overfitting.

What is a decision tree?

A decision tree is a simple model for supervised classification. It is used for classifying a single discrete
target feature. Each internal node performs a Boolean test on an input feature. The edges are labeled with
the values of that input feature. Each leaf node specifies a value for the target feature.

Classification of Decision Tree


There are two main types of decision tree, based on the nature of the target variable: classification
trees and regression trees.
· Classification trees: These are designed to predict categorical outcomes, i.e., they classify
data into different classes. For example, a classification tree can determine whether an email is
“spam” or “not spam” based on various features of the email.
· Regression trees: These are used when the target variable is continuous; they predict numerical
values rather than categories. For example, a regression tree can estimate the price of a house
based on its size, location, and other features.
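For concreteness, here is a minimal Python sketch of both tree types using scikit-learn (assuming it is installed); the toy feature matrices and targets are invented purely for illustration and are not taken from this document.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: predicts a categorical label (e.g. spam vs. not spam).
X_cls = [[0, 1], [1, 1], [1, 0], [0, 0]]        # e.g. [contains_link, many_recipients]
y_cls = ["spam", "spam", "not spam", "not spam"]
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3)
clf.fit(X_cls, y_cls)
print(clf.predict([[1, 1]]))                    # -> ['spam']

# Regression tree: predicts a continuous value (e.g. house price).
X_reg = [[50, 1], [80, 2], [120, 3], [200, 4]]  # e.g. [size_m2, n_rooms]
y_reg = [100000, 150000, 220000, 400000]
reg = DecisionTreeRegressor(max_depth=3)
reg.fit(X_reg, y_reg)
print(reg.predict([[100, 2]]))                  # predicts the mean price of the matching leaf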
How is a Decision Tree Formed?
Building a decision tree involves several steps that ensure the final model is both accurate and
interpretable. Here’s a breakdown of the process:
1. Selecting the Best Attribute
The first step in building a decision tree is selecting the best attribute to split the dataset. This is done
using criteria such as information gain or Gini index. The attribute that results in the highest gain (or
lowest impurity) is chosen as the root node.
2. Recursive Splitting
Once the best attribute is selected, the data is split into subsets based on the value of that attribute. This
process is repeated recursively for each subset, creating branches and sub-branches until the data is
perfectly split or meets a stopping criterion (e.g., maximum depth or minimum number of instances per
leaf).
3. Stopping Criteria
The decision tree continues splitting the data until one of the following conditions is met:
· All instances in a node belong to the same class.
· The maximum depth of the tree is reached.
· The number of instances in a node falls below a specified threshold.
4. Tree Pruning
After the decision tree is fully grown, it may be necessary to prune it to reduce overfitting. Pruning
simplifies the model by removing branches that contribute little to its accuracy on unseen data.
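The whole procedure can be summarised as a short recursive function. The sketch below assumes the training data is a list of (feature-dictionary, label) pairs; build_tree, best_attribute, and the parameter names are illustrative placeholders (best_attribute would implement one of the selection measures described in the next section), not a reference implementation.

from collections import Counter

def build_tree(examples, attributes, depth=0, max_depth=5, min_examples=2):
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]

    # Stopping criteria: pure node, no attributes left, depth or size limit reached.
    if len(set(labels)) == 1 or not attributes or depth >= max_depth \
            or len(examples) < min_examples:
        return majority                              # leaf: predict the majority class

    # 1. Select the best attribute (e.g. highest information gain / lowest impurity).
    attr = best_attribute(examples, attributes)      # hypothetical helper
    tree = {"attribute": attr, "branches": {}}

    # 2. Recursive splitting: one branch per observed value of the chosen attribute.
    for value in {features[attr] for features, _ in examples}:
        subset = [(f, l) for f, l in examples if f[attr] == value]
        remaining = [a for a in attributes if a != attr]
        tree["branches"][value] = build_tree(subset, remaining,
                                             depth + 1, max_depth, min_examples)
    return tree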
Attribute Selection Measures
Choosing the right attribute to split the data at each node is a critical step in building an accurate decision
tree. The most common methods used for attribute selection are information gain and Gini index.
Information Gain
Information Gain is based on the concept of entropy, which measures the amount of disorder or
uncertainty in the dataset. The goal is to select the attribute that reduces entropy the most when the data is
split.

The formula for entropy is:


Entropy = - Σ (pi * log2(pi)),
where "pi" represents the probability of the i-th class occurring within the dataset and "log2" is the base-2
logarithm.
The formula for information gain in a decision tree is:
Information Gain = Entropy(Parent Node) - (Weighted average of Entropy of Child Nodes);
essentially, it calculates the difference between the entropy of the parent node before splitting and the
weighted average entropy of the child nodes after splitting, with the best split being the one that
maximizes information gain.
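Both formulas translate directly into a few lines of Python. The sketch below assumes each example is a dictionary keyed by attribute name and that labels is the list of target values reaching a node; the names are illustrative only.

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy = -sum(pi * log2(pi)) over the classes present in labels.
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def information_gain(examples, labels, attribute):
    # Parent entropy minus the weighted average entropy of the child nodes.
    parent = entropy(labels)
    total = len(labels)
    weighted = 0.0
    for value in set(example[attribute] for example in examples):
        subset = [l for example, l in zip(examples, labels)
                  if example[attribute] == value]
        weighted += (len(subset) / total) * entropy(subset)
    return parent - weighted

print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))   # 0.94 for 9 positive, 5 negative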
Gini Index
The Gini index measures the impurity of a dataset. A lower Gini index indicates a purer dataset,
meaning most of the instances belong to a single class.
The Gini index is computed as:
Gini = 1 - ∑(pi)^2
where "pi" represents the proportion of class "i" in the dataset, and the summation is taken over all classes
"i" present in the data; essentially, it calculates 1 minus the sum of the squared probabilities of each class
within a given node.
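A matching sketch for the Gini index, using the same list-of-labels representation as the entropy function above:

from collections import Counter

def gini(labels):
    # Gini = 1 - sum(pi^2) over the classes present in labels.
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(round(gini(["yes"] * 9 + ["no"] * 5), 3))   # 0.459 for 9 positive, 5 negative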
Comparison
Both information gain and the Gini index serve a similar purpose. Information gain is used by the ID3
and C4.5 algorithms, while the Gini index is the default splitting criterion in CART
(Classification and Regression Trees).

Example
· To illustrate the operation of ID3, consider the learning task represented by the
training examples in the table below.
· The target attribute is PlayTennis, which can have the values Yes or No for different days.
· Consider the first step of the algorithm, in which the topmost node of the
decision tree is created.

Day Outlook Temperature Humidity Wind PlayTennis


D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

Given a collection S containing positive and negative examples of some target concept, the entropy of
S relative to this Boolean classification is

Entropy(S) = - p+ * log2(p+) - p- * log2(p-)

where

p+ is the proportion of positive examples in S

p- is the proportion of negative examples in S.

Example:
Suppose S is a collection of 14 examples of some Boolean concept, including 9 positive and 5 negative
examples. Then the entropy of S relative to this Boolean classification is

Entropy(S) = - (9/14) * log2(9/14) - (5/14) * log2(5/14) = 0.940

· ID3 determines the information gain for each candidate attribute (i.e., Outlook,
Temperature, Humidity, and Wind), then selects the one with the highest information gain.
· The information gain values for all four attributes are:

Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029

· According to the information gain measure, the Outlook attribute provides the best
prediction of the target attribute, PlayTennis, over the training examples. Therefore,
Outlook is selected as the decision attribute for the root node, and branches are created
below the root for each of its possible values, i.e., Sunny, Overcast, and Rain.
The same process is then repeated below each branch. For the Rain branch, the examples that remain are

SRain = {D4, D5, D6, D10, D14}

and the information gains of the remaining attributes over SRain are:

Gain(SRain, Humidity) = 0.970 - (2/5)1.0 - (3/5)0.918 = 0.019

Gain(SRain, Temperature) = 0.970 - (0/5)0.0 - (3/5)0.918 - (2/5)1.0 = 0.019

Gain(SRain, Wind) = 0.970 - (3/5)0.0 - (2/5)0.0 = 0.970

Wind has the highest information gain over SRain, so it is selected as the decision attribute for the node below the Rain branch.
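As a sanity check, these gains can be recomputed with the information_gain sketch from the attribute-selection section; the dictionary keys below simply name the columns of the table, and the printed values match the figures above up to rounding (0.971 vs. 0.970).

s_rain = [
    ({"Temperature": "Mild", "Humidity": "High",   "Wind": "Weak"},   "Yes"),  # D4
    ({"Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak"},   "Yes"),  # D5
    ({"Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong"}, "No"),   # D6
    ({"Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak"},   "Yes"),  # D10
    ({"Temperature": "Mild", "Humidity": "High",   "Wind": "Strong"}, "No"),   # D14
]
rows = [features for features, _ in s_rain]
labels = [label for _, label in s_rain]
for attribute in ("Humidity", "Temperature", "Wind"):
    print(attribute, round(information_gain(rows, labels, attribute), 3))
# Humidity 0.02, Temperature 0.02, Wind 0.971 -> Wind is chosen for the Rain branch.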
