DECISION TREE

Attribute selection measures


Attribute selection measures in decision trees include entropy, information gain, Gini index, gain
ratio, reduction in variance, and chi-square. These measures, also known as splitting rules, determine
which attribute to split on at each node (a small computational sketch follows the list below).

• Information gain
Measures how much information a feature provides about the class; it is the decrease in entropy after
splitting the dataset on that feature.
• Gini index
Also known as Gini impurity, it measures the probability that a randomly chosen sample would be
misclassified if it were labelled at random according to the class distribution in the node.
• Entropy
Measures the impurity of a dataset; the split that produces the largest decrease in entropy is preferred.
• Chi-square
Measures the statistical significance of the difference between observed and expected class frequencies
after a split; it is typically used for categorical features.
Tree-Pruning

When a decision tree is built to its full depth, it often overfits to the training data. To combat overfitting, two
techniques are used:

Post-pruning and Pre-pruning.

1. Post-pruning (Cost Complexity Pruning)


Post-pruning involves first allowing the decision tree to grow fully, and then removing parts of the tree that do
not improve its performance.

How it works:
1. Grow the Tree Fully:
The decision tree is initially constructed without any constraints, allowing it to overfit on the training data.

2. Evaluate Node Importance:
The tree is then evaluated to identify nodes and subtrees that do not contribute significantly to the accuracy of
the model.

3. Prune Subtrees:
Nodes that do not add significant value are converted into leaf nodes. For instance, if a node has 90% "Yes"
and 10% "No" outcomes, further splitting may not be beneficial, so the subtree is pruned.

4. Simplify the Tree:
Reducing tree complexity lowers overfitting while maintaining accuracy, which is particularly useful for
small datasets. (A cost-complexity-pruning sketch follows this list.)
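
As an illustration only, the sketch below shows one way to carry out these steps with scikit-learn's cost complexity pruning; the dataset, train/test split, and alpha-selection loop are assumptions made for the example, not part of the original slides:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: grow the tree fully, with no constraints, so it can overfit.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Steps 2-3: compute the pruning path; each ccp_alpha removes the subtree
# whose contribution to accuracy is weakest relative to its size.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Step 4: refit at each alpha and keep the simplest tree that still scores well
# (a validation set or cross-validation would normally be used for this choice).
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = pruned.score(X_test, y_test)
    if score >= best_score:
        best_alpha, best_score = alpha, score

print(full_tree.get_n_leaves(), best_alpha, best_score)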
2. Pre-Pruning

• In pre-pruning, hyperparameters such as max_depth and max_features are set before the tree is fully constructed to limit its growth.

• max_depth: Limits the maximum depth of the tree.

• max_features: Restricts the number of features considered for splitting at each node.

• This technique reduces the risk of overfitting by preventing the tree from growing too deep and capturing noise in the data (a usage sketch follows this list).
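
A minimal pre-pruning sketch with scikit-learn, again assuming a generic tabular classification dataset; the specific hyperparameter values are illustrative, not prescribed by the slides:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Unconstrained tree: grown to full depth, prone to overfitting.
unpruned = DecisionTreeClassifier(random_state=0)

# Pre-pruned tree: growth is limited before construction by hyperparameters.
pre_pruned = DecisionTreeClassifier(
    max_depth=4,       # limit the maximum depth of the tree
    max_features=0.5,  # consider only half of the features at each split
    random_state=0,
)

for name, model in [("unpruned", unpruned), ("pre-pruned", pre_pruned)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, round(scores.mean(), 3))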
