
Tree Pruning:

When a decision tree is built, many of the branches will reflect anomalies in the training data
due to noise or outliers. Tree pruning methods address this problem of overfitting the data.
Such methods typically use statistical measures to remove the least-reliable branches. An
unpruned tree and a pruned version of it are shown in Figure 6. Pruned trees tend to be smaller
and less complex and, thus, easier to comprehend. They are usually faster and better at correctly
classifying independent test data (i.e., previously unseen tuples) than unpruned trees.

Figure 6: An unpruned decision tree and a pruned version of it.

“How does tree pruning work?”


There are two common approaches to tree pruning: prepruning and postpruning.
1. In the prepruning approach, a tree is “pruned” by halting its construction early (e.g., by
deciding not to further split or partition the subset of training tuples at a given node).
Upon halting, the node becomes a leaf. The leaf may hold the most frequent class among
the subset tuples or the probability distribution of those tuples.

When constructing a tree, measures such as statistical significance, information gain, Gini
index, and so on, can be used to assess the goodness of a split. If partitioning the tuples at
a node would result in a split that falls below a prespecified threshold, then further
partitioning of the given subset is halted. There are difficulties, however, in choosing an
appropriate threshold. High thresholds could result in oversimplified trees, whereas low
thresholds could result in very little simplification. (A minimal prepruning sketch follows this numbered list.)

2. The second and more common approach is postpruning, which removes subtrees from a
“fully grown” tree. A subtree at a given node is pruned by removing its branches and
replacing it with a leaf. The leaf is labeled with the most frequent class among the tuples of the
subtree being replaced. For example, notice the subtree at node “A3?” in the unpruned tree of
Figure 6. Suppose that the most common class within this subtree is “class B.” In the
pruned version of the tree, the subtree in question is pruned by replacing it with the leaf
“class B.”
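As a concrete illustration of the prepruning approach in item 1 above, the sketch below uses scikit-learn's DecisionTreeClassifier, whose min_samples_split and min_impurity_decrease hyperparameters act as prespecified thresholds that halt construction early. The library, hyperparameter values, and synthetic data are assumptions for illustration, not part of the text.

# A minimal prepruning sketch (assumed: scikit-learn and synthetic data).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Halt construction early: refuse to split small nodes, and refuse splits whose
# impurity (Gini) improvement falls below a prespecified threshold.
prepruned = DecisionTreeClassifier(
    min_samples_split=20,        # do not partition nodes with fewer than 20 tuples
    min_impurity_decrease=0.01,  # threshold on the "goodness" of a split
    random_state=0,
).fit(X, y)

print("leaves in prepruned tree:", prepruned.get_n_leaves())

Raising these thresholds gives smaller (possibly oversimplified) trees, while lowering them gives very little simplification, mirroring the trade-off noted in item 1.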
The cost complexity pruning algorithm used in CART is an example of the postpruning approach.
This approach considers the cost complexity of a tree to be a function of the number of leaves in
the tree and the error rate of the tree (where the error rate is the percentage of tuples
misclassified by the tree). It starts from the bottom of the tree. For each internal node, N, it
computes the cost complexity of the subtree at N, and the cost complexity of the subtree at N if it
were to be pruned (i.e., replaced by a leaf node). The two values are compared. If pruning the
subtree at node N would result in a smaller cost complexity, then the subtree is pruned.
Otherwise, it is kept.
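The sketch below mirrors this procedure with scikit-learn's CART implementation, where cost_complexity_pruning_path generates the sequence of progressively pruned trees and ccp_alpha weighs the number of leaves against the error rate; the held-out split plays the role of the independent pruning set discussed next. The data, split, and variable names are illustrative assumptions, not the textbook's algorithm verbatim.

# A minimal cost complexity (post)pruning sketch (assumed: scikit-learn, synthetic data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_prune, y_train, y_prune = train_test_split(X, y, random_state=0)

# Grow the full tree, then obtain the sequence of alphas that produce
# progressively pruned trees (larger alpha => smaller tree).
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit at each alpha and keep the tree that does best on the independent
# prune set; ties are broken in favour of larger alpha, i.e. the smaller tree.
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    candidate = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = candidate.score(X_prune, y_prune)
    if score >= best_score:
        best_alpha, best_score = alpha, score

pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)
print("leaves: full =", full_tree.get_n_leaves(), "pruned =", pruned.get_n_leaves())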

A pruning set of class-labeled tuples is used to estimate cost complexity. This set is independent
of the training set used to build the unpruned tree and of any test set used for accuracy
estimation. The algorithm generates a set of progressively pruned trees. In general, the smallest
decision tree that minimizes the cost complexity is preferred. C4.5 uses a method called
pessimistic pruning, which is similar to the cost complexity method in that it also uses error rate
estimates to make decisions regarding subtree pruning. Pessimistic pruning, however, does not
require the use of a prune set. Instead, it uses the training set to estimate error rates. Recall that
an estimate of accuracy or error based on the training set is overly optimistic and, therefore,
strongly biased. The pessimistic pruning method therefore adjusts the error rates obtained from
the training set by adding a penalty, so as to counter the bias incurred.
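The toy function below sketches this idea in its simplest form: the training-set error of a subtree is inflated by a fixed penalty per leaf before being compared with the error of the leaf that would replace it. The 0.5 continuity correction is a common simplification; C4.5 itself derives its penalty from a binomial confidence bound, so treat this only as an illustration of the bias adjustment, not as C4.5's exact formula.

# A simplified pessimistic-error comparison (the 0.5 penalty per leaf is an
# illustrative convention, not C4.5's exact confidence-bound formula).
def pessimistic_errors(subtree_errors, n_leaves, leaf_errors):
    """Return penalized training-set errors for the subtree and for a single leaf."""
    subtree_penalized = subtree_errors + 0.5 * n_leaves   # penalty for each leaf kept
    leaf_penalized = leaf_errors + 0.5                    # a single replacement leaf
    return subtree_penalized, leaf_penalized

# Example: a 4-leaf subtree misclassifying 10 training tuples vs. collapsing it
# into one leaf that would misclassify 11 of them.
sub, leaf = pessimistic_errors(subtree_errors=10, n_leaves=4, leaf_errors=11)
print("prune the subtree?", leaf <= sub)   # 11.5 <= 12.0, so the subtree is pruned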

Rather than pruning trees based on estimated error rates, we can prune trees based on the
number of bits required to encode them. The “best” pruned tree is the one that minimizes the
number of encoding bits. This method adopts the MDL principle. The basic idea is that the
simplest solution is preferred. Unlike cost complexity pruning, it does not require an independent
set of tuples.
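As a rough illustration of the MDL idea only, the placeholder cost below counts bits for the tree structure plus bits for the misclassified training tuples ("exceptions") and prefers the cheaper description; the encoding scheme is invented for this sketch and is not the one used by any particular algorithm.

import math

# Placeholder MDL-style cost: bits for the tree structure plus bits for the
# training errors. Real MDL encodings differ in detail.
def mdl_cost(n_internal_nodes, n_leaves, n_errors, n_tuples, n_classes):
    structure_bits = (n_internal_nodes + n_leaves) + n_leaves * math.log2(n_classes)
    exception_bits = n_errors * math.log2(n_tuples)
    return structure_bits + exception_bits

# Keep a subtree (3 internal nodes, 4 leaves, 8 errors) or collapse it to a
# single leaf that makes 9 errors, over 200 training tuples and 2 classes?
keep = mdl_cost(3, 4, 8, 200, 2)
prune = mdl_cost(0, 1, 9, 200, 2)
print("prune the subtree?", prune <= keep)   # True with these toy numbers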

Alternatively, prepruning and postpruning may be interleaved for a combined approach.


Postpruning requires more computation than prepruning, yet generally leads to a more reliable
tree. No single pruning method has been found to be superior over all others. Although some
pruning methods do depend on the availability of additional data for pruning, this is usually not a
concern when dealing with large databases. Although pruned trees tend to be more compact than
their unpruned counterparts, they may still be rather large and complex. Decision trees can suffer
from repetition and replication (Figure 7), making them overwhelming to interpret. Repetition
occurs when an attribute is repeatedly tested along a given branch of the tree
(e.g., “age < 60?” followed by “age < 45?” and so on). In replication, duplicate subtrees exist
within the tree. These situations can impede the accuracy and comprehensibility of a decision
tree.

The use of multivariate splits (splits based on a combination of attributes) can prevent these
problems. Another approach is to use a different form of knowledge representation, such as rules,
instead of decision trees; a rule-based classifier can be constructed by extracting IF-THEN rules
from a decision tree.
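A brief sketch of what such rule extraction can look like in practice, using scikit-learn's export_text on a small fitted tree; the synthetic data, depth limit, and feature names are assumptions for illustration only.

# A minimal rule-extraction sketch (assumed: scikit-learn and synthetic data;
# the feature names are illustrative, not taken from a real dataset).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each root-to-leaf path becomes one rule: the tests along the path form the
# IF part and the leaf's class label forms the THEN part.
print(export_text(tree, feature_names=["age", "income", "student", "credit_rating"]))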
Figure 7: An example of:
(a) Subtree repetition, where an attribute is repeatedly tested along a given branch
of the tree (e.g., age) and
(b) Subtree replication, where duplicate subtrees exist within a tree (e.g., the subtree
headed by the node “credit rating?”)
