2017, Lecture Notes in Computer Science
Random forests perform bootstrap aggregation by sampling the training examples with replacement. This enables the evaluation of the out-of-bag error, which serves as an internal cross-validation mechanism. Our motivation lies in using the unsampled training examples to improve the ensemble of decision trees. In this paper we study the effect of using the out-of-bag samples to improve the generalization error, first of the individual decision trees and then of the random forest, by post-pruning. A preliminary empirical study on four UCI repository datasets shows a consistent decrease in the size of the forests without considerable loss in accuracy.
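As an illustration of the idea described above, the following sketch (Python with scikit-learn; the dataset and the use of cost-complexity pruning as the post-pruning criterion are assumptions, not the paper's exact procedure) grows each tree on a bootstrap sample and uses that tree's own out-of-bag points to choose a pruning level:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n, n_trees = len(X), 25
forest = []

for _ in range(n_trees):
    # bootstrap: sample training indices with replacement
    boot = rng.integers(0, n, size=n)
    oob = np.setdiff1d(np.arange(n), boot)          # the unsampled (out-of-bag) points
    # candidate pruning levels from the cost-complexity pruning path of a fully grown tree
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X[boot], y[boot])
    # keep the alpha whose pruned tree scores best on this tree's own out-of-bag sample
    best_alpha = max(path.ccp_alphas,
                     key=lambda a: DecisionTreeClassifier(ccp_alpha=a, random_state=0)
                         .fit(X[boot], y[boot]).score(X[oob], y[oob]))
    forest.append(DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X[boot], y[boot]))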
Random Forest is an ensemble machine learning method developed by Leo Breiman in 2001. Since then, it has been considered a state-of-the-art solution in machine learning applications. Compared to other ensemble methods, random forests exhibit superior predictive performance. However, empirical and statistical studies show that the random forest algorithm generates an unnecessarily large number of base decision trees. This can harm computational efficiency, increase prediction time, and occasionally decrease effectiveness. In this paper, the authors survey existing random forest pruning techniques and compare their performance. The survey covers both static and dynamic pruning techniques and analyses the scope for improving random forest performance through approaches including the generation of diverse and accurate decision trees, the selection of high-performing subsets of decision trees, genetic algorithms, and other state-of-the-art methods.
Random Forest (RF) is an ensemble supervised machine learning technique that was developed by Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for enhancing and improving its predictive accuracy. This explains why, over the past decade, there have been many extensions of RF, each employing a variety of techniques and strategies to improve certain aspect(s) of RF. Since it has been proven empirically that ensembles tend to yield better results when there is significant diversity among the constituent models, the objective of this paper is twofold. First, it investigates how data clustering (a well-known diversity technique) can be applied to identify groups of similar decision trees in an RF in order to eliminate redundant trees by selecting a representative from each group (cluster). Second, these likely diverse representatives are then used to pr...
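To make the clustering idea concrete, here is a rough sketch (my own simplification in Python/scikit-learn, not the paper's exact procedure): trees are clustered by their prediction vectors on a held-out set and one representative is kept per cluster.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_tr, X_val, y_tr, y_val = train_test_split(*load_wine(return_X_y=True), random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# each tree becomes a point whose coordinates are its predictions on the held-out set
P = np.array([tree.predict(X_val) for tree in rf.estimators_])
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(P)

# keep the tree closest to each cluster centre as that cluster's representative
representatives = []
for k, centre in enumerate(km.cluster_centers_):
    members = np.where(km.labels_ == k)[0]
    representatives.append(members[np.argmin(np.linalg.norm(P[members] - centre, axis=1))])
pruned_forest = [rf.estimators_[i] for i in representatives]   # 10 diverse trees instead of 100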
Random Forests (RF) have recently gained significant attention in the scientific community as a simple, versatile and efficient machine learning algorithm. They have been used for a variety of tasks due to their high predictive performance, ability to perform feature ranking, simple parallelization, and low sensitivity to parameter tuning. In recent years another tree-based ensemble method has been proposed, namely Extremely Randomized Trees (ERT). These trees by definition have similar properties. However, there has been no extensive empirical evaluation of both algorithms that would identify the strengths and weaknesses of each. In this paper we evaluate both algorithms on several publicly available datasets. Our experiments show that ERT is faster as the dataset size increases and can provide at least the same level of predictive performance. As for feature ranking capabilities, we have statistically confirmed that both provide the same ranking, provided that the number of trees is large enough.
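For reference, a comparison of this kind can be sketched with the scikit-learn implementations of the two algorithms (the dataset and settings below are illustrative assumptions, not those used in the paper):

from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=500, random_state=0)
ert = ExtraTreesClassifier(n_estimators=500, random_state=0)

# predictive performance of each ensemble under 5-fold cross-validation
for name, model in [("RF", rf), ("ERT", ert)]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: 5-fold accuracy = {acc:.3f}")

# agreement of the two feature rankings (impurity importances) via rank correlation
rho, _ = spearmanr(rf.fit(X, y).feature_importances_,
                   ert.fit(X, y).feature_importances_)
print(f"Spearman rank correlation of importances: {rho:.2f}")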
Systems Science & Control Engineering, 2014
Ensemble classification is a data mining approach that utilizes a number of classifiers that work together in order to identify the class label for unlabeled instances. Random forest (RF) is an ensemble classification approach that has proved its high accuracy and superiority. With one common goal in mind, RF has recently received considerable attention from the research community to further boost its performance. In this paper, we look at developments of RF from birth to present. The main aim is to describe the research done to date and to identify potential future developments of RF. Our approach in this review paper is to take a historical view of the development of this notably successful classification technique. We start with developments that preceded Breiman's introduction of the technique in 2001 and from which RF borrowed some of its components. We then delve into the main technique proposed by Breiman. A number of developments to enhance the original technique are then presented and summarized. Successful applications that utilized RF are discussed, before a discussion of possible directions of research is finally given.
2016
Random Forest (RF) is an ensemble classification technique that was developed by Leo Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for optimizing RF further by improving its predictive accuracy. This explains why there have been many extensions of RF, each employing a variety of techniques and strategies to improve certain aspect(s) of RF. The main focus of this dissertation is to develop new extensions of RF using new optimization techniques that, to the best of our knowledge, have never been used before to optimize RF. These techniques are clustering, the local outlier factor, diversified weighted subspaces, and replicator dynamics. Applying these techniques to RF produced four extensions which we have termed CLUB-DRF, LOFB-DRF, DSB-RF, and RDB-DR respectively. Experimental studies on 15 real datasets showed favorable results, d...
IEEE Transactions on Information Theory
We introduce WildWood (WW), a new ensemble algorithm for supervised learning of Random Forest (RF) type. While standard RF algorithms use bootstrap out-of-bag samples to compute out-of-bag scores, WW uses these samples to produce improved predictions given by an aggregation of the predictions of all possible subtrees of each fully grown tree in the forest. This is achieved by aggregation with exponential weights over out-of-bag samples, computed exactly and very efficiently thanks to an algorithm called context tree weighting. This improvement, combined with a histogram strategy to accelerate split finding, makes WW fast and competitive compared with other well-established ensemble methods, such as standard RF and extreme gradient boosting algorithms.
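For illustration only, the following sketch shows the generic idea of aggregating several predictors with exponential weights computed from their losses on held-out (out-of-bag-like) samples. WildWood applies this idea to all subtrees of each tree via context tree weighting; that exact algorithm is not reproduced here, and the function and parameter names are my own.

import numpy as np

def exp_weight_aggregate(pred_probs, y_holdout, eta=1.0):
    """pred_probs: array (n_models, n_samples, n_classes) of each model's
    probability predictions on the held-out set; returns one weight per model."""
    eps = 1e-12
    # cumulative log-loss of each model on the held-out samples
    losses = -np.log(pred_probs[:, np.arange(len(y_holdout)), y_holdout] + eps).sum(axis=1)
    w = np.exp(-eta * (losses - losses.min()))   # shift losses for numerical stability
    return w / w.sum()

# usage: weights = exp_weight_aggregate(P_oob, y_oob); the final prediction on new data
# is the weight-averaged probability over the models.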
ArXiv, 2021
This appendix accompanies the paper ‘Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement’. It provides results for additional experiments that could not be included in the paper for space reasons. 1. Transformation of the Many-Could-Be-Better-Than-All Theorem
Lecture Notes in Computer Science, 1998
We describe an experimental study of pruning methods for decision tree classifiers in two learning situations: minimizing loss and probability estimation. In addition to the two most common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and two pruning variants based on Laplace corrections. We perform an empirical comparison of these methods and evaluate them with respect to the following three criteria: loss, mean-squared error (MSE), and log-loss. We provide a bias-variance decomposition of the MSE to show how pruning affects the bias and variance. We found that applying the Laplace correction to estimate the probability distributions at the leaves was beneficial to all pruning methods, both for loss minimization and for estimating probabilities. Unlike in error minimization, and somewhat surprisingly, performing no pruning led to results that were on par with other methods in terms of the evaluation criteria. The main advantage of pruning was in the reduction of the decision tree size, sometimes by a factor of 10. While no method dominated others on all datasets, even for the same domain different pruning mechanisms are better for different loss matrices. We show this last result using Receiver Operating Characteristics (ROC) curves.
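To make the Laplace correction concrete, a small helper of the kind described (illustrative only; the helper name is mine and it assumes a fitted scikit-learn decision tree plus the data it was trained on) smooths each leaf's class frequencies as (n_c + 1) / (n + K) instead of the raw n_c / n:

import numpy as np

def laplace_leaf_probabilities(tree_clf, X_train, y_train, X):
    """Laplace-smoothed class probabilities at the leaves of a fitted decision tree;
    X_train, y_train must be the data the tree was fitted on."""
    classes = np.unique(y_train)
    k = len(classes)
    leaves_train = tree_clf.apply(X_train)            # leaf index of every training sample
    # class counts of the training samples falling into each leaf
    counts = {leaf: np.array([np.sum(y_train[leaves_train == leaf] == c) for c in classes])
              for leaf in np.unique(leaves_train)}
    probs = []
    for leaf in tree_clf.apply(X):
        n_c = counts[leaf]
        probs.append((n_c + 1.0) / (n_c.sum() + k))   # Laplace correction
    return np.array(probs)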
2001
The aim of this paper is to propose a simple procedure that a priori determines a minimum number of classifiers to combine in order to obtain a prediction accuracy level similar to the one obtained with the combination of larger ensembles. The procedure is based on the McNemar non-parametric test of significance. Knowing a priori the minimum size of the classifier ensemble giving the best prediction accuracy, constitutes a gain for time and memory costs especially for huge data bases and real-time applications. Here we applied this procedure to four multiple classifier systems with C4.5 decision tree (Breiman's Bagging, Ho's Random subspaces, their combination we labeled 'Bagfs', and Breiman's Random forests) and five large benchmark data bases. It is worth noticing that the proposed procedure may easily be extended to other base learning algorithms than a decision tree as well. The experimental results showed that it is possible to limit significantly the number...
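The statistical ingredient is McNemar's test. A loose sketch of using it to check whether a small ensemble and a larger one differ significantly on a test set follows (this is not the authors' exact size-selection procedure, and the function name is mine):

import numpy as np
from scipy.stats import chi2

def small_ensemble_suffices(pred_small, pred_large, y_test, alpha=0.05):
    """McNemar's test on the disagreements between two classifiers' predictions."""
    small_ok = pred_small == y_test
    large_ok = pred_large == y_test
    b = np.sum(small_ok & ~large_ok)     # small ensemble right, large ensemble wrong
    c = np.sum(~small_ok & large_ok)     # large ensemble right, small ensemble wrong
    stat = (abs(int(b) - int(c)) - 1) ** 2 / max(b + c, 1)   # continuity-corrected statistic
    p_value = chi2.sf(stat, df=1)
    return p_value > alpha               # no significant difference: the small ensemble suffices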
Lecture Notes in Computer Science, 2010
Ensembles of randomized trees such as Random Forests are among the most popular tools used in machine learning and data mining. Such algorithms work by introducing randomness in the induction of several decision trees before employing a voting scheme to give a prediction for unseen instances. In this paper, randomized tree ensembles are studied from the point of view of the basis functions they induce. We point out a connection with kernel target alignment, a measure of kernel quality, which suggests that randomization is a way to obtain a high alignment, leading to possibly low generalization error. The connection also suggests post-processing ensembles with sophisticated linear separators such as Support Vector Machines (SVM). Interestingly, post-processing gives experimentally better performances than classical majority voting. We finish by comparing those results to an approximate infinite ensemble classifier very similar to the one introduced by Lin and Li. This methodology also shows strong learning abilities, comparable to ensemble post-processing.
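One plausible reading of this post-processing step (details are assumptions, not necessarily the authors' exact setup) is to use the one-hot leaf memberships induced by the forest as basis functions and train a linear SVM on them:

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

X_tr, X_te, y_tr, y_te = train_test_split(*load_digits(return_X_y=True), random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# leaf index per tree -> sparse one-hot indicators (the induced basis functions)
enc = OneHotEncoder(handle_unknown="ignore")
Z_tr = enc.fit_transform(rf.apply(X_tr))
Z_te = enc.transform(rf.apply(X_te))

# linear separator trained over the tree basis functions
svm = LinearSVC(C=1.0).fit(Z_tr, y_tr)
print("majority vote:", rf.score(X_te, y_te), " SVM post-processing:", svm.score(Z_te, y_te))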