2015, International Journal for Scientific Research and Development
Decision trees are strong predictors that explicitly represent large data sets. An efficient pruning method eliminates the non-predictive parts of the model and generates a small, accurate model. This paper presents an overview of the issues present in decision trees and of the available pruning techniques. We evaluated the results of our pruning method on a variety of machine learning data sets from the UCI machine learning repository and found that it generates a concise and accurate model.
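As a concrete illustration of the before/after effect such post-pruning has, here is a minimal sketch using scikit-learn's cost-complexity pruning as a stand-in for the paper's method; the dataset and the alpha value are illustrative assumptions, not the paper's benchmarks.

# Illustrative only: scikit-learn's cost-complexity pruning stands in for
# the paper's pruning method; breast cancer is one UCI-origin dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(ccp_alpha=0.01,  # assumed pruning strength
                                random_state=0).fit(X_train, y_train)

for name, tree in (("unpruned", unpruned), ("pruned", pruned)):
    print(name, tree.tree_.node_count, "nodes,",
          "test accuracy %.3f" % tree.score(X_test, y_test))

Typically the pruned tree is a fraction of the size with comparable or better test accuracy, which is the "concise and accurate model" the abstract refers to.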
Pattern Analysis and …, 1997
In this paper, we address the problem of retrospectively pruning decision trees induced from data, according to a top-down approach. This problem has received considerable attention in the areas of pattern recognition and machine learning, and many distinct methods have been proposed in the literature. We make a comparative study of six well-known pruning methods with the aim of understanding their theoretical foundations, their computational complexity, and the strengths and weaknesses of their formulation. Comments on the characteristics of each method are empirically supported. In particular, a wide experimentation performed on several data sets leads us to conclusions on the predictive accuracy of simplified trees that are opposite to some drawn in the literature. We attribute this divergence to differences in experimental designs. Finally, we prove and make use of a property of the reduced error pruning method to obtain an objective evaluation of the tendency to overprune/underprune observed in each method. Index Terms: decision trees, top-down induction of decision trees, simplification of decision trees, pruning and grafting operators, optimal pruning, comparative studies.
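For readers unfamiliar with reduced error pruning, the method the paper's final analysis builds on, here is a toy sketch in Python; the Node structure and split convention are assumptions made for illustration, not the paper's formulation.

# Toy reduced error pruning: bottom-up, replace a subtree by a leaf
# whenever the leaf makes no more mistakes on a held-out pruning set.
class Node:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, majority=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.majority = majority          # majority training class here

    def is_leaf(self):
        return self.left is None

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.majority

def reduced_error_prune(node, X, y):
    """X, y: the pruning-set examples that reach this node."""
    if node.is_leaf() or not X:
        return node
    go_left = [x[node.feature] <= node.threshold for x in X]
    node.left = reduced_error_prune(
        node.left, [x for x, g in zip(X, go_left) if g],
        [t for t, g in zip(y, go_left) if g])
    node.right = reduced_error_prune(
        node.right, [x for x, g in zip(X, go_left) if not g],
        [t for t, g in zip(y, go_left) if not g])
    subtree_errors = sum(predict(node, x) != t for x, t in zip(X, y))
    leaf_errors = sum(t != node.majority for t in y)
    # Prune when collapsing to a leaf is no worse on the pruning set.
    return Node(majority=node.majority) if leaf_errors <= subtree_errors else node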
International …, 2010
The development of computer technology has enhanced people's ability to produce and collect data. Data mining techniques can be used effectively to analyze such data and discover hidden knowledge. Decision trees are one of the best-known and most efficient of these techniques because their structural output is easy to understand; when that output grows very large, however, they become hard to interpret. Pruning is a key procedure for overcoming this shortcoming: it removes parts of the tree fitted to noisy or conflicting data, giving better generalization. However, the problem of how to trade off classification accuracy against tree size in pruning has not been well solved. In this paper we first propose a new pruning method that targets both classification accuracy and tree size. Based on this method, we introduce a simple decision tree pruning technique and evaluate the hypothesis: does our new pruning method yield better, more compact decision trees? The experimental results are verified using benchmark datasets from the UCI machine learning repository and indicate that our new tree pruning method is a feasible way of pruning decision trees.
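One plausible way to operationalize such an accuracy/size trade-off, sketched here with scikit-learn's cost-complexity pruning path; the tolerance rule and dataset are illustrative assumptions, not the method the paper proposes.

# Walk the cost-complexity pruning path and keep the smallest tree whose
# validation accuracy stays within a tolerance of the best tree's accuracy.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
trees = [DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_tr, y_tr)
         for a in path.ccp_alphas]
scores = [t.score(X_val, y_val) for t in trees]

best = max(scores)
tolerance = 0.02                # assumed: accept a 2-point accuracy drop
small = min((t for t, s in zip(trees, scores) if s >= best - tolerance),
            key=lambda t: t.tree_.node_count)
print(small.tree_.node_count, "nodes, accuracy %.3f" % small.score(X_val, y_val))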
IRJET, 2021
This document serves as an introduction to the pruning of decision trees. Pruning, which finds a sparse subnetwork of a dense network with the same overall accuracy, reduces the space requirements and the cost of operating the network. The different approaches to pruning and the types of pruning are introduced. The Lottery Ticket Hypothesis [1] is discussed in support of the advantages of pruning, along with the strategies employed to carry it out. Alpha-beta pruning, often used in game playing to determine the next best move for a machine, is also discussed. The merits and demerits of pruning presented here help determine whether pruning is a feasible option.
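A compact sketch of the alpha-beta pruning idea the abstract mentions, over a game tree written as nested lists; the tree literal is a made-up example.

# Alpha-beta pruning over an explicit game tree: inner lists are internal
# nodes, numbers are leaf evaluations.
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(node, list):             # leaf: static evaluation
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node:
        score = alphabeta(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if beta <= alpha:                      # remaining siblings cannot
            break                              # change the outcome: prune
    return best

print(alphabeta([[3, 5], [6, [9, 2]], [1, 8]]))   # -> 6 for this toy tree

The cutoff test (beta <= alpha) is the pruning step: whole subtrees are skipped once they provably cannot influence the move chosen at the root.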
International Journal of Computer Applications, 2010
Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. These classifiers first build a decision tree and then prune subtrees from it in a subsequent pruning phase to improve accuracy and prevent "overfitting". In this paper, the different pruning methodologies available and their various features are discussed. The effectiveness of pruning is also evaluated, in terms of complexity and classification accuracy, by applying the C4.5 decision tree classification algorithm to a credit card database with and without pruning. Instead of classifying transactions as either fraud or non-fraud, the transactions are classified into four risk levels, which is an innovative concept.
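The paper's C4.5 runs on the credit card database are not reproducible here, so this sketch substitutes scikit-learn's CART and synthetic four-class "risk level" data to show the kind of complexity/accuracy comparison it describes.

# Synthetic stand-in: four classes play the role of the four risk levels.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_classes=4, n_informative=6,
                           random_state=0)      # labels 0..3 = risk levels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for label, params in (("without pruning", {}),
                      ("with pruning", {"ccp_alpha": 0.005})):
    t = DecisionTreeClassifier(random_state=0, **params).fit(X_tr, y_tr)
    print(label, t.tree_.node_count, "nodes,",
          "accuracy %.3f" % t.score(X_te, y_te))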
1999
Recent empirical studies revealed two surprising pathologies of several common decision tree pruning algorithms. First, tree size is often a linear function of training set size, even when additional tree structure yields no increase in accuracy. Second, building trees with data in which the class label and the attributes are independent often results in large trees. In both cases, the pruning algorithms fail to control tree growth as one would expect them to.
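The second pathology is easy to probe informally. This sketch uses scikit-learn's cost-complexity pruning as a stand-in for the algorithms the paper studies, with a deliberately weak (assumed) alpha and labels drawn independently of the attributes; the growth pattern is typically visible in the printed node counts.

# Labels independent of attributes: a well-behaved pruner should produce
# a near-trivial tree, yet the tree keeps growing with the sample size.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
for n in (250, 500, 1000, 2000):
    X = rng.normal(size=(n, 10))
    y = rng.integers(0, 2, size=n)      # class label independent of X
    tree = DecisionTreeClassifier(ccp_alpha=0.001,  # weak pruning, assumed
                                  random_state=0).fit(X, y)
    print(n, "examples ->", tree.tree_.node_count, "nodes")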
2007
Automatic Design of Algorithms through Evolution (ADATE) is a machine learning system for program synthesis with automatic invention of recursive help functions. It is well suited for automatic improvement of other machine learning algorithms, since it is difficult to design such algorithms from theory alone, which means that experimental tuning, optimization, and even design of such algorithms are essential for the machine learning practitioner. To demonstrate the feasibility and usefulness of “learning how to learn” through program evolution, we used the ADATE system to automatically rewrite the code for the so-called error-based pruning that is an important part of Quinlan’s C4.5 decision tree learning algorithm. We evaluated the resulting novel pruning algorithm on a variety of machine learning data sets from the UCI machine learning repository and found that it generates trees with seemingly better generalizing ability. The same meta-learning may be applied to most machine lear...
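For context on the starting point ADATE rewrites, here is a sketch of the pessimistic error estimate underlying C4.5's error-based pruning, using the common normal-approximation formula (C4.5 itself uses an exact binomial confidence limit); the leaf counts below are illustrative.

# Pessimistic error: an upper confidence bound on a leaf's true error
# rate, given E observed errors among N examples (z ~ 0.674 for the
# one-sided 25% confidence level, C4.5's default CF).
from math import sqrt

def pessimistic_error(E, N, z=0.674):
    f = E / N
    return ((f + z * z / (2 * N)
             + z * sqrt(f / N - f * f / N + z * z / (4 * N * N)))
            / (1 + z * z / N))

# Prune a subtree when the summed pessimistic errors of its leaves are
# no better than those of a single replacement leaf. Numbers are made up.
leaves = [(1, 6), (1, 2), (0, 6)]                 # (errors, examples)
subtree = sum(pessimistic_error(E, N) * N for E, N in leaves)
as_leaf = pessimistic_error(5, 14) * 14           # majority class errs 5/14
print("prune" if as_leaf <= subtree else "keep")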
Random Forest is an ensemble machine learning method developed by Leo Breiman in 2001. Since then, it has been considered a state-of-the-art solution in machine learning applications, and compared to other ensemble methods, random forests exhibit superior predictive performance. However, empirical and statistical studies show that the random forest algorithm generates an unnecessarily large number of base decision trees. This can incur high computational cost and prediction time, and occasionally decrease effectiveness. In this paper, the authors survey existing random forest pruning techniques and compare their performance. The research covers both static and dynamic pruning techniques and analyses the scope for improving random forest performance through techniques including generating diverse and accurate decision trees, selecting high-performing subsets of decision trees, genetic algorithms, and other state-of-the-art methods.
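As a minimal sketch of the static pruning family surveyed here: rank the base trees by individual held-out accuracy and keep only the best few. The keep-rate, dataset, and ranking rule are illustrative assumptions, and ranking and evaluating on the same split is optimistic; this is a sketch only.

# Static forest pruning: keep the top-scoring 20% of the base trees.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
ranked = sorted(forest.estimators_,
                key=lambda t: t.score(X_val, y_val), reverse=True)
kept = ranked[:40]                  # assumed keep-rate: best 20% of trees

def vote(trees, X):
    # Majority vote over the retained trees' predictions.
    preds = np.array([t.predict(X) for t in trees]).astype(int)
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)

print("full forest:  %.3f" % forest.score(X_val, y_val))
print("pruned forest: %.3f" % (vote(kept, X_val) == y_val).mean())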
IEEE, 2021
The decision tree is among the most effective classification methods. However, its results can show errors due to overfitting or very noisy data, which may make the tree too big, with unnecessary nodes and branches. To handle this error rate, the decision tree is pruned: the tree is cut back, keeping only the necessary nodes and branches. Pruning is possible in two manners: pre-pruning, where the tree is pruned while it is being constructed, and post-pruning, done after the tree is fully constructed, reducing it to the expected size. In this paper, several techniques of both pre-pruning and post-pruning are described to give an overall better understanding of which method to use based on the type of data. The performance of the pruning methods is measured on various datasets.
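To make the pre-/post-pruning contrast concrete, here is a short sketch using scikit-learn; the stopping limits and alpha are illustrative values only.

# Pre-pruning stops growth early (depth / leaf-size limits); post-pruning
# grows the full tree and then cuts it back (cost-complexity pruning).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,  # stop early
                             random_state=0).fit(X, y)
post = DecisionTreeClassifier(ccp_alpha=0.02,                  # grow, then cut
                              random_state=0).fit(X, y)
full = DecisionTreeClassifier(random_state=0).fit(X, y)

for name, t in (("pre-pruned", pre), ("post-pruned", post), ("unpruned", full)):
    print(name, t.tree_.node_count, "nodes")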