2020, IEEE Transactions on Knowledge and Data Engineering
Clinical decision-making requires reasoning in the presence of imperfect data. Decision trees (DTs) are a well-known decision support tool, owing to their interpretability, which is fundamental in safety-critical contexts such as medical diagnosis. However, learning DTs from uncertain data leads to poor generalization, and generating predictions for uncertain data hinders prediction accuracy. Several methods have suggested the potential of probabilistic decisions at the internal nodes in making DTs robust to uncertainty. Some approaches only employ probabilistic thresholds during evaluation. Others also consider the uncertainty in the learning phase, at the expense of increased computational complexity or reduced interpretability. The existing methods have not clarified the merit of a probabilistic approach in the distinct phases of DT learning, nor how that merit depends on whether the uncertainty is present in the training or in the test data. We present a probabilistic DT approach that models measurement uncertainty as a noise distribution, independently realized: (1) when searching for the split thresholds, (2) when splitting the training instances, and (3) when generating predictions for unseen data. The soft training approaches (1, 2) achieved a regularizing effect, leading to significant reductions in DT size while maintaining accuracy as noise increased. Soft evaluation (3) showed no benefit in handling noise.
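The mechanic shared by the three probabilistic steps is replacing a hard threshold comparison with the probability that a noise-corrupted measurement falls on each side of the split. Below is a minimal sketch, assuming additive Gaussian noise with a known standard deviation `sigma`; the paper's exact noise model and instance-weighting scheme may differ.

```python
import numpy as np
from scipy.stats import norm

def p_left(x, threshold, sigma):
    """Probability that a measurement with observed value x, subject to
    additive Gaussian noise N(0, sigma^2), falls below the split threshold."""
    return norm.cdf(threshold, loc=x, scale=sigma)

# Hard split: an instance at x = 4.9 with threshold t = 5.0 goes fully left.
# Soft split: with sigma = 0.5 it goes left with weight ~0.58 and right ~0.42,
# so a near-threshold instance contributes fractionally to both children.
x, t, sigma = 4.9, 5.0, 0.5
w_left = p_left(x, t, sigma)
print(f"left weight: {w_left:.2f}, right weight: {1 - w_left:.2f}")
```

The same quantity can weight instances during threshold search and training (steps 1 and 2) or weight the leaves reached by a test instance (step 3).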
2019
Data mining and machine learning (ML) are increasingly at the core of many aspects of modern life. With growing concerns about the impact of relying on predictions we cannot understand, there is widespread agreement regarding the need for reliable interpretable models. One of the areas where this is particularly important is clinical decision-making. Specifically, explainable models have the potential to facilitate the elaboration of clinical guidelines and related decision-support tools. The presented research focuses on the improvement of decision tree (DT) learning, one of the most popular interpretable models, motivated by the challenges posed by clinical data. One of the limitations of interpretable DT algorithms is that they involve decisions based on strict thresholds, which can impair performance in the presence of noisy measurements. In this regard, we proposed a probabilistic method that takes into account a model of the noise in the distinct learning phases. When considering...
Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies, 2019
Uncertainty is an intrinsic component of clinical practice, which manifests itself in a variety of different forms. Despite the growing popularity of Machine Learning-based Decision Support Systems (ML-DSS) in the clinical domain, the effects of the uncertainty that is inherent in the medical data used to train and optimize these systems remain largely under-considered in the Machine Learning community, as well as in health informatics. A particularly common type of uncertainty arising in the clinical decision-making process is related to the ambiguity resulting from either a lack of decisive information (lack of evidence) or an excess of discordant information (lack of consensus). Both types of uncertainty create the opportunity for clinicians to abstain from making a clear-cut classification of the phenomenon under observation. In this work, we study a Machine Learning model endowed with the ability to directly work with both sources of imperfect information mentioned above. In order to investigate the possible trade-off between accuracy and the uncertainty allowed by the possibility of abstention, we evaluated the considered model against a variety of standard Machine Learning algorithms on a real-world clinical classification problem. We report promising results in terms of commonly used performance metrics.
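The accuracy/abstention trade-off can be made concrete with a simple confidence-threshold wrapper: commit to a prediction only when the top class probability clears a threshold, and abstain otherwise. This is only a stand-in to illustrate the trade-off; the model studied in the paper handles abstention natively rather than by thresholding.

```python
import numpy as np

def predict_with_abstention(proba, threshold=0.75):
    """Return predicted class indices, or -1 (abstain) when the top
    class probability is below the confidence threshold."""
    top = proba.argmax(axis=1)
    confident = proba.max(axis=1) >= threshold
    return np.where(confident, top, -1)

def coverage_and_selective_accuracy(y_true, y_pred):
    """Coverage: fraction of cases decided. Selective accuracy:
    accuracy computed only on the decided cases."""
    decided = y_pred != -1
    coverage = decided.mean()
    accuracy = (y_pred[decided] == y_true[decided]).mean() if decided.any() else float("nan")
    return coverage, accuracy

# Hypothetical predicted probabilities for 4 patients, 2 classes.
proba = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.6, 0.4]])
y_true = np.array([0, 1, 1, 0])
y_pred = predict_with_abstention(proba)
print(coverage_and_selective_accuracy(y_true, y_pred))  # coverage 0.5, accuracy 1.0
```

Raising the threshold lowers coverage and typically raises accuracy on the covered cases, which is precisely the trade-off the evaluation investigates.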
Annals of Operations Research, 2021
Understanding the data and reaching accurate conclusions are of paramount importance in the present era of big data. Machine learning and probability theory methods have been widely used for this purpose in various fields. One critically important yet less explored aspect is capturing and analyzing uncertainties in the data and the model. Proper quantification of uncertainty provides valuable information for reaching an accurate diagnosis. This paper reviews related studies conducted over the last 30 years (from 1991 to 2020) on handling uncertainties in medical data using probability theory and machine learning techniques. Medical data is particularly prone to uncertainty due to the presence of noise, so identifying the sources of that noise is essential: the physician's diagnosis and treatment plan rest on the measured data. As uncertainty grows in healthcare, the knowledge available to address it remains limited. Our findings indicate several open challenges in handling uncertainty in raw medical data and in new models. In this work, we summarize the methods employed to address them, including the novel deep learning techniques recently proposed to deal with such uncertainties and improve decision-making performance.
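One concrete example of the uncertainty quantification such reviews survey: the Shannon entropy of a model's predictive distribution is a common, model-agnostic summary of predictive uncertainty. A minimal sketch (one measure among the many covered by this literature):

```python
import numpy as np

def predictive_entropy(proba, eps=1e-12):
    """Shannon entropy (in bits) of each row of a predicted class
    distribution; 0 = fully confident, log2(K) = maximally uncertain."""
    p = np.clip(proba, eps, 1.0)
    return -(p * np.log2(p)).sum(axis=1)

# A confident and a maximally uncertain prediction over 2 classes.
proba = np.array([[0.99, 0.01], [0.5, 0.5]])
print(predictive_entropy(proba))  # ~[0.08, 1.0]
```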
International Journal of Advanced Research in Computer Science and Software Engineering, 2013
Classical decision tree classifiers assume data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example causes of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. Under uncertainty, the value of a data item is often represented not by a single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by means of statistical derivatives (such as mean and median), we find that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account its probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted which show that the resulting classifiers are more accurate than those built on value averages.
I. INTRODUCTION Classification is a classical problem in machine learning and data mining. Given a set of training data tuples, each having a class label and being represented by a feature vector, the task is to algorithmically build a model that predicts the class label of an unseen test tuple based on the tuple's feature vector. One of the most popular classification models is the decision tree model. Decision trees are popular because they are practical and easy to understand. Rules can also be extracted from decision trees easily. Many algorithms, such as ID3 and C4.5, have been devised for decision tree construction. These algorithms are widely adopted and used in a wide range of applications such as image recognition, medical diagnosis, credit rating of loan applicants, scientific tests, fraud detection, and target marketing. In traditional decision tree classification, a feature (an attribute) of a tuple is either categorical or numerical. For the latter, a precise and definite point value is usually assumed. In many applications, however, data uncertainty is common. The value of a feature/attribute is thus best captured not by a single point value, but by a range of values giving rise to a probability distribution. A simple way to handle data uncertainty is to abstract probability distributions by summary statistics such as means and variances. We call this approach Averaging. Another approach is to consider the complete information carried by the probability distributions to build a decision tree. We call this approach Distribution-based. In this paper, we study the problem of constructing decision tree classifiers on data with uncertain numerical attributes. Our goals are: (1) to devise an algorithm for building decision trees from uncertain data using the Distribution-based approach; (2) to investigate whether the Distribution-based approach could lead to higher classification accuracy compared with the Averaging approach; and (3) to establish a theoretical foundation on which pruning techniques are derived that can significantly improve the computational efficiency of the Distribution-based algorithms.
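The contrast between the two approaches is easy to make concrete: under the Distribution-based approach, a tuple whose pdf straddles a split threshold contributes fractional counts to both branches, whereas Averaging collapses the pdf to its mean and commits the whole tuple to one side. A minimal sketch using discrete pdfs (sample points with probabilities); the paper works with continuous pdfs, so this only illustrates the counting:

```python
import numpy as np

def fractional_counts(values, probs, threshold):
    """Distribution-based: probability mass of a tuple's discrete pdf
    falling on each side of the split (left = '<= threshold')."""
    values, probs = np.asarray(values), np.asarray(probs)
    left = probs[values <= threshold].sum()
    return left, 1.0 - left

# One uncertain tuple: the attribute is 4.0 or 6.0 with equal probability.
values, probs, threshold = [4.0, 6.0], [0.5, 0.5], 5.0

# Distribution-based: half the tuple goes left, half goes right.
print(fractional_counts(values, probs, threshold))  # (0.5, 0.5)

# Averaging: the mean (5.0) falls entirely on the left of the split,
# so the tuple's probability mass on the right branch is lost.
mean = np.dot(values, probs)
print(mean <= threshold)  # True -> whole tuple counted left
```

These fractional counts then feed the usual entropy/information-gain computation at each candidate split.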
Journal of Medical Systems, 2002
In medical decision-making (classification, diagnosing, etc.) there are many situations where decisions must be made effectively and reliably. Conceptually simple decision-making models with the possibility of automatic learning are the most appropriate for performing such tasks. Decision trees are a reliable and effective decision-making technique that provides high classification accuracy with a simple representation of gathered knowledge, and they have been used in different areas of medical decision-making. In the paper we present the basic characteristics of decision trees and successful alternatives to the traditional induction approach, with an emphasis on existing and possible future applications in medicine.
IEEE Transactions on Information Technology in Biomedicine, 2007
Bayesian averaging (BA) over ensembles of decision models allows evaluation of the uncertainty of decisions that is of crucial importance for safety-critical applications such as medical diagnostics. The interpretability of the ensemble can also give useful information for experts responsible for making reliable decisions. For this reason, decision trees (DTs) are attractive decision models for experts. However, BA over such models makes an ensemble of DTs uninterpretable. In this paper, we present a new approach to probabilistic interpretation of Bayesian DT ensembles. This approach is based on the quantitative evaluation of uncertainty of the DTs, and allows experts to find a DT that provides a high predictive accuracy and confident outcomes. To make the BA over DTs feasible in our experiments, we use a Markov Chain Monte Carlo technique with a reversible jump extension. The results obtained from clinical data show that in terms of predictive accuracy, the proposed method outperforms the maximum a posteriori (MAP) method that has been suggested for interpretation of DT ensembles.
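At prediction time, Bayesian averaging over a DT ensemble reduces to a posterior-weighted average of the individual trees' class-probability estimates, and the spread across trees indicates the decision uncertainty the paper exploits. A minimal sketch with hypothetical per-tree outputs and weights (the paper samples the trees with reversible-jump MCMC; here they are simply given):

```python
import numpy as np

def bayesian_average(tree_probas, weights):
    """Posterior-weighted average of per-tree class distributions.
    tree_probas: (n_trees, n_classes); weights: (n_trees,) summing to 1."""
    return np.average(tree_probas, axis=0, weights=weights)

def disagreement(tree_probas):
    """Per-class standard deviation across trees: a simple measure of
    how much the ensemble members disagree about this case."""
    return tree_probas.std(axis=0)

# Hypothetical outputs of 3 sampled DTs for one patient, 2 classes.
tree_probas = np.array([[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]])
weights = np.array([0.5, 0.2, 0.3])

print(bayesian_average(tree_probas, weights))  # [0.83, 0.17]
print(disagreement(tree_probas))               # low spread -> confident outcome
```

Quantifying this per-tree uncertainty is what lets an expert pick out a single DT that is both accurate and confident, rather than working with the uninterpretable ensemble.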
IJMRAP, 2022
Data mining is used to extract meaningful, non-negligible facts from large amounts of data: patterns, anomalies, and correspondence information in large databases. The goal of this study is to build a decision tree from uncertain data and to evaluate the classifier's performance; existing systems have a number of limitations that need to be investigated further and resolved. Measurement errors, stale data, and repeated measurements all contribute to data uncertainty. Classification problems arise across a wide range of data mining applications, and decision trees are very popular for data classification because of their simple and robust structure. The accuracy of the decision tree on the uncertain data used is high because appropriate probability density functions (pdfs) have been used, and the efficiency of the constructed tree is improved by employing various pruning techniques. In comparison to other techniques, the proposed decision tree for uncertain data achieves higher efficiency. For the construction of the decision tree, this method uses classical algorithms that generate enormous numbers of data tuples (one for each decision). The proposed method achieves a better result because the execution time is shorter and the system's efficiency is higher. The proposed work will be extended in the future to improve the data classifiers' pruning efficiency when building decision trees. This lays the groundwork for the rest of the research project.
IFIP Advances in Information and Communication Technology, 2011
A major drawback of most existing medical decision support systems is that they do not provide any indication about the uncertainty of each of their predictions. This paper addresses this problem with the use of a new machine learning framework for producing valid probabilistic predictions, called Venn Prediction (VP). More specifically, VP is combined with Neural Networks (NNs), which is one of the most widely used machine learning algorithms. The obtained experimental results on two medical datasets demonstrate empirically the validity of the VP outputs and their superiority over the outputs of the original NN classifier in terms of reliability.
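The mechanics of Venn Prediction: for each hypothetical label of the test case, the case is provisionally added to the training set with that label, all examples are grouped into categories by a "taxonomy", and the empirical label frequencies in the test case's category form one probability distribution; the set of distributions over all hypothetical labels is the multiprobability prediction whose spread bounds the true probability. The paper derives the taxonomy from a trained NN; the sketch below uses a much simpler 1-NN taxonomy purely to illustrate the procedure:

```python
import numpy as np

def venn_predict(X, y, x_new, labels):
    """Venn prediction with a 1-NN taxonomy: an example's category is
    the label of its nearest neighbour in the augmented data set.
    Returns one label-frequency distribution per hypothetical label."""
    distributions = {}
    for y_hyp in labels:
        Xa = np.vstack([X, x_new])   # augmented examples
        ya = np.append(y, y_hyp)     # hypothetical label for x_new
        # Category of each example = label of its nearest other example.
        d = np.linalg.norm(Xa[:, None, :] - Xa[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        cats = ya[d.argmin(axis=1)]
        # Empirical label frequencies inside the test example's category.
        in_cat = cats == cats[-1]
        distributions[y_hyp] = {c: np.mean(ya[in_cat] == c) for c in labels}
    return distributions

# Tiny hypothetical data set: 1-D feature, binary labels.
X = np.array([[0.0], [0.5], [1.0], [1.5]])
y = np.array([0, 0, 1, 1])
print(venn_predict(X, y, np.array([[0.9]]), labels=[0, 1]))
# hyp 0 -> {0: 0.5, 1: 0.5}; hyp 1 -> {0: 0.25, 1: 0.75}
# i.e. the probability of class 1 is bounded within [0.5, 0.75].
```

The validity guarantee of VP is that these intervals are well calibrated regardless of the data distribution, which is what the paper verifies empirically against the raw NN outputs.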
2006
Accurate probability estimation generated by learning models is desirable in some practical applications, such as medical diagnosis. In this paper, we empirically study traditional decision-tree learning models and their variants in terms of probability estimation, measured by Conditional Log Likelihood (CLL). Furthermore, we also compare decision tree learning with other kinds of representative learning with respect to probability estimation: naïve Bayes, naïve Bayes tree, Bayesian network, k-nearest neighbors, and support vector machine. From our experiments, we have several interesting observations. First, among various decision-tree learning models, C4.4 is the best in yielding precise probability estimation measured by CLL, although its performance is not good in terms of other evaluation criteria, such as accuracy and ranking. We provide an explanation for this and reveal the nature of CLL. Second, compared with other popular models, C4.4 achieves the best CLL. Finally, CLL does not dominate another well-established relevant measurement, AUC (the Area Under the Curve of Receiver Operating Characteristics), which suggests that different decision-tree learning models should be used for different objectives. Our experiments are conducted on the basis of 36 UCI sample sets that cover a wide range of domains and data characteristics. We run all the models within a machine learning platform, Weka.
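Two ingredients of this comparison are easy to state concretely: C4.4 replaces a leaf's raw class frequencies with Laplace-corrected estimates, and CLL scores a model by the log-probability it assigns to each test case's true class. A minimal sketch of both (illustrative only; the paper's experiments run in Weka):

```python
import numpy as np

def laplace_leaf_probability(class_counts):
    """Laplace-corrected class probabilities at a leaf:
    p_c = (n_c + 1) / (n + K), keeping estimates away from 0 and 1."""
    counts = np.asarray(class_counts, dtype=float)
    return (counts + 1.0) / (counts.sum() + len(counts))

def conditional_log_likelihood(proba, y_true):
    """CLL = sum over test cases of log P(true class | x)."""
    return np.log(proba[np.arange(len(y_true)), y_true]).sum()

# A pure leaf: raw frequencies would give p = [1.0, 0.0]; Laplace softens it.
print(laplace_leaf_probability([5, 0]))  # [0.857, 0.143]

# CLL of two predictions against the true labels [0, 1].
proba = np.array([[0.857, 0.143], [0.3, 0.7]])
print(conditional_log_likelihood(proba, np.array([0, 1])))  # ~ -0.51
```

Softened leaf estimates cannot change which class a tree predicts, which is why C4.4 can improve CLL without improving accuracy.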
Medical Physics, 2007
Neural Computing and Applications, 2012
Journal of Medical Systems, 2000
Information Sciences, 2014
Information Sciences, 2006
2010 IEEE International Conference on Data Mining Workshops, 2010
International Journal of Computational Intelligence Systems, 2008
Lecture Notes in Computer Science, 2007
Proceedings of the International Symposium on Human Factors and Ergonomics in Health Care, 2019