2007, Lecture Notes in Computer Science
Classification plays an important role in medicine, especially in medical diagnosis. Health applications often require classifiers that minimize the total cost, including misclassification costs and test costs. In fact, there are many reasons for considering costs in medicine, as diagnostic tests are not free and health budgets are limited. Our aim with this work was to define, implement and test a strategy for cost-sensitive learning. We defined an algorithm for decision tree induction that considers costs, including test costs, delayed costs and costs associated with risk. We then applied our strategy to train and evaluate cost-sensitive decision trees on medical data. Built trees can be tested following several strategies, including group costs, common costs, and individual costs. Using the factor of "risk" it is possible to penalize invasive or delayed tests and obtain patient-friendly decision trees.
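The abstract does not reproduce the induction criterion itself. As a hedged illustration of how a test cost (or a risk penalty for invasive or delayed tests) can enter attribute selection, the sketch below uses the well-known EG2-style Information Cost Function; the labels and cost values are invented for the example:

```python
import math

def information_gain(labels, left, right):
    """Shannon-entropy gain of a binary split (labels are 0/1 lists)."""
    def entropy(ys):
        if not ys:
            return 0.0
        p = sum(ys) / len(ys)
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

def cost_sensitive_score(gain, test_cost, w=1.0):
    """EG2-style Information Cost Function: rewards information gain,
    penalizes test cost; w in [0, 1] sets how strongly cost matters."""
    return (2 ** gain - 1) / ((test_cost + 1) ** w)

# Two hypothetical tests with identical gain but very different costs:
labels = [0, 0, 1, 1]
gain = information_gain(labels, [0, 0], [1, 1])   # a perfect split
cheap = cost_sensitive_score(gain, test_cost=1)   # e.g. a blood-pressure reading
costly = cost_sensitive_score(gain, test_cost=50) # e.g. an invasive procedure
```

Raising `test_cost` for invasive or delayed tests steers the criterion toward cheaper, patient-friendlier attributes even when the information gain is the same.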
Expert Systems
Decision tree induction is a widely used technique for learning from data, which first emerged in the 1980s. In recent years, several authors have noted that in practice accuracy alone is not adequate, and it has become increasingly important to take into consideration the cost of misclassifying the data. Several authors have developed techniques to induce cost-sensitive decision trees. There are many studies that include pair-wise comparisons of algorithms, but a comparison spanning many methods has not been conducted in earlier work. This paper aims to remedy this situation by investigating different cost-sensitive decision tree induction algorithms. A survey has identified 30 cost-sensitive decision tree algorithms, which can be organized into 10 categories. A representative sample of these algorithms has been implemented and an empirical evaluation has been carried out. In addition, an accuracy-based look-ahead algorithm has been extended to a new cost-sensitive look-ahead algorithm and also evaluated. The main outcome of the evaluation is that an algorithm based on genetic algorithms, known as Inexpensive Classification with Expensive Tests, performed better over the full range of experiments, showing that to make a decision tree cost-sensitive it is better to include all the different types of cost, that is, the cost of obtaining the data and misclassification costs, in the induction of the decision tree.
ACM Computing Surveys, 2013
The past decade has seen significant interest in the problem of inducing decision trees that take account of costs of misclassification and costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that are direct adaptations of accuracy-based methods, that use genetic algorithms, that use anytime methods, and that utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy and a historical timeline of how the field has developed, and should serve as a useful reference point for future research in this field.
International Journal of Computer …, 2011
In data mining, classification is one of the most significant techniques, with applications in fraud detection, artificial intelligence, medical diagnosis and many other fields. Classification of objects based on their features into predefined categories is a widely studied problem. Decision trees are widely used by physicians to diagnose patient problems, and decision tree classifiers are used extensively for diagnosis of breast tumours in ultrasonic images, ovarian cancer and heart sound diagnosis. In this paper, the performance of decision tree induction classifiers on various medical data sets is analysed in terms of accuracy and time complexity.
BMC Bioinformatics, 2009
Background: Most machine-learning classifiers output label predictions for new instances without indicating how reliable the predictions are. The applicability of these classifiers is limited in critical domains where incorrect predictions have serious consequences, like medical diagnosis. Further, the default assumption of equal misclassification costs is most likely violated in medical diagnosis.
Pattern Recognition Letters, 2010
This paper reports a new framework for test-cost sensitive classification. It introduces a new loss function definition, in which misclassification cost and cost of feature extraction are combined qualitatively, and the loss is conditioned on the current and estimated decisions as well as their consistency. This loss function definition is motivated by the following issues. First, for many applications, the relation between different types of costs can be expressed only roughly, usually in terms of ordinal relations rather than as a precise quantitative number. Second, the redundancy between features can be used to decrease the cost: a new feature need not be considered if it is consistent with the existing ones. In this paper, we show the feasibility of the proposed framework for medical diagnosis problems. Our experiments demonstrate that this framework can significantly decrease feature extraction cost without decreasing accuracy.
Computers and Biomedical Research, 1993
This paper compares the performance of logistic regression to decision-tree induction in classifying patients as having acute cardiac ischemia. This comparison was performed using the database of 5,773 patients originally used to develop the logistic-regression tool and test it prospectively. Both the ability to classify cases and the ability to estimate the probability of ischemia were compared on the default tree generated by the C4 version of ID3. They were also compared on a tree optimized on the learning set by increased pruning of overspecified branches, and on a tree incorporating clinical considerations. Both the LR tool and the improved trees performed at a level fairly close to that of the physicians, although the LR tool definitely performed better than the decision tree. There were a number of differences in the performance of the two methods, shedding light on their strengths and weaknesses.
International Journal of Intelligent Systems and Applications in Engineering, 2021
A reliable and precise tool for medical machine learning is in demand. Diagnosis datasets are mostly unbalanced, so an accurate prediction tool for medical data requires an accurate machine-learning algorithm for unbalanced data classification. In a binary-class unbalanced medical dataset, accurate prediction of the minority class is important. Traditional classifiers are designed to improve accuracy by giving more weight to the majority class, and existing techniques give good results by accurately classifying the majority class. Although they misclassify the minority cases, the overall accuracy value does not reflect this. When the misclassification cost of the minority class is high, research should focus on reducing the total misclassification cost. This paper presents a new cost-sensitive classification algorithm that classifies unbalanced data accurately without compromising the accuracy of the minority class. Our proposed minority-sensitive decision tree algorithm employs a new splitting criterion called MSplit to ensure accurate prediction of the minority class. The proposed splitting criterion MSplit is derived from the exclusive causes of the minority class. For our experiment, we mainly focused on the breast cancer dataset, given its importance in women's health. Our proposed model shows good results compared to recent studies of breast cancer detection, achieving a misclassification cost of 0.074, the lowest among the compared methods. Our model improves performance on other unbalanced medical datasets as well.
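The exact form of MSplit is not given in this abstract. As a hypothetical stand-in, the sketch below shows the general idea behind minority-sensitive splitting: a cost-weighted impurity in which minority examples are up-weighted, so that splits isolating them produce a larger impurity reduction (the weight and labels here are invented):

```python
def weighted_gini(ys, minority_label=1, minority_weight=10.0):
    """Cost-weighted Gini impurity: each minority example counts
    minority_weight times when computing the class proportions."""
    if not ys:
        return 0.0
    weights = [minority_weight if y == minority_label else 1.0 for y in ys]
    total = sum(weights)
    p_min = sum(w for w, y in zip(weights, ys) if y == minority_label) / total
    return 2.0 * p_min * (1.0 - p_min)

node = [0] * 9 + [1]  # 9 majority cases, 1 minority case
plain = weighted_gini(node, minority_weight=1.0)  # ordinary Gini: node looks "pure"
weighted = weighted_gini(node)                    # up-weighted: node is still impure
```

Under the weighted measure the mixed node stays impure, so the tree keeps splitting until the minority case is isolated, instead of stopping early with a majority-only leaf.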
This paper develops a new algorithm for inducing cost-sensitive decision trees that is inspired by the multi-armed bandit problem, in which a player in a casino has to decide which slot machine (bandit) from a selection of slot machines is likely to pay out the most. Game theory proposes a solution to this multi-armed bandit problem using a process of exploration and exploitation in which reward is maximized. This paper utilizes these concepts to develop a new algorithm by viewing the rewards as a reduction in costs, and utilizing the exploration and exploitation techniques so that a compromise between decisions based on accuracy and decisions based on costs can be found. The algorithm employs the notion of lever pulls in the multi-armed bandit game to select the attributes during decision tree induction, using a look-ahead methodology to explore potential attributes and exploit the attribute that maximizes the reward. The new algorithm is evaluated on 15 datasets and compared with six well-known algorithms: J48, EG2, MetaCost, AdaCostM1, ICET and ACT. The results obtained show that the new multi-armed-bandit-based algorithm can produce more cost-effective trees without compromising accuracy. The paper also includes a critical appraisal of the limitations of the new algorithm and proposes avenues for further research.
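As a rough sketch of the exploration/exploitation idea only (not the paper's actual look-ahead reward estimate), an epsilon-greedy bandit over candidate attributes could look like this, with the reward standing for a sampled cost reduction; the reward distributions are hypothetical:

```python
import random

def bandit_select_attribute(pull, n_attrs, n_rounds=200, eps=0.2, seed=0):
    """Epsilon-greedy lever pulls over candidate attributes.
    pull(i, rng) returns a sampled reward (cost reduction) for
    splitting on attribute i."""
    rng = random.Random(seed)
    est = [0.0] * n_attrs  # running mean reward per attribute
    count = [0] * n_attrs
    for _ in range(n_rounds):
        if rng.random() < eps:
            i = rng.randrange(n_attrs)                     # explore a random arm
        else:
            i = max(range(n_attrs), key=lambda j: est[j])  # exploit best so far
        r = pull(i, rng)
        count[i] += 1
        est[i] += (r - est[i]) / count[i]  # incremental mean update
    return max(range(n_attrs), key=lambda j: est[j])

# Three hypothetical attributes whose mean cost reductions are 1, 2 and 5;
# the bandit should settle on attribute 2.
means = [1.0, 2.0, 5.0]
best = bandit_select_attribute(lambda i, rng: rng.gauss(means[i], 0.1), 3)
```

The epsilon parameter trades off trying unfamiliar attributes (exploration) against repeatedly pulling the currently best-looking one (exploitation), mirroring the accuracy-versus-cost compromise the paper describes.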
Decision Making Based on Data Proceedings IASE 2019 Satellite Conference, 2019
Fast-and-frugal trees for classification/decision are at the intersection of three families of models: lexicographic, linear and tree-based. We briefly examine the classification performance of simple models when making inferences out of sample, in 11 medical data sets in terms of Receiver Operating Characteristics diagrams and predictive accuracy. The heuristic approaches, Naïve Bayes and fast- and-frugal trees, outperform models that are normatively optimal when fitting data. The success of fast-and-frugal trees lies in their ecological rationality: their construction exploits the structure of information in the data sets. The tool ARBOR, a digital learning tool, which is a plug-in to the freely available data-science education software CODAP can be used for constructing and interpreting fast- and-frugal classification and decision trees. This paper is an abridged version of work by Woike, Hoffrage & Martignon on the integration of classification and decision models into a common ...
Lecture Notes in Computer Science
We report a novel approach for designing test-cost sensitive classifiers that considers the misclassification cost together with the cost of feature extraction, utilizing consistency behavior for the first time. In this approach, we propose a new Bayesian decision theoretical framework in which the loss is conditioned on the current decision, the expected decisions after additional features are extracted, and the consistency between the current and expected decisions. This approach allows us to force feature extraction for samples for which the current and expected decisions are inconsistent. Conversely, it avoids extracting any features in the case of consistency, leading to less costly but equally accurate decisions. In this work, we apply this approach to a medical diagnosis problem and demonstrate that it reduces the overall feature extraction cost by up to 47.61 percent without decreasing the accuracy.
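The core consistency rule can be sketched very simply: pay for the extra feature only when the decision it is expected to yield disagrees with the decision we would make right now. The thresholds and probabilities below are hypothetical posteriors for the positive class, not values from the paper:

```python
def should_extract(p_current, p_expected, threshold=0.5):
    """Consistency check: return True (extract the feature) only when
    the current decision and the expected post-extraction decision
    would differ."""
    decide_now = p_current >= threshold
    decide_later = p_expected >= threshold
    return decide_now != decide_later  # inconsistent -> worth paying for

skip = should_extract(0.80, 0.85)     # consistent: save the extraction cost
extract = should_extract(0.55, 0.30)  # inconsistent: the feature may flip the call
```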
Journal of medical systems, 2002
In medical decision making (classification, diagnosis, etc.) there are many situations where decisions must be made effectively and reliably. Conceptually simple decision-making models with the possibility of automatic learning are the most appropriate for performing such tasks. Decision trees are a reliable and effective decision-making technique that provides high classification accuracy with a simple representation of the gathered knowledge, and they have been used in different areas of medical decision making. In this paper we present the basic characteristics of decision trees and the successful alternatives to the traditional induction approach, with an emphasis on existing and possible future applications in medicine.
2018
In classification, an algorithm learns to classify a given instance based on a set of observed attribute values. In many real-world cases, testing the value of an attribute incurs a cost, and there can also be a cost associated with the misclassification of an instance. Cost-sensitive classification attempts to minimize the expected cost of classification by deciding, after each observed attribute value, which attribute to measure next. In this paper we suggest Markov Decision Processes as a modeling tool for cost-sensitive classification. We construct standard decision trees over all attribute subsets, and the leaves of these trees become the state space of our MDP. At each phase we decide on the next attribute to measure, balancing the cost of the measurement and the classification accuracy. We compare our approach to a set of previous approaches, showing our approach to work better for a range of misclassification costs.
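The one-step decision at the heart of such an MDP — classify now, or pay for one more measurement — can be sketched as an expected-cost comparison. All costs and posteriors below are invented for illustration:

```python
def cost_of_classifying_now(p_pos, fn_cost, fp_cost):
    """Expected cost of stopping and predicting the cheaper label,
    given the current belief P(y = positive) = p_pos."""
    return min(p_pos * fn_cost,        # predict negative: pay FN when y is positive
               (1 - p_pos) * fp_cost)  # predict positive: pay FP when y is negative

def cost_of_measuring(test_cost, outcomes, fn_cost, fp_cost):
    """Expected cost of paying for one more attribute, then classifying
    in each resulting belief state.
    outcomes = [(P(result), posterior p_pos given result), ...]"""
    return test_cost + sum(
        p * cost_of_classifying_now(post, fn_cost, fp_cost)
        for p, post in outcomes)

# Hypothetical node: P(disease) = 0.4, a missed case costs 100, a false alarm 50.
stop = cost_of_classifying_now(0.4, fn_cost=100, fp_cost=50)
# A test costing 5 that usually resolves the uncertainty:
measure = cost_of_measuring(5, [(0.5, 0.05), (0.5, 0.90)], 100, 50)
```

Here measuring (expected cost about 10) beats stopping (about 30), so the policy would choose the test; a full MDP solver applies this comparison recursively over the leaf-state space the paper describes.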
Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies, 2019
Uncertainty is an intrinsic component of clinical practice, manifesting itself in a variety of forms. Despite the growing popularity of Machine Learning-based Decision Support Systems (ML-DSS) in the clinical domain, the effects of the uncertainty inherent in the medical data used to train and optimize these systems remain largely under-considered both in the Machine Learning community and in health informatics. A particularly common type of uncertainty arising in the clinical decision-making process is the ambiguity resulting from either a lack of decisive information (lack of evidence) or an excess of discordant information (lack of consensus). Both types of uncertainty create the opportunity for clinicians to abstain from making a clear-cut classification of the phenomenon under observation. In this work, we study a Machine Learning model endowed with the ability to work directly with both sources of imperfect information mentioned above. In order to investigate the possible trade-off between accuracy and uncertainty given by the possibility of abstention, we evaluated the considered model against a variety of standard Machine Learning algorithms on a real-world clinical classification problem. We report promising results in terms of commonly used performance metrics.
2012
Recently, machine learning algorithms have successfully entered large-scale real-world industrial applications (e.g. search engines and email spam filters). Here, the CPU cost during test-time must be budgeted and accounted for. In this paper, we address the challenge of balancing the test-time cost and the classifier accuracy in a principled fashion. The test-time cost of a classifier is often dominated by the computation required for feature extraction, which can vary drastically across features. We decrease this extraction time by constructing a tree of classifiers, through which test inputs traverse along individual paths. Each path extracts different features and is optimized for a specific subpartition of the input space. By only computing features for inputs that benefit from them the most, our cost-sensitive tree of classifiers can match the high accuracies of the current state-of-the-art at a small fraction of the computational cost.
Several authors have studied the problem of inducing decision trees that aim to minimize costs of misclassification and take account of costs of tests. The approaches adopted vary from modifying the information-theoretic attribute selection measure used in greedy algorithms such as C4.5 to using methods such as bagging and boosting. This paper presents a new framework, based on game theory, which recognizes that there is a trade-off between the cost of using a test and the misclassification costs. Cost-sensitive learning is viewed as a Multi-Armed Bandit problem, leading to a novel cost-sensitive decision tree algorithm. The new algorithm is evaluated on five data sets and compared to six well-known algorithms: J48, EG2, MetaCost, AdaCostM1, ICET and ACT. The preliminary results are promising, showing that the new multi-armed-bandit-based algorithm can produce more cost-effective trees without compromising accuracy.
Informatics in Medicine Unlocked, 2021
Neural Computing and Applications, 2012
Decision support systems help physicians and play an important role in medical decision-making. They are based on different models, and the best of them provide an explanation together with an accurate, reliable and quick response. This paper presents a decision support tool for the detection of breast cancer based on three types of decision tree classifiers: single decision tree (SDT), boosted decision tree (BDT) and decision tree forest (DTF). Decision tree classification provides a rapid and effective method of categorizing data sets. Decision-making is performed in two stages: training the classifiers with features from the Wisconsin breast cancer data set, and then testing. The performance of the proposed structure is evaluated in terms of accuracy, sensitivity, specificity, confusion matrix and receiver operating characteristic (ROC) curves. The results showed that the overall accuracies of SDT and BDT in the training phase reached 97.07 % with 429 correct classifications and 98.83 % with 437 correct classifications, respectively. BDT performed better than SDT on all performance indices. The ROC value and Matthews correlation coefficient (MCC) for BDT in the training phase reached 0.99971 and 0.9746, respectively, superior to the SDT classifier. During the validation phase, DTF achieved 97.51 %, superior to the SDT (95.75 %) and BDT (97.07 %) classifiers. The ROC value and MCC for DTF reached 0.99382 and 0.9462, respectively. BDT showed the best performance in terms of sensitivity, and SDT was best only in terms of speed. Keywords: Computer-aided diagnosis (CAD) · Decision support systems (DSS) · Decision tree classification · Single decision tree · Boosted decision tree · Decision tree forest · k-fold cross-validation
2019
Data mining and machine learning (ML) are increasingly at the core of many aspects of modern life. With growing concerns about the impact of relying on predictions we cannot understand, there is widespread agreement regarding the need for reliable interpretable models. One of the areas where this is particularly important is clinical decision-making. Specifically, explainable models have the potential to facilitate the elaboration of clinical guidelines and related decision-support tools. The presented research focuses on the improvement of decision tree (DT) learning, one of the most popular interpretable models, motivated by the challenges posed by clinical data. One of the limitations of interpretable DT algorithms is that they involve decisions based on strict thresholds, which can impair performance in the presence of noisy measurements. In this regard, we proposed a probabilistic method that takes into account a model of the noise in the distinct learning phases. When considering...
2018
Machine learning models are increasing in popularity in many domains as they are shown to be able to solve difficult problems. However, selecting a model to implement when there are various alternatives is a difficult problem. Receiver operating characteristic (ROC) curves are useful for selecting binary classification models for real world problems. However, ROC curves only consider the misclassification cost of the classifier. The total cost of a classification system includes various other types of cost including implementation, computation, and feature costs. To extend the ROC analysis to include this additional cost information, the ROC Convex Hull with Cost (ROCCHC) method is introduced. This method extends the ROC Convex Hull (ROCCH) method, which is used to select potentially optimal classifiers in the ROC space using misclassification cost, by selecting potentially optimal classifiers using this additional cost information. The ROCCHC method is tested using three binary cla...
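The selection step the ROCCHC method adds can be sketched as ranking ROC operating points by expected misclassification cost plus any additional per-classifier cost (feature, computation, implementation). The model names, ROC points and cost values below are hypothetical, not from the paper:

```python
def expected_misclassification_cost(tpr, fpr, p_pos, fn_cost, fp_cost):
    """Expected cost per instance of a classifier operating at (fpr, tpr)."""
    return p_pos * (1 - tpr) * fn_cost + (1 - p_pos) * fpr * fp_cost

def pick_classifier(points, p_pos, fn_cost, fp_cost, other_costs=None):
    """Select the classifier with the lowest total cost.
    points: list of (name, fpr, tpr); other_costs: dict name -> extra cost."""
    other_costs = other_costs or {}
    def total(item):
        name, fpr, tpr = item
        return (expected_misclassification_cost(tpr, fpr, p_pos, fn_cost, fp_cost)
                + other_costs.get(name, 0.0))
    return min(points, key=total)[0]

models = [("cheap", 0.20, 0.80), ("accurate", 0.05, 0.95)]
# Misclassification cost alone favours the more accurate model...
best_plain = pick_classifier(models, 0.5, 10, 10)
# ...but adding a large per-prediction feature cost can flip the choice.
best_with_feature_cost = pick_classifier(models, 0.5, 10, 10,
                                         other_costs={"accurate": 2.0})
```

This is exactly the kind of reversal the abstract motivates: a classifier that is optimal in the plain ROC space may no longer be optimal once implementation, computation and feature costs are included.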
International Journal of Trend in Scientific Research and Development, 2019
Data mining techniques are rapidly being developed for many applications. In recent years, data mining in healthcare has become an emerging field for the research and development of intelligent medical diagnosis systems. Classification is a major research topic in data mining, and decision trees are popular methods for classification. In this paper, several decision tree classifiers are used for the diagnosis of medical datasets: the AD Tree, J48, NB Tree, Random Tree and Random Forest algorithms are applied to analyse medical datasets. Heart disease, diabetes and hepatitis disorder datasets are used to test the decision tree models.