2012, Methods of Information in Medicine
Summary. Background: Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. Objectives: The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Methods: Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosi...
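The idea of treating a binary response as a regression target can be illustrated with a nearest neighbor estimator: the estimated probability for a query point is the mean 0/1 response among its k nearest training points. The sketch below is a minimal illustration under stated assumptions, not the paper's algorithm; the function name `knn_probability`, the Euclidean distance, and the toy data are all hypothetical.

```python
import math

def knn_probability(train_x, train_y, query, k):
    """Estimate P(y = 1 | query) as the mean 0/1 response of the k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training indices by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    # Averaging 0/1 responses is regression on the binary outcome.
    return sum(train_y[i] for i in nearest) / k

# Hypothetical toy data: one feature, binary response.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
train_y = [0, 0, 0, 1, 1, 1]
p_low = knn_probability(train_x, train_y, (0.5,), k=3)
p_high = knn_probability(train_x, train_y, (4.5,), k=3)
```

With this toy data the estimate is 0 near the low end of the feature range and 1 near the high end; in practice k would be tuned, since a small k gives noisy estimates and a large k oversmooths.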
Surgery, 2011
Background. Diagnosing acute appendicitis clinically is still difficult. We developed random forests, support vector machines, and artificial neural network models to diagnose acute appendicitis. Methods. Between January 2006 and December 2008, patients who had a consultation session with surgeons for suspected acute appendicitis were enrolled. Seventy-five percent of the data set was used to construct models including random forest, support vector machines, artificial neural networks, and logistic regression. Twenty-five percent of the data set was withheld to evaluate model performance. The area under the receiver operating characteristic curve (AUC) was used to evaluate performance, which was compared with that of the Alvarado score. Results. Data from a total of 180 patients were collected, 135 used for training and 45 for testing. The mean age of patients was 39.4 years (range, 16-85). Final diagnosis revealed 115 patients with and 65 without appendicitis. The AUC of random forest, support vector machines, artificial neural networks, logistic regression, and Alvarado was 0.98, 0.96, 0.91, 0.87, and 0.77, respectively. The sensitivity, specificity, positive, and negative predictive values of random forest were 94%, 100%, 100%, and 87%, respectively. Random forest performed better than artificial neural networks, logistic regression, and Alvarado. Conclusion. We demonstrated that random forest can predict acute appendicitis with good accuracy and, deployed appropriately, can be an effective tool in clinical decision making.
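The AUC used here for model comparison can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following sketch computes it directly from that definition; the function name `auc` and the example scores are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical model scores on a small held-out set.
perfect = auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
partial = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of a few dozen patients; a rank-based formula would be preferable for large data sets.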
International Journal of Advanced Research in Computer Science
Healthcare applications frequently collect and store patient data (mostly multivariate) to examine the history of treatment and thereby enhance its effectiveness. Efficient treatment of the patient depends on the performance of the machine learning models used for analytics tasks on patient data. For any disease classification problem, it is convenient to have a machine learning classification model in a healthcare application that predicts the probability of an observation belonging to each possible class rather than predicting a class value directly. Such predicted probabilities need to be calibrated to support confidence in any machine learning classification model used in healthcare applications. In this paper, the predicted probabilities are studied to diagnose and improve the calibration of models used for probabilistic classification. The general performance of selected classification models on the two latest wart skin disease treatment data sets is also reported.
BMC Bioinformatics, 2009
Background: Most machine-learning classifiers output label predictions for new instances without indicating how reliable the predictions are. The applicability of these classifiers is limited in critical domains where incorrect predictions have serious consequences, like medical diagnosis. Further, the default assumption of equal misclassification costs is most likely violated in medical diagnosis.
IFIP Advances in Information and Communication Technology, 2011
A major drawback of most existing medical decision support systems is that they do not provide any indication about the uncertainty of each of their predictions. This paper addresses this problem with the use of a new machine learning framework for producing valid probabilistic predictions, called Venn Prediction (VP). More specifically, VP is combined with Neural Networks (NNs), which is one of the most widely used machine learning algorithms. The obtained experimental results on two medical datasets demonstrate empirically the validity of the VP outputs and their superiority over the outputs of the original NN classifier in terms of reliability.
2006
Accurate probability estimation generated by learning models is desirable in some practical applications, such as medical diagnosis. In this paper, we empirically study traditional decision-tree learning models and their variants in terms of probability estimation, measured by Conditional Log Likelihood (CLL). Furthermore, we also compare decision tree learning with other kinds of representative learning: Naïve Bayes, Naïve Bayes Tree, Bayesian Network, K-Nearest Neighbors and Support Vector Machine with respect to probability estimation. From our experiments, we have several interesting observations. First, among various decision-tree learning models, C4.4 is the best in yielding precise probability estimation measured by CLL, although its performance is not good in terms of other evaluation criteria, such as accuracy and ranking. We provide an explanation for this and reveal the nature of CLL. Second, compared with other popular models, C4.4 achieves the best CLL. Finally, CLL does not dominate another well-established relevant measure, AUC (the Area Under the Curve of Receiver Operating Characteristics), which suggests that different decision-tree learning models should be used for different objectives. Our experiments are conducted on the basis of 36 UCI sample sets that cover a wide range of domains and data characteristics. We run all the models within a machine learning platform, Weka.
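Conditional log likelihood is commonly defined as the sum, over test instances, of the log of the probability the model assigns to the true class. The sketch below assumes that definition; the function name, the clipping constant `eps`, and the example probabilities are hypothetical, and the paper's exact formulation may differ.

```python
import math

def conditional_log_likelihood(class_probs, labels, eps=1e-12):
    """Sum of log P(true class | x) over instances; higher (closer to 0) is better."""
    total = 0.0
    for probs, y in zip(class_probs, labels):
        # Clip to avoid log(0) when a model assigns zero probability to the true class.
        total += math.log(max(probs[y], eps))
    return total

# Hypothetical predicted class distributions for three instances with two classes.
labels = [0, 1, 1]
confident = conditional_log_likelihood([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]], labels)
uninformed = conditional_log_likelihood([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]], labels)
```

Because the measure rewards the size of the probability assigned to the true class rather than only the predicted label, a model can score well on CLL while doing less well on accuracy or ranking, which is consistent with the observation reported above.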
2017
Background: Probabilistic assessments of clinical care are essential for quality care. Yet machine learning, which supports this care process, has been limited to categorical results. To maximize its usefulness, it is important to find novel approaches that calibrate the ML output with a likelihood scale. Current state-of-the-art calibration methods are generally accurate and applicable to many ML models, but improved granularity and accuracy of such methods would increase the information available for clinical decision making. This novel non-parametric Bayesian approach is demonstrated on a variety of data sets, including simulated classifier outputs, biomedical data sets from the University of California, Irvine (UCI) Machine Learning Repository, and a clinical data set built to determine suicide risk from the language of emergency department patients. Results: The method is first demonstrated on support-vector machine (SVM) models, which generally produce well-behaved, well-understood scores. The method produces calibrations that are comparable to the state-of-the-art Bayesian Binning in Quantiles (BBQ) method when the SVM models are able to effectively separate cases and controls. However, as the SVM models' ability to discriminate classes decreases, our approach yields more granular and dynamic calibrated probabilities compared to the BBQ method. Improvements in granularity and range are even more dramatic when the discrimination between the classes is artificially degraded by replacing the SVM model with an ad hoc k-means classifier. Conclusions: The method allows both clinicians and patients to have a more nuanced view of the output of an ML model, allowing better decision making. The method is demonstrated on simulated data, various biomedical data sets and a clinical data set, to which diverse ML methods are applied. Trivially extending the method to (non-ML) clinical scores is also discussed.
Health Care Management Science, 2014
The aims of supervised machine learning (ML) applications fall into three broad categories: classification, ranking, and calibration/probability estimation. Many ML methods and evaluation techniques relate to the first two. Nevertheless, there are many applications where having an accurate probability estimate is of great importance. Deriving accurate probabilities from the output of a ML method is therefore an active area of research, resulting in several methods to turn a ranking into class probability estimates. In this manuscript we present a method, splined empirical probabilities, based on the receiver operating characteristic (ROC) to complement existing algorithms such as isotonic regression. Unlike most other methods it works with a cumulative quantity, the ROC curve, and as such can be tagged onto an ROC analysis with minor effort. On a diverse set of measures of the quality of probability estimates (Hosmer-Lemeshow, Kullback-Leibler divergence, differences in the cumulative distribution function) using simulated and real health care data, our approach compares favourably with the standard calibration method, the pool adjacent violators algorithm used to perform isotonic regression.
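The baseline named here, isotonic regression fitted with the pool adjacent violators algorithm, can be sketched in a few lines: observed 0/1 outcomes are sorted by classifier score, and adjacent blocks are averaged together until the fitted values are non-decreasing. This is a minimal unit-weight illustration of that standard baseline, not the splined empirical probabilities method the paper proposes; the function name and example outcomes are hypothetical.

```python
def pool_adjacent_violators(values):
    """Return the non-decreasing least-squares fit to `values` (unit weights)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# Hypothetical outcomes, already sorted by increasing classifier score.
outcomes_by_score = [0, 1, 0, 1, 1]
calibrated = pool_adjacent_violators(outcomes_by_score)
```

Each fitted value can be read as a calibrated probability for the corresponding score. The result is a step function, which is one reason smoother alternatives are of interest.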
Frontiers in Cellular and Infection Microbiology
Background and Aims: This study aimed to develop an interpretable random forest model for predicting severe acute pancreatitis (SAP). Methods: Clinical and laboratory data of 648 patients with acute pancreatitis were retrospectively reviewed and randomly assigned to the training set and test set in a 3:1 ratio. Univariate analysis was used to select candidate predictors for the SAP. Random forest (RF) and logistic regression (LR) models were developed on the training sample. The prediction models were then applied to the test sample. The performance of the risk models was measured by calculating the area under the receiver operating characteristic (ROC) curves (AUC) and area under precision recall curve. We provide visualized interpretation by using local interpretable model-agnostic explanations (LIME). Results: The LR model was developed to predict SAP as the following function: -1.10 - 0.13 × albumin (g/L) + 0.016 × serum creatinine (μmol/L) + 0.14 × glucose (mmol/L) + 1.63 × pleural effusio...
Mathematics
In this work, we investigated the prognosis of three medical data sets, specifically breast cancer, heart disease, and prostate cancer, by using 10 machine learning models. We applied all 10 models to each dataset to identify patterns in them. Furthermore, we use the models to diagnose risk factors that increase the chance of these diseases. All the statistical learning techniques discussed were grouped into linear and nonlinear models based on their similarities and learning styles. The models' performances were significantly improved by selecting models while taking into account the bias-variance tradeoffs and using cross-validation for selecting the tuning parameter. Our results suggest that no particular class of models or learning style dominated the prognosis and diagnosis for all three medical datasets. However, nonlinear models gave the best predictive performance for breast cancer data. Linear models, on the other hand, gave the best predictive performance for heart disease data an...
This paper discusses the important role of classification algorithms in clinical prediction. Two case studies, one on breast cancer and the other on heart disease prediction using classification data mining techniques, are presented. Freely accessible online data are used for these case studies, consisting of 909 records for heart disease and 699 for breast cancer. Two well-known decision tree algorithms, C4.5 and C5.0, are used to derive the rules for prediction, and these rules are used to improve the quality of an open source Pathology Management System based on Care2x. The performances of these algorithms are also compared. This paper further discusses the importance of open source software in healthcare, as well as how a pathology management system can adopt Evidence Based Medicine (EBM). EBM is a new and important approach which can greatly improve decision making in health care. EBM's task is to prevent, diagnose and medicate diseases using medical evidence [5]. Clinical decisions must be based on scientific evidence that demonstrates effectiveness. This paper is an extension of our previous work, "A Prototype of Cancer/Heart Disease Prediction Model Using Data Mining".
IEEE Reviews in Biomedical Engineering, 2020
Clinical decision-making in healthcare is already being influenced by predictions or recommendations made by data-driven machines. Numerous machine learning applications have appeared in the latest clinical literature, especially for outcome prediction models, with outcomes ranging from mortality and cardiac arrest to acute kidney injury and arrhythmia. In this review article, we summarize the state-of-the-art in related works covering data processing, inference, and model evaluation, in the context of outcome prediction models developed using data extracted from electronic health records. We also discuss limitations of prominent modeling assumptions and highlight opportunities for future research.
Journal of Applied Information Science, 2020
Appendicitis is the most serious medical emergency requiring surgery for removing the appendix. Appendicitis treatment needs physical examination accompanied by blood tests and imaging scans to better detect signs of appendicitis or to rule out potential causes of the symptoms. Diagnosing appendicitis can be difficult because of the proximity of the appendix to other pelvic organs and its location, thus its symptoms have a tendency to overlap with other illnesses. The aim of the current study is to compare and analyze the performance of machine learning (ML) techniques in the prediction of appendicitis accurately. In the current paper, three machine learning techniques namely Support Vector Machine (SVM), Decision Tree and K-nearest Neighbor (KNN) have been taken. The experiments were carried out on the benchmark dataset of Appendicitis consisting of 590 patients. The performance of these ML techniques has been evaluated on the basis of three measures i.e. Accuracy, Recall, and Precision. The experimental result revealed that the Decision Tree algorithm performed better with an accuracy of 73.72%, Precision of 75.35%, and Recall of 68.64% as compared to SVM and KNN. It can be inferred from the experimental results that models based on machine learning techniques can predict appendicitis accurately and can serve as a decision-making aid by providing a correct and timely diagnosis of appendicitis, thereby reducing the negative appendectomy rate.
AIMS Public Health, 2019
This work analyses the diagnosis and prognosis of cancer and heart disease data using five Machine Learning (ML) algorithms. We compare the predictive ability of all the ML algorithms on breast cancer and heart disease data. The important variables that cause cancer and heart disease are also studied. We predict the test data based on the important variables and compute the prediction accuracy using the Receiver Operating Characteristic (ROC) curve. The Random Forest (RF) and Principal Component Regression (PCR) provide the best performance in analyzing the breast cancer and heart disease data, respectively.
G.C Ogwume, 2023
Diabetes Mellitus is a chronic disease and one of the deadliest. It increases the risk of long-term complications, including heart disease and kidney failure, among others. Undoubtedly, Diabetes Mellitus patients may live longer and lead healthier lives if the disease is detected early. Over the years, several efforts have been made towards more accurate and early detection procedures to save patients with Diabetes Mellitus. Interestingly, with the application of Information Technology to disease diagnosis and therapy management, more attention has been paid to using machine learning in the prediction and early detection of Diabetes Mellitus. Unfortunately, determining the most appropriate machine learning algorithm with the best performance in terms of optimum accuracy still remains a challenge. The study proposes a framework for Diabetes Mellitus detection using Machine Learning Algorithms. The proposed framework was evaluated using K-nearest neighbor (KNN), Random Forest (RF), and Logistic Regression (LR). Extensive experiments were conducted to analyze the performance of the framework, focusing on four distinct clinical datasets. To ensure a robust, web-compatible framework, Python and its popular data science related packages, Pandas, Numpy, Seaborn, Matplotlib and Pickle, were used for the implementation. Significantly, using the standard datasets obtained from the National Institute of Diabetes and Kidney Disease, Random Forest was able to predict Diabetes Mellitus in the datasets with the best accuracy of 93.4%.
BMC emergency medicine, 2024
Background: Acute Appendicitis (AA) is one of the most common surgical emergencies worldwide. This study aims to investigate the predictive performances of 6 different Machine Learning (ML) algorithms for simple and complicated AA. Methods: Data regarding operated AA patients between 2012 and 2022 were analyzed retrospectively. Based on operative findings, patients were evaluated under two groups: perforated AA and non-perforated AA. The features that showed statistical significance (p < 0.05) in both univariate and multivariate analysis were included in the prediction models as input features. Five different error metrics and the area under the receiver operating characteristic curve (AUC) were used for model comparison. Results: A total of 1132 patients were included in the study. Patients were divided into training (932 samples), testing (100 samples), and validation (100 samples) sets. Age, gender, neutrophil count, lymphocyte count, Neutrophil to Lymphocyte ratio, total bilirubin, C-Reactive Protein (CRP), Appendix Diameter, and PeriAppendicular Liquid Collection (PALC) were significantly different between the two groups. In the multivariate analysis, age, CRP, and PALC continued to show a significant difference in the perforated AA group. According to univariate and multivariate analysis, two data sets were used in the prediction model. K-Nearest Neighbors and Logistic Regression algorithms achieved the best prediction performance in the validation group with an accuracy of 96%. Conclusion: The results showed that using only three input features (age, CRP, and PALC), the severity of AA can be predicted with high accuracy. The developed prediction model can be useful in clinical practice.
Highlights
• ML models can be used in all parts of medical treatments.
• With good features, it would be useful in the prediction of surgical pathologies.
• ML models are strong predictors of the severity of acute appendicitis.
• With simple and easily found tools, the Logistic Regression algorithm predicted the severity of acute appendicitis with 96% accuracy.
Annals of Mathematics and Artificial Intelligence, 2013
This paper describes the methodology of providing multiprobability predictions for proteomic mass spectrometry data. The methodology is based on a newly developed machine learning framework called Venn machines. It allows a valid probability interval to be output. The methodology is designed for mass spectrometry data. For demonstrative purposes, we applied this methodology to MALDI-TOF data sets in order to predict the diagnosis of heart disease and early diagnoses of ovarian cancer and breast cancer. The experiments showed that probability intervals are narrow, that is, the output of the multiprobability predictor is similar to a single probability distribution. In addition, probability intervals produced for heart disease and ovarian cancer data were more accurate than the output of the corresponding probability predictor. When Venn machines were forced to make point predictions, the accuracy of such predictions was, for most of the data, better than the accuracy of the underlying algorithm that outputs a single probability distribution of a label. Application of this methodology to MALDI-TOF data sets empirically demonstrates its validity. The accuracy of the proposed method on ovarian cancer data rises from 66.7% 11 months in advance of the moment of diagnosis to up to 90.2% at the moment of diagnosis. The same approach has been applied to heart disease data without time dependency, although the achieved accuracy was not as high (up to 69.9%). The methodology allowed us to confirm mass spectrometry peaks previously identified as carrying statistically significant information for discrimination between controls and cases.
IAES International Journal of Artificial Intelligence, 2021
In the medical field, technology is needed to solve several classification problems. This research addresses such problems by using machine learning. This study discusses the classification of pancreatic cancer by using logistic regression and random forest. By comparing the accuracy, precision, recall (sensitivity), and F1-score of both methods, we determine which method is better at classifying the pancreatic cancer dataset obtained from Al-Islam Hospital, Bandung, Indonesia. The results showed that random forest has better accuracy than logistic regression: the maximum accuracy of logistic regression was 96.48% with 30% training data, and that of random forest was 99.38% with 20% training data.
An important part of good clinical care is identifying which patients have a high likelihood of experiencing adverse outcomes. Similarly, due to the significant impact cancer treatment can have on a patient’s quality of life, it is also important to properly identify which patients are likely to benefit from more aggressive treatment options. As such, models for predictive risk stratification can be extremely useful in clinical decision making. In this paper, we present Survival Random Forest-Clinical Categorization Algorithm (SRF-CLICAL), a new method for patient risk stratification using random forests for survival, regression and classification. As a proof of concept, we demonstrate this method on two different cohorts of cancer patients.
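The four measures compared in this study all derive from the counts in a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example labels are hypothetical and do not reproduce the hospital data set.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical true and predicted labels for eight patients.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

Reporting precision and recall alongside accuracy matters when classes are imbalanced, since a model can reach high accuracy by favoring the majority class.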
International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 2023
This paper describes a healthcare operational decision-making system that uses machine learning classifiers to predict decisions based on the actual decisions made by the doctor during healthcare operations. This type of decision-making system uses most of the common supervised machine learning classification and optimization techniques. The system can help the doctor decide on the best course of action. We test this system on the caesarean section, which is the most common obstetric operation in the world, performed to help save both mother and baby. The system helps determine when surgery is appropriate. This study shows how machine learning algorithms can be used to support decisions about medical procedures. For this case study, the results show that both k nearest neighbours and Random Forest have an accuracy of 95.00%.
BMC Medical Research Methodology, 2022
Background: This study illustrates the use of logistic regression and machine learning methods, specifically random forest models, in health services research by analyzing outcomes for a cohort of patients with concomitant peripheral artery disease and diabetes mellitus. Methods: Cohort study using fee-for-service Medicare beneficiaries in 2015 who were newly diagnosed with peripheral artery disease and diabetes mellitus. Exposure variables include whether patients received preventive measures in the 6 months following their index date: HbA1c test, foot exam, or vascular imaging study. Outcomes include any reintervention, lower extremity amputation, and death. We fit both logistic regression models as well as random forest models. Results: There were 88,898 fee-for-service Medicare beneficiaries diagnosed with peripheral artery disease and diabetes mellitus in our cohort. The rates of preventative treatments in the first six months following diagnosis were 52% (n = 45,971) with foot exams, 43% (n = 38,393) with vascular imaging, and 50% (n = 44,181) with an HbA1c test. The directionality of the influence for all covariates considered matched those results found with the random forest and logistic regression models. The most predictive covariate in each approach differs as determined by the t-statistics from logistic regression and variable importance (VI) in the random forest model. For amputation we see age 85+ (t = 53.17) and urban-residing (VI = 83.42), and for death (t = 65.84, VI = 88.76) and reintervention (t = 34.40, VI = 81.22) both models indicate age is most predictive. Conclusions: The use of random forest models to analyze data and provide predictions for patients holds great potential in identifying modifiable patient-level and health-system factors and cohorts for increased surveillance and intervention to improve outcomes for patients. Random forests are high-performing models that are difficult to interpret; they are best suited to situations where accurate prediction is the priority, and they can be used in tandem with more common approaches to provide a more thorough analysis of observational data.