2012
Under the supervision of Professor Istvan Lauko. Statistics from the National Cancer Institute indicate that 1 in 8 women will develop breast cancer in their lifetime. Researchers have developed numerous statistical models to predict breast cancer risk; however, physicians are hesitant to use these models because of disparities in the predictions they produce. In an effort to reduce these disparities, we use Bayesian networks to capture the joint distribution of risk factors and simulate artificial patient populations (clinical avatars) for interrogating the existing risk prediction models. The challenge in this effort has been to produce a Bayesian network whose dependencies agree with the literature and are good estimates of the joint distribution of risk factors. In this work, we propose a methodology for learning Bayesian networks that uses prior knowledge to guide a collection of search algorithms in identifying an optimum structure. Using data from the Breast Cancer Surveillance Consortium, we have shown that our methodology produces a Bayesian network with consistent dependencies and a better estimate of the distribution of risk factors compared with existing methods.
Breast cancer is the leading cause of cancer-related death for women in Tunisia and the prognosis of its metastasis remains a major problem for oncologists despite advances in treatment. In this work we use Bayesian networks to develop a decision support system that is based on the modeling of relationships between key signaling proteins and clinical and pathological characteristics of breast tumors and patients. Motivated by the lack of prior information on the parameters of the problem, we use the Implicit inference for the structure and parameter learning. A dataset of 84 Tunisian breast cancer patients was used and new prognosis factors were identified. The system predicts a metastasis risk for different patients by computing a score that is the joint probability of the Bayesian network using parameters estimated on the learning database. Based on the results of the developed system we identified that overexpression of ErbB2, ErbB3, bcl2 as well as of oestrogen and progesterone receptors associated with a low level of ErbB4 was the predominant profile associated with high risk of metastasis.
The objective of this paper is to explore the implementation of a Bayesian Belief Network for an automated breast cancer detection support tool. It is intuitive that Bayesian networks are employed as one viable option for computer-aided detection by representing the relationships between diagnoses, physical findings, laboratory test results and imaging study findings. This work brings important entities such as Radiologists, Image Processing Scientists, Data Base Specialists and Applied Mathematicians on a common platform. A brief background concerning causal networks, probability theory and Bayesian networks is given; available computational tools and platforms are described. It is explained that, by exploiting conditional independencies entailed by influence chains, it is possible to represent a large instance in a Bayesian network using little space, and it is often possible to perform probabilistic inference among the features in an acceptable amount of time. The next steps towa...
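The space saving from conditional independence mentioned in the abstract above can be illustrated by counting free parameters. The following is a minimal sketch, not taken from the paper: the ten binary findings `F0`..`F9` and the chain structure are hypothetical, chosen only to show how a factored representation compares with a full joint table.

```python
from math import prod

def full_joint_parameters(cardinalities):
    """Free parameters of an unrestricted joint table over all variables."""
    return prod(cardinalities.values()) - 1

def network_parameters(cardinalities, parents):
    """Free parameters of a Bayesian network: for each node,
    (states - 1) per configuration of its parents."""
    total = 0
    for node, states in cardinalities.items():
        parent_configs = prod(cardinalities[p] for p in parents.get(node, ()))
        total += (states - 1) * parent_configs
    return total

# Hypothetical example: ten binary findings linked in a simple chain.
names = [f"F{i}" for i in range(10)]
card = {name: 2 for name in names}
chain = {names[i]: [names[i - 1]] for i in range(1, 10)}

full = full_joint_parameters(card)          # unrestricted joint table
compact = network_parameters(card, chain)   # chain-structured network
print(full, compact)
```

Under these assumptions the chain needs far fewer numbers than the full table, which is the sense in which a large instance fits in "little space".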
Value in Health, 2019
The fields of medicine and public health are undergoing a data revolution. An increasing availability of data has brought about a growing interest in machine-learning algorithms. Our objective is to present the reader with an introduction to a knowledge representation and machine-learning tool for risk estimation in medical science known as Bayesian networks (BNs). Study Design: In this article we review how BNs are compact and intuitive graphical representations of joint probability distributions (JPDs) that can be used to conduct causal reasoning and risk estimation analysis and offer several advantages over regression-based methods. We discuss how BNs represent a different approach to risk estimation in that they are graphical representations of JPDs that take the form of a network representing model random variables and the influences between them, respectively. Methods: We explore some of the challenges associated with traditional risk prediction methods and then describe BNs, their construction, application, and advantages in risk prediction based on examples in cancer and heart disease. Results: Risk modeling with BNs has advantages over regression-based approaches, and in this article we focus on three that are relevant to health outcomes research: (1) the generation of network structures in which relationships between variables can be easily communicated; (2) their ability to apply Bayes's theorem to conduct individual-level risk estimation; and (3) their easy transformation into decision models. Conclusions: Bayesian networks represent a powerful and flexible tool for the analysis of health economics and outcomes research data in the era of precision medicine.
Journal of Biomedical Informatics, 2003
The growth of nursing databases necessitates new approaches to data analyses. These databases, which are known to be massive and multidimensional, easily exceed the capabilities of both human cognition and traditional analytical approaches. One innovative approach, knowledge discovery in large databases (KDD), allows investigators to analyze very large data sets more comprehensively in an automatic or a semi-automatic manner. Among KDD techniques, Bayesian networks, a state-of-the art representation of probabilistic knowledge by a graphical diagram, has emerged in recent years as essential for pattern recognition and classification in the healthcare field. Unlike some data mining techniques, Bayesian networks allow investigators to combine domain knowledge with statistical data, enabling nurse researchers to incorporate clinical and theoretical knowledge into the process of knowledge discovery in large datasets. This tailored discussion presents the basic concepts of Bayesian networks and their use as knowledge discovery tools for nurse researchers.
Artificial Intelligence in Medicine, 2011
Objectives: Bayesian networks (BNs) are rapidly becoming a leading technology in applied Artificial Intelligence, with medicine its most popular application area. Both automated learning of BNs and expert elicitation have been used to build these networks, but the potentially more useful combination of these two methods remains underexplored. In this paper we examine a number of approaches to their combination and present new techniques for assessing their results. Methods and materials: Using public-domain data for heart failure, we run an automated causal discovery system (CaMML), which allows the incorporation of multiple kinds of prior expert knowledge into its search, to test and compare unbiased discovery with discovery biased with different kinds of expert opinion. We use adjacency matrices enhanced with numerical and colour labels to assist with the interpretation of the results. These techniques are presented within a wider context of knowledge engineering with Bayesian networks (KEBN). Results: The adjacency matrices make it clear that for our particular application problem, the heart failure data, the simplest kind of prior information (partially sorting variables into tiers) was more effective in aiding model discovery than either using no prior information or using more sophisticated and detailed expert priors. Conclusion: Hybrid causal learning of BNs is an important emerging technology. We present methods for incorporating it into the knowledge engineering process, including visualisation and analysis of the learned networks.
2009
Bayesian Networks represent one of the most successful tools for medical diagnosis and therapies follow-up. We present an algorithm for Bayesian network structure learning, that is a variation of the standard search-and-score approach. The proposed approach overcomes the creation of redundant network structures that may include non significant connections between variables. In particular, the algorithm finds which relationships between the variables must be prevented, by exploiting the binarization of a square matrix containing the mutual information (MI) among all pairs of variables. Four different binarization methods are implemented. The MI binary matrix is exploited as a preconditioning step for the subsequent greedy search procedure that optimizes the network score, reducing the number of possible search paths in the greedy search. Our approach has been tested on two different medical datasets and compared against the standard search-and-score algorithm as implemented in the DEAL package.
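The preconditioning step described above, a mutual information matrix that is binarized to rule out edges before the greedy search, might look roughly like the sketch below. The abstract does not name its four binarization methods, so the fixed cut-off used here is an assumption, as are the function names and the toy columns `A`, `B`, `C`.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mi_matrix(columns):
    """Symmetric matrix (dict of dicts) of pairwise mutual information."""
    names = list(columns)
    m = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        m[a][b] = m[b][a] = mutual_information(columns[a], columns[b])
    return m

def forbidden_pairs(m, threshold):
    """Binarize the matrix: pairs below the threshold are excluded
    from the later search. The threshold rule is an assumption here."""
    return {(a, b) for a, b in combinations(list(m), 2) if m[a][b] < threshold}

# Toy data: B copies A, while C is independent of both.
columns = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
}
m = mi_matrix(columns)
blocked = forbidden_pairs(m, threshold=0.1)
print(blocked)
```

A search procedure would then skip any candidate edge whose endpoints appear in `blocked`, which is how the binary matrix reduces the number of search paths.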
Computers in Biology …, 2007
We evaluate the effectiveness of seven Bayesian network classifiers as potential tools for the diagnosis of breast cancer using two real-world databases containing fine-needle aspiration of the breast lesion cases collected by a single observer and multiple observers, respectively. The results show a certain ingredient of subjectivity implicitly contained in these data: we get an average accuracy of 93.04% for the former and 83.31% for the latter. These findings suggest that observers see different things when looking at the samples in the microscope; a situation that significantly diminishes the performance of these classifiers in diagnosing such a disease.
Structure learning of Bayesian networks is a well-researched but computationally hard (NP-hard) task. We present an algorithm that integrates a low-order conditional independence approach for learning structures of Bayesian networks. Our algorithm also makes use of basic Bayesian network concepts. We show that the proposed algorithm is capable of handling networks with a large number of variables and small sample size in the case of microarray data analysis. We demonstrate the applicability of the proposed algorithm on breast cancer data sets and also compare its performance and computational efficiency with the full-order conditional independence method. The experimental results show that our method can efficiently and accurately identify complex network structures from data.
2006
Bayesian networks (BNs) are rapidly becoming a tool of choice for applied Artificial Intelligence. Although BNs have been successfully used for many medical diagnosis problems, there have been few applications to epidemiological data where data mining methods play a significant role. In this paper, we look at the application of BNs to epidemiological data, specifically assessment of risk for coronary heart disease (CHD). We build the BNs: (1) by knowledge engineering BNs from two epidemiological models of CHD in the literature; (2) by applying a causal BN learner. We evaluate these BNs using cross-validation. We compared performance in predicting CHD events over 10 years, measuring area under the ROC curve and Bayesian information reward. The knowledge engineered BNs performed as well as logistic regression, while being easier to interpret. These BNs will serve as the baseline in future efforts to extend BN technology to better handle epidemiological data, specifically to predict and prevent CHD.
Proceedings of the 8th International Joint Conference on Computational Intelligence
Gene and protein networks are very important to model complex large-scale systems in molecular biology. Inferring or reverse-engineering such networks can be defined as the process of identifying gene/protein interactions from experimental data through computational analysis. However, this task is typically complicated by the enormously large scale of the unknowns in a rather small sample size. Furthermore, when the goal is to study causal relationships within the network, tools capable of overcoming the limitations of correlation networks are required. In this work, we make use of Bayesian Graphical Models to attack this problem and, specifically, we perform a comparative study of different state-of-the-art heuristics, analyzing their performance in inferring the structure of the Bayesian Network from breast cancer data.
Machine Learning, 1995
We describe algorithms for learning Bayesian networks from a combination of user knowledge and statistical data. The algorithms have two components: a scoring metric and a search procedure. The scoring metric takes a network structure, statistical data, and a user's prior knowledge, and returns a score proportional to the posterior probability of the network structure given the data. The search procedure generates networks for evaluation by the scoring metric. Our contributions are threefold. First, we identify two important properties of metrics, which we call score equivalence and parameter modularity. These properties have been mostly ignored, but when combined, greatly simplify the encoding of a user's prior knowledge. In particular, a user can express his knowledge, for the most part, as a single prior Bayesian network for the domain. Second, we describe greedy hill-climbing and annealing search algorithms to be used in conjunction with scoring metrics. In the special case where each node has at most one parent, we show that heuristic search can be replaced with a polynomial algorithm to identify the networks with the highest score. Third, we describe a methodology for evaluating Bayesian-network learning algorithms. We apply this approach to a comparison of our metrics and search procedures.
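The two-component design described above (a scoring metric plus a search procedure) can be sketched as a greedy hill-climber over edge additions and deletions. This is only an illustration under stated assumptions: it swaps in the BIC score rather than the paper's prior-informed Bayesian metric, it omits edge reversal and annealing, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def family_bic(data, child, parents):
    """BIC term for one node given its parents (the score is decomposable).
    Parameters are counted over observed parent configurations only."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    configs = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / configs[cfg]) for (cfg, _), c in joint.items())
    states = len({r[child] for r in data})
    return loglik - 0.5 * math.log(n) * (states - 1) * len(configs)

def creates_cycle(parents, child, new_parent):
    """True if adding new_parent -> child would close a directed cycle,
    i.e. if child is already an ancestor of new_parent."""
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def hill_climb(data, variables):
    """Greedy search: apply the single best add/delete until no move helps."""
    parents = {v: set() for v in variables}
    score = {v: family_bic(data, v, []) for v in variables}
    while True:
        best_gain, best_move = 1e-9, None
        for a, b in permutations(variables, 2):
            if a in parents[b]:
                trial = parents[b] - {a}          # delete a -> b
            elif not creates_cycle(parents, b, a):
                trial = parents[b] | {a}          # add a -> b
            else:
                continue
            gain = family_bic(data, b, sorted(trial)) - score[b]
            if gain > best_gain:
                best_gain, best_move = gain, (b, trial)
        if best_move is None:
            return parents
        child, new_parents = best_move
        parents[child] = new_parents
        score[child] = family_bic(data, child, sorted(new_parents))

# Toy data: Y copies X, while Z is independent of both.
data = [{"X": i % 2, "Y": i % 2, "Z": (i // 2) % 2} for i in range(40)]
learned = hill_climb(data, ["X", "Y", "Z"])
print(learned)
```

Because BIC is score equivalent, the direction of the single learned edge between `X` and `Y` depends only on the order in which moves are tried, so only the undirected skeleton should be read from this toy run.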
Artificial Intelligence in Medicine, 2004
Thanks to its increasing availability, electronic literature has become a potential source of information for the development of complex Bayesian networks (BN), when human expertise is missing or data is scarce or contains much noise. This opportunity raises the question of how to integrate information from free-text resources with statistical data in learning Bayesian networks. Firstly, we report on the collection of prior information resources in the ovarian cancer domain, which includes "kernel" annotations of the domain variables. We introduce methods based on the annotations and literature to derive informative pairwise dependency measures, which are derived from the statistical co-occurrence of the names of the variables, from the similarity of the "kernel" descriptions of the variables and from a combined method. We perform wide-scale evaluation of these text-based dependency scores against an expert reference and against data scores (the mutual information (MI) and a Bayesian score). Next, we transform the text-based dependency measures into informative text-based priors for Bayesian network structures. Finally, we report the benefit of such informative text-based priors on the performance of a Bayesian network for the classification of ovarian tumors from clinical data.
Applied Intelligence, 2005
An experiment in Bayesian model building from a large medical dataset for Mental Retardation is discussed in this paper. We give a step by step description of the practical aspects of building a Bayesian Network from a dataset. We enumerate and briefly describe the tools required, address the problem of missing values in big datasets resulting from incomplete clinical findings and elaborate on our solution to the problem. We advance some reasons why imputation is a more desirable approach for model building than some other ad hoc methods suggested in literature. In our experiment, the initial Bayesian Network is learned from a dataset using a machine learning program called CB. The network structure and the conditional probabilities are then modified under the guidance of a domain expert. We present validation results for the unmodified and modified networks and give some suggestions for improvement of the model.
Studies in health technology and informatics, 2004
Bayesian Networks (BN) is a knowledge representation formalism that has been proven to be valuable in biomedicine for constructing decision support systems and for generating causal hypotheses from data. Given the emergence of datasets in medicine and biology with thousands of variables and that current algorithms do not scale more than a few hundred variables in practical domains, new efficient and accurate algorithms are needed to learn high quality BNs from data. We present a new algorithm called Max-Min Hill-Climbing (MMHC) that builds upon and improves the Sparse Candidate (SC) algorithm; a state-of-the-art algorithm that scales up to datasets involving hundreds of variables provided the generating networks are sparse. Compared to the SC, on a number of datasets from medicine and biology, (a) MMHC discovers BNs that are structurally closer to the data-generating BN, (b) the discovered networks are more probable given the data, (c) MMHC is computationally more efficient and scal...
2000
Bayesian networks are rapidly becoming a tool of choice for applied Artificial Intelligence. There have been many medical applications of BNs; however, few have applied data mining methods to epidemiology. In a previous study we looked at such an application to epidemiological data, specifically assessment of risk for coronary heart disease. In that previous study, we featured two Bayesian networks
Global journal of computer science and technology, 2013
This documentation describes the implementation of a Bayesian network on Hiroshima and Nagasaki atomic bomb survivor data, using the "R" software. Bayesian networks, a state-of-the-art representation of probabilistic knowledge by a graphical diagram, have emerged in recent years as essential for pattern recognition and classification in the healthcare field. Unlike some data mining techniques, Bayesian networks allow investigators to combine domain knowledge with statistical data. This tailored discussion presents the basic concepts of Bayesian networks and their use for building a health risk model on epidemiological data. The main objectives of our study are to find interdependencies between various attributes of the data and to determine the threshold value of radiation dosage under which death counts are negligible.
Diagnostic Cytopathology, 2018
Background: In the era of extensive data collection, there is a growing need for a large scale data analysis with tools that can handle many variables in one modeling framework. In this article, we present our recent applications of Bayesian network modeling to pathology informatics. Methods: Bayesian networks (BNs) are probabilistic graphical models that represent domain knowledge and allow investigators to process this knowledge following sound rules of probability theory. BNs can be built based on expert opinion as well as learned from accumulating data sets. BN modeling is now recognized as a suitable approach for knowledge representation and reasoning under uncertainty. Over the last two decades BN have been successfully applied to many studies on medical prognosis and diagnosis. Results: Based on data and expert knowledge, we have constructed several BN models to assess patient risk for subsequent specific histopathologic diagnoses and their related prognosis in gynecological cytopathology and breast pathology. These models include the Pittsburgh Cervical Cancer Screening Model assessing risk for histopathologic diagnoses of cervical precancer and cervical cancer, modeling of the significance of benign-appearing endometrial cells in Pap tests, diagnostic modeling to determine whether adenocarcinoma in tissue specimens is of endometrial or endocervical origin, and models to assess risk for recurrence of invasive breast carcinoma and ductal carcinoma in situ. Conclusions: Bayesian network models can be used as powerful and flexible risk assessment tools on large clinical datasets and can quantitatively identify variables that are of greatest significance in predicting specific histopathologic diagnoses and their related prognosis. Resulting BN models are able to provide individualized quantitative risk assessments and prognostication for specific abnormal findings commonly reported in gynecological cytopathology and breast pathology.
The aim of this study was to determine the accuracy of Bayesian networks in supporting breast cancer diagnoses. Systematic review and meta-analysis were carried out, including articles and papers published between January 1990 and March 2013. We included prospective and retrospective cross-sectional studies of the accuracy of diagnoses of breast lesions (target conditions) made using Bayesian networks (index test). Four primary studies that included 1,223 breast lesions were analyzed, 89.52% (444/496) of the breast cancer cases and 6.33% (46/727) of the benign lesions were positive based on the Bayesian network analysis. The area under the curve (AUC) for the summary receiver operating characteristic curve (SROC) was 0.97, with a Q* value of 0.92. Using Bayesian networks to diagnose malignant lesions increased the pretest probability of a true positive from 40.03% to 90.05% and decreased the probability of a false negative to 6.44%. Therefore, our results demonstrated that Bayesian ...
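The step from pretest to post-test probability reported above follows from Bayes' theorem in odds form. The sketch below is a rough check only: it uses the crude counts quoted in the abstract (444/496 and 46/727) rather than the pooled meta-analytic estimates, so its results are close to, but not expected to match, the published 90.05% and 6.44%. The function name is hypothetical.

```python
def post_test_probability(pretest, likelihood_ratio):
    """Bayes' theorem in odds form: posterior odds = prior odds * LR."""
    odds = pretest / (1.0 - pretest)
    post_odds = odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Crude (unpooled) rates taken directly from the counts in the abstract.
sensitivity = 444 / 496            # malignant lesions flagged positive
false_positive_rate = 46 / 727     # benign lesions flagged positive

lr_positive = sensitivity / false_positive_rate
lr_negative = (1.0 - sensitivity) / (1.0 - false_positive_rate)

pretest = 0.4003                   # pretest probability quoted in the abstract
after_positive = post_test_probability(pretest, lr_positive)
after_negative = post_test_probability(pretest, lr_negative)
print(round(after_positive, 4), round(after_negative, 4))
```

Here `after_positive` is the probability of malignancy given a positive network result and `after_negative` the probability of malignancy given a negative one.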
Introduction: Being aware of the relationships between risk and protective factors could be helpful to control and prevent diseases. The disease system analysis will be more complex with increasing risk and protective factors. Therefore, in this article, we use Bayesian Networks (BNs) to investigate the relationship between variables and predict the probabilistic causes of colorectal and gastric cancers. Methods: In this study, structure learning algorithms were score-based (hill-climbing) and hypothetical. Parameter learning was estimated by Bayesian method. Network scores (AIC, BIC and BDe), cross-validation methods, and Bayes factor were used to compare both structure learning algorithms and indicate the optimal structure. Data were analyzed by bnlearn R package (ver. 3.0.4). Results: The variables age, gender, smoking, hookah, alcohol, opium, diet, and family history were involved in colorectal and gastric cancers. In both Hill-Climbing and hypothetical algorithms, the odds of de...