One of the greatest challenges to the use of probabilistic reasoning in the assessment of criminal evidence is the "problem of the prior", i.e. the difficulty in establishing an acceptable prior probability of guilt. Even strong supporters of a Bayesian approach have often preferred to ignore priors and focus on the likelihood ratio (LR) of the evidence. But to calculate if the probability of guilt given the evidence reaches the probability required for conviction (the standard of proof), the LR has to be combined with a prior. In this paper, we propose a solution to the "problem of the prior": the defendant shall be treated as a member of the set of "possible perpetrators" defined as the people who had the same or better opportunity as the defendant to commit the crime. For this purpose, we introduce the concept of an "extended crime scene" (ECS). The number of people who had the same or better opportunity as the defendant is the number of people who were just as close or closer to the crime scene, in time and space. We demonstrate how the opportunity prior is incorporated into a generic Bayesian network model that allows us to integrate other evidence about the case.
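As an illustration of the approach described above, the sketch below shows how an opportunity prior of 1/N (where N is the number of possible perpetrators identified via the extended crime scene) combines with a likelihood ratio via Bayes' rule in odds form. The numbers used (200 possible perpetrators, an LR of 10,000) are hypothetical, and the snippet is a simplified stand-in for the paper's full Bayesian network model.

```python
# Sketch: combining an "opportunity prior" with a likelihood ratio (LR).
# The numbers below are illustrative assumptions, not values from the paper.

def posterior_probability_of_guilt(n_possible_perpetrators: int, likelihood_ratio: float) -> float:
    """Prior odds come from treating the defendant as one of N equally
    plausible 'possible perpetrators'; posterior odds = prior odds * LR."""
    prior = 1.0 / n_possible_perpetrators           # P(guilt) before the evidence
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)

# Example: suppose 200 people were just as close or closer to the crime scene
# (the ECS) and the forensic evidence has an LR of 10,000 in favour of guilt.
p = posterior_probability_of_guilt(200, 10_000)
print(f"P(guilt | evidence) = {p:.4f}")  # roughly 0.98 with these illustrative inputs
```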
An important recent preprint by Griffith et al. highlights how 'collider bias' in studies of COVID19 undermines our understanding of the disease risk and severity. This is typically caused by the data being restricted to people who have undergone COVID19 testing, among whom healthcare workers are overrepresented. For example, collider bias caused by smokers being underrepresented in the dataset may (at least partly) explain empirical results that suggest smoking reduces the risk of COVID19. We extend the work of Griffith et al., making more explicit use of graphical causal models to interpret observed data. We show that their smoking example can be clarified and improved using Bayesian network models with realistic data and assumptions. We show that there is an even more fundamental problem for risk factors like 'stress' which, unlike smoking, is more rather than less prevalent among healthcare workers; in this case, because of a combination of collider bias from the bi...
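The following minimal simulation illustrates the kind of collider bias discussed above, under a deliberately simplified causal structure of our own choosing (smoking and infection each independently increase the chance of being tested); it is not the Bayesian network, data or assumptions used in the paper or by Griffith et al., and all probabilities are invented.

```python
# Sketch of collider bias with synthetic data; structure and numbers are
# simplified assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

smoker = rng.random(n) < 0.20
covid = rng.random(n) < 0.05          # infection generated independently of smoking

# The collider: both smoking (e.g. via respiratory symptoms) and infection
# increase the chance of being tested.
p_test = 0.02 + 0.25 * smoker + 0.40 * covid
tested = rng.random(n) < p_test

def covid_rate(mask):
    return covid[mask].mean()

print("Population   P(covid | smoker)     =", round(covid_rate(smoker), 4))
print("Population   P(covid | non-smoker) =", round(covid_rate(~smoker), 4))
print("Tested only  P(covid | smoker)     =", round(covid_rate(tested & smoker), 4))
print("Tested only  P(covid | non-smoker) =", round(covid_rate(tested & ~smoker), 4))
# In the full population the two rates are (almost) equal; restricted to the
# tested subset, smokers show a much lower rate -- a spurious 'protective' effect.
```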
Quantitative risk assessment can play a crucial role in effective decision making about cybersecurity strategies. The Factor Analysis of Information Risk (FAIR) is one of the most popular models for quantitative cybersecurity risk assessment. It provides a taxonomic framework to classify cybersecurity risk into a set of quantifiable risk factors and combines this with quantitative algorithms, in the form of a kind of Monte Carlo (MC) simulation combined with statistical approximation techniques, to estimate cybersecurity risk. However, the FAIR algorithms restrict both the type of statistical distributions that can be used and the expandability of the model structure. Moreover, the applied approximation techniques (including using cached data and interpolation methods) introduce inaccuracy into the FAIR model. To address restrictions of the FAIR model, we develop a more flexible alternative approach, which we call FAIR-BN, to implement the FAIR model using Bayesian Networks (BNs). To evaluate the performance of FAIR and FAIR-BN, we use a MC method (FAIR-MC) to implement calculations of the FAIR model without using any of the approximation techniques adopted by FAIR, thus avoiding the corresponding inaccuracy that can be introduced. We compare the empirical results generated by FAIR and FAIR-BN against a large number of samples generated using FAIR-MC. Both FAIR and FAIR-BN provide consistent results compared with FAIR-MC for general cases. However, the FAIR-BN achieves higher accuracy in several cases that cannot be accurately modelled by the FAIR model. Moreover, we demonstrate that FAIR-BN is more flexible and extensible by showing how it can incorporate process-oriented and game-theoretic methods. We call the resulting combined approach "Extended FAIR-BN" (EFBN) and show that it has the potential to provide an integrated solution for cybersecurity risk assessment and related decision making.
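To make the core FAIR-style calculation concrete, here is a minimal Monte Carlo sketch of annualised loss exposure as loss event frequency times loss magnitude. The lognormal distributions and parameter values are illustrative assumptions rather than FAIR's PERT-style calibrated inputs, and the snippet does not reproduce FAIR-MC, FAIR-BN or the paper's experiments.

```python
# Minimal Monte Carlo sketch of a FAIR-like calculation:
#   annualised loss = loss event frequency (LEF) x loss magnitude (LM).
# Distribution choices and parameters are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # number of Monte Carlo samples

# Loss Event Frequency (events per year), modelled here with a lognormal
# distribution purely for illustration.
lef = rng.lognormal(mean=np.log(2.0), sigma=0.5, size=N)

# Loss Magnitude per event (currency units), also lognormal for illustration.
lm = rng.lognormal(mean=np.log(50_000), sigma=0.8, size=N)

annual_loss = lef * lm  # simulated annualised loss exposure

print("Mean annual loss      :", round(annual_loss.mean()))
print("Median annual loss    :", round(np.median(annual_loss)))
print("95th percentile       :", round(np.percentile(annual_loss, 95)))
```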
The need is argued for a rigorous and general theory of structured programming as a basis for improving software quality. Formal graph theoretic methods are developed which allow the structural modelling, metrication and reconstruction of sequential programs in terms of precisely defined general sets of basic control structures. Throughout, concepts are illustrated by examples based on actual Basic and Pascal text.
When developing a causal probabilistic model, i.e. a Bayesian network (BN), it is common to incorporate expert knowledge of factors that are important for decision analysis but where historical data are unavailable or difficult to obtain. This paper focuses on the problem whereby the distribution of some continuous variable in a BN is known from data, but where we wish to explicitly model the impact of some additional expert variable (for which there is expert judgment but no data). Because the statistical outcomes are already influenced by the causes an expert might identify as variables missing from the dataset, the incentive here is to add the expert factor to the model in such a way that the distribution of the data variable is preserved when the expert factor remains unobserved. We provide a method for eliciting expert judgment that ensures the expected values of a data variable are preserved under all the known conditions. We show that it is generally neither possible, nor rea...
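A minimal numeric sketch of the preservation constraint described above, restricted to expected values and using invented numbers: when the expert factor E is unobserved, the law of total expectation requires the elicited conditional means to reproduce the mean of the data variable. The paper's elicitation method covers more than this single-equation case.

```python
# Marginal-preservation constraint when adding an unobserved expert factor E
# to a data variable X:  sum_e P(E=e) * E[X | E=e] = E[X].
# All numbers here are hypothetical.

target_mean = 10.0              # E[X] known from data
p_e = [0.2, 0.5, 0.3]           # elicited P(E = e) for three expert states
partial_means = [14.0, 10.0]    # elicited E[X | E = e] for the first two states

# Solve for the remaining conditional mean so that the data distribution's
# mean is preserved when E is unobserved.
remaining = (target_mean - sum(p * m for p, m in zip(p_e[:2], partial_means))) / p_e[2]
print("E[X | E = e3] must be", round(remaining, 3))  # (10 - 0.2*14 - 0.5*10) / 0.3 = 7.333
```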
In the absence of an agreed measure of software quality, the density of defects has been a very commonly used surrogate measure. As a result there have been numerous attempts to build models for predicting the number of residual software defects. Typically, the key variables in these models are either size and complexity metrics or measures arising from testing information. There are, however, serious statistical and theoretical difficulties with these approaches. Using Bayesian Belief Networks we can overcome some of the more serious problems by taking account of all the diverse factors implicit in defect prevention, detection and complexity.
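The toy calculation below (all probabilities invented, and far simpler than a full Bayesian Belief Network) illustrates the kind of reasoning such models support and simple defect-count regressions miss: a low number of defects found is weak evidence of few residual defects unless testing effectiveness is modelled explicitly.

```python
# Hand-rolled two-node illustration; all probabilities are invented.
p_many_defects = 0.3   # prior that the module has many residual defects

# P(few defects found | many defects present, testing quality)
p_few_found = {
    (True,  True):  0.10,  # many defects, thorough testing -> unlikely to find only a few
    (True,  False): 0.80,  # many defects, poor testing     -> quite likely to find only a few
    (False, True):  0.95,
    (False, False): 0.99,
}

def posterior_many_defects(good_testing: bool) -> float:
    """P(many defects | few found, testing quality) by direct enumeration (Bayes' rule)."""
    num = p_many_defects * p_few_found[(True, good_testing)]
    den = num + (1 - p_many_defects) * p_few_found[(False, good_testing)]
    return num / den

print("P(many defects | few found, good testing) =", round(posterior_many_defects(True), 3))
print("P(many defects | few found, poor testing) =", round(posterior_many_defects(False), 3))
# Few defects found under poor testing leaves a much higher residual-defect belief.
```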
We review the challenges of Bayesian network learning, especially parameter learning, and specify the problem of learning with sparse data. We explain how it is possible to incorporate both qualitative knowledge and data with a multinomial parameter learning method to achieve more accurate predictions with sparse data.
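As a simplified illustration of combining expert knowledge with sparse data, the sketch below uses a Dirichlet prior whose pseudo-counts crudely encode a qualitative judgment; the paper's multinomial parameter learning method is more general than this plain conjugate update, and all counts and weights here are hypothetical.

```python
# Multinomial parameter learning with sparse data plus a Dirichlet prior
# standing in for qualitative expert knowledge. Numbers are hypothetical.
import numpy as np

counts = np.array([3, 1, 0])          # sparse observations for a 3-state node
mle = counts / counts.sum()           # pure data estimate: zero for the unseen state

# Qualitative judgment such as "state 2 is at least as likely as state 3",
# encoded here as Dirichlet pseudo-counts.
alpha = np.array([2.0, 2.0, 1.0])
posterior_mean = (counts + alpha) / (counts + alpha).sum()

print("MLE from sparse data :", np.round(mle, 3))
print("With expert prior    :", np.round(posterior_mean, 3))
# The posterior mean avoids the implausible zero probability for the unseen state.
```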
Despite the heroic efforts of a small group of people, like those involved with this journal, a truly "empirical" basis for software engineering remains a distant dream. In the current academic year I have been teaching software engineering (a double unit module) at Queen Mary (University of London), where I have been a Professor part-time since March 2000. Although I had been teaching courses on software metrics and quality assurance regularly in recent years, this was the first time I had taught a "standard" software engineering module since 1992. As the leader on a module with over 100 students, publishers have been keen to send me all their latest offerings. As a result, in the last few months I have received more than a dozen new or revised dedicated software engineering text books, and around two dozen "software engineering with Java" or "object oriented software engineering" type books. The good news is that, compared to the text books that were available when I last taught software engineering, the new bunch are, almost without exception, a massive improvement. They provide students with techniques and methods that they can actually apply to their own programs and group projects. This compares favourably with the previous generation that simply documented a set of research ideas dreamed up in academia and never applied successfully in practice. This makes it easier and more satisfying to teach, and more rewarding for the students to learn (primarily because they can learn from doing, which they could not in the past). Moreover, in this respect, the impression is that software engineering has come closer to being true "engineering". However, if we accept that an empirical basis is one of the other components that mark out a true engineering discipline, then the latest round of books confirms that any progress we may have made in this area has had an almost negligible impact. The primary motivation for my own original interest in empirical software engineering was the desire to see a more rational basis for decision-making. For example, I was concerned that methods were being adopted on the basis of who, among the methods' proponents, shouted the loudest. In many cases methods were being pushed, not only without adequate tool support, but without any quantitative justification of their effectiveness. I am not talking about the need for