2010
Abstract—The Markov blanket has been proved to be the theoretically optimal feature subset for predicting the target. IPC-MB was first proposed in 2008 to induce the Markov blanket via local search, and it is believed to be an important advance over previously published work such as IAMB, PCMB and PC. However, the proof appearing in its first publication is neither complete nor fully sound. In this paper, we revisit IPC-MB with discussion not found in the original paper, especially on the proof of its theoretical correctness. Besides, experimental studies with small- to large-scale problems (Bayesian networks) are conducted, and the results demonstrate that IPC-MB achieves much higher accuracy than IAMB, and much better time efficiency than PCMB and PC.
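For context, the following is a minimal sketch of the simpler IAMB baseline mentioned above (not IPC-MB itself), assuming discrete data given as a list of dicts and using an illustrative conditional-mutual-information cutoff in place of a proper statistical test of independence:

```python
from collections import Counter
import math

def cond_mutual_info(data, x, t, z):
    """Empirical I(X; T | Z); data is a list of {variable: value} dicts."""
    n = len(data)
    joint, xz, tz, zc = Counter(), Counter(), Counter(), Counter()
    for row in data:
        zv = tuple(row[v] for v in z)
        joint[(row[x], row[t], zv)] += 1
        xz[(row[x], zv)] += 1
        tz[(row[t], zv)] += 1
        zc[zv] += 1
    return sum((c / n) * math.log(c * zc[zv] / (xz[(xv, zv)] * tz[(tv, zv)]))
               for (xv, tv, zv), c in joint.items())

def iamb(data, target, features, threshold=0.01):
    """Grow-then-shrink Markov blanket induction (IAMB); `threshold` is an
    illustrative stand-in for a real conditional-independence test."""
    mb = []
    grown = True
    while grown:                       # growing phase: add the most associated feature
        grown = False
        rest = [f for f in features if f != target and f not in mb]
        scored = [(cond_mutual_info(data, f, target, mb), f) for f in rest]
        if scored:
            score, f = max(scored)
            if score > threshold:
                mb.append(f)
                grown = True
    for f in list(mb):                 # shrinking phase: drop false positives
        if cond_mutual_info(data, f, target, [g for g in mb if g != f]) <= threshold:
            mb.remove(f)
    return mb
```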
2009
Selecting relevant features is in demand when a large data set is used in a classification task. It produces a tractable number of features that are sufficient and can possibly improve the classification performance. This paper studies a statistical method of Markov blanket induction for filtering features and then applies a classifier using the Markov blanket predictors. The Markov blanket contains a minimal subset of relevant features that yields optimal classification performance. We experimentally demonstrate the improved performance of several classifiers using Markov blanket induction as a feature selection method. In addition, we point out an important assumption behind the Markov blanket induction algorithm and show its effect on the classification performance.
2010
thousands, are becoming common. Therefore, feature selection has been an active research area in the pattern recognition, statistics and data mining communities. The main idea of feature reduction is to select a subset of input variables by eliminating features with little or no predictive ability, without sacrificing the performance of the model built on the chosen features. It is also known as variable selection, feature reduction, attribute selection or variable subset selection. By removing most of the irrelevant and redundant features from the data, feature reduction brings many potential benefits: alleviating the effect of the curse of dimensionality to improve prediction performance; facilitating data visualization and data
Based on information theory, optimal feature selection should be carried out by searching for Markov blankets. In this paper, we formally analyze the current Markov blanket discovery approach for support vector machines and propose to discover Markov blankets by performing fast heuristic Bayesian network structure learning. We give a sufficient condition under which our approach will improve performance. Two major factors that make it prohibitive to learn Bayesian networks from high-dimensional data sets are the large search space and the expensive cycle detection operations. We propose to restrict the search space by considering only the promising candidates, and to detect cycles using an online topological sorting method. Experimental results show that we can efficiently reduce the feature dimensionality while preserving a high degree of classification accuracy.
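The online cycle-detection idea can be made concrete with an incrementally maintained topological order. The sketch below follows the well-known Pearce–Kelly scheme; the paper's exact method may differ, and the class and method names here are invented for illustration:

```python
class OnlineDAG:
    """Directed graph that keeps a topological order up to date; add_edge
    refuses (returns False) instead of creating a cycle."""
    def __init__(self, n):
        self.adj = [set() for _ in range(n)]   # forward edges
        self.radj = [set() for _ in range(n)]  # reverse edges
        self.index = list(range(n))            # index[v] = position of v in the order

    def _reachable(self, start, nbrs, inside):
        seen, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(w for w in nbrs[v] if inside(w))
        return seen

    def add_edge(self, u, v):
        lb, ub = self.index[v], self.index[u]
        if lb > ub:                            # order already consistent with u -> v
            self.adj[u].add(v); self.radj[v].add(u)
            return True
        # Affected region is [lb, ub]: search forward from v inside it.
        fwd = self._reachable(v, self.adj, lambda w: self.index[w] <= ub)
        if u in fwd:                           # u reachable from v: edge closes a cycle
            return False
        bwd = self._reachable(u, self.radj, lambda w: self.index[w] >= lb)
        # Repack the freed positions so everything in bwd precedes everything in fwd.
        nodes = sorted(bwd, key=self.index.__getitem__) + \
                sorted(fwd, key=self.index.__getitem__)
        for w, slot in zip(nodes, sorted(self.index[w] for w in nodes)):
            self.index[w] = slot
        self.adj[u].add(v); self.radj[v].add(u)
        return True

g = OnlineDAG(4)
assert g.add_edge(3, 1) and g.add_edge(1, 2)
assert not g.add_edge(2, 3)                    # would close the cycle 3 -> 1 -> 2 -> 3
```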
Neurocomputing, 2010
We aim to identify the minimal subset of random variables that is relevant for probabilistic classification in data sets with many variables but few instances. A principled solution to this problem is to determine the Markov boundary of the class variable. In this paper, we propose a novel constraint-based Markov boundary discovery algorithm called MBOR with the objective of improving accuracy while still remaining scalable to very high dimensional data sets and theoretically correct under the so-called faithfulness condition. We report extensive empirical experiments on synthetic data sets scaling up to tens of thousands of variables.
Pattern Recognition, 2011
Feature selection is an important preprocessing step for building efficient, generalizable and interpretable classifiers on high-dimensional data sets. Under the assumption of sufficient labelled samples, the Markov Blanket provides a complete and sound solution to the selection of optimal features by exploiting the conditional independence relationships among the features. In real-world applications, unfortunately, it is usually easy to obtain unlabelled samples but expensive to obtain the corresponding accurate labels, which leads to a potential waste of the valuable classification information buried in unlabelled samples.
Journal of Machine Learning Research, 2010
In part I of this work we introduced and evaluated the Generalized Local Learning (GLL) framework for producing local causal and Markov blanket induction algorithms. In the present second part we analyze the behavior of GLL algorithms and provide extensions to the core methods. Specifically, we investigate the empirical convergence of GLL to the true local neighborhood
2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2017
In this paper, we discuss the importance of feature subset selection methods in machine learning techniques. An analysis of microarray expression was used to check whether global biological differences underlie common pathological features in different types of cancer datasets, and to identify genes that might anticipate the clinical behavior of the disease. One way to select relevant genes is to use a Bayesian network based on the Markov blanket. We present and compare the performance of different feature (gene) subset selection approaches based on Wrapper and Markov Blanket models on five microarray cancer datasets. The first alternative relies on Memetic Algorithms (MAs) for feature selection. In the second alternative, we use MRMR (Minimum Redundancy Maximum Relevance) for feature subset selection, hybridized with genetic search optimization techniques. We compare the performance of the Markov blanket model with the most common classification algor...
Operations Research/Computer Science Interfaces Series
Data sets with many discrete variables and relatively few cases arise in health care, ecommerce, information security, text mining, and many other domains. Learning effective and efficient prediction models from such data sets is a challenging task. In this paper, we propose a Tabu Search enhanced Markov Blanket (TS/MB) procedure to learn a graphical Markov Blanket classifier from data. The TS/MB procedure is based on the use of restricted neighborhoods in a general Bayesian Network constrained by the Markov condition, called Markov Blanket Neighborhoods. Computational results from real world data sets drawn from several domains indicate that the TS/MB procedure is able to find a parsimonious model with substantially fewer predictor variables than in the full data set, and provides comparable prediction performance when compared against several machine learning methods.
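As an illustration of the search style involved, here is a generic tabu search over feature subsets. It is only a sketch: TS/MB searches restricted Markov Blanket neighborhoods of a Bayesian network rather than raw bit-flip neighborhoods, and `score` stands in for any subset-quality measure such as cross-validated accuracy.

```python
import random

def tabu_search(n_features, score, iters=200, tenure=7, seed=0):
    """Tabu search over boolean inclusion masks; flips one feature per move."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    best, best_val = current[:], score(current)
    tabu = {}                                   # feature -> iteration it becomes legal again
    for it in range(iters):
        candidates = []
        for f in range(n_features):
            nbr = current[:]
            nbr[f] = not nbr[f]
            val = score(nbr)
            # A tabu move is still allowed if it beats the best so far (aspiration).
            if tabu.get(f, 0) <= it or val > best_val:
                candidates.append((val, f, nbr))
        if not candidates:
            continue
        val, f, current = max(candidates)
        tabu[f] = it + tenure                   # recently flipped features become tabu
        if val > best_val:
            best, best_val = current[:], val
    return best, best_val
```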
Artificial Intelligence, 2000
A new method for Feature Subset Selection in machine learning, FSS-EBNA (Feature Subset Selection by Estimation of Bayesian Network Algorithm), is presented. FSS-EBNA is an evolutionary, population-based, randomized search algorithm, and it can be executed when domain knowledge is not available. A wrapper approach, over the Naive-Bayes and ID3 learning algorithms, is used to evaluate the goodness of each visited solution. FSS-EBNA, based on the EDA (Estimation of Distribution Algorithm) paradigm, avoids the use of crossover and mutation operators to evolve the populations, in contrast to Genetic Algorithms. In the absence of these operators, the evolution is guaranteed by the factorization of the probability distribution of the best solutions found in a generation of the search. This factorization is carried out by means of Bayesian networks. Promising results are achieved in a variety of tasks where domain knowledge is not available. The paper explains the main ideas of Feature Subset Selection, Estimation of Distribution Algorithms and Bayesian networks, presenting related work on each concept. A study of the 'overfitting' problem in the Feature Subset Selection process is carried out, providing a basis for defining the stopping criterion of the new algorithm.
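To make the EDA idea concrete, below is the simplest instance of the paradigm, a univariate (UMDA-style) EDA for subset selection. FSS-EBNA itself factorizes the distribution with a Bayesian network rather than assuming independence between features; `fitness` stands in for any wrapper evaluation such as Naive-Bayes accuracy.

```python
import random

def umda_fss(n_features, fitness, pop_size=50, n_best=15, generations=30, seed=0):
    """Univariate EDA: sample a population, keep the elite, re-estimate p."""
    rng = random.Random(seed)
    p = [0.5] * n_features                     # inclusion probability per feature
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        pop = [[rng.random() < p[i] for i in range(n_features)]
               for _ in range(pop_size)]
        pop.sort(key=fitness, reverse=True)
        elite = pop[:n_best]
        # Re-estimate the (fully factorized) distribution from the elite solutions;
        # no crossover or mutation operators are used.
        p = [sum(ind[i] for ind in elite) / n_best for i in range(n_features)]
        if fitness(pop[0]) > best_fit:
            best, best_fit = pop[0], fitness(pop[0])
    return best, best_fit
```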
International Journal of Approximate Reasoning, 2001
In this paper we perform a comparison between FSS-EBNA, a randomized, population-based and evolutionary algorithm, and two genetic and two sequential search approaches on the well-known Feature Subset Selection (FSS) problem. In FSS-EBNA, the FSS problem, stated as a search problem, uses the EBNA (Estimation of Bayesian Network Algorithm) search engine, an algorithm within the EDA (Estimation of Distribution Algorithm) approach. The EDA paradigm grew out of the GA community with the aim of explicitly discovering the relationships among the features of the problem rather than disrupting them with genetic recombination operators. The EDA paradigm avoids the use of recombination operators and guarantees the evolution of the population of solutions and the discovery of these relationships through the factorization of the probability distribution of the best individuals in each generation of the search. In EBNA, this factorization is carried out by a Bayesian network induced by a cheap local search mechanism. FSS-EBNA can be seen as a hybrid Soft Computing system, a synergistic combination of probabilistic and evolutionary computing to solve the FSS task. Promising results on a set of real Data Mining domains are achieved by FSS-EBNA in comparison with well-known genetic and sequential search algorithms.
2021
We propose Predictive Permutation Feature Selection (PPFS), a novel wrapper-based feature selection method based on the concept of the Markov Blanket (MB). Unlike previous MB methods, PPFS is a universal feature selection technique, as it can work for both classification and regression tasks on datasets containing categorical and/or continuous features. We propose Predictive Permutation Independence (PPI), a new Conditional Independence (CI) test, which enables PPFS to be categorised as a wrapper feature selection method. This is in contrast to current filter-based MB feature selection techniques that are unable to harness the advancements in supervised algorithms such as Gradient Boosting Machines (GBM). The PPI test is based on the knockoff framework and utilizes supervised algorithms to measure the association between an individual feature or a set of features and the target variable. We also propose a novel MB aggregation step that addresses the issue of sample inefficiency. Empirical...
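A hedged sketch of the flavor of test involved: the real PPI test is built on the knockoff framework, whereas the snippet below is plain permutation importance with scikit-learn's GradientBoostingClassifier, measuring how much a model's held-out score drops when one feature's link to the target is broken.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def permutation_association(X, y, feature, n_repeats=20, seed=0):
    """X: 2-D NumPy array, y: labels, feature: column index.
    Returns the mean drop in held-out accuracy when the column is shuffled;
    a value near 0 suggests no association given the fitted model."""
    rng = np.random.default_rng(seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    model = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    base = model.score(X_te, y_te)
    drops = []
    for _ in range(n_repeats):
        X_perm = X_te.copy()
        rng.shuffle(X_perm[:, feature])        # break the feature-target link
        drops.append(base - model.score(X_perm, y_te))
    return float(np.mean(drops))
```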
Incorporating subset selection into a classification method often carries a number of advantages, especially when operating in the domain of high-dimensional features. In this paper, we focus on Bayesian network (BN) classifiers and formalize feature selection from the perspective of improving classification accuracy. To explore the effect of high dimensionality we apply the growing dimension asymptotics, meaning that the number of training examples is relatively small compared to the number of feature nodes. In order to ascertain which set of features is indeed relevant for a classification task, we introduce a distance-based scoring measure reflecting how well the set separates different classes. This score is then employed for feature selection, using the weighted form of the BN classifier. The idea is to view weights as inclusion-exclusion factors that eliminate the sets of features whose separation score does not exceed a given threshold. We establish the asymptotically optimal threshold and demonstrate that the proposed selection technique improves classification accuracy under different a priori assumptions concerning the separation strength.
Entropy, 2011
Feature selection is an important step in building accurate classifiers and provides a better understanding of the data sets. In this paper, we propose a feature subset selection method based on high-dimensional mutual information. We also propose to use the entropy of the class attribute as a criterion to determine the appropriate subset of features when building classifiers. We prove that if the mutual information between a feature set X and the class attribute Y equals the entropy of Y, then X is a Markov Blanket of Y. We show that in some cases it is infeasible to approximate the high-dimensional mutual information with algebraic combinations of pairwise mutual information in any form. In addition, an exhaustive search over all combinations of features is a prerequisite for finding the optimal feature subsets for classifying such data sets. We show that our approach outperforms existing filter feature subset selection methods on most of the 24 selected benchmark data sets.
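The criterion is easy to check empirically for discrete data: since I(X_S; Y) = H(Y) - H(Y | X_S), the test I(X_S; Y) = H(Y) asks whether the class is (empirically) a deterministic function of the subset. The XOR toy data below also illustrates why pairwise mutual information can miss the structure:

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_info(rows, subset, y):
    """I(X_subset; Y) = H(Y) - H(Y | X_subset), from empirical counts."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, y):
        groups.setdefault(tuple(row[i] for i in subset), []).append(label)
    h_y_given_x = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(y) - h_y_given_x

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]                     # XOR: neither feature alone is informative
print(mutual_info(rows, [0], y))     # 0.0 (pairwise MI misses the structure)
print(mutual_info(rows, [0, 1], y))  # 1.0 == H(Y), so {0, 1} meets the criterion
```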
Proceedings of the Sixteenth …, 2003
This paper presents a number of new algorithms for discovering the Markov Blanket of a target variable T from training data. The Markov Blanket can be used for variable selection for classification, for causal discovery, and for Bayesian Network learning. We ...
2015
the accuracy of many classification problems is crucial. The number of features in collected data is increasing, and finding the best features to use to increase classification accuracy is a necessity. There are several feature selection methods, but none of them gives the absolute best solution, and most of them fall into the trap of local optima. This paper presents a new method that searches for the absolute best solution, or a solution that gives a higher classification accuracy rate, using a novel approach that divides the features into two groups: a first group and a second group of features. After that, the method finds the best combination from the two groups to give the maximum accuracy rate. The purpose of this method is to select the best feature or features, individually or in groups.
International Journal of Computer Science and Telecommunications, Volume 9, Issue 7, 2018
Feature selection (FS) is a machine learning technique and a preprocessing stage in building an intrusion detection system (IDS), and it can be independent of the choice of the learning algorithm or not. It plays an important role in eliminating irrelevant and redundant features in an IDS, thereby increasing classification accuracy and reducing the computational overhead cost of the IDS; it is an efficient way to reduce the dimensionality of an intrusion detection problem. This research examined the features of UNSW-NB15, a recently published intrusion detection dataset, and applied three filter-based feature selection techniques to it (information gain based, consistency based and correlation based) to obtain reduced attribute sets for building intrusion detection models that lower the computational overhead cost and increase classification accuracy. The performance evaluation of IDS models built on the reduced and whole datasets with the Naive Bayes machine learning algorithm shows that the reduced datasets outperformed the original whole dataset in both accuracy and processing overhead: the model built with the consistency-based reduced features has the highest classification accuracy improvement, 14.16% over the classification accuracy on the whole test dataset, followed by the information gain and correlation reduced test datasets with classification accuracy improvements of 13.55% and 10.7% respectively.
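As a sketch of the first of the three filters, here is an information-gain ranking for discretized features (dataset loading, discretization and the consistency- and correlation-based filters are omitted; the function names are illustrative):

```python
from collections import Counter
import math

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(feature_col, labels):
    """IG(F; Y) = H(Y) - H(Y | F) for one discretized feature column."""
    n = len(labels)
    groups = {}
    for v, lab in zip(feature_col, labels):
        groups.setdefault(v, []).append(lab)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - cond

def top_k_features(columns, labels, k):
    """columns: dict name -> list of values; keep the k highest-gain names."""
    ranked = sorted(columns, key=lambda name: info_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]
```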
This paper presents an image classification method that uses a Bayesian approach for hybrid feature selection and classification of image data without expert interaction. Our method defines a new hybrid feature subset that is able to classify the data without expert knowledge. The proposed method is learned using a probabilistic Bayes net classifier algorithm with few semantic details. The experimental results show the merits of the proposed methodology in classifying image data.
2004
Data sets with many discrete variables and relatively few cases arise in many domains. Several studies have sought to identify the Markov Blanket (MB) of a target variable by filtering variables using statistical decisions for conditional independence and ...
Entropy, 2018
Given the increasing size and complexity of datasets needed to train machine learning algorithms, it is necessary to reduce the number of features required to achieve high classification accuracy. This paper presents a novel and efficient approach based on Monte Carlo Tree Search (MCTS) to find the optimal feature subset in the feature space. The algorithm searches for the best feature subset by combining the benefits of tree search with random sampling. Starting from an empty node, the tree is incrementally built by adding nodes representing the inclusion or exclusion of features in the feature space. Every iteration leads to a feature subset following the tree and default policies. The accuracy of the classifier on the feature subset is used as the reward and propagated backwards to update the tree. Finally, the subset with the highest reward is chosen as the best feature subset. The efficiency and effectiveness of the proposed method are validated by experiments on many benchmark datasets. The results are also compared with significant methods in the literature, which demonstrates the superiority of the proposed method.
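A compact sketch of this scheme, following the abstract's description (tree policy = UCB1, default policy = random completion of the include/exclude prefix, reward = an accuracy function you supply); the exploration constant and function names are assumptions:

```python
import math
import random

def mcts_feature_select(n_features, accuracy, iters=500, c=1.4, seed=0):
    """Nodes are keyed by their decision prefix: a tuple of 0/1 include flags."""
    rng = random.Random(seed)
    visits, value = {(): 0}, {(): 0.0}
    best, best_acc = None, float("-inf")
    for _ in range(iters):
        node = ()
        # Tree policy: descend by UCB1 until a child is missing from the tree.
        while len(node) < n_features:
            children = [node + (b,) for b in (0, 1)]
            unexplored = [ch for ch in children if ch not in visits]
            if unexplored:
                node = rng.choice(unexplored)
                visits[node], value[node] = 0, 0.0
                break
            node = max(children, key=lambda ch: value[ch] / visits[ch]
                       + c * math.sqrt(math.log(visits[node]) / visits[ch]))
        # Default policy: complete the prefix with random include/exclude flags.
        mask = list(node) + [rng.random() < 0.5
                             for _ in range(n_features - len(node))]
        reward = accuracy(mask)
        if reward > best_acc:
            best, best_acc = mask, reward
        # Backpropagate the reward along the decision prefix.
        for i in range(len(node) + 1):
            prefix = node[:i]
            visits[prefix] += 1
            value[prefix] += reward
    return best, best_acc
```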
1996
Abstract: Feature selection can be defined as a problem of finding a minimum set of M relevant attributes that describes the dataset as well as the original N attributes do, where M ≤ N. After examining the problems with both the exhaustive and the heuristic approach to feature selection, this paper proposes a probabilistic approach. The theoretical analysis and the experimental study show that the proposed approach is simple to implement and guaranteed to find the optimal if resources permit.
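If this is the Las Vegas style search (LVF) the abstract suggests, a minimal sketch looks like the following, assuming discrete data and an inconsistency-rate consistency criterion; the function names are ours:

```python
import random
from collections import Counter

def inconsistency(rows, labels, subset):
    """Fraction of rows whose projected pattern disagrees with its majority label."""
    patterns = {}
    for row, lab in zip(rows, labels):
        patterns.setdefault(tuple(row[i] for i in subset), Counter())[lab] += 1
    bad = sum(sum(c.values()) - max(c.values()) for c in patterns.values())
    return bad / len(rows)

def lvf(rows, labels, n_features, max_tries=1000, seed=0):
    """Randomly sample subsets; keep the smallest one as consistent as the full set."""
    rng = random.Random(seed)
    allowed = inconsistency(rows, labels, range(n_features))
    best = list(range(n_features))
    for _ in range(max_tries):
        k = rng.randint(1, len(best))          # only sizes up to the current best
        subset = rng.sample(range(n_features), k)
        if inconsistency(rows, labels, subset) <= allowed and k < len(best):
            best = subset
    return best
```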