2013
Editor: Isabelle Guyon et al. In this paper we propose an energy-based model (EBM) for selecting subsets of features that are both causally and predictively relevant for classification tasks. The proposed method is tested in the causality challenge, a competition that promotes research on strengthening feature selection by taking causal information about features into account. Under the proposed approach, an energy value is assigned to every configuration of features, and the problem is reduced to finding the configuration that minimizes an energy function. We propose an energy function that takes into account causal, predictive, and relevance/correlation information of features. In particular, we introduce potentials that combine the rankings of individual feature selection methods, Markov blanket information, and predictive performance estimations. The configuration with the lowest energy is the one offering the best trade-off between these sources of information. Experimental results ...
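The abstract only sketches the energy function at a high level; the following is a minimal, hypothetical illustration (not the authors' actual formulation) of how an energy could be assigned to a binary feature configuration by combining a ranking potential, a Markov-blanket potential and a predictive potential, and then minimized by simple local search. All names, weights and potential definitions below are assumptions.

import numpy as np

def energy(config, rank_scores, in_markov_blanket, cv_error,
           alpha=1.0, beta=1.0, gamma=1.0):
    """Energy of a binary feature configuration (lower is better).

    config            : 0/1 vector indicating which features are selected
    rank_scores       : per-feature relevance scores from individual rankers
    in_markov_blanket : 0/1 vector, 1 if the feature lies in the estimated
                        Markov blanket of the class variable
    cv_error          : callable returning a cross-validated error estimate
                        for the selected subset
    """
    config = np.asarray(config, dtype=bool)
    rank_scores = np.asarray(rank_scores, dtype=float)
    mb = np.asarray(in_markov_blanket, dtype=bool)
    # Ranking potential: penalize leaving highly ranked features out.
    e_rank = rank_scores[~config].sum()
    # Causal potential: penalize selected features outside the Markov blanket.
    e_causal = np.logical_and(config, ~mb).sum()
    # Predictive potential: estimated generalization error of the subset.
    e_pred = cv_error(config)
    return alpha * e_rank + beta * e_causal + gamma * e_pred

def greedy_minimize(n_features, energy_fn, max_iter=100):
    """Simple local search: flip the single bit that most reduces the energy.
    Usage: greedy_minimize(n, lambda c: energy(c, scores, mb, cv_err))."""
    config = np.zeros(n_features, dtype=bool)
    for _ in range(max_iter):
        current = energy_fn(config)
        best_delta, best_j = 0.0, None
        for j in range(n_features):
            trial = config.copy()
            trial[j] = ~trial[j]
            delta = energy_fn(trial) - current
            if delta < best_delta:
                best_delta, best_j = delta, j
        if best_j is None:          # no improving flip left: local minimum
            break
        config[best_j] = ~config[best_j]
    return config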
2008
Abstract In this paper we propose an energy-based model (EBM) for selecting subsets of features that are both causally and predictively relevant for classification tasks. The proposed method is tested in the causality challenge, a competition that promotes research on strengthening feature selection by taking causal information about features into account.
2010
Abstract—The Markov blanket has been proved to be the theoretically optimal feature subset for predicting the target. IPC-MB, first proposed in 2008 to induce the Markov blanket via local search, is believed to be an important advance over previously published work such as IAMB, PCMB and PC. However, the proof given in its first publication is neither complete nor sufficiently sound. In this paper, we revisit IPC-MB with discussion not found in the original paper, especially regarding the proof of its theoretical correctness. In addition, experimental studies are conducted on small- to large-scale problems (Bayesian networks), and the results demonstrate that IPC-MB achieves much higher accuracy than IAMB, and much better time efficiency than PCMB and PC.
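As a toy illustration (not the IPC-MB algorithm itself), local Markov blanket induction rests on repeatedly discarding candidates that a conditional independence test declares independent of the target given some small subset of the remaining candidates. The ci_test callable below is a placeholder for a real G-squared or chi-square test on data; everything else is an illustrative assumption.

from itertools import combinations

def shrink_candidates(target, candidates, ci_test, max_cond=2):
    """Remove every candidate x that is conditionally independent of `target`
    given some subset S of the other candidates (|S| <= max_cond).

    ci_test(x, y, cond_set) -> True if x is independent of y given cond_set.
    """
    kept = list(candidates)
    changed = True
    while changed:
        changed = False
        for x in list(kept):
            others = [c for c in kept if c != x]
            for size in range(0, max_cond + 1):
                if any(ci_test(x, target, set(s))
                       for s in combinations(others, size)):
                    kept.remove(x)       # x is screened off from the target
                    changed = True
                    break
    return kept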
2010
Data sets with large numbers of features, often in the thousands, are becoming common. Therefore, feature selection has been an active research area in the pattern recognition, statistics and data mining communities. The main idea of feature reduction is to select a subset of input variables by eliminating features with little or no predictive ability, without sacrificing the performance of the model built on the chosen features. It is also known as variable selection, feature reduction, attribute selection or variable subset selection. By removing most of the irrelevant and redundant features from the data, feature reduction brings many potential benefits: alleviating the effect of the curse of dimensionality to improve prediction performance; facilitating data visualization and data ...
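A minimal filter-style sketch of this idea, using scikit-learn's mutual information estimator to drop features with little or no predictive ability; the data set and the threshold value are illustrative assumptions.

# Score each feature by its mutual information with the class and keep only
# the informative ones; 0.05 is an arbitrary illustrative cutoff.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
scores = mutual_info_classif(X, y, random_state=0)
keep = scores > 0.05
X_reduced = X[:, keep]
print(X.shape[1], "features reduced to", keep.sum())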
PloS one, 2018
Feature selection is considered to be one of the most critical methods for choosing appropriate features from a larger set of items. This task requires two basic steps: ranking and filtering. The former ranks all features, while the latter filters out irrelevant features based on some threshold value. Several feature selection methods with well-documented capabilities and limitations have already been proposed. Feature ranking is likewise nontrivial, as it requires the designation of an optimal cutoff value so as to properly select important features from a list of candidates. However, the lack of a comprehensive feature ranking and filtering approach that alleviates these limitations and provides an efficient mechanism for achieving optimal results remains a major problem. Keeping these facts in view, we present an efficient and comprehensive univariate ensemble-based feature selection (uEF...
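A structural sketch of the two steps, assuming an ANOVA F-score ranker and a quantile-based cutoff; this is not the uEFS procedure itself, only the rank-then-filter pattern it builds on.

import numpy as np
from sklearn.feature_selection import f_classif

def rank_and_filter(X, y, threshold_quantile=0.75):
    """Step 1: rank all features by a univariate score (ANOVA F statistic here).
    Step 2: filter out features whose score falls below a cutoff.
    The quantile-based cutoff is an illustrative choice, not the uEFS rule."""
    scores, _ = f_classif(X, y)
    ranking = np.argsort(scores)[::-1]               # best feature first
    cutoff = np.quantile(scores, threshold_quantile)
    selected = np.where(scores >= cutoff)[0]
    return ranking, selected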
Entropy, 2019
Feature selection aims to select the smallest feature subset that yields the minimum generalization error. In the rich feature selection literature, information-theoretic approaches seek a subset of features such that the mutual information between the selected features and the class labels is maximized. Despite the simplicity of this objective, several open optimization problems remain. These include, for example, the automatic determination of the optimal subset size (i.e., the number of features), or a stopping criterion if a greedy search strategy is adopted. In this paper, we suggest two stopping criteria based simply on monitoring the conditional mutual information (CMI) among groups of variables. Using the recently developed multivariate matrix-based Rényi's α-entropy functional, which can be directly estimated from data samples, we show that the CMI among groups of variables can be easily computed without any decomposition or approximation, hence making o...
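A structural sketch of greedy forward selection with an information-based stopping rule. The paper monitors CMI computed with the matrix-based Rényi α-entropy functional; here scikit-learn's per-feature MI estimate merely stands in for that quantity, so only the shape of the stopping criterion is illustrated, not the authors' estimator.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

def greedy_select_with_stop(X, y, tol=1e-3, max_features=None):
    """Greedy forward selection that stops when the estimated information
    gain of the best remaining feature drops below `tol`."""
    n = X.shape[1]
    max_features = max_features or n
    selected, remaining = [], list(range(n))
    while remaining and len(selected) < max_features:
        # Per-feature MI used as a stand-in for the CMI monitored in the paper.
        gains = mutual_info_classif(X[:, remaining], y, random_state=0)
        best = int(np.argmax(gains))
        if gains[best] < tol:        # stopping criterion: negligible gain left
            break
        selected.append(remaining.pop(best))
    return selected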
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005
Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called the minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvements in feature selection and classification accuracy.
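A minimal sketch of the first-order incremental mRMR criterion (relevance minus mean redundancy with the already selected features), with scikit-learn's nearest-neighbour MI estimators standing in for the discrete MI used in the paper; assumes k does not exceed the number of features.

import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k):
    """Incrementally add the feature maximizing I(f; y) minus the mean
    mutual information with the features selected so far."""
    n = X.shape[1]
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_score, best_j = -np.inf, None
        for j in range(n):
            if j in selected:
                continue
            redundancy = np.mean([
                mutual_info_regression(X[:, [s]], X[:, j], random_state=0)[0]
                for s in selected
            ])
            score = relevance[j] - redundancy
            if score > best_score:
                best_score, best_j = score, j
        selected.append(best_j)
    return selected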
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
This paper adopts a Bayesian approach to simultaneously learn both an optimal nonlinear classifier and a subset of predictor variables (or features) that are most relevant to the classification task. The approach uses heavy-tailed priors to promote sparsity in the utilization of both basis functions and features; these priors act as regularizers for the likelihood function that rewards good classification on the training data. We derive an expectation-maximization (EM) algorithm to efficiently compute a maximum a posteriori (MAP) point estimate of the various parameters. The algorithm is an extension of recent state-of-the-art sparse Bayesian classifiers, which in turn can be seen as Bayesian counterparts of support vector machines. Experimental comparisons using kernel classifiers demonstrate both parsimonious feature selection and excellent classification accuracy on a range of synthetic and benchmark data sets.
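The EM/MAP machinery of the paper is beyond a short snippet, but as a rough analogue (not the authors' method): a Laplace, i.e. heavy-tailed, prior on the weights corresponds to L1 regularization of the likelihood, which already produces the sparse use of features the abstract describes. The data set and regularization strength below are illustrative.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=40,
                           n_informative=5, random_state=0)
# MAP estimate under a Laplace prior == L1-penalized maximum likelihood.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("features with non-zero weight:", (clf.coef_ != 0).sum())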
Journal of Machine Learning Research, 2004
Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness compared with representative methods.
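A simplified sketch of decoupled relevance and redundancy analysis, using absolute Pearson correlation in place of the symmetrical-uncertainty measure a correlation-based framework of this kind typically employs; the threshold and ordering heuristic are illustrative assumptions.

import numpy as np

def relevance_redundancy_filter(X, y, rel_threshold=0.1):
    """1. Relevance: keep features whose |corr| with the class exceeds a threshold.
    2. Redundancy: walking down the relevance ranking, drop any feature more
       strongly correlated with an already kept feature than with the class."""
    rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    candidates = [j for j in np.argsort(rel)[::-1] if rel[j] >= rel_threshold]
    kept = []
    for j in candidates:
        redundant = any(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) >= rel[j]
                        for k in kept)
        if not redundant:
            kept.append(j)
    return kept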
2016
From a machine learning point of view, identifying a subset of relevant features from a real data set can improve the results achieved by classification methods and reduce their time and space complexity. To achieve this goal, feature selection methods are usually employed. These approaches assume that the data contains redundant or irrelevant attributes that can be eliminated. In this work, we propose a novel algorithm to manage the optimization problem that is at the foundation of Mutual Information feature selection methods. Furthermore, our approach can automatically estimate the number of dimensions to retain. The quality of our method is confirmed by the promising results achieved on standard real data sets.
The Annals of Applied Statistics, 2010
In generalized linear regression problems with an abundant number of features, lasso-type regularization, which imposes an ℓ1-constraint on the regression coefficients, has become a widely established technique. Deficiencies of the lasso in certain scenarios, notably strongly correlated design, were unmasked when Zou and Hastie [J. Roy. Statist. Soc. Ser. B 67 (2005) 301-320] introduced the elastic net. In this paper we propose to extend the elastic net by admitting general nonnegative quadratic constraints as a second form of regularization. The generalized ridge-type constraint will typically make use of the known association structure of features, for example, by using temporal or spatial closeness.
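In symbols, the kind of objective described is (a sketch; the paper's exact notation may differ):

\hat{\beta} = \arg\min_{\beta} \; \ell(\beta) + \lambda_1 \|\beta\|_1 + \lambda_2 \, \beta^{\top} P \beta

where \ell is the negative log-likelihood of the generalized linear model, the ℓ1 term is the lasso penalty, and P is a nonnegative definite matrix: P = I recovers the ordinary elastic net, while a P built from the temporal or spatial closeness of features encodes their known association structure.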