2010, Neural Processing Letters
In pattern classification, input pattern features usually contribute differently, according to their relevance for a specific classification task. In a previous paper, we introduced the Energy Supervised Relevance Neural Gas classifier, a kernel method which uses the maximization of Onicescu's informational energy to compute the relevances of input features. The relevances were used to improve classification accuracy. In our present work, we focus on the feature ranking capability of this approach. We compare our algorithm to standard feature ranking methods.
2008
Abstract In this paper we propose an energy-based model (EBM) for selecting subsets of features that are both causally and predictively relevant for classification tasks. The proposed method is tested in the causality challenge, a competition that promotes research on strengthening feature selection by taking causal information of features into account.
2013
Editor: Isabelle Guyon et al. In this paper we propose an energy-based model (EBM) for selecting subsets of features that are both causally and predictively relevant for classification tasks. The proposed method is tested in the causality challenge, a competition that promotes research on strengthening feature selection by taking causal information of features into account. Under the proposed approach, an energy value is assigned to every configuration of features and the problem is reduced to that of finding the configuration that minimizes an energy function. We propose an energy function that takes into account causal, predictive, and relevance/correlation information of features. In particular, we introduce potentials that combine the rankings of individual feature selection methods, Markov blanket information, and predictive performance estimations. The configuration with the lowest energy is the one offering the best trade-off between these sources of information. Experimental results ...
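As a minimal illustration of the approach described above, the sketch below assigns an energy to each feature configuration by summing three potentials (individual rank scores, a redundancy penalty for correlated pairs, and a Markov-blanket bonus) and exhaustively searches for the minimum-energy configuration. The potentials, weights, and feature names are invented for the example and are not those of the challenge submission.

```python
import itertools

# Hypothetical inputs: per-feature rank scores (lower = better), pairs flagged as
# correlated, and an assumed Markov blanket of the target variable.
rank_score = {"f1": 0.1, "f2": 0.4, "f3": 0.2, "f4": 0.9}
correlated_pairs = {("f1", "f3")}
markov_blanket = {"f1", "f2"}

def energy(config):
    """Energy of a feature configuration; lower means a better trade-off."""
    e = sum(rank_score[f] for f in config)                  # relevance potential
    e += sum(0.5 for a, b in itertools.combinations(sorted(config), 2)
             if (a, b) in correlated_pairs)                 # redundancy potential
    e -= sum(0.3 for f in config if f in markov_blanket)    # causal potential
    return e

# Exhaustive search over all non-empty configurations (feasible only for tiny sets;
# the paper's point is precisely to avoid this blow-up with an EBM formulation).
features = list(rank_score)
best = min((frozenset(c) for r in range(1, len(features) + 1)
            for c in itertools.combinations(features, r)), key=energy)
print(sorted(best))  # → ['f1']
```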
2006
Fuzzy ARTMAP with Relevance factor (FAMR) is a Fuzzy ARTMAP (FAM) neural architecture with the following property: each training pair has a relevance factor assigned to it, proportional to the importance of that pair during the learning phase. Using a relevance factor adds more flexibility to the training phase, allowing ranking of sample pairs according to the confidence we have in the information source or in the pattern itself. We introduce a novel FAMR architecture: FAMR with Feature Weighting (FAMRFW). In the first stage, the training data features are weighted. In our experiments, we use a feature weighting method based on Onicescu's informational energy (IE). In the second stage, the obtained weights are used to improve FAMRFW training. The effect of this approach is that category dimensions in the direction of relevant features are decreased, whereas category dimensions in the direction of non-relevant features are increased. Experimental results, performed on several benchmarks, show that feature weighting can improve the classification performance of the general FAMR algorithm.
Int. Conference on Artificial Neural Networks, 2005
We describe a kernel method which uses the maximization of Onicescu's informational energy as a criterion for computing the relevances of input features. This adaptive relevance determination is used in combination with the neural-gas and the generalized relevance LVQ algorithms. Our quadratic optimization function, as an L2-type method, leads to a linear gradient and thus easier computation. We obtain ...
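For reference, Onicescu's informational energy of a discrete distribution p = (p_1, ..., p_n) is E(p) = sum_i p_i^2; it is maximal (1) for a deterministic distribution and minimal (1/n) for a uniform one. The sketch below computes only this underlying quantity, not the full relevance-learning algorithm of the paper:

```python
import numpy as np

def informational_energy(p):
    """Onicescu's informational energy of a discrete distribution: E(p) = sum_i p_i^2."""
    p = np.asarray(p, dtype=float)
    assert np.isclose(p.sum(), 1.0), "input must be a probability distribution"
    return float(np.sum(p ** 2))

# A peaked distribution carries more informational energy than a uniform one.
print(informational_energy([1.0, 0.0, 0.0]))        # → 1.0
print(informational_energy([0.25, 0.25, 0.25, 0.25]))  # → 0.25
```

In the relevance-learning setting, feature weights are adapted by gradient steps on an IE-based criterion; the quadratic form of E is what makes that gradient linear, as noted above.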
European Symposium on Artificial Neural Networks, …
An Informational Energy LVQ Approach for Feature Ranking. Răzvan Andonie (Computer Science Department, Central Washington University, USA) and Angel Cataron ... Denote the set of all codebook vectors by {w1, ..., wK}. The components of a vector wj are [wj1, ..., wjn]. ...
Lecture Notes in Computer Science, 2009
The most accurate predictions are typically obtained by learning machines with complex feature spaces (as induced, e.g., by kernels). Unfortunately, such decision rules are hardly accessible to humans and cannot easily be used to gain insights about the application domain. Therefore, one often resorts to linear models in combination with variable selection, thereby sacrificing some predictive power for presumptive interpretability. Here, we introduce the Feature Importance Ranking Measure (FIRM), which, by retrospective analysis of arbitrary learning machines, achieves both excellent predictive performance and superior interpretation. In contrast to standard raw feature weighting, FIRM takes the underlying correlation structure of the features into account. Thereby, it is able to discover the most relevant features, even if their appearance in the training data is entirely prevented by noise. The desirable properties of FIRM are investigated analytically and illustrated in simulations.
Lecture Notes in Computer Science, 2009
In order to overcome the limitations of purely deductive approaches to the tasks of classification and retrieval from ontologies, inductive (instance-based) methods have been proposed as an efficient and noise-tolerant alternative. In this paper we propose an original method based on non-parametric learning: the Reduced Coulomb Energy (RCE) Network. The method requires a limited training effort but turns out to be very effective during the classification phase. Casting retrieval as the problem of assessing the class-membership of individuals w.r.t. the query concepts, we propose an extension of a classification algorithm using RCE networks based on an entropic similarity measure for OWL. Experimentally, we show that the performance of the resulting inductive classifier is comparable to that of a standard reasoner and often more efficient than other inductive approaches. Moreover, we show that new knowledge (not logically derivable) is induced, and the likelihood of the answers may be provided.
2013
The Relevance Vector Machine (RVM) is a generalized linear model that can use kernel functions as basis functions. The typical RVM solution is very sparse. We present a strategy for feature ranking and selection via evaluating the influence of the features on the relevance vectors. This requires only a single training of the RVM and is therefore very efficient. Experiments on a benchmark regression problem provide evidence that it selects high-quality feature sets at a fraction of the cost of classical methods. Key-Words: Feature Selection, Relevance Vector Machine, Machine Learning
2014 5th International Conference on Intelligent Systems, Modelling and Simulation, 2014
Artificial Neural Networks (ANNs) are often viewed as black boxes. This limits a comprehensive understanding of how they deal with input neurons/data and how they reach a particular decision. Input significance analysis (ISA) refers to the process of understanding these input neurons/data. Since this work addresses a classification problem, the process can also be called feature selection, where the goal is a classifier that predicts accurately while keeping its structure as simple as possible. This work is particularly interested in ISA methods that manipulate weights; separately, correlations are also applied. The goal is to create a feature ranking list that performs best in the selected classifiers. For validation, memory recall validation and K-fold cross-validation are used. The results show that one classifier using one of the ISA methods performs well under both validation methods.
Lecture Notes in Computer Science, 2000
We have developed relative feature importance (RFI), a metric for the classifier-independent ranking of features. Previously, we have shown that the metric accurately ranks features for a wide variety of artificial and natural problems, for both two-class and multi-class problems. In this paper, we present the design of the metric, including both theoretical considerations and statistical analysis of the possible components.
IEEE Access, 2022
In many machine learning classification problems, datasets are usually of high dimensionality and therefore require efficient and effective methods for identifying the relative importance of their attributes, eliminating the redundant and irrelevant ones. Due to the huge size of the search space of the possible solutions, the attribute subset evaluation feature selection methods are not very suitable, so in these scenarios feature ranking methods are used. Most of the feature ranking methods described in the literature are univariate methods, which do not detect interactions between factors. In this paper, we propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency, which have been applied for cancer gene expression and genotype-tissue expression classification tasks using public datasets. We statistically proved that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi-Squared, Correlation, Information Gain, ReliefF and Significance, as well as other feature selection methods for attribute subset evaluation based on correlation and consistency with the multi-objective evolutionary search strategy, and with the embedded feature selection methods C4.5 and LASSO. The proposed methods have been implemented on the WEKA platform for public use, making all the results reported in this paper repeatable and replicable.
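The pairwise-correlation idea above can be sketched with a generic, CFS-style criterion: score each feature by its correlation with the class minus its mean absolute correlation with the other features. The scoring rule and the synthetic data below are illustrative assumptions, not the paper's exact formulas or its WEKA implementation.

```python
import numpy as np

def corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])

def pairwise_correlation_rank(X, y):
    """Rank features by class correlation penalized by mean pairwise correlation
    with the other features (a multivariate criterion: redundant near-copies of a
    good feature are pushed down, unlike in univariate ranking)."""
    n = X.shape[1]
    scores = []
    for j in range(n):
        relevance = corr(X[:, j], y)
        redundancy = np.mean([corr(X[:, j], X[:, k]) for k in range(n) if k != j])
        scores.append(relevance - redundancy)
    return [int(i) for i in np.argsort(scores)[::-1]]  # best feature first

rng = np.random.default_rng(1)
y = np.tile([0.0, 1.0], 40)
x_rel = y + 0.1 * rng.normal(size=80)        # correlated with the class
x_red = x_rel + 0.05 * rng.normal(size=80)   # redundant near-copy
x_noise = rng.normal(size=80)                # irrelevant
X = np.column_stack([x_noise, x_rel, x_red])
ranking = pairwise_correlation_rank(X, y)
print(ranking)  # the noise feature (index 0) lands last
```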
Neural Networks, 2004. Proceedings. …, 2004
Input feature ranking and selection represent a necessary preprocessing stage in classification, especially when one is required to manage large quantities of data. We introduce a weighted generalized LVQ algorithm, called energy generalized relevance ...
Neural Networks, 2004. …, 2004
A comparison between five feature ranking methods based on entropy is presented on artificial and real datasets. The feature ranking method using the χ² statistic gives results that are very similar to those of the entropy-based methods. The quality of feature rankings obtained by these methods is evaluated using the decision tree and the nearest neighbor classifier with a growing number of most important features. Significant differences are found in some cases, but there is no single best index that works best for all data and all classifiers. Therefore, to be sure that a subset of features giving the highest accuracy has been selected requires the use of many different indices.
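The two index families compared above can be sketched directly. The helper functions below compute information gain (an entropy-based index) and the Pearson χ² statistic for discrete feature/class arrays; the toy data is invented for illustration.

```python
import numpy as np

def entropy(v):
    """Shannon entropy (bits) of a discrete array."""
    _, counts = np.unique(v, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(x, y):
    """IG(Y; X) = H(Y) - H(Y|X), an entropy-based ranking index."""
    h_cond = sum((x == v).mean() * entropy(y[x == v]) for v in np.unique(x))
    return entropy(y) - h_cond

def chi2_statistic(x, y):
    """Pearson chi-squared statistic on the contingency table of x and y."""
    xs, ys = np.unique(x), np.unique(y)
    obs = np.array([[np.sum((x == a) & (y == b)) for b in ys] for a in xs], float)
    exp = obs.sum(1, keepdims=True) * obs.sum(0, keepdims=True) / obs.sum()
    return float(np.sum((obs - exp) ** 2 / exp))

# A perfectly informative feature outranks an independent one under both indices.
y  = np.array([0, 0, 1, 1, 0, 0, 1, 1])
f1 = np.array([0, 0, 1, 1, 0, 0, 1, 1])   # determines the class
f2 = np.array([0, 1, 0, 1, 0, 1, 0, 1])   # independent of the class
print(information_gain(f1, y), chi2_statistic(f1, y))  # → 1.0 8.0
print(information_gain(f2, y), chi2_statistic(f2, y))  # → 0.0 0.0
```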
2007
In computational analysis in scientific domains, images are often compared based on their features, e.g., size, depth and other domain-specific aspects. Certain features may be more significant than others while comparing the images and drawing corresponding inferences for specific applications. Though domain experts may have subjective notions of similarity for comparison, they seldom have a distance function that ranks the image features based on their relative importance. We propose a method called FeaturesRank for learning such a distance function in order to capture the semantics of the images. We are given training samples with pairs of images and the extent of similarity identified for each pair. Using a guessed initial distance function, FeaturesRank clusters the given images in levels. It then adjusts the distance function based on the error between the clusters and training samples using heuristics proposed in this paper. The distance function that gives the lowest error is the output. This contains the features ranked in the order most appropriate the domain. FeaturesRank is evaluated with real image data from nanotechnology and bioinformatics. The results of our evaluation are presented in the paper.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005
Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called the minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement in feature selection and classification accuracy.
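The first-order incremental mRMR criterion described above can be sketched directly: at each step, select the feature maximizing its mutual information with the class minus its mean mutual information with the already-selected features. The discrete MI estimator and toy data below are illustrative, not the paper's implementation.

```python
import numpy as np

def mutual_information(a, b):
    """MI between two discrete arrays, in bits, from empirical joint frequencies."""
    mi, n = 0.0, len(a)
    for va in np.unique(a):
        for vb in np.unique(b):
            p_ab = np.sum((a == va) & (b == vb)) / n
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / ((np.sum(a == va) / n) * (np.sum(b == vb) / n)))
    return mi

def mrmr(features, y, k):
    """First-order incremental mRMR: greedily add the feature with the best
    relevance-minus-mean-redundancy score."""
    selected, candidates = [], list(features)
    while len(selected) < k and candidates:
        def score(name):
            rel = mutual_information(features[name], y)
            red = (np.mean([mutual_information(features[name], features[s])
                            for s in selected]) if selected else 0.0)
            return rel - red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# y = f1 OR f2; "f1_copy" is perfectly redundant with "f1", so mRMR skips it
# in favor of the complementary feature "f2".
y = np.array([0, 1, 1, 1, 0, 1, 1, 1])
features = {
    "f1":      np.array([0, 0, 1, 1, 0, 0, 1, 1]),
    "f1_copy": np.array([0, 0, 1, 1, 0, 0, 1, 1]),
    "f2":      np.array([0, 1, 0, 1, 0, 1, 0, 1]),
}
selected_features = mrmr(features, y, 2)
print(selected_features)  # → ['f1', 'f2']
```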
Pattern Recognition Letters, 2002
We present a neural network based approach for identifying salient features for classification in feedforward neural networks. Our approach involves neural network training with an augmented cross-entropy error function. The augmented error function forces the neural network to keep low derivatives of the transfer functions of neurons when learning a classification task. Such an approach reduces output sensitivity to input changes. Feature selection is based on the reaction of the cross-validation set classification error to the removal of individual features. We demonstrate the usefulness of the proposed approach on one artificial and three real-world classification problems. We compared the approach with five other feature selection methods, each based on a different concept. The algorithm developed outperformed the other methods by achieving higher classification accuracy on all the problems tested.
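The removal-based selection step can be sketched independently of the network itself. The snippet below substitutes a nearest-centroid classifier for the paper's augmented cross-entropy network (an assumption made purely to keep the example self-contained) and ranks features by the increase in validation error when each feature is replaced by its training mean.

```python
import numpy as np

def nearest_centroid_error(X_tr, y_tr, X_va, y_va):
    """Validation error of a nearest-centroid classifier (stand-in for the
    trained network, used only to illustrate removal-based ranking)."""
    centroids = {c: X_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)}
    classes = sorted(centroids)
    preds = [min(classes, key=lambda c: np.linalg.norm(x - centroids[c])) for x in X_va]
    return float(np.mean(np.array(preds) != y_va))

def removal_ranking(X_tr, y_tr, X_va, y_va):
    """Rank features by the increase in validation error when each feature is
    'removed' (replaced by its training mean); biggest increase ranks first."""
    base = nearest_centroid_error(X_tr, y_tr, X_va, y_va)
    impact = {}
    for j in range(X_tr.shape[1]):
        Xt, Xv = X_tr.copy(), X_va.copy()
        Xt[:, j] = X_tr[:, j].mean()
        Xv[:, j] = X_tr[:, j].mean()
        impact[j] = nearest_centroid_error(Xt, y_tr, Xv, y_va) - base
    return sorted(impact, key=impact.get, reverse=True)

rng = np.random.default_rng(0)
y = np.tile([0, 1], 50)
X = rng.normal(size=(100, 3))
X[:, 0] += 3 * y   # only feature 0 is class-dependent
ranking = removal_ranking(X[:60], y[:60], X[60:], y[60:])
print(ranking[0])  # feature 0, the class-dependent one, should rank first
```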
Frontiers in neuroscience, 2017
We introduce EKRA, which aims to support the automatic identification of brain activity patterns from electroencephalographic recordings. EKRA is a data-driven strategy that incorporates two kernel functions to take advantage of the available joint information, associating neural responses with a given stimulus condition. The model is adjusted to learn the linear projection that best discriminates the input feature set, optimizing the required free parameters automatically. Our approach is carried out in two scenarios: (i) feature selection, computing a relevance vector from extracted neural features to facilitate the physiological interpretation of a given brain activity task, and (ii) enhanced feature selection, performing an additional transformation of relevant features to improve the overall identification accuracy. Accordingly, we provide an alternative feature relevance analysis strategy that improves system performance while favoring the data ...
International Journal of Computational Intelligence Systems, 2023
Over the past decades, different classification approaches with different characteristics have been developed to achieve more efficient and accurate results. Although the loss function used in the training procedure is a significant factor in the performance of classification models, it has received less attention. In previous research, two main categories of continuous and semi-continuous distance-based loss functions are generally applied to estimate the unknown parameters of classification models. Among these, continuous distance-based cost functions are the most commonly used and most popular loss functions in diverse statistical and intelligent classifiers. This category of loss functions is fundamentally based on continuously reducing the distance between the fitted and actual values in order to improve model performance. However, since the goal function of classification models belongs to the class of discrete functions, learning procedures based on a continuous distance-based function are not aligned with the nature of these problems; consequently, they are theoretically illogical and practically at least inefficient. To fill this research gap, a discrete direction-based loss function, formulated as a mixed-integer program, is proposed for the training procedure of statistical and shallow/deep intelligent classifiers. In this paper, the impact of the loss function type on the classification rate of classifiers in the energy domain is investigated. For this purpose, logistic regression (LR), the multilayer perceptron (MLP), and the deep multilayer perceptron (DMLP), which are respectively among the most widely used statistical, shallow intelligent, and deep learning classifiers, are chosen as examples.
Numerical outcomes from 13 benchmark energy datasets show that, in all benchmarks, the performance of the discrete direction learning-based classifiers, i.e., discrete learning-based logistic regression (DILR), discrete learning-based multilayer perceptron (DIMLP), and discrete learning-based deep multilayer perceptron (DIDMLP), is higher than that of their conventional versions. In addition, the proposed DILR, DIMLP, and DIDMLP models yield average classification rates of 89.88%, 94.53%, and 96.02%, indicating improvements of 6.78%, 5.90%, and 4.69% over the classic versions, which produce only 84.17%, 89.26%, and 91.72%. Consequently, the discrete direction-based learning methodology can be a more suitable, effective, and valuable alternative for training statistical and shallow/deep intelligent classification models.
IEEE Transactions on Geoscience and Remote Sensing, 2010
The increase in spatial and spectral resolution of satellite sensors, along with the shortening of time-revisiting periods, has provided high-quality data for remote sensing image classification. However, the high-dimensional feature space induced by using many heterogeneous information sources precludes the use of simple classifiers; thus, a proper feature selection is required for discarding irrelevant features and adapting the model to the specific problem. This paper proposes to classify the images and simultaneously to learn the relevant features in such high-dimensional scenarios. The proposed method is based on the automatic optimization of a linear combination of kernels dedicated to different meaningful sets of features. Such sets can be groups of bands, contextual or textural features, or bands acquired by different sensors. The combination of kernels is optimized through gradient descent on the support vector machine objective function. Even though the combination is linear, the ranked relevance takes into account the intrinsic nonlinearity of the data through kernels. Since a naive selection of the free parameters of the multiple-kernel method is computationally demanding, we propose an efficient model selection procedure based on kernel alignment. The result is a weight (learned from the data) for each kernel, where both relevant and meaningless image features automatically emerge after training the model. Experiments carried out on multi- and hyperspectral, contextual, and multisource remote sensing data classification confirm the capability of the method in ranking the relevant features and show the computational efficiency of the proposed strategy.
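The kernel-alignment criterion used above for model selection can be sketched as the normalized Frobenius inner product between a candidate kernel matrix and the ideal kernel yyᵀ built from the labels. The toy features below are invented for illustration; a kernel aligned with the labels scores near 1, an uninformative one near 0.

```python
import numpy as np

def alignment(K1, K2):
    """Empirical kernel alignment: <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    return float(np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2)))

# Labels in {-1, +1}; the ideal kernel is the outer product y y^T.
y = np.array([1.0, 1.0, -1.0, -1.0])
K_ideal = np.outer(y, y)

X_good = np.array([[1.0], [0.9], [-1.0], [-1.1]])   # feature separating the classes
X_bad = np.array([[0.3], [-0.2], [0.25], [-0.3]])   # uninformative feature
K_good = X_good @ X_good.T                           # linear kernels per feature group
K_bad = X_bad @ X_bad.T

print(alignment(K_good, K_ideal) > alignment(K_bad, K_ideal))  # → True
```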
2006
Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods.
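As a toy instance of this framework, the sketch below defines a scalar energy over three binary variables with pairwise couplings (coupling values chosen arbitrarily for illustration), clamps an observed variable, and enumerates the free variables to find the minimum-energy completion, which is exactly the EBM notion of inference described above.

```python
import itertools

# Pairwise couplings: negative J favors agreement of the pair, positive J disagreement.
couplings = {(0, 1): -1.0, (1, 2): -1.0, (0, 2): 0.5}

def energy(assignment):
    """E(x) = sum over pairs (i, j) of J_ij * (+1 if x_i == x_j else -1)."""
    return sum(j * (1 if assignment[a] == assignment[b] else -1)
               for (a, b), j in couplings.items())

def infer(observed):
    """Clamp the observed variables, enumerate the free ones, and return the
    minimum-energy completion."""
    free = [i for i in range(3) if i not in observed]
    best = min(itertools.product([0, 1], repeat=len(free)),
               key=lambda vals: energy({**observed, **dict(zip(free, vals))}))
    return {**observed, **dict(zip(free, best))}

# Observing x0 = 1, the agreement couplings pull x1 and x2 to 1 as well.
print(infer({0: 1}))  # → {0: 1, 1: 1, 2: 1}
```

Learning, in this framework, would adjust the couplings so that observed configurations get lower energies than unobserved ones; only inference is sketched here.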