2006
Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods.
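To make the inference rule concrete, here is a minimal sketch (not from the paper itself) assuming a hypothetical quadratic energy E(x, y) = ||y - Wx||^2 and a finite set of candidate answers: the observed variables are clamped and the remaining variables are chosen to minimize the energy.

```python
import numpy as np

# Minimal sketch of energy-based inference, assuming a hypothetical
# quadratic energy E(x, y) = ||y - W x||^2 over a finite candidate set.
def energy(W, x, y):
    return float(np.sum((y - W @ x) ** 2))

def infer(W, x, candidates):
    """Clamp the observed input x and return the candidate answer y
    with the smallest energy."""
    return min(candidates, key=lambda y: energy(W, x, y))

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))                            # model parameters
x = rng.normal(size=3)                                 # observed (clamped) variables
candidates = [rng.normal(size=2) for _ in range(10)]   # possible answers
y_star = infer(W, x, candidates)                       # lowest-energy configuration
```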
Probabilistic graphical models associate a probability to each configuration of the relevant variables. Energy-based models (EBMs) instead associate an energy to those configurations, eliminating the need for proper normalization of probability distributions. Making a decision (an inference) with an EBM consists in comparing the energies associated with various configurations of the variable to be predicted, and choosing the one with the smallest energy. Such systems must be trained discriminatively to associate low energies to the desired configurations and higher energies to undesired configurations. A wide variety of loss functions can be used for this purpose. We give sufficient conditions that a loss function should satisfy so that its minimization will cause the system to approach the desired behavior. We give many specific examples of suitable loss functions, and show an application to object recognition in images. It is important to note that the energy is the quantity minimized during inference, while the loss is the quantity minimized during learning.
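As one illustration of a suitable loss, the sketch below implements a simple margin (hinge) loss that pushes the energy of the desired answer below that of the most offending incorrect answer; the toy energy function and variable names are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def energy(w, x, y):
    # Toy energy: negative compatibility score (hypothetical form).
    return -y * float(w @ x)

def hinge_loss(w, x, y_true, y_candidates, margin=1.0):
    """Margin-based loss: zero once the desired answer's energy is below
    the most offending incorrect answer's energy by at least `margin`."""
    e_true = energy(w, x, y_true)
    e_bad = min(energy(w, x, y) for y in y_candidates if y != y_true)
    return max(0.0, margin + e_true - e_bad)

w = np.array([0.5, -0.2])
x = np.array([1.0, 2.0])
print(hinge_loss(w, x, y_true=+1, y_candidates=[-1, +1]))
```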
We introduce a view of unsupervised learning that integrates probabilistic and nonprobabilistic methods for clustering, dimensionality reduction, and feature extraction in a unified framework. In this framework, an energy function associates low energies to input points that are similar to training samples, and high energies to unobserved points. Learning consists in minimizing the energies of training samples while ensuring that the energies of unobserved ones are higher. Some traditional methods construct the architecture so that only a small number of points can have low energy, while other methods explicitly "pull up" on the energies of unobserved points. In probabilistic methods, the energies of unobserved points are pulled up by minimizing the log partition function, an expensive and sometimes intractable process. We explore different and more efficient methods using an energy-based approach. In particular, we show that a simple solution is to restrict the amount of information contained in the codes that represent the data. We demonstrate such a method by training it on natural image patches and by applying it to image denoising.
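A minimal sketch of the "restrict the code's information content" idea, assuming a hypothetical sparse-coding style energy (squared reconstruction error plus an L1 penalty on the code); this only illustrates the principle, not the paper's exact model.

```python
import numpy as np

def code_energy(x, z, D, sparsity=0.1):
    """Reconstruction energy plus an L1 penalty that limits the
    information content of the code z (hypothetical sparse-coding
    style formulation)."""
    return float(np.sum((x - D @ z) ** 2) + sparsity * np.sum(np.abs(z)))

def encode(x, D, steps=200, sparsity=0.1):
    """Find a low-energy sparse code by subgradient descent."""
    z = np.zeros(D.shape[1])
    lr = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2 + 1e-8)  # stable step size
    for _ in range(steps):
        grad = -2.0 * D.T @ (x - D @ z) + sparsity * np.sign(z)
        z -= lr * grad
    return z

rng = np.random.default_rng(0)
D = rng.normal(size=(16, 32))       # overcomplete dictionary
x = rng.normal(size=16)             # an "image patch"
z = encode(x, D)
print(code_energy(x, z, D))
```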
The Machine Learning and Pattern Recognition communities are facing two challenges: solving the normalization problem, and solving the deep learning problem.
ArXiv, 2020
This paper employs a formal connection between machine learning and thermodynamics to characterize the quality of learnt representations for transfer learning. We discuss how information-theoretic functionals such as the rate, distortion, and classification loss of a model lie on a convex, so-called equilibrium surface. We prescribe dynamical processes to traverse this surface under constraints, e.g., an iso-classification process that trades off rate and distortion to keep the classification loss unchanged. We demonstrate how this process can be used for transferring representations from a source dataset to a target dataset while keeping the classification loss constant. Experimental validation of the theoretical results is provided on standard image-classification datasets.
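A hedged sketch of the kind of objective being described: the rate R, distortion D, and classification loss C can be combined into a Lagrangian whose minimizers trace out an equilibrium surface as the multipliers vary, and an iso-classification process moves along that surface while holding C fixed. The symbols and the exact functional form below are assumptions for illustration, not the paper's definitions.

```latex
% Hypothetical Lagrangian whose minimizers sweep out the equilibrium
% surface as the multipliers (\lambda, \gamma) vary:
F(\lambda, \gamma) \;=\; \min_{\theta}\; D(\theta) \;+\; \lambda\, R(\theta) \;+\; \gamma\, C(\theta),
\qquad
\text{iso-classification process: move along the surface with } \mathrm{d}C = 0 .
```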
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012
A score function induced by a generative model of the data can provide a feature vector of a fixed dimension for each data sample. Data samples themselves may be of differing lengths (e.g., speech segments or other sequential data), but as a score function is based on the properties of the data generation process, it produces a fixed-length vector in a highly informative space, typically referred to as "score space." Discriminative classifiers have been shown to achieve higher performance in appropriately chosen score spaces than is achievable by either the corresponding generative likelihood-based classifiers or discriminative classifiers using standard feature extractors. In this paper, we present a novel score space that exploits the free energy associated with a generative model. The resulting free energy score space (FESS) takes into account the latent structure of the data at various levels and can be shown to lead to classification performance that at least matches the performance of the free energy classifier based on the same generative model and the same factorization of the posterior. We also show that in several typical computer vision and computational biology applications the classifiers optimized in FESS outperform the corresponding pure generative approaches, as well as a number of previous approaches combining discriminative and generative models.
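To illustrate the idea of a score space derived from free energy (in a deliberately simplified setting, not the paper's FESS construction), the sketch below computes the per-component addends of the variational free energy of a one-dimensional Gaussian mixture and uses them as a fixed-length feature vector; the mixture and all names are assumptions.

```python
import numpy as np
from scipy.stats import norm

def free_energy_terms(x, weights, means, stds):
    """Per-component terms of the variational free energy of a 1-D
    Gaussian mixture; concatenated, they give a fixed-length feature
    vector for the sample x (illustrative, not the paper's exact FESS)."""
    log_joint = np.log(weights) + norm.logpdf(x, means, stds)
    resp = np.exp(log_joint - np.logaddexp.reduce(log_joint))  # posterior q(k|x)
    # F = sum_k resp_k * (log resp_k - log_joint_k); keep the addends as features
    return resp * (np.log(resp + 1e-12) - log_joint)

weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])
stds = np.array([1.0, 1.0])
features = free_energy_terms(0.3, weights, means, stds)  # fixed-length vector
print(features, features.sum())  # the sum equals -log p(x)
```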
Energy-based learning provides a unified framework to describe supervised and unsupervised training methods for probabilistic and non-probabilistic factor graphs. An energy-based model associates a scalar energy to configurations of inputs, outputs, and latent variables. Inference consists in finding configurations of output and latent variables that minimize the energy. Learning consists in finding parameters that minimize a suitable loss function so that the module produces lower energies for "correct" outputs than for all "incorrect" outputs. Learning machines can be constructed by assembling modules and loss functions. Gradient-based learning procedures are easily implemented through semi-automatic differentiation of complex models constructed by assembling predefined modules. We introduce an open-source and cross-platform C++ library called EBLearn to enable the construction of energy-based learning models. EBLearn is composed of two major components: libidx, an efficient and very flexible multi-dimensional tensor library, and libeblearn, an object-oriented library of trainable modules and learning algorithms. The latter has facilities for models such as convolutional networks, as well as for image processing. It also provides graphical display functions.
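The sketch below illustrates, in generic Python rather than the EBLearn C++ API, the module-plus-loss decomposition the abstract describes: a trainable module and an energy/loss module are assembled, and gradients are propagated backwards through both. All class names are hypothetical.

```python
import numpy as np

# Generic illustration of assembling trainable modules and a loss into an
# energy-based learner; this is NOT the EBLearn API, just a sketch of the
# module/loss decomposition described above.
class Linear:
    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(scale=0.1, size=(n_out, n_in))
    def forward(self, x):
        self.x = x
        return self.W @ x
    def backward(self, grad_out, lr=0.01):
        grad_in = self.W.T @ grad_out
        self.W -= lr * np.outer(grad_out, self.x)   # gradient step on the module
        return grad_in

class SquaredErrorEnergy:
    """Energy/loss module: E(output, target) = ||output - target||^2."""
    def forward(self, out, target):
        self.diff = out - target
        return float(np.sum(self.diff ** 2))
    def backward(self):
        return 2.0 * self.diff

rng = np.random.default_rng(0)
net, loss = Linear(4, 2, rng), SquaredErrorEnergy()
x, y = rng.normal(size=4), rng.normal(size=2)
for _ in range(100):                       # gradient-based learning loop
    e = loss.forward(net.forward(x), y)
    net.backward(loss.backward())
print(e)                                   # energy decreases over the loop
```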
2013
In this paper we propose an energy-based model (EBM) for selecting subsets of features that are both causally and predictively relevant for classification tasks. The proposed method is tested in the causality challenge, a competition that promotes research on strengthening feature selection by taking causal information about features into account. Under the proposed approach, an energy value is assigned to every configuration of features, and the problem is reduced to that of finding the configuration that minimizes an energy function. We propose an energy function that takes into account causal, predictive, and relevance/correlation information about features. In particular, we introduce potentials that combine the rankings of individual feature selection methods, Markov blanket information, and predictive performance estimates. The configuration with the lowest energy is the one offering the best trade-off between these sources of information. Experimental results ...
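A minimal sketch of the kind of configuration search described (the potentials, weights, and greedy minimizer below are illustrative assumptions, not the paper's exact energy function): each binary configuration of features receives an energy combining ranking, Markov-blanket, and predictive-performance terms, and a simple bit-flip descent looks for the minimizing configuration.

```python
import numpy as np

def subset_energy(s, rank_scores, mb_mask, predictive_fn,
                  w_rank=1.0, w_mb=1.0, w_pred=1.0):
    """Energy of a binary feature configuration s (1 = selected).
    Combines hypothetical potentials: individual feature rankings,
    agreement with a Markov-blanket mask, and an estimated predictive error."""
    rank_pot = -np.sum(rank_scores[s == 1])          # prefer highly ranked features
    mb_pot = np.sum(s != mb_mask)                    # penalize disagreement with the MB
    pred_pot = predictive_fn(s)                      # e.g. cross-validated error estimate
    return w_rank * rank_pot + w_mb * mb_pot + w_pred * pred_pot

def greedy_minimize(d, energy_fn, sweeps=5):
    """Greedy bit-flip descent over feature configurations."""
    s = np.zeros(d, dtype=int)
    for _ in range(sweeps):
        for i in range(d):
            flipped = s.copy(); flipped[i] ^= 1
            if energy_fn(flipped) < energy_fn(s):
                s = flipped
    return s

rng = np.random.default_rng(0)
d = 8
rank_scores = rng.random(d)
mb_mask = (rng.random(d) > 0.5).astype(int)
pred = lambda s: 1.0 / (1.0 + s.sum())               # toy stand-in for a CV error
s_best = greedy_minimize(d, lambda s: subset_energy(s, rank_scores, mb_mask, pred))
```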
2008
Abstract: In this paper we propose an energy-based model (EBM) for selecting subsets of features that are both causally and predictively relevant for classification tasks. The proposed method is tested in the causality challenge, a competition that promotes research on strengthening feature selection by taking causal information about features into account.
2009
Hybrid generative-discriminative techniques and, in particular, generative score-space classification methods have proven to be valuable approaches for tackling difficult object or scene recognition problems. A generative model over the available data for each image class is first learned, providing a relatively comprehensive statistical representation. As a result, meaningful new image features at different levels of the model become available, encoding the degree of fitness of the data with respect to the model at different levels. Such features, defining a score space, are then fed into a discriminative classifier which can exploit the intrinsic separability of the data. In this paper, we present a generative score-space technique which encapsulates the uncertainty present in the generative learning phase, which is usually disregarded by state-of-the-art methods. In particular, we propose the use of variational free energy terms as feature vectors, so that the degree of fitness of the data and the uncertainty over the generative process are included explicitly in the data description. The proposed method is, by construction, superior to pure generative classification, and we experimentally illustrate this on a wide selection of generative models applied to challenging benchmarks in hard computer vision tasks such as scene, object, and shape recognition. In several instances, the proposed approach beats the current state of the art in classification performance, while relying on computationally inexpensive models.
The goal of a generative model is to capture the distribution underlying the data, typically through latent variables. After training, these variables are often used as a new representation, more effective than the original features in a variety of learning tasks. However, the representations constructed by contemporary generative models are usually point-wise deterministic mappings from the original feature space. Thus, even with representations robust to class-specific transformations, statistically driven models trained on them would not be able to generalize when the labeled data is scarce. Inspired by the stochasticity of the synaptic connections in the brain, we introduce Energy-based Stochastic Ensembles. These ensembles can learn non-deterministic representations, i.e., mappings from the feature space to a family of distributions in the latent space. These mappings are encoded in a distribution over a (possibly infinite) collection of models. By conditionally sampling models from the ensemble, we obtain multiple representations for every input example and effectively augment the data. We propose an algorithm similar to contrastive divergence for training stochastic ensembles of restricted Boltzmann machines. Finally, we demonstrate the concept of stochastic representations on a synthetic dataset and also test them in a one-shot learning scenario on MNIST.
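As a rough illustration of non-deterministic representations (not the paper's algorithm), the sketch below samples several weight matrices from a hypothetical Gaussian ensemble over RBM weights and computes one hidden representation per sampled model, yielding multiple representations for the same input.

```python
import numpy as np

def sample_representations(x, W_mean, W_std, n_samples, rng):
    """Draw several latent representations for one input by sampling
    weight matrices from a hypothetical Gaussian ensemble over RBM
    weights and computing the hidden-unit activation probabilities."""
    reps = []
    for _ in range(n_samples):
        W = rng.normal(W_mean, W_std)                # one model from the ensemble
        h = 1.0 / (1.0 + np.exp(-(W @ x)))           # sigmoid hidden activations
        reps.append(h)
    return np.stack(reps)                            # n_samples x n_hidden

rng = np.random.default_rng(0)
W_mean = rng.normal(scale=0.1, size=(16, 8))         # ensemble mean weights
W_std = 0.05 * np.ones_like(W_mean)                  # ensemble spread
x = (rng.random(8) > 0.5).astype(float)              # a binary input example
reps = sample_representations(x, W_mean, W_std, n_samples=5, rng=rng)
```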