2008
Abstract In this paper we present a model which can decompose probability densities or count data into a set of shift-invariant components. We begin by introducing a standard latent variable model and subsequently extend it to handle shift invariance in order to model more complex inputs. We develop an expectation-maximization algorithm for estimating the components and present various results on challenging real-world data.
2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011
In this paper, we present a new method for decomposing musical spectrograms. The method aims to carry over shift-invariant decompositions, which can decompose constant-Q spectrograms (with logarithmic frequency resolution), to standard spectrograms obtained from short-time Fourier transforms (with linear frequency resolution). This technique has the advantage of allowing easy reconstruction of the latent signals by Wiener filtering, which can be used, for example, in source separation applications.
In this paper we provide a conceptual overview of latent variable models within a probabilistic modeling framework, an overview that emphasizes the compositional nature and the interconnectedness of the seemingly disparate models commonly encountered in statistical practice.
Statistical Methodology, 2007
A new method for analyzing high-dimensional categorical data, Linear Latent Structure (LLS) analysis, is presented. LLS models belong to the family of latent structure models, which are mixture distribution models constrained to satisfy the local independence assumption. LLS analysis explicitly considers a family of mixed distributions as a linear space and LLS models are obtained by imposing linear constraints on the mixing distribution.
2007
Abstract—An important problem in data-analysis tasks is to find suitable representations that make hidden structure in the data explicit. In this paper, we present a probabilistic latent variable model that is equivalent to a matrix decomposition of nonnegative data. Data is modeled as histograms of multiple draws from an underlying generative process. The model expresses the generative distribution as a mixture of hidden distributions which capture the latent structure.
It should be standard practice to standardize the data before performing PCA; this is equivalent to working with correlation matrices instead of covariance matrices. [Figure: (left) PCA of raw data; (right) PCA of standardized data. Run pcaDemoHeightWeight from PMTK3.] In some applications of PCA, the number of data points (N) is smaller than the dimensionality (D) of the data space (e.g., 100 images, each with 100,000 pixels). N points in a D-dimensional space, where N << D, define a linear subspace whose dimensionality is at most N - 1, so there is little point in applying PCA for values of M greater than N - 1. If we do perform PCA, we will find that at least D - N + 1 of the eigenvalues are zero, corresponding to eigenvectors along whose directions the data set has zero variance.
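Both claims above can be checked numerically. The NumPy sketch below (array shapes, random seed, and tolerances are illustrative choices, not from the text) verifies that the covariance of standardized data equals the correlation matrix of the raw data, and that with N points in D dimensions (N << D) the sample covariance has at most N - 1 nonzero eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data whose features live on very different scales.
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])

# Standardizing each column, then taking the covariance, yields the
# correlation matrix of the raw data.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
cov_of_z = np.cov(Z, rowvar=False, ddof=0)
corr_of_x = np.corrcoef(X, rowvar=False)
assert np.allclose(cov_of_z, corr_of_x)

# PCA of standardized data = eigendecomposition of the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(corr_of_x)

# When N < D, the centered data span at most N - 1 directions, so at least
# D - N + 1 eigenvalues of the sample covariance are (numerically) zero.
N, D = 5, 20
Y = rng.normal(size=(N, D))
S = np.cov(Y - Y.mean(axis=0), rowvar=False, ddof=0)
ev = np.linalg.eigvalsh(S)
assert np.sum(ev > 1e-10) <= N - 1
```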
Journal of Statistical Theory and Applications, 2020
Probabilistic Latent Semantic Analysis has been related to the Singular Value Decomposition, but several problems arise when this comparison is made. Data class restrictions and the existence of several local optima mask the relation, making it a formal analogy without real significance. Moreover, the computational cost in time and memory limits the technique's applicability. In this work, we use Nonnegative Matrix Factorization with the Kullback-Leibler divergence to prove that, when the number of model components is large enough and a limit condition is reached, the Singular Value Decomposition and the Probabilistic Latent Semantic Analysis empirical distributions are arbitrarily close. Under such conditions, equality of Nonnegative Matrix Factorization and Probabilistic Latent Semantic Analysis is obtained. With this result, the Singular Value Decomposition of any matrix with nonnegative entries converges to the general-case Probabilistic Latent Semantic Analysis results and constitutes its unique probabilistic image. A faster algorithm for Probabilistic Latent Semantic Analysis is also provided.
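As a sketch of the machinery this line of work builds on, here is a minimal NumPy implementation of the classic Lee-Seung multiplicative updates for NMF under the generalized Kullback-Leibler divergence. The initialization, iteration count, and small epsilons are illustrative assumptions; this is not the paper's faster algorithm:

```python
import numpy as np

def nmf_kl(V, k, n_iter=200, seed=0):
    """Factor a nonnegative matrix V (n x m) as W @ H with W (n x k), H (k x m),
    using Lee-Seung multiplicative updates that monotonically decrease the
    generalized KL divergence D(V || WH)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(n_iter):
        WH = W @ H + 1e-12
        # Update H: H <- H * (W^T (V/WH)) / column sums of W
        H *= (W.T @ (V / WH)) / W.sum(axis=0)[:, None]
        WH = W @ H + 1e-12
        # Update W: W <- W * ((V/WH) H^T) / row sums of H
        W *= ((V / WH) @ H.T) / H.sum(axis=1)[None, :]
    return W, H
```

Because the updates are multiplicative, nonnegativity of W and H is preserved automatically, which is what makes the factorization interpretable as a (scaled) probabilistic mixture in the PLSA sense.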
The two papers summarized here both consider the task of clustering or modeling discrete data such as text documents. In brief, Latent Dirichlet Allocation (LDA), introduced by Blei et al. [1], is a generative model where the data is generated from combinations of latent distributions or topics. Buntine [2] later gave a more general interpretation of the model under the name Multinomial PCA.
2007
Abstract An important problem in many fields is the analysis of count data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and lack an explicit provision to control the "expressiveness" of the extracted components. In this paper, we present a learning formulation that addresses these limitations by employing the notion of sparsity.
2012
1 Introduction Statistical models with hidden or latent variables are of great importance in natural language processing, speech, and many other fields. The EM algorithm is a remarkably successful method for parameter estimation within these models: it is simple, it is often relatively efficient, and it has well-understood formal properties. It does, however, have a major limitation: it offers no guarantee of finding the global optimum of the likelihood function.
2007
Probabilistic Principal Component Analysis is a reformulation of the common multivariate analysis technique known as Principal Component Analysis. It employs a latent variable model framework similar to factor analysis, which allows a maximum likelihood solution to be established for the parameters that comprise the model. One of the main assumptions of Probabilistic Principal Component Analysis is that the observed data are independent and identically distributed. This assumption is inadequate for many applications, in particular for modeling sequential data. In this paper, the authors introduce a temporal version of Probabilistic Principal Component Analysis by using a hidden Markov model to obtain optimized representations of observed data through time. Combining Probabilistic Principal Component Analyzers with a hidden Markov model makes it possible to enhance the transformation and reduction capabilities for time series vectors. To automatically determine the dimensionality of the principal subspace associated with these Probabilistic Principal Component Analyzers through time, a Bayesian treatment of the Principal Component model is introduced as well.
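For reference, the closed-form maximum likelihood solution of standard (non-temporal) PPCA due to Tipping and Bishop can be sketched as below; the function name and the choice of identity rotation are assumptions for illustration, and the temporal extension described in the abstract is not attempted here:

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form ML estimates for PPCA with q latent dimensions:
    sigma2 = average of the D - q smallest eigenvalues of the sample covariance,
    W = U_q (Lambda_q - sigma2 I)^{1/2} R, with the rotation R taken as identity."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                # variance "lost" outside the subspace
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return W, sigma2
```

A useful sanity check of this solution: the model covariance W W^T + sigma2 I reproduces the top q sample eigenvalues exactly and replaces the remaining ones with sigma2.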
The Annals of Statistics, 2016
A constructive proof of identification of multilinear decompositions of multiway arrays is presented. It can be applied to show identification in a variety of multivariate latent structures. Examples are finite-mixture models and hidden Markov models. The key step to show identification is the joint diagonalization of a set of matrices in the same nonorthogonal basis. An estimator of the latent-structure model may then be based on a sample version of this joint-diagonalization problem. Algorithms are available for computation and we derive distribution theory. We further develop asymptotic theory for orthogonal-series estimators of component densities in mixture models and emission densities in hidden Markov models.
2006
We describe a flexible nonparametric approach to latent variable modelling in which the number of latent variables is unbounded. This approach is based on a probability distribution over equivalence classes of binary matrices with a finite number of rows, corresponding to the data points, and an unbounded number of columns, corresponding to the latent variables. Each data point can be associated with a subset of the possible latent variables, which we refer to as the latent features of that data point. The binary variables in the matrix indicate which latent feature is possessed by which data point, and there is a potentially infinite array of features. We derive the distribution over unbounded binary matrices by taking the limit of a distribution over N × K binary matrices as K → ∞. We define a simple generative process for this distribution, which we call the Indian buffet process (IBP; Griffiths and Ghahramani, 2005, 2006) by analogy to the Chinese restaurant process (Aldous, 1985; Pitman, 2002). The IBP has a single hyperparameter which controls both the number of features per object and the total number of features. We describe a two-parameter generalization of the IBP which has additional flexibility, independently controlling the number of features per object and the total number of features in the matrix. The use of this distribution as a prior in an infinite latent feature model is illustrated, and Markov chain Monte Carlo algorithms for inference are described.
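The generative process described above can be simulated directly. The sketch below implements the one-parameter IBP "restaurant" construction, in which customer i samples each previously tried dish k with probability m_k / i and then tries a Poisson(alpha / i) number of new dishes; the function name and the zero-padding scheme are illustrative choices:

```python
import numpy as np

def indian_buffet(n_customers, alpha, seed=0):
    """Sample a binary feature matrix Z from the one-parameter IBP.
    Rows are customers (data points); columns are dishes (latent features)."""
    rng = np.random.default_rng(seed)
    counts = []  # counts[k] = number of customers who have taken dish k
    rows = []    # per-customer indicator lists, grown as new dishes appear
    for i in range(1, n_customers + 1):
        # Take each existing dish k with probability counts[k] / i.
        row = [int(rng.random() < counts[k] / i) for k in range(len(counts))]
        for k, took in enumerate(row):
            counts[k] += took
        # Try a Poisson(alpha / i) number of brand-new dishes.
        n_new = rng.poisson(alpha / i)
        row.extend([1] * n_new)
        counts.extend([1] * n_new)
        rows.append(row)
    K = len(counts)
    # Pad earlier customers with zeros for dishes introduced later.
    return np.array([r + [0] * (K - len(r)) for r in rows])
```

By exchangeability, each customer has on average alpha features, while the total number of columns grows roughly as alpha times the harmonic number of n_customers.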
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06, 2006
Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g., in a classification or regression task, PCA is however not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e., in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S²PPCA, both of which are extensions of a probabilistic PCA model. The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e., in multi-task learning problems). We derive an efficient EM learning algorithm for both models, and also provide theoretical justifications of the model behaviors. SPPCA and S²PPCA are compared with other supervised projection methods on various learning tasks, and show not only promising performance but also good scalability.
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas. Compared to standard Latent Semantic Analysis, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach with a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
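A minimal sketch of PLSA fitted by tempered EM follows: the E-step posteriors over topics are raised to a power beta < 1, which smooths them and damps overfitting. The array layouts, initialization, and fixed iteration count are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def plsa_tem(N, k, beta=0.9, n_iter=100, seed=0):
    """Fit PLSA to a document-word count matrix N (D x W) with k topics,
    using tempered EM: P(z|d,w) is proportional to (P(z)P(d|z)P(w|z))^beta."""
    rng = np.random.default_rng(seed)
    D, W = N.shape
    Pz = np.full(k, 1.0 / k)
    Pd_z = rng.dirichlet(np.ones(D), size=k)  # shape (k, D)
    Pw_z = rng.dirichlet(np.ones(W), size=k)  # shape (k, W)
    for _ in range(n_iter):
        # E-step: tempered posterior over topics for every (d, w) cell.
        joint = (Pz[:, None, None] * Pd_z[:, :, None] * Pw_z[:, None, :]) ** beta
        post = joint / (joint.sum(axis=0, keepdims=True) + 1e-12)  # (k, D, W)
        # M-step: re-estimate the factors from expected counts.
        C = post * N[None, :, :]
        Pz = C.sum(axis=(1, 2))
        Pz /= Pz.sum()
        Pd_z = C.sum(axis=2)
        Pd_z /= Pd_z.sum(axis=1, keepdims=True) + 1e-12
        Pw_z = C.sum(axis=1)
        Pw_z /= Pw_z.sum(axis=1, keepdims=True) + 1e-12
    return Pz, Pd_z, Pw_z
```

Setting beta = 1 recovers plain EM; in tempered EM, beta is typically annealed downward while monitoring held-out likelihood.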
2000
Machine learning, 2005
One of the simplest, and yet most consistently well-performing, families of classifiers is the Naïve Bayes model. These models rely on two assumptions: (i) all the attributes used to describe an instance are conditionally independent given the class of that instance, and (ii) all attributes follow a specific parametric family of distributions. In this paper we propose a new set of models for classification in continuous domains, termed latent classification models. The latent classification model can roughly be seen as combining the Naïve Bayes model with a mixture of factor analyzers, thereby relaxing the assumptions of the Naïve Bayes classifier. In the proposed model the continuous attributes are described by a mixture of multivariate Gaussians, where the conditional dependencies among the attributes are encoded using latent variables. We present algorithms for learning both the parameters and the structure of a latent classification model, and we demonstrate empirically that the accuracy of the proposed model is significantly higher than the accuracy of other probabilistic classifiers.
SIAM International Conference …, 2008
Proceedings of the AAAI Conference on Artificial Intelligence
In unsupervised learning, dimensionality reduction is an important tool for data exploration and visualization. Because these aims are typically open-ended, it can be useful to frame the problem as looking for patterns that are enriched in one dataset relative to another. These pairs of datasets occur commonly, for instance a population of interest vs. control or signal vs. signal free recordings. However, there are few methods that work on sets of data as opposed to data points or sequences. Here, we present a probabilistic model for dimensionality reduction to discover signal that is enriched in the target dataset relative to the background dataset. The data in these sets do not need to be paired or grouped beyond set membership. By using a probabilistic model where some structure is shared amongst the two datasets and some is unique to the target dataset, we are able to recover interesting structure in the latent space of the target dataset. The method also has the advantages of ...
Multivariate Behavioral Research, 2018
One of the most relevant problems in principal component analysis and factor analysis is the interpretation of the components/factors. In this paper, the disjoint principal component analysis model is extended in a maximum-likelihood framework to allow for inference on the model parameters. A coordinate ascent algorithm is proposed to estimate the model parameters. The performance of the methodology is evaluated on simulated and real data sets.
Pattern Recognition, 2011
Latent variable models are powerful dimensionality reduction approaches in machine learning and pattern recognition. However, such methods only work well under the strict assumption that the training samples and testing samples are independent and identically distributed. When the samples come from different domains, the distribution of the testing dataset will not be identical to that of the training dataset, so the performance of latent variable models degrades because the parameters of the trained model do not suit the testing dataset. This limits the generalization and application of traditional latent variable models. To handle this issue, a transfer learning framework for latent variable models is proposed which uses the distance (or divergence) between the two datasets to modify the parameters of the learned latent variable model. Thus the model does not need to be rebuilt; its parameters are simply adjusted according to the divergence, allowing it to adapt to different datasets. Experimental results on several real datasets demonstrate the advantages of the proposed framework.