2015, Bernoulli
When targeting a distribution that is artificially invariant under some permutations, Markov chain Monte Carlo (MCMC) algorithms face the label-switching problem, rendering marginal inference particularly cumbersome. Such a situation arises, for example, in the Bayesian analysis of finite mixture models. Adaptive MCMC algorithms such as adaptive Metropolis (AM), which self-calibrates its proposal distribution using an online estimate of the covariance matrix of the target, are no exception. To address the label-switching issue, relabeling algorithms associate a permutation to each MCMC sample, trying to obtain reasonable marginals. In the case of adaptive Metropolis (Bernoulli 7 (2001) 223-242), an online relabeling strategy is required. This paper is devoted to the AMOR algorithm, a provably consistent variant of AM that can cope with the label-switching problem. The idea is to nest relabeling steps within the MCMC algorithm based on the estimation of a single covariance matrix that is used both for adapting the covariance of the proposal distribution in the Metropolis algorithm step and for online relabeling. We compare the behavior of AMOR to similar relabeling methods. In the case of compactly supported target distributions, we prove a strong law of large numbers for AMOR and its ergodicity. These are the first results on the consistency of an online relabeling algorithm to our knowledge. The proof underlines latent relations between relabeling and vector quantization.
Journal of the American Statistical Association
There is a lack of methodological results to design efficient Markov chain Monte Carlo (MCMC) algorithms for statistical models with discrete-valued high-dimensional parameters. Motivated by this consideration, we propose a simple framework for the design of informed MCMC proposals (i.e. Metropolis-Hastings proposal distributions that appropriately incorporate local information about the target) which is naturally applicable to both discrete and continuous spaces. We explicitly characterize the class of optimal proposal distributions under this framework, which we refer to as locally-balanced proposals, and prove their Peskun-optimality in high-dimensional regimes. The resulting algorithms are straightforward to implement in discrete spaces and provide orders of magnitude improvements in efficiency compared to alternative MCMC schemes, including discrete versions of Hamiltonian Monte Carlo. Simulations are performed with both simulated and real datasets, including a detailed application to Bayesian record linkage. A direct connection with gradient-based MCMC suggests that locally-balanced proposals may be seen as a natural way to extend the latter to discrete spaces.
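The locally-balanced construction — weighting a base kernel by g(π(y)/π(x)) for a balancing function g satisfying g(t) = t·g(1/t) — can be illustrated on a toy discrete space. This is a minimal sketch, not the paper's implementation: the nearest-neighbour base kernel, the choice g(t) = √t, and the function name are assumptions for illustration only.

```python
import numpy as np

def locally_balanced_step(x, pi, rng):
    """One locally-balanced MH step on the lattice {0, ..., len(pi)-1}.

    Base kernel: uniform over the neighbours {x-1, x+1} (clipped at the
    boundary). Balancing function: g(t) = sqrt(t), a valid choice since
    sqrt(t) = t * sqrt(1/t). A standard MH correction keeps pi invariant.
    """
    def weights(z):
        nbrs = [w for w in (z - 1, z + 1) if 0 <= w < len(pi)]
        w = np.array([np.sqrt(pi[n] / pi[z]) for n in nbrs])
        return nbrs, w / w.sum()

    nbrs, probs = weights(x)
    y = nbrs[rng.choice(len(nbrs), p=probs)]
    # Acceptance ratio: pi(y) Q(y, x) / (pi(x) Q(x, y))
    nbrs_y, probs_y = weights(y)
    q_xy = probs[nbrs.index(y)]
    q_yx = probs_y[nbrs_y.index(x)]
    a = (pi[y] * q_yx) / (pi[x] * q_xy)
    return y if rng.random() < a else x
```

Because the proposal already leans toward higher-probability neighbours, acceptance rates are typically much higher than for an uninformed (uniform) proposal on the same neighbourhood.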
Computational Statistics, 2005
Department of Mathematics and Statistics, P.O. Box 35 (MAD), FIN-40014 University of Jyväskylä, Finland. Summary: We introduce a new adaptive MCMC algorithm, based on the traditional single-component Metropolis-Hastings algorithm and on our earlier adaptive Metropolis algorithm (AM). In the new algorithm the adaptation is performed component by component. The chain is no longer Markovian, but it remains ergodic. The algorithm is demonstrated to work well in varying test cases up to 1000 dimensions.
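The component-by-component adaptation can be sketched as follows. This is an illustrative implementation only: the function name, the 2.4² scaling constant, and the adaptation start time `t0` are assumptions, not the paper's exact specification.

```python
import numpy as np

def sc_am(log_pi, x0, n_iter=3000, t0=200, eps=1e-6, seed=0):
    """Single-component adaptive Metropolis sketch.

    Each coordinate is updated in turn with a 1-D Gaussian proposal whose
    variance is adapted from that coordinate's sample history (plus a small
    eps to keep it positive), mirroring the component-wise adaptation idea.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    hist = [x.copy()]
    scales = np.ones(d)  # initial proposal variances
    for t in range(n_iter):
        if t >= t0:  # adapt from the full history of each coordinate
            H = np.array(hist)
            scales = 2.4**2 * H.var(axis=0) + eps
        for i in range(d):  # single-component Metropolis sweeps
            prop = x.copy()
            prop[i] += np.sqrt(scales[i]) * rng.standard_normal()
            if np.log(rng.random()) < log_pi(prop) - log_pi(x):
                x = prop
        hist.append(x.copy())
    return np.array(hist)
```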
Statistics and Computing, 2006
We propose to combine two quite powerful ideas that have recently appeared in the Markov chain Monte Carlo literature: adaptive Metropolis samplers and delayed rejection. The ergodicity of the resulting non-Markovian sampler is proved, and the efficiency of the combination is demonstrated with various examples. We present situations where the combination outperforms the original methods: adaptation clearly enhances efficiency of the delayed rejection algorithm in cases where good proposal distributions are not available. Similarly, delayed rejection provides a systematic remedy when the adaptation process has a slow start.
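The delayed-rejection half of the combination can be sketched for a one-dimensional random walk; the adaptation half is omitted here, and the fixed scales, names, and Gaussian proposals are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def dr_step(x, log_pi, sigma1, sigma2, rng):
    """One delayed-rejection random-walk Metropolis step (sketch).

    A first Gaussian proposal with scale sigma1 is tried; if rejected, a
    second proposal with (typically smaller) scale sigma2 is tried with the
    corrected second-stage acceptance ratio, preserving pi-reversibility.
    """
    def log_q(center, z, s):  # symmetric Gaussian log-kernel (up to constant)
        return -0.5 * ((z - center) / s) ** 2

    y1 = x + sigma1 * rng.standard_normal()
    a1 = min(1.0, np.exp(log_pi(y1) - log_pi(x)))
    if rng.random() < a1:
        return y1
    # Second stage: alpha2 = pi(y2) q(y2,y1) (1 - a1(y2,y1)) / [pi(x) q(x,y1) (1 - a1(x,y1))]
    y2 = x + sigma2 * rng.standard_normal()
    a1_rev = min(1.0, np.exp(log_pi(y1) - log_pi(y2)))
    num = np.exp(log_pi(y2) + log_q(y2, y1, sigma1)) * (1.0 - a1_rev)
    den = np.exp(log_pi(x) + log_q(x, y1, sigma1)) * (1.0 - a1)
    if den > 0 and rng.random() < num / den:
        return y2
    return x
```

The second stage gives the sampler a systematic fallback when the first-stage scale is badly tuned, which is exactly the situation adaptation is meant to repair over time.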
Statistics and Computing, 2010
The label switching problem is caused by the likelihood of a Bayesian mixture model being invariant to permutations of the labels. The permutation can change multiple times between Markov chain Monte Carlo (MCMC) iterations, making it difficult to infer component-specific parameters of the model. Various so-called 'relabelling' strategies exist with the goal of 'undoing' the label switches that have occurred, to enable estimation of functions that depend on component-specific parameters. Most existing approaches rely upon specifying a loss function and relabelling by minimising its posterior expected loss. In this paper we develop probabilistic approaches to relabelling that allow estimation and incorporation of the uncertainty in the relabelling process. Variants of the probabilistic relabelling algorithm are introduced and compared to existing loss function based methods. We demonstrate that the idea of probabilistic relabelling can be expressed in a rigorous framework based on the EM algorithm.
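The loss-function approach mentioned above can be sketched in a few lines for small numbers of components: each MCMC draw is permuted to minimise a squared-distance loss against a fixed reference ordering. The brute-force search over permutations, the reference-based loss, and the names are illustrative assumptions (practical schemes use cheaper losses and updates).

```python
import numpy as np
from itertools import permutations

def relabel(draws, reference):
    """Loss-based relabelling sketch.

    For each MCMC draw of K component parameters (rows of `draws`), apply
    the label permutation minimising the squared distance to a fixed
    reference ordering, undoing label switches after the fact.
    """
    K = draws.shape[1]
    out = np.empty_like(draws)
    for t, theta in enumerate(draws):
        best = min(permutations(range(K)),
                   key=lambda p: np.sum((theta[list(p)] - reference) ** 2))
        out[t] = theta[list(best)]
    return out
```

The exhaustive search costs K! per draw, so this only scales to small K; the probabilistic relabelling of the paper instead treats the permutation as a latent variable with its own uncertainty.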
Journal of Computational and Graphical Statistics, 2019
We propose Adaptive Incremental Mixture Markov chain Monte Carlo (AIMM), a novel approach to sample from challenging probability distributions defined on a general state-space. While adaptive MCMC methods usually update a parametric proposal kernel with a global rule, AIMM locally adapts a semiparametric kernel. AIMM is based on an independent Metropolis-Hastings proposal distribution which takes the form of a finite mixture of Gaussian distributions. Central to this approach is the idea that the proposal distribution adapts to the target by locally adding a mixture component when the discrepancy between the proposal mixture and the target is deemed to be too large. As a result, the number of components in the mixture proposal is not fixed in advance. Theoretically, we prove that there exists a process that can be made arbitrarily close to AIMM and that converges to the correct target distribution. We also illustrate that it performs well in practice in a variety of challenging situations, including high-dimensional and multimodal target distributions.
The Annals of Statistics, 2011
Adaptive and interacting Markov chain Monte Carlo algorithms (MCMC) have been recently introduced in the literature. These novel simulation algorithms are designed to increase the simulation efficiency to sample complex distributions. Motivated by some recently introduced algorithms (such as the adaptive Metropolis algorithm and the interacting tempering algorithm), we develop a general methodological and theoretical framework to establish both the convergence of the marginal distribution and a strong law of large numbers. This framework weakens the conditions introduced in the pioneering paper by Roberts and Rosenthal [J. Appl. Probab. 44 (2007) 458-475]. It also covers the case when the target distribution π is sampled by using Markov transition kernels with a stationary distribution that differs from π .
2019
We analyze the tension between robustness and efficiency for Markov chain Monte Carlo (MCMC) sampling algorithms. In particular, we focus on robustness of MCMC algorithms with respect to heterogeneity in the target and their sensitivity to tuning, an issue of great practical relevance but still understudied theoretically. We show that the spectral gap of the Markov chains induced by classical gradient-based MCMC schemes (e.g. Langevin and Hamiltonian Monte Carlo) decays exponentially fast in the degree of mismatch between the scales of the proposal and target distributions, while for the random walk Metropolis (RWM) the decay is linear. This result provides theoretical support to the notion that gradient-based MCMC schemes are less robust to heterogeneity and more sensitive to tuning. Motivated by these considerations, we propose a novel and simple to implement gradient-based MCMC algorithm, inspired by the classical Barker accept-reject rule, with improved robustness properties. Ex...
2014
A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support. The algorithm embeds the trajectory of the Markov chain into a reproducing kernel Hilbert space (RKHS), such that the feature space covariance of the samples informs the choice of proposal. The procedure is computationally efficient and straightforward to implement, since the RKHS moves can be integrated out analytically: our proposal distribution in the original space is a normal distribution whose mean and covariance depend on where the current sample lies in the support of the target distribution, and adapts to its local covariance structure. Furthermore, the procedure requires neither gradients nor any other higher order information about the target, making it particularly attractive for contexts such as Pseudo-Marginal MCMC. Kernel Adaptive Metropolis-Hastings outperforms competing fixed and adaptive samplers on multivariate, highly nonlinear target distributions, arising in both real-world and synthetic examples.
Scandinavian Journal of Statistics, 2002
The Hastings-Metropolis algorithm is a general MCMC method for sampling from a density known up to a constant. Geometric convergence of this algorithm has been proved under conditions relative to the instrumental distribution (or proposal). We present an inhomogeneous Hastings-Metropolis algorithm for which the proposal density approximates the target density, as the number of iterations increases. The proposal at the nth step is a nonparametric estimate of the density of the algorithm, and uses an increasing number of iid copies of the Markov chain. The resulting algorithm converges (in n) geometrically faster than a Hastings-Metropolis algorithm with an arbitrary proposal. The case of a strictly positive density with compact support is presented first, then an extension to more general densities is given. We conclude by proposing a practical way of implementation for the algorithm, and illustrate it over a simulated example.
The Annals of Applied Probability, 2009
We propose an adaptive independent Metropolis-Hastings algorithm with the ability to learn from all previous proposals in the chain except the current location. It is an extension of the independent Metropolis-Hastings algorithm. Convergence is proved provided a strong Doeblin condition is satisfied, which essentially requires that all the proposal functions have uniformly heavier tails than the stationary distribution. The proof also holds if proposals depending on the current state are used intermittently, provided the information from these iterations is not used for adaptation. The algorithm gives samples from the exact distribution within a finite number of iterations with probability arbitrarily close to 1. The algorithm is particularly useful when a large number of samples from the same distribution is necessary, like in Bayesian estimation, and in CPU intensive applications like, for example, in inverse problems and optimization.
In this paper a simple procedure to deal with label switching when exploring complex posterior distributions by MCMC algorithms is proposed. Although it cannot be generalized to any situation, it may be handy in many applications because of its simplicity and very low computational burden. A possible area where it proves to be useful is when deriving a sample from the posterior distribution arising from finite mixture models when no simple or rational ordering between the components is available.
Bernoulli, 2001
A proper choice of a proposal distribution for MCMC methods, e.g. for the Metropolis-Hastings algorithm, is well known to be a crucial factor for the convergence of the algorithm. In this paper we introduce an adaptive Metropolis algorithm (AM), where the Gaussian proposal distribution is updated along the process using the full information accumulated so far. Due to the adaptive nature of the process, the AM algorithm is non-Markovian, but we establish here that it has the correct ergodic properties. We also include the results of our numerical tests, which indicate that the AM algorithm competes well with traditional Metropolis-Hastings algorithms, and demonstrate that AM provides an easy-to-use algorithm for practical computation. 1991 Mathematics Subject Classification: 65C05, 65U05.
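The adaptive Metropolis recipe described above can be sketched compactly. This is a minimal illustrative version: the adaptation start time `t0`, the regularisation `eps`, and the function name are assumptions, while the 2.38²/d scaling follows the standard random-walk tuning heuristic.

```python
import numpy as np

def adaptive_metropolis(log_target, x0, n_iter=5000, t0=100, eps=1e-6, seed=0):
    """Adaptive Metropolis sketch.

    After an initial period of t0 iterations, the Gaussian proposal
    covariance is (2.38^2 / d) * Cov(history) + eps * I, estimated from
    the full sample history accumulated so far.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    d = x.size
    sd = 2.38**2 / d
    samples = [x.copy()]
    cov = np.eye(d)  # fixed proposal covariance during the initial period
    for t in range(n_iter):
        if t >= t0:  # adapt using the empirical covariance of all past states
            cov = sd * np.cov(np.array(samples).T).reshape(d, d) + eps * np.eye(d)
        prop = rng.multivariate_normal(x, cov)
        if np.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop
        samples.append(x.copy())
    return np.array(samples)
```

Because the proposal depends on the whole history, the chain is non-Markovian; the paper's contribution is precisely the proof that ergodicity nevertheless holds.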
This paper proposes a new randomized strategy for adaptive MCMC using Bayesian optimization. This approach applies to nondifferentiable objective functions and trades off exploration and exploitation to reduce the number of potentially costly objective function evaluations. We demonstrate the strategy in the complex setting of sampling from constrained, discrete and densely connected probabilistic graphical models where, for each variation of the problem, one needs to adjust the parameters of the proposal mechanism automatically to ensure efficient mixing of the Markov chains.
ICASSP 2013 (and also arXiv preprint arXiv:1212.0122) , 2013
Markov chain Monte Carlo methods are widely used in signal processing and communications for statistical inference and stochastic optimization. In this work, we introduce an efficient adaptive Metropolis-Hastings algorithm to draw samples from generic multimodal and multidimensional target distributions. The proposal density is a mixture of Gaussian densities with all parameters (weights, mean vectors and covariance matrices) updated using all the previously generated samples applying simple recursive rules. Numerical results for the one- and two-dimensional cases are provided.
In this work, we introduce a novel class of adaptive Monte Carlo methods, called adaptive independent sticky MCMC algorithms, for efficient sampling from a generic target probability density function (pdf). The new class of algorithms employs adaptive non-parametric proposal densities which become closer and closer to the target as the number of iterations increases. The proposal pdf is built using interpolation procedures based on a set of support points which is constructed iteratively from previously drawn samples. The algorithm's efficiency is ensured by a test that controls the evolution of the set of support points. This extra stage controls the computational cost and the convergence of the proposal density to the target. Each part of the novel family of algorithms is discussed and several examples are provided. Although the novel algorithms are presented for univariate target densities, we show that they can be easily extended to the multivariate context within a Gibbs-type sampler. Exhaustive numerical examples illustrate the efficiency of sticky schemes, both as stand-alone methods to sample from complicated one-dimensional pdfs and within Gibbs samplers in order to draw from multi-dimensional target distributions. We also propose a more efficient scheme, called adaptive independent sticky Multiple Try Metropolis (AISMTM). The MTM technique [24] is an extension of the Metropolis-Hastings method, which fosters the exploration of the state space [3, 26]. Moreover, the new class of methods encompasses different well-known algorithms given in the literature: the Griddy Gibbs sampler [33], the adaptive rejection Metropolis sampling (ARMS) [10, 30], and the independent doubly adaptive Metropolis sampling (IA2RMS) [28, 27]. The ergodicity of the adaptive sticky MCMC methods is ensured and discussed.
The underlying theoretical support is based on the approach introduced in [13]. It is also important to remark that AISM and AISMTM also automatically provide an estimate of the normalizing constant of the target (a.k.a.
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
There is a tension between robustness and efficiency when designing Markov chain Monte Carlo (MCMC) sampling algorithms. Here we focus on robustness with respect to tuning parameters, showing that more sophisticated algorithms tend to be more sensitive to the choice of step-size parameter and less robust to heterogeneity of the distribution of interest. We characterise this phenomenon by studying the behaviour of spectral gaps as an increasingly poor step-size is chosen for the algorithm. Motivated by these considerations, we propose a novel and simple gradient-based MCMC algorithm, inspired by the classical Barker accept-reject rule, with improved robustness properties. Extensive theoretical results, dealing with robustness to tuning, geometric ergodicity and scaling with dimension, suggest that the novel scheme combines the robustness of simple schemes with the efficiency of gradient-based ones. We show numerically that this type of robustness is particularly beneficial in the context of adaptive MCMC, giving examples where our proposed scheme significantly outperforms state-of-the-art alternatives.
Neural Computation, 2013
Simulating from distributions with intractable normalizing constants has been a long-standing problem in machine learning. In this letter, we propose a new algorithm, the Monte Carlo Metropolis-Hastings (MCMH) algorithm, for tackling this problem. The MCMH algorithm is a Monte Carlo version of the Metropolis-Hastings algorithm. It replaces the unknown normalizing constant ratio by a Monte Carlo estimate in simulations, while still converging, as shown in the letter, to the desired target distribution under mild conditions. The MCMH algorithm is illustrated with spatial autologistic models and exponential random graph models. Unlike other auxiliary variable Markov chain Monte Carlo (MCMC) algorithms, such as the Møller and exchange algorithms, the MCMH algorithm avoids the requirement for perfect sampling, and thus can be applied to many statistical models for which perfect sampling is not available or very expensive. The MCMH algorithm can also be applied to Bayesian inference for ra...
1997
We study the slice sampler, a method of constructing a reversible Markov chain with a specified invariant distribution. Given an independence Metropolis-Hastings algorithm it is always possible to construct a slice sampler that dominates it in the Peskun sense. This means that the resulting Markov chain produces estimates with a smaller asymptotic variance. Furthermore the slice sampler has a smaller second-largest eigenvalue than the corresponding independence Metropolis-Hastings algorithm. This ensures faster convergence to the distribution of interest. A sufficient condition for uniform ergodicity of the slice sampler is given and an upper bound for the rate of convergence to stationarity is provided. Keywords: Auxiliary variables, Slice sampler, Peskun ordering, Metropolis-Hastings algorithm, Uniform ergodicity. 1 Introduction The slice sampler is a method of constructing a reversible Markov transition kernel with a given invariant distribution. Auxiliary variables ar...
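The slice sampler's two alternating moves — a vertical draw under the density, then a horizontal draw along the resulting slice — can be sketched for a target whose slices are available in closed form. This toy example for the standard normal (names and setup are illustrative) avoids the stepping-out procedures needed in general:

```python
import numpy as np

def slice_sampler_gauss(n, x0=0.0, seed=0):
    """Slice sampler sketch for f(x) = exp(-x^2 / 2) (standard normal).

    The slice {z : f(z) > u} is exactly the interval (-L, L) with
    L = sqrt(-2 log u), so the horizontal step can be sampled directly.
    """
    rng = np.random.default_rng(seed)
    x, out = x0, np.empty(n)
    for i in range(n):
        u = rng.uniform(0.0, np.exp(-0.5 * x * x))  # vertical: u | x
        L = np.sqrt(-2.0 * np.log(u))               # slice endpoints
        x = rng.uniform(-L, L)                      # horizontal: x | u
        out[i] = x
    return out
```

Each pair of moves leaves the joint uniform distribution under the density invariant, which is what gives the reversible kernel with the specified marginal.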