2001, Proceeding of the 2001 Winter Simulation Conference (Cat. No.01CH37304)
This paper discusses some connections between adaptive Monte Carlo algorithms and general state space Markov chains. Adaptive algorithms are iterative methods in which previously generated samples are used to construct a more efficient sampling distribution at the current iteration. In this paper, we describe two such adaptive algorithms, one arising in a finite-horizon computation of expected reward and the other arising in the context of solving eigenvalue problems. We then discuss the connection between these adaptive algorithms and general state space Markov chain theory, and offer some insights into some of the technical difficulties that arise in trying to apply the known theory for general state space chains to such adaptive algorithms.
Bayesian Time Series Models
Resonance, 2003
In this report, we propose an original approach to solve Fredholm equations of the second kind. We interpret the standard von Neumann expansion of the solution as an expectation with respect to a probability distribution defined on a union of subspaces of variable dimension. Based on this representation, it is possible to use trans-dimensional Markov chain Monte Carlo (MCMC) methods such as Reversible Jump MCMC to approximate the solution numerically. This can be an attractive alternative to standard Sequential Importance Sampling (SIS) methods routinely used in this context. We sketch an application to value function estimation for a Markov decision process.
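For orientation, the von Neumann expansion mentioned in this abstract can be written out as below. The notation (kernel K, forcing term g, domain E) is assumed here for illustration and is not taken from the report, and the series is only meaningful when it converges.

```latex
% Fredholm equation of the second kind (assumed notation):
f(x_0) = g(x_0) + \int_E K(x_0, x_1)\, f(x_1)\, dx_1 .

% Substituting the equation into itself repeatedly gives the von Neumann series:
f(x_0) = g(x_0) + \sum_{n=1}^{\infty} \int_{E^n}
         \Bigl( \prod_{k=1}^{n} K(x_{k-1}, x_k) \Bigr)\, g(x_n)\, dx_1 \cdots dx_n .
```

Each term is an integral over E^n, so the whole sum can be read as a single integral over the union of the spaces {n} × E^n. This is the kind of variable-dimension space on which a trans-dimensional sampler operates.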
The Annals of Statistics, 2011
Adaptive and interacting Markov chain Monte Carlo (MCMC) algorithms have been recently introduced in the literature. These novel simulation algorithms are designed to increase the simulation efficiency to sample complex distributions. Motivated by some recently introduced algorithms (such as the adaptive Metropolis algorithm and the interacting tempering algorithm), we develop a general methodological and theoretical framework to establish both the convergence of the marginal distribution and a strong law of large numbers. This framework weakens the conditions introduced in the pioneering paper by Roberts and Rosenthal [J. Appl. Probab. 44 (2007) 458-475]. It also covers the case when the target distribution π is sampled by using Markov transition kernels with a stationary distribution that differs from π.
Applied Mathematical Modelling, 2008
In this paper we analyse applicability and robustness of Markov chain Monte Carlo algorithms for eigenvalue problems. We restrict our consideration to real symmetric matrices.
We introduce a method for sequential minimization of a certain class of (possibly non-convex) cost functions with respect to a high dimensional signal of interest. The proposed approach involves the transformation of the optimization problem into one of estimation in a discrete-time dynamical system. In particular, we describe a methodology for constructing an artificial state-space model which has the signal of interest as its unobserved dynamic state. The model is "adapted" to the cost function in the sense that the maximum a posteriori (MAP) estimate of the system state is also a global minimizer of the cost function. The advantage of the estimation framework is that we can draw from a pool of sequential Monte Carlo methods, for particle approximation of probability measures in dynamic systems, that enable the numerical computation of MAP estimates. We provide examples of how to apply the proposed methodology, including some illustrative simulation results.
BMC Systems Biology
Background: In quantitative biology, mathematical models are used to describe and analyze biological processes. The parameters of these models are usually unknown and need to be estimated from experimental data using statistical methods. In particular, Markov chain Monte Carlo (MCMC) methods have become increasingly popular as they allow for a rigorous analysis of parameter and prediction uncertainties without the need for assuming parameter identifiability or removing non-identifiable parameters. A broad spectrum of MCMC algorithms have been proposed, including single- and multi-chain approaches. However, selecting and tuning sampling algorithms suited for a given problem remains challenging and a comprehensive comparison of different methods is so far not available. Results: We present the results of a thorough benchmarking of state-of-the-art single- and multi-chain sampling methods, including Adaptive Metropolis, Delayed Rejection Adaptive Metropolis, Metropolis adjusted Langevin algorithm, Parallel Tempering and Parallel Hierarchical Sampling. Different initialization and adaptation schemes are considered. To ensure a comprehensive and fair comparison, we consider problems with a range of features such as bifurcations, periodical orbits, multistability of steady-state solutions and chaotic regimes. These problem properties give rise to various posterior distributions including uni- and multi-modal distributions and non-normally distributed mode tails. For an objective comparison, we developed a pipeline for the semi-automatic comparison of sampling results. Conclusion: The comparison of MCMC algorithms, initialization and adaptation schemes revealed that overall multi-chain algorithms perform better than single-chain algorithms. In some cases this performance can be further increased by using a preceding multi-start local optimization scheme.
These results can inform the selection of sampling methods and the benchmark collection can serve for the evaluation of new algorithms. Furthermore, our results confirm the need to address exploration quality of MCMC chains before applying the commonly used quality measure of effective sample size to prevent false analysis conclusions.
Sequential Monte Carlo (SMC) methods are a class of importance sampling and resampling techniques designed to simulate from a sequence of probability distributions. These approaches have become very popular over the last few years to solve sequential Bayesian inference problems. However, in comparison to Markov chain Monte Carlo (MCMC), the application of SMC remains limited when, in fact, such methods are also appropriate in such contexts (e.g. Chopin; Del Moral et al.). In this paper, we present a simple unifying framework which allows us to extend both the SMC methodology and its range of applications. Additionally, reinterpreting SMC algorithms as an approximation of nonlinear MCMC kernels, we present alternative SMC and iterative self-interacting approximation schemes. We demonstrate the performance of the SMC methodology on static and sequential Bayesian inference problems.
[…] $\int_E \gamma(x)\,dx$. If $\pi$ is a high-dimensional, non-standard distribution then, to improve the exploration ability of an algorithm, it is attractive to consider an inhomogeneous sequence of $P$ distributions to move "smoothly" from a tractable distribution $\pi_1 = \mu_1$ to the target distribution $\pi_P = \pi$. In this case […]
Note that this is a random variable with expected value $\pi(f)$ (i.e. the estimator is unbiased) and standard deviation of order $O(1/\sqrt{N})$. Then, by the CLT, the error $\hat{\pi}(f) - \pi(f)$, scaled by $\sqrt{N}$, has a limiting normal distribution as $N \to \infty$. Therefore we can compute $\pi(f)$ by computing samples (plus some regression techniques?). The problem is that if $\pi_u$ is complicated, it is very difficult to simulate i.i.d. random variables from $\pi(\cdot)$. The MCMC solution is to construct a Markov chain on $\mathcal{X}$ which has $\pi(\cdot)$ as a stationary distribution, i.e.
$$\int_{x \in \mathcal{X}} \pi(dx)\, P(x, dy) = \pi(dy).$$
Then for large $n$ the distribution of $X_n$ will be approximately stationary. We can set $Z_1 = X_n$ and obtain $Z_2, Z_3, \ldots, Z_n$ by repeating the procedure.
Remark. In practice, instead of starting a fresh Markov chain every time, we take the successive $X_n$'s and use, for example, $(N - B)^{-1} \sum_{i=B+1}^{N} f(X_i)$ as the estimate. We tend to ignore the dependence problem, as many of the mathematical issues are similar in either implementation.
Remark. There are other ways of estimation, such as "rejection sampling" and "importance sampling", but MCMC algorithms are the most widely applied.
2 MCMC and its construction
This section explains how an MCMC algorithm is constructed. We first introduce reversibility.
Definition. A Markov chain on a state space $\mathcal{X}$ is reversible with respect to a probability distribution $\pi(\cdot)$ on $\mathcal{X}$ if
$$\pi(dx)\, P(x, dy) = \pi(dy)\, P(y, dx), \quad x, y \in \mathcal{X}.$$
Proposition. If a Markov chain is reversible with respect to $\pi(\cdot)$, then $\pi(\cdot)$ is a stationary distribution for the chain.
Proof. By reversibility,
$$\int_{x \in \mathcal{X}} \pi(dx)\, P(x, dy) = \int_{x \in \mathcal{X}} \pi(dy)\, P(y, dx) = \pi(dy) \int_{x \in \mathcal{X}} P(y, dx) = \pi(dy).$$
The simplest way to construct an MCMC algorithm that satisfies reversibility is the Metropolis-Hastings algorithm.
2.1 The Metropolis-Hastings Algorithm
Suppose that $\pi(\cdot)$ has a (possibly unnormalized) density $\pi_u$. Let $Q(x, \cdot)$ be essentially any other Markov chain whose transitions also have a (possibly unnormalized) density, i.e. $Q(x, dy) \propto q(x, y)\, dy$. First choose some $X_0$. Then, given $X_n$, generate a proposal $Y_{n+1}$ from $Q(X_n, \cdot)$. At the same time, flip an independent biased coin whose probability of heads equals $\alpha(X_n, Y_{n+1})$, where
$$\alpha(x, y) = \min\left\{ 1, \frac{\pi_u(y)\, q(y, x)}{\pi_u(x)\, q(x, y)} \right\} \quad \text{if } \pi(x)\, q(x, y) \neq 0,$$
and $\alpha(x, y) = 1$ when $\pi(x)\, q(x, y) = 0$. If the coin is heads, we accept the proposal and set $X_{n+1} = Y_{n+1}$. If the coin is tails, we reject the proposal and set $X_{n+1} = X_n$. Then we replace $n$ by $n + 1$ and repeat. The reason for taking $\alpha(x, y)$ as above is explained as follows.
Proposition. The Metropolis-Hastings algorithm produces a Markov chain $\{X_n\}$ which is reversible with respect to $\pi(\cdot)$.
Proof. We want to show that, for any $x, y \in \mathcal{X}$, $\pi(dx)\, P(x, dy) = \pi(dy)\, P(y, dx)$ […]
[…] where $\bar{Y}_i = \frac{1}{J} \sum_j Y_{ij}$. The Gibbs sampler then proceeds by updating the $K + 3$ variables according to the above conditional distributions. This is feasible since the conditional distributions are all easily simulated (IG and N).
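The Metropolis-Hastings update and the burn-in average (N − B)^{-1} Σ f(X_i) described in the notes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the notes' own implementation: the function names, the standard normal target, the Gaussian random-walk proposal, the step size and the chain length are all chosen here for the example, and the sketch assumes the density is positive at the current state, so the α = 1 convention for a zero denominator is not exercised.

```python
import math
import random


def metropolis_hastings(log_pi_u, propose, log_q, x0, n_steps, rng):
    """Run n_steps of Metropolis-Hastings and return [X_0, ..., X_N].

    log_pi_u(x): log of the (possibly unnormalized) target density pi_u.
    propose(x, rng): draws a proposal Y from Q(x, .).
    log_q(x, y): log of the proposal density q(x, y), up to a constant.
    All three callables are hypothetical names used for this sketch.
    """
    x = x0
    chain = [x]
    for _ in range(n_steps):
        y = propose(x, rng)
        # log of pi_u(y) q(y, x) / (pi_u(x) q(x, y))
        log_ratio = (log_pi_u(y) + log_q(y, x)) - (log_pi_u(x) + log_q(x, y))
        alpha = 1.0 if log_ratio >= 0 else math.exp(log_ratio)
        # The "biased coin": heads with probability alpha accepts the proposal.
        if rng.random() < alpha:
            x = y
        chain.append(x)
    return chain


def burn_in_average(f, chain, burn_in):
    """Estimate pi(f) by (N - B)^{-1} * sum_{i=B+1}^{N} f(X_i)."""
    kept = chain[burn_in + 1:]
    return sum(f(x) for x in kept) / len(kept)


# Assumed example: unnormalized standard normal target, random-walk proposal.
rng = random.Random(0)
step = 1.0
chain = metropolis_hastings(
    log_pi_u=lambda x: -0.5 * x * x,
    propose=lambda x, r: x + r.gauss(0.0, step),
    log_q=lambda x, y: -0.5 * ((y - x) / step) ** 2,
    x0=0.0,
    n_steps=20000,
    rng=rng,
)
mean_est = burn_in_average(lambda x: x, chain, burn_in=2000)
second_moment_est = burn_in_average(lambda x: x * x, chain, burn_in=2000)
```

Working with log densities avoids overflow in the ratio, and only the unnormalized density is ever evaluated, which is the practical appeal of the acceptance rule. For this symmetric proposal the two `log_q` terms cancel; they are kept so that the code mirrors the general formula for α.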
Communications and Control Engineering, 2007
Journal of the American Statistical Association, 1998
I propose a convergence diagnostic for Markov chain Monte Carlo (MCMC) algorithms that is based on couplings of a Markov chain with an auxiliary chain that is periodically restarted from a fixed parameter value. The diagnostic provides a mechanism for estimating the specific constants governing the rate of convergence of geometrically and uniformly ergodic chains, and provides an informal lower bound on the "effective" sample size of an MCMC chain of fixed length. It also provides a simple procedure for obtaining what is, with high probability, an independent sample from the stationary distribution.
Markov chain Monte Carlo (MCMC) methods have become popular as a basis for drawing inference from complex statistical models. Two common difficulties with MCMC algorithms are slow convergence and long run-times, which are often closely related. Algorithm convergence can often be aided by careful tuning of the chain's transition kernel. In order to preserve the algorithm's stationary distribution, however, care must be taken when updating a chain's transition kernel based on that same chain's history. In this paper we introduce a technique that allows the transition kernel to be updated at user-specified intervals, while preserving the chain's stationary distribution. This technique may be beneficial in aiding both the rate of convergence (by allowing adaptation of the transition kernel) and the speed of computing. The approach is particularly helpful when calculation of the full conditional (for a Gibbs algorithm) or of the candidate distribution (for a Metropolis-Hastings algorithm) is computationally expensive.
Physical Review E, 2008
We demonstrate the use of a variational method to determine a quantitative lower bound on the rate of convergence of Markov chain Monte Carlo (MCMC) algorithms as a function of the target density and proposal density. The bound relies on approximating the second largest eigenvalue in the spectrum of the MCMC operator using a variational principle and the approach is applicable to problems with continuous state spaces. We apply the method to one-dimensional examples with Gaussian and quartic target densities, and we contrast the performance of the random walk Metropolis-Hastings algorithm with a "smart" variant that incorporates gradient information into the trial moves, a generalization of the Metropolis adjusted Langevin algorithm. We find that the variational method agrees quite closely with numerical simulations. We also see that the smart MCMC algorithm often fails to converge geometrically in the tails of the target density except in the simplest case we examine, and even then care must be taken to choose the appropriate scaling of the deterministic and random parts of the proposed moves. Again, this calls into question the utility of smart MCMC in more complex problems. Finally, we apply the same method to approximate the rate of convergence in multidimensional Gaussian problems with and without importance sampling. There we demonstrate the necessity of importance sampling for target densities which depend on variables with a wide range of scales.
This paper is concerned with improving the performance of Markov chain algorithms for Monte Carlo simulation. We propose a new algorithm for simulating from multivariate Gaussian densities. This algorithm combines ideas from Metropolis-coupled Markov chain Monte Carlo methods and from an existing algorithm based only on over-relaxation. The speed of convergence of the proposed and existing algorithms can be measured by the spectral radius of certain matrices. We present examples in which the proposed algorithm converges faster than the existing algorithm and the Gibbs sampler. We also derive an expression for the asymptotic variance of any linear combination of the variables simulated by the proposed algorithm. From this expression it follows that the proposed algorithm offers no asymptotic variance reduction compared with the existing algorithm. We extend the proposed algorithm to the non-Gaussian case and discuss its performance by means of examples from Bayesian image analysis. We find that better performance is obtained from a special case of the proposed algorithm, which is a modified version of the algorithm of […], than from a Metropolis algorithm. […] 2. estimating expected values under […] of functions defined over the state space.
This article provides a first theoretical analysis of a new Monte Carlo approach, dynamic weighting, proposed recently by Wong and Liang. In dynamic weighting, one augments the original state space of interest by a weighting factor, which allows the resulting Markov chain to move more freely and to escape from local modes. It uses a new invariance principle to guide the construction of transition rules. We analyze the behavior of the weights resulting from such a process and provide detailed recommendations on how to use these weights properly. Our recommendations are supported by a renewal theory-type analysis. Our theoretical investigations are further demonstrated by a simulation study and applications in neural network training and Ising model simulations.
We study the slice sampler, a method of constructing a reversible Markov chain with a specified invariant distribution. Given an independence Metropolis-Hastings algorithm it is always possible to construct a slice sampler that dominates it in the Peskun sense. This means that the resulting Markov chain produces estimates with a smaller asymptotic variance. Furthermore the slice sampler has a smaller second-largest eigenvalue than the corresponding independence Metropolis-Hastings algorithm. This ensures faster convergence to the distribution of interest. A sufficient condition for uniform ergodicity of the slice sampler is given and an upper bound for the rate of convergence to stationarity is provided. Keywords: Auxiliary variables, Slice sampler, Peskun ordering, Metropolis-Hastings algorithm, Uniform ergodicity. 1 Introduction The slice sampler is a method of constructing a reversible Markov transition kernel with a given invariant distribution. Auxiliary variables ar...
We propose a sequential Markov chain Monte Carlo (SMCMC) algorithm to sample from a sequence of probability distributions, corresponding to posterior distributions at different times in on-line applications. SMCMC proceeds as in usual MCMC but with the stationary distribution updated appropriately each time new data arrive. SMCMC has advantages over sequential Monte Carlo (SMC) in avoiding particle degeneracy issues. We provide theoretical guarantees for the marginal convergence of SMCMC under various settings, including parametric and nonparametric models. The proposed approach is compared to competitors in a simulation study. We also consider an application to on-line nonparametric regression.
Markov chain Monte Carlo (MCMC) methods, including the Gibbs sampler and the Metropolis-Hastings algorithm, are very commonly used in Bayesian statistics for sampling from complicated, high-dimensional posterior distributions. A continuing source of uncertainty is how long such a sampler must be run in order to converge approximately to its target stationary distribution. Rosenthal (1995b) presents a method to compute rigorous theoretical upper bounds on the number of iterations required to achieve a specified degree of convergence in total variation distance by verifying drift and minorization conditions. We propose the use of auxiliary simulations to estimate the numerical values needed in Rosenthal's theorem. Our simulation method makes it possible to compute quantitative convergence bounds for models for which the requisite analytical computations would be prohibitively difficult or impossible. On the other hand, although our method appears to perform well in our example problems, it cannot provide the guarantees offered by analytical proof.
Machine Learning, 2003
Stochastic search algorithms inspired by physical and biological systems are applied to the problem of learning directed graphical probability models in the presence of missing observations and hidden variables. For this class of problems, deterministic search algorithms tend to halt at local optima, requiring random restarts to obtain solutions of acceptable quality. We compare three stochastic search algorithms: a Metropolis-Hastings Sampler (MHS), an Evolutionary Algorithm (EA), and a new hybrid algorithm called Population Markov Chain Monte Carlo, or popMCMC. PopMCMC uses statistical information from a population of MHSs to inform the proposal distributions for individual samplers in the population. Experimental results show that popMCMC and EAs learn more efficiently than the MHS with no information exchange. Populations of MCMC samplers exhibit more diversity than populations evolving according to EAs not satisfying physics-inspired local reversibility conditions.
Operations Research, 2008
We introduce and study a randomized quasi-Monte Carlo method for the simulation of Markov chains up to a random (and possibly unbounded) stopping time. The method simulates n copies of the chain in parallel, using a (d + 1)-dimensional, highly uniform point set of cardinality n, randomized independently at each step, where d is the number of uniform random numbers required at each transition of the Markov chain. The general idea is to obtain a better approximation of the state distribution, at each step of the chain, than with standard Monte Carlo. The technique can be used in particular to obtain a low-variance unbiased estimator of the expected total cost when state-dependent costs are paid at each step. It is generally more effective when the state space has a natural order related to the cost function.
