Archive for importance sampling

estimating evidence redux

Posted in Books, Statistics, University life with tags Bayesian Analysis, curse of dimensionality, estimating a constant, evidence, harmonic mean estimator, HPD region, importance sampling, marginal likelihood, Monte Carlo Statistical Methods on November 21, 2025 by xi'an

Following our arXival of the new version of our HPD-based Gelfand & Dey estimator of evidence, I got pointed at Wang et al. (2018), which I had forgotten I had read at the time (as testified by an 'Og entry). Reading my own comments, I concur (with myself!) that the method is not massively compelling, since it requires a partition set that is strongly related with the targeted integral. The above illustration for a mixture, that is for a pseudo-posterior made of two Gaussian components with known variance, also shows (in reverse) the curse of dimensionality and the need for finely tuned partitions, said partition corresponding to the myriad of sets on the right-hand side. With such a degree of partitioning, Riemann integration should also produce a perfect estimate, as shown by the zero error in the resulting estimator (Table 4).
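To make the Gelfand & Dey mechanics concrete, here is a minimal sketch with a uniform-over-HPD instrumental density, on a toy one-dimensional two-Gaussian pseudo-posterior; the setup (constants, exact sampling shortcut, grid approximation of the HPD volume) is purely illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy pseudo-posterior: equal-weight mixture of N(-2,1) and N(2,1),
# with an arbitrary "unknown" normalising constant Z to recover
Z_true = 10.0
def mix_pdf(x):
    return 0.5 / np.sqrt(2 * np.pi) * (np.exp(-0.5 * (x + 2)**2)
                                       + np.exp(-0.5 * (x - 2)**2))
def q(x):                      # unnormalised target, q = Z * posterior
    return Z_true * mix_pdf(x)

# posterior sample (exact here, would be MCMC output in practice)
m = 10_000
comp = rng.integers(2, size=m)
theta = rng.normal(np.where(comp, 2.0, -2.0), 1.0)

# instrumental h = uniform over an approximate 50% HPD region
# {x : pdf(x) >= c}, with c the empirical median of sampled densities
c = np.quantile(mix_pdf(theta), 0.5)
grid = np.linspace(-8, 8, 100_001)
inside = mix_pdf(grid) >= c
vol = inside.mean() * (grid[-1] - grid[0])   # Lebesgue measure of HPD set

# Gelfand & Dey: E[h(theta)/q(theta)] = 1/Z under the posterior,
# with bounded weights since h/q <= 1/(vol * Z * c) on the HPD set
h_theta = (mix_pdf(theta) >= c) / vol
Z_hat = 1.0 / np.mean(h_theta / q(theta))
print(Z_hat)   # close to Z_true = 10
```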
finite variance goals

Posted in Books, Statistics, Travel, University life with tags ChatGPT, control variates, importance sampling, infinite variance estimators, MathJax, Monte Carlo Statistical Methods, Pareto smoothed importance sampling, plugin, University of Warwick, Wordpress, xianblog on November 8, 2025 by xi'an

During Johan Segers' seminar in Warwick, on the control variate improvements he developed with Rémi Leluc (whose PhD thesis committee I joined), Aymeric Dieuleveut, François Portier, and Aigerim Zhuman, I started wondering whether a control variate could turn an infinite variance Monte Carlo estimate into a finite variance one. And asked… ChatGPT about it, with the above reply, which is correct if not practical in the least, since the example provided therein reverse-engineers an infinite variance rv into the sum of an infinite variance rv taken as the control variate and a finite variance rv, as summarised below. In practice, this would mean replacing the integrand of interest with a much simpler integrand that shares the same asymptotic behaviour, not an easy task! (As an aside, I found out that enabling MathJax on this 'Og would cost me $40 a month!)
5. Summary

✅ Theoretical possibility: Yes — control variates can make an infinite-variance estimator finite, but only if the control's sample path shares the same tail driver and its expectation is known.

In real-world Monte Carlo, when X is heavy-tailed, you usually:

- Split X = Y + (X − Y), where Y has known expectation and similar tails,
- Use Y as control variate, and
- Possibly combine with truncation, conditional expectation, or importance sampling for stability.
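Concretely, a minimal numerical illustration of this reverse-engineered split (the Pareto-plus-Normal construction and all names are mine, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n = 1.5, 100_000

# X = Z + W with Z heavy-tailed and W light-tailed; target is E[X]
Z = rng.pareto(alpha, size=n)      # numpy's pareto is the Lomax form:
                                   # E[Z] = 1/(alpha-1) = 2, Var[Z] = inf
W = rng.normal(0.0, 1.0, size=n)   # E[W] = 0, finite variance
X = Z + W

EZ = 1.0 / (alpha - 1.0)           # known expectation of the control

naive = X.mean()                   # infinite-variance estimator of E[X]
controlled = (X - Z).mean() + EZ   # finite variance: only averages W
print(naive, controlled)           # both near 2, the second far more stable
```

As the comments note, the control Y = Z shares the exact tail driver of X, which is precisely why the trick works here and why it is so hard to arrange for a genuine integrand.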
mostly M[ar]C[h] seminar
Posted in Books, Statistics, University life with tags dimension reduction, GLMs, gradient algorithm, importance sampling, Issy-les-Moulineaux, log-normal distribution, MCMC, Monte Carlo methods, Mostly MC, Mostly MCMC seminar, Ocean, Paris, PariSanté campus, Porte de Versailles, PSL, seminar, Università Bocconi, Université Paris Dauphine on March 8, 2025 by xi'an

Adrien Corenflos (University of Warwick) and Hai-Dang Dau (NUS) are giving a mostly Monte Carlo seminar next week, at 3pm [CET] on 14 March, while I'll still be in Japan, unfortunately missing both talks!

![A mostly Monte Carlo seminar next week, at 3pm [CET] on 14 March while I'll still be in Japan, unfortunately missing both talks!](https://xianblog.wordpress.com/wp-content/uploads/2025/03/screenshot_20250302_180323.png?w=450)

gentle importance sampling
Posted in Books, pictures, Statistics with tags BayesComp 2023, Cédric Villani, Comptes Rendus de l'Académie des Sciences, CRAS, George Casella, harmonic mean estimator, Hyvärinen score, importance sampling, infinite variance estimators, Levi, noise contrasting estimation, Pierre Louis Lions, survey on February 24, 2025 by xi'an
A new (and gentle!) survey by Luca Martino! And by Fernando Llorente. On importance sampling, with coverage of normalised and self-normalised versions. And their usage in different configurations (one vs several integrals, one vs several families of distributions). Some points relating to earlier remarks or musings of mine:
- the fact that the optimal importance function does not lead to a zero variance importance estimator when the integrand f is not of constant sign (p.7) can be cancelled by first decomposing f as f⁺-f⁻, since both allow for a zero variance importance estimator, if formally requiring two different samples (of size zero!), a trick considered later on p.18 and repeated for the ratio in self-normalised importance (p.19); see the first sketch after this list
- the special case when the integrand f is constant is not of practical interest but relevant for checking properties of different estimators. For instance, this case allowed George and myself to spot a mistake in an early importance paper, published in the same volume of the Comptes Rendus as an early paper of Lions and Villani.
- the remark that self-normalised (SNIS) importance sampling can prove more efficient than (properly normalised) importance sampling, although the property that SNIS is always bounded should not be seen as a major point given that it is simply due to using a finite sample and hence a finite set of images of f
- the case of integrals involving several target pdfs or several integrands is not necessarily of major interest if simulating different samples for each unidimensional integral can be implemented (again formally leading to zero variance at no cost)
- the issue of merging several estimators in an optimal way is briefly mentioned in §5.4, a challenge Victor Elvira and I have been approaching over the past years, if not yet concluding satisfactorily (mea culpa)
- when replacing the target with a noisy estimate (p.22), the fact that this estimate must be normalised is correct, but pales against the impact of using this estimate, which may prove catastrophic. And unbiasedness is not particularly crucial in this setup for the same reason
- the section on evidence approximation (§7) is more standard, with the harmonic mean estimator being called reverse importance sampling, which brings us to the “elephant in the room”, namely that
- the issue of infinite variance of some importance sampling estimators is not directly covered (except once in §8, p.34), which makes perceiving importance sampling as a variance reduction method somewhat misleading (unless the authors consider solely the optimal importance function, which is rarely of practical use); see the second sketch after this list
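On the first bullet, a minimal numerical check of the zero variance phenomenon, assuming the toy integrand f(x) = x against a standard Normal target (the Rayleigh identity used below is specific to this example, not a general recipe):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 1_000

# toy target: I = E[f(X)] with f(x) = x and X ~ N(0,1), hence I = 0;
# split f = f+ - f- and sample each part from its own optimal density;
# for f+, that density is g+(x) ∝ x φ(x) on x > 0, i.e. a Rayleigh(1)
x = stats.rayleigh.rvs(size=n, random_state=rng)
w = x * stats.norm.pdf(x) / stats.rayleigh.pdf(x)   # constant = 1/sqrt(2π)
print(w.std())                     # ≈ 0: a zero variance estimator of I+

# by symmetry the same construction handles f-, so that I+ - I- = 0 exactly
y = -stats.rayleigh.rvs(size=n, random_state=rng)
w_minus = (-y) * stats.norm.pdf(y) / stats.rayleigh.pdf(-y)
print(w.mean() - w_minus.mean())   # 0 up to floating point error
```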
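And on the last bullet, a standard cautionary example (mine, not the survey's) of an importance sampling estimator with infinite variance, obtained by taking a proposal with lighter tails than the target, however reasonable any single run may look:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100_000

# estimating E[X²] = 3 for X ~ t(3) by importance sampling from N(0,1):
# the weight t₃(x)/φ(x) grows like exp(x²/2), hence unbounded weights
# and an estimator with infinite variance despite being unbiased
x = rng.normal(size=n)
w = stats.t.pdf(x, df=3) / stats.norm.pdf(x)
running = np.cumsum(w * x**2) / np.arange(1, n + 1)
print(running[[999, 9_999, 99_999]])   # erratic path of the running mean
print(w.max())                         # a handful of huge weights dominate
```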
The paper concludes with an interesting notion that
“we suggest the analysis of the relevant connection between importance sampling and contrastive learning Gutmann and Hyvärinen (2012)”
a connection I have also been pointing out for a while. All in all, a useful summing-up that I will likely suggest to my students.
