Stat Comput. 2022;32(5):78.
doi: 10.1007/s11222-022-10147-6. Epub 2022 Sep 19.

Geometry-informed irreversible perturbations for accelerated convergence of Langevin dynamics


Benjamin J Zhang et al. Stat Comput. 2022.

Abstract

We introduce a novel geometry-informed irreversible perturbation that accelerates convergence of the Langevin algorithm for Bayesian computation. It is well documented that there exist perturbations to the Langevin dynamics that preserve its invariant measure while accelerating its convergence. Irreversible perturbations and reversible perturbations (such as Riemannian manifold Langevin dynamics (RMLD)) have separately been shown to improve the performance of Langevin samplers. We consider these two perturbations simultaneously by presenting a novel form of irreversible perturbation for RMLD that is informed by the underlying geometry. Through numerical examples, we show that this new irreversible perturbation can improve estimation performance over irreversible perturbations that do not take the geometry into account. Moreover, we demonstrate that irreversible perturbations generally can be implemented in conjunction with the stochastic gradient version of the Langevin algorithm. Lastly, while continuous-time irreversible perturbations cannot impair the performance of a Langevin estimator, the situation can sometimes be more complicated when discretization is considered. To this end, we describe a discrete-time example in which irreversibility increases both the bias and variance of the resulting estimator.
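To illustrate the basic idea of an irreversible perturbation (not the paper's geometry-informed construction), the following sketch runs Euler–Maruyama on dX = (I + γJ)∇log π(X) dt + √2 dW for a standard Gaussian target, where J is a constant skew-symmetric matrix. Because J is skew-symmetric, the extra drift γJ∇log π leaves π invariant while breaking reversibility; the matrix J, the strength γ, the step size h, and the step count are all illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: standard 2-D Gaussian, so grad log pi(x) = -x.
def grad_log_pi(x):
    return -x

# Constant skew-symmetric matrix J; gamma controls the strength of the
# irreversible perturbation (gamma = 0 recovers plain overdamped Langevin).
J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
gamma = 1.0

def langevin_step(x, h):
    """One Euler-Maruyama step of dX = (I + gamma*J) grad log pi(X) dt + sqrt(2) dW."""
    drift = (np.eye(2) + gamma * J) @ grad_log_pi(x)
    return x + h * drift + np.sqrt(2.0 * h) * rng.standard_normal(2)

# Running-average estimator of the first moment E[X] (true value: 0).
x = np.zeros(2)
h, n_steps = 1e-2, 50_000
running_sum = np.zeros(2)
for _ in range(n_steps):
    x = langevin_step(x, h)
    running_sum += x
print(running_sum / n_steps)
```

The continuous-time dynamics with and without the perturbation share the same invariant measure; the point of the paper's analysis is that the perturbed sampler's running averages can converge faster, though discretization can complicate this picture.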

Keywords: Bayesian computation; Geometry-informed irreversibility; Monte Carlo sampling; Riemannian manifold Langevin dynamics; Stochastic gradient Langevin dynamics.

Conflict of interest statement

Conflicts of interest: Not applicable.

Figures

Fig. 1: MSE of the running average for the first moment. Stochastic gradients are computed.
Fig. 2: MSE of the running average for the second moment. Stochastic gradients are computed.
Fig. 3: Kernelized Stein discrepancy plot for the Gaussian example. The black line has slope -1/2, which denotes the expected convergence rate.
Fig. 4: Trajectory burn-in: each trajectory is run for T = 2.5. Left: single trajectories; right: mean paths. The gradients are computed exactly here.
Fig. 5: Observable ϕ₁(μ, σ) = μ + σ, δ = 2. Stochastic gradients are computed.
Fig. 6: Observable ϕ₂(μ, σ) = μ² + σ², δ = 2. Stochastic gradients are computed.
Fig. 7: Kernelized Stein discrepancy plot for the parameters-of-a-normal-distribution example. The black line has slope -1/2, which denotes the expected convergence rate.
Fig. 8: Observable ϕ₁(w) = Σᵢ wᵢ. Bayesian logistic regression, d = 20.
Fig. 9: Observable ϕ₂(w) = Σᵢ wᵢ². Bayesian logistic regression, d = 20.
Fig. 10: Kernelized Stein discrepancy plot for the Bayesian logistic regression example. The black line has slope -1/2, which denotes the expected convergence rate.
Fig. 11: Posterior distribution sampled with standard Langevin with a deterministic gradient, T = 10000 and h = 10⁻⁴. Notice that the posterior is strongly multimodal and non-Gaussian.
Fig. 12: Trace plots of the W₁₁ marginal.
Fig. 13: Variance of running-average estimators. For the second moment (middle plot), there is less difference among the samplers, as the distribution is quite symmetric and one can properly estimate the second moment even if a sampler is stuck in a single mode. GiIrr estimates the observable with cross-moments (right plot) better than the other samplers.
Fig. 14: Kernelized Stein discrepancy plot for the ICA example. The black line has slope -1/2, which denotes the expected convergence rate.
Fig. 15: Variance for different δ, fixed h.
Fig. 16: (Squared) bias and variance of ϕ(θ) = θ² for varying levels of irreversibility.

