Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2020 Mar 30:5:53.
doi: 10.12688/wellcomeopenres.15770.1. eCollection 2020.

Hamiltonian Monte Carlo sampling to estimate past population dynamics using the skygrid coalescent model in a Bayesian phylogenetics framework

Affiliations

Hamiltonian Monte Carlo sampling to estimate past population dynamics using the skygrid coalescent model in a Bayesian phylogenetics framework

Guy Baele et al. Wellcome Open Res. .

Abstract

Nonparametric coalescent-based models are often employed to infer past population dynamics over time. Several of these models, such as the skyride and skygrid models, are equipped with a block-updating Markov chain Monte Carlo sampling scheme to efficiently estimate model parameters. The advent of powerful computational hardware along with the use of high-performance libraries for statistical phylogenetics has, however, made the development of alternative estimation methods feasible. We here present the implementation and performance assessment of a Hamiltonian Monte Carlo gradient-based sampler to infer the parameters of the skygrid model. The skygrid is a popular and flexible coalescent-based model for estimating population dynamics over time and is available in BEAST 1.10.5, a widely-used software package for Bayesian pylogenetic and phylodynamic analysis. Taking into account the increased computational cost of gradient evaluation, we report substantial increases in effective sample size per time unit compared to the established block-updating sampler. We expect gradient-based samplers to assume an increasingly important role for different classes of parameters typically estimated in Bayesian phylogenetic and phylodynamic analyses.

Keywords: BEAGLE; BEAST; Bayesian skygrid; Hamiltonian Monte Carlo; Markov chain Monte Carlo; pathogen phylodynamics; phylogenetics.

PubMed Disclaimer

Conflict of interest statement

No competing interests were disclosed.

Figures

Figure 1.
Figure 1.. Rabies data set - fixed tree analysis.
Bars correspond to the estimated effective sample size (ESS) per second averaged across five independent replicates for all log population size parameters, using the block-updating Markov chain Monte Carlo (BUMCMC) and Hamiltonian Monte Carlo (HMC) transition kernels. The height of each bar indicates the number of parameters that achieve the given ESS per second value. The HMC transition kernel improves upon the performance of the BUMCMC transition kernel by factors of 5.35 and 3.91 for the minimum and median ESS per second across all log population sizes.
Figure 2.
Figure 2.. Ebola virus data set - fixed tree analysis.
Bars correspond to the estimated effective sample size (ESS) per second averaged across five independent replicates for all log population size parameters, using the block-updating Markov chain Monte Carlo (BUMCMC) and Hamiltonian Monte Carlo (HMC) transition kernels. The height of each bar indicates the number of parameters that achieve the given ESS per second value. The HMC transition kernel improves upon the performance of the BUMCMC transition kernel by a factor of 1.41 for the median ESS per second but BUMCMC yields a 1.08-fold improvement over HMC for the minimum ESS per second.
Figure 3.
Figure 3.. Rice yellow mottle virus data set - fixed tree analysis.
Bars correspond to the estimated effective sample size (ESS) per second averaged across five independent replicates for all log population size parameters, using the block-updating Markov chain Monte Carlo (BUMCMC) and Hamiltonian Monte Carlo (HMC) transition kernels. The height of each bar indicates the number of log population size parameters that achieve the given ESS per second value. The HMC transition kernel improves upon the performance of the BUMCMC transition kernel by factors of 2.05 and 2.77 for the median and minimum ESS per second, respectively.
Figure 4.
Figure 4.. Rabies data set analysis using the skygrid coalescent model on a central processing unit (CPU).
Bars correspond to the estimated effective sample size (ESS) per second averaged across five independent replicates for all log population size parameters and the precision parameter, using the block-updating Markov chain Monte Carlo (BUMCMC) and Hamiltonian Monte Carlo (HMC) transition kernels. The height of each bar indicates the number of skygrid parameters that achieve the given ESS per minute value. The HMC transition kernel improves upon the performance of the BUMCMC transition kernel by factors of 3.38 and 1.51 for the median and minimum ESS per minute, respectively, while a 3.99-fold improvement for the precision was generated.
Figure 5.
Figure 5.. Ebola virus data set analysis using the skygrid coalescent model on a graphical processing unit (GPU).
Bars correspond to the estimated effective sample size (ESS) per second averaged across five independent replicates for all log population size parameters and the precision parameter, using the block-updating Markov chain Monte Carlo (BUMCMC) and Hamiltonian Monte Carlo (HMC) transition kernels. The height of each bar indicates the number of skygrid parameters that achieve the given ESS per minute value. The HMC transition kernel improves upon the performance of the BUMCMC transition kernel by factors of 1.48 and 1.35 for the median and minimum ESS per minute, and a factor of 1.56 for the precision.
Figure 6.
Figure 6.. Rice yellow mottle virus data set analysis using the skygrid coalescent model on a central processing unit (CPU).
Bars correspond to the estimated effective sample size (ESS) per hour averaged across five independent replicates for all log population size parameters and the precision parameter, using the block-updating Markov chain Monte Carlo (BUMCMC) and Hamiltonian Monte Carlo (HMC) transition kernels. The height of each bar indicates the number of skygrid parameters that achieve the given ESS per hour value. The HMC transition kernel improves upon the performance of the BUMCMC transition kernel by factors of 1.07 and 1.45 for the minimum ESS per hour of the log population sizes and the precision, respectively. In turn, BUMCMC yield a 1.08-fold improvement over HMC for the median ESS per hour of the log population sizes.

References

    1. Kingman JFC: On the genealogy of large populations. J Appl Probab. 1982;19(A):27–43. 10.2307/3213548 - DOI
    1. Pybus OG, Rambaut A, Harvey PH: An integrated framework for the inference of viral population history from reconstructed genealogies. Genetics. 2000;155(3):1429–1437. - PMC - PubMed
    1. Strimmer K, Pybus OG: Exploring the demographic history of DNA sequences using the generalized skyline plot. Mol Biol Evol. 2001;18(12):2298–2305. 10.1093/oxfordjournals.molbev.a003776 - DOI - PubMed
    1. Drummond AJ, Rambaut A, Shapiro B, et al. : Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol. 2005;22(5):1185–1192. 10.1093/molbev/msi103 - DOI - PubMed
    1. Opgen-Rhein R, Fahrmeir L, Strimmer K: Inference of demographic history from genealogical trees using reversible jump Markov chain Monte Carlo. BMC Evol Biol. 2005;5: 6. 10.1186/1471-2148-5-6 - DOI - PMC - PubMed