Mol Biol Evol. 2020 Jun 1;37(6):1832-1842.
doi: 10.1093/molbev/msaa047.

Online Bayesian Phylodynamic Inference in BEAST with Application to Epidemic Reconstruction

Mandev S Gill et al. Mol Biol Evol. 2020.

Abstract

Reconstructing pathogen dynamics from genetic data as they become available during an outbreak or epidemic represents an important statistical scenario in which observations arrive sequentially in time and one is interested in performing inference in an "online" fashion. Widely used Bayesian phylogenetic inference packages are not set up for this purpose, generally requiring one to recompute trees and evolutionary model parameters de novo when new data arrive. To accommodate increasing data flow in a Bayesian phylogenetic framework, we introduce a methodology to efficiently update the posterior distribution with newly available genetic data. Our procedure is implemented in the BEAST 1.10 software package, and relies on a distance-based measure to insert new taxa into the current estimate of the phylogeny and imputes plausible values for new model parameters to accommodate growing dimensionality. This augmentation creates informed starting values and re-uses optimally tuned transition kernels for posterior exploration of growing data sets, reducing the time necessary to converge to target posterior distributions. We apply our framework to data from the recent West African Ebola virus epidemic and demonstrate a considerable reduction in time required to obtain posterior estimates at different time points of the outbreak. Beyond epidemic monitoring, this framework easily finds other applications within the phylogenetics community, where changes in the data (alignment changes, sequence addition or removal) present common scenarios that can benefit from online inference.

Keywords: BEAST; Bayesian phylogenetics; Markov chain Monte Carlo; online inference; pathogen phylodynamics; real-time analysis.

Figures

Fig. 1.
Comparison of burn-in resulting from standard de novo analyses versus online Bayesian analyses to compute updated inferences from data taken from different time points of the West African Ebola virus epidemic. The data flow of the epidemic, in terms of total sequences available during each epidemiological week, is recreated in the background of the plot in gray bars. Dark gray bars show the data corresponding to the five time points at which we compute updated inferences. The plots chart the burn-in required by de novo analyses, represented by circles, and online analyses, represented by diamonds. Solid lines correspond to burn-in estimates based on visual analyses of trace plots, whereas dotted lines correspond to burn-in estimates based on maximizing ESS values.
Fig. 2.
Box plots show distribution of savings in computation time by using online inference as compared with standard de novo analyses to update inferences for data from different time points in the West African Ebola virus epidemic. White box plots correspond to analyses using a Tesla P100 graphics card for scientific computing and gray boxes correspond to analyses using a multi-core CPU. Irrespective of the actual hardware used, the time savings are substantial with up to 600 h on average saved using our online approach on CPU for our most demanding scenario. The axis corresponding to running time (in hours) is log-transformed to allow for greater visibility of plots for smaller data sets.
Fig. 3.
A new sequence is inserted into an existing phylogenetic tree by determining the closest observed sequence (in terms of genetic distance) already in the tree, and inserting a new ancestor node for the new sequence and its closest sequence. The genetic distance between the new sequence and its closest sequence is converted into a distance in units of time, $d_t$, by dividing by the evolutionary rate associated with the branch leading to the closest sequence. To determine the insertion time $t_{\text{insert}}$ of the new ancestor node (in terms of time prior to the present time), we require $(t_{\text{insert}} - t_c) + (t_{\text{insert}} - t_n) = d_t$, where $t_n$ is the sampling time of the new sequence, and $t_c$ the sampling time of its closest sequence. This yields $t_{\text{insert}} = (d_t + t_n + t_c)/2$.
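The insertion rule in the Fig. 3 caption can be illustrated with a minimal Python sketch. This is not the BEAST implementation. The function names (`p_distance`, `closest_taxon`, `insertion_time`) are hypothetical, and the proportion of differing sites is an assumed stand-in for whatever genetic distance the software actually uses. The caption does not say what happens when the implied ancestor would be younger than one of the two tips, so the sketch only reports that case with an error.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: a simple stand-in for the genetic distance in the caption.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def closest_taxon(new_seq, tree_seqs):
    """Return (name, distance) of the sequence in the tree closest to new_seq."""
    name = min(tree_seqs, key=lambda k: p_distance(new_seq, tree_seqs[k]))
    return name, p_distance(new_seq, tree_seqs[name])


def insertion_time(genetic_distance, rate, t_new, t_closest):
    """Time before present of the new ancestor node.

    d_t = genetic_distance / rate converts the distance into time units.
    Solving (t_insert - t_closest) + (t_insert - t_new) = d_t gives
    t_insert = (d_t + t_new + t_closest) / 2.
    """
    if rate <= 0:
        raise ValueError("evolutionary rate must be positive")
    d_t = genetic_distance / rate
    t_insert = (d_t + t_new + t_closest) / 2.0
    # Assumption: an ancestor cannot be younger than either tip; the caption
    # does not describe how this case is handled, so it is only reported here.
    if t_insert < max(t_new, t_closest):
        raise ValueError("implied ancestor is younger than one of the tips")
    return t_insert


if __name__ == "__main__":
    tree = {"A": "ACGTACGT", "B": "ACGTTTTT"}   # toy alignment
    name, dist = closest_taxon("ACGTACGA", tree)
    # Toy rate of 0.25 substitutions/site/unit time; tip A sampled 0.1 units ago.
    print(name, dist, insertion_time(dist, 0.25, t_new=0.0, t_closest=0.1))
```

In this toy example the two branch lengths below the new ancestor add up to $d_t$, as the caption requires.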
