PLoS Comput Biol. 2025 Apr 21;21(4):e1012995.
doi: 10.1371/journal.pcbi.1012995. eCollection 2025 Apr.

Bayesian Inference of Pathogen Phylogeography using the Structured Coalescent Model


Ian Roberts et al. PLoS Comput Biol.

Abstract

Over the past decade, pathogen genome sequencing has become well established as a powerful approach to study infectious disease epidemiology. In particular, when multiple genomes are available from several geographical locations, comparing them is informative about the relative sizes of the local pathogen populations, as well as about past migration rates and events between locations. The structured coalescent model has a long history of being used as the underlying process for such phylogeographic analysis. However, the computational cost of this model does not scale well to the large numbers of genomes frequently analysed in pathogen genomic epidemiology studies. Several approximations of the structured coalescent model have been proposed, but their effects are difficult to predict. Here we show how the exact structured coalescent model can be used to analyse a precomputed dated phylogeny in order to perform Bayesian inference on the past migration history, the effective population size in each location, and the directed migration rates from any location to another. We describe an efficient reversible jump Markov chain Monte Carlo scheme, which is implemented in a new R package, StructCoalescent. We use simulations to demonstrate the scalability and correctness of our method and to compare it with existing software. We also apply our new method to several state-of-the-art datasets on the population structure of real pathogens to showcase its relevance to current data scales and research questions.
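As a rough illustration of what the exact structured coalescent means for a fully specified migration history: the log-density factorises over the intervals between events. During each interval, every deme contributes a waiting-time term (no coalescence or migration happened), and each event contributes the log of its own rate. The sketch below is a minimal illustration under stated assumptions, not the StructCoalescent code. It assumes the history has already been flattened into a time-ordered event list (backwards in time), that `coal_rates` holds the pairwise coalescent rates (inverse effective population sizes) and `mig_rates` the backwards-in-time migration rates. The function name and event encoding are hypothetical.

```python
import math


def structured_coalescent_loglik(events, n_demes, coal_rates, mig_rates):
    """Log-density of a fully specified migration history (sketch).

    events: time-ordered list (backwards in time, increasing t) of tuples
      (t, 'sample', deme)    a sampled lineage enters in `deme`
      (t, 'coal', deme)      two lineages in `deme` coalesce into one
      (t, 'mig', src, dst)   one lineage moves from `src` to `dst`
    coal_rates[i]:   pairwise coalescent rate in deme i (1 / Ne_i)
    mig_rates[i][j]: backwards-in-time migration rate from deme i to deme j
    """
    counts = [0] * n_demes  # lineages currently present in each deme
    # Total per-lineage rate of leaving each deme.
    out_rate = [sum(mig_rates[i][j] for j in range(n_demes) if j != i)
                for i in range(n_demes)]
    loglik = 0.0
    t_prev = events[0][0]
    for event in events:
        t, kind = event[0], event[1]
        # Log-probability that no event occurred during (t_prev, t].
        total_rate = sum(counts[i] * (counts[i] - 1) / 2 * coal_rates[i]
                         + counts[i] * out_rate[i] for i in range(n_demes))
        loglik -= (t - t_prev) * total_rate
        if kind == 'sample':
            counts[event[2]] += 1
        elif kind == 'coal':
            deme = event[2]
            loglik += math.log(coal_rates[deme])  # rate for this specific pair
            counts[deme] -= 1
        elif kind == 'mig':
            src, dst = event[2], event[3]
            loglik += math.log(mig_rates[src][dst])  # this specific lineage
            counts[src] -= 1
            counts[dst] += 1
        else:
            raise ValueError(f"unknown event type: {kind}")
        t_prev = t
    return loglik
```

For example, two lineages sampled in deme 0 at time 0 that coalesce at time 1 with a coalescent rate of 2 give a log-density of log(2) - 2. In an MCMC scheme, a function like this would be evaluated on each proposed migration history.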


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Full migration history update using a radius-based subtree selection.
A subtree is selected with centre at x and radius r (a). The migration history on the subtree is erased and the deme at the internal coalescent event R is resampled (b). Finally, a new migration history is sampled under DTA conditional on the demes at the points where the subtree reconnects to the migration history (c).
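The caption does not spell out how the subtree is built. One simple reading, assumed here, is that it contains every part of the tree within path distance r of the centre x. The sketch below treats x as a node (in the paper x may also lie partway along a branch). It walks the tree from x, accumulating branch lengths, and records the points where the ball of radius r cuts a branch. Those cut points are where the subtree would reconnect to the rest of the migration history. The function name and tree encoding are hypothetical.

```python
def select_radius_subtree(adjacency, centre, radius):
    """Return (inside, cuts) for the ball of the given radius around `centre`.

    adjacency: dict node -> list of (neighbour, branch_length); an undirected tree
    inside:    dict node -> path distance from centre, for nodes within radius
    cuts:      list of (u, v, offset): the ball's boundary lies on branch u-v
               at distance `offset` from the inside node u
    """
    inside = {centre: 0.0}
    cuts = []
    stack = [(centre, None)]  # (node, node we arrived from)
    while stack:
        node, parent = stack.pop()
        for nbr, length in adjacency[node]:
            if nbr == parent:  # a tree has no cycles, so this avoids revisits
                continue
            dist = inside[node] + length
            if dist <= radius:
                inside[nbr] = dist
                stack.append((nbr, node))
            else:
                cuts.append((node, nbr, radius - inside[node]))
    return inside, cuts
```

Because paths in a tree are unique, a plain depth-first walk is enough here; no shortest-path algorithm is needed.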
Fig 2. Backward filtering-forward sampling messages for a tree with 3 leaves.
Messages are computed in the order green-orange-red, and messages in grey could be computed but are unnecessary to sample a new configuration of demes at nodes I and R.
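Backward filtering-forward sampling is a standard way to draw node states exactly from their conditional distribution under a continuous-time Markov chain on a tree. It works in three steps. First, messages (Felsenstein-style partial likelihoods) are passed from the leaves up to the root. Next, the root deme is drawn from the prior multiplied by the root message. Finally, each child deme is drawn given its parent's sampled deme and its own message. The sketch below is a generic illustration, not the paper's code. It samples demes only at nodes: it does not sample the migration events along branches, and it does not condition on demes at subtree boundaries. `q` is an assumed rate matrix whose relation to the structured coalescent's backwards-in-time rates is left unspecified. The matrix exponential is a small pure-Python Taylor scaling-and-squaring routine used to avoid dependencies. All names are hypothetical.

```python
import random


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transition_matrix(q, t, squarings=12, terms=20):
    """P(t) = exp(Q t) via a Taylor series with scaling and squaring."""
    n = len(q)
    a = [[q[i][j] * t / 2 ** squarings for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def sample_node_demes(children, branch_len, leaf_deme, root, q, root_prior, rng):
    """children: node -> child list; branch_len: non-root node -> length above it."""
    n = len(q)
    trans = {v: transition_matrix(q, branch_len[v]) for v in branch_len}
    order, stack = [], [root]
    while stack:  # preorder; reversed, it lists children before parents
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    msg = {}  # msg[v][s] = P(leaf demes below v | deme s at v)
    for v in reversed(order):  # backward filtering
        if v in leaf_deme:
            msg[v] = [1.0 if s == leaf_deme[v] else 0.0 for s in range(n)]
            continue
        m = [1.0] * n
        for c in children[v]:
            for s in range(n):
                m[s] *= sum(trans[c][s][j] * msg[c][j] for j in range(n))
        msg[v] = m

    def draw(weights):
        return rng.choices(range(n), weights=weights)[0]

    demes = {root: draw([root_prior[s] * msg[root][s] for s in range(n)])}
    for v in order:  # forward sampling: parents are visited before children
        for c in children.get(v, []):
            demes[c] = draw([trans[c][demes[v]][j] * msg[c][j] for j in range(n)])
    return demes
```

For large trees the messages should be rescaled or kept on the log scale to avoid underflow, and `scipy.linalg.expm` would normally replace the hand-written matrix exponential.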
Fig 3. Assessment of MCMC convergence and mixing.
(a) Trace plot of the total migration count. The red dashed line indicates the number of migration events in the simulated phylogeny and the black dashed line indicates the minimum number of required migration events for a consistent migration history. (b) Stacked trace plot of the proportion of the migration history falling into each deme across the six MCMC chains.
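The minimum number of migration events needed for a migration history that is consistent with the leaf demes equals the parsimony score of the deme labels on the tree. A minimal sketch follows. It assumes a rooted binary tree, because Fitch's algorithm used here is exact for binary trees with unordered states; multifurcating trees would need Sankoff's or Hartigan's algorithm instead. The function name is hypothetical.

```python
def min_migrations(children, leaf_deme, root):
    """Fitch's small-parsimony score of leaf demes on a rooted binary tree."""
    order, stack = [], [root]
    while stack:  # preorder; reversed, it lists children before parents
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    sets, changes = {}, 0
    for v in reversed(order):
        if v in leaf_deme:
            sets[v] = {leaf_deme[v]}
            continue
        left, right = (sets[c] for c in children[v])  # exactly two children
        common = left & right
        if common:
            sets[v] = common
        else:
            sets[v] = left | right
            changes += 1  # union step: one additional migration is required
    return changes
```

A sampled migration count should never fall below this value, which is why it serves as the lower reference line in the trace plot.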
Fig 4. 60% consensus migration history and 95% posterior credible intervals of evolutionary parameters for MCMC samples on a single simulated structured phylogeny.
Samples are aggregated over all six chains prior to computations. (a) 95% posterior credible intervals for coalescent rate estimates. (b) 95% posterior credible intervals for backwards-in-time migration rate estimates. (c) 60% consensus migration history. Inset: Structured phylogenetic tree simulated using MASTER.
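Pooling samples across chains before computing intervals can be sketched as follows. This assumes equal-tailed intervals (the caption does not say whether equal-tailed or highest-posterior-density intervals are used) and an optional per-chain burn-in, which is also an assumption. Quantiles use linear interpolation between order statistics, the same rule as NumPy's default. The function names are hypothetical.

```python
def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def credible_interval(chains, level=0.95, burn_in=0):
    """Equal-tailed credible interval from samples pooled over all chains."""
    pooled = sorted(x for chain in chains for x in chain[burn_in:])
    tail = (1.0 - level) / 2.0
    return quantile(pooled, tail), quantile(pooled, 1.0 - tail)
```

For example, two chains holding the values 0 to 50 and 51 to 100 pool to 101 samples, giving an interval of approximately (2.5, 97.5).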
Fig 5. Posterior median of inferred evolutionary parameters plotted against the known simulation parameters.
Inferred coalescent rates are plotted in red and inferred migration rates are plotted in black.
Fig 6. 60% consensus migration history and kernel density estimates of the posterior density of evolutionary parameters for the S. aureus ST239 analysis.
(a) 60% consensus migration history. Inset: migration model with 5 demes. The radius of each circle is proportional to the median effective population size (inverse coalescent rate) for that deme and the width of an arrow connecting two demes is proportional to the median backwards-in-time migration rate between that pair of demes. (b) Kernel density estimates of the posterior density of coalescent rates. (c) Kernel density estimates of the posterior density of backwards-in-time migration rates.
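The inset draws each deme's median effective population size, taken as the inverse of its coalescent rate, as a circle radius. A small sketch of that mapping follows. It assumes per-deme lists of posterior coalescent-rate samples and a hypothetical `max_radius` for the largest circle. Because 1/x is monotone, the median of 1/λ equals 1/median(λ) for an odd number of samples; with an even number the two can differ slightly because the middle values are averaged. For that reason, the sketch transforms the samples before taking the median.

```python
import statistics


def deme_radii(coal_rate_samples, max_radius=1.0):
    """Map posterior coalescent-rate samples to circle radii.

    coal_rate_samples: dict deme -> list of sampled coalescent rates
    Returns dict deme -> radius proportional to the posterior median of
    Ne = 1 / coalescent rate, with the largest deme drawn at max_radius.
    """
    median_ne = {deme: statistics.median(1.0 / lam for lam in samples)
                 for deme, samples in coal_rate_samples.items()}
    largest = max(median_ne.values())
    return {deme: max_radius * ne / largest for deme, ne in median_ne.items()}
```

Arrow widths for the migration rates could be scaled in the same way from their posterior medians.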
Fig 7. 60% consensus migration history computed on all posterior migration history samples from the AIV analysis with default priors.
Inset: migration model with 5 demes. The radius of each deme circle is proportional to the median inferred effective population size (inverse coalescent rate) for that deme and the width of an arrow connecting two demes is proportional to the magnitude of the backwards-in-time migration rate between the pair of demes.
Fig 8. 60% consensus migration history based on all migration histories sampled across the eleven MCMC chains in our cholera analysis.
Inset: migration model with 11 demes. The radius of each deme circle is proportional to the median inferred effective population size for that deme and the width of an arrow connecting two demes is proportional to the magnitude of the backwards-in-time migration rate between the pair of demes.
