Comparative Study
Mol Biol Evol. 2024 May 3;41(5):msae073.
doi: 10.1093/molbev/msae073.

Comparison of Bayesian Coalescent Skyline Plot Models for Inferring Demographic Histories

Ronja J Billenstein et al. Mol Biol Evol.

Abstract

Bayesian coalescent skyline plot models are widely used to infer demographic histories. The first (non-Bayesian) coalescent skyline plot model assumed a known genealogy as data, while subsequent models and implementations jointly inferred the genealogy and demographic history from sequence data, including heterochronous samples. Multiple Bayesian coalescent skyline plot models now exist, and they differ mainly in two aspects: (i) how changes in population size are modeled, through independent or autocorrelated prior distributions, and (ii) how many change-points in the demographic history are used, where they occur, and whether their number is pre-specified or inferred. The specific impact of each of these choices on the inferred demographic history is not known, for two reasons: first, not all models are implemented in the same software, and second, each model implementation makes specific choices that the biologist cannot influence. To facilitate a detailed evaluation of Bayesian coalescent skyline plot models, we implemented all currently described models, in a flexible design, in the software RevBayes. Furthermore, we evaluated models and choices on an empirical dataset of horses, supplemented by a small simulation study. We find that estimated demographic histories fall broadly into two groups depending on how change-points in the demographic history are specified (either independent of or at coalescent events). Our simulations suggest that models using change-points at coalescent events produce spurious variation near the present, while most models using independent change-points tend to over-smooth the inferred demographic history.

Keywords: RevBayes; coalescent; demographic histories; heterochronous samples.
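
The contrast between independent and autocorrelated priors on population size, named in the abstract as the first key modeling choice, can be illustrated with a minimal Python sketch. This is not the RevBayes implementation: the function names, the normal distributions on log population size, and the hyperparameters (mu, sigma, tau) are all assumptions chosen for illustration only.

```python
import math
import random

def independent_sizes(n_intervals, mu, sigma, rng):
    """Each interval's log population size is drawn independently (assumed prior)."""
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n_intervals)]

def autocorrelated_sizes(n_intervals, mu, sigma, tau, rng):
    """First-order random walk on log population size (assumed prior).

    The first interval is drawn from Normal(mu, sigma); each later interval
    is centered on its neighbor, with step standard deviation tau.
    """
    log_n = rng.gauss(mu, sigma)
    sizes = [math.exp(log_n)]
    for _ in range(n_intervals - 1):
        log_n += rng.gauss(0.0, tau)
        sizes.append(math.exp(log_n))
    return sizes

rng = random.Random(1)
iid = independent_sizes(10, mu=10.0, sigma=1.0, rng=rng)
walk = autocorrelated_sizes(10, mu=10.0, sigma=1.0, tau=0.1, rng=rng)
```

With a small tau, neighboring intervals in the random-walk draw stay close to each other, whereas the independent draw can jump freely between intervals.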

Figures

Fig. 1.
Population size trajectories estimated with nine different models from isochronous sequence data on a timeline from 0 to 500,000 years ago. Genealogies and population size were jointly estimated. All analyses considered possible variation in population size between 0 and 500,000 years ago. The bold line represents the median of the posterior distribution of the population size and the shaded area shows the 95% credible intervals. The four models in the top right corner (BSP, EBSP, Skyline, Skyride) are coalescent event based models, i.e. a change in population size can only occur at the time of a coalescent event. In the three models in the bottom row (Skygrid, GMRF, HSMRF), population size changes can happen at specified times, independent from coalescent events. Here, all intervals for these models are equally sized. In the Skyfish model, the number of intervals as well as their duration are estimated. All models except for the Constant model, the EBSP model, and the Skyline model have correlated intervals. MCMCs were run for 100,000 iterations, sampling every tenth iteration, with a burn-in of 10% and two replicates, yielding 18,000 samples in total. For plotting, the resulting trajectories were evaluated at 500 exponentially spaced grid points between 0 and 500,000 years ago.
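
The sample count stated in the caption follows from the run settings, and can be checked with a short Python sketch. The helper names are hypothetical. The grid function is only one possible reading of "exponentially spaced": it spaces points evenly on log(1 + t) so that the grid still includes t = 0, which is an assumption and not necessarily the rule used for the figure.

```python
import math

def retained_samples(iterations, thinning, burnin_fraction, replicates):
    """Posterior samples kept after thinning and burn-in removal, over all replicates."""
    per_run = iterations // thinning
    burnin = int(round(per_run * burnin_fraction))
    return (per_run - burnin) * replicates

def exponential_grid(t_max, n_points):
    """Grid on [0, t_max] that is dense near the present (assumed spacing rule)."""
    step = math.log1p(t_max) / (n_points - 1)
    return [math.expm1(i * step) for i in range(n_points)]

# 100,000 iterations, every tenth sampled, 10% burn-in, two replicates.
total = retained_samples(100_000, 10, 0.10, 2)
grid = exponential_grid(500_000, 500)
print(total)  # 18000
```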
Fig. 2.
Population size trajectories estimated with nine different models from heterochronous sequence data on a timeline from 0 to 500,000 years ago. Genealogies and population size were jointly estimated. All analyses considered possible variation in population size between 0 and 1,200,000 years ago. The bold line represents the median of the posterior distribution of the population size and the shaded area shows the 95% credible intervals. The four models in the top right corner (BSP, EBSP, Skyline, Skyride) are coalescent event based models, i.e. a change in population size can only occur at the time of a coalescent event. In the three models in the bottom row (Skygrid, GMRF, HSMRF), population size changes can happen at specified times, independent from coalescent events. Here, all intervals for these models are equally sized. In the Skyfish model, the number of intervals as well as their duration are estimated. All models except for the Constant model, the EBSP model, and the Skyline model have correlated intervals. MCMCs were run for 100,000 iterations, sampling every tenth iteration, with a burn-in of 10% and two replicates, yielding 18,000 samples in total. For plotting, the resulting trajectories were evaluated at 500 exponentially spaced grid points between 0 and 500,000 years ago.
Fig. 3.
Simulation study results. Ten simulations of sequence data with 36 tips were performed under the resulting population size trajectory from the BSP analysis (top row) and the Constant analysis (bottom row), both with isochronous data. Analyses of each of the simulations were performed with the BSP model (left column) and the Skyfish model (right column). The true trajectory used for simulations is depicted in black, medians of the resulting analyses are depicted in hues of blue to green.
Fig. 4.
Comparing joint and sequential inference of population size trajectories with the Skyfish model from heterochronous data. The Skyfish analysis was run between 0 and 1,200,000 years ago. From left to right: sequence-based analysis from the original data; MAP tree based analysis using the MAP tree from the Constant analysis with sequence data; analysis based on 10 trees from the posterior distribution of the Constant analysis with sequence data; analysis based on 100 trees from the posterior distribution of the Constant analysis with sequence data. The bold line represents the median of the posterior distribution of the population size and the shaded area shows the 95% credible intervals.
Fig. 5.
Schematic figure of a coalescent tree with seven samples. Time runs backwards, from samples to the most recent common ancestor (right to left). Coalescent events t are marked by dashed blue lines. Waiting times wm between coalescent events start at time tm and end at time tm1. Sampling times s are marked by dashed orange lines. Below the genealogy, two realizations of a Bayesian coalescent skyline plot model with a constant population size within intervals are shown. In violet: coalescent event based model, the interval change-points coincide with coalescent events. In green: model with interval change-points independent from coalescent events, with equally sized intervals. The solid horizontal lines show the median population size in the intervals, the shaded areas are their credible intervals. Change-points are denoted by x on the timeline.
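
The setup in this schematic (coalescent events, sampling times, and a population size that is constant within intervals) can be turned into a log-likelihood with the standard Kingman coalescent. The Python sketch below is an illustration under stated assumptions, not the RevBayes code: the function name and argument layout are hypothetical, time is scaled so that k lineages coalesce at rate k(k-1)/2 divided by the population size, `change_points` must be sorted, and `sizes` must hold one more entry than `change_points`. An event that falls exactly on a change-point is assigned to the younger interval.

```python
import math
from bisect import bisect_left, bisect_right

def log_coalescent_likelihood(coal_times, sample_times, change_points, sizes):
    """Log-likelihood of a genealogy under piecewise-constant population size."""
    # Kinds: 0 = sampling, 1 = coalescent, 2 = change-point.
    # Sorting the tuples processes samples first when times are tied.
    events = ([(t, 0) for t in sample_times]
              + [(t, 1) for t in coal_times]
              + [(t, 2) for t in change_points])
    events.sort()
    log_lik = 0.0
    lineages = 0
    prev = events[0][0]
    for t, kind in events:
        pairs = lineages * (lineages - 1) / 2
        if t > prev and pairs > 0:
            # No change-point lies strictly inside (prev, t), so the
            # midpoint identifies the interval's population size.
            n = sizes[bisect_right(change_points, 0.5 * (prev + t))]
            log_lik -= pairs * (t - prev) / n
        if kind == 0:
            lineages += 1
        elif kind == 1:
            n = sizes[bisect_left(change_points, t)]
            log_lik += math.log(pairs / n)
            lineages -= 1
        prev = t
    return log_lik

# Two samples at the present, one coalescent event, one change-point.
value = log_coalescent_likelihood([1.0], [0.0, 0.0], [0.5], [1.0, 2.0])
```

In a coalescent-event-based model, `change_points` would be a subset of `coal_times`; in a model with independent change-points, it would be a fixed grid such as equally sized intervals.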

