Viruses. 2022 Jul 27;14(8):1648.
doi: 10.3390/v14081648.

Robust Phylodynamic Analysis of Genetic Sequencing Data from Structured Populations



Jérémie Scire et al. Viruses. 2022.

Abstract

The multi-type birth–death model with sampling is a phylodynamic model that enables the quantification of past population dynamics in structured populations from phylogenetic trees. The BEAST 2 package bdmm implements an algorithm for numerically computing the probability density of a phylogenetic tree given the population dynamic parameters under this model. In the initial release of bdmm, analyses were computationally limited to trees of up to approximately 250 genetic samples. We implemented important algorithmic changes to bdmm that dramatically increased the number of genetic samples that could be analyzed and improved the numerical robustness and efficiency of the calculations. Including more samples improved the precision of parameter estimates, particularly for structured models with a high number of inferred parameters. Furthermore, we report on several model extensions to bdmm, inspired by properties common to empirical datasets. For comparison, we applied the improved algorithm to two partly overlapping datasets of Influenza A virus HA sequences sampled around the world, one with 500 samples and the other with only 175. We report and compare the global migration patterns and seasonal dynamics inferred from each dataset. In this way, we show the information gained by analyzing the bigger dataset, which became possible with the presented algorithmic changes to bdmm. In summary, bdmm allows for robust, faster, and more general phylodynamic inference on larger datasets.
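As a hedged illustration of one ingredient of such a numerical calculation (not bdmm's code), the sketch below integrates backward-in-time ODEs for p_i(τ), the probability that a lineage of type i alive at age τ (time before the present) leaves no sampled descendants. It assumes constant per-type birth rates λ_i (offspring keep the parent's type), death rates μ_i, sampling-through-time rates ψ_i, type changes only through migration at rates m_ij, no ρ-sampling before the present, and a present-day sampling probability ρ_i so that p_i(0) = 1 − ρ_i. The helper name `extinction_probabilities` is hypothetical; it relies on SciPy's `solve_ivp`.

```python
import numpy as np
from scipy.integrate import solve_ivp


def extinction_probabilities(lam, mu, psi, mig, rho, ages):
    """Hypothetical helper returning p_i(tau) at each age in `ages`.

    lam, mu, psi: per-type birth, death and sampling rates (length n).
    rho: per-type sampling probabilities at the present (length n).
    mig: n x n migration-rate matrix (diagonal entries are ignored).
    ages: non-negative, increasing times before the present.
    """
    lam, mu, psi, rho = (np.asarray(x, dtype=float) for x in (lam, mu, psi, rho))
    mig = np.array(mig, dtype=float)
    np.fill_diagonal(mig, 0.0)                     # no self-migration
    out_rate = lam + mu + psi + mig.sum(axis=1)    # total event rate per type

    def rhs(tau, p):
        # dp_i/dtau = -(lam_i + mu_i + psi_i + sum_j m_ij) p_i
        #             + lam_i p_i^2 + mu_i + sum_j m_ij p_j
        return -out_rate * p + lam * p**2 + mu + mig @ p

    ages = np.asarray(ages, dtype=float)
    p0 = 1.0 - rho                                 # not sampled at the present
    sol = solve_ivp(rhs, (0.0, ages[-1]), p0, t_eval=ages,
                    rtol=1e-10, atol=1e-12)
    if not sol.success:
        raise RuntimeError(sol.message)
    return sol.y.T                                 # shape (len(ages), n)


# Usage with made-up rates for two types:
ages = np.linspace(0.0, 5.0, 6)
p = extinction_probabilities(lam=[1.5, 1.0], mu=[1.0, 1.0], psi=[0.1, 0.1],
                             mig=[[0, 0.2], [0.2, 0]], rho=[0.0, 0.0], ages=ages)
```

With no death and no sampling (μ_i = ψ_i = 0) the right-hand side vanishes at p = 1, so every lineage trivially "leaves no sampled descendants"; that makes a convenient sanity check.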

Keywords: Bayesian inference; phylodynamics; phylogenetics; population structure.


Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figures

Figure A1
Possible configurations for node $n_e$ on branch $e$ at time $t_{k-1} < t_e \le t_k$. (A) $n_e$ is a ψ-sampled node at time $t_e < t_k$, with or without sampled descendants. (B) $n_e$ is a branching event at time $t_e < t_k$. (C) $n_e$ is a ρ-sampling node at time $t_e = t_k$, with or without sampled descendants. (D) $n_e$ is a degree-2 node at time $t_e = t_k$ without sampling.
Figure A2
Scale factor choice. (A) Simplest case: the scale factor is the inverse of the smallest non-scaled value. (B) If A is not applicable, the scale factor is chosen such that the initial conditions are centered inside the range of acceptable values; the mid-point (on a log scale) of this interval is approximately 1. (C) Last case: if the scaled values cannot all fit inside the range of acceptable values at once, the lowest non-scaled values are dropped and set to zero so that the problem reduces to case A or case B. In all panels, the white rectangle represents values that can be represented using double-precision floating-point (DPFP) numbers. Dots represent the values of initial conditions for the differential equations of the multi-type birth–death model, before (1) and after (2) scaling. Red dots represent values that are initially outside the window of values that can be represented using DPFP.
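The three cases in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not bdmm's implementation: the non-scaled initial conditions are passed as natural logarithms (so that values outside the double range can still be described), the "range of acceptable values" is taken to be the normal double-precision range from `sys.float_info` with a hypothetical safety margin of 10 log units, and the helper name `choose_log_scale` is hypothetical.

```python
import math
import sys

# Assumed window of acceptable log-magnitudes: the normal double range,
# tightened by a safety margin of 10 log units on each side.
LOG_MIN = math.log(sys.float_info.min) + 10.0
LOG_MAX = math.log(sys.float_info.max) - 10.0


def choose_log_scale(log_values):
    """Hypothetical helper: pick a log scale factor for positive initial conditions.

    `log_values` holds natural logs of the non-scaled values (-inf for zero).
    Returns (log_scale, scaled) with scaled[i] = exp(log_values[i] + log_scale),
    or 0.0 for values that were dropped (case C) or were zero to begin with.
    """
    kept = sorted(v for v in log_values if v > -math.inf)
    if not kept:
        raise ValueError("at least one value must be positive")
    # Case C: drop the lowest values until the rest span no more than the window.
    while kept[-1] - kept[0] > LOG_MAX - LOG_MIN:
        kept.pop(0)
    log_lo, log_hi = kept[0], kept[-1]
    if log_hi - log_lo <= LOG_MAX:
        # Case A: scale by the inverse of the smallest kept value (it becomes 1).
        log_scale = -log_lo
    else:
        # Case B: centre the kept values on the window's log-midpoint (about 1).
        log_scale = (LOG_MIN + LOG_MAX) / 2.0 - (log_lo + log_hi) / 2.0
    scaled = [math.exp(v + log_scale) if v >= log_lo else 0.0 for v in log_values]
    return log_scale, scaled
```

Tracking the scale factor as a logarithm means it can itself lie far outside the double range; assuming the downstream quantity is linear in the scaled initial conditions, one would subtract `log_scale` from the final log probability density to undo the scaling.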
Figure A3
Comparisons of likelihood computation results between the original and improved bdmm versions for additional trees. (A,B) Randomly simulated ten-tip tree and log-likelihood computation results against λ1 (birth rate of red deme). (C,D) Randomly simulated hundred-tip tree and log-likelihood computation results against λ1 (birth rate of red deme). (E,F) Randomly simulated ten-tip tree and log-likelihood computation results against μ1 (death rate of red deme).
Figure 1
Complete tree (left) and sampled trees (middle and right) obtained from a multi-type birth–death process with two types. The orange and blue dots on the trees represent sampled individuals and are colored according to the type these individuals belong to. A ρ-sampling event happens at time t1. The grey squares represent degree-2 nodes added to branches crossing this event. ρ-sampling also happens in the present (time T). As seen in the complete tree, the three individuals who were first sampled were not removed from the population upon sampling, whereas the three individuals sampled at time t1 were.
Figure 2
Comparison between the original and the updated implementations of the multi-type birth–death model. (A) Speed comparison. Only successful calculations were taken into account, i.e., calculations where the log probability density was different from −∞. (B) Success in calculating probability density values, plotted against tree size. The values in this panel correspond to the same set of calculations as in panel (A).
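One common way a log probability density ends up at −∞ is floating-point underflow: multiplying many small factors in plain double precision eventually yields exactly 0.0, whose logarithm is −∞. The sketch below contrasts that naive approach with a generic mantissa/exponent renormalisation built on Python's `math.frexp`. This is a swapped-in textbook technique chosen for illustration, not the scale-factor scheme bdmm uses, and the factors are made-up positive numbers rather than actual branch probabilities.

```python
import math


def naive_log_product(factors):
    """Multiply positive factors in plain double precision, then take the log."""
    prod = 1.0
    for f in factors:
        prod *= f
    # Once the product underflows to 0.0 its log is -inf: a failed calculation.
    return math.log(prod) if prod > 0.0 else -math.inf


def renormalised_log_product(factors):
    """Keep a mantissa in [0.5, 1) and accumulate the binary exponent separately."""
    mantissa, exponent = 1.0, 0
    for f in factors:
        mantissa *= f
        if mantissa == 0.0:
            return -math.inf                 # a genuine zero factor
        mantissa, e = math.frexp(mantissa)   # old value == mantissa * 2**e
        exponent += e
    return math.log(mantissa) + exponent * math.log(2.0)


factors = [1e-5] * 100                       # true product is 1e-500
print(naive_log_product(factors))            # -inf
print(renormalised_log_product(factors))     # close to 100 * ln(1e-5)
```

The renormalised version never lets the running value leave the range [0.5, 1), so it stays finite for products far below the smallest representable double.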
Figure 3
Comparison of computation results between the original bdmm and improved bdmm versions. (A) Randomly simulated tree with 10 tips and 2 demes, used for comparison. (B) Log-likelihood values obtained with each bdmm version as a function of λ1 (birth rate of orange deme).
Figure 4
Maximum Clade Credibility (MCC) trees from analyses of (A) 175 samples and (B) 500 samples.
Figure 5
(A) Seasonal effective reproduction numbers for each sample location, for both datasets. (B) Migration rates inferred for each dataset. N, S, and T refer respectively to North, South, and Tropics. For instance, “Mig. rate N-T” represents the migration rate from the Northern location to the Tropical one.


