Genetics. 2013 Nov;195(3):1055-62.
doi: 10.1534/genetics.113.154856. Epub 2013 Sep 13.

Relating phylogenetic trees to transmission trees of infectious disease outbreaks

Rolf J F Ypma et al. Genetics. 2013 Nov.

Abstract

Transmission events are the fundamental building blocks of the dynamics of any infectious disease. Much about the epidemiology of a disease can be learned when these individual transmission events are known or can be estimated. Such estimations are difficult and generally feasible only when detailed epidemiological data are available. The genealogy estimated from genetic sequences of sampled pathogens is another rich source of information on transmission history. Optimal inference of transmission events calls for the combination of genetic data and epidemiological data into one joint analysis. A key difficulty is that the transmission tree, which describes the transmission events between infected hosts, differs from the phylogenetic tree, which describes the ancestral relationships between pathogens sampled from these hosts. The trees differ both in timing of the internal nodes and in topology. These differences become more pronounced when a higher fraction of infected hosts is sampled. We show how the phylogenetic tree of sampled pathogens is related to the transmission tree of an outbreak of an infectious disease, by the within-host dynamics of pathogens. We provide a statistical framework to infer key epidemiological and mutational parameters by simultaneously estimating the phylogenetic tree and the transmission tree. We test the approach using simulations and illustrate its use on an outbreak of foot-and-mouth disease. The approach unifies existing methods in the emerging field of phylodynamics with transmission tree reconstruction methods that are used in infectious disease epidemiology.

Keywords: Markov chain Monte Carlo (MCMC); foot-and-mouth disease; molecular epidemiology; transmission tree.

Figures

Figure 1
Schematic for viral dynamics. Throughout the figure, time progresses from left to right. Hosts are depicted as gray pods, virus particles as blue dots, and sampled virus particles as red dots. (A) The timing of coalescence of viral lineages depends on within-host viral dynamics. Virus (blue) numbers within hosts (gray) rapidly increase at onset of infection and decrease near the end of the infection, influencing coalescent rates. A possible ancestry between sampled viruses (red) is given in black. Although the initial host infects the latter two, the sampled viruses from these latter two are more closely related, as they coalesce with each other before coalescing with the virus sampled from the initial host. (B) When viruses are sampled from only a few hosts in a large outbreak, the timing of the coalescence of the sampled viruses is nearly identical to the timing of transmission immediately following the coalescence. The timing of coalescent events is mainly governed by interhost infection dynamics, and the phylogenetic tree derived from the sequences (blue) is very similar to the one derived when internal node times are equated with transmission times (red). (C) When viruses are sampled from all hosts in an outbreak, coalescent times and transmission times are very different. The phylogenetic tree derived when approximating coalescent times by transmission times (red) is very different from the actual phylogenetic tree (blue).
Figure 2
Accuracy of estimating transmission trees using genetic sequences of pathogens, for different simulation scenarios. Solid lines give the average percentage of infected hosts for which the actual infector has been assigned a probability of at least the level indicated on the x-axis. Dashed lines give the average percentage of infected hosts for which the infector has been incorrectly identified, with a probability at least the level indicated on the x-axis. (A) Results when 100% (black), 50% (blue), or 0% (pink) of infected hosts have been sampled. When fewer hosts are sampled, only a few infectors can be identified at a high probability level. (B) Results when all (black) or 80% (turquoise) of hosts are observed. When fewer hosts are observed, fewer infectors are identified correctly, and incorrect inferences are made even at high probability levels. (C) Results when substitution rate is 3 × 10⁻³ (black), an increased 1 × 10⁻² (yellow), or a decreased 1 × 10⁻³ substitutions/site/year (green). At higher substitution rates the inference is more accurate. (D) Results when coalescent events are allowed to differ from transmission events (black) or when coalescent events are incorrectly assumed to coincide with transmission events (orange). The incorrect assumption leads to incorrect estimations even at a high probability level.
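The solid and dashed curves described in this caption can be computed from per-host posterior infector probabilities. The following is a minimal sketch of that bookkeeping, not the authors' code: the function name `accuracy_curves`, the dictionary layout, and the toy posterior values are all assumptions made for illustration.

```python
def accuracy_curves(posteriors, true_infector, levels):
    """Return (correct, incorrect) percentages for each probability level.

    posteriors:    {host: {candidate_infector: posterior probability}}
    true_infector: {host: actual infector}
    levels:        probability thresholds (the x-axis of the figure)
    All three structures are hypothetical and chosen for illustration.
    """
    n_hosts = len(posteriors)
    correct, incorrect = [], []
    for level in levels:
        # Hosts whose actual infector receives probability >= level.
        n_correct = sum(
            1 for host, probs in posteriors.items()
            if probs.get(true_infector[host], 0.0) >= level
        )
        # Hosts for which some wrong candidate receives probability >= level.
        n_wrong = sum(
            1 for host, probs in posteriors.items()
            if any(p >= level for cand, p in probs.items()
                   if cand != true_infector[host])
        )
        correct.append(100.0 * n_correct / n_hosts)
        incorrect.append(100.0 * n_wrong / n_hosts)
    return correct, incorrect


# Toy example with made-up posterior probabilities for three hosts.
posteriors = {
    "B": {"A": 0.9, "C": 0.1},
    "C": {"A": 0.4, "B": 0.6},
    "D": {"B": 0.7, "C": 0.3},
}
true_infector = {"B": "A", "C": "A", "D": "B"}
levels = [0.5, 0.8]
correct, incorrect = accuracy_curves(posteriors, true_infector, levels)
```

Averaging such curves over many simulated outbreaks would give lines of the kind the caption describes.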
Figure 3
Robustness of estimates of genetic and epidemiological parameters under various scenarios. (A) Distribution of point estimates of the substitution rates for 100 simulations, for three simulation scenarios. Actual value is 0.003 (black line). Estimates are accurate when all information is available (gray) and when 50% of hosts are sampled (blue), although the latter leads to a broader distribution. Assuming that coalescent events coincide with transmission events (pink), however, leads to a large overestimation, since the total branch length of the phylogenetic tree is underestimated. Mean estimates are 3.3 × 10⁻³, 3.3 × 10⁻³, and 6.7 × 10⁻³, respectively. (B) Point estimate (black) and 95% confidence interval (gray) of the fraction of infections due to adults, where actual value is 0. Shown are 100 sorted estimates, for each of seven scenarios (complete data, missing sequences, unobserved hosts, altered substitution rate, and incorrect within-host model). Estimates are away from the actual value of 0 when only 80% of hosts are observed or when coalescent times are equated with transmission times. The width of the confidence interval depends largely on the amount of information available; for example, when less genetic information is available due to incomplete sampling, the point estimates remain accurate but the confidence interval can become very broad.
Figure 4
Results from the analysis of the foot-and-mouth disease data sets. (A) A typical transmission tree sampled from the MCMC. Shown are infected farms (labeled pods), their latent periods (gray) and infectious periods (green), sampled viruses (red), and the phylogenetic tree connecting these viruses (black). The phylogenetic tree is contained within the transmission tree; due to the exponentially increasing within-host effective pathogen population size assumed, most coalescents occur early during an infection. (B) Posterior distribution for the mean latency period β1. Solid black line gives the median, and dashed lines give the 2.5th and 97.5th percentile. Blue line gives a previous estimate from the literature, and green line gives the estimate derived from the same data set in a previous study that ignored within-host genetic diversity. The estimate (solid black) is higher than we would expect from the literature (blue). The overestimation could be due to unobserved infected farms. Not allowing for within-host genetic diversity gives an overestimation (green). (C) Posterior distribution for the substitution rate μ. Solid black line gives the median, dashed lines give the 2.5th and 97.5th percentile, and blue line gives a previous estimate from the literature. The higher estimate we obtained could be due to an overly simplified within-host model.
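The remark in panel A that most coalescents occur early during an infection can be illustrated with a small simulation. This is a sketch under stated assumptions, not the model as specified in the paper: it assumes a within-host effective population size N(t) = n0 · exp(r · t), a pairwise coalescent rate of 1/N(t), and that two lineages that have not coalesced by the time of infection are merged at that time (a single-founder assumption). The function name and all parameter values are hypothetical.

```python
import math
import random


def pair_coalescence_time(sample_time, growth_rate, n0, rng):
    """Time since infection at which two sampled lineages coalesce.

    Assumes N(t) = n0 * exp(growth_rate * t), with t measured from infection,
    and a pairwise coalescent rate 1 / N(t). Both lineages are sampled at
    `sample_time`. All parameters are illustrative assumptions.
    """
    # Draw a unit-rate exponential and invert the cumulative hazard,
    # integrated backward in time from the sampling time:
    #   H(s) = exp(-r*T) * (exp(r*s) - 1) / (r * n0)
    e = rng.expovariate(1.0)
    r, T = growth_rate, sample_time
    backward = math.log(1.0 + e * r * n0 * math.exp(r * T)) / r
    # Assumption: lineages still separate at infection coalesce at infection.
    return max(0.0, T - backward)


rng = random.Random(0)
T = 10.0  # hypothetical sampling time, in arbitrary time units
times = [pair_coalescence_time(T, growth_rate=1.0, n0=1.0, rng=rng)
         for _ in range(10_000)]
# Fraction of coalescent events falling in the first half of the infection.
early_fraction = sum(t < T / 2 for t in times) / len(times)
```

Under these assumed parameters nearly all simulated coalescent events fall in the first half of the infection, because the coalescent rate 1/N(t) is highest when the population is small.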

