PLoS Pathog. 2018 Feb 8;14(2):e1006885.
doi: 10.1371/journal.ppat.1006885. eCollection 2018 Feb.

When are pathogen genome sequences informative of transmission events?


Finlay Campbell et al. PLoS Pathog.

Abstract

Recent years have seen the development of numerous methodologies for reconstructing transmission trees in infectious disease outbreaks from densely sampled whole genome sequence data. However, a fundamental and as yet poorly addressed limitation of such approaches is the requirement for genetic diversity to arise on epidemiological timescales. Specifically, the position of infected individuals in a transmission tree can only be resolved by genetic data if mutations have accumulated between the sampled pathogen genomes. To quantify and compare the useful genetic diversity expected in different pathogen outbreaks, we introduce here the concept of 'transmission divergence', defined as the number of mutations separating whole genome sequences sampled from transmission pairs. Using parameter values obtained by literature review, we simulate outbreak scenarios alongside sequence evolution under two models from the literature to characterise the transmission divergence of ten major outbreak-causing pathogens. We find that while mean values vary significantly between the pathogens considered, their transmission divergence is generally very low, with many outbreaks characterised by large numbers of genetically identical transmission pairs. We describe the impact of transmission divergence on our ability to reconstruct outbreaks using two outbreak reconstruction tools, the R packages outbreaker and phybreak, and demonstrate that, in agreement with previous observations, genetic sequence data of rapidly evolving pathogens such as RNA viruses can provide valuable information on individual transmission events. Conversely, sequence data of pathogens with lower mean transmission divergence, including Streptococcus pneumoniae, Shigella sonnei and Clostridium difficile, provide little to no information about individual transmission events. Our results highlight the informational limitations of genetic sequence data in certain outbreak scenarios, and demonstrate the need to extend outbreak reconstruction tools to integrate other types of epidemiological data.
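To make the definition of transmission divergence concrete, the Python sketch below draws the number of mutations separating the sampled genomes of simulated transmission pairs. It is not the outbreaker or phybreak model used in the study: it assumes a strict molecular clock, no within-host diversity, and a gamma-distributed evolutionary time separating the two samples of each pair. The names `simulate_divergence` and `summarise`, and all parameter values, are hypothetical placeholders rather than values from the literature review.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw via Knuth's method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_divergence(
    n_pairs: int,
    subs_per_site_per_year: float,
    genome_length: int,
    mean_separation_days: float,
    shape: float = 2.0,
    seed: int = 1,
) -> list[int]:
    """Mutations separating the two sampled genomes of each transmission pair.

    Hypothetical model: strict clock, no within-host diversity, and a
    gamma-distributed evolutionary time (in days) between the two samples.
    """
    rng = random.Random(seed)
    # Expected substitutions per day across the whole genome.
    per_day = subs_per_site_per_year * genome_length / 365.25
    divergence = []
    for _ in range(n_pairs):
        # gammavariate(alpha, beta) has mean alpha * beta.
        t = rng.gammavariate(shape, mean_separation_days / shape)
        divergence.append(poisson(rng, per_day * t))
    return divergence


def summarise(divergence: list[int]) -> dict[str, float]:
    """Mean transmission divergence and share of genetically identical pairs."""
    n = len(divergence)
    return {
        "mean": sum(divergence) / n,
        "prop_identical": sum(d == 0 for d in divergence) / n,
    }


# Placeholder parameters: a slow clock on a long genome versus a fast clock on a short one.
slow = simulate_divergence(2000, 1e-7, 2_000_000, 30.0)
fast = simulate_divergence(2000, 1e-3, 20_000, 15.0)
print(summarise(slow), summarise(fast))
```

With these placeholder rates, the slow-clock scenario produces almost exclusively genetically identical pairs, while the fast-clock scenario produces a much smaller share of them, which is the qualitative contrast the abstract describes.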


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Distributions of simulated transmission divergence values for different pathogens using the outbreaker and phybreak models.
A) Transmission divergence is defined as the number of mutations separating pathogen whole genome sequences (WGS) sampled from transmission pairs. Horizontal bars indicate the proportion of transmission pairs separated by a given number of mutations, across 100 outbreak simulations per pathogen. Outbreaks were simulated using both the outbreaker and phybreak models. B) For each simulated outbreak, we calculated the proportion of sequences that were unique. Black circles represent the empirically observed proportion of unique sequences in individual outbreaks (S6 Table), with circle size scaled by outbreak size. The grey circle in the EBOV column represents the weighted mean across the four outbreaks. The violin plots with dotted outlines in the K. pneumoniae column were generated using the empirical serial interval of 25.8 days observed over the course of the outbreak described by Snitkin et al. [106], which differs significantly from the value of 62.7 days obtained in our literature review.
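Panel B summarises each outbreak by its proportion of unique sequences. The Python sketch below shows one way to compute that statistic and a weighted mean across outbreaks. Two readings are assumptions rather than details given in the caption: a sequence is counted as unique only if no other sampled sequence is identical to it, and the weights of the mean are outbreak sizes. The function names are hypothetical.

```python
from collections import Counter


def proportion_unique(sequences: list[str]) -> float:
    """Share of sampled sequences that are identical to no other sampled sequence.

    Assumption: 'unique' means not shared with any other case.
    """
    counts = Counter(sequences)
    return sum(1 for s in sequences if counts[s] == 1) / len(sequences)


def size_weighted_mean(values: list[float], sizes: list[int]) -> float:
    """Mean of per-outbreak values weighted by outbreak size (assumed weighting)."""
    return sum(v * n for v, n in zip(values, sizes)) / sum(sizes)


print(proportion_unique(["ACGT", "ACGT", "ACGA", "TCGT"]))  # 0.5
print(size_weighted_mean([0.2, 0.6], [10, 30]))
```

Counting distinct haplotypes instead (here 3 of 4) would give a different value, so the definition should be checked against the paper's methods before reuse.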
Fig 2. Impact of transmission divergence on outbreak reconstruction.
Transmission divergence is defined as the number of mutations separating pathogen WGS sampled from transmission pairs. A) Change in accuracy of outbreak reconstruction. Accuracy of outbreak reconstruction is defined as the proportion of correctly assigned ancestries in the consensus transmission tree, itself defined as the tree with the most frequent posterior infector for each infectee. Coloured points represent individual simulated outbreaks. The solid black line represents the fitted relationship of the form i - i*exp(-a*K), where K is the transmission divergence and a and i are the fitted parameters. Dotted black lines represent the corresponding 95% prediction interval. B) Change in posterior entropy. Posterior entropy is related to the number of plausible posterior infectors for a given case; lower average entropy indicates greater statistical confidence in the proposed transmission tree. The solid black line represents the fitted relationship of the form i*exp(-a*K) - i, where K is the transmission divergence and a and i are the fitted parameters.
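The accuracy and entropy measures defined in this caption can be computed from posterior samples of the transmission tree. In the Python sketch below, each sample is assumed (hypothetically) to be a dictionary mapping every case to its sampled infector, with `None` for an index case, and posterior entropy is taken to be the Shannon entropy, in nats, of each case's posterior infector distribution, averaged over cases. This illustrates the definitions only; it is not output-handling code for outbreaker or phybreak, and all names are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Hashable
from typing import Optional

Tree = dict[Hashable, Optional[Hashable]]  # infectee -> infector (None = index case)


def consensus_tree(samples: list[Tree]) -> Tree:
    """Most frequent posterior infector for each infectee (ties go to the first seen)."""
    cases = samples[0].keys()
    return {c: Counter(s[c] for s in samples).most_common(1)[0][0] for c in cases}


def accuracy(consensus: Tree, truth: Tree) -> float:
    """Proportion of cases whose consensus infector matches the true infector."""
    return sum(consensus[c] == truth[c] for c in truth) / len(truth)


def mean_posterior_entropy(samples: list[Tree]) -> float:
    """Shannon entropy (nats) of each case's posterior infectors, averaged over cases."""
    cases = samples[0].keys()
    n = len(samples)
    total = 0.0
    for c in cases:
        counts = Counter(s[c] for s in samples)
        total += -sum((k / n) * math.log(k / n) for k in counts.values())
    return total / len(cases)


truth = {"A": None, "B": "A", "C": "A", "D": "B"}
samples = [
    {"A": None, "B": "A", "C": "A", "D": "B"},
    {"A": None, "B": "A", "C": "B", "D": "B"},
    {"A": None, "B": "A", "C": "A", "D": "C"},
    {"A": None, "B": "A", "C": "A", "D": "B"},
]
cons = consensus_tree(samples)
print(cons, accuracy(cons, truth), mean_posterior_entropy(samples))
```

Here the consensus tree recovers every true infector, yet cases C and D still carry posterior uncertainty, so accuracy and entropy capture different aspects of reconstruction quality.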
Fig 3. Impact of the proportion of unique sequences on outbreak reconstruction.
A) Change in accuracy of outbreak reconstruction. Accuracy of outbreak reconstruction is defined as the proportion of correctly assigned ancestries in the consensus transmission tree, itself defined as the tree with the most frequent posterior infector for each infectee. Coloured points represent individual simulated outbreaks. The solid black line represents the fitted linear model, the dotted black lines the 95% prediction interval. B) Change in posterior entropy. Posterior entropy is related to the number of plausible posterior infectors for a given case. Lower average entropy indicates greater statistical confidence in the proposed transmission tree. The solid black line represents the fitted linear model, the dotted black lines the 95% prediction interval.
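The linear fits in this figure relate each simulated outbreak's proportion of unique sequences to its change in reconstruction accuracy or entropy. The Python sketch below fits an ordinary least-squares line and returns a 95% prediction interval for a new outbreak. As a simplification it uses the normal quantile (about 1.96) instead of the Student t quantile with n - 2 degrees of freedom, which is reasonable only when many outbreaks are fitted; the function name and example data are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def linear_fit_with_pi(xs: list[float], ys: list[float], level: float = 0.95):
    """OLS fit y = b0 + b1*x plus a function returning (lower, fitted, upper).

    Uses a normal quantile in place of the Student t quantile (large-n approximation).
    """
    n = len(xs)
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)

    def interval(x0: float) -> tuple[float, float, float]:
        yhat = b0 + b1 * x0
        # Prediction (not confidence) interval: includes the residual variance term "1".
        half = z * s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
        return yhat - half, yhat, yhat + half

    return b0, b1, interval


xs = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical proportions of unique sequences
ys = [0.12, 0.28, 0.51, 0.69, 0.91]  # hypothetical changes in accuracy
b0, b1, interval = linear_fit_with_pi(xs, ys)
print(b0, b1, interval(0.5))
```

The interval widens as x0 moves away from the mean of the fitted values because of the (x0 - xbar)^2 / Sxx term.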

References

    Ferguson NM, Donnelly CA, Anderson RM. Transmission intensity and impact of control policies on the foot and mouth epidemic in Great Britain. Nature. 2001;413: 542–548. doi: 10.1038/35097116
    Wallinga J, Teunis P. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. Am J Epidemiol. 2004;160: 509–516. doi: 10.1093/aje/kwh255
    Spada E, Sagliocca L, Sourdis J, Garbuglia AR, Poggi V, De Fusco C, et al. Use of the minimum spanning tree model for molecular epidemiological investigation of a nosocomial outbreak of hepatitis C virus infection. J Clin Microbiol. 2004;42: 4230–4236. doi: 10.1128/JCM.42.9.4230-4236.2004
    Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the effect of individual variation on disease emergence. Nature. 2005;438: 355–359. doi: 10.1038/nature04153
    Jombart T, Cori A, Didelot X, Cauchemez S, Fraser C, Ferguson N. Bayesian Reconstruction of Disease Outbreaks by Combining Epidemiologic and Genomic Data. PLoS Comput Biol. 2014;10. doi: 10.1371/journal.pcbi.1003457
