PLoS Comput Biol. 2016 Sep 28;12(9):e1005130.
doi: 10.1371/journal.pcbi.1005130. eCollection 2016 Sep.

SCOTTI: Efficient Reconstruction of Transmission within Outbreaks with the Structured Coalescent

Nicola De Maio et al. PLoS Comput Biol. 2016.

Abstract

Exploiting pathogen genomes to reconstruct transmission represents a powerful tool in the fight against infectious disease. However, their interpretation rests on a number of simplifying assumptions that regularly ignore important complexities of real data, in particular within-host evolution and non-sampled patients. Here we propose a new approach to transmission inference called SCOTTI (Structured COalescent Transmission Tree Inference). This method is based on a statistical framework that models each host as a distinct population, and transmissions between hosts as migration events. Our computationally efficient implementation of this model enables the inference of host-to-host transmission while accommodating within-host evolution and non-sampled hosts. SCOTTI is distributed as an open source package for the phylogenetic software BEAST2. We show that SCOTTI can generally infer transmission events even in the presence of considerable within-host variation, can account for the uncertainty associated with the possible presence of non-sampled hosts, and can efficiently use data from multiple samples of the same host, although there is some reduction in accuracy when samples are collected very close to the infection time. We illustrate the features of our approach by investigating transmission from genetic and epidemiological data in a Foot and Mouth Disease Virus (FMDV) veterinary outbreak in England and a Klebsiella pneumoniae outbreak in a Nepali neonatal unit. Transmission histories inferred with SCOTTI will be important in devising effective measures to prevent and halt transmission.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Examples of transmission complexities.
Reconstruction of transmission can be hindered by several complexities causing disagreement between the actual transmission history and the phylogeny of the sampled pathogen. Here we show four examples of these complexities: A) Within-host evolution (similar to incomplete lineage sorting, can happen even with strong transmission bottlenecks), B) Incomplete transmission bottlenecks (or large transmission inocula) and within-host evolution, C) Non-sampled hosts (such as unknown or asymptomatic hosts), D) Multiple infections of the same host (or mixed infections). Different hosts (named H1, H2, and H3) are represented as black rectangles, and the rectangle with a dashed border represents a non-sampled host (a host for which no pathogen sample has been collected and sequenced, and for which there is no exposure time information). The top and bottom edge of each rectangle indicate the introduction and removal times, that is, the beginning and the end of the time interval within which a host is either infective or can be infected (e.g., arrival and departure time from the contaminated ward). Red dots represent pathogen sequence samples (respectively S1, S2, and S3), and red lines are lineages of the pathogen phylogeny. Blue tubes represent transmission/bottleneck events, where the contained lineages are transferred between hosts. Below each “nested” tree plot (representing phylogeny and transmission tree simultaneously, see Fig A in S1 Text), the corresponding transmission history is represented with black “beanbags”, and, in red, the phylogenetic tree of the sequences.
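The introduction and removal times described in this caption can be read as a simple feasibility constraint: a transmission at time t is compatible with the exposure data only if both donor and recipient are inside their exposure intervals at t. The following is a minimal Python sketch of that reading, not SCOTTI's implementation; the names `Host`, `exposed_at`, and `transmission_possible` are hypothetical, and time is assumed to run forward as an increasing number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """A host with an exposure interval (hypothetical illustration type)."""
    name: str
    introduction: float  # time the host can first be infected or infective
    removal: float       # time after which the host can no longer be either

    def exposed_at(self, t: float) -> bool:
        # True when t lies inside the closed exposure interval.
        return self.introduction <= t <= self.removal


def transmission_possible(donor: Host, recipient: Host, t: float) -> bool:
    """A transmission at time t needs both hosts inside their intervals."""
    return donor.exposed_at(t) and recipient.exposed_at(t)


h1 = Host("H1", introduction=0.0, removal=10.0)
h2 = Host("H2", introduction=5.0, removal=15.0)
print(transmission_possible(h1, h2, 7.0))   # both intervals contain t = 7
print(transmission_possible(h1, h2, 12.0))  # H1 has already been removed
```

A non-sampled host, for which the caption states no exposure time information is available, would not be constrained by such a check.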
Fig 2. Graphical representation of models of transmission and evolution.
In the present work we consider three different models of pathogen evolution within an outbreak: A) The multispecies coalescent model with transmission bottlenecks, used for simulations, B) The structured coalescent (SCOTTI) model used for inference, C) The Outbreaker model also used for inference. The pictures highlight some key parameters and features of the models. Different hosts (H1, H2, H3, and H4) are represented as black rectangles. The top and bottom edges of each rectangle are the introduction and removal times of the respective hosts in A and B. The hosts with a dashed border are non-sampled. Red dots represent samples (only one per host allowed by Outbreaker), and red vertical lines are lineages of the phylogeny. Smaller black dots represent coalescent events. Red arrows are transmissions/migrations in B and C. Blue tubes are transmissions with bottlenecks in A, and transmitted lineages are contained within them. In A, a transmission bottleneck from host H1 to H2 causes two lineages in H2 to coalesce (find a common ancestor backwards in time) at the time of transmission. This does not happen at the transmission from H3 to H4, where the two lineages in H4 do not coalesce (incomplete bottleneck) and are both passed from H3 to H4 in a single transmission event.
Fig 3. Accuracy of SCOTTI vs. Outbreaker in the base simulation scenario.
In our base simulation setting, SCOTTI has higher accuracy than Outbreaker, in particular when provided with multiple samples per host. The coloured “Maypole” tree (see Fig A in S1 Text) represents the first transmission history used for simulations, with one colour associated with each host, internal nodes corresponding to infection events and times, and tips representing infection clearance times. The pie charts refer to the accuracy of transmission estimation in the base scenario with strong bottleneck. The coloured slice in each pie chart is the proportion of replicates (out of a total of 100) for which the origin of transmission was correctly inferred. Pie charts are plotted below the branch corresponding to the transmission they refer to, while the pie charts for the index host K are plotted next to the root.
Fig 4. Summary of transmission inference accuracy.
SCOTTI shows higher accuracy than Outbreaker in all scenarios except with early sampling, while Outbreaker credible sets are poorly calibrated. Pathogen sequence evolution was simulated under transmission history 1, used in A and C, and transmission history 2, used in B and D. In A and B bars represent proportions, expressed as percentages, of correct inferences of transmission origin (i.e. donor host) over 100 replicates and all transmission events for each method (differentiated by colour as in legend). On the X axis are different simulation scenarios. In C and D bars represent average posterior supports, again expressed as percentages, for the correct sources over all patients and replicates. In E and F bars represent proportions (expressed as percentages) of 95% posterior credible sets that contain the simulated (true) origin. The 95% posterior credible set for a host is the minimum set of origins with cumulative probability ≥95%, and such that all origins in the set have higher posterior probability than all origins outside of it.
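The credible-set definition in this caption can be illustrated with a short Python sketch. It assumes the posterior for one host is given as a mapping from candidate origin to probability; the function names `credible_set` and `coverage` and the example probabilities are hypothetical and are not taken from the SCOTTI or Outbreaker code. Origins are added in decreasing order of posterior probability until the cumulative probability reaches the level, which yields the smallest set satisfying the caption's ordering condition (ties between origins are broken arbitrarily here).

```python
def credible_set(posterior, level=0.95):
    """Smallest set of highest-probability origins with cumulative mass >= level."""
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
    chosen = set()
    total = 0.0
    for origin, prob in ranked:
        # Small tolerance guards against floating-point rounding in the sum.
        if total >= level - 1e-12:
            break
        chosen.add(origin)
        total += prob
    return chosen


def coverage(posteriors, true_origins, level=0.95):
    """Proportion of credible sets that contain the true (simulated) origin."""
    hits = sum(
        true in credible_set(post, level)
        for post, true in zip(posteriors, true_origins)
    )
    return hits / len(true_origins)


# Made-up posteriors for two hosts; "NS" stands for a non-sampled origin.
example = [
    {"H1": 0.70, "H2": 0.20, "H3": 0.06, "NS": 0.04},
    {"H1": 0.97, "H2": 0.03},
]
print(credible_set(example[0]))
print(coverage(example, ["H3", "H2"]))
```

For a well-calibrated method, the value returned by `coverage` over many replicates would be expected to be close to the chosen level.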
Fig 5. Reconstruction of transmission events in an FMDV outbreak.
Outbreaker (A) and SCOTTI (B) provide different interpretations of the 2007 South of England FMDV outbreak. A) “Beanbag” tree (see Fig A in S1 Text) of transmission events inferred with Outbreaker. The two numbers on each transmission arrow represent, respectively, the number of nucleotide substitutions separating two hosts and the inferred posterior support of the event (in this case always 1, meaning 100% support). All transmissions are inferred to be direct with more than 95% posterior probability. B) “Beanbag” tree of transmission events inferred with SCOTTI. Numbers within host circles represent the posterior probabilities of the corresponding host being the index host (the root) of the considered outbreak. Numbers on arrows represent the inferred posterior probabilities of the corresponding direct transmission events. Colour intensity is proportional to posterior probability.
Fig 6. Reconstruction of transmission events in a K. pneumoniae outbreak.
Outbreaker (A) and SCOTTI (B and C) provide different interpretations of the K. pneumoniae outbreak. A) “Beanbag” tree of transmission events inferred with Outbreaker. Each circle represents a host, with “PMK” removed from their names. The number on each transmission arrow represents the inferred posterior probability of the event. All arrows represent direct transmissions (without intermediate non-sampled hosts, with more than 85% support) except the one from PMK9 to PMK10, which is inferred to be through at least one intermediate host. B) “Beanbag” tree of transmission events inferred with SCOTTI. Numbers on arrows represent the inferred posterior probabilities of the corresponding direct transmission events. Colour intensity is proportional to posterior support. C) “Maypole” maximum clade credibility tree (see Fig A in S1 Text) inferred with SCOTTI, annotated and coloured with the highest posterior probability hosts for internal nodes. “NS” represents all non-sampled hosts. Branch width indicates the posterior probability of the inferred host at the node at the right end of the considered branch. Branches are annotated with 95% posterior intervals of the number of transmissions. For non-annotated branches, the interval is [0, 1].
