Review
Trends Microbiol. 2014 May;22(5):282-91.
doi: 10.1016/j.tim.2014.02.011. Epub 2014 Mar 22.

Supersize me: how whole-genome sequencing and big data are transforming epidemiology

Rowland R Kao et al. Trends Microbiol. 2014 May.

Abstract

In epidemiology, the identification of 'who infected whom' allows us to quantify key characteristics such as incubation periods, heterogeneity in transmission rates, duration of infectiousness, and the existence of high-risk groups. Although this identification is invaluable, the existence of many plausible infection pathways makes it difficult, and makes epidemiological contact tracing either uncertain, logistically prohibitive, or both. The recent advent of next-generation sequencing technology allows the identification of traceable differences in the pathogen genome that are transforming our ability to understand high-resolution disease transmission, sometimes even down to the host-to-host scale. We review recent examples of the use of pathogen whole-genome sequencing for the purpose of forensic tracing of transmission pathways, focusing on the particular problems where evolutionary dynamics must be supplemented by epidemiological information on the most likely timing of events as well as possible transmission pathways. We also discuss potential pitfalls in the over-interpretation of these data, and highlight the manner in which a confluence of this technology with sophisticated mathematical and statistical approaches has the potential to produce a paradigm shift in our understanding of infectious disease transmission and control.

Keywords: Bayesian inference; Forensic epidemiology; Mathematical modeling; Pathogen evolution; Who-infected-whom?.

Figures

Figure I
A single observed genealogy is consistent with multiple observed transmission processes.
Figure 1
Identifying ‘who infected whom’ often requires more detailed contact information than is needed at the scales most amenable to phylogeographic approaches (Great Britain scale map, left). At finer scales (right), there are two types of information: genetic information from sampled pathogens (where samples are indicated by red circles) provides direct insight into the transmission network indicated by the red arrows, whereas the possibly bidirectional purple arrows represent the social network (or denominator data). Both help to reconstruct the true transmission tree (red arrows), but deviate from it in different ways. The social network may contain many links that do not cause transmission. By contrast, the transmitted genotypes (blue circles) are indicative of the transmission tree but, especially when mutation rates are low compared to generation times, may lack informative single-nucleotide polymorphisms (SNPs; where the filled circles represent at least one additional mutation, but the open blue circle in C indicates a type identical to what is found in B). A pooled sample from D (broken oval encompassing samples from two lineages) could generate a consensus sequence that is not representative of either transmitting lineage, but these could be recovered by the existence of two divergent sequences that could be identified via deep sequencing.
Figure 2
Phylodynamic reconstruction of a foot-and-mouth disease (FMD) epidemic. (A) Identified likelihood that a particular infected premises was the source of another infected premises based on a space–time–genetic model. Circle size is proportional to the relative likelihood of that event. (B) Spatial relationships among premises in the dataset. Reproduced from , with permission of the corresponding author.
Figure 3
Biased sampling for multi-host systems causes problems for interpretation of genetic data even where the density of samples in one host is very high. The trees in the figure depict phylogenies of a pathogen in a two-host system; circles represent sampled sequences from the red or blue species. (A) For low mixing, random sampling reveals the relationship between the two host species but has a high probability of missing rare crossover events (red star). By contrast, dense sampling of one host (in red) will miss the existence of the second host species unless the crossover event is sampled, in which case the long branch length associated with it is instructive. In (B), the distribution of branch lengths under biased sampling reveals the presence of unsampled events, although the nature of those events would not be determined by phylogenies alone. In (C), where mixing is substantial, the absence of data from the hidden host is likely to go unnoticed or to be interpreted as greater variability in the mutation rate. The hidden host would be quickly revealed by even moderate sampling, although the phylogeny would remain difficult to distinguish from the case of a spillover host. The trees were created and displayed using a custom R script; random trees were created with the ape package, and a two-host discrete traits model was used with the package phytools to generate the ancestral and tip states.
Figure I
Pathogen emergence or spillover? The figure represents the infection of horses by an avian pathogen. In (A), black arrows represent an avian virus that is introduced multiple times (spillover) but cannot be transmitted among horses, whereas (B) represents a single introduction event followed by onward transmission. (C) Represents the circulation of two distinct lineages in both species that share a closely related ancestor. If differences between the lineages are minimal (such as a single nucleotide polymorphism), whole-genome sequencing (WGS) would be invaluable because otherwise there is likely to be no other detectable difference between the viruses in the two hosts.
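
The Figure 1 caption notes that transmitted genotypes may lack informative SNPs when mutation rates are low compared to generation times, as with the type in C that is identical to the one found in B. A minimal Python sketch of that point follows. The four toy sequences and the helper names `snp_distance` and `pairwise_snp_distances` are hypothetical illustrations, not data or methods from the article.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def pairwise_snp_distances(seqs: dict) -> dict:
    """Return the SNP distance for every unordered pair of sample labels."""
    return {
        (x, y): snp_distance(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


# Hypothetical toy consensus sequences for hosts A-D (assumed for illustration).
samples = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",  # one SNP relative to A
    "C": "ACGTACGTAT",  # identical to B: no informative SNP
    "D": "ACGAACGTAT",  # one further SNP relative to B and C
}

distances = pairwise_snp_distances(samples)
for pair, d in distances.items():
    print(pair, d)
```

In this toy case the distance between B and C is zero, so the sequences alone cannot say which of the two infected the other, or whether both were infected by A. Ordering such pairs would need the contact and timing information that the abstract describes as a necessary supplement to the genetic data.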
