PLoS Comput Biol. 2013;9(5):e1003059.
doi: 10.1371/journal.pcbi.1003059. Epub 2013 May 2.

Detection of mixed infection from bacterial whole genome sequence data allows assessment of its role in Clostridium difficile transmission

David W Eyre et al. PLoS Comput Biol. 2013.

Abstract

Bacterial whole genome sequencing offers the prospect of rapid and high precision investigation of infectious disease outbreaks. Close genetic relationships between microorganisms isolated from different infected cases suggest transmission is a strong possibility, whereas transmission between cases with genetically distinct bacterial isolates can be excluded. However, undetected mixed infections (infection with ≥2 unrelated strains of the same species where only one is sequenced) potentially impair exclusion of transmission with certainty, and may therefore limit the utility of this technique. We investigated the problem by developing a computationally efficient method for detecting mixed infection without the need for resource-intensive independent sequencing of multiple bacterial colonies. Given the relatively low density of single nucleotide polymorphisms within bacterial sequence data, direct reconstruction of mixed infection haplotypes from current short-read sequence data is not consistently possible. We therefore use a two-step maximum likelihood-based approach, assuming each sample contains up to two infecting strains. We jointly estimate the proportion of the infection arising from the dominant and minor strains, and the sequence divergence between these strains. In cases where mixed infection is confirmed, the dominant and minor haplotypes are then matched to a database of previously sequenced local isolates. We demonstrate the performance of our algorithm with in silico and in vitro mixed infection experiments, and apply it to transmission of an important healthcare-associated pathogen, Clostridium difficile. Using hospital ward movement data in a previously described stochastic transmission model, 15 pairs of cases enriched for likely transmission events associated with mixed infection were selected. Our method identified four previously undetected mixed infections, and a previously undetected transmission event, but no direct transmission between the pairs of cases under investigation. These results demonstrate that mixed infections can be detected without additional sequencing effort, and this will be important in assessing the extent of cryptic transmission in our hospitals.
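
To make the joint estimation step concrete, here is a minimal illustrative sketch and not the authors' implementation. It assumes a simplified model that is not given in the abstract: at each candidate site the count of reads carrying the non-dominant base is binomial, at the sequencing error rate if the two strains agree there, or at roughly the minor-strain proportion m if they differ, with a fraction d of the supplied sites differing. The function names, the 1% error rate and the grid search are all assumptions chosen for illustration.

```python
import math

def log_binom_pmf(k, n, p):
    """Log probability of k successes in n trials with success rate p (0 < p < 1)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

def log_weighted_sum(log_a, log_b, w):
    """Numerically stable log(w * exp(log_a) + (1 - w) * exp(log_b)), 0 <= w <= 1."""
    terms = []
    if w > 0:
        terms.append(math.log(w) + log_a)
    if w < 1:
        terms.append(math.log(1 - w) + log_b)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def fit_mixture(sites, err=0.01, steps=100):
    """Grid-search maximum likelihood estimate of (m, d).

    sites: list of (minor_base_reads, total_reads) per candidate site.
    err:   assumed per-base sequencing error rate (hypothetical value).
    Returns (m_hat, d_hat, lrt), where lrt is -2 log likelihood ratio
    of the unmixed model (d = 0) against the best mixed model.
    """
    log_err = [log_binom_pmf(k, n, err) for k, n in sites]
    null_ll = sum(log_err)  # unmixed model: every site shows errors only
    best = (null_ll, 0.0, 0.0)
    for i in range(1, steps // 2 + 1):  # minor proportion m in (0, 0.5]
        m = i / steps
        p_var = m * (1 - err) + (1 - m) * err  # minor-base rate at a variant site
        log_var = [log_binom_pmf(k, n, p_var) for k, n in sites]
        for j in range(1, steps + 1):  # divergence d in (0, 1]
            d = j / steps
            ll = sum(log_weighted_sum(v, e, d) for v, e in zip(log_var, log_err))
            if ll > best[0]:
                best = (ll, m, d)
    ll, m_hat, d_hat = best
    return m_hat, d_hat, 2 * (ll - null_ll)
```

Under these assumptions, a sample in which 5 of 20 sites carry about 30% minor-base reads yields estimates near m = 0.3 and d = 0.25 together with a large likelihood ratio statistic, while a sample showing only error-level minor reads returns a statistic of zero. Note that d here is the fraction of the supplied sites that differ, and the ≥19.4 threshold quoted in the Figure 3 caption was calibrated for the authors' own statistic, so it would need recalibrating for any simplified model such as this one.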

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Epidemiological relationships between 15 potential donors and recipients of mixed infection transmission.
Potential donors are shown in grey, and potential recipients in black. Time on a hospital ward around the time of diagnosis is shown as a horizontal line bounded by short vertical lines. Each hospital/hospital area is given a distinct letter, each ward a number, and groups of similar wards are given the same number followed by a lower case letter. Positive samples for C. difficile are shown as crosses.
Figure 2. In vitro simulated mixed infections.
Panel A shows the estimated mixture proportion for 3 input DNA mixture proportions. For ease of visualisation, individual data points have different x-axis values, but correspond to the 3 x-axis values indicated by the grey background. Points obtained from mixes of two differing STs are shown in red, and points from mixing two isolates of the same ST in blue. The large confidence interval for the leftmost sample in each red group arises because that sample has only a single variant site between the two input sequences; when that site is excluded in bootstrap sampling, the sample appears unmixed. Panel B shows the estimated divergence between sequences for differing input divergence and mixture proportions. The leftmost group of 3 points represents a sample with a single variant site; when that site is excluded in bootstrap sampling, the sample appears unmixed and estimates of d are unstable between 0 and 1.
Figure 3. Reads mapped and estimated mixture proportion for possible mixed-ST clinical infections, across two alignment programs.
Points in orange show evidence of contamination with other bacteria (i.e. <80% of reads mapped to the reference genome); other points are shown in blue. Mixed infections detected using a −2 log likelihood ratio statistic threshold of ≥19.4 (as defined in the calibration samples) are shown as filled circles; other points are shown as crosses. Panel A shows the data obtained from alignments generated using Stampy. Stampy is designed to perform well with relatively high sequence variation relative to the reference, in particular insertions or deletions. In the more contaminated samples, we observed markedly divergent reads from other species mapped to the highly conserved MLST loci, resulting in false identification of mixed infections. Panel B shows the data obtained from alignments generated with Burrows Wheeler Aligner. Two mixed infections were detected.
Figure 4. Phylogenetic and epidemiological relationships between cases related to a detected mixed infection.
Panel A shows a depiction of the 2 mixed infections identified in donor A and recipient A. A transmission event from donor A to recipient A was predicted by a stochastic transmission model based on ward admission data. However, donor A and recipient A had differing multilocus sequence types (STs) on initial testing of a single isolate from each case, suggesting a possible undetected mixed infection. Using the mixed infection estimator, a minor ST infection was found in recipient A with the same ST (ST1) as donor A. However, applying the estimator to variable sites within ST1, the minor sequence in recipient A was most likely to have arisen from another case, donor B, shown in blue. Panel B shows the epidemiological relationships between donor A, recipient A, donor B and cases sharing similar sequences. Ward stays are shown as horizontal lines and positive tests as crosses. Panel C shows a phylogenetic tree of 45 distinct whole genome sequences from Oxfordshire patients with ST1 Clostridium difficile infection. The tree is a maximum likelihood tree based on the 79 variable sites identified, drawn using PhyML. The donor proposed by the transmission model is shown in grey (donor A). The minor sequence in recipient A is shown in black, matching the sequence found in donor B, in blue. Recipient B shared an identical sequence to recipient A. Recipients C and D are two cases phylogenetically descended from the donor B, recipient A, recipient B sequences. Note that only donor A and recipient A were analysed for the presence of mixed infection.
