PLoS Comput Biol. 2013;9(5):e1003059.

doi: 10.1371/journal.pcbi.1003059. Epub 2013 May 2.

Detection of mixed infection from bacterial whole genome sequence data allows assessment of its role in Clostridium difficile transmission

David W Eyre¹, Madeleine L Cule, David Griffiths, Derrick W Crook, Tim E A Peto, A Sarah Walker, Daniel J Wilson

PMID: 23658511
PMCID: PMC3642043
DOI: 10.1371/journal.pcbi.1003059

David W Eyre et al. PLoS Comput Biol. 2013.

Affiliation

¹ Nuffield Department of Clinical Medicine, University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom. david.eyre@ndm.ox.ac.uk

Abstract

Bacterial whole genome sequencing offers the prospect of rapid, high-precision investigation of infectious disease outbreaks. Close genetic relationships between microorganisms isolated from different infected cases suggest that transmission is a strong possibility, whereas transmission between cases with genetically distinct bacterial isolates can be excluded. However, undetected mixed infections (infection with ≥2 unrelated strains of the same species where only one is sequenced) potentially impair the exclusion of transmission with certainty, and may therefore limit the utility of this technique. We investigated the problem by developing a computationally efficient method for detecting mixed infection without the need for resource-intensive independent sequencing of multiple bacterial colonies. Given the relatively low density of single nucleotide polymorphisms within bacterial sequence data, direct reconstruction of mixed infection haplotypes from current short-read sequence data is not consistently possible. We therefore use a two-step maximum likelihood-based approach, assuming each sample contains up to two infecting strains. We jointly estimate the proportion of the infection arising from the dominant and minor strains, and the sequence divergence between these strains. In cases where mixed infection is confirmed, the dominant and minor haplotypes are then matched to a database of previously sequenced local isolates. We demonstrate the performance of our algorithm with in silico and in vitro mixed infection experiments, and apply it to transmission of an important healthcare-associated pathogen, Clostridium difficile. Using hospital ward movement data in a previously described stochastic transmission model, we selected 15 pairs of cases enriched for likely transmission events associated with mixed infection.
Our method identified four previously undetected mixed infections, and a previously undetected transmission event, but no direct transmission between the pairs of cases under investigation. These results demonstrate that mixed infections can be detected without additional sequencing effort, and this will be important in assessing the extent of cryptic transmission in our hospitals.
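The abstract describes jointly estimating the mixture proportion and the divergence between two strains by maximum likelihood. The sketch below illustrates that idea under a deliberately simplified model, which is an assumption and not the authors' published likelihood. Each variable site supplies a (major, minor) pair of read counts for its two most common bases. With probability d the strains differ at that site, and minor-base reads then occur at rate q = p(1 - e) + (1 - p)e, where p is the minor-strain proportion. Otherwise, minor-base reads arise only from a per-read error rate e. The error rate of 0.01, the grid search and the names `site_loglik` and `fit_mixture` are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical per-read sequencing error rate (assumption, not taken from the paper).
ERROR_RATE = 0.01


def log_binom_pmf(k: int, n: int, prob: float) -> float:
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(prob) + (n - k) * math.log(1.0 - prob))


def site_loglik(major: int, minor: int, p: float, d: float, e: float = ERROR_RATE) -> float:
    """Log-likelihood of one site's (major, minor) read counts under a two-strain mixture.

    With probability d the strains differ here, so minor-base reads occur at
    rate q = p(1 - e) + (1 - p)e; otherwise they arise only from error rate e.
    """
    n = major + minor
    q = p * (1.0 - e) + (1.0 - p) * e
    terms = []
    if d < 1.0:
        terms.append(math.log(1.0 - d) + log_binom_pmf(minor, n, e))
    if d > 0.0:
        terms.append(math.log(d) + log_binom_pmf(minor, n, q))
    top = max(terms)  # log-sum-exp keeps tiny probabilities numerically stable
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fit_mixture(sites: List[Tuple[int, int]], steps: int = 50) -> Tuple[float, float, float]:
    """Grid-search maximum likelihood estimates, returned as (log-lik, p, d).

    p is the minor-strain proportion in (0, 0.5]; d is the divergence in [0, 1].
    d = 0 corresponds to an unmixed (single-strain) sample.
    """
    best = (-math.inf, 0.0, 0.0)
    for i in range(1, steps + 1):
        p = 0.5 * i / steps
        for j in range(steps + 1):
            d = j / steps
            ll = sum(site_loglik(major, minor, p, d) for major, minor in sites)
            if ll > best[0]:
                best = (ll, p, d)
    return best
```

For example, `fit_mixture([(80, 20)] * 6 + [(100, 0)] * 14)` should place d at about 0.3 (6 of 20 sites look divergent) and p a little below 0.2, while 20 sites of `(100, 0)` give d = 0. A grid search keeps the sketch dependency-free; a numerical optimiser would be the usual choice for real data.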

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. Epidemiological relationships between 15 potential donors and recipients of mixed infection transmission.**
Potential donors are shown in grey, and potential recipients in black. Time on a hospital ward around the time of diagnosis is shown as a horizontal line bounded by short vertical lines. Each hospital/hospital area is given a distinct letter, each ward a number, and groups of similar wards are given the same number followed by a lower case letter. Positive samples for *C. difficile* are shown as crosses.

**Figure 2. *In vitro* simulated mixed infections.**
Panel A shows the estimated mixture proportion for 3 input DNA mixture proportions. For ease of visualisation, individual data points are offset along the x-axis, but correspond to the 3 x-axis values indicated by the grey background. Points obtained from mixing isolates of two differing STs are shown in red, and points from mixing two isolates of the same ST in blue. The large confidence interval for the leftmost point in each red group comes from a sample with only a single variant site between the two input sequences; when that site is excluded in bootstrap sampling, the sample appears unmixed. Panel B shows the estimated divergence between sequences for differing input divergences and mixture proportions. The leftmost group of 3 points represents a sample with a single variant site; when that site is excluded in bootstrap sampling, the sample appears unmixed and estimates of d become unstable, ranging between 0 and 1.
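The Figure 2 caption notes that a sample with a single variant site can appear unmixed when that site is left out of a bootstrap resample. A minimal sketch of a site-level bootstrap makes the mechanism concrete. It uses a simplified two-strain likelihood as a hypothetical stand-in for the authors' model (per-read error rate 0.01, grid search over the minor-strain proportion p and divergence d), and `bootstrap_estimates` and `percentile_interval` are illustrative names. Sites are resampled with replacement, and identical sites are grouped with `collections.Counter` so each distinct count pair is scored only once per grid point.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

ERROR_RATE = 0.01  # hypothetical per-read error rate (assumption)


def log_binom_pmf(k: int, n: int, prob: float) -> float:
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(prob) + (n - k) * math.log(1.0 - prob))


def site_loglik(major: int, minor: int, p: float, d: float, e: float = ERROR_RATE) -> float:
    """Mixture of 'strains identical here' (rate e) and 'strains differ' (rate q)."""
    n = major + minor
    q = p * (1.0 - e) + (1.0 - p) * e
    terms = []
    if d < 1.0:
        terms.append(math.log(1.0 - d) + log_binom_pmf(minor, n, e))
    if d > 0.0:
        terms.append(math.log(d) + log_binom_pmf(minor, n, q))
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fit_mixture(site_counts: Counter, steps: int = 20) -> Tuple[float, float]:
    """Grid-search MLE of (p, d) over a Counter mapping (major, minor) -> multiplicity."""
    best = (-math.inf, 0.0, 0.0)
    for i in range(1, steps + 1):
        p = 0.5 * i / steps
        for j in range(steps + 1):
            d = j / steps
            ll = sum(mult * site_loglik(major, minor, p, d)
                     for (major, minor), mult in site_counts.items())
            if ll > best[0]:
                best = (ll, p, d)
    return best[1], best[2]


def bootstrap_estimates(sites: List[Tuple[int, int]], replicates: int = 100,
                        seed: int = 1) -> List[Tuple[float, float]]:
    """Refit (p, d) on resamples of the variable sites, drawn with replacement."""
    rng = random.Random(seed)
    results = []
    for _ in range(replicates):
        resample = Counter(rng.choices(sites, k=len(sites)))
        results.append(fit_mixture(resample))
    return results


def percentile_interval(values: List[float], level: float = 0.95) -> Tuple[float, float]:
    """Simple percentile bootstrap interval."""
    ordered = sorted(values)
    lo = ordered[int((1 - level) / 2 * (len(ordered) - 1))]
    hi = ordered[int((1 + level) / 2 * (len(ordered) - 1))]
    return lo, hi
```

With one divergent site among 20, roughly (19/20)^20, or about 36%, of resamples omit that site and return d = 0, which is what widens the interval. With 10 divergent sites among 20, this almost never happens.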

**Figure 3. Reads mapped and estimated mixture proportion for possible mixed-ST clinical infections, across two alignment programs.**
Points in orange show evidence of contamination with other bacteria (i.e. <80% of reads mapped to the reference genome); other points are shown in blue. Mixed infections detected using a −2 log likelihood ratio statistic threshold of ≥19.4 (as defined in the calibration samples) are shown as filled circles; other points are shown as crosses. Panel A shows the data obtained from alignments generated using Stampy. Stampy is designed to perform well with high sequence variation relative to the reference, in particular insertions or deletions. In the more contaminated samples, we observed markedly divergent reads from other species mapped to the highly conserved MLST loci, resulting in mixed infections being falsely identified. Panel B shows the data obtained from alignments generated with the Burrows-Wheeler Aligner (BWA). Two mixed infections were detected.
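The Figure 3 caption combines two rules. A −2 log likelihood ratio statistic of at least 19.4 calls a mixed infection, and fewer than 80% of reads mapping to the reference flags possible contamination, which in the Stampy alignments produced false mixed calls. The sketch below encodes only those two rules. It assumes the maximised log-likelihoods of the unmixed (single-strain) and mixed models have already been computed elsewhere, and `call_sample` and `SampleCall` are hypothetical names. The 19.4 threshold was calibrated for the authors' pipeline and would need recalibrating for any other model.

```python
from dataclasses import dataclass

# Threshold quoted in the Figure 3 caption (calibrated on the authors' samples).
LRT_THRESHOLD = 19.4
# Below this fraction of mapped reads, a sample is flagged as possibly contaminated.
MIN_MAPPED_FRACTION = 0.80


@dataclass
class SampleCall:
    statistic: float             # -2 log likelihood ratio (unmixed vs mixed)
    mixed: bool                  # statistic >= threshold
    possibly_contaminated: bool  # < 80% of reads mapped to the reference


def lrt_statistic(loglik_unmixed: float, loglik_mixed: float) -> float:
    """-2 log(L_unmixed / L_mixed), i.e. 2 * (log L_mixed - log L_unmixed)."""
    return 2.0 * (loglik_mixed - loglik_unmixed)


def call_sample(loglik_unmixed: float, loglik_mixed: float,
                reads_mapped: int, reads_total: int,
                threshold: float = LRT_THRESHOLD) -> SampleCall:
    """Apply the mixed-infection and contamination rules to one sample."""
    if reads_total <= 0:
        raise ValueError("reads_total must be positive")
    stat = lrt_statistic(loglik_unmixed, loglik_mixed)
    contaminated = reads_mapped / reads_total < MIN_MAPPED_FRACTION
    return SampleCall(stat, stat >= threshold, contaminated)
```

Keeping the contamination flag separate from the mixed call, rather than folding it into one verdict, leaves room to treat a mixed call on a flagged sample with extra caution, as the Stampy results suggest.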

**Figure 4. Phylogenetic and epidemiological relationships between cases related to a detected mixed infection.**
Panel A depicts the 2 mixed infections identified in donor A and recipient A. A transmission event from donor A to recipient A was predicted by a stochastic transmission model based on ward admission data. However, donor A and recipient A had differing multilocus sequence types (STs) on initial testing of a single isolate from each case, suggesting a possible undetected mixed infection. Using the mixed infection estimator, a minor-ST infection was found in recipient A sharing the same ST as donor A, ST1. However, applying the estimator to variable sites within ST1 showed that the minor sequence in recipient A was most likely to have arisen from another case, donor B, shown in blue. Panel B shows the epidemiological relationships between donor A, recipient A, donor B and cases sharing similar sequences. Ward stays are shown as horizontal lines and positive tests as crosses. Panel C shows a phylogenetic tree of 45 distinct whole genome sequences from Oxfordshire patients with ST1 *Clostridium difficile* infection: a maximum likelihood tree based on 79 identified variable sites, drawn using PhyML. The donor proposed by the transmission model is shown in grey (donor A). The minor sequence in recipient A is shown in black, matching the sequence found in donor B, shown in blue. Recipient B shared an identical sequence to recipient A. Recipients C and D are two cases phylogenetically descended from the sequence shared by donor B, recipient A and recipient B. Note that only donor A and recipient A were analysed for the presence of mixed infection.
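Figure 4 rests on splitting a mixed sample into dominant and minor sequences at variable sites and then finding the closest previously sequenced isolates. A minimal sketch of that matching step follows, under stated assumptions. The dominant strain is taken to supply the more abundant base at every site. A second base counts as the minor strain's allele only if it reaches a hypothetical 5% read frequency. Distances are plain Hamming distances that skip 'N' calls. The database contents, isolate names and function names are invented for illustration and do not describe the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical minimum read frequency for a second base to count as the minor allele.
MIN_MINOR_FRACTION = 0.05


def split_haplotypes(counts: List[Dict[str, int]],
                     min_minor: float = MIN_MINOR_FRACTION) -> Tuple[str, str]:
    """Split per-site base counts into (dominant, minor) sequences.

    Assumes the dominant strain supplies the most common base at every site.
    Sites with no reads are written as 'N' in both sequences.
    """
    dominant, minor = [], []
    for site in counts:
        total = sum(site.values())
        if total == 0:
            dominant.append("N")
            minor.append("N")
            continue
        ranked = sorted(site.items(), key=lambda kv: kv[1], reverse=True)
        top = ranked[0][0]
        dominant.append(top)
        # The second base is the minor strain's allele only if it is common enough.
        if len(ranked) > 1 and ranked[1][1] / total >= min_minor:
            minor.append(ranked[1][0])
        else:
            minor.append(top)
    return "".join(dominant), "".join(minor)


def hamming(a: str, b: str) -> int:
    """Count differing positions, skipping sites where either sequence has 'N'."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same variable sites")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def closest_isolates(haplotype: str, database: Dict[str, str]) -> Tuple[int, List[str]]:
    """Return the smallest distance and the sorted names of isolates at that distance."""
    distances = {name: hamming(haplotype, seq) for name, seq in database.items()}
    best = min(distances.values())
    return best, sorted(name for name, dist in distances.items() if dist == best)
```

Returning every isolate tied at the minimum distance, rather than a single best hit, mirrors the situation in Panel C, where the minor sequence in recipient A matched more than one previously sequenced case.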

Cited by

Comparative Genomics of Clostridioides difficile.
Janezic S, Garneau JR, Monot M. Adv Exp Med Biol. 2024;1435:199-218. doi: 10.1007/978-3-031-42108-2_10. PMID: 38175477.
Identifying Mixed Mycobacterium tuberculosis Infection and Laboratory Cross-Contamination during Mycobacterial Sequencing Programs.
Wyllie DH, Robinson E, Peto T, Crook DW, Ajileye A, Rathod P, Allen R, Jarrett L, Smith EG, Walker AS. J Clin Microbiol. 2018 Oct 25;56(11):e00923-18. doi: 10.1128/JCM.00923-18. Print 2018 Nov. PMID: 30209183. Free PMC article.
Mycobacterium intracellulare subsp. chimaera from Cardio Surgery Heating-Cooling Units and from Clinical Samples in Israel Are Genetically Unrelated.
Rubinstein M, Grossman R, Nissan I, Schwaber MJ, Carmeli Y, Kaidar-Shwartz H, Dveyrin Z, Rorman E. Pathogens. 2021 Oct 27;10(11):1392. doi: 10.3390/pathogens10111392. PMID: 34832548. Free PMC article.
BHap: a novel approach for bacterial haplotype reconstruction.
Li X, Saadat S, Hu H, Li X. Bioinformatics. 2019 Nov 1;35(22):4624-4631. doi: 10.1093/bioinformatics/btz280. PMID: 31004480. Free PMC article.
Bacterial Genomics Reveal the Complex Epidemiology of an Emerging Pathogen in Arctic and Boreal Ungulates.
Forde TL, Orsel K, Zadoks RN, Biek R, Adams LG, Checkley SL, Davison T, De Buck J, Dumond M, Elkin BT, Finnegan L, Macbeth BJ, Nelson C, Niptanatiak A, Sather S, Schwantje HM, van der Meer F, Kutz SJ. Front Microbiol. 2016 Nov 7;7:1759. doi: 10.3389/fmicb.2016.01759. eCollection 2016. PMID: 27872617. Free PMC article.
