Lancet Infect Dis. 2013 Feb;13(2):137-46.
doi: 10.1016/S1473-3099(12)70277-3. Epub 2012 Nov 15.

Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study

Timothy M Walker et al. Lancet Infect Dis. 2013 Feb.

Abstract

Background: Tuberculosis incidence in the UK has risen in the past decade. Disease control depends on epidemiological data, which can be difficult to obtain. Whole-genome sequencing can detect microevolution within Mycobacterium tuberculosis strains. We aimed to estimate the genetic diversity of related M tuberculosis strains in the UK Midlands and to explore how this measurement might be used to investigate community outbreaks.

Methods: In a retrospective observational study, we used Illumina technology to sequence M tuberculosis genomes from an archive of frozen cultures. We characterised isolates into four groups: cross-sectional, longitudinal, household, and community. We measured pairwise nucleotide differences within hosts and between hosts in household outbreaks and estimated the rate of change in DNA sequences. We used the findings to interpret network diagrams constructed from 11 community clusters derived from mycobacterial interspersed repetitive-unit-variable-number tandem-repeat data.
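The pairwise nucleotide differences mentioned in the Methods can be illustrated with a minimal sketch. It assumes the isolates have already been mapped to a common reference and reduced to aligned strings of equal length; the function names, the toy sequences, and the choice to skip ambiguous base calls are illustrative assumptions, not the authors' pipeline.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both calls are unambiguous and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_distances(isolates):
    """Return {(name_x, name_y): SNP distance} for every unordered pair."""
    return {
        (x, y): snp_distance(isolates[x], isolates[y])
        for x, y in combinations(sorted(isolates), 2)
    }


# Hypothetical toy alignment; real genomes are roughly 4.4 million bases long.
isolates = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTACGTAT",
    "iso_C": "ACGAACGNAT",  # 'N' marks an ambiguous call and is ignored
}

distances = pairwise_distances(isolates)
for pair, d in distances.items():
    print(pair, d)
```

Ignoring positions with an ambiguous call in either isolate is one common convention; other conventions would change the counts.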

Findings: We sequenced 390 separate isolates from 254 patients, including representatives from all five major lineages of M tuberculosis. The estimated rate of change in DNA sequences was 0.5 single nucleotide polymorphisms (SNPs) per genome per year (95% CI 0.3–0.7) in longitudinal isolates from 30 individuals and 25 families. Divergence was rarely higher than five SNPs in 3 years: 109 (96%) of 114 paired isolates from individuals and households differed by five or fewer SNPs. Isolates were separated by more than five SNPs in none of 69 epidemiologically linked patients, two (15%) of 13 possibly linked patients, and 13 (17%) of 75 epidemiologically unlinked patients (three-way comparison exact p<0.0001). Genetic trees and clinical and epidemiological data suggest that super-spreaders were present in two community clusters.
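The arithmetic behind these findings can be checked with a short sketch. It uses only figures quoted in the abstract; the constant-rate assumption in `expected_snps` and all function and variable names are illustrative, and the sketch does not reproduce the coalescent-based estimate or the exact test.

```python
# Figures quoted in the abstract.
RATE = 0.5             # SNPs per genome per year (point estimate)
RATE_CI = (0.3, 0.7)   # 95% confidence interval
SNP_CUTOFF = 5         # "five or fewer SNPs"


def expected_snps(rate_per_year, years):
    """Expected SNPs accumulated along one lineage, assuming a constant rate."""
    return rate_per_year * years


def percent(count, total):
    """Percentage rounded to the nearest whole number, as in the abstract."""
    return round(100 * count / total)


def within_cutoff(snp_distance, cutoff=SNP_CUTOFF):
    """True when a pair of isolates differs by the cutoff or fewer SNPs."""
    return snp_distance <= cutoff


three_year_expectation = expected_snps(RATE, 3)
three_year_range = tuple(expected_snps(r, 3) for r in RATE_CI)

reported = {
    "paired isolates within five SNPs": percent(109, 114),
    "linked patients above five SNPs": percent(0, 69),
    "possibly linked patients above five SNPs": percent(2, 13),
    "unlinked patients above five SNPs": percent(13, 75),
}

print(three_year_expectation, three_year_range)
for label, value in reported.items():
    print(f"{label}: {value}%")
```

At the point estimate, a lineage would be expected to accumulate about 1.5 SNPs over 3 years, which is consistent with divergence rarely exceeding five SNPs in that period.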

Interpretation: Whole-genome sequencing can delineate outbreaks of tuberculosis and allows inference about direction of transmission between cases. The technique could identify super-spreaders and predict the existence of undiagnosed cases, potentially leading to early treatment of infectious patients and their contacts.

Funding: Medical Research Council, Wellcome Trust, National Institute for Health Research, and the Health Protection Agency.

Figures

Figure 1
Sample selection. The cross-sectional and community analyses datasets overlapped by 14 isolates (eight patients); the longitudinal and community analyses by 23 (five); the longitudinal, household, and community analyses by 26 (seven); and the household and community analyses by 32 (29). WGS=whole-genome sequencing. *Cluster 9 is a large, previously described cluster defined by mycobacterial interspersed repetitive-unit–variable-number tandem-repeat genotyping that we did not attempt to sequence completely because of its size (>280 patients); 18 patients from this cluster were included because 46 isolates from them had been sequenced as cross-sectional, longitudinal, or household isolates; in the remaining ten community clusters, we attempted to culture and sequence all 207 isolates (173 patients), successfully sequencing 171 isolates (150 patients). †Mean time since original isolation was 8 years for missing isolates and 9 years for isolates that failed to regrow, compared with 5 years for successfully cultured isolates. ‡Mean time since original isolation was 10 years for missing isolates and 8 years for isolates that failed to regrow, compared with 6 years for successfully cultured isolates. §One patient excluded because all his or her isolates failed to grow. ¶Only two households were excluded.
Figure 2
Genetic diversity of related isolates of Mycobacterium tuberculosis. (A) Time-unadjusted pairwise genetic distances in SNPs. 22 of the 38 links within the 25 household clusters also occur within community clusters (ie, known linkage) but are shown with household isolates and not with community isolates. Top horizontal dashed line indicates the threshold above which direct transmission can be judged to be unlikely; bottom horizontal dashed line indicates the threshold below which transmission should be investigated. (B) Rate of change in DNA sequences estimated by coalescent-based maximum likelihood from the first and last isolates from individuals with persistent open tuberculosis and from households. SNP=single nucleotide polymorphism. MIRU-VNTR=mycobacterial interspersed repetitive-unit–variable-number tandem-repeat. *Isolates had substantially different MIRU-VNTR profiles. †Pair of Mycobacterium africanum isolates are represented two SNPs apart.
Figure 3
SNPs between MIRU-VNTR types by number of locus differences. Comparison of all isolates with complete 24-locus MIRU-VNTR profiles. As each isolate was compared to each other isolate, the number of SNPs and MIRU-VNTR loci at which they diverge was recorded. Results are plotted on a log scale. Circle sizes are proportionate to the number of pairs diverging by a specific number of loci and SNPs. Dashed red box includes isolates that differ by five or fewer SNPs. SNP=single nucleotide polymorphism. MIRU-VNTR=mycobacterial interspersed repetitive-unit–variable-number tandem-repeat.
Figure 4
Genetic distances within 11 community clusters. Genetic distances estimated with maximum likelihood. Each blue circle represents a node of people who were infected with isolates separated by no SNPs. Each number within a circle is one patient; the number indicates the year during the outbreak in which they were diagnosed (the first infected is represented by 0). For patients with several isolates, the closest in SNPs to the next patient is included. Black circles are added when patients within blue circles are separated by more than one SNP; one black circle represents a difference of one SNP. Dashed lines in clusters three and ten show larger SNP distances (not to scale), with numbers representing the SNP difference. Arrows indicate the next closest isolate in the sequenced collection. Cluster five has three red nodes that were sequenced after the blue nodes; the existence of the central red node was suggested by the constellation of surrounding blue nodes. SNP=single nucleotide polymorphism. *Two isolates from one patient.
Figure 5
Detailed investigation of cluster seven. (A) Genetic tree and matrix of nucleotide variants. Genetic distances estimated with maximum likelihood. Each blue circle represents a node of people who were infected with isolates separated by no single nucleotide polymorphisms. Numbers within nodes are patient numbers and years of sample isolation are given in parentheses. The matrix shows nucleotide variants. (B) Epidemiological network. (C) Time of onset of symptoms, diagnosis, and treatment. Sputum smear positive samples show probable infectious periods.
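The all-against-all comparison described in the Figure 3 caption can be sketched as a tally of isolate pairs by MIRU-VNTR locus differences and SNP differences. This is a minimal illustration under stated assumptions: the data layout, the function names, and the four-locus toy profiles are hypothetical (the study used complete 24-locus profiles), and the sequences are assumed to be aligned already.

```python
from collections import Counter
from itertools import combinations


def locus_differences(profile_a, profile_b):
    """Number of MIRU-VNTR loci at which two repeat-count profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def snp_differences(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def tally_pairs(isolates):
    """Count isolate pairs for each (locus differences, SNP differences)."""
    tally = Counter()
    for x, y in combinations(sorted(isolates), 2):
        profile_x, seq_x = isolates[x]
        profile_y, seq_y = isolates[y]
        key = (locus_differences(profile_x, profile_y),
               snp_differences(seq_x, seq_y))
        tally[key] += 1
    return tally


# Hypothetical toy data: name -> (repeat counts per locus, aligned sequence).
toy = {
    "iso_1": ((2, 4, 3, 5), "ACGTACGT"),
    "iso_2": ((2, 4, 3, 5), "ACGTACGA"),
    "iso_3": ((2, 4, 2, 5), "ACGTACGA"),
    "iso_4": ((1, 3, 2, 6), "TCGAACTA"),
}

tally = tally_pairs(toy)
pairs_within_five_snps = sum(n for (_, snps), n in tally.items() if snps <= 5)
print(sorted(tally.items()))
print(pairs_within_five_snps)
```

Each key of `tally` corresponds to one circle position in a plot of this kind, and the count corresponds to the circle size.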
