Lancet Microbe. 2025 Jan;6(1):100936.
doi: 10.1016/j.lanmic.2024.06.003. Epub 2024 Nov 28.

Signatures of transmission in within-host Mycobacterium tuberculosis complex variation: a retrospective genomic epidemiology study


Katharine S Walter et al. Lancet Microbe. 2025 Jan.

Abstract

Background: Mycobacterium tuberculosis complex (MTBC) species evolve slowly, so isolates from individuals linked in transmission often have identical or nearly identical genomes, making it difficult to reconstruct transmission chains. Finding additional sources of shared MTBC variation could help overcome this problem. Previous studies have reported MTBC diversity within infected individuals; however, whether within-host variation improves transmission inferences remains unclear. Here, we aimed to quantify within-host MTBC variation and assess whether such information improves transmission inferences.

Methods: We conducted a retrospective genomic epidemiology study in which we reanalysed publicly available sequence data from household transmission studies published in PubMed from database inception until Jan 31, 2024, for which both genomic and epidemiological contact data were available, using household membership as a proxy for transmission linkage. We quantified minority variants (ie, positions with two or more alleles each supported by at least five-fold coverage and with a minor allele frequency of 1% or more) outside of PE and PPE genes, within individual samples and shared across samples. We used receiver operating characteristic (ROC) curves to compare the performance of a general linear model for household membership that included shared minority variants and one that included only fixed genetic differences.
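The minority variant definition above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `pileup` layout (position mapped to per-allele read depths), and the `excluded` set standing in for PE and PPE gene positions are all hypothetical. It also assumes that the minor allele is the second most common supported allele and that its frequency is computed over all reads at the position.

```python
from typing import Dict, Iterable, Set, Tuple

AlleleDepths = Dict[str, int]   # e.g. {"A": 95, "G": 5}
MinorCall = Tuple[str, float]   # (minor allele, minor allele frequency)


def call_minority_variants(
    pileup: Dict[int, AlleleDepths],
    excluded: Iterable[int] = (),
    min_depth: int = 5,
    min_maf: float = 0.01,
) -> Dict[int, MinorCall]:
    """Return {position: (minor allele, frequency)} for positions passing the filters."""
    skip = set(excluded)  # hypothetical stand-in for positions inside PE/PPE genes
    calls: Dict[int, MinorCall] = {}
    for pos, depths in pileup.items():
        if pos in skip:
            continue
        # Keep only alleles supported by at least `min_depth` reads,
        # ordered from most to least common.
        supported = sorted(
            ((allele, depth) for allele, depth in depths.items() if depth >= min_depth),
            key=lambda item: item[1],
            reverse=True,
        )
        if len(supported) < 2:
            continue  # fewer than two supported alleles: not a minority variant
        # Assumption: minor allele = second most common supported allele,
        # frequency = its depth divided by the total depth at the position.
        minor_allele, minor_depth = supported[1]
        maf = minor_depth / sum(depths.values())
        if maf >= min_maf:
            calls[pos] = (minor_allele, maf)
    return calls


def shared_minority_variants(
    a: Dict[int, MinorCall], b: Dict[int, MinorCall]
) -> Set[int]:
    """Positions at which both samples carry the same minor allele."""
    return {pos for pos in a.keys() & b.keys() if a[pos][0] == b[pos][0]}
```

Counting `len(shared_minority_variants(a, b))` for a pair of samples gives the kind of per-pair quantity that the household and unlinked comparisons rely on. Requiring the same minor allele at a shared position, rather than only the same position, is a design choice of this sketch.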

Findings: We identified three MTBC household transmission studies with publicly available whole-genome sequencing data and epidemiological linkages: a household transmission study in Vitória, Brazil (Colangeli et al), a retrospective population-based study of paediatric tuberculosis in British Columbia, Canada (Guthrie et al), and a retrospective population-based study in Oxfordshire, England (Walker et al). We found moderate levels of minority variation present in MTBC sequence data from cultured isolates that varied significantly across studies: mean 168·6 minority variants (95% CI 151·4-185·9) for the Colangeli et al dataset, 5·8 (1·5-10·2) for Guthrie et al (p<0·0001, Wilcoxon rank sum test, vs Colangeli et al), and 7·1 (2·4-11·9) for Walker et al (p<0·0001, Wilcoxon rank sum test, vs Colangeli et al). Isolates from household pairs shared more minority variants than did randomly selected pairs of isolates: mean 97·7 shared minority variants (79·1-116·3) versus 9·8 (8·6-11·0) in Colangeli et al, 0·8 (0·1-1·5) versus 0·2 (0·1-0·2) in Guthrie et al, and 0·7 (0·1-1·3) versus 0·2 (0·2-0·2) in Walker et al (all p<0·0001, Wilcoxon rank sum test). Shared within-host variation was significantly associated with household membership (odds ratio 1·51 [95% CI 1·30-1·71], p<0·0001), for one standard deviation increase in shared minority variants. Models that included shared within-host variation versus models without within-host variation improved the accuracy of predicting household membership in all three studies: area under the ROC curve 0·95 versus 0·92 for the Colangeli et al study, 0·99 versus 0·95 for the Guthrie et al study, and 0·93 versus 0·91 for the Walker et al study.
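To make the per-standard-deviation odds ratio concrete, the sketch below shows how such an estimate scales under a logistic model in which the log-odds of household membership are linear in the standardised number of shared minority variants. That model form is an assumption made here for illustration, and `odds_multiplier` is a hypothetical helper. Values for shifts other than one standard deviation are arithmetic consequences of that assumption, not results reported by the study.

```python
import math


def odds_multiplier(or_per_sd: float, sd_units: float) -> float:
    """Multiplicative change in odds for a shift of `sd_units` standard deviations."""
    # Log-odds coefficient on the standardised predictor.
    beta = math.log(or_per_sd)
    return math.exp(beta * sd_units)


# Reported estimate and 95% CI bounds, each scaled to a two-SD increase.
for label, estimate in [("point", 1.51), ("lower", 1.30), ("upper", 1.71)]:
    print(label, round(odds_multiplier(estimate, 2.0), 2))
```

Under this assumption, the odds multiply with every additional standard deviation, so a two-SD increase corresponds to the square of the one-SD odds ratio.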

Interpretation: Within-host MTBC variation persists through culture of sputum and could enhance the resolution of transmission inferences. The substantial differences in minority variation recovered across studies highlight the need to optimise approaches to recover and incorporate within-host variation into automated phylogenetic and transmission inference.

Funding: National Institutes of Health.


Conflict of interest statement

Declaration of interests: JRA reports support from the US National Institutes of Health (NIH); receiving donated Cepheid disposable open cartridges from Cepheid to Stanford University for NIH-funded tuberculosis research; participation in two Data Safety Monitoring Boards for NIH-funded tuberculosis trials; and participation in the Science Advisory Board for an NIH-funded tuberculosis network. JC reports grants from the NIH, Valneva/Butantan Institution, Coalition for Epidemic Preparedness Innovations–Sabin Institute, MSD, and Sanofi Pasteur for clinical and epidemiological studies; honoraria from Pfizer; participation in the Brazil Advisory Board for the mRNA-1273 vaccine Moderna–Zodiac, the Latin America and Brazilian Advisory Board for paxlovid (nirmatrelvir–ritonavir; Pfizer), the Brazil Advisory Board for the Qdenga vaccine (Takeda), and the Takeda Global Dengue Steering Committee. All other authors declare no competing interests.

Figures

Figure 1: Histograms of pairwise genetic distances between MTBC consensus genomes (A), and maximum likelihood phylogeny of consensus MTBC sequences (B)
(A) Histograms indicate counts of pairwise genetic distances between MTBC consensus genomes, for each included study (Colangeli et al, Guthrie et al, and Walker et al) and type of pairwise comparison (household and unlinked pairs). (B) Maximum likelihood phylogeny of consensus MTBC sequences for each study. Trees are midpoint rooted and tree tips are coloured by household for individuals within households or with known epidemiological links. Tree branches are in units of substitutions per site. MTBC=Mycobacterium tuberculosis complex. SNP=single-nucleotide polymorphism.
Figure 2: Ridgeline plot of the distribution of minority variants across minor allele frequencies for ten randomly selected samples from each study
Each row indicates a unique sample and row height indicates the density of minority variants within a particular minor allele frequency bin identified for each sample, with scaling calculated separately for each panel. Panels indicate genomic region: outside PE and PPE genes and within PE and PPE genes. Some samples do not have minority variants detected outside the PE and PPE genes.
Figure 3: Boxplots of the number of high-quality shared minority variants between sample pairs in three previously published MTBC transmission studies, with jittered points indicating pairwise observations
We report the number of minority variants within samples (sample; blue), shared by household members (household; yellow), or shared by non-household members (unlinked; grey). Boxes indicate group interquartile ranges, centre lines indicate group medians, and whiskers show the range of the top and bottom 25% of values, excluding outliers. MTBC=Mycobacterium tuberculosis complex.
Figure 4: Stacked bar plots of the proportion of sample pairs across different levels of shared minority variants (A) and receiver operating characteristic curves for predicting household membership in three general linear models (B)
(A) Minority variants with a minor allele frequency of 1% or more were considered. (B) Sensitivity (true positive rate) is shown as a function of 1–specificity (false positive rate). The full model includes both shared minority variants and consensus sequence-based clusters, the genomic clustering model includes the consensus sequence-based cluster only, and the minority variants model includes shared minority variants only. All models include the study as a predictor.
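As a rough illustration of how a curve such as the one in panel B is built and summarised, the sketch below computes ROC points and the area under the curve from predicted scores. Here 1 − specificity is the fraction of unlinked pairs wrongly classified as household pairs. The function names are hypothetical, the labels and scores are invented toy values rather than study data, and fitting the models themselves is out of scope.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (1 - specificity, sensitivity)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """One (1 - specificity, sensitivity) point per distinct score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points: List[Point] = [(0.0, 0.0)]
    # Lower the threshold step by step; pairs scoring >= threshold are called positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy data (invented): 1 = household pair, 0 = unlinked pair.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical scores from a model using consensus-based clustering only.
cluster_only = [1, 1, 1, 1, 1, 0, 0, 0]
# Hypothetical scores from a model that also uses shared minority variants.
with_shared = [0.9, 0.8, 0.35, 0.4, 0.3, 0.1, 0.1, 0.1]

auc_cluster = auc(roc_points(labels, cluster_only))
auc_shared = auc(roc_points(labels, with_shared))
print(f"cluster only: {auc_cluster:.2f}, with shared variants: {auc_shared:.2f}")
```

A binary predictor such as cluster membership yields a single interior point on the curve, whereas a graded score yields more thresholds and therefore a finer curve.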
