Nat Commun. 2025 Mar 4;16(1):2149.
doi: 10.1038/s41467-025-57174-1.

Multivariable regression models improve accuracy and sensitive grading of antibiotic resistance mutations in Mycobacterium tuberculosis

Sanjana G Kulkarni et al. Nat Commun.

Abstract

Rapid genotype-based drug susceptibility testing for the Mycobacterium tuberculosis complex (MTBC) relies on a comprehensive knowledgebase of the genetic determinants of resistance. Here we present a catalogue of resistance-associated mutations built with a regression-based approach and benchmark it against the 2nd edition of the World Health Organization (WHO) mutation catalogue. We train multivariable logistic regression models on over 52,000 MTBC isolates to associate binary resistance phenotypes for 15 antitubercular drugs with variants extracted from candidate resistance genes. Regression detects 450/457 (98%) of the resistance-associated variants identified by the existing method (the SOLO method) and grades 221 (29%) more variants in total than SOLO. On average, the regression-based catalogue achieves higher sensitivity than SOLO (+3.2 percentage points, pp), with smaller average decreases in specificity (-1.0 pp) and positive predictive value (-1.6 pp). Sensitivity gains are highest for ethambutol, clofazimine, streptomycin, and ethionamide, because regression graded considerably more resistance-associated variants than SOLO for these drugs. SOLO and regression do not differ in meeting the target product profiles set by the WHO for genetic drug susceptibility testing, except for rifampicin, for which regression specificity (97%) falls below the 98% threshold. The regression pipeline also detects isoniazid resistance compensatory mutations in ahpC and variants linked to bedaquiline and aminoglycoside hypersusceptibility. These results inform the continued development of targeted next-generation sequencing, whole-genome sequencing, and other commercial molecular assays for diagnosing resistance in the MTBC.
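
The sensitivity, specificity, and positive predictive value (PPV) figures quoted above compare catalogue-based predictions with phenotypic results. The following is a minimal sketch of that kind of calculation, not the authors' pipeline. It assumes an isolate is predicted resistant when it carries at least one variant graded as resistance-associated; that rule, the function names, and the variant labels are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def predict_resistant(isolate_variants: Set[str], resistance_variants: Set[str]) -> bool:
    """Assumed rule: resistant if the isolate carries any catalogued resistance variant."""
    return not isolate_variants.isdisjoint(resistance_variants)


def binary_metrics(pairs: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Compute sensitivity, specificity, and PPV from (predicted, phenotype) pairs.

    True means resistant. A metric with a zero denominator is returned as NaN.
    """
    tp = fp = tn = fn = 0
    for predicted, phenotype in pairs:
        if predicted and phenotype:
            tp += 1
        elif predicted and not phenotype:
            fp += 1
        elif not predicted and phenotype:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # resistant isolates correctly flagged
        "specificity": ratio(tn, tn + fp),  # susceptible isolates correctly cleared
        "ppv": ratio(tp, tp + fp),          # flagged isolates that are truly resistant
    }


# Toy data with hypothetical variant labels: (variants carried, phenotypically resistant?)
catalogue = {"v1"}
isolates = [({"v1"}, True), ({"v2"}, True), ({"v3"}, False), ({"v1", "v3"}, False), (set(), False)]
metrics = binary_metrics((predict_resistant(v, catalogue), r) for v, r in isolates)
```

Grading an additional variant such as the hypothetical `v2` as resistance-associated would raise sensitivity in this toy example, which mirrors the trade-off described in the abstract: more graded variants can increase sensitivity at some cost in specificity and PPV.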


Conflict of interest statement

Competing interests: T.C.R. received funding support from FIND through a service contract with UC San Diego. T.C.R. received grant funding from NIH to develop and evaluate a tNGS solution for drug resistant TB (R01AI176401). T.C.R. is a co-founder, board member and unpaid shareholder of Verus Diagnostics Inc. T.C.R. is a co-inventor on patents pertaining to tNGS. The remaining authors declare no competing interests.

Figures

Fig. 1. Overview of isolates included in the regression models.
a Percentages of phenotypically resistant (dark shades) and susceptible (light shades) isolates in the base models for the WHO (blue) and ALL (orange) datasets, across 15 drugs. b Lineage distribution for isolates in the base model for the ALL dataset only. Other category = M. bovis and L5-L7. The percentages for the “Other” category are not shown for readability. Only isolates with a single primary lineage according to the Coll 2014 scheme are shown in (b). In both panels, isolate counts for each bar are shown in parentheses. Bar colours: pink = L1, blue = L2, purple = L3, red = L4, white = all other lineages.
Fig. 2. Single-model grading flowchart.
All p values were computed from the estimated coefficient and the distribution of coefficients from the 1000 permuted models (Supplementary Fig. 1a, b), two-sided for the neutral test and one-sided for all others. False discovery rate (FDR) correction was performed using the Benjamini-Hochberg method. A variant is considered significant if its FDR-corrected p value is ≤ 0.05 (non-silent variants) or ≤ 0.01 (silent variants). LB: lower bound of a binomial exact confidence interval. *: Relaxed thresholds for pncA are the same as in the SOLO algorithm: present in ≥2 PZA-resistant or PZA-susceptible isolates (depending on the sign of the OR) and PPV ≥ 0.5. +: Significance testing exception: raw p values and a cutoff of 0.05 are used for silent variants in the neutral permutation test.
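
The permutation test and Benjamini-Hochberg correction described in this caption can be illustrated with a short sketch. It is written under assumptions and is not the authors' code: the function names are hypothetical, the one-sided test is taken to be in the positive direction, and the +1 added to numerator and denominator is a common convention for empirical p values that the caption does not confirm.

```python
from typing import List, Sequence


def permutation_p_value(observed: float, permuted: Sequence[float], two_sided: bool = False) -> float:
    """Empirical p value of an observed coefficient against permuted coefficients.

    One-sided (assumed positive direction): share of permuted values >= observed.
    Two-sided: share of permuted values whose magnitude is >= |observed|.
    The +1 terms (an assumption) keep the p value strictly above zero.
    """
    if two_sided:
        extreme = sum(abs(c) >= abs(observed) for c in permuted)
    else:
        extreme = sum(c >= observed for c in permuted)
    return (extreme + 1) / (len(permuted) + 1)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that adjusted
    # values never increase as the raw p values decrease.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def is_significant(adjusted_p: float, silent: bool) -> bool:
    """Thresholds from the caption: 0.05 for non-silent and 0.01 for silent variants."""
    return adjusted_p <= (0.01 if silent else 0.05)
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` adjusts the four raw p values jointly, and `is_significant` then applies the stricter cutoff to silent variants.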
Fig. 3. Summary of regression classifications and comparison to SOLO results for 21,589 (drug, variant) pairs.
Regression variant gradings for 15 drugs (a), coloured by number of variants in each cell. Group 3 (uncertain significance) variants are not shown in (a). Grading comparison tables for regression vs. SOLO (b) and regression with GR vs. SOLO with GR (c). Variant colouring: dark blue = variants graded “Uncertain” by SOLO, not “Uncertain” by regression; light blue = variants graded “Uncertain” by regression, not “Uncertain” by SOLO; red = major up-/down-grade discrepancies by regression; grey = group agreement; black = Group 1 or 2 by both regression and SOLO but not perfect agreement.
Fig. 4. MIC model results can provide additional evidence for novel associations derived from the binary pDST data.
Neutral variants were excluded from this analysis. WHO dataset ORs vs. MIC coefficients for 232 variants that were graded Group 3 by SOLO and Groups 1–2 (b, N = 204) or Groups 4–5 (a, N = 28) by regression, were tested in MIC models, and have a significant OR in the WHO dataset. Point colour reflects the direction of association in the MIC model and significance at FDR ≤ 0.05 for all variants (red = significant/positive, blue = significant/negative, grey = not significant). Coef: coefficient in the MIC model.
Fig. 5. Comparison of binary prediction metrics between four mutation lists.
Sensitivity (a), specificity (b), and PPV (c) comparison between SOLO (green), SOLO + GR (purple), regression (orange), and regression + GR (grey) mutation lists. Bars are the computed sensitivity, specificity, and PPV, as percent, for each drug and model type. Error bars are 95% exact binomial confidence intervals computed using the Clopper-Pearson method. Source data are in Supplementary Data 7, including the dataset sizes.
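
The Clopper-Pearson interval used for the error bars can be sketched with the standard library alone. This is an illustrative version under assumptions, not the authors' code: the function names are hypothetical, and the bounds are found by bisection on the binomial tail probabilities, which is only practical for small counts. For dataset sizes in the tens of thousands, the equivalent beta-quantile form (for example via `scipy.stats.beta.ppf`) would be the usual choice.

```python
from math import comb
from typing import Callable, Tuple


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(f: Callable[[float], float]) -> float:
    """Bisection root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) binomial confidence interval for k successes in n trials."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root_of_increasing(
        lambda p: (1 - _binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root_of_increasing(
        lambda p: alpha / 2 - _binom_cdf(k, n, p)
    )
    return lower, upper


# Toy example: 45 of 50 resistant isolates correctly predicted (illustrative numbers only).
low, high = clopper_pearson(45, 50)
```

Here `low` and `high` bracket the point estimate 45/50 = 0.9. The interval is conservative by construction, so its coverage is at least the nominal 95%.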

