Nat Commun. 2025 Mar 25;16(1):2933.

doi: 10.1038/s41467-025-58214-6.

Enhanced diagnosis of multi-drug-resistant microbes using group association modeling and machine learning

Julian G Saliba¹,², Wenshu Zheng³,⁴, Qingbo Shu¹,⁵, Liqiang Li⁶,⁷, Chi Wu⁶,⁷, Yi Xie⁸, Christopher J Lyon¹,⁵, Jiuxin Qu⁶,⁷, Hairong Huang⁹, Binwu Ying⁸, Tony Ye Hu¹⁰,¹¹

Affiliations

¹ Center for Cellular and Molecular Diagnostics, Tulane University School of Medicine, New Orleans, LA, USA.
² Department of Biomedical Engineering, Tulane University School of Science and Engineering, New Orleans, LA, USA.
³ Center for Cellular and Molecular Diagnostics, Tulane University School of Medicine, New Orleans, LA, USA. wzheng5@tulane.edu.
⁴ Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, LA, USA. wzheng5@tulane.edu.
⁵ Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, LA, USA.
⁶ Department of Clinical Laboratory, Shenzhen Third People's Hospital, Shenzhen, Guangdong, China.
⁷ National Clinical Research Center for Infectious Diseases, Shenzhen, Guangdong, China.
⁸ Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China.
⁹ National Clinical Laboratory on Tuberculosis, Beijing Chest Hospital of Capital Medical University, Beijing, China.
¹⁰ Center for Cellular and Molecular Diagnostics, Tulane University School of Medicine, New Orleans, LA, USA. tonyhu@tulane.edu.
¹¹ Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, LA, USA. tonyhu@tulane.edu.

PMID: 40133304
PMCID: PMC11937555
DOI: 10.1038/s41467-025-58214-6


Julian G Saliba et al. Nat Commun. 2025.


Abstract

New solutions are needed to detect genotype-phenotype associations involved in microbial drug resistance. Herein, we describe a Group Association Model (GAM) that accurately identifies genetic variants linked to drug resistance and mitigates false-positive cross-resistance artifacts without prior knowledge. GAM analysis of 7,179 Mycobacterium tuberculosis (Mtb) isolates identifies gene targets for all analyzed drugs, revealing comparable performance but fewer cross-resistance artifacts than the World Health Organization (WHO) mutation catalogue approach, which requires expert rules and precedents. GAM also generalizes to other species, demonstrating high predictive accuracy with 3,942 Staphylococcus aureus isolates. GAM refinement by machine learning (ML) improves predictive accuracy with small or incomplete datasets. These findings were validated using 427 Mtb isolates from three sites, where GAM inputs were also found to be more suitable for ML prediction models than WHO inputs. GAM + ML could thus address the limitations of current drug resistance prediction methods to improve treatment decisions for drug-resistant microbial infections.


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1. GAM + ML workflow summary.**
a Genotyping and minimum inhibitory concentration (MIC) culture analysis for drug susceptibility testing (DST) phenotypes of *Mtb* isolates. b Data filtration via genotype and phenotype information. c *Mtb* isolate sequence and DST data are fed into GAM to identify mutations associated with drug resistance, after which GAM classification performance is evaluated using statistical metrics. d Machine learning is applied to SNPs that GAM classifies as being associated with drug resistance to predict drug resistance profiles. e Multi-site cross-validation is performed to characterize the utility of this GAM + ML prediction approach. Created in BioRender.

**Fig. 2. Summary of the GAM process and groups associated with specific drug resistance profiles.**
a GAM scheme and phylogenetic tree of DS2 isolates. b DS2 percentages derived from each lineage of all CRyPTIC isolates. Created in BioRender. c Number of isolates in groups containing multiple (Test/Control) or single (Non-Test) isolates. Mono-resistant (Mono), MDR/RR, Pre-XDR, XDR, Poly (RIF susceptible but resistant to ≥2 other drugs), and INH S (RIF + INH susceptible but resistant to ≥2 other drugs). d Group size ranges in size-ranked drug-resistant group quartiles. e Number of DS2 groups resistant to one or more drugs. f Mean number of isolates in groups resistant to one or more drugs. g Specific drug resistance frequencies in all drug-resistant DS2 groups. Source data are provided as a Source Data file.

**Fig. 3. GAM and LMM detection of drug resistance associations.**
a GAM workflow for data grouping and association. b Gene-level interpretations of DNA variants associated with specific drug resistance as calculated by Fisher’s exact test, indicating the significance threshold (dashed line; -log₁₀(p-value) < 5.22) determined after Bonferroni correction for multiple tests. c Gene-drug interactions detected by both LMM and GAM (orange), LMM alone (blue), or neither (white), using associations in the top 20 LMM associations for each drug. d Co-occurrence of DS2 drug-resistant phenotypes, where dark and light green indicate high and low percent overlap, respectively. e True positive, (f) false positive, (g) false negative mutations found by GWAS LMM (blue) and GAM (red). Source data are provided as a Source Data file.
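The Fig. 3 caption names Fisher's exact test and a Bonferroni-corrected significance threshold. The sketch below shows what such a test could look like for a single variant, assuming a 2×2 table of variant presence against resistance phenotype. The one-sided alternative, the function names, the example counts, and the number of tests are illustrative assumptions and are not taken from the paper; the grouping step that GAM applies before testing is not reproduced here.

```python
from math import comb, log10


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    a: resistant with variant      b: resistant without variant
    c: susceptible with variant    d: susceptible without variant
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    resistant = a + b
    carriers = a + c
    total = a + b + c + d
    upper = min(resistant, carriers)
    tail = sum(
        comb(resistant, k) * comb(total - resistant, carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, carriers)


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts for one variant and a hypothetical number of tests.
p_value = fisher_exact_greater(40, 10, 5, 45)
cutoff = bonferroni_threshold(0.05, 1000)
print(f"-log10(p) = {-log10(p_value):.2f}, cutoff = {-log10(cutoff):.2f}")
print("significant" if p_value < cutoff else "not significant")
```

On a -log₁₀ scale, a variant counts as significant under this scheme when its -log₁₀(p-value) exceeds -log₁₀ of the corrected cutoff.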

**Fig. 4. Optimization of variant detection as predictors for drug resistance.**
a Schematic comparing prior knowledge requirements and accuracy of different approaches. b Boxplot of GAM + ML classification accuracy across model runs (N = 10), each using a different random test set and seed. Data depict median (center bar), 25th and 75th percentile (lower and upper box bounds), and minimum and maximum values (lower and upper whiskers). P-values were calculated from repeated-measures 1-way ANOVAs, followed by Dunnett’s test for multiple comparisons, comparing the results to a Gradient Boosting reference model. c Workflow of the ML model using GAM variants as input. Calculated (d) PPV, (e) specificity, and (f) sensitivity (error bars indicate two-sided 95% confidence intervals) of predictive approaches applied to DS1 for specific drug resistance using variants identified by GAM (blue); 2021 (yellow) and 2023 (green) WHO interim criteria; and a gradient boosting model using GAM variants (red). Sample sizes for these comparisons varied according to the number of *Mtb* isolates with phenotype data for AMI (n = 10027), EMB (n = 8911), ETH (n = 9356), INH (n = 10025), KAN (n = 10085), LEV (n = 10114), MXF (n = 10139), and RIF (n = 10052). Source data are provided as a Source Data file.
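The PPV, specificity, and sensitivity reported in Fig. 4d–f can be computed from predicted and observed resistance calls. The following is a minimal sketch of those definitions. The caption does not say how the 95% confidence intervals were obtained, so the Wilson score interval used here is an assumption, as are the function names and the toy data.

```python
from math import sqrt


def confusion_counts(predicted, observed):
    """Count TP, FP, TN, FN from paired resistant (True) / susceptible (False) calls."""
    tp = fp = tn = fn = 0
    for pred, obs in zip(predicted, observed):
        if pred and obs:
            tp += 1
        elif pred and not obs:
            fp += 1
        elif not pred and obs:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the metric is undefined."""
    return numerator / denominator if denominator else None


def metrics(tp, fp, tn, fn):
    return {
        "ppv": ratio(tp, tp + fp),          # resistant calls that are correct
        "specificity": ratio(tn, tn + fp),  # susceptible isolates called susceptible
        "sensitivity": ratio(tp, tp + fn),  # resistant isolates called resistant
    }


def wilson_interval(successes, total, z=1.96):
    """Two-sided Wilson score interval for a proportion (z=1.96 for about 95%)."""
    if total == 0:
        return (0.0, 1.0)
    p_hat = successes / total
    denom = 1 + z * z / total
    centre = (p_hat + z * z / (2 * total)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


# Toy example: five isolates, predicted versus phenotypic resistance.
tp, fp, tn, fn = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(metrics(tp, fp, tn, fn))
print(wilson_interval(tp, tp + fn))  # interval for sensitivity
```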

**Fig. 5. Effect of sample size and DST data incompleteness on GAM and ML-GAM outputs.**
a Effect of sample size on GAM and LMM true positive (TP) and false positive (FP) gene identifications. Y-axis breaks between 20 and 200. b Heatmap of mean PPV from model runs, each using a different random test set and seed (N = 10), for GAM and LMM for varying sample sizes. c Effect of missing data on GAM performance. a, c Solid and dashed lines represent nonlinear sigmoidal curves and their two-sided 95% confidence intervals, respectively. Data points display mean ± standard error values from model runs, each using a different random test set and seed (N = 10). d ML-GAM workflow for datasets with missing data. e ML training set size effect on GAM accuracy, indicating median (central line) and minimum and maximum range (box boundaries), and p-value from a 1-way ANOVA with Tukey’s multiple comparison test from model runs, each using a different random test set and seed (N = 30). f Effect of missing data on accurate GAM gene identification after adjusting data with ML models trained with different sample sizes, where the remaining samples are analyzed as the GAM test samples. Solid and dashed lines represent nonlinear sigmoidal curves and their two-sided 95% confidence intervals, respectively. Data points display mean ± standard error values from model runs, each using a different random test set and seed (N = 5). Source data are provided as a Source Data file.

**Fig. 6. GAM vs WHO ML model accuracy for drug resistance prediction in 427 *Mtb* isolates.**
a *Mtb* isolates from three hospital sites in China were analyzed by drug susceptibility testing and sequenced to identify variant sequences. Created in BioRender. b–i Pair-matched model accuracy for isolates resistant to eight drug targets as assessed across N = 10 random seeds and analyzed by 1-way ANOVAs with Geisser-Greenhouse corrections and Dunnett’s tests for multiple comparisons. The number of isolates used for these comparisons varied according to the number of isolates with phenotype data for (b) amikacin (n = 427), (c) ethambutol (n = 423), (d) ethionamide (n = 421), (e) isoniazid (n = 352), (f) kanamycin (n = 427), (g) levofloxacin (n = 112), (h) moxifloxacin (n = 415), and (i) rifampicin (n = 185) susceptibility tests. Source data are provided as a Source Data file.


Cited by

Extracellular Vesicles for Clinical Diagnostics: From Bulk Measurements to Single-Vesicle Analysis.
Tran HL, Zheng W, Issadore DA, Im H, Cho YK, Zhang Y, Liu D, Liu Y, Li B, Liu F, Wong DTW, Sun J, Qian K, He M, Wan M, Zeng Y, Cheng K, Huang TJ, Chiu DT, Lee LP, Zheng L, Godwin AK, Kalluri R, Soper SA, Hu TY. ACS Nano. 2025 Aug 12;19(31):28021-28109. doi: 10.1021/acsnano.5c00706. Epub 2025 Jul 28. PMID: 40720603. Free PMC article. Review.

