First-line drug resistance profiling of Mycobacterium tuberculosis: a machine learning approach
- PMID: 35309001
- PMCID: PMC8861754
Abstract
The persistence and emergence of new multidrug-resistant Mycobacterium tuberculosis (M. tb) strains continue to drive the devastating tuberculosis (TB) epidemic. Robust systems are needed to perform drug-resistance profiling accurately and rapidly, and machine learning (ML) methods combined with genomic sequence data may provide novel insights into drug-resistance mechanisms. Using 372 M. tb isolates, this study demonstrates the combined utility of ML and bioinformatics for drug-resistance profiling. Single nucleotide polymorphisms (SNPs), insertions and deletions (InDels), and dinucleotide frequencies are explored as input features for three ML models: Decision Trees, Random Forest, and eXtreme Gradient Boosting (XGBoost). Using SNPs and InDels, all three models performed equally well, each yielding 99% accuracy, 97% recall, and a 99% F1-score. Using dinucleotide frequencies, XGBoost performed best, with 97% accuracy, 94% recall, and a 97% F1-score. This study validates the use of variants as input features and presents dinucleotide frequencies as another effective feature encoding method for ML-based phenotype classification.
©2021 AMIA - All rights reserved.
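The abstract names dinucleotide frequencies as a feature encoding but does not spell out how they are computed. As a minimal sketch, assuming overlapping two-base windows over an already assembled sequence, skipping any window that contains a non-ACGT character such as N, and normalising counts by the number of valid windows (all assumptions, not the authors' documented procedure), each isolate could be turned into a fixed 16-value feature vector as follows. The function name `dinucleotide_frequencies` is hypothetical.

```python
from itertools import product

# Fixed feature order: all 16 dinucleotides over A, C, G, T (AA, AC, ..., TT).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(sequence: str) -> list[float]:
    """Return the relative frequency of each of the 16 dinucleotides.

    Assumptions (not taken from the paper): overlapping windows of width 2,
    windows containing any non-ACGT character (e.g. N) are skipped, and
    counts are normalised by the number of valid windows.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # only pure ACGT pairs are counted
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        # No valid window (empty, single-base, or all-N input).
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


if __name__ == "__main__":
    # Toy sequence for illustration only; not an M. tb isolate.
    features = dinucleotide_frequencies("ACGTNACG")
    for name, value in zip(DINUCLEOTIDES, features):
        if value:
            print(name, round(value, 3))
```

For the toy input "ACGTNACG", the windows TN and NA are skipped, and the five valid windows give AC = 0.4, CG = 0.4 and GT = 0.2. One such vector per isolate, stacked into a matrix with the resistance phenotypes as labels, is the kind of input a classifier such as XGBoost would consume; which genomic regions the authors actually encoded is not stated in the abstract.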