AMIA Annu Symp Proc. 2022 Feb 21;2021:891-899. eCollection 2021.

First-line drug resistance profiling of Mycobacterium tuberculosis: a machine learning approach

Stephanie J Müller et al. AMIA Annu Symp Proc.

Abstract

The persistence and emergence of new multi-drug resistant Mycobacterium tuberculosis (M. tb) strains continue to advance the devastating tuberculosis (TB) epidemic. Robust systems are needed to perform drug-resistance profiling accurately and rapidly, and machine learning (ML) methods combined with genomic sequence data may provide novel insights into drug-resistance mechanisms. This study uses 372 M. tb isolates to demonstrate the combined utility of ML and bioinformatics for drug-resistance profiling. SNPs, InDels, and dinucleotide frequencies are explored as input features for three ML models: Decision Trees, Random Forest, and eXtreme Gradient Boosting (XGBoost). Using SNPs and InDels, all three models performed equally well, yielding 99% accuracy, 97% recall, and a 99% F1-score. Using dinucleotide frequencies, XGBoost was superior, with 97% accuracy, 94% recall, and a 97% F1-score. This study validates the use of variants and presents dinucleotide frequencies as another effective feature encoding method for ML-based phenotype classification.
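To make the dinucleotide encoding concrete, the following is a minimal sketch of how a gene sequence might be turned into a 16-value frequency vector. The abstract does not state how the frequencies were computed, so the overlapping window, the normalization by the number of valid pairs, the skipping of ambiguous bases, and the names `DINUCLEOTIDES` and `dinucleotide_frequencies` are all assumptions made for illustration.

```python
from itertools import product

# All 16 possible dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seq):
    """Return relative frequencies of the 16 dinucleotides in seq.

    Assumptions (not taken from the paper): overlapping windows of
    length 2, and pairs containing anything other than A, C, G, T
    (for example N) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    valid_pairs = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            valid_pairs += 1
    if valid_pairs == 0:
        # No usable pairs: return an all-zero vector rather than divide by zero.
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / valid_pairs for d in DINUCLEOTIDES]
```

Concatenating such vectors across the genes of interest would give one fixed-length feature row per isolate, which is the kind of input a tree-based classifier accepts directly.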


Figures

Figure 1: Generalized workflow used in this study
Figure 2: (A) PCA and (B) t-SNE of variant features for 16 genes
Figure 3: (A) PCA and (B) t-SNE of dinucleotide features for 15 genes
Figure 4: Confusion matrices of the variant features for three models: (A) Decision Tree, (B) Random Forest, (C) XGBoost
Figure 5: Confusion matrices of the dinucleotide features for three models: (A) Decision Tree, (B) Random Forest, (C) XGBoost
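The accuracy, recall, and F1-score reported in the abstract are the kind of metrics that can be read off a confusion matrix. The sketch below shows the standard definitions for a two-class matrix. It assumes a binary resistant/susceptible setup, which the abstract does not confirm, and the function name `binary_metrics` is hypothetical. How the paper averaged its scores is not stated here.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from the four cells of a 2x2 confusion matrix.

    tp: resistant isolates predicted resistant
    fp: susceptible isolates predicted resistant
    fn: resistant isolates predicted susceptible
    tn: susceptible isolates predicted susceptible
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, `binary_metrics(tp=8, fp=2, fn=2, tn=8)` uses made-up counts (not values from the paper's figures) and gives the same value for all four metrics because the errors are symmetric.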

