Deep learning on genomic sequences for rapid identification of drug-resistant tuberculosis
- PMID: 41402010
- DOI: 10.1016/j.ijtb.2025.11.020
Abstract
Background: The bacterium Mycobacterium tuberculosis (MTB) causes tuberculosis (TB), which remains a major global public health problem, especially as drug-resistant forms such as multidrug-resistant (MDR) and extensively drug-resistant (XDR) TB become more common. Conventional methods for diagnosing drug resistance are slow and resource-intensive, which limits their use in low-resource settings. Advances in next-generation sequencing (NGS) provide abundant genomic data that can reveal resistance-conferring mutations, but interpreting these complex data requires robust computational methods.
Methods: This study develops and evaluates deep learning models that rapidly and accurately predict drug resistance phenotypes from raw MTB genome sequences. We used large, publicly available whole-genome sequencing datasets annotated with drug resistance test results. We compared several model architectures, including Transformer-based models, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs), along with different data encoding schemes. Data preparation comprised quality control, variant calling, and normalisation. To ensure that all resistance phenotypes were fairly represented, models were trained with stratified training-validation splits. Performance was measured with accuracy, precision, recall, F1-score, and AUC-ROC.
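The stratified training-validation split mentioned in the Methods can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function name `stratified_split`, the 20 % validation fraction, and the label counts are all assumptions chosen for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) so each label keeps roughly the same
    share in both sets. Hypothetical helper, not from the paper."""
    rng = random.Random(seed)

    # Group sample indices by their resistance label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # At least one validation sample per class; a class with a single
        # sample therefore ends up only in the validation set.
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Made-up, imbalanced phenotype labels for illustration only.
labels = ["susceptible"] * 80 + ["MDR"] * 15 + ["XDR"] * 5
train_idx, val_idx = stratified_split(labels)

val_counts = defaultdict(int)
for i in val_idx:
    val_counts[labels[i]] += 1
print(dict(val_counts))
```

Splitting within each class, rather than over the pooled samples, keeps rare phenotypes such as XDR from being absent from the validation set by chance.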
Results: The Transformer model outperformed the CNN and RNN architectures, with a validation accuracy of 93.5 %, a precision of 91.8 %, a recall of 89.9 %, an F1-score of 90.8 %, and an AUC-ROC of 95.7 %. It also converged faster and reached lower training and validation loss, suggesting that it captured both local and global sequence relationships. The deep learning models also had better predictive performance than common machine learning baselines such as logistic regression, random forests, and gradient boosting.
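As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall. The short sketch below (the helper name `f1_from_precision_recall` is an assumption for illustration) recomputes it from the reported precision and recall.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the Transformer model.
f1 = f1_from_precision_recall(0.918, 0.899)
print(f"{f1:.3f}")  # prints 0.908
```

The recomputed value agrees with the reported F1-score of 90.8 %.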
Conclusion: Deep learning methods, especially Transformer-based models, show considerable promise for rapid genome-based prediction of drug-resistant TB. These models learn features automatically from raw sequences, which reduces the need for manual feature engineering and enables accurate assessment at scale. Future work will focus on incorporating more diverse datasets, improving model interpretability, and integrating multi-omics data to make predictions more accurate and more useful in clinical settings.
Keywords: Deep learning; Drug resistance prediction; Transformer models; Tuberculosis; Whole-genome sequencing.
Copyright © 2025. Published by Elsevier B.V.
Conflict of interest statement
The authors declare no conflict of interest related to the content of this manuscript. No financial or non-financial relationships influenced the study design, execution, data interpretation, or manuscript preparation.
