Mol Divers. 2022 Jun;26(3):1345-1356.
doi: 10.1007/s11030-021-10238-y. Epub 2021 Jun 10.

Development and validation of consensus machine learning-based models for the prediction of novel small molecules as potential anti-tubercular agents

Mushtaq Ahmad Wani et al. Mol Divers. 2022 Jun.

Abstract

Tuberculosis (TB) is an infectious disease and a leading cause of death globally. The rapid emergence of drug resistance among pathogenic mycobacteria is a global threat that makes new drug discovery and development urgent. However, because drug discovery and development are typically lengthy and costly, strategic use of machine learning (ML) algorithms may help reduce both the cost and the time involved. Considering the urgent need for new TB drugs, we attempted to develop predictive ML-based models for selecting novel small molecules for subsequent in vitro validation. For this purpose, we used the GlaxoSmithKline (GSK) TCAMS TB dataset, comprising 776 hits that were made publicly available to the wider scientific community through the ChEMBL Neglected Tropical Diseases (ChEMBL-NTD) database. We explored several ML classifiers, viz. decision trees (DT), support vector machine (SVM), random forest (RF), Bernoulli Naive Bayes (BNB), k-nearest neighbors (k-NN), and linear logistic regression (LLR), as well as ensemble learning models (bagging and AdaBoost), all trained on the GSK dataset. Three models gave the best prediction results for both the training and test sets: the AdaBoost decision tree (ABDT), the RF classifier, and the k-NN model. However, when predicting an external set of known anti-tubercular compounds/drugs, each of these models showed limitations. The ABDT model correctly predicted 22 molecules as active, while the RF and k-NN models each correctly predicted 18 molecules as active; a number of molecules were predicted as active by two of the models and as inactive by the third. We therefore concluded that, when assessing the anti-tubercular potential of a new molecule, one should rely on consensus predictions from these three models, which may lessen the attrition rate during in vitro validation. We believe that this study may assist the wider anti-tuberculosis research community by providing a platform for predicting small molecules for subsequent validation in drug discovery and development.
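
The consensus idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact voting rule, so the two-of-three threshold (`min_votes=2`), the function name `consensus_predictions`, and the example calls for four molecules are all assumptions made up for illustration. The sketch takes active/inactive calls that the three models have already produced and reports, per molecule, the number of "active" votes, the consensus call, and whether the models were unanimous.

```python
from typing import Dict, List

ACTIVE, INACTIVE = 1, 0


def consensus_predictions(
    per_model: Dict[str, List[int]], min_votes: int = 2
) -> List[dict]:
    """Combine per-model active/inactive calls into one call per molecule.

    per_model maps a model name to its list of 0/1 calls, one per molecule.
    min_votes is an assumed threshold: a molecule is called active when at
    least this many models call it active.
    """
    lengths = {len(calls) for calls in per_model.values()}
    if len(lengths) != 1:
        raise ValueError("every model must score the same set of molecules")
    n_molecules = lengths.pop()

    results = []
    for i in range(n_molecules):
        # Count how many models call molecule i active.
        votes = sum(calls[i] == ACTIVE for calls in per_model.values())
        results.append(
            {
                "active_votes": votes,
                "consensus": ACTIVE if votes >= min_votes else INACTIVE,
                # Unanimous means all models agree, either way.
                "unanimous": votes in (0, len(per_model)),
            }
        )
    return results


# Hypothetical calls for four molecules (not data from the paper).
example_calls = {
    "ABDT": [1, 1, 0, 1],
    "RF": [1, 0, 0, 1],
    "kNN": [1, 1, 0, 0],
}

for idx, row in enumerate(consensus_predictions(example_calls)):
    label = "active" if row["consensus"] == ACTIVE else "inactive"
    print(f"molecule {idx}: {row['active_votes']}/3 active votes -> {label}")
```

Reporting the vote count alongside the consensus call keeps the split cases visible: molecules called active by only two of the three models can be given lower priority than unanimous actives when selecting compounds for in vitro testing.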

Keywords: ABDT; Machine learning; Mycobacterium tuberculosis; RF; Tuberculosis.
