Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation
- PMID: 24144044
- PMCID: PMC3910492
- DOI: 10.1021/ci400480s
Abstract
The search for new tuberculosis treatments continues, as we need molecules that act more quickly, can be accommodated in multidrug regimens, and overcome ever-increasing levels of drug resistance. Multiple large-scale phenotypic high-throughput screens against Mycobacterium tuberculosis (Mtb) have generated dose-response data, enabling the construction of machine learning models. These models also incorporated cytotoxicity data and were recently validated with a large external data set. A cheminformatics data-fusion approach, followed by Bayesian, Support Vector Machine (SVM), or Recursive Partitioning (RP) model development based on publicly available Mtb screening data, was used to compare models built from individual data sets with subsequent combined models. A set of 1924 commercially available molecules with promising antitubercular activity (and a lack of relative cytotoxicity to Vero cells) was used to evaluate the predictive ability of the models. We demonstrate that combining three data sets incorporating antitubercular and Vero-cell cytotoxicity data from our previous screens results in an external validation receiver operating characteristic (ROC) score of 0.83 (Bayesian or RP Forest). Models that do not have the highest 5-fold cross-validation ROC scores can outperform other models, depending on the test set. We demonstrate with predictions for a recently published set of Mtb leads from GlaxoSmithKline that no single machine learning model may be enough to identify compounds of interest. Data set fusion represents a further useful strategy for machine learning model construction, as illustrated with Mtb. Coverage of chemistry and Mtb target spaces may also be limiting factors for the whole-cell screening data generated to date.
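The dual-event labeling, data-set fusion, and ROC evaluation described in the abstract can be illustrated with a minimal Python sketch. It assumes hypothetical potency and selectivity thresholds (`MAX_IC90_UG_ML`, `MIN_SELECTIVITY_INDEX`), a hypothetical conservative rule for compounds measured in more than one screen, and toy compound IDs and scores; none of these values come from the paper, and the Bayesian, SVM, and RP models themselves are not reproduced. The sketch labels each measurement as active only if it is both potent against Mtb and selective over Vero cells, merges screens by compound, and scores a ranked list by ROC area under the curve.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical dual-event thresholds (placeholders, not taken from the paper).
MAX_IC90_UG_ML = 10.0
MIN_SELECTIVITY_INDEX = 10.0


@dataclass
class Measurement:
    compound_id: str  # in practice a structure key, e.g. a canonical SMILES or InChIKey
    ic90: float       # Mtb whole-cell potency (lower is better); assumed > 0
    cc50: float       # Vero-cell cytotoxicity (higher is less toxic)


def is_dual_event_active(m: Measurement) -> bool:
    """Active only if potent against Mtb AND selective versus Vero cells."""
    selectivity_index = m.cc50 / m.ic90
    return m.ic90 <= MAX_IC90_UG_ML and selectivity_index >= MIN_SELECTIVITY_INDEX


def fuse(datasets: Iterable[List[Measurement]]) -> Dict[str, int]:
    """Merge several screens into one labeled set keyed by compound.

    Hypothetical duplicate rule: a compound seen in more than one screen is
    labeled active (1) only if every one of its measurements passes.
    """
    labels: Dict[str, int] = {}
    for dataset in datasets:
        for m in dataset:
            label = 1 if is_dual_event_active(m) else 0
            # First occurrence keeps its own label; later ones can only demote it.
            labels[m.compound_id] = min(labels.get(m.compound_id, 1), label)
    return labels


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC as the probability that a random active outscores a random inactive.

    Ties count as half a win (Mann-Whitney formulation); O(P*N), fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy inputs for illustration only; not data from the paper.
    screen_a = [Measurement("cpd1", ic90=2.0, cc50=100.0),
                Measurement("cpd2", ic90=50.0, cc50=100.0)]
    screen_b = [Measurement("cpd1", ic90=5.0, cc50=20.0),
                Measurement("cpd3", ic90=1.0, cc50=80.0)]
    print(fuse([screen_a, screen_b]))  # {'cpd1': 0, 'cpd2': 0, 'cpd3': 1}
    print(roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]))  # 0.75
```

In the toy run, `cpd1` passes in one screen but fails selectivity in the other, so the conservative rule labels it inactive. A real pipeline would more likely use a library routine such as scikit-learn's `roc_auc_score` and would standardize structures before matching compounds across screens.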
Conflict of interest statement
SE is a consultant for Collaborative Drug Discovery, Inc.