Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis
- PMID: 24968215
- PMCID: PMC4951206
- DOI: 10.1021/ci500264r
Abstract
Tuberculosis is a major, neglected disease for which the quest to find new treatments continues. There is an abundance of data from large phenotypic screens in the public domain against Mycobacterium tuberculosis (Mtb). Since machine learning methods can learn from past data, we were interested in addressing whether more data builds better models. We now describe using Bayesian machine learning to assess whether we can improve our models by combining the large quantities of single-point data with the much smaller (higher quality) dual-event data sets, which use dose-response data for both whole-cell antitubercular activity and Vero cell cytotoxicity. We evaluated 12 models built from single-point data, dual-event dose-response data, single-point plus dual-event dose-response data, and combined data sets, drawing on three distinct data sets from the same laboratory. We used a fourth data set of active and inactive compounds from the same group, as well as a smaller set of 177 active compounds from GlaxoSmithKline, as test sets. Our data suggest that combining single-point with dual-event dose-response data does not diminish the internal or external predictive ability of the models, based on their receiver operating characteristic (ROC) values (internal ROC range 0.83-0.91, external ROC range 0.62-0.83), compared to the orders of magnitude smaller dual-event models (internal ROC range 0.6-0.83 and external ROC range 0.54-0.83). In conclusion, models developed with 1200-5000 compounds appear to be as predictive as those generated with 25 000-350 000 molecules. Our results have implications for justifying further high-throughput screening versus focused testing based on model predictions.
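The ROC values quoted above summarize how well a model ranks active compounds ahead of inactive ones. As a minimal sketch of that metric (not the authors' implementation, and assuming the reported values are areas under the ROC curve), the area can be computed from ranks alone using the Mann-Whitney identity. The function name `roc_auc` and the 0/1 label encoding are illustrative assumptions.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for an active compound, 0 for an inactive one (assumed encoding).
    scores: model output, where a higher score means "more likely active".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one inactive compound")

    # Rank compounds by ascending score; tied scores share their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the actives, normalized by the number of active/inactive pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of actives from inactives, which is the scale on which the internal and external ROC ranges above are reported.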
Conflict of interest statement
SE is a consultant for Collaborative Drug Discovery, Inc.