Improved Machine Learning Predictions of EC50s Using Uncertainty Estimation from Dose-Response Data
- PMID: 40384077
- PMCID: PMC12152940
- DOI: 10.1021/acs.jcim.5c00249
Abstract
In early-stage drug design, machine learning models often rely on compressed representations of data, where raw experimental results are distilled into a single metric per molecule through curve fitting. This process discards valuable information about the quality of the curve fit. In this study, we incorporated a fit-quality metric into machine learning models to capture the reliability of metrics for individual molecules. Using 40 data sets from PubChem (public) and BASF (private), we demonstrated that including this quality metric can significantly improve predictive performance without additional experiments. Four methods were tested: random forests with parametric bootstrap, weighted random forests, variable output smearing random forests, and weighted support vector regression. When using fit-quality metrics, at least one of these methods led to a statistically significant improvement on 31 of the 40 data sets. In the best case, these methods led to a 22% reduction in the root-mean-squared error of the models. Overall, our results demonstrate that by adapting data processing to account for curve fit quality, we can improve predictive performance across a range of different data sets.
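As an illustration of how a fit-quality metric could enter training, here is a minimal sketch of two ingredients named in the abstract: per-molecule weights and parametric bootstrap labels. It is not the authors' implementation. It assumes the fit-quality metric is a standard error on a log-scale EC50 (pEC50) from the curve fit, which the abstract does not specify. The function names, the `floor` parameter, and the example values are all hypothetical.

```python
import random
from typing import List, Sequence


def inverse_variance_weights(std_errors: Sequence[float], floor: float = 1e-3) -> List[float]:
    """Turn per-molecule standard errors into sample weights that average to 1.

    Assumption: a smaller standard error means a more reliable curve fit.
    `floor` avoids division by zero for fits reported with zero error.
    """
    raw = [1.0 / max(se, floor) ** 2 for se in std_errors]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


def parametric_bootstrap_labels(
    values: Sequence[float],
    std_errors: Sequence[float],
    n_replicates: int,
    seed: int = 0,
) -> List[List[float]]:
    """Draw n_replicates label vectors, each sampled from Normal(value, std_error).

    Each replicate could serve as the training labels for one ensemble member,
    so that poorly fitted molecules contribute noisier labels.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(v, se) for v, se in zip(values, std_errors)]
        for _ in range(n_replicates)
    ]


# Hypothetical pEC50 values and the standard errors of their curve fits.
pec50 = [6.2, 5.1, 7.4]
fit_se = [0.05, 0.50, 0.10]

weights = inverse_variance_weights(fit_se)
replicates = parametric_bootstrap_labels(pec50, fit_se, n_replicates=100)
```

Weights of this kind could be passed to a learner that accepts per-sample weights, for example the `sample_weight` argument of `fit` in scikit-learn's `RandomForestRegressor` or `SVR`. Whether inverse-variance weighting matches the weighting scheme used in the paper is an assumption of this sketch.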
Similar articles
- General Approach to Estimate Error Bars for Quantitative Structure-Activity Relationship Predictions of Molecular Activity. J Chem Inf Model. 2018 Aug 27;58(8):1561-1575. doi: 10.1021/acs.jcim.8b00114. Epub 2018 Jul 17. PMID: 29949366
- A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification. Metabolomics. 2019 Nov 15;15(12):150. doi: 10.1007/s11306-019-1612-4. PMID: 31728648 Free PMC article.
- Uncertainty-based saltwater intrusion prediction using integrated Bayesian machine learning modeling (IBMLM) in a deep aquifer. J Environ Manage. 2024 Mar;354:120252. doi: 10.1016/j.jenvman.2024.120252. Epub 2024 Feb 22. PMID: 38394869
- American society of anesthesiologists physical status classification significantly affects the performances of machine learning models in intraoperative hypotension inference. J Clin Anesth. 2024 Feb;92:111309. doi: 10.1016/j.jclinane.2023.111309. Epub 2023 Nov 2. PMID: 37922642 Free PMC article.
- A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery. Bioinformatics. 2019 Nov 1;35(22):4656-4663. doi: 10.1093/bioinformatics/btz293. PMID: 31070704 Free PMC article.