J Chem Inf Model. 2025 Jun 9;65(11):5623-5634.
doi: 10.1021/acs.jcim.5c00249. Epub 2025 May 19.

Improved Machine Learning Predictions of EC50s Using Uncertainty Estimation from Dose-Response Data

Hugo Bellamy et al. J Chem Inf Model.

Abstract

In early-stage drug design, machine learning models often rely on compressed representations of data, where raw experimental results are distilled into a single metric per molecule through curve fitting. This process discards valuable information about the quality of the curve fit. In this study, we incorporated a fit-quality metric into machine learning models to capture the reliability of metrics for individual molecules. Using 40 data sets from PubChem (public) and BASF (private), we demonstrated that including this quality metric can significantly improve predictive performance without additional experiments. Four methods were tested: random forests with parametric bootstrap, weighted random forests, variable output smearing random forests, and weighted support vector regression. When using fit-quality metrics, at least one of these methods led to a statistically significant improvement on 31 of the 40 data sets. In the best case, these methods led to a 22% reduction in the root-mean-squared error of the models. Overall, our results demonstrate that by adapting data processing to account for curve fit quality, we can improve predictive performance across a range of different data sets.
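
The abstract does not say how the fit-quality metric enters each model, so the following is only a minimal sketch of two of the ideas it names (weighting and parametric bootstrap), under stated assumptions. It assumes each molecule has a fitted pEC50 and a standard error from the curve fit, that weights are inverse-variance, and that bootstrap targets are drawn from a Gaussian. The function names and the example values are hypothetical and are not taken from the paper.

```python
import random


def inverse_variance_weights(std_errors, floor=1e-6):
    """Assumed weighting scheme: w_i proportional to 1 / se_i^2, scaled to mean 1."""
    raw = [1.0 / max(se, floor) ** 2 for se in std_errors]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


def parametric_bootstrap_targets(estimates, std_errors, n_replicates, seed=0):
    """Draw noisy copies of the targets, y_i ~ Normal(estimate_i, se_i).

    Each replicate could be used to train one ensemble member, so that
    molecules with poorly determined values contribute noisier labels.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(mu, se) for mu, se in zip(estimates, std_errors)]
        for _ in range(n_replicates)
    ]


# Hypothetical fitted values: the second molecule has an unreliable curve fit.
pec50 = [6.2, 5.1, 7.4]
pec50_se = [0.05, 0.80, 0.10]

weights = inverse_variance_weights(pec50_se)
replicates = parametric_bootstrap_targets(pec50, pec50_se, n_replicates=100)
```

In a scikit-learn setting, weights like these could be passed through the `sample_weight` argument of `RandomForestRegressor.fit` or `SVR.fit`. Whether the authors used this exact weighting is not stated in the abstract.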

Figures

Figure 1. (a) Reliable vs (b) unreliable fit of the Hill equation to experimental data points.

Figure 2. Graphical examples of regression and Bayesian fitting procedures.

Figure 3. Schematic comparing the standard approach to our modified testing procedure. The difference is in which values are tested: in this study we test how well we can predict experimental values rather than estimated EC50 values. The point where arrows pointing in opposite directions meet is where the evaluation metric is calculated.

Figure 4. Number of times uncertainty information caused model performance to be better, significantly better, the same and worse, than the equivalent model that did not use this information on the PubChem data sets. PB-RF, random forest with parametric bootstrap; W-RF, weighted random forest; VOS, random forest with variable output smearing; SVR, WSVR, weighted support vector regression.

Figure 5. Number of times uncertainty information caused model performance to be better, significantly better, the same and worse, than the equivalent model that did not use this information on the BASF data sets. PB-RF, random forest with parametric bootstrap; W-RF, weighted random forest; VOS, random forest with variable output smearing; SVR, WSVR, weighted support vector regression.

Figure 6. Change in root-mean-squared error as α is changed on data set AID 449756.
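
To make the idea of a reliable versus unreliable Hill fit concrete, here is a minimal sketch. It assumes the common four-parameter form of the Hill equation and uses the residual RMSE against a given curve as a simple stand-in for fit quality. Both the parameterisation and the quality measure are assumptions, since this page does not state the ones used in the paper. The curve parameters are fixed by hand here, and the fitting step itself is omitted.

```python
import math


def hill_response(conc, bottom, top, ec50, hill_slope):
    """Four-parameter Hill curve (assumed form); conc and ec50 must be > 0."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill_slope)


def residual_rmse(concs, observed, bottom, top, ec50, hill_slope):
    """Root-mean-squared residual between observed responses and the curve."""
    squared = [
        (obs - hill_response(c, bottom, top, ec50, hill_slope)) ** 2
        for c, obs in zip(concs, observed)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical dose series and curve parameters (illustrative only).
concs = [0.01, 0.1, 1.0, 10.0, 100.0]
params = dict(bottom=0.0, top=100.0, ec50=1.0, hill_slope=1.0)

# Points lying exactly on the curve versus points scattered around it.
clean = [hill_response(c, **params) for c in concs]
noisy = [r + d for r, d in zip(clean, [15.0, -20.0, 10.0, -25.0, 20.0])]

rmse_clean = residual_rmse(concs, clean, **params)
rmse_noisy = residual_rmse(concs, noisy, **params)
```

At `conc == ec50` the curve passes through the midpoint between `bottom` and `top`. A larger residual RMSE signals a less trustworthy EC50 estimate for that molecule.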
