Will we ever be able to accurately predict solubility?
- PMID: 38499581
- PMCID: PMC10948805
- DOI: 10.1038/s41597-024-03105-6
Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performance, but their reliability may be deceptive when they are used prospectively. This study investigates the origins of these discrepancies along three directions: a historical perspective, an analysis of the aqueous solubility dataverse, and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performance. We also propose a workflow to curate aqueous solubility data, aiming to produce models useful to bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute, and data sources. The models obtained herein and the quality-assessed datasets are publicly available.
© 2024. The Author(s).
Conflict of interest statement
C. Minoletti and P. Llompart are Sanofi employees and may hold shares and/or stock options in the company. S. Baybekov, D. Horvath, G. Marcou, and A. Varnek have nothing to disclose.