Advancing Aqueous Solubility Prediction: A Machine Learning Approach for Organic Compounds Using a Curated Data Set
- PMID: 40783839
- PMCID: PMC12382903
- DOI: 10.1021/acs.jcim.4c02399
Abstract
Aqueous solubility is a key property of a chemical compound that determines its possible use in applications ranging from drug development to materials science. In this work, we present a model for predicting aqueous solubility that leverages a curated data set merged from four distinct sources. This data set covers a diverse range of organic compounds, providing a robust foundation for our investigation of solubility prediction. Our approach employs a variety of machine learning and deep learning models that combine an extensive array of chemical descriptors, fingerprints, and functional groups. The methodology is designed to address the complexities of solubility prediction and to achieve high accuracy and generalization. We tested the finalized model on 1282 unique organic compounds from the Huuskonen data set. With an R² value of 0.92 and an MAE value of 0.40, the model outperforms existing prediction methods for aqueous solubility on one of the most diverse data sets in the field.
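For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how R² and MAE are commonly computed from measured and predicted solubility values. It is not the authors' evaluation code. The function names and the sample values are hypothetical, and R² is assumed here to be the coefficient of determination (some studies instead report the squared Pearson correlation).

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (an assumed definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical measured and predicted solubility values, for illustration only.
    measured = [-1.0, -2.5, -3.2, -4.8, -0.6]
    predicted = [-1.3, -2.2, -3.6, -4.5, -0.9]
    print(f"MAE = {mean_absolute_error(measured, predicted):.2f}")
    print(f"R2  = {r_squared(measured, predicted):.2f}")
```

A lower MAE means predictions sit closer to the measured values on average, while an R² near 1 means the model explains most of the variance in the measured values.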