Bridging Machine Learning and Thermodynamics for Accurate pKa Prediction
- PMID: 39328749
- PMCID: PMC11423309
- DOI: 10.1021/jacsau.4c00271
Abstract
Integrating scientific principles into machine learning models to enhance their predictive performance and generalizability is a central challenge in the development of AI for Science. Herein, we introduce Uni-pKa, a novel framework that successfully incorporates thermodynamic principles into machine learning modeling, achieving high-precision predictions of acid dissociation constants (pKa), a crucial task in the rational design of drugs and catalysts, as well as a modeling challenge in computational physical chemistry for small organic molecules. Uni-pKa utilizes a comprehensive free energy model to represent molecular protonation equilibria accurately. It features a structure enumerator that reconstructs molecular configurations from pKa data, coupled with a neural network that functions as a free energy predictor, ensuring high-throughput, data-driven prediction while preserving thermodynamic consistency. Employing a pretraining-finetuning strategy with both predicted and experimental pKa data, Uni-pKa not only achieves state-of-the-art accuracy in chemoinformatics but also shows comparable precision to quantum mechanics-based methods.
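To illustrate how a free energy description of protonation equilibria yields a pKa, the sketch below applies the standard thermodynamic relation pKa = ΔG / (RT ln 10) to ensembles of protonation microstates, combining each ensemble with a Boltzmann (log-sum-exp) sum. It is a minimal sketch and not the Uni-pKa implementation: the function names, the use of kJ/mol, and the convention that the proton's reference free energy is already folded into the deprotonated-state energies are assumptions made for this example. In a framework of the kind the abstract describes, the per-microstate free energies would presumably come from the neural network predictor; here they are supplied by hand.

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ/(mol*K)
T = 298.15          # temperature in K (assumed for this example)
RT_LN10 = R * T * math.log(10)


def ensemble_free_energy(energies_kj, temperature=T):
    """Free energy of a set of microstates: G = -RT ln sum_i exp(-G_i / RT).

    The minimum energy is factored out before exponentiating so the sum
    stays numerically stable.
    """
    rt = R * temperature
    g_min = min(energies_kj)
    partition = sum(math.exp(-(g - g_min) / rt) for g in energies_kj)
    return g_min - rt * math.log(partition)


def macro_pka(acid_states_kj, base_states_kj, temperature=T):
    """Macroscopic pKa between a protonated and a deprotonated ensemble.

    Assumption: the base-state energies already include the reference
    free energy of the released proton, so the difference of ensemble
    free energies is the dissociation free energy.
    """
    delta_g = (ensemble_free_energy(base_states_kj, temperature)
               - ensemble_free_energy(acid_states_kj, temperature))
    return delta_g / (R * temperature * math.log(10))


if __name__ == "__main__":
    # Hypothetical energies: one acid form, one base form.
    print(round(macro_pka([0.0], [5 * RT_LN10]), 3))
    # Two degenerate acid tautomers sharing the same base form.
    print(round(macro_pka([0.0, 0.0], [5 * RT_LN10]), 3))
```

Because every pKa is derived from one shared set of microstate free energies, the resulting values are mutually consistent by construction. In the second call, for example, two degenerate protonated tautomers raise the macroscopic pKa by log10(2) relative to the single-tautomer case.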
© 2024 The Authors. Published by American Chemical Society.
Conflict of interest statement
The authors declare no competing financial interest.