JACS Au. 2024 Jul 17;4(9):3451-3465.

doi: 10.1021/jacsau.4c00271. eCollection 2024 Sep 23.

Bridging Machine Learning and Thermodynamics for Accurate pK_a Prediction

Weiliang Luo¹,², Gengmo Zhou²,³, Zhengdan Zhu², Yannan Yuan², Guolin Ke², Zhewei Wei³, Zhifeng Gao², Hang Zheng²

Affiliations

¹ Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
² DP Technology, Beijing 100089, China.
³ Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China.

PMID: 39328749
PMCID: PMC11423309
DOI: 10.1021/jacsau.4c00271

Weiliang Luo et al. JACS Au. 2024.

Abstract

Integrating scientific principles into machine learning models to enhance their predictive performance and generalizability is a central challenge in the development of AI for Science. Herein, we introduce Uni-pK_a, a novel framework that successfully incorporates thermodynamic principles into machine learning modeling, achieving high-precision predictions of acid dissociation constants (pK_a), a crucial task in the rational design of drugs and catalysts, as well as a modeling challenge in computational physical chemistry for small organic molecules. Uni-pK_a utilizes a comprehensive free energy model to represent molecular protonation equilibria accurately. It features a structure enumerator that reconstructs molecular configurations from pK_a data, coupled with a neural network that functions as a free energy predictor, ensuring high-throughput, data-driven prediction while preserving thermodynamic consistency. Employing a pretraining-finetuning strategy with both predicted and experimental pK_a data, Uni-pK_a not only achieves state-of-the-art accuracy in chemoinformatics but also shows comparable precision to quantum mechanics-based methods.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

**Scheme 1. Schematic Overview of Uni-pK_a Framework**
(A) Data preparation workflow. We implement a microstate enumerator to systematically build the protonation ensemble from a single structure. (B) Pretraining workflow. Our pretraining strategy combines one weakly supervised task (pK_a prediction) with three self-supervised tasks (masked atom prediction, masked charge prediction, and 3D position recovery) to make the most of the chemical information in 3 million microstate structures. In the pK_a prediction task, we introduce a free energy-to-pK_a (FE2pK_a) module to establish the relationship between the model-predicted free energy and pK_a. This module also enables us to predict pK_a from free energies. (C) Finetuning workflow. In this phase, we again employ the FE2pK_a module, training the model on experimental pK_a values to improve its prediction accuracy. (D) Inference workflow. After pretraining and finetuning, the Uni-pK_a framework handles three inference tasks: macro-pK_a prediction, micro-pK_a prediction, and distribution fraction prediction.

**Scheme 2. Inference Stage of Uni-pK_a**
(A) Structures of microstates in the protonation ensemble of one reference molecule are reconstructed by the microstate generator. (B) The atom types, atomic charges, and geometry information of the microstates are fed into the Uni-Mol backbone, and the free energies are predicted for each microstate. (C) If the acid and base macrostates are specified by the user input, the macro-pK_a-free-energy formula is used to transform the free energy prediction to macro-pK_a prediction. If the microstates are further specified, the micro-pK_a-free-energy formula is used as a special case of the macro-pK_a prediction where there is only one microstate in both macrostates. (D) If pH is given by the user input, the distribution-free-energy formula is used to calculate the fraction of all the microstates in the protonation ensemble.
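
The conversion from microstate free energies to macro-pK_a and distribution fractions described in panels (C) and (D) can be sketched as follows. This is a minimal illustration using the standard statistical-thermodynamic relations, not the authors' implementation: it assumes that each microstate free energy is given in kcal/mol relative to a pH 0 reference, and that each microstate carries a known count of dissociable protons. The function names `macro_pka` and `distribution_fractions` are hypothetical.

```python
import math

R_KCAL = 1.987204259e-3  # gas constant in kcal/(mol*K)
LN10 = math.log(10.0)


def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def macro_pka(acid_free_energies, base_free_energies, temperature=298.15):
    """Macro-pK_a from microstate free energies (kcal/mol, pH 0 reference).

    Assumed relation: pK_a = log10(Z_acid / Z_base), where Z is the
    Boltzmann sum over the microstates of each macrostate.
    """
    rt = R_KCAL * temperature
    log_z_acid = _log_sum_exp([-g / rt for g in acid_free_energies])
    log_z_base = _log_sum_exp([-g / rt for g in base_free_energies])
    return (log_z_acid - log_z_base) / LN10


def distribution_fractions(free_energies, proton_counts, ph, temperature=298.15):
    """Fraction of each microstate at a given pH.

    Each extra dissociable proton shifts the effective free energy
    by RT * ln(10) * pH (assumed convention).
    """
    rt = R_KCAL * temperature
    log_w = [-g / rt - n * LN10 * ph
             for g, n in zip(free_energies, proton_counts)]
    log_z = _log_sum_exp(log_w)
    return [math.exp(lw - log_z) for lw in log_w]


if __name__ == "__main__":
    # Toy system: one acid microstate and one base microstate whose
    # free-energy gap corresponds to pK_a = 4 by construction.
    rt = R_KCAL * 298.15
    g_acid, g_base = 0.0, 4.0 * LN10 * rt
    print(macro_pka([g_acid], [g_base]))
    print(distribution_fractions([g_acid, g_base], [1, 0], ph=4.0))
```

Passing a single microstate in each macrostate gives the micro-pK_a, which matches the caption's description of micro-pK_a as a special case of the macro-pK_a formula.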

**Figure 1**
Uni-pK_a’s treatment of detailed acid–base equilibria. (A) Example of 2-hydroxybenzoic acid, where one of the dissociation reactions is dominant. (B) Example of 2-((dimethylamino)methyl)phenol, where both reactions are dominant. (C) Uni-pK_a results on SAMPL6 micro-pK_a data sets involving tautomerism. (D) Thermodynamic cycle of glycine. pK_i is the dissociation equilibrium constant. The green and orange arrows indicate different protonation routes.
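
The thermodynamic cycle in panel (D) implies a closure condition that a free energy-based model satisfies by construction. As a sketch, assume the indices are assigned as follows (the numbering in the actual figure may differ): pK_1 and pK_2 are the two micro-pK_a values along the route cation → zwitterion → anion, and pK_3 and pK_4 are those along the route cation → neutral form → anion. Because free energy is a state function, both routes must give the same total:

```latex
\mathrm{p}K_1 + \mathrm{p}K_2 = \mathrm{p}K_3 + \mathrm{p}K_4
```

A model that predicts each micro-pK_a independently has no guarantee of satisfying this equality, whereas one that predicts a single free energy per microstate does.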

Cited by

A workflow to create a high-quality protein-ligand binding dataset for training, validation, and prediction tasks.
Wang Y, Sun K, Li J, Guan X, Zhang O, Bagni D, Zhang Y, Carlson HA, Head-Gordon T. Digit Discov. 2025 Apr 2;4(5):1209-1220. doi: 10.1039/d4dd00357h. eCollection 2025 May 14. PMID: 40190768. Free PMC article.
pK_a prediction in non-aqueous solvents.
Zheng JW, Al Ibrahim E, Kaljurand I, Leito I, Green WH. J Comput Chem. 2025 Jan 5;46(1):e27517. doi: 10.1002/jcc.27517. PMID: 39661411. Free PMC article.
Interpretable Deep-Learning pK_a Prediction for Small Molecule Drugs via Atomic Sensitivity Analysis.
DeCorte J, Brown B, Jeffrey R, Meiler J. J Chem Inf Model. 2025 Jan 13;65(1):101-113. doi: 10.1021/acs.jcim.4c01472. Epub 2024 Dec 30. PMID: 39801290. Free PMC article.
Computational tools for the prediction of site- and regioselectivity of organic reactions.
Sigmund LM, Assante M, Johansson MJ, Norrby PO, Jorner K, Kabeshov M. Chem Sci. 2025 Mar 4;16(13):5383-5412. doi: 10.1039/d5sc00541h. eCollection 2025 Mar 26. PMID: 40070469. Free PMC article. Review.
Developing a Machine Learning Model for Hydrogen Bond Acceptance Based on Natural Bond Orbital Descriptors.
Melo DU, Carneiro LM, Coutinho-Neto MD, Homem-de-Mello P, Bartoloni FH. J Org Chem. 2025 Jul 18;90(28):9776-9788. doi: 10.1021/acs.joc.5c00724. Epub 2025 Jul 6. PMID: 40619683. Free PMC article.
