PLoS Comput Biol. 2025 Jun 27;21(6):e1013216.

doi: 10.1371/journal.pcbi.1013216. eCollection 2025 Jun.

Predicting Affinity Through Homology (PATH): Interpretable binding affinity prediction with persistent homology

Yuxi Long¹, Bruce R Donald¹,²

Affiliations

¹ Department of Computer Science, Department of Mathematics, Duke University, Durham, North Carolina, United States of America.
² Department of Biochemistry, Department of Chemistry, Duke University and Duke University School of Medicine, Durham, North Carolina, United States of America.

PMID: 40577377
PMCID: PMC12226026
DOI: 10.1371/journal.pcbi.1013216

Abstract

Accurate binding affinity prediction (BAP) is crucial to structure-based drug design. We present PATH⁺, a novel, generalizable machine learning algorithm for BAP that exploits recent advances in computational topology. Compared to current binding affinity prediction algorithms, PATH⁺ shows similar or better accuracy and is more generalizable across orthogonal datasets. PATH⁺ is not only one of the most accurate algorithms for BAP but also the first that is inherently interpretable. Interpretability, alongside generalizability, is a key factor of trust for an algorithm and allows PATH⁺ to be trusted in critical applications, such as inhibitor design. We visualized the features captured by PATH⁺ for two clinically relevant protein-ligand complexes and found that PATH⁺ captures binding-relevant structural mutations that are corroborated by biochemical data. Our work also sheds light on the features captured by current computational topology BAP algorithms that contribute to their high performance, which have been poorly understood. PATH⁺ also offers an improvement of $\mathcal{O}((m + n)^{3})$ in computational complexity and is empirically over 10 times faster than the dominant (uninterpretable) computational topology algorithm for BAP. Based on insights from PATH⁺, we built PATH⁻, a scoring function for differentiating between binders and non-binders that has outstanding accuracy against 11 current algorithms for BAP. In summary, we report progress in a novel combination of interpretability, speed, and accuracy that should further empower topological screening of large virtual inhibitor libraries against protein targets, and allow binding affinity predictions to be understood and trusted. The source code for PATH⁺ and PATH⁻ is released as open source as part of the OSPREY protein design software package.

Copyright: © 2025 Long, Donald. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: B.R.D. is a founder of Ten63 Therapeutics, Inc. B.R.D. was previously a guest editor for PLoS Comp. Biol.

Figures

**Fig 1. PATH has state-of-the-art performance versus previous binding affinity prediction algorithms.**
(a) PATH⁺ shows comparable or better performance with less overfitting, as evidenced by a smaller slope and a much smaller increase in $\Delta G$ RMSE beyond the training dataset, compared to established binding affinity prediction algorithms spanning a variety of methods. The benchmarked algorithms include physics-based and deep learning algorithms from the well-known AutoDock framework (the scoring function of AutoDock4 as implemented in the AutoDockFR package [68,77], Vinardo [69], GNINA [70]), empirical (AA-Score [71]), knowledge-based (SMoG2016 [72]), and deep learning-based scoring functions (OnionNet [73], PLANET [74]). We believe that PATH⁺ overfits far less to the training dataset than other methods because of the small number of parameters in the sparse regression trees of PATH⁺. (b) ROC curves of scoring functions benchmarked on the DUD-E dataset show that PATH⁻ has state-of-the-art performance in discriminating decoys. AutoDock4, GNINA, and Vinardo are all benchmarked as scoring functions. We also plot interpolated ROC curves (dashed) based on AUCs from [75], which benchmarked Vina [78], smina [79], and idock [80] using the full AutoDock framework. The only algorithms with non-diagonal ROCs are PATH⁻ (AUC = 0.696) and the three scoring functions tested with the full AutoDock framework: Vina (AUC = 0.69), smina (AUC = 0.71), and idock (AUC = 0.68). (Full results are given in numerical tables in Sect E of S1 Text.)

**Fig 2. Visualization by PaCMAP [52] shows that the persistence fingerprint clusters protein-ligand complexes with similar binding affinity reasonably well, even beyond the training dataset (PDBBind v2020 refined set, left panel).**
The x- and y-axes are the dimensionality-reduced axes from PaCMAP. The color of each point is the experimental binding affinity of the protein-ligand complex. (a) PaCMAP of the persistence fingerprints of the PDBBind v2020 refined set (training set), (b) the Binding MOAD dataset, and (c) the BindingDB dataset.

**Fig 3. PATH (with persistence fingerprint) runs significantly faster than TNet-BP, a representative binding affinity prediction algorithm that uses persistent homology, on larger protein-ligand complexes.**
The runtime of PATH (shown in orange) is constant with respect to the number of protein atoms in the complex ($n$), while the runtime of TNet-BP is asymptotically proportional to $n^{7.2}$.

**Fig 4. An overview of PATH⁺.**
Given a protein-ligand complex, PATH⁺ computes internuclear persistence contours (IPCs) using persistent homology and selects a subset of features to form the persistence fingerprint, which is then used to predict binding affinity by a sparse set of regression trees (orange). During training (blue), protein-ligand structures with experimentally measured binding affinities from PDBBind are used to derive an optimal set of features for the persistence fingerprint and an optimal set of regression trees.

**Fig 5. (a) Two HIV-1 protease mutants bound to the inhibitor darunavir (G48V: PDB ID 3cyw [81]; L90M: PDB ID 2f81 [82]).**
Light blue: darunavir. The carbon atoms are colored by their individual contributions (blue through yellow, see legend) to the second component of the persistence fingerprint (carbon-carbon IPC density at dimension 1 and bin [9.0, 9.5]). Grey: other protein heavy atoms. (b) Detail of residues 27-32 for each protease with darunavir. Note the change in conformation (and IPC densities) of Asp30. [82] observed a strong hydrogen bond (2.5 Å) to the carboxylate moiety of Asp30. This correlates with Asp30 of the L90M variant contributing highly to the persistence fingerprint component, which led the decision trees to predict a tighter binding affinity for L90M. (c) Histograms of atomic contributions of residues 27-32 to the persistence fingerprint show that the carbon atoms of 2f81 in these residues generally had higher contributions to the persistence fingerprint.

**Fig 6. PATH⁺ correctly predicted a weaker binding affinity for HIV-1 protease with the drug-resistant G48V mutation (right, experimental $\Delta G = -10.6$ kcal/mol, PDB ID: 3cyw [81]) bound to darunavir, compared to L90M HIV-1 protease (left, experimental $\Delta G = -14.35$ kcal/mol, PDB ID: 2f81 [82]) complexed with the same inhibitor.**
(a) The structure of each complex. (b) The discretized internuclear persistence contour (IPC) of each complex. (c) The persistence fingerprint of each complex. (d) PATH⁺ correctly predicted a weaker binding affinity for the HIV-1 protease with the G48V mutation.

**Fig 7. PATH⁺ explains tighter binding of carbonic anhydrase II by brinzolamide.**
Carbonic anhydrase II bound to two inhibitors: brinzolamide (green sticks, PDB ID 4m2r) and dorzolamide (pink sticks, PDB ID 4m2u) [83]. [83] noted that the flexible methoxypropyl tail of brinzolamide could make favorable interactions with the residues of carbonic anhydrase II, resulting in a measured $\Delta G$ that is 0.61 kcal/mol lower in the brinzolamide complex than in the dorzolamide complex, which corresponds to a 3-fold improvement in $K_d$. This corresponds to a 17% stronger contribution to the persistence fingerprint (d) from the residues around brinzolamide than from those around dorzolamide (residues 62-67 (c), 131-136 (b), and 203-207 (a)), which contributes to PATH⁺ predicting tighter binding for brinzolamide than for dorzolamide, with a $\Delta G$ lower by 0.37 kcal/mol. Small changes in $K_d$ (less than one order of magnitude) have been difficult to predict correctly, but can nevertheless have great clinical importance [7]. Furthermore, (b) shows atomic distances (in Å) between the closest protein and ligand hydrogen (¹H) atoms. The ¹H-¹H distances between brinzolamide and the carbonic anhydrase II PDB model (4m2r) could be close enough for physics-based methods to predict a clash based on this static structure, even though the clash may not persist when dynamics are considered. Based on this observation, we hypothesize a mechanism through which IPC robustly captures binding activity, elaborated in Section Discussion.
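
The link between a 0.61 kcal/mol difference in $\Delta G$ and a roughly 3-fold difference in $K_d$ follows from the standard relation $\Delta G = RT \ln K_d$. The arithmetic can be sketched as below, assuming a temperature of 298.15 K (the caption does not state one); the function name `kd_fold_change` is hypothetical and not part of PATH or OSPREY.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_fold_change(delta_delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Ratio of two dissociation constants implied by a difference in Delta G.

    Uses Delta G = RT ln(K_d), so K_d1 / K_d2 = exp(|ddG| / RT).
    The default temperature is an assumption, not a value from the paper.
    """
    rt = R_KCAL_PER_MOL_K * temperature_k
    return math.exp(abs(delta_delta_g_kcal) / rt)


# Differences quoted in the caption: measured (0.61) and predicted (0.37) kcal/mol.
measured_ratio = kd_fold_change(0.61)
predicted_ratio = kd_fold_change(0.37)
print(f"measured: {measured_ratio:.2f}-fold, predicted: {predicted_ratio:.2f}-fold")
```

Under these assumptions the measured difference gives a ratio of about 2.8, consistent with the 3-fold figure quoted in the caption.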

**Fig 8. Construction of internuclear persistence contours (IPCs).**
IPCs are constructed for each pair of protein and ligand heavy atoms in the training dataset, and the integrals of IPCs in certain bins are selected into the persistence fingerprint. (a) Protein-ligand complex shown as an example: the HIV protease (mutant Q7K/L33I/L63I) complexed with KNI-764 (an inhibitor), PDB ID: 1msm [110]. (b) A point cloud is created from subsets of atoms with certain element types in the protein and ligand (detailed in Table A of S1 Text). Shown as an example: carbon atoms from the protein and carbon atoms from the ligand. (c) Persistent homology is calculated on this point cloud using opposition distance, and the birth filtration radii for the 1D homology groups and the death filtration radii for the 0D homology groups are collected (see Section Internuclear Persistence Contours (IPCs) for why these suffice). (d) Internuclear persistence contours (IPCs) are constructed by summing Gaussians centered at each birth or death radius. The IPCs in PATH are constructed with a standard deviation of 0.1. Two IPCs are shown. Top: carbon-carbon IPC dimension 1. Bottom: carbon-carbon IPC dimension 0.
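
Steps (d) and the bin integration described above can be sketched in a few lines. This is an illustration under stated assumptions, not the OSPREY implementation: the Gaussians are assumed to have unit area, the radii are made-up values, and the names `ipc_density` and `ipc_bin_integral` are hypothetical. Only the standard deviation of 0.1 and the bin [9.0, 9.5] (from Fig 5) come from the captions.

```python
import math

SIGMA = 0.1  # standard deviation stated in the caption


def ipc_density(r, radii, sigma=SIGMA):
    """Contour value at radius r: a sum of Gaussians centered at each radius."""
    norm = sigma * math.sqrt(2.0 * math.pi)  # assumption: unit-area Gaussians
    return sum(math.exp(-0.5 * ((r - c) / sigma) ** 2) / norm for c in radii)


def ipc_bin_integral(lo, hi, radii, sigma=SIGMA):
    """Exact integral of the contour over [lo, hi], via the Gaussian CDF."""
    def cdf(x, center):
        return 0.5 * (1.0 + math.erf((x - center) / (sigma * math.sqrt(2.0))))
    return sum(cdf(hi, c) - cdf(lo, c) for c in radii)


# Hypothetical birth radii of 1D homology classes (illustrative, not from the paper).
birth_radii = [9.1, 9.25, 9.3, 11.0]

# One candidate fingerprint component: the contour integrated over the bin [9.0, 9.5].
feature = ipc_bin_integral(9.0, 9.5, birth_radii)
print(f"IPC integral over [9.0, 9.5]: {feature:.3f}")
```

Because each Gaussian is smooth, a radius just outside a bin still contributes a little to it, so small shifts in atomic positions change the feature gradually rather than abruptly.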

**Fig 9. Two protein-ligand complexes shown with their persistence fingerprint.**
Top: HIV-1 protease in complex with VX-478 (PDB ID: 1hpv [115]). Bottom: Humanised monomeric RadA in complex with indazole (PDB ID: 4b2i [116]). Contributions to three persistence fingerprint components are shown. The ligand atoms are shown in yellow. Each protein atom is colored according to its contribution to the persistence fingerprint, as in Fig 5. Each persistence fingerprint component is labeled by the IPC and the bin over which the IPC is integrated to yield this component.

**Fig 10. Scatter plots of the predictions of PATH⁺ and of our implementation of TNet-BP on a held-out test subset of the PDBBind v2020 refined set for one run (90:10 train:test split ratio, $n_{test} = 519$) show that PATH⁺ produces better predictions, especially on protein-ligand complexes whose binding affinities deviate significantly from the mean. This highlights the generalizability of PATH⁺.**
(a) Predictions of PATH⁺: $R^{2} = 0.46$, RMSE = 1.95 kcal/mol. (b) Predictions of TNet-BP: $R^{2} = 0.26$, RMSE = 2.26 kcal/mol. (c) To declutter the TNet-BP scatter plot in (b), we removed 142 data points that are all predicted by TNet-BP to have $\Delta G$ within 0.001 kcal/mol of -8.691 kcal/mol, and instead show the distribution of these points in a separate histogram. The 1-run performances of each algorithm in (a, b) are very close to their average performances over 100 runs in Table 2.
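
For readers who want to reproduce metrics of this kind, the sketch below computes RMSE and $R^{2}$ from paired values. It assumes $R^{2}$ means the coefficient of determination ($1 - SS_{res}/SS_{tot}$); the caption does not say whether this or a squared correlation coefficient was used. The affinity values are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between experimental and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical experimental Delta G values in kcal/mol (not from the paper).
experimental = [-6.0, -8.0, -10.0, -12.0]

# A predictor that returns the same value (here the mean) for every complex.
constant_prediction = [-9.0] * len(experimental)

print(r_squared(experimental, constant_prediction))
print(rmse(experimental, constant_prediction))
```

Under this definition, a model that returns the mean for every complex scores $R^{2} = 0$, which is why a large group of near-identical predictions, such as the 142 points described in (c), tends to pull $R^{2}$ down.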

Update of

Predicting Affinity Through Homology (PATH): Interpretable Binding Affinity Prediction with Persistent Homology.
Long Y, Donald BR. bioRxiv [Preprint]. 2024 Oct 21:2023.11.16.567384. doi: 10.1101/2023.11.16.567384. Update in: PLoS Comput Biol. 2025 Jun 27;21(6):e1013216. doi: 10.1371/journal.pcbi.1013216. PMID: 38014181.

References

1. Batool M, Ahmad B, Choi S. A structure-based drug discovery paradigm. Int J Mol Sci. 2019;20(11):2783. doi: 10.3390/ijms20112783
2. Shoichet BK. Virtual screening of chemical libraries. Nature. 2004;432(7019):862–5. doi: 10.1038/nature03197
3. Kontoyianni M. Docking and virtual screening in drug discovery. Proteomics for drug discovery: Methods and protocols. 2017. p. 255–66.
4. Maia EHB, Assis LC, de Oliveira TA, da Silva AM, Taranto AG. Structure-based virtual screening: from classical to artificial intelligence. Front Chem. 2020;8:343. doi: 10.3389/fchem.2020.00343
5. Seo S, Choi J, Park S, Ahn J. Binding affinity prediction for protein-ligand complex using deep attention mechanism based on intermolecular interactions. BMC Bioinformatics. 2021;22(1):542. doi: 10.1186/s12859-021-04466-0

Grants and funding

R35 GM144042/GM/NIGMS NIH HHS/United States

