Prediction and design of protease enzyme specificity using a structure-aware graph convolutional network
- PMID: 37729196
- PMCID: PMC10523478
- DOI: 10.1073/pnas.2303590120
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key posttranslational modification involved in physiology and disease. The ability to robustly and rapidly predict protease-substrate specificity would also enable targeted proteolytic cleavage by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pretrained PGCN model to guide the design of protease libraries for cleaving two noncanonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Keywords: geometric machine learning; machine learning; protease specificity; protein design; yeast surface display.
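As an illustration of the kind of model the abstract describes, the following is a minimal sketch of a graph convolution over a residue-level interaction graph. It is not the authors' PGCN implementation. Several things are assumptions made for illustration only: that nodes are protease and substrate residues with per-residue feature vectors, that the adjacency matrix holds nonnegative pairwise interaction strengths (for example, magnitudes of pairwise energies), the standard symmetric GCN normalization, the mean-pooling readout, and all function names, including the hypothetical `ablate_node` helper that mimics a node ablation test by zeroing one residue's features.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer.

    Assumes `adj` is a symmetric matrix of nonnegative interaction strengths.
    """
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # weighted degree, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


def predict_cleavage(adj, features, w1, w2):
    """Toy graph-level score in [0, 1] for a protease-substrate complex."""
    hidden = gcn_layer(normalize_adjacency(adj), features, w1)
    pooled = hidden.mean(axis=0)            # mean-pool node embeddings
    logit = pooled @ w2
    return 1.0 / (1.0 + np.exp(-logit))     # sigmoid


def ablate_node(features, index):
    """Hypothetical node ablation: zero one residue's features, keep the rest."""
    out = features.copy()
    out[index] = 0.0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three residues; edge weights are made-up interaction strengths.
    adj = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 2.0],
                    [0.0, 2.0, 0.0]])
    x = rng.normal(size=(3, 4))             # made-up node features
    w1 = rng.normal(size=(4, 5))            # untrained, random weights
    w2 = rng.normal(size=5)
    base = predict_cleavage(adj, x, w1, w2)
    ablated = predict_cleavage(adj, ablate_node(x, 1), w1, w2)
    print("score:", base, "after ablating node 1:", ablated)
```

With trained weights, comparing the score before and after ablating a node or edge gives a rough measure of how much that graph element contributes to the prediction, which is the idea behind the ablation tests mentioned in the abstract.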
Conflict of interest statement
The authors declare no competing interest.
Update of
- Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network. bioRxiv [Preprint]. 2023 Feb 16:2023.02.16.528728. doi: 10.1101/2023.02.16.528728. bioRxiv. 2023. Update in: Proc Natl Acad Sci U S A. 2023 Sep 26;120(39):e2303590120. doi: 10.1073/pnas.2303590120. PMID: 36824945 Free PMC article. Updated. Preprint.
Similar articles
- Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network. bioRxiv [Preprint]. 2023 Feb 16:2023.02.16.528728. doi: 10.1101/2023.02.16.528728. bioRxiv. 2023. Update in: Proc Natl Acad Sci U S A. 2023 Sep 26;120(39):e2303590120. doi: 10.1073/pnas.2303590120. PMID: 36824945 Free PMC article. Updated. Preprint.
- Data-driven supervised learning of a viral protease specificity landscape from deep sequencing and molecular simulations. Proc Natl Acad Sci U S A. 2019 Jan 2;116(1):168-176. doi: 10.1073/pnas.1805256116. Epub 2018 Dec 26. Proc Natl Acad Sci U S A. 2019. PMID: 30587591 Free PMC article.
- iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief Bioinform. 2019 Mar 25;20(2):638-658. doi: 10.1093/bib/bby028. Brief Bioinform. 2019. PMID: 29897410 Free PMC article. Review.
- PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites. PLoS One. 2012;7(11):e50300. doi: 10.1371/journal.pone.0050300. Epub 2012 Nov 29. PLoS One. 2012. PMID: 23209700 Free PMC article.
- Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods. Brief Bioinform. 2019 Nov 27;20(6):2150-2166. doi: 10.1093/bib/bby077. Brief Bioinform. 2019. PMID: 30184176 Free PMC article. Review.
Cited by
- Substrate prediction for RiPP biosynthetic enzymes via masked language modeling and transfer learning. Digit Discov. 2024 Dec 2;4(2):343-354. doi: 10.1039/d4dd00170b. eCollection 2025 Feb 12. Digit Discov. 2024. PMID: 39649639 Free PMC article.
- Data-driven protease engineering by DNA-recording and epistasis-aware machine learning. Nat Commun. 2025 Jul 1;16(1):5466. doi: 10.1038/s41467-025-60622-7. Nat Commun. 2025. PMID: 40593579 Free PMC article.
- Advances in ligand-specific biosensing for structurally similar molecules. Cell Syst. 2023 Dec 20;14(12):1024-1043. doi: 10.1016/j.cels.2023.10.009. Cell Syst. 2023. PMID: 38128482 Free PMC article. Review.
- Protease engineering: Approaches, tools, and emerging trends. Biotechnol Adv. 2025 Sep;82:108602. doi: 10.1016/j.biotechadv.2025.108602. Epub 2025 May 12. Biotechnol Adv. 2025. PMID: 40368116 Free PMC article. Review.
- Substrate Prediction for RiPP Biosynthetic Enzymes via Masked Language Modeling and Transfer Learning. ArXiv [Preprint]. 2024 Feb 23:arXiv:2402.15181v1. ArXiv. 2024. Update in: Digit Discov. 2024 Dec 2;4(2):343-354. doi: 10.1039/d4dd00170b. PMID: 38463513 Free PMC article. Updated. Preprint.