Brief Bioinform. 2024 Jul 25;25(5):bbae409.

doi: 10.1093/bib/bbae409.

DeepEnzyme: a robust deep learning model for improved enzyme turnover number prediction by utilizing features of protein 3D-structures

Tong Wang¹,², Guangming Xiang¹, Siwei He¹, Liyun Su², Yuguang Wang³,⁴, Xuefeng Yan⁵,⁶, Hongzhong Lu¹

Affiliations

¹ State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China.
² College of Science, Chongqing University of Technology, 69 Hongguang Avenue, Banan District, Chongqing 400054, China.
³ Institute of Natural Sciences, School of Mathematical Sciences, Zhangjiang Institute of Advanced Study, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China.
⁴ Shanghai Artificial Intelligence Laboratory, 701 Yunjin Road, Xuhui District, Shanghai 200237, China.
⁵ Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, 130 Meilong Road, Xuhui District, Shanghai 200237, China.
⁶ State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Xuhui District, Shanghai 200237, China.

PMID: 39162313
PMCID: PMC11880767
DOI: 10.1093/bib/bbae409

Tong Wang et al. Brief Bioinform. 2024.

Abstract

Turnover numbers (kcat), which indicate an enzyme's catalytic efficiency, have a wide range of applications in fields including protein engineering and synthetic biology. Measuring an enzyme's kcat experimentally is time-consuming. Recently, the prediction of kcat using deep learning models has mitigated this problem. However, the accuracy and robustness of kcat prediction still need to be improved significantly, particularly for enzymes with low sequence similarity to those in the training dataset. Herein, we present DeepEnzyme, a deep learning model that combines a Transformer and a Graph Convolutional Network (GCN) to capture information from both the sequence and the 3D-structure of a protein. To improve prediction accuracy, DeepEnzyme was trained on integrated features from both sequences and 3D-structures. Consequently, by utilizing additional features from high-quality protein 3D-structures, DeepEnzyme exhibits remarkable robustness when processing enzymes with low sequence similarity to those in the training dataset. DeepEnzyme also makes it possible to evaluate how point mutations affect the catalytic activity of an enzyme, which helps identify residue sites that are crucial for catalytic function. In summary, DeepEnzyme represents a pioneering effort in predicting enzymes' kcat values with improved accuracy and robustness compared to previous algorithms. This advancement will significantly contribute to our comprehension of enzyme function and its evolutionary patterns across species.

Keywords: deep learning; enzyme turnover number; graph convolutional network; protein 3D-structure.

Figures

**Figure 1**
**The framework of DeepEnzyme for kcat prediction.** DeepEnzyme integrates Transformer and GCN models to extract features from both the enzyme and the substrate for predicting kcat. The GCN extracts structural features from protein 3D-structures and substrate adjacency matrices; the Transformer extracts sequence features from protein sequences.
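
As a rough illustration of how a GCN propagates features over an adjacency matrix, the sketch below applies one graph-convolution step with the widely used symmetric normalization and added self-loops. It is a minimal sketch, not the DeepEnzyme implementation: the toy graph, the feature values, the weight matrix, and all function names are assumptions made for illustration, and the layer used in the paper may differ.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical toy graph: three nodes in a chain (e.g. residues in contact).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],   # hypothetical 2-dimensional node features
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[1.0],         # hypothetical weights (learned in a real model)
           [1.0]]

out = gcn_layer(adj, features, weights)
print(out)
```

In a real model the node features would come from the residues or substrate atoms and the weights would be learned during training; only the propagation pattern is shown here.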

**Figure 2**
**Evaluation of DeepEnzyme performance in kcat prediction.** (a) The performance of DeepEnzyme on the test dataset, evaluated by the Pearson correlation coefficient (PCC) and P values calculated from the predicted and experimental kcat values. (b-c) DeepEnzyme prediction performance on wild-type (b) and mutant (c) enzymes in the test dataset. (d) DeepEnzyme prediction performance for enzymes classified by EC number in the test dataset. (e) The performance of the model with different types of input data. DeepEnzyme: enzyme sequence, enzyme structure, and substrate information as inputs; Only-structure: substrate information and enzyme structure as inputs; Only-sequence: substrate information and enzyme sequence as inputs. In (a)–(e), a specific seed was used for the calculation. (f) Comparison of average coefficient of determination (R²) values for the test and training datasets from five rounds of training.
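
For readers unfamiliar with the metrics named in these captions (PCC, R², and RMSE), the sketch below computes them from paired experimental and predicted values. It is a minimal sketch using the standard textbook definitions; the function names and the toy numbers are made up for illustration and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Made-up values for illustration only.
experimental = [0.5, 1.0, 2.0, 3.5]
predicted = [0.7, 0.9, 2.4, 3.0]

print("PCC:", pearson(experimental, predicted))
print("R2:", r_squared(experimental, predicted))
print("RMSE:", rmse(experimental, predicted))
```

Note that PCC measures only linear association, whereas R² as defined here also penalizes systematic offsets between predicted and experimental values, so the two can diverge.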

**Figure 3**
**Improved performance of DeepEnzyme in kcat prediction compared to existing models, even for protein sequences in the test dataset with lower similarity to those in the training dataset.** (a) Comparison of R² values on the test dataset for different models. (b) Comparison of RMSE values on the test dataset for different models. (c) Comparison of R² in kcat prediction for enzymes in the test dataset at different levels of sequence similarity by DeepEnzyme, TurNuP, DLKcat, and DLTKcat. (d) Two enzymes from *Myxococcus xanthus* and *Bacillus subtilis*, both with EC number 1.3.3.4, are highly similar in protein 3D-structure (TM-score = 0.8762); gray for the enzyme from *M. xanthus* and red for the enzyme from *B. subtilis*. (e) The amino acid sequence similarity of the two enzymes above is 27% (Q for the enzyme from *M. xanthus*, T for the enzyme from *B. subtilis*).

**Figure 4**
**Analysis of the prediction ability of DeepEnzyme for two enzymes with saturation mutagenesis datasets.** (a) Comparison of predicted results for different CYP2C9 variants: red for missense variants, green for nonsense variants. (b) Comparison of experimental activity scores for different CYP2C9 variants: red for missense variants, green for nonsense variants [40]. (c) Comparison of predicted kcat values for different PafA mutations: green for low-kcat mutations, red for high-kcat mutations [41]. (d) Comparison of experimentally measured kcat values for different PafA mutations: green for low-kcat mutations, red for high-kcat mutations [41].

**Figure 5**
**Comparison between the binding/active sites and high-weight sites (the 5% of residue sites with the highest weight scores calculated by DeepEnzyme) within protein 3D-structures.** (a) The weight scores of different residue sites in PafA; the red points are binding/active sites. (b) Comparison of the weight scores between the binding/active sites and general sites in PafA; the green box indicates general sites, and the red box indicates binding/active sites. (c) The distribution of the binding/active sites and high-weight sites within the 3D-structure of PafA; the red region represents binding/active sites, and the blue region represents high-weight sites. (d) The weight scores of different residue sites in P00558; the red points are binding/active sites. (e) Comparison of the weight scores between the binding/active sites and general sites in P00558; the green box indicates general sites, and the red box indicates binding/active sites. (f) The distribution of the binding/active sites and high-weight sites within the 3D-structure of P00558; the red region represents binding/active sites, the blue region represents high-weight sites, and the yellow region represents the overlap between the two.
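
The idea of "high-weight sites" in this caption (the 5% of residue sites with the highest weight scores) and their overlap with annotated binding/active sites can be sketched as follows. This is a minimal sketch under stated assumptions: the weight scores, the annotated site indices, and the function name are hypothetical, and how DeepEnzyme computes the weight scores is not reproduced here.

```python
def high_weight_sites(weights, percent=5):
    """Return indices of the top `percent`% of residues by weight score."""
    # Integer arithmetic avoids floating-point rounding; keep at least one site.
    k = max(1, len(weights) * percent // 100)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return set(ranked[:k])


# Hypothetical per-residue weight scores for a 40-residue protein.
weights = [0.1] * 40
weights[7] = 0.9
weights[21] = 0.8

# Hypothetical annotated binding/active site indices.
binding_sites = {7, 30}

top_sites = high_weight_sites(weights)
overlap = top_sites & binding_sites

print("high-weight sites:", sorted(top_sites))
print("overlap with binding/active sites:", sorted(overlap))
```

With real data, the overlap set corresponds to the residues that would be colored as shared between the two categories in a structure view.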

**Figure 6**
**Predicted kcat values for enzyme-catalyzed reactions in genome-scale metabolic models.** (a) Distribution of kcat values predicted by DeepEnzyme for enzyme-catalyzed reactions in metabolic models including those for *Homo sapiens*, *Mus musculus*, *Saccharomyces cerevisiae*, and *E. coli* [42]. (b) Distribution of kcat values predicted by DeepEnzyme for enzyme-catalyzed reactions from the genome-scale metabolic model (GEM) of *Geobacter metallireducens* GS-15 (BiGG ID: iAF987).

Cited by

Advances in Microbial Alkaline Proteases: Addressing Industrial Bottlenecks Through Genetic and Enzyme Engineering.
Srivastava N, Khare SK. Appl Biochem Biotechnol. 2025 Aug;197(8):4861-4896. doi: 10.1007/s12010-025-05270-9. Epub 2025 May 15. PMID: 40372653. Review.
Enzyme catalytic efficiency prediction: employing convolutional neural networks and XGBoost.
Alazmi M. Front Artif Intell. 2024 Oct 21;7:1446063. doi: 10.3389/frai.2024.1446063. eCollection 2024. PMID: 39498388. Free PMC article.
NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling.
Zhai J, Qi X, Cai L, Liu Y, Tang H, Xie L, Wang J. Brief Bioinform. 2025 May 1;26(3):bbaf212. doi: 10.1093/bib/bbaf212. PMID: 40370097. Free PMC article.
A structure-oriented kinetics dataset of enzyme-substrate interactions.
Krishnan SR, Pandey N, Srinivasan R, Roy A. Sci Data. 2025 Aug 26;12(1):1489. doi: 10.1038/s41597-025-05829-5. PMID: 40858593. Free PMC article.
IECata: interpretable bilinear attention network and evidential deep learning improve the catalytic efficiency prediction of enzymes.
Wang J, Zhao Y, Yang Z, Yao G, Han P, Liu J, Chen C, Zan P, Wan X, Bo X, Jiang H. Brief Bioinform. 2025 May 1;26(3):bbaf283. doi: 10.1093/bib/bbaf283. PMID: 40548541. Free PMC article.

References

1. Wendering P, Arend M, Razaghi-Moghadam Z. et al. Data integration across conditions improves turnover number estimates and metabolic predictions. Nat Commun 2023;14:1485. doi: 10.1038/s41467-023-37151-2.
2. Davidi D, Noor E, Liebermeister W. et al. Global characterization of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements. Proc Natl Acad Sci 2016;113:3401–6.
3. Nilsson A, Nielsen J, Palsson BO. Metabolic models of protein allocation call for the kinetome. Cell Systems 2017;5:538–41.
4. Sánchez BJ, Zhang C, Nilsson A. et al. Improving the phenotype predictions of a yeast genome-scale metabolic model by incorporating enzymatic constraints. Mol Syst Biol 2017;13:935. doi: 10.15252/msb.20167411.
5. Yang L, Yurkovich JT, King ZA. et al. Modeling the multi-scale mechanisms of macromolecular resource allocation. Curr Opin Microbiol 2018;45:8–15.
