Brief Bioinform. 2024 Jul 25;25(5):bbae409.
doi: 10.1093/bib/bbae409.

DeepEnzyme: a robust deep learning model for improved enzyme turnover number prediction by utilizing features of protein 3D-structures

Tong Wang et al. Brief Bioinform.

Abstract

Turnover numbers (kcat), which indicate an enzyme's catalytic efficiency, have a wide range of applications in fields including protein engineering and synthetic biology. Measuring kcat experimentally is time-consuming. Recently, deep learning models that predict kcat have mitigated this problem. However, the accuracy and robustness of kcat prediction still need to be improved significantly, particularly for enzymes with low sequence similarity to those in the training dataset. Herein, we present DeepEnzyme, a deep learning model that combines a Transformer and a Graph Convolutional Network (GCN) to capture information from both the sequence and the 3D-structure of a protein. To improve prediction accuracy, DeepEnzyme was trained on integrated features from both sequences and 3D-structures. By using additional features from high-quality protein 3D-structures, DeepEnzyme remains robust when processing enzymes with low sequence similarity to those in the training dataset. DeepEnzyme also makes it possible to evaluate how point mutations affect the catalytic activity of an enzyme, which helps identify residue sites that are crucial for catalytic function. In summary, DeepEnzyme predicts enzymes' kcat values with improved accuracy and robustness compared to previous algorithms. This advance will contribute to our understanding of enzyme function and its evolutionary patterns across species.

Keywords: deep learning; enzyme turnover number; graph convolutional network; protein 3D-structure.

Figures

Figure 1
The framework of DeepEnzyme for kcat prediction. DeepEnzyme integrates transformer and GCN models to distill features from both the enzyme and substrate for predicting kcat. GCN is employed to extract structural features based on protein 3D-structures and substrate adjacency matrices; transformer is utilized to extract sequence features from protein sequences.
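As a rough illustration of the structural branch described in this caption, the sketch below applies one graph-convolution step to a small adjacency matrix. It assumes the widely used symmetric-normalised form ReLU(D^-1/2 (A + I) D^-1/2 X W); the source does not state which GCN variant, feature sizes, or contact-map construction DeepEnzyme uses, so the function name `gcn_layer` and the toy inputs are hypothetical.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj    -- n x n adjacency matrix (e.g. a residue contact map), 0/1 entries
    feats  -- n x f node feature matrix
    weight -- f x o weight matrix (learned in a real model; fixed here)
    """
    n = len(adj)
    n_in, n_out = len(weight), len(weight[0])
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Symmetric degree normalisation.
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    # Linear map followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical 3-residue chain: residue 1 contacts residues 0 and 2.
    contact_map = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    residue_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w = [[0.5], [-0.25]]
    print(gcn_layer(contact_map, residue_feats, w))
```

The same operation applies to a substrate graph if the adjacency matrix is built from bonds between atoms instead of contacts between residues.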
Figure 2
Evaluation of DeepEnzyme performance in kcat prediction. (a) The performance of DeepEnzyme on the test dataset, evaluated by the Pearson correlation coefficient (PCC) and P values calculated from the predicted and experimental kcat values. (b-c) DeepEnzyme prediction performance on wild-type (b) and mutant (c) enzymes in the test dataset. (d) DeepEnzyme prediction performance for enzymes classified by different EC numbers in the test dataset. (e) The performance of the model with different types of input data. DeepEnzyme: enzyme sequence, enzyme structure, and substrate information as inputs; Only-structure: substrate information and enzyme structure as inputs; Only-sequence: substrate information and enzyme sequence as inputs. In (a)–(e), a specific seed was used for the calculation. (f) Comparison of average coefficient of determination (R²) values for the test dataset and the training dataset over five rounds of training.
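The metrics named in the Figure 2 and Figure 3 captions (PCC, R², and RMSE) follow their standard definitions, sketched below in plain Python. The example values are invented for illustration; whether the paper computes the metrics on raw or log-transformed kcat values is not stated in this text, so the log10 scale used in the demo is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


if __name__ == "__main__":
    # Invented log10(kcat) values, for illustration only.
    experimental = [0.5, 1.2, 2.0, -0.3, 1.7]
    predicted = [0.7, 1.0, 1.8, 0.1, 1.5]
    print("PCC :", pearson(experimental, predicted))
    print("R2  :", r_squared(experimental, predicted))
    print("RMSE:", rmse(experimental, predicted))
```

PCC measures only linear association, so a model with a constant offset can score a PCC of 1 while R² is low or negative; reporting both, as the captions do, guards against that.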
Figure 3
Improved performance of DeepEnzyme in kcat prediction compared to existing models, even for protein sequences in the test dataset with lower similarity to those in the training dataset. (a) Comparison of R² values on the test dataset for different models. (b) Comparison of RMSE values on the test dataset for different models. (c) Comparison of R² in kcat value prediction for enzymes in the test dataset at different levels of sequence similarity by DeepEnzyme, TurNuP, DLKcat, and DLTKcat. (d) Two enzymes from Myxococcus xanthus and Bacillus subtilis, both with EC number 1.3.3.4, are highly similar in protein 3D-structure (TM-score = 0.8762); gray for the enzyme from M. xanthus and red for the enzyme from B. subtilis. (e) The amino acid sequence similarity of the above two enzymes is 27% (Q for the enzyme from M. xanthus, T for the enzyme from B. subtilis).
Figure 4
Analysis of the prediction ability of DeepEnzyme for two enzymes with saturation mutagenesis datasets. (a) Comparison of predicted results for different CYP2C9 variants: red for missense variants, green for nonsense variants. (b) Comparison of experimental activity scores for different CYP2C9 variants: red for missense variants, green for nonsense variants [40]. (c) Comparison of predicted kcat values for different PafA mutations: green for low-kcat mutations, red for high-kcat mutations [41]. (d) Comparison of experimentally measured kcat values for different PafA mutations: green for low-kcat mutations, red for high-kcat mutations [41].
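An in-silico counterpart to a saturation mutagenesis scan can be sketched as enumerating every single-residue substitution and scoring each one with a predictor. The code below is a minimal sketch under stated assumptions: `predict_kcat` is a hypothetical callable supplied by the user and is not DeepEnzyme's actual interface; only missense substitutions among the 20 standard amino acids are generated, so nonsense variants are not covered; and a structure-aware model would also need a 3D-structure for each mutant, a step omitted here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_point_mutants(sequence):
    """Yield (label, mutant_sequence) for every single-residue substitution.

    Labels use the conventional form <wild type><1-based position><mutant>,
    for example "M1A".
    """
    for pos, wild_type in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the unchanged residue
            yield f"{wild_type}{pos + 1}{aa}", sequence[:pos] + aa + sequence[pos + 1:]


def rank_mutants(sequence, predict_kcat):
    """Score every mutant with a user-supplied predictor, highest first."""
    scored = [(label, predict_kcat(mutant))
              for label, mutant in single_point_mutants(sequence)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder predictor (hypothetical): counts alanines in the sequence.
    def toy_predictor(seq):
        return float(seq.count("A"))

    for label, score in rank_mutants("MKT", toy_predictor)[:3]:
        print(label, score)
```

Positions where most substitutions lower the predicted value are candidates for residues that matter to catalysis, which is the kind of comparison the panels above make against experimental scans.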
Figure 5
Comparison between the binding/active sites and the high-weight sites (the 5% of residue sites with the highest weight scores calculated by DeepEnzyme) within protein 3D-structures. (a) The weight scores of different residue sites in PafA; the red points are binding/active sites. (b) Comparison of the weight scores between the binding/active sites and general sites in PafA; the green box indicates general sites, and the red box indicates binding/active sites. (c) The distribution of the binding/active sites and high-weight sites within the 3D-structure of PafA; the red region represents binding/active sites, and the blue region represents high-weight sites. (d) The weight scores of different residue sites in P00558; the red points are binding/active sites. (e) Comparison of the weight scores between the binding/active sites and general sites in P00558; the green box indicates general sites, and the red box indicates binding/active sites. (f) The distribution of the binding/active sites and high-weight sites within the 3D-structure of P00558; the red region represents binding/active sites, the blue region represents high-weight sites, and the yellow region represents the overlap between the two.
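The "high-weight site" selection described in this caption amounts to taking the top 5% of residues by weight score and intersecting them with annotated binding/active sites. The sketch below shows that bookkeeping only; how DeepEnzyme derives the weight scores is not described in this text, and the function names, the rounding rule for the 5% cut-off, and the toy values are assumptions.

```python
def high_weight_sites(weights, fraction=0.05):
    """Return the 0-based indices of the top `fraction` of residues by weight.

    At least one residue is always returned; the cut-off count is rounded to
    the nearest integer (an assumption, the source does not give the rule).
    """
    n = len(weights)
    k = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: weights[i], reverse=True)
    return set(ranked[:k])


def site_overlap(weights, annotated_sites, fraction=0.05):
    """Indices that are both high-weight and annotated binding/active sites."""
    return sorted(high_weight_sites(weights, fraction) & set(annotated_sites))


if __name__ == "__main__":
    # Toy 40-residue protein with two residues given elevated weights.
    scores = [0.1] * 40
    scores[7], scores[20] = 0.9, 0.8
    binding_sites = [7, 30]  # hypothetical annotation
    print("high-weight:", sorted(high_weight_sites(scores)))
    print("overlap    :", site_overlap(scores, binding_sites))
```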
Figure 6
Predicted kcat values for enzyme-catalyzed reactions in genome-scale metabolic models (GEMs). (a) Distribution of kcat values predicted by DeepEnzyme for enzyme-catalyzed reactions in metabolic models including those for Homo sapiens, Mus musculus, Saccharomyces cerevisiae, and E. coli [42]. (b) Distribution of kcat values predicted by DeepEnzyme for enzyme-catalyzed reactions from the GEM of Geobacter metallireducens GS-15 (BiGG ID: iAF987).

References

    1. Wendering P, Arend M, Razaghi-Moghadam Z. et al. Data integration across conditions improves turnover number estimates and metabolic predictions. Nat Commun 2023;14:1485. 10.1038/s41467-023-37151-2.
    2. Davidi D, Noor E, Liebermeister W. et al. Global characterization of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements. Proc Natl Acad Sci 2016;113:3401–6.
    3. Nilsson A, Nielsen J, Palsson BO. Metabolic models of protein allocation call for the kinetome. Cell Systems 2017;5:538–41.
    4. Sánchez BJ, Zhang C, Nilsson A. et al. Improving the phenotype predictions of a yeast genome-scale metabolic model by incorporating enzymatic constraints. Mol Syst Biol 2017;13:935. 10.15252/msb.20167411.
    5. Yang L, Yurkovich JT, King ZA. et al. Modeling the multi-scale mechanisms of macromolecular resource allocation. Curr Opin Microbiol 2018;45:8–15.