Nat Commun. 2023 Dec 11;14(1):8211.

doi: 10.1038/s41467-023-44113-1.

UniKP: a unified framework for the prediction of enzyme kinetic parameters

Han Yu^#^{1,2,3,4}, Huaxiang Deng^#^{1,3,4}, Jiahui He^{1,3,4}, Jay D Keasling^{4,5,6,7,8}, Xiaozhou Luo^{9,10,11,12}

Affiliations

¹ Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
² University of Chinese Academy of Sciences, Beijing, 100049, China.
³ CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
⁴ Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
⁵ Joint BioEnergy Institute, Emeryville, CA, 94608, USA.
⁶ Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
⁷ Department of Chemical and Biomolecular Engineering & Department of Bioengineering, University of California, Berkeley, CA, 94720, USA.
⁸ Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800, Kgs, Lyngby, Denmark.
⁹ Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China. xz.luo@siat.ac.cn.
¹⁰ University of Chinese Academy of Sciences, Beijing, 100049, China. xz.luo@siat.ac.cn.
¹¹ CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China. xz.luo@siat.ac.cn.
¹² Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China. xz.luo@siat.ac.cn.

^# Contributed equally.

PMID: 38081905
PMCID: PMC10713628
DOI: 10.1038/s41467-023-44113-1

Han Yu et al. Nat Commun. 2023.

Abstract

Prediction of enzyme kinetic parameters is essential for designing and optimizing enzymes for various biotechnological and industrial applications, but the limited performance of current prediction tools on diverse tasks hinders their practical use. Here, we introduce UniKP, a unified framework based on pretrained language models for the prediction of enzyme kinetic parameters, including enzyme turnover number (k_cat), Michaelis constant (K_m), and catalytic efficiency (k_cat / K_m), from protein sequences and substrate structures. A two-layer framework derived from UniKP (EF-UniKP) is also proposed to allow robust k_cat prediction that accounts for environmental factors, including pH and temperature. In addition, four representative re-weighting methods are systematically explored and shown to reduce the prediction error in high-value prediction tasks. We demonstrate the application of UniKP and EF-UniKP in several enzyme discovery and directed evolution tasks, leading to the identification of new enzymes and enzyme mutants with higher activity. UniKP is a valuable tool for deciphering the mechanisms of enzyme kinetics and enables novel insights into enzyme engineering and its industrial applications.

Conflict of interest statement

X.L. has a financial interest in Demetrix and Synceres. J.D.K. has a financial interest in Amyris, Lygos, Demetrix, Napigen, Maple Bio, Apertor Labs, Zero Acre Farms, Berkeley Yeast, and Ansa Biotechnology. The remaining authors declare no competing interests.

Figures

**Fig. 1. The overview of UniKP.**
a Enzyme sequence representation module: Information about enzymes was encoded using a pretrained language model, ProtT5-XL-UniRef50. Each amino acid was converted into a 1024-dimensional vector on the last hidden layer, and the resulting vectors were averaged by mean pooling, generating a 1024-dimensional vector to represent the enzyme. b Substrate structure representation module: Information about substrates was encoded using a pretrained language model, the SMILES Transformer. The substrate structure was converted into a simplified molecular-input line-entry system (SMILES) representation and input into the pretrained SMILES Transformer to generate a 1024-dimensional vector. This vector was generated by concatenating the mean and max pooling of the last layer, along with the first outputs of the last and penultimate layers. c Machine learning module: An explainable Extra Trees model took the concatenated representation vector of both the enzyme and substrate as input and generated a predicted k_cat, K_m or k_cat / K_m value. d EF-UniKP: A framework that considers environmental factors to generate an optimized prediction. It was validated on two representative datasets: pH and temperature datasets. e Various re-weighting methods were used to adjust the sample weight distribution to generate an optimized prediction for high-value prediction tasks.
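The pooling and concatenation steps described in panels a to c can be illustrated with a minimal sketch. It uses toy dimensions and random numbers in place of real ProtT5-XL-UniRef50 and SMILES Transformer outputs; the helper names (`mean_pool`, `max_pool`, `substrate_vector`) and the hidden size of 4 are assumptions made for illustration and are not taken from the authors' code. In the setting described in the caption, the enzyme and substrate vectors would each be 1024-dimensional, and the concatenated vector would be the input to the Extra Trees model.

```python
import random

def mean_pool(rows):
    """Average a list of equal-length vectors column by column."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def max_pool(rows):
    """Take the column-wise maximum of a list of equal-length vectors."""
    return [max(col) for col in zip(*rows)]

def substrate_vector(last_layer, penultimate_layer):
    """Concatenate mean pooling and max pooling of the last layer with the
    first token outputs of the last and penultimate layers."""
    return (mean_pool(last_layer) + max_pool(last_layer)
            + list(last_layer[0]) + list(penultimate_layer[0]))

random.seed(0)
HIDDEN = 4  # toy size (assumption); stands in for the real hidden width

# Hypothetical per-residue embeddings for a 6-residue enzyme.
residues = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(6)]
enzyme_vec = mean_pool(residues)

# Hypothetical per-token outputs of two layers for a 5-token SMILES string.
last = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(5)]
penult = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(5)]
substrate_vec = substrate_vector(last, penult)

# Feature vector that a downstream regressor would receive.
features = enzyme_vec + substrate_vec
print(len(enzyme_vec), len(substrate_vec), len(features))
```

Mean pooling gives a fixed-length enzyme vector regardless of sequence length, which is what allows a tabular model such as Extra Trees to consume it.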

**Fig. 2. Performance comparison of different models.**
Comparison of Root Mean Square Error (RMSE) (a), Pearson Correlation Coefficient (PCC) (b), Mean Absolute Error (MAE) (c), and R² (Coefficient of Determination) (d) values between experimentally measured k_cat values and predicted k_cat values of 16 diverse machine learning models and 2 deep learning models. The k_cat values of all samples were predicted independently using 5-fold cross-validation. Each bar in the graph represents the models’ performance with respect to this metric. The “Extra Trees” model is highlighted in yellow, while other models are depicted in blue. The corresponding numerical values for each bar are provided on the right side. Source data are provided as a Source Data file.
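For readers unfamiliar with the four metrics named in this caption, the sketch below computes them from their standard definitions. The function names and the sample values are hypothetical and chosen only so the arithmetic is easy to follow; they are not data from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pcc(y_true, y_pred):
    """Pearson correlation coefficient."""
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted values on a log scale.
measured = [0.0, 1.0, 2.0, 3.0]
predicted = [0.5, 1.0, 1.5, 3.0]

for name, fn in [("RMSE", rmse), ("MAE", mae), ("PCC", pcc), ("R2", r2)]:
    print(name, round(fn(measured, predicted), 4))
```

Note that PCC measures linear association only, so a model can have a high PCC and still a poor R² if its predictions are systematically shifted or scaled.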

**Fig. 3. High accuracy of UniKP in enzyme k_cat prediction.**
a Comparison of average coefficient of determination (R²) values for DLKcat and UniKP after five rounds of random test set splitting (n = 1684). b Comparison of the root mean square error (RMSE) between experimentally measured k_cat values and predicted k_cat values of DLKcat and UniKP for training (n = 15,154) and test sets (n = 1684). Dark bars represent the RMSE of DLKcat and light bars that of UniKP. c Scatter plot illustrating the Pearson correlation coefficient (PCC) between experimentally measured k_cat values and predicted k_cat values of UniKP for the test set (N = 1684), showing a strong linear correlation. The color gradient represents the density of data points, ranging from blue (0.02) to red (0.28). d Comparison of RMSE between experimentally measured k_cat values and predicted k_cat values of DLKcat and UniKP in various experimental k_cat numerical intervals. Dark bars represent the RMSE of DLKcat and light bars that of UniKP. e Enzymes with significantly different k_cat values between primary central and energy metabolism, and intermediary and secondary metabolism. An independent two-sided t-test was used to determine whether the means of the two independent samples differ significantly. Primary central and energy metabolism (n = 3098) and intermediary and secondary metabolism (n = 4201) were examined in this analysis. f Shapley additive explanations (SHAP) analysis for the top 20-feature Extra Trees model. The impact of each feature on k_cat values is illustrated through a swarm plot of their corresponding SHAP values. The color of the dot represents the relative value of the feature in the dataset (high-to-low depicted as red-to-blue). The horizontal location of the dots shows whether the effect of that feature value contributed positively or negatively in that prediction instance (x-axis).
In each box plot (a, e), the central band represents the median value, the box represents the upper and lower quartiles and the whiskers extend up to 1.5 times the interquartile range beyond the box range. Source data are provided as a Source Data file.

**Fig. 4. UniKP markedly discriminates k_cat values of enzymes and their mutants.**
Scatter plot illustrating the Pearson correlation coefficient (PCC) between experimentally measured k_cat values and predicted k_cat values of UniKP for wild-type enzymes (a) (N = 936) and mutated enzymes (b) (N = 748). The color gradient represents the density of data points, ranging from blue (0.02) to red (0.28). c PCC values of wild-type and mutated enzymes on the test set of DLKcat and UniKP. Dark bars represent the PCC values of DLKcat and light bars those of UniKP. Source data are provided as a Source Data file.

**Fig. 5. A two-layer framework considering environmental factors.**
a A two-layer framework called EF-UniKP that consists of a base layer and a meta layer. The base layer contains two models, namely UniKP and Revised UniKP. UniKP takes the concatenated representation vector of the enzyme and substrate as input, while Revised UniKP uses a concatenated representation vector of the enzyme and substrate, combined with the pH or temperature value. Both models are trained using the Extra Trees algorithm. The meta layer of this framework includes a linear regressor that uses the predicted k_cat values from both UniKP and Revised UniKP to predict the final k_cat value. Scatter plot illustrating the Pearson correlation coefficient (PCC) between experimentally measured k_cat values and predicted k_cat values of Revised UniKP for pH set (b) (N = 636) and temperature set (c) (N = 572). The color gradient represents the density of data points, ranging from blue (0.02) to red (0.28). d Coefficient of determination (R²) values between experimentally measured k_cat values and predicted k_cat values on pH and temperature test sets of EF-UniKP, Revised UniKP and UniKP. Light bars represent R² of EF-UniKP, dark bars for Revised UniKP and darkish bars for UniKP. e R² values between experimentally measured k_cat values and predicted k_cat values on stricter pH and temperature test sets of EF-UniKP, Revised UniKP and UniKP. These are the samples in the test set where at least one of the substrate or enzyme was not included in the training set, resulting in 62 and 61 samples for pH and temperature, respectively. Light bars represent R² of EF-UniKP, dark bars for Revised UniKP and darkish bars for UniKP. Source data are provided as a Source Data file.
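The meta layer described in panel a can be sketched as an ordinary least-squares fit over the two base-layer predictions. The example below is a minimal illustration under stated assumptions: the base predictions are hypothetical numbers rather than outputs of trained Extra Trees models, the targets are constructed as an exact blend so the fit is easy to check, and the helper names (`solve`, `fit_meta`, `predict_meta`) are invented for this sketch. How the authors split data between the base and meta layers is not stated in the caption and is not modeled here.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_meta(pred_a, pred_b, target):
    """Least-squares fit of target ~ w0 + w1 * pred_a + w2 * pred_b."""
    rows = [[1.0, a, b] for a, b in zip(pred_a, pred_b)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(3)]
    return solve(xtx, xty)

def predict_meta(weights, a, b):
    """Combine two base-layer predictions into the final prediction."""
    return weights[0] + weights[1] * a + weights[2] * b

# Hypothetical base-layer predictions (log-scale k_cat) on held-out samples.
unikp_pred = [0.2, 1.1, 1.9, 3.2, 2.4]
revised_pred = [0.4, 0.9, 2.3, 2.8, 2.0]
# Targets built as an exact blend of the two, purely for checking the fit.
measured = [0.1 + 0.3 * a + 0.7 * b for a, b in zip(unikp_pred, revised_pred)]

weights = fit_meta(unikp_pred, revised_pred, measured)
print([round(w, 3) for w in weights])
print(round(predict_meta(weights, 1.0, 2.0), 3))
```

A linear meta layer keeps the combination interpretable: the fitted coefficients indicate how much the final prediction relies on the model that sees pH or temperature versus the one that does not.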

**Fig. 6. Enhancing high k_cat prediction through re-weighting methods and unified framework for K_m and k_cat / K_m predictions.**
a The distribution of k_cat values in the k_cat dataset. All samples are divided into 50 bins. b The absolute error between experimentally measured k_cat values and predicted k_cat values of each sample. The k_cat values of all samples were predicted independently using five-fold cross-validation. c Root mean square error (RMSE) between experimentally measured k_cat values and predicted k_cat values of 149 samples with k_cat values higher than 4 (logarithm value) using various re-weighting methods and the initial UniKP. d, e RMSE (d) and coefficient of determination (R²) (e) between experimentally measured K_m values and predicted K_m values on the K_m test set. f Scatter plot illustrating the Pearson correlation coefficient (PCC) between experimentally measured k_cat / K_m values and predicted k_cat / K_m values of UniKP for the k_cat / K_m dataset (N = 910). The color gradient represents the density of data points, ranging from blue (0.02) to red (0.28). Source data are provided as a Source Data file.
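The general idea behind re-weighting for an imbalanced label distribution can be sketched with inverse-frequency weights over equal-width bins. This is a generic illustration and not necessarily one of the four methods the authors evaluated; the function names, the toy labels, and the choice of 4 bins in the usage lines are assumptions (the default of 50 bins only echoes the binning mentioned for panel a).

```python
from collections import Counter

def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to one of n_bins equal-width bins."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value goes in the last bin

def inverse_frequency_weights(values, n_bins=50):
    """Weight each sample by 1 / (size of its bin), rescaled to mean 1."""
    lo, hi = min(values), max(values)
    bins = [bin_index(v, lo, hi, n_bins) for v in values]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

# Hypothetical log-scale k_cat labels: many low values, one high value.
labels = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 4.5]
weights = inverse_frequency_weights(labels, n_bins=4)
print([round(w, 3) for w in weights])
```

Weights of this kind could be supplied to a tree ensemble during training, for example through the `sample_weight` argument that scikit-learn regressors accept in `fit`, so that errors on rare high-value samples count for more.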

Grants and funding

32071421/National Natural Science Foundation of China (National Science Foundation of China)
