Review
ACS Cent Sci. 2024 Feb 5;10(2):226-241.
doi: 10.1021/acscentsci.3c01275. eCollection 2024 Feb 28.

Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering

Jason Yang et al. ACS Cent Sci.

Abstract

Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity, followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery, by functionally annotating known protein sequences or generating novel protein sequences with desired functions, and (2) navigating protein fitness landscapes for fitness optimization, by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
The enzyme engineering workflow. Enzyme engineering begins with a discovery phase to identify an enzyme with initial activity (desired function). If fitness is not sufficient, the enzyme is then optimized using directed evolution (DE). (A) Enzyme discovery involves screening for desired activities, which could include native activity or promiscuous activities. (B) Enzyme starting points can be found in known proteins or by (C) diversification of enzymes using various computational methods to generate starting sequences that are more stable and evolvable. (D, E) In its simplest form, optimization using DE involves generating a pool of protein variants, identifying one with improved fitness, and using this variant as the starting point for the next generation of mutation and screening. DE can be thought of as a greedy hill climb on a protein fitness landscape. In the DE fitness landscape, sequences are naturally ordered so that each sequence is surrounded by its single-mutant neighbors.
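The greedy hill climb described in this caption can be sketched in a few lines of Python. This is a toy illustration only, not the authors' method: `toy_fitness`, its arbitrary target sequence, the library size, and the number of rounds are all hypothetical stand-ins for a real assay and screening budget.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_fitness(seq, target="MKVLA"):
    """Hypothetical fitness: fraction of positions matching an arbitrary target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)

def single_mutants(seq):
    """Yield every sequence exactly one substitution away from seq."""
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield seq[:i] + aa + seq[i + 1:]

def directed_evolution(start, fitness, rounds=10, library_size=50, seed=0):
    """Greedy hill climb: screen a random subset of single mutants each round
    and keep the best variant only if it improves on the current parent."""
    rng = random.Random(seed)
    parent, parent_fit = start, fitness(start)
    for _ in range(rounds):
        neighbors = list(single_mutants(parent))
        library = rng.sample(neighbors, min(library_size, len(neighbors)))
        best = max(library, key=fitness)
        if fitness(best) > parent_fit:
            parent, parent_fit = best, fitness(best)
    return parent, parent_fit

final_seq, final_fit = directed_evolution("AAAAA", toy_fitness)
print(final_seq, final_fit)
```

Because each round only accepts single-mutant improvements, a climb of this kind can stall at a local optimum on a rugged landscape.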
Figure 2
Opportunities for the discovery of functional enzymes using machine learning. Identifying functional enzymes as starting points for optimization of their properties is a key challenge in enzyme engineering. Many useful enzymes could be discovered among already known, but unannotated, protein sequences. (A) ML models can classify sequences based on their Enzyme Commission (EC) numbers. (B) Generalized large language models (LLMs) could annotate proteins in databases and scientific literature, and (C) AI could act as a structural biologist and organic chemist to discern if certain reactions might work based on catalytic/structural motifs. Alternatively, emerging deep learning methods can look beyond the sequences explored by natural evolution and design novel functional enzymes. This problem can be treated as (D) pure sequence generation or (E) generation toward a target structure. Future work should focus on identifying promiscuous and evolvable enzymes.
Figure 3
Opportunities for machine learning models to help navigate protein fitness landscapes. (A) ML models can allow for bigger jumps in sequence space by proposing combinations of mutations that would not be reached by traditional DE. The role of nonadditivity between mutation effects, or epistasis, should be explored further to understand when ML offers an advantage. (B) The ability of zero-shot (ZS) scores to predict protein fitness without any labeled assay data needs to be better understood for different protein families and functions. Finally, ML-assisted protein fitness optimization could benefit from (C) multimodal representations that capture physically relevant descriptors of proteins to predict multiple relevant properties and (D) active learning with deep learning models tailored toward proteins and uncertainty quantification.
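To make the idea of nonadditivity concrete, the sketch below computes pairwise epistasis for a double mutant as the gap between its measured fitness and the value expected if the two single-mutant effects simply added. The fitness numbers are invented for illustration, and the calculation assumes fitness is reported on a scale where additivity is the sensible null model (for example, a log scale).

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Epistasis = observed double-mutant fitness minus the additive expectation.

    Additive expectation: f_wt + (f_a - f_wt) + (f_b - f_wt).
    A value near zero means the two mutations combine additively.
    """
    expected_additive = f_a + f_b - f_wt
    return f_ab - expected_additive

# Hypothetical measurements (not from the article).
print(pairwise_epistasis(f_wt=1.0, f_a=1.5, f_b=1.2, f_ab=1.7))  # close to 0: additive
print(pairwise_epistasis(f_wt=1.0, f_a=1.5, f_b=1.2, f_ab=2.5))  # positive epistasis
print(pairwise_epistasis(f_wt=1.0, f_a=1.5, f_b=1.2, f_ab=0.9))  # negative epistasis
```

When epistasis is close to zero across a landscape, recombining the best single mutants is likely to work well on its own; large deviations are where a model that captures interactions could plausibly help.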
Figure 4
A fully self-driven protein engineering system as an active learning “design-build-test-learn” cycle assisted by machine learning. Emerging ML-assisted methods will provide an increased diversity of protein starting points that possess desired function and are highly evolvable. Automated robotic systems will synthesize protein variants and test them for various properties using experimental assays. Supervised ML models will then be trained to learn a mapping between protein features and their properties. Finally, design algorithms will propose new variants to test in the next iteration and update robotic scripts on the fly. This protein engineering system will perform automated end-to-end discovery and optimization of proteins for desired functions.
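The design-build-test-learn cycle in this caption can be sketched as a simple loop. Everything here is a hypothetical stand-in: `run_assay` replaces robotic synthesis and screening with a hidden toy landscape, the reduced four-letter alphabet keeps the search space small, and the per-site averaging model is a deliberately crude substitute for the supervised ML models the caption envisions.

```python
import random
from itertools import product

ALPHABET = "AVLK"        # reduced alphabet, an assumption to keep the toy space small
HIDDEN_OPTIMUM = "KVLA"  # hypothetical; stands in for the unknown fitness landscape

def run_assay(seq):
    """'Build' and 'test': stand-in for synthesizing and screening a variant."""
    return sum(a == b for a, b in zip(seq, HIDDEN_OPTIMUM))

def fit_site_model(labeled):
    """'Learn': mean measured fitness for each (position, residue) pair."""
    totals, counts = {}, {}
    for seq, y in labeled.items():
        for key in enumerate(seq):
            totals[key] = totals.get(key, 0.0) + y
            counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}

def predict(model, seq, default=0.0):
    """Score a sequence by summing its per-site averages (unseen pairs score default)."""
    return sum(model.get(key, default) for key in enumerate(seq))

def dbtl(rounds=4, batch=8, seed=0):
    """Run the cycle: random initial batch, then greedy model-guided batches."""
    rng = random.Random(seed)
    space = ["".join(p) for p in product(ALPHABET, repeat=len(HIDDEN_OPTIMUM))]
    labeled = {s: run_assay(s) for s in rng.sample(space, batch)}
    for _ in range(rounds):
        model = fit_site_model(labeled)
        # 'Design': rank untested variants by predicted fitness, take the top batch.
        untested = [s for s in space if s not in labeled]
        untested.sort(key=lambda s: predict(model, s), reverse=True)
        for s in untested[:batch]:
            labeled[s] = run_assay(s)
    return max(labeled, key=labeled.get), labeled

best, labeled = dbtl()
print(best, labeled[best], len(labeled))
```

This sketch selects variants purely by predicted fitness. A fuller active learning setup would also weigh model uncertainty when choosing the next batch, which this toy model does not estimate.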
