Review
Protein Eng Des Sel. 2021 Feb 15;34:gzab019.
doi: 10.1093/protein/gzab019.

Machine learning for enzyme engineering, selection and design

Ryan Feehan et al. Protein Eng Des Sel.

Abstract

Machine learning is a useful computational tool for large and complex tasks such as those in the field of enzyme engineering, selection and design. In this review, we examine enzyme-related applications of machine learning. We start by comparing tools that can identify the function of an enzyme and the site responsible for that function. Then we detail methods for optimizing important experimental properties, such as the enzyme environment and enzyme reactants. We describe recent advances in enzyme systems design and enzyme design itself. Throughout we compare and contrast the data and algorithms used for these tasks to illustrate how the algorithms and data can be best used by future designers.

Keywords: deep learning; enzyme design; enzyme engineering; machine learning.

Figures

Fig. 1
The feature tree. ML models reviewed in this paper, organized according to the type of features used. The maroon dotted line across the center of the tree divides features that can be broadly classified as sequence-based features (left of line) and structure-based features (right of line). The downward arrow categorizes features hierarchically, from atomic (top) to organismal (bottom). Citations for the models (numbered column on the right) are color-coded by the task they are used for: enzyme classification (red), enzyme site prediction (orange), condition optimization (green), substrate identification (teal), turnover rate (blue) and design (purple).

