Proc Natl Acad Sci U S A. 2019 Apr 30;116(18):8852-8858.
doi: 10.1073/pnas.1901979116. Epub 2019 Apr 12.

Machine learning-assisted directed protein evolution with combinatorial libraries

Zachary Wu et al. Proc Natl Acad Sci U S A. 2019.

Erratum in

Abstract

To reduce experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning into the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine-learning models trained on tested variants provide a fast method for testing sequence space computationally. We validated this approach on a large published empirical fitness landscape for human GB1 binding protein, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (i.e., stereodivergence) of a new-to-nature carbene Si-H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% ee (enantiomeric excess). By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.
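The workflow described above (screen a subset, train a model on it, rank the untested sequence space in silico, then screen the top predictions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-letter alphabet, the three positions, the `toy_fitness` function, and the additive per-site surrogate model are all hypothetical stand-ins chosen to keep the example small.

```python
import itertools
import random
from collections import defaultdict

ALPHABET = "ACDE"   # toy alphabet (assumption); a real library uses 20 amino acids
N_POSITIONS = 3     # toy number of simultaneously mutated positions (assumption)


def toy_fitness(variant):
    """Hypothetical stand-in for a wet-lab screen; the values are made up."""
    score = sum((ord(aa) - ord("A")) * (pos + 1) for pos, aa in enumerate(variant))
    # A made-up pairwise bonus so the toy landscape is not purely additive.
    if variant[0] == "E" and variant[2] == "C":
        score += 5
    return float(score)


def fit_site_means(measured):
    """Fit an additive surrogate: mean measured fitness per (position, residue)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for variant, fitness in measured.items():
        for pos, aa in enumerate(variant):
            sums[(pos, aa)] += fitness
            counts[(pos, aa)] += 1
    overall = sum(measured.values()) / len(measured)
    means = {key: sums[key] / counts[key] for key in sums}
    return means, overall


def predict(variant, means, overall):
    """Score a variant; unseen (position, residue) pairs fall back to the overall mean."""
    return sum(means.get((pos, aa), overall) for pos, aa in enumerate(variant))


def propose(measured, library, top_k):
    """Rank every untested variant with the surrogate and return the top_k."""
    means, overall = fit_site_means(measured)
    untested = [v for v in library if v not in measured]
    untested.sort(key=lambda v: predict(v, means, overall), reverse=True)
    return untested[:top_k]


rng = random.Random(0)
library = ["".join(p) for p in itertools.product(ALPHABET, repeat=N_POSITIONS)]
training = {v: toy_fitness(v) for v in rng.sample(library, 16)}  # "screened" subset
proposals = propose(training, library, top_k=8)                  # next variants to screen
```

An additive surrogate cannot capture epistasis on its own; it is used here only because it fits in a few lines. Any regression model trained on the screened variants could take its place in `propose`.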

Keywords: catalysis; directed evolution; enzyme; machine learning; protein engineering.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
(A) Directed evolution with single mutations. If limited to single mutations, the identification of optimal amino acids for N positions requires N rounds of evolution. (B) Directed evolution by recombining mutations found in best variants from a random combinatorial search. (C) Machine learning-assisted directed evolution. As a result of increased throughput provided by screening in silico, four positions can be explored simultaneously in a single round, enabling a broader search of sequence–function relationships and deeper exploration of epistatic interactions.
Fig. 2.
(A) Highest fitness values found by directed evolution and directed evolution assisted by machine learning. The distribution of fitness peaks found by iterative site-saturation mutagenesis from all labeled variants (149,361 of 20^4 = 160,000 possible covering four residues) is shown in red. The distribution of fitness peaks found by 10,000 recombination runs with an average of 570 variants tested is shown in blue. The distribution of the highest fitnesses found from 600 runs of the machine learning-assisted approach is shown in green. A total of 570 variants are tested in all approaches. For reference, the distribution of all measured fitness values in the landscape is shown in gray. (B) The same evolutionary distributions are shown as empirical cumulative distribution functions, where the ordinate at any specified fitness value is the fraction of evolutionary runs that reach a fitness less than or equal to that specified value. Machine learning-assisted evolution walks are more likely to reach higher fitness levels compared with conventional directed evolution.
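The empirical cumulative distribution function in panel B can be computed directly from its definition in the caption. The sketch below assumes each evolutionary run is summarized by the single highest fitness it reached; the function name `ecdf` and the five sample values are illustrative and are not data from the paper.

```python
def ecdf(max_fitness_per_run, threshold):
    """Fraction of runs whose best fitness is less than or equal to threshold."""
    if not max_fitness_per_run:
        raise ValueError("need at least one run")
    reached = sum(1 for fitness in max_fitness_per_run if fitness <= threshold)
    return reached / len(max_fitness_per_run)


# Hypothetical best-fitness values from five runs (made-up numbers).
runs = [0.5, 1.0, 2.0, 4.0, 8.0]
fraction_at_two = ecdf(runs, 2.0)  # three of the five runs peak at or below 2.0
```

Under this definition, a curve that stays lower at a given fitness value indicates that more runs exceeded that value, which is how the caption's comparison between approaches should be read.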
Fig. 3.
Carbon–silicon bond formation catalyzed by heme-containing Rma NOD to form individual product enantiomers with high selectivity.
Fig. 4.
(A) Structural homology model of Rma NOD and positions of mutated residues made by SWISS-MODEL (47). Set I positions 32, 46, 56, and 97 are shown in red, and set II positions 49, 51, and 53 are shown in blue. (B) Evolutionary lineage of the two rounds of evolution. (C) Summary statistics for each round, including the number of sequences obtained to train each model, the fraction of the total library represented in the input variants, each model’s leave-one-out cross-validation (CV) Pearson correlation, and the number of predicted sequences tested.
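The leave-one-out cross-validation Pearson correlation mentioned in panel C can be illustrated with a small sketch. It assumes a one-feature least-squares line as the model, purely to keep the example self-contained; the scalar feature, the helper names, and the sample values are hypothetical, and the paper's models operate on sequence data rather than on a single number.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def loo_cv_pearson(xs, ys):
    """Hold out each point in turn, refit on the rest, then correlate
    the held-out predictions with the measured values."""
    held_out_predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        held_out_predictions.append(slope * xs[i] + intercept)
    return pearson(held_out_predictions, ys)


# Hypothetical feature values and measured fitnesses (made-up numbers).
features = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
fitnesses = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
cv_score = loo_cv_pearson(features, fitnesses)
```

Because every prediction comes from a model that never saw the held-out variant, the resulting correlation estimates how well the model ranks unseen sequences rather than how well it fits its own training data.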
Fig. 5.
A library’s fitness values can be visualized as a 1D distribution, in this case as kernel density estimates over corresponding rug plots. This figure shows subplots for each library illustrating the changes between input (lighter) and predicted (darker) libraries for the (S)-enantiomers (cyan) and (R)-enantiomers (red). The initial input library for set I is shown in gray. The predicted (darker) libraries for each round are shifted toward the right and left of the distributions for the (S)- and (R)-enantiomers, respectively. For reference, dotted lines are shown for no enantiopreference (i.e., 0% ee).
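Enantiomeric excess is the difference between the two enantiomer amounts divided by their sum, expressed as a percentage. The sketch below uses a signed convention in which positive values indicate (S)-selectivity and negative values indicate (R)-selectivity; that sign choice is an assumption made here to match the left/right shifts described in the caption, and the function name and input amounts are illustrative.

```python
def signed_ee(s_amount, r_amount):
    """Signed enantiomeric excess in percent.

    Positive means excess (S)-enantiomer, negative means excess
    (R)-enantiomer (sign convention assumed for this sketch).
    """
    total = s_amount + r_amount
    if total <= 0:
        raise ValueError("total product amount must be positive")
    return 100.0 * (s_amount - r_amount) / total


# Hypothetical product ratios (made-up numbers).
racemic = signed_ee(50.0, 50.0)   # no enantiopreference
s_biased = signed_ee(96.5, 3.5)   # mostly (S) product
r_biased = signed_ee(10.5, 89.5)  # mostly (R) product
```

With this convention, a product mixture of 96.5% (S) and 3.5% (R) corresponds to 93% ee, so an ee value alone does not give the fraction of the major enantiomer directly.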

References

    1. Petrović D, Kamerlin SCL. Molecular modeling of conformational dynamics and its role in enzyme evolution. Curr Opin Struct Biol. 2018;52:50–57.
    2. Romero PA, Arnold FH. Exploring protein fitness landscapes by directed evolution. Nat Rev Mol Cell Biol. 2009;10:866–876.
    3. Goldsmith M, Tawfik DS. Enzyme engineering: Reaching the maximal catalytic efficiency peak. Curr Opin Struct Biol. 2017;47:140–150.
    4. Zeymer C, Hilvert D. Directed evolution of protein catalysts. Annu Rev Biochem. 2018;87:131–157.
    5. Garcia-Borrás M, Houk KN, Jiménez-Oses G. Computational design of protein function. In: Martín-Santamaría S, editor. Computational Tools for Chemical Biology. Royal Society of Chemistry; London: 2018. pp. 87–107.
