Review. Nat Rev Mol Cell Biol. 2009 Dec;10(12):866-76.
doi: 10.1038/nrm2805.

Exploring protein fitness landscapes by directed evolution

Philip A Romero et al. Nat Rev Mol Cell Biol. 2009 Dec.

Abstract

Directed evolution circumvents our profound ignorance of how a protein's sequence encodes its function by using iterative rounds of random mutation and artificial selection to discover new and useful proteins. Proteins can be tuned to adapt to new functions or environments by simple adaptive walks involving small numbers of mutations. Directed evolution studies have shown how rapidly some proteins can evolve under strong selection pressures and, because the entire 'fossil record' of evolutionary intermediates is available for detailed study, they have provided new insight into the relationship between sequence and function. Directed evolution has also shown how mutations that are functionally neutral can set the stage for further adaptation.


Figures

Figure 1. Protein fitness landscapes
Directed protein evolution traverses a ‘fitness landscape’ in sequence space, where fitness is a measure of how well a given protein performs a target function. a | A plot of fitness against sequence creates the landscape for evolution. The transition from black, to red, to orange, to yellow represents increasing fitness. Although the details of this landscape are unknown, it is believed that most sequences do not function (black) and that the rare functional sequences encoding natural proteins are clustered near other functional sequences. This popular three-dimensional representation, however, does a poor job of illustrating the very large number of paths available to evolution and the large number of sequences within functional regions that do not encode functional proteins. b | Similar to most natural protein evolution, directed evolution moves along networks of functional proteins that differ by a single amino acid, because selection requires a continuous uphill walk and does not permit the fixation of nonfunctional sequences. Epistasis occurs when the effect of one mutation depends on the presence of another, which can create landscape ruggedness and local optima. Landscapes could range from the rugged ‘Badlands’ landscape, which is nearly impossible to climb by mutational steps, to the ‘Fujiyama’ landscape, where any beneficial mutation brings the search closer to the optimum. c | The presence of local optima might restrict some of the mutational paths uphill (red line). However, the large number of alternative routes leaves plenty of adaptive paths to a fitness optimum (green line).
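
The contrast between the ‘Fujiyama’ and ‘Badlands’ extremes in panel b can be made concrete with a toy model. The sketch below is an illustration only, not a method from the review: it assumes binary sequences of length 8, an additive (epistasis-free) fitness function as the Fujiyama-like case, and unrelated random fitness values as the Badlands-like case. All names and numbers are invented for the example. It counts local optima, that is, sequences fitter than every single-mutant neighbor.

```python
import itertools
import random

L = 8  # toy sequence length; two possible "residues" (0 or 1) per site


def neighbors(seq):
    """Yield all sequences that differ from seq at exactly one site."""
    for i in range(len(seq)):
        yield seq[:i] + (1 - seq[i],) + seq[i + 1:]


def count_local_optima(fitness):
    """Count sequences that are fitter than every single-mutant neighbor."""
    return sum(
        all(fitness[s] > fitness[n] for n in neighbors(s))
        for s in fitness
    )


rng = random.Random(0)
sequences = list(itertools.product((0, 1), repeat=L))

# Fujiyama-like: each site contributes independently (no epistasis).
site_effects = [rng.uniform(0.1, 1.0) for _ in range(L)]
fujiyama = {s: sum(w * x for w, x in zip(site_effects, s)) for s in sequences}

# Badlands-like: every sequence gets an unrelated random fitness (maximal epistasis).
badlands = {s: rng.random() for s in sequences}

n_fuji = count_local_optima(fujiyama)
n_bad = count_local_optima(badlands)
print("local optima, additive landscape:", n_fuji)
print("local optima, random landscape:", n_bad)
```

On the additive landscape every uphill step leads toward the single peak, whereas the random landscape has many peaks at which a single-mutation uphill walk would stall.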
Figure 2. Overview of directed evolution
The objective of directed evolution is to create a specific protein function through successive rounds of mutation and selection, starting from a parent protein exhibiting a related function. There are numerous options for implementing each step in the process, the choice of which can greatly affect the efficiency and success of the protein sequence optimization. A parent sequence (or sequences) is chosen based on its perceived proximity to the desired function and its evolvability. This parent sequence is then mutated to form a library of new sequences. (Error-prone PCR or other methods can be used to incorporate mutations randomly, recombination can be used to introduce mutations from other functional sequences, or mutation sites can be chosen based on functional and/or structural information.) These mutated sequences are evaluated for their ability to perform the desired function using a high-throughput screen or artificial selection. The most ‘fit’ sequence (or sequences) is used as the parent for the next round of directed evolution, and this process is repeated until the engineering objective is met (usually 5-10 generations).
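
The mutate, screen, select cycle described above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the protocol of any particular study: the screen is a made-up function (fraction of positions matching a hidden target sequence), mutations are single random substitutions standing in for error-prone PCR, and the names `mutate`, `directed_evolution`, `toy_screen` and `TARGET` are hypothetical.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def mutate(parent, rng, n_mutations=1):
    """Return a copy of parent with n_mutations random point substitutions."""
    seq = list(parent)
    for pos in rng.sample(range(len(seq)), n_mutations):
        seq[pos] = rng.choice([a for a in ALPHABET if a != seq[pos]])
    return "".join(seq)


def directed_evolution(parent, screen, rng, library_size=200, generations=10):
    """Iterate mutate -> screen -> select, keeping the parent if no variant beats it."""
    history = [(parent, screen(parent))]
    for _ in range(generations):
        library = [mutate(parent, rng) for _ in range(library_size)]
        best = max(library, key=screen)
        if screen(best) > screen(parent):
            parent = best
        history.append((parent, screen(parent)))
    return history


# Hypothetical screen: fraction of positions matching a hidden "ideal" sequence.
TARGET = "MKTAYIAKQR"


def toy_screen(seq):
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)


rng = random.Random(1)
history = directed_evolution("MATAYLAKGG", toy_screen, rng)
for generation, (seq, fitness) in enumerate(history):
    print(generation, seq, fitness)
```

Because only the best variant is carried forward and the parent is kept when nothing improves, fitness never decreases from one generation to the next, which mirrors the uphill walk described in Figure 1.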
Figure 3. Recombination of homologous sequences
a | Recombination generates highly mutated sequence libraries. Multiple homologous parent sequences are divided into fragments, which can be chosen to minimize structural disruption, and these fragments are recombined to form a combinatorial library of chimeric proteins. b | The mutations from homologous recombination are much more conservative than random mutations. In β-lactamase, chimeras with high levels of amino acid mutations (around 75) are 10^16 times more likely to fold than sequences with 75 random mutations. Modified, with permission, from REF. 56 © (2005) National Academy of Sciences. c | Chimeric proteins contain new combinations of beneficial mutations. The histogram shows the distribution of thermostabilities (T50, the temperature at which 50% of the proteins are inactivated in 10 minutes) of 184 randomly selected chimeric cytochrome P450 enzymes made by structure-guided recombination. The stabilities of the three parents are marked by the red lines. A significant fraction of chimeras are more stable than any parent from which they are derived. Modified, with permission, from Nature Biotech. REF. 89 © (2006) Macmillan Publishers Ltd. All rights reserved.
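
The combinatorial library in panel a can be enumerated directly. The sketch below assumes three made-up, pre-aligned parent sequences of equal length and evenly spaced crossover points; in structure-guided recombination the block boundaries would instead be chosen to minimize structural disruption, which is not modeled here. The function names are hypothetical.

```python
import itertools


def chimera_library(parents, n_blocks):
    """Enumerate every chimera formed by taking each block from any parent."""
    length = len(parents[0])
    assert all(len(p) == length for p in parents), "parents must be aligned"
    # Evenly spaced block boundaries (a placeholder for structure-guided crossovers).
    bounds = [round(i * length / n_blocks) for i in range(n_blocks + 1)]
    blocks = [
        [p[bounds[i]:bounds[i + 1]] for p in parents]
        for i in range(n_blocks)
    ]
    return ["".join(choice) for choice in itertools.product(*blocks)]


def mutations_from_closest_parent(seq, parents):
    """Minimum Hamming distance from seq to any parent."""
    return min(sum(a != b for a, b in zip(seq, p)) for p in parents)


# Three made-up aligned "homologs" of equal length.
parents = ["MKTAYIAKQRLV", "MRSAYLGKQKIV", "MKSGFIAREKLI"]
library = chimera_library(parents, n_blocks=4)

print("library size:", len(library))  # parents ** blocks combinations
most_mutated = max(library, key=lambda s: mutations_from_closest_parent(s, parents))
print("most mutated chimera:", most_mutated,
      mutations_from_closest_parent(most_mutated, parents))
```

The library size grows as (number of parents) raised to the number of blocks, so a handful of crossover points yields many sequences that carry several substitutions relative to every parent, yet each substitution has already been accepted in a functional homolog.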
Figure 4. Directed evolution of a cytochrome P450 propane monooxygenase
Cytochrome P450 BM3 from Bacillus megaterium catalyzes the hydroxylation of long-chain fatty acids and has no measurable activity on propane. This enzyme was converted into a highly efficient and specific propane monooxygenase over 13 rounds of directed evolution. The large change in substrate specificity was achieved using an incremental approach that involved screening first on an intermediate substrate. Since the native substrate contains a long alkane chain, and the target function was activity on a short alkane, an intermediate-length alkane towards which the parent enzyme had low but measurable activity (octane) was chosen as the initial directed evolution target. Once high octane activity was achieved, the selective pressure was switched toward activity on propane. a | Selected kinetic and biophysical properties of evolutionary intermediates from later generations. Total catalytic turnovers (moles propanol produced per mole P450), Km, and kcat are reported for propane hydroxylation. Thermostability is reported as T50 (temperature where half of the enzyme inactivates after 10 min incubation). Variants were selected for total propane activity in all generations, except for generation 9, which was selected for stability (T50). The mutations acquired during each generation are listed (and mapped to the structure below). Even small numbers of mutations can be responsible for large functional changes. Modified, with permission, from REF. 72. b | The crystal structure of the fifth generation P450 heme domain (139-3, PDB ID: 3CBD) with the locations of the mutations from subsequent generations colored as: generation 6 – red, generation 8 – green, generation 9 – blue, generation 10 – yellow, generation 11 – magenta, generation 12 – cyan, and generation 13 – orange. Beneficial mutations are distributed over the heme domain, and many are tens of Å from the catalytic iron.
Figure 5. Stability threshold and epistasis
Laboratory evolution studies have found many examples of mutational epistasis that are related to protein stability. The relationship between protein stability and epistasis is best explained in terms of a protein stability threshold, where stability is under selection only insofar as it allows a protein to fold and function. a | Epistasis can arise as the result of the protein stability threshold. The G238S active-site mutation in this β-lactamase increases enzyme activity on cephalosporin antibiotics. However, this mutation cannot be accepted into the wild-type sequence (MG) because the resulting protein (MS) is not sufficiently stable. Sequences with the beneficial G238S mutation can instead be reached by first finding the functionally neutral, but stabilizing M182T mutation and then incorporating the G238S mutation. b | Because most mutations are destabilizing, many of the single mutants of a protein close to the stability threshold (top panel) will be unstable and therefore inactive (red). This leaves few active mutants having beneficial mutations (green). A more stable protein (bottom panel) will be more tolerant to mutation, making available more beneficial mutations (those that might also be destabilizing).
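
The threshold model in panel a can be written as a few lines of arithmetic. The sketch below is illustrative only: the stability and activity values are hypothetical numbers in arbitrary units, chosen to reproduce the qualitative pattern described in the caption rather than taken from any measurement, and the names `MUTATIONS` and `fitness` are invented for the example.

```python
# Hypothetical numbers, chosen only to reproduce the qualitative picture in panel a.
WILD_TYPE_STABILITY = 1.0   # stability margin of the wild type (arbitrary units)
STABILITY_THRESHOLD = 0.0   # below this the protein does not fold

MUTATIONS = {
    # name: (change in stability, multiplicative change in activity)
    "M182T": (+2.0, 1.0),   # stabilizing, functionally neutral
    "G238S": (-1.5, 10.0),  # destabilizing, more active on the new substrate
}


def fitness(mutations):
    """Return activity if the protein is stable enough to fold, otherwise zero."""
    stability = WILD_TYPE_STABILITY + sum(MUTATIONS[m][0] for m in mutations)
    if stability < STABILITY_THRESHOLD:
        return 0.0
    activity = 1.0
    for m in mutations:
        activity *= MUTATIONS[m][1]
    return activity


for variant in ([], ["G238S"], ["M182T"], ["M182T", "G238S"]):
    print(variant or "wild type", fitness(variant))
```

In this toy model G238S alone drops the protein below the threshold and scores zero, M182T alone leaves fitness unchanged, and the two together give the full activity gain. The effect of G238S therefore depends on whether M182T is present, which is the epistasis the caption describes.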

References

    1. Chen K, Arnold FH. Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proc Natl Acad Sci USA. 1993;90:5618–5622.
       The first demonstration of directed evolution by successive rounds of mutagenesis and screening, a strategy now widely used to engineer enzymes.
    2. Reetz MT. Combinatorial and evolution-based methods in the creation of enantioselective catalysts. Angew Chem Int Ed. 2001;40:284–310.
    3. Boder ET, Midelfort KS, Wittrup KD. Directed evolution of antibody fragments with monovalent femtomolar antigen-binding affinity. Proc Natl Acad Sci USA. 2000;97:10701–10705.
    4. Campbell RE, et al. A monomeric red fluorescent protein. Proc Natl Acad Sci USA. 2002;99:7877–7882.
    5. Jiang L, et al. De novo computational design of retro-aldol enzymes. Science. 2008;319:1387–1391.
