Review. J Biol Chem. 2022 Oct;298(10):102435.
doi: 10.1016/j.jbc.2022.102435. Epub 2022 Aug 27.

Engineering functional thermostable proteins using ancestral sequence reconstruction


Raine E S Thomson et al. J Biol Chem. 2022 Oct.

Abstract

Natural proteins are often only slightly more stable in the native state than the denatured state, and an increase in environmental temperature can easily shift the balance toward unfolding. Therefore, the engineering of proteins to improve protein stability is an area of intensive research. Thermostable proteins are required to withstand industrial process conditions, for increased shelf-life of protein therapeutics, for developing robust 'biobricks' for synthetic biology applications, and for research purposes (e.g., structure determination). In addition, thermostability buffers the often destabilizing effects of mutations introduced to improve other properties. Rational design approaches to engineering thermostability require structural information, but even with advanced computational methods, it is challenging to predict or parameterize all the relevant structural factors with sufficient precision to anticipate the results of a given mutation. Directed evolution is an alternative when structures are unavailable but requires extensive screening of mutant libraries. Recently, however, bioinspired approaches based on phylogenetic analyses have shown great promise. Leveraging the rapid expansion in sequence data and bioinformatic tools, ancestral sequence reconstruction can generate highly stable folds for novel applications in industrial chemistry, medicine, and synthetic biology. This review provides an overview of the factors important for successful inference of thermostable proteins by ancestral sequence reconstruction and what it can reveal about the determinants of stability in proteins.
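The marginal stability described in the abstract can be illustrated with a two-state folding model. The sketch below is not taken from the review: it assumes the standard Gibbs-Helmholtz form for the unfolding free energy, and the melting temperature, enthalpy, and heat capacity values are hypothetical numbers chosen only to show how a modest temperature increase shifts the balance toward the unfolded state.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)


def unfolding_free_energy(temp_k, tm_k, dh_m, dcp):
    """Gibbs-Helmholtz estimate of the unfolding free energy (kJ/mol).

    tm_k : melting temperature in kelvin (where the free energy is zero)
    dh_m : unfolding enthalpy at tm_k, kJ/mol
    dcp  : heat capacity change on unfolding, kJ/(mol*K)
    """
    return dh_m * (1 - temp_k / tm_k) - dcp * (
        (tm_k - temp_k) + temp_k * math.log(temp_k / tm_k)
    )


def fraction_folded(temp_k, tm_k, dh_m, dcp):
    """Fraction of molecules in the native state for a two-state protein."""
    dg = unfolding_free_energy(temp_k, tm_k, dh_m, dcp)
    return 1.0 / (1.0 + math.exp(-dg / (R * temp_k)))


# Hypothetical protein: Tm = 60 degrees C, dH_m = 400 kJ/mol, dCp = 8 kJ/(mol*K)
TM_K, DH_M, DCP = 333.15, 400.0, 8.0

for celsius in (25, 50, 60, 70):
    t = celsius + 273.15
    print(celsius, round(unfolding_free_energy(t, TM_K, DH_M, DCP), 1),
          round(fraction_folded(t, TM_K, DH_M, DCP), 3))
```

With these assumed parameters the native state is favored by only a few tens of kJ/mol at room temperature, and the folded fraction drops from one half at the melting temperature to a small minority ten degrees above it.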

Keywords: ancestral sequence reconstruction; biocatalysis; cytochrome P450; directed evolution; molecular evolution; Precambrian; protein engineering; synthetic biology; thermostability.

Conflict of interest statement

The authors are engaged in directed evolution efforts to produce thermostable cytochrome P450 enzymes for biocatalysis and synthetic biology applications, some of which have been licensed for application in pharmaceutical and fine chemical production under the tradename “CYPerior.” The authors declare that they have no conflicts of interest with the contents of this article.

Figures

Figure 1
Comparison of approaches to engineering thermostability, in terms of typical screening effort (library size) and information required, and the extent of sequence space that can be sampled. Approaches are grouped broadly into directed evolution, rational (including computer-aided) design, and phylogenetic methods (i.e., evolutionary methods that rely on data mining of sequences in natural evolutionary trees as opposed to directed evolution experiments). Note that there is some overlap between approaches (e.g., site saturation mutagenesis can be used for rational design as well as directed evolution strategies; computational methods can be used to augment directed evolution and phylogenetic approaches), and different methods are often combined.
Figure 2
Changes in experimentally determined thermostability (T50 or Tm) versus estimated evolutionary age observed in resurrected ancestors compared to their related extant forms. An overall trend is seen toward greater thermostability in older ancestors but the magnitude of the effect differs markedly between different proteins and with the overall stability of the extant form. The data used in this analysis are from the studies listed in Table 1; only those that proposed an estimated age for respective ancestors are shown here. Different colors represent individual studies and for each phylogeny, directly related lineages are connected by solid lines. Sources in the order shown in the figure are: (41, 42, 43, 45, 47, 48, 49, 63, 64, 97, 109, 115, 116, 117, 118, 119, 120, 121, 122).
Figure 3
Outline of the ASR process. Extant protein sequences collected from sequence databases are iteratively aligned and curated to remove poor quality or potentially erroneous data then used to generate a phylogenetic tree. The tree, alignment, and an amino acid substitution model are used as inputs for ancestral inference using probabilistic methods. Ancestors from points of interest in the evolutionary tree are then reverse translated and the corresponding ORFs synthesized and expressed in a heterologous host, for example, E. coli. The resurrected ancestors can then be characterized for various biochemical properties or used as templates for further protein engineering. ASR, ancestral sequence reconstruction.
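The probabilistic inference step in this outline can be sketched for a toy case. The code below is a minimal illustration, not the method used in the review: it assumes an equal-rates (Poisson) amino acid substitution model in place of an empirical matrix, a hypothetical three-sequence alignment and tree, and it infers only the root ancestor by marginal reconstruction with Felsenstein's pruning algorithm.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def transition_prob(a, b, t):
    """Equal-rates (Poisson) model: probability that a becomes b over branch length t."""
    k = len(AMINO_ACIDS)
    decay = math.exp(-k * t / (k - 1))
    return 1 / k + (k - 1) / k * decay if a == b else 1 / k - decay / k


def partial_likelihoods(node, column):
    """Felsenstein pruning for one alignment column.

    A node is either a leaf name (str) or a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {aa: 1.0 if aa == column[node] else 0.0 for aa in AMINO_ACIDS}
    result = {aa: 1.0 for aa in AMINO_ACIDS}
    for child, t in node:
        child_like = partial_likelihoods(child, column)
        for aa in AMINO_ACIDS:
            result[aa] *= sum(
                transition_prob(aa, b, t) * child_like[b] for b in AMINO_ACIDS
            )
    return result


def root_posterior(tree, column):
    """Posterior over root states; the uniform prior cancels on normalization."""
    like = partial_likelihoods(tree, column)
    total = sum(like.values())
    return {aa: value / total for aa, value in like.items()}


def reconstruct_root(tree, alignment):
    """Most probable root residue at each gap-free alignment column."""
    length = len(next(iter(alignment.values())))
    residues = []
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        posterior = root_posterior(tree, column)
        residues.append(max(posterior, key=posterior.get))
    return "".join(residues)


# Hypothetical inputs: seqA and seqB are sister taxa, seqC is the outgroup.
example_tree = [([("seqA", 0.1), ("seqB", 0.1)], 0.05), ("seqC", 0.2)]
example_alignment = {"seqA": "MKV", "seqB": "MKL", "seqC": "MRL"}

print(reconstruct_root(example_tree, example_alignment))
```

Real reconstructions would use an empirical substitution model, handle gaps, and infer every internal node; the posterior probabilities computed per site are also what allow ambiguous positions to be identified before an ancestor is synthesized.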
Figure 4
Examples of sequence curation required for ASR. A, representative changes in the overall alignment during sequence curation. The red rectangle at the top left of each image shows an equivalent area of the alignment. The overall number of sequences decreases during curation as sequences with likely artefacts (insertions, deletions, and frameshifts) resulting from miscalling of start, stop, and splice sites are removed. Removal of such sequences, especially those containing insertion artefacts, improves the ability to align the remaining sequences such that the overall alignment length decreases markedly. B–E, arrows indicate sequences with likely artefacts. B, very short sequence fragments are typically removed since they may not encode a functional protein, whereas sequences that lack a small proportion of the overall coding sequence at the N or C termini can be retained without disrupting the ASR. C, incorrectly called start and stop sites lead to massively extended sequences, which appear as clear outliers in sequence alignments. If these sequences are retained in the alignment used for the ASR, the inferred ancestors will have similar artefactual extensions, so extensions are typically pruned to the consensus start and stop sites. D, artefactual insertions, deletions, and frameshifts appear as sequences with marked differences to phylogenetic near-neighbors over an extended area of the alignment. Such artefacts are readily visible in highly conserved regions but may not be apparent in regions of higher variability or in alignments with highly diverse sequences. Biochemical expertise can also be used to interpret the likelihood of these sequences being correct, that is, from what is known about the structure, a prediction can be made as to whether the fold would tolerate such a disruption to the typical sequence. E, likely pseudogenes are evident from a pattern of numerous, possibly minor deviations from the sequence of phylogenetic near-neighbors distributed across the ORF. ASR, ancestral sequence reconstruction.
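A first pass at the length-based checks in panels B and C can be automated before manual inspection. The sketch below is an assumption-laden illustration rather than the authors' procedure: the function name and the 0.8 and 1.2 cut-offs relative to the median length are hypothetical, and flagged sequences are meant for review (removal of fragments, trimming of extensions), not automatic deletion.

```python
from statistics import median


def flag_length_outliers(sequences, min_frac=0.8, max_frac=1.2):
    """Flag unaligned sequences whose length deviates strongly from the median.

    sequences : dict mapping sequence name to amino acid string
    min_frac  : hypothetical cut-off below which a sequence is a likely fragment
    max_frac  : hypothetical cut-off above which a sequence is likely extended
    """
    med = median(len(seq) for seq in sequences.values())
    flags = {}
    for name, seq in sequences.items():
        ratio = len(seq) / med
        if ratio < min_frac:
            flags[name] = "fragment"   # candidate for removal (panel B)
        elif ratio > max_frac:
            flags[name] = "extended"   # candidate for trimming (panel C)
    return flags


# Hypothetical collection: three typical sequences, one fragment, one extension.
example_sequences = {
    "typical_1": "M" * 100,
    "typical_2": "M" * 102,
    "typical_3": "M" * 98,
    "short_hit": "M" * 30,
    "long_hit": "M" * 180,
}

print(flag_length_outliers(example_sequences))
```

Length alone will not catch the internal insertions, frameshifts, or pseudogene patterns described in panels D and E, which require comparison against phylogenetic near-neighbors in the alignment itself.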
