Review
Annu Rev Biophys. 2017 May 22:46:85-103.
doi: 10.1146/annurev-biophys-070816-033819. Epub 2017 Mar 15.

Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence

Julian Echave et al. Annu Rev Biophys. 2017.

Abstract

For decades, rates of protein evolution have been interpreted in terms of the vague concept of functional importance. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating these sites has a large impact on protein structure and stability. In this article, we review the studies in the emerging field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field.

Keywords: evolutionary rate; fitness landscape; protein folding; protein misfolding; protein–protein interaction.

Figures

Figure 1
Common stability-based fitness models. (a) The soft-threshold model assumes that sufficiently stable proteins (low ΔG) all have the same high fitness, whereas unstable proteins (high ΔG) have zero fitness. In between these two extremes, there is a smooth, sigmoidal transition region. (b) The threshold model can be considered a limiting case of the soft-threshold model. It assumes that all proteins with sufficient stability (ΔG below some threshold) have the same fitness, whereas all other proteins are completely inviable. (c) The stability-optimum model assumes that there is an ideal stability for maximum fitness, and that both more stable and less stable proteins are less fit. (d) The maximum-stability model assumes that fitness increases monotonically with stability.
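The four models in Figure 1 can be illustrated with a minimal Python sketch. The caption gives only the qualitative shapes, so the functional forms here (a logistic curve, a step function, a Gaussian, and an exponential) and all parameter names and default values (`dG_thresh`, `beta`, `dG_opt`, `width`, `alpha`) are assumptions chosen for illustration, not the forms used in the reviewed studies.

```python
import math

def soft_threshold(dG, dG_thresh=-5.0, beta=1.0):
    """(a) Sigmoidal fitness: near 1 for dG well below the threshold,
    near 0 well above it. The logistic form is an assumption."""
    x = beta * (dG - dG_thresh)
    if x > 0:
        # Numerically stable branch for large positive x.
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def threshold(dG, dG_thresh=-5.0):
    """(b) Step function: full fitness below the threshold, inviable otherwise."""
    return 1.0 if dG < dG_thresh else 0.0

def stability_optimum(dG, dG_opt=-5.0, width=2.0):
    """(c) Fitness peaks at an ideal stability dG_opt and falls off on both
    sides. The Gaussian form is an assumption."""
    return math.exp(-((dG - dG_opt) ** 2) / (2.0 * width ** 2))

def maximum_stability(dG, alpha=0.1):
    """(d) Fitness increases monotonically as dG decreases (more stable).
    The exponential form is an assumption; any decreasing function of dG fits."""
    return math.exp(-alpha * dG)

if __name__ == "__main__":
    for dG in (-10.0, -6.0, -5.0, -4.0, 0.0):
        print(dG, soft_threshold(dG), threshold(dG),
              stability_optimum(dG), maximum_stability(dG))
```

With a large `beta`, `soft_threshold` approaches `threshold`, which is the sense in which the caption calls the threshold model a limiting case of the soft-threshold model.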
Figure 2
The relationship between the effective number of amino acids Ω and the relative solvent accessibility (RSA) of sites depends on whether amino-acid distributions are pooled across similar sites. (a) Without pooling, an Ω value is calculated at each site, and then Ω is averaged over all sites within the same RSA percentile. This procedure yields relatively low Ω values, between 2 and 10, and a nearly linear increase of Ω with increasing RSA. (b) Alternatively, amino-acid distributions are pooled among all sites within the same RSA percentile, and then a single Ω value is calculated for each pooled distribution. This procedure yields much higher Ω values (> 10), and it shows a maximum at intermediate RSA values. Protein sequences and structural data for this analysis were taken from (34). We selected 266 distinct enzymes for which each available alignment consisted of at least 400 sequences.
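The difference between the two procedures in Figure 2 can be seen on a toy example. The sketch below assumes that Ω is the exponential of the Shannon entropy of the amino-acid distribution, a common definition of an effective number; the caption itself does not state the formula. The function names and the two toy alignment columns are hypothetical, and gap characters are not handled.

```python
import math
from collections import Counter

def effective_number(counts):
    """Omega = exp(Shannon entropy) of an amino-acid count distribution.
    This definition is an assumption; see the lead-in."""
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total)
                   for n in counts.values() if n > 0)
    return math.exp(entropy)

def omega_per_site(columns):
    """Procedure (a): compute Omega at each site, then average over sites."""
    return sum(effective_number(Counter(col)) for col in columns) / len(columns)

def omega_pooled(columns):
    """Procedure (b): pool amino-acid counts over all sites, then compute
    a single Omega for the pooled distribution."""
    pooled = Counter()
    for col in columns:
        pooled.update(col)  # adds per-character counts from the column string
    return effective_number(pooled)

# Two hypothetical alignment columns from the same RSA bin: each site is
# conserved on its own, but the sites prefer different residues.
toy_columns = ["LLLLLLLL", "IIIIVVVV"]

if __name__ == "__main__":
    print("per-site average:", omega_per_site(toy_columns))
    print("pooled:", omega_pooled(toy_columns))
```

In this toy case the per-site average is lower than the pooled value, because pooling also counts the variation between sites. This matches the direction of the difference between panels (a) and (b).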
Figure 3
Conceptual explanation for the results shown in Figure 2. (a) At a per-site level, buried sites are most conserved, intermediate sites are moderately conserved, and exposed sites are the most variable. (b) However, for both buried and exposed sites, the types of amino acids that are seen are limited to mostly hydrophobic or mostly polar residues, respectively, whereas for intermediate sites both hydrophobic and polar residues are seen across many sites (58). As a consequence, even though the site variability increases approximately linearly with solvent exposure (c), the amino-acid variability across sites has a maximum at intermediate solvent exposure (d).

