Proc Natl Acad Sci U S A. 1998 Apr 28;95(9):4976-81.
doi: 10.1073/pnas.95.9.4976.

How evolution makes proteins fold quickly

L A Mirny et al. Proc Natl Acad Sci U S A. 1998.

Abstract

Sequences of fast-folding model proteins (48 residues long on a cubic lattice) were generated by an evolution-like selection toward fast folding. We find that fast-folding proteins exhibit a specific folding mechanism in which all transition-state conformations share a small subset of common contacts (the folding nucleus). Acceleration of folding was accompanied by dramatic strengthening of interactions in the folding nucleus, whereas the average energy of nonnucleus interactions remained largely unchanged. Furthermore, the residues involved in the nucleus are the most conserved ones within families of evolved sequences. Our results imply that for each protein structure there is a small number of conserved positions that are key determinants of fast folding into that structure. This conjecture was tested on two protein superfamilies: the first having the classical monophosphate binding fold (CMBF; 98 families) and the second having the type-III repeat fold (47 families). For each superfamily, we discovered a few positions that exhibit very strong and statistically significant "conservatism of conservatism": amino acids in those positions are conserved within every family, whereas the actual types of amino acids vary from family to family. Those amino acids are in spatial contact with each other. The experimental data of Serrano and coworkers [Lopez-Hernandez, E. & Serrano, L. (1996) Fold. Des. (London) 1, 43-55] for one of the proteins of the CMBF superfamily (CheY) show that residues identified this way indeed belong to the folding nucleus. Further analysis revealed deep connections between nucleation in CMBF proteins and their function.
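The abstract defines the folding nucleus as the contacts shared by all transition-state conformations. A minimal sketch of that definition, assuming each conformation is already reduced to the set of native contacts it forms, is a plain set intersection. The contact indices below are made up for illustration, and a strict intersection is a simplification; the operational definition the authors cite may use frequency thresholds instead.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Contact = Tuple[int, int]  # hypothetical representation: residue indices (i, j), i < j


def folding_nucleus(ts_conformations: Iterable[Set[Contact]]) -> FrozenSet[Contact]:
    """Return the native contacts present in every transition-state conformation."""
    conformations = list(ts_conformations)
    if not conformations:
        return frozenset()
    common = set(conformations[0])
    for contacts in conformations[1:]:
        common &= contacts  # keep only contacts shared by all conformations so far
    return frozenset(common)


# Toy transition-state ensemble (illustrative contacts, not taken from the paper).
ts = [
    {(3, 10), (4, 20), (7, 30), (12, 40)},
    {(3, 10), (4, 20), (15, 33)},
    {(3, 10), (4, 20), (7, 30), (21, 44)},
]
print(sorted(folding_nucleus(ts)))  # [(3, 10), (4, 20)]
```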

Figures

Figure 1
The native conformation of the studied 48-mer. Broken lines show the contacts in the folding nucleus, defined and determined as explained in refs. and . “Cold” positions, in which no mutations were observed over the whole steady-state part of the evolution (last 500 sequences), are shown in white.
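“Cold” positions are those where no mutation appears anywhere in the steady-state part of the trajectory. A minimal sketch, assuming the evolved sequences are stored as equal-length strings and the caller has already sliced out the steady-state window (for example `history[-500:]`), simply looks for columns with a single residue type. The toy 6-residue history is hypothetical.

```python
from typing import List, Sequence


def cold_positions(sequences: Sequence[str]) -> List[int]:
    """Return 0-based positions at which every sequence carries the same residue."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    return [
        pos
        for pos in range(length)
        if len({s[pos] for s in sequences}) == 1  # no mutation ever observed here
    ]


# Toy trajectory; the analysis in the caption used the last 500 sequences of a 48-mer.
history = ["AKLDEG", "AKMDEG", "ARLDEG", "AKLDQG"]
print(cold_positions(history))  # [0, 3, 5]
```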
Figure 2
Progress of the evolutionary algorithm, showing the acceleration of folding (mean first passage time, MFPT, in Monte Carlo steps) for the 48-mer model. (Inset) The first 50 accepted mutations.
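The selection toward fast folding can be pictured as a loop of random point mutations filtered by folding time. The sketch below is a greedy hill-climb under stated assumptions, not the authors' exact procedure: the acceptance rule (keep a mutant if its estimated folding time does not increase) is assumed, and `folding_time` is a hypothetical caller-supplied estimator standing in for an MFPT measured by lattice Monte Carlo runs. `toy_time` is a purely illustrative stand-in.

```python
import random
from typing import Callable, List, Sequence, Tuple


def evolve_fast_folder(
    seq: str,
    alphabet: Sequence[str],
    folding_time: Callable[[str], float],
    n_trials: int,
    rng: random.Random,
) -> Tuple[str, List[float]]:
    """Greedy evolution-like selection toward fast folding (assumed acceptance rule)."""
    current, current_time = seq, folding_time(seq)
    accepted_times = [current_time]  # folding time after each accepted mutation
    for _ in range(n_trials):
        pos = rng.randrange(len(current))
        # Draw a residue different from the current one (alphabet needs >= 2 letters).
        new_res = rng.choice([a for a in alphabet if a != current[pos]])
        mutant = current[:pos] + new_res + current[pos + 1:]
        mutant_time = folding_time(mutant)
        if mutant_time <= current_time:  # assumption: keep mutants that are not slower
            current, current_time = mutant, mutant_time
            accepted_times.append(current_time)
    return current, accepted_times


def toy_time(s: str) -> float:
    """Hypothetical folding-time stand-in: every 'B' costs 10 extra steps."""
    return 100.0 + 10.0 * s.count("B")


best, times = evolve_fast_folder("BBBB", "AB", toy_time, 200, random.Random(0))
print(best, times)
```

Because a real MFPT estimate is noisy, comparing single estimates with `<=` is a simplification; averaging several simulation runs per sequence would be a natural refinement.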
Figure 3
Evolution of the average energy per contact of the nonnucleus native contacts and of the nucleus contacts. Nucleus contacts are shown by dashed lines in Fig. 1. (Inset) The first 50 accepted mutations.
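The two curves compare the mean energy per contact inside and outside the nucleus. A minimal sketch, assuming a caller-supplied `contact_energy` function that scores one native contact for the current sequence (a hypothetical interface; in a lattice model it would depend on the residue types at both ends), splits the native contacts by nucleus membership and averages each group. Calling it on every accepted sequence would trace curves of the kind plotted here. The energies are made-up toy values.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, Set, Tuple

Contact = Tuple[int, int]


def mean_contact_energies(
    native_contacts: Iterable[Contact],
    nucleus: Set[Contact],
    contact_energy: Callable[[Contact], float],
) -> Tuple[float, float]:
    """Return (mean energy per nucleus contact, mean energy per nonnucleus contact).

    Both groups must be non-empty; statistics.mean raises StatisticsError otherwise.
    """
    nucleus_e, other_e = [], []
    for contact in native_contacts:
        target = nucleus_e if contact in nucleus else other_e
        target.append(contact_energy(contact))
    return mean(nucleus_e), mean(other_e)


# Hypothetical contact energies (arbitrary units) for a toy native structure.
energies: Dict[Contact, float] = {(0, 5): -2.0, (1, 8): -3.0, (2, 9): -1.0, (3, 12): -0.5}
nucleus = {(0, 5), (1, 8)}
print(mean_contact_energies(energies, nucleus, lambda c: energies[c]))  # (-2.5, -0.75)
```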
Figure 4
Analysis for the CMBF superfamily. Ninety-eight families were used (the list is available from the authors on request). All listed proteins are structurally homologous to CheY with Z > 3 and RMSD < 4 Å, according to the families of structurally similar proteins (FSSP) database (19). We used a coarse-grained six-letter amino acid alphabet in which amino acids were grouped according to their physical properties into the following six classes: “aliphatic + Cys”: A, L, I, V, M, C; “aromatic”: F, Y, W, H; small nonpolar: G, P; polar: T, S, Q, N; basic: R, K; and acidic: E, D. The analysis using all 20 types of amino acids gives qualitatively similar results. Horizontal axes denote position in CheY, which was taken as the reference. (a, circles) CoC analysis: intrafamily sequence entropy averaged over all M = 98 families (excluding gaps), $S_{\mathrm{CoC}}(l) = \frac{1}{M}\sum_{F=1}^{M} S_{\mathrm{intra}}^{F}(l)$. For a given family F, the intrafamily sequence entropy at each position l is $S_{\mathrm{intra}}^{F}(l) = -\sum_{i=1}^{6} p_i^{F}(l)\log p_i^{F}(l)$, where $p_i^{F}(l)$ is the normalized frequency of observing a residue of class i (i = 1–6) at position l in all homologous sequences belonging to family F; the sum runs over all residue classes. (a, squares) Sequence entropy calculated across all families: we evaluated the frequency $p_i^{\mathrm{across}}(l)$ of amino acids of each class i at each position l over all families and then calculated $S_{\mathrm{across}}(l) = -\sum_{i=1}^{6} p_i^{\mathrm{across}}(l)\log p_i^{\mathrm{across}}(l)$. (b) The probability that an equal or lower $S_{\mathrm{CoC}}$ will be observed under the null hypothesis that the conservatism of a residue in the structure is related primarily to its degree of burial.
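The three entropies in panel (a) can be computed directly from aligned families. The sketch below uses the six-class alphabet from the caption and rests on several labeled assumptions: each family is a list of equal-length aligned strings whose columns already map onto CheY positions, gaps (`-`) and unknown letters are skipped, the natural logarithm is used, families with only gaps at a position are left out of the CoC average, and "across all families" means pooling every sequence from every family. The two toy families are hypothetical; position 0 shows the "conservatism of conservatism" signature of low $S_{\mathrm{CoC}}$ with high $S_{\mathrm{across}}$.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence

# Six residue classes listed in the Figure 4 caption.
_GROUPS = {
    "aliphatic+Cys": "ALIVMC",
    "aromatic": "FYWH",
    "small nonpolar": "GP",
    "polar": "TSQN",
    "basic": "RK",
    "acidic": "ED",
}
CLASS_OF: Dict[str, str] = {aa: name for name, aas in _GROUPS.items() for aa in aas}

Family = Sequence[str]  # assumption: aligned sequences, columns = CheY positions


def class_entropy(column: Sequence[str]) -> Optional[float]:
    """-sum_i p_i log p_i over residue classes; gaps and unknown letters are skipped."""
    classes = [CLASS_OF[aa] for aa in column if aa in CLASS_OF]
    if not classes:
        return None  # column contains only gaps
    n = len(classes)
    return sum(-(k / n) * math.log(k / n) for k in Counter(classes).values())


def s_intra(family: Family, l: int) -> Optional[float]:
    """S_intra^F(l): class entropy of position l within one family."""
    return class_entropy([seq[l] for seq in family])


def s_coc(families: Sequence[Family], l: int) -> Optional[float]:
    """S_CoC(l): intrafamily entropy averaged over families with a residue at l."""
    values = [v for v in (s_intra(f, l) for f in families) if v is not None]
    return sum(values) / len(values) if values else None


def s_across(families: Sequence[Family], l: int) -> Optional[float]:
    """S_across(l): class entropy of position l with all families pooled."""
    return class_entropy([seq[l] for fam in families for seq in fam])


families = [["DK", "DL"], ["FK", "FA"]]  # toy families, two aligned positions each
for l in range(2):
    print(l, round(s_coc(families, l), 3), round(s_across(families, l), 3))
# 0 0.0 0.693
# 1 0.693 0.693
```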
Figure 5
Ribbon diagram of the CheY structure in which the four residues showing the most statistically significant CoC (D12, M17, D57, A88) are shown as solid models.

References

    1. Shakhnovich E I. Phys Rev Lett. 1994;72:3907–3910.
    2. Shakhnovich E I. Curr Opin Struct Biol. 1997;7:29–40.
    3. Shakhnovich E, Gutin A. Protein Eng. 1993;6:793–800.
    4. Li H, Wingreen N, Tang C. Science. 1996;273:666–669.
    5. Finkelstein A V, Gutin A, Badretdinov A. Proteins Struct Funct Genet. 1995;23:142–149.
