Review
Trends Biochem Sci. 2013 Jul;38(7):337-44.
doi: 10.1016/j.tibs.2013.05.001. Epub 2013 Jun 11.

Folding the proteome

Esther Braselmann et al. Trends Biochem Sci. 2013 Jul.

Abstract

Protein folding is an essential prerequisite for protein function and hence cell function. Kinetic and thermodynamic studies of small proteins that refold reversibly were essential for developing our current understanding of the fundamentals of protein folding mechanisms. However, we still lack sufficient understanding to accurately predict protein structures from sequences, or the effects of disease-causing mutations. To date, model proteins selected for folding studies represent only a small fraction of the complexity of the proteome and are unlikely to exhibit the breadth of folding mechanisms used in vivo. We are in urgent need of new methods - both theoretical and experimental - that can quantify the folding behavior of a truly broad set of proteins under in vivo conditions. Such a shift in focus will provide a more comprehensive framework from which to understand the connections between protein folding, the molecular basis of disease, and cell function and evolution.

Figures

Figure 1. While typical protein folding models exhibit properties not representative of the E. coli proteome, emerging techniques can capture a broader set of proteins
(A) The 4133 proteins from E. coli str. K-12 substr. MG1655 (NC_000913.2) were used to construct a proportional Venn diagram, with each unit area in the yellow rectangle corresponding to one E. coli protein coding sequence. These sequences were divided by length (< or ≥200 aa) and analyzed for the presence of an N-terminal signal sequence (http://www.cbs.dtu.dk/services/SignalP) (blue shading), one or more transmembrane α-helices (http://www.cbs.dtu.dk/services/TMHMM) (pink shading), both a signal sequence and a transmembrane α-helix (purple shading) and/or a PDB entry with >95% sequence identity to at least some portion of the protein sequence (hatched area). Note that this map underestimates the complexity of the proteome, as each protein coding sequence from the E. coli genome is treated as a separate monomeric protein. A set of 165 non-redundant model proteins used to study protein folding (<95% sequence identity) [3-9] was also analyzed. Each protein is indicated by a green point proportional to the size of one E. coli coding sequence. Seventeen of the model proteins have >95% sequence identity to an E. coli protein (dark green points); the remaining 148 model proteins are from other organisms (light green points). In some cases these models represent individual domains or fragments taken from larger proteins, but as it is known that removal from a larger protein context can change folding behavior [33, 44] (see text), the size of the studied domain is used here. (B) Subsets of proteins identified by proteome-wide screens designed to select other, non-traditional folding behavior were categorized as described for the 165 folding models and compared to the properties of the E. coli proteome as in panel (A). Kinetically stable proteins (red points) were identified by protease resistance [43] or resistance to moderate concentrations of sodium dodecyl sulfate (SDS) [42], yielding 81 non-redundant E. coli proteins. E. coli chaperone client proteins (blue points) represent both DnaK substrates (category “enriched” in [61]) and GroEL substrates (“class IV” in [60]), resulting in a set of 227 proteins. Proteins present in both sets (kinetically stable and chaperone client) are indicated as purple points. Note that there is only one protein in common between the folding models (panel (A)) and kinetically stable and/or chaperone client proteins: maltose binding protein, a kinetically stable protein [43]. (C) Size distribution for each protein group shown in panels (A) and (B), sorted by sequence length.
Figure 2. Protein folding models are biased towards monomeric proteins
The multimerization state of each group of proteins shown in Figure 1 (E. coli proteome, protein folding models, kinetically stable proteins, chaperone client proteins) was determined. For the E. coli proteome, subunit assignments in the UniProt database were used (30% of proteins in the E. coli proteome have assignments; 1236 proteins). Multimerization state for the 165 non-redundant protein folding models was assigned based on the multimerization state reported in the protein folding literature. The multimerization state is indicated for 71 of the 81 kinetically stable proteins identified in refs. [42, 43]. The multimerization state of the chaperone client proteins was assigned using the UniProt database; 103 of the 227 non-redundant chaperone client proteins have a subunit assignment there.
Figure 3. Examples of diversity amongst protein folding mechanisms
(A): Most proteins currently used as folding models are marginally stable (black), meaning that their folded lifetime (t1/2) is short. Lifetime can be increased in two ways. The native structure can be stabilized thermodynamically, increasing the energetic difference between the denatured ensemble and the native structure (increasing ΔG°folding, blue). Alternatively, the energetic barrier separating the denatured ensemble and the native conformation (ΔG‡) can be increased (red); this will preserve the (low) thermodynamic stability but increase the folded state lifetime. Increasing the energy barrier yields kinetically stable proteins, which can be identified by proteome-wide folding screens [42, 43] (see also Figure 1B). (B): Proteins fold from an ensemble of unfolded states, represented by the wide top of a protein folding funnel. In simple model systems (yellow), the funnel has one energy minimum, the native conformation. However, some proteins have a more complex energy landscape and can adopt alternative folded structures (green). These two folded structures may interconvert, or features of the cellular environment may stabilize a subset of early folding intermediates, resulting in a biased accumulation of one structure versus the other(s).
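The distinction drawn in panel (A) between thermodynamic and kinetic stability can be made concrete with a small numerical sketch. The code below is illustrative only and is not taken from the review: it assumes a simple two-state model, uses the Eyring prefactor kB·T/h as a rough upper bound on the attempt frequency (real protein unfolding prefactors are likely much smaller), and treats the barrier as the height from the native state to the transition state. The function names and the example energies are hypothetical.

```python
import math

# Physical constants (CODATA values)
R_KJ = 8.314462618e-3        # gas constant, kJ/(mol*K)
KB_OVER_H = 2.083661912e10   # Boltzmann constant / Planck constant, 1/(s*K)


def unfolding_half_life(barrier_kj_mol, temperature_k=298.15):
    """Half-life (seconds) of the native state for a given unfolding barrier.

    Assumption: Eyring-type rate, k = (kB*T/h) * exp(-barrier / RT),
    with the barrier measured from the native state.
    """
    rate = KB_OVER_H * temperature_k * math.exp(
        -barrier_kj_mol / (R_KJ * temperature_k)
    )
    return math.log(2) / rate


def fraction_folded(dg_folding_kj_mol, temperature_k=298.15):
    """Equilibrium fraction of folded molecules in a two-state model.

    dg_folding_kj_mol is negative when the native state is favored.
    """
    k_eq = math.exp(-dg_folding_kj_mol / (R_KJ * temperature_k))
    return k_eq / (1.0 + k_eq)


if __name__ == "__main__":
    # Hypothetical values: same modest thermodynamic stability,
    # two different barrier heights.
    dg_folding = -20.0
    for barrier in (80.0, 110.0):
        print(
            f"barrier {barrier:5.1f} kJ/mol: "
            f"fraction folded {fraction_folded(dg_folding):.4f}, "
            f"half-life {unfolding_half_life(barrier):.3e} s"
        )
```

Under these assumptions, raising the barrier leaves the equilibrium fraction folded unchanged while lengthening the folded-state lifetime exponentially, which is the behavior the caption attributes to kinetically stable proteins.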
Figure 4. Proteins identified by in vivo assays and folding “outliers” are structurally more complex than typical folding models
PDB ID codes are indicated in parentheses. Subunits of multimeric proteins are shown in different colors, and cofactors (in myoglobin and fumarate reductase) are shown in red. Most models used to study protein folding (A) are smaller and less complex than proteins representing diverse properties from the E. coli proteome (B). Of the E. coli proteins shown here, purine nucleoside phosphorylase (a hexamer) and phosphoglycerate kinase (a monomer) were identified in the screen for kinetically stable proteins [43], aspartate-β-semialdehyde dehydrogenase (a dimer) was identified in the screen for chaperone client proteins [60], and the outer membrane protein TolC (a trimer) was identified in screens for both kinetic stability and chaperone clients [42, 61]. Lactose permease (a monomer) is an α-helical transmembrane protein. Two subunits of the tetrameric fumarate reductase contain transmembrane α-helices (shown in green and pink). The soluble subunits of fumarate reductase (shown in blue and yellow) were identified in the screens for chaperone clients [60, 61].

References

    1. Anfinsen CB. Principles that govern the folding of protein chains. Science. 1973;181:223–230.
    2. Pace CN. Determination and analysis of urea and guanidine hydrochloride denaturation curves. Methods Enzymol. 1986;131:266–280.
    3. Plaxco KW, et al. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 1998;277:985–994.
    4. Jackson SE. How do small single-domain proteins fold? Fold. Des. 1998;3:R81–R91.
    5. Galzitskaya OV, et al. Chain length is the main determinant of the folding rate for proteins with three-state folding kinetics. Proteins. 2003;51:162–166.
