Folding the proteome

Esther Braselmann¹, Julie L Chaney, Patricia L Clark

PMID: 23764454
PMCID: PMC3691291
DOI: 10.1016/j.tibs.2013.05.001

Review

Esther Braselmann et al. Trends Biochem Sci. 2013 Jul;38(7):337-44. doi: 10.1016/j.tibs.2013.05.001. Epub 2013 Jun 11.

Affiliation

¹ Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556 USA.

Abstract

Protein folding is an essential prerequisite for protein function and hence cell function. Kinetic and thermodynamic studies of small proteins that refold reversibly were essential for developing our current understanding of the fundamentals of protein folding mechanisms. However, we still lack sufficient understanding to accurately predict protein structures from sequences, or the effects of disease-causing mutations. To date, model proteins selected for folding studies represent only a small fraction of the complexity of the proteome and are unlikely to exhibit the breadth of folding mechanisms used in vivo. We are in urgent need of new methods - both theoretical and experimental - that can quantify the folding behavior of a truly broad set of proteins under in vivo conditions. Such a shift in focus will provide a more comprehensive framework from which to understand the connections between protein folding, the molecular basis of disease, and cell function and evolution.

Figures

**Figure 1. While typical protein folding models exhibit properties not representative of the *E. coli* proteome, emerging techniques can capture a broader set of proteins**
**(A)** The 4133 proteins from *E. coli* str. K-12 substr. MG1655 (NC_000913.2) were used to construct a proportional Venn diagram, with each unit area in the yellow rectangle corresponding to one *E. coli* protein coding sequence. These sequences were divided by length (< or ≥200 aa) and analyzed for the presence of an N-terminal signal sequence (http://www.cbs.dtu.dk/services/SignalP) (*blue shading*), one or more transmembrane α-helices (http://www.cbs.dtu.dk/services/TMHMM) (*pink shading*), both a signal sequence and a transmembrane α-helix (*purple shading*) and/or a PDB entry with >95% sequence identity to at least some portion of the protein sequence (*hatched area*). Note that this map underestimates the complexity of the proteome, as each protein coding sequence from the *E. coli* genome is treated as a separate monomeric protein. A set of 165 non-redundant model proteins used to study protein folding (<95% sequence identity) [3-9] was also analyzed. Each protein is indicated by a green point proportional to the size of one *E. coli* coding sequence. Seventeen of the model proteins have >95% sequence identity to an *E. coli* protein (*dark green points*); the remaining 148 model proteins are from other organisms (*light green points*). In some cases these models represent individual domains or fragments taken from larger proteins, but as it is known that removal from a larger protein context can change folding behavior [33, 44] (see text), the size of the studied domain is used here. **(B)** Subsets of proteins identified by proteome-wide screens designed to select other, non-traditional folding behavior were categorized as described for the 165 folding models and compared to the properties of the *E. coli* proteome as in panel (A). Kinetically stable proteins (*red points*) were identified by protease resistance [43] or resistance to moderate concentrations of sodium dodecyl sulfate (SDS) [42], yielding 81 non-redundant *E. coli* proteins. *E. coli* chaperone client proteins (*blue points*) represent both DnaK substrates (category “enriched” in [61]) and GroEL substrates (“class IV” in [60]), resulting in a set of 227 proteins. Proteins present in both sets (kinetically stable and chaperone client) are indicated as *purple points*. Note that there is only one protein in common between the folding models (panel (A)) and kinetically stable and/or chaperone client proteins: maltose binding protein, a kinetically stable protein [43]. **(C)** Size distribution for each protein group shown in panels (A) and (B), sorted by sequence length.
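The binning described for panel (A) can be sketched in a few lines of Python. This is only an illustration of the categories named in the caption, not the authors' pipeline: the `Protein` record, its field names, and the `categorize` and `tally` helpers are hypothetical, and the signal-sequence, transmembrane-helix and PDB flags are assumed to have been computed beforehand (for example from SignalP, TMHMM and a PDB sequence search).

```python
from collections import Counter
from dataclasses import dataclass

LENGTH_CUTOFF = 200  # aa; the caption divides sequences at < or >= 200 aa


@dataclass(frozen=True)
class Protein:
    """Hypothetical record; flags are assumed to be precomputed predictions."""
    name: str
    length: int          # sequence length in amino acids
    has_signal: bool     # N-terminal signal sequence predicted
    has_tm_helix: bool   # one or more transmembrane alpha-helices predicted
    has_pdb: bool        # PDB entry with >95% identity to part of the sequence


def categorize(p: Protein) -> tuple:
    """Return (size class, targeting class, PDB flag) for one protein."""
    size = "short" if p.length < LENGTH_CUTOFF else "long"
    if p.has_signal and p.has_tm_helix:
        targeting = "signal+TM"   # purple shading in the caption
    elif p.has_signal:
        targeting = "signal"      # blue shading
    elif p.has_tm_helix:
        targeting = "TM"          # pink shading
    else:
        targeting = "neither"
    return (size, targeting, p.has_pdb)


def tally(proteins) -> Counter:
    """Count how many proteins fall into each category."""
    return Counter(categorize(p) for p in proteins)
```

Applied to a full list of coding sequences, the counts returned by `tally` would give the relative areas of a proportional diagram of this kind.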

**Figure 2. Protein folding models are biased towards monomeric proteins**
The multimerization state of each group of proteins shown in Figure 1 (*E. coli* proteome, protein folding models, kinetically stable proteins, chaperone client proteins) was determined. For the *E. coli* proteome, subunit assignments in the UniProt database were used (30% of proteins in the *E. coli* proteome have assignments; 1236 proteins). Multimerization state for the 165 non-redundant protein folding models was assigned based on the multimerization state reported in the protein folding literature. The multimerization state is indicated for 71 of the 81 kinetically stable proteins identified in refs. [42, 43]. The multimerization state of the chaperone client proteins was assigned using the UniProt database; 103 of the 227 non-redundant chaperone client proteins have a subunit assignment there.

**Figure 3. Examples of diversity amongst protein folding mechanisms**
**(A)** Most proteins currently used as folding models are marginally stable (*black*), meaning that their folded lifetime (t_1/2) is short. Lifetime can be increased in two ways. The native structure can be stabilized thermodynamically, increasing the energetic difference between the denatured ensemble and the native structure (increasing ΔG^o_folding, *blue*). Alternatively, the energetic barrier separating the denatured ensemble and the native conformation (ΔG^‡) can be increased (*red*); this will preserve the (low) thermodynamic stability but increase the folded state lifetime. Increasing the energy barrier yields kinetically stable proteins, which can be identified by proteome-wide folding screens [42, 43] (see also Figure 1B). **(B)** Proteins fold from an ensemble of unfolded states, represented by the wide top of a protein folding funnel. In simple model systems (*yellow*), the funnel has one energy minimum, the native conformation. However, some proteins have a more complex energy landscape and can adopt alternative folded structures (*green*). These two folded structures may interconvert, or features of the cellular environment may stabilize a subset of early folding intermediates, resulting in a biased accumulation of one structure versus the other(s).
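The distinction drawn in panel (A) between thermodynamic and kinetic stability can be made concrete with a short two-state calculation. The sketch below is an illustration under stated assumptions, not a method from the review: it assumes a simple two-state model, treats stability as a positive unfolding free energy, uses an Arrhenius-type rate law, and takes an arbitrary pre-exponential factor (`prefactor`, a purely illustrative value). The function names are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def fraction_folded(dg_unfold: float, temp: float = 298.15) -> float:
    """Equilibrium folded fraction for a two-state protein.

    dg_unfold is the free energy of unfolding in kJ/mol (positive = stable),
    an assumed sign convention for this sketch.
    """
    k_unfold_eq = math.exp(-dg_unfold / (R * temp))  # [unfolded]/[folded]
    return 1.0 / (1.0 + k_unfold_eq)


def folded_half_life(dg_barrier: float, prefactor: float = 1e6,
                     temp: float = 298.15) -> float:
    """Half-life (s) of the folded state for a given unfolding barrier.

    dg_barrier is the barrier height from the native state in kJ/mol.
    prefactor (1/s) is an assumed, illustrative attempt frequency.
    """
    k_u = prefactor * math.exp(-dg_barrier / (R * temp))  # unfolding rate
    return math.log(2) / k_u


if __name__ == "__main__":
    # Same thermodynamic stability, two different barrier heights.
    stability = 20.0  # kJ/mol, illustrative
    for barrier in (60.0, 100.0):
        print(f"barrier={barrier} kJ/mol, "
              f"folded fraction={fraction_folded(stability):.4f}, "
              f"half-life={folded_half_life(barrier):.3g} s")
```

In this toy model the folded fraction depends only on the free energy difference between the two states, while the half-life depends only on the barrier, which is the sense in which a kinetically stable protein can be long-lived without being thermodynamically more stable.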

**Figure 4. Proteins identified by *in vivo* assays and folding “outliers” are structurally more complex than typical folding models**
PDB ID codes are indicated in parentheses. Subunits of multimeric proteins are shown in different colors, and cofactors (in myoglobin and fumarate reductase) are shown in red. Most models used to study protein folding **(A)** are smaller and less complex than proteins representing diverse properties from the *E. coli* proteome **(B)**. Of the *E. coli* proteins shown here, purine nucleoside phosphorylase (a hexamer) and phosphoglycerate kinase (a monomer) were identified in the screen for kinetically stable proteins [43], aspartate-β-semialdehyde dehydrogenase (a dimer) was identified in the screen for chaperone client proteins [60], and the outer membrane protein TolC (a trimer) was identified in screens for both kinetic stability and chaperone clients [42, 61]. Lactose permease (a monomer) is an α-helical transmembrane protein. Two subunits of the tetrameric fumarate reductase contain transmembrane α-helices (shown in green and pink). The soluble subunits of fumarate reductase (shown in blue and yellow) were identified in the screens for chaperone clients [60, 61].

References

1. Anfinsen CB. Principles that govern the folding of protein chains. Science. 1973;181:223–230.
2. Pace CN. Determination and analysis of urea and guanidine hydrochloride denaturation curves. Methods Enzymol. 1986;131:266–280.
3. Plaxco KW, et al. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 1998;277:985–994.
4. Jackson SE. How do small single-domain proteins fold? Fold. Des. 1998;3:R81–R91.
5. Galzitskaya OV, et al. Chain length is the main determinant of the folding rate for proteins with three-state folding kinetics. Proteins. 2003;51:162–166.

Grants and funding

GM074807/GM/NIGMS NIH HHS/United States
