Protein Sci. 2002 Aug;11(8):1945-57.
doi: 10.1110/ps.0203202.

Thermodynamic environments in proteins: fundamental determinants of fold specificity

James O Wrabl et al. Protein Sci. 2002 Aug.

Abstract

To investigate the relationship between an amino acid sequence and its corresponding protein fold, a database of thermodynamic stability information was assembled as a function of residue type from 81 nonhomologous proteins. This information was obtained using the COREX algorithm, which computes an ensemble-based description of the native state of proteins. Dissection of the COREX stability constant into its fundamental energetic components resulted in 12 thermodynamic environments describing the tertiary architecture of protein folds. Because of the observation that residue types partitioned unequally between these environments, it was hypothesized that thermodynamic environments contained energetic information that connected sequence to fold. To test the significance of this hypothesis, the thermodynamic stability information was incorporated into a three-dimensional-to-one-dimensional scoring matrix, and simple fold recognition experiments were performed in a manner such that information about the fold target was never included in the scoring. For 60 out of 81 fold targets, the correct sequence for the target scored in the top 5% of 3858 decoy sequences, with Z-scores ranging from 1.76 to 12.23. Furthermore, a scoring matrix assembled from the residues of 40 nonhomologous all-alpha proteins was used to thread sequences against 12 nonhomologous all-beta protein targets. In 10 of 12 cases, sequences known to adopt the native all-beta structure scored in the top 5% of 3858 decoy sequences, with Z-scores ranging from 1.99 to 7.94. These results indicate that energetic information encoded by thermodynamic environments represents a fundamental property of proteins that underlies classifications based on secondary structure.
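The two success measures quoted in the abstract, a Z-score and a top-5% ranking against decoy sequences, can be illustrated with a minimal sketch. The function names and the toy scores below are hypothetical, and the abstract does not say whether the authors used a population or sample standard deviation or how ties were handled; this sketch assumes the population form and counts ties against the native sequence.

```python
import statistics


def z_score(native_score, decoy_scores):
    """Z-score of the native sequence's score relative to the decoy scores.

    Assumption: population standard deviation over the decoys only.
    """
    mean = statistics.fmean(decoy_scores)
    sd = statistics.pstdev(decoy_scores)
    return (native_score - mean) / sd


def rank_fraction(native_score, decoy_scores):
    """Fraction of decoys scoring at least as high as the native sequence."""
    at_least_as_good = sum(1 for s in decoy_scores if s >= native_score)
    return at_least_as_good / len(decoy_scores)


def in_top_fraction(native_score, decoy_scores, fraction=0.05):
    """True if the native sequence ranks within the given top fraction."""
    return rank_fraction(native_score, decoy_scores) <= fraction


# Toy data, not taken from the paper.
toy_decoys = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_native = 5.0
print(z_score(toy_native, toy_decoys))
print(in_top_fraction(toy_native, toy_decoys))
```

In the study itself the decoy set held 3858 sequences, so a top-5% result corresponds to roughly the best 193 scores.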

Figures

Fig. 1.
Results of a COREX calculation for the bacterial cold-shock protein cspA (Protein Data Bank 1mjc). (a) Plot of calculated thermodynamic stability, ln κf,j (Equation 3), as a function of residue number for cspA. The simulated temperature was 25°C. Regions of relatively high, medium, and low stability, as defined in Equations 19 through 21, are shown in blue, green, and red, respectively. Secondary structure elements, as defined by the program DSSP (Kabsch and Sander 1983) are labeled. (b) The relative calculated stabilities of each residue in the 1mjc crystal structure. Note that a given secondary structural element is predicted to have regions of varying stability, and that the most stable regions of the molecule are often, but not necessarily, within the hydrophobic core.
Fig. 2.
Description of protein structure in terms of thermodynamic environments. (a) Thermodynamic environment classification scheme used in this work. Three quantities derived from the output of the COREX algorithm—stability (κf,j), enthalpy ratio (Hratio,j), and entropy ratio (Sratio,j)—describe the thermodynamic environment of each residue. Calculation of these quantities is described in Materials and Methods. (b) The 12 thermodynamic environments defined by this classification scheme are shown in a schematic describing protein energetic phase space. Each colored cube represents a region dominated by certain stability, enthalpy, and entropy characteristics. Every residue position in the protein structures used in this work lies somewhere within this phase space. (c) Examples of the distribution of the thermodynamic environments defined in (b) in three proteins with varying types and amounts of secondary structure. Note that single secondary structure elements do not show unique thermodynamic environments.
Fig. 3.
Three-dimensional–to–one-dimensional scores relating amino acid types to 12 protein structural thermodynamic environments. The scores were calculated from normalized probabilities (log-odds ratios) of observing amino acid types in thermodynamic environments calculated from protein structures using the COREX algorithm, as described in the text. The 12 thermodynamic environments were classified empirically, as described in Materials and Methods. The three-letter abbreviation in each panel represents the stability, enthalpic, and entropic descriptor of the thermodynamic environment. For example, MLH represents a protein thermodynamic environment of medium stability, low polar/apolar enthalpy ratio, and high conformational entropy/Gibbs’ solvation energy ratio.
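The labeling scheme and the log-odds scoring described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes three stability levels (H, M, L) and two levels (H, L) for each ratio, which yields the 12 labels, and it uses the common log-odds form ln[P(residue, environment) / (P(residue) P(environment))]. The exact normalization used in the paper is given in its main text, not in this caption, and the counts below are invented.

```python
import math
from collections import Counter
from itertools import product

# Assumed scheme: stability in {H, M, L}, enthalpy ratio in {H, L},
# entropy ratio in {H, L}; 3 x 2 x 2 = 12 labels such as "MLH".
ENVIRONMENTS = ["".join(levels) for levels in product("HML", "HL", "HL")]


def log_odds_scores(observations):
    """Log-odds score for each observed (residue type, environment) pair.

    observations: iterable of (residue_type, environment_label) tuples.
    Pairs that never occur are omitted rather than given a pseudocount.
    """
    pair_counts = Counter(observations)
    total = sum(pair_counts.values())
    residue_counts = Counter()
    environment_counts = Counter()
    for (residue, environment), n in pair_counts.items():
        residue_counts[residue] += n
        environment_counts[environment] += n

    scores = {}
    for (residue, environment), n in pair_counts.items():
        observed = n / total
        expected = (residue_counts[residue] / total) * (
            environment_counts[environment] / total
        )
        scores[(residue, environment)] = math.log(observed / expected)
    return scores


# Invented counts for two residue types in two environments.
toy = (
    [("A", "HLL")] * 3 + [("G", "HLL")] * 1
    + [("A", "LHH")] * 1 + [("G", "LHH")] * 3
)
toy_scores = log_odds_scores(toy)
print(len(ENVIRONMENTS), toy_scores[("A", "HLL")])
```

A positive score means the residue type occurs in that environment more often than chance would predict; a negative score means it is depleted there.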
Fig. 4.
Fold-recognition results for 81 protein targets using a scoring matrix composed of thermodynamic information from protein structures. The horizontal axis represents the percentile ranking of the score against the target structure for the sequence corresponding to the target structure. Low percentiles (high scores) indicate relatively more success in matching a sequence to its target structure. For example, the sequence corresponding to the target cold-shock protein (Protein Data Bank 1mjc) received the 157th highest score of 3858 sequences against the cold-shock protein thermodynamic profile. This result placed the sequence for the cold-shock protein in the fifth percentile bin in Fig. 4. When aligned with their respective thermodynamic profiles, the majority (44/81) of sequences scored better than 99% of the 3858 sequences in the database.
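The cold-shock protein example in this caption can be checked with a short calculation. The helper below is hypothetical, and the one-percent bin width is an assumption chosen because it reproduces the caption's statement; the caption does not state the bin width.

```python
import math


def percentile_bin(rank, n_sequences, bin_width_percent=1.0):
    """1-based index of the percentile bin containing the given rank.

    rank: position of the native sequence (1 = highest score).
    bin_width_percent: assumed histogram bin width, in percent.
    """
    percent = 100.0 * rank / n_sequences
    return math.ceil(percent / bin_width_percent)


# 157th of 3858 is about 4.07%, which lies between 4% and 5%.
print(round(100.0 * 157 / 3858, 2), percentile_bin(157, 3858))
```

With this convention, a sequence that scores better than 99% of the database, as 44 of the 81 targets did, falls in the first bin.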
Fig. 5.
Fold-recognition results for 12 all-β protein targets using a scoring matrix composed of thermodynamic information from 31 all-α protein structures. The horizontal axis represents the percentile ranking of the score against the target structure for the sequence corresponding to the target structure. Low percentiles (high scores) indicate relatively more success in matching a sequence to its target structure. For example, the sequence corresponding to the all-β target tendamistat (Protein Data Bank 1hoe) received the 26th highest score of 3858 sequences against the tendamistat thermodynamic profile. This result placed the tendamistat sequence in the fifth percentile bin in Fig. 5. All 12 sequences corresponding to β-targets scored better against their respective targets than 90% of the 3858 sequences in the database.
