Protein Sci. 2004 Jul;13(7):1787-801.
doi: 10.1110/ps.04706204.

Analysis of the "thermodynamic information content" of a Homo sapiens structural database reveals hierarchical thermodynamic organization

Scott A Larson et al. Protein Sci. 2004 Jul.

Abstract

Classification of the amounts and types of lower order structural elements in proteins is a prerequisite to effective comparisons between protein folds. In an effort to provide an additional vehicle for fold comparison, we present an alternative classification scheme whereby protein folds are represented in statistical thermodynamic terms in such a way as to illuminate the energetic building blocks within protein structures. The thermodynamic relationship is examined between amino acid sequences and the conformational ensembles for a database of 159 Homo sapiens protein structures ranging from 50 to 250 amino acids. Using hierarchical clustering, it is shown through fold-recognition experiments that (1) eight thermodynamic environmental descriptors sufficiently account for the energetic variation within the native state ensembles of the H. sapiens structural database, (2) an amino acid library of only six residue types is sufficient to encode >90% of the thermodynamic information required for fold specificity in the entire database, and (3) structural resolution of the statistically derived environments reveals sequential cooperative segments throughout the protein, which are independent of secondary structure. As the first level of thermodynamic organization in proteins, these segments represent the thermodynamic counterpart to secondary structure.

Figures

Figure 1.
Position-specific thermodynamic environments in proteins. The COREX algorithm converts the high-resolution structure into an ensemble of states (top; see Materials and Methods). To calculate the position-specific thermodynamic descriptors, the ensemble of states is first divided into folded and nonfolded subensembles (middle left and middle right) with respect to a particular position j in the protein. Position j is colored blue in the folded subensemble and yellow in the nonfolded subensemble. The position-specific observables ([ΔG]j) have been defined as the difference in energy between the subensemble of states in which position j is folded (〈ΔGf,j〉) and the subensemble of states in which position j is not folded (〈ΔGnf,j〉). Highlighting the statistical nature of the position-specific quantities, we note that each of the states in the different subensembles may have different position-specific energetics, indicating that the average value within a subensemble does not necessarily correspond to the energetics of a particular conformational state.
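The subensemble split described in this caption can be illustrated with a minimal sketch. It is not the COREX implementation: the function name `position_specific_dg`, the input layout, the Boltzmann weighting of each state, and the sign convention (folded minus nonfolded) are all assumptions made for illustration.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)


def position_specific_dg(states, j, temperature=298.15):
    """Return <dG_f,j> - <dG_nf,j> for position j.

    `states` is a list of (dG, folded_flags) pairs, where dG is the free
    energy of the state in kcal/mol and folded_flags[j] says whether
    position j is folded in that state. Averages are Boltzmann weighted,
    which is an assumption of this sketch, not a detail from the caption.
    """
    rt = GAS_CONSTANT * temperature
    # sums[is_folded] = [weighted dG sum, weight sum]
    sums = {True: [0.0, 0.0], False: [0.0, 0.0]}
    for dg, folded_flags in states:
        weight = math.exp(-dg / rt)
        acc = sums[bool(folded_flags[j])]
        acc[0] += weight * dg
        acc[1] += weight
    if sums[True][1] == 0.0 or sums[False][1] == 0.0:
        raise ValueError("position j needs both folded and nonfolded states")
    mean_folded = sums[True][0] / sums[True][1]
    mean_nonfolded = sums[False][0] / sums[False][1]
    return mean_folded - mean_nonfolded
```

Because each subensemble average pools many states, the returned value need not match the energetics of any single state, which is the point the caption makes.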
Figure 2.
Residue-specific accessible surface area vs. position-specific thermodynamic descriptors. Each point of the scatter plot is a residue position in the human lysozyme protein (PDB: 1JSF). The ordinate is the accessible surface area (ASA) of the apolar atoms for each residue of the protein taken from the X-ray crystal structure. The static ASA values represent the residue-specific energetic contribution to the thermodynamics of the protein. The abscissa is the thermodynamic descriptor, ([ΔH]ap,j), for each residue position of the protein calculated by the COREX algorithm. These values are ensemble-averaged reporters of the apolar enthalpy at each position in the protein. The correlation coefficient (R²) for the static residue-specific ASA vs. the ensemble-averaged position-specific thermodynamic descriptors is 0.0932, indicating no correlation. Correlation statistics for the entire database of proteins are summarized in Table 1.
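For readers who want to reproduce this kind of comparison, a minimal sketch of an R² calculation follows. It assumes R² here means the squared Pearson correlation between the two per-residue series; the function name `r_squared` is hypothetical, and any input values would have to come from the reader's own data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums of products and squares
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("R^2 is undefined when one series is constant")
    return (sxy * sxy) / (sxx * syy)
```

A value near zero, such as the 0.0932 reported in the caption, means the static ASA explains almost none of the variance in the ensemble-averaged descriptor.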
Figure 3.
Fold recognition success as a function of thermodynamic environment number. Fold recognition experiments (solid squares) using scoring matrices composed of the log-odds probability of the 20 amino acids for a series of thermodynamic environments. A successful fold recognition experiment is one in which the native amino acid sequence of the target protein scores higher than 99% of the sequences in the sequence library (i.e., one of the top four out of 431 scoring sequences). The dotted line indicates where fold recognition success saturates. The X-axis indicates the number of thermodynamic environments used to generate the scoring matrix for the associated fold recognition experiment. The large open square denotes the minimum number of thermodynamic environments necessary to capture 95% of the structure encoding information for the proteins used in this study (see text for details).
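The success criterion in this caption can be written as a short check. This is a sketch under one stated assumption: the native sequence is counted among the 431 library sequences, so "higher than 99%" is evaluated against the full library size. The function name `is_recognition_success` is hypothetical.

```python
def is_recognition_success(native_score, library_scores, fraction=0.99):
    """True if the native score beats at least `fraction` of the library.

    `library_scores` is assumed to include the native sequence's own
    score (an assumption of this sketch).
    """
    total = len(library_scores)
    if total == 0:
        raise ValueError("library is empty")
    # Count library sequences that score strictly lower than the native
    lower = sum(1 for score in library_scores if score < native_score)
    return lower >= fraction * total
```

With 431 sequences the threshold is 0.99 × 431 = 426.69, so the native sequence must outscore at least 427 others, which corresponds to a rank of four or better and agrees with the "top four out of 431" wording.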
Figure 4.
Normalized mean energetic properties of the eight requisite thermodynamic environments. Each thermodynamic environment has been statistically derived based on its component thermodynamic descriptors (see Materials and Methods). Plotted are the eight thermodynamic environments (clusters) listed in order of increasing stability. The two thermodynamic descriptors are the stability constant (closed circles) and enthalpy ratio (open circles). The Y-axis is the normalized mean value of the corresponding thermodynamic descriptors. Due to the relationship between enthalpy and surface area, lower enthalpy ratios denote higher apolar content environments.
Figure 5.
Thermodynamic environment characterization for the GTP binding protein (PDB: 1KAO). (A) The primary sequence has been colored according to cooperative segments, where each color represents a different thermodynamic environment. The mean energetic properties of the thermodynamic environments comprising the segments are listed in Table 2. Above the sequence is a cartoon representation of the secondary structural units of the protein (gray). It is important to note that the sequential cooperative segments can bridge multiple structural elements, and structural elements can span multiple sequential cooperative segments. In essence, the segments identified here are the thermodynamic counterpart to secondary structure, as they represent the first level of thermodynamic organization in proteins. (B) The ensemble-based energetics have been mapped onto the high-resolution structure, providing a quantitative “single-molecule view” of a fluctuating ensemble.
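One simple way to picture sequential segments is as runs of consecutive positions that share an environment label. The sketch below takes that reading as an assumption; the caption does not state how segment boundaries were assigned, and the function name `cooperative_segments` is hypothetical.

```python
from itertools import groupby


def cooperative_segments(environment_labels):
    """Split per-position environment labels into contiguous runs.

    Returns (start_index, end_index, environment) tuples with 0-based,
    inclusive indices.
    """
    segments = []
    start = 0
    for environment, run in groupby(environment_labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length - 1, environment))
        start += length
    return segments
```

Segments found this way depend only on the environment labels, so they can cross secondary-structure boundaries, consistent with the caption's observation.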
Figure 6.
Double hierarchical cluster analysis of amino acid log-odds probabilities for eight thermodynamic environments. The 20 amino acids make up the rows and the eight thermodynamic environments comprise the columns of the heat map. The heat map is a qualitative representation of the amino acid log-odds probabilities for the thermodynamic environments. Negative log-odds probabilities are green, log-odds probabilities near zero are black, and positive log-odds probabilities are red. The color intensity reflects the magnitude of the log-odds probabilities. The row dendrogram shows groupings of amino acids with similar log-odds probabilities for the thermodynamic environments. The gray scale above the amino acid dendrogram is the cluster scale; the values below the scale indicate the calculated dissimilarity measures, and the values above the scale correspond to the number of amino acid clusters at different positions in the dendrogram. The red dotted line is positioned at the level of six amino acid clusters. Each of the six amino acid cluster nodes is indicated by a red dot. The column dendrogram reveals similarities in the thermodynamic environments.
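A log-odds scoring matrix of the kind shown in the heat map can be sketched as follows. The exact formula used in the paper is not given in this excerpt, so the natural logarithm, the additive pseudocount, and the names `log_odds_matrix` and `score_sequence` are assumptions for illustration.

```python
import math
from collections import Counter


def log_odds_matrix(observations, pseudocount=1.0):
    """Build {(amino_acid, environment): log-odds} from observed pairs.

    log-odds = ln( P(amino acid | environment) / P(amino acid) ).
    The pseudocount keeps unseen pairs finite (an assumption of this sketch).
    """
    obs = list(observations)
    if not obs:
        raise ValueError("no observations")
    pair_counts = Counter(obs)
    aa_counts = Counter(aa for aa, _ in obs)
    env_counts = Counter(env for _, env in obs)
    n_types = len(aa_counts)
    matrix = {}
    for env, env_total in env_counts.items():
        for aa, aa_total in aa_counts.items():
            p_cond = (pair_counts[(aa, env)] + pseudocount) / (
                env_total + pseudocount * n_types
            )
            p_background = aa_total / len(obs)
            matrix[(aa, env)] = math.log(p_cond / p_background)
    return matrix


def score_sequence(sequence, environments, matrix):
    """Sum the log-odds of each residue in its position's environment."""
    return sum(matrix[(aa, env)] for aa, env in zip(sequence, environments))
```

Positive entries (red in the heat map) mark amino acids enriched in an environment relative to their background frequency, and negative entries (green) mark depleted ones.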
Figure 7.
Fold recognition success as a function of amino acid cluster number. The solid squares represent fold recognition experiments using scoring matrices composed of the log-odds probability of a series of amino acid clusters for the eight thermodynamic environments. The open squares represent fold recognition experiments using scoring matrices composed of the log-odds probability of a series of amino acid clusters for two thermodynamic environments. A successful fold recognition experiment is one in which the actual amino acid sequence of the target protein scores higher than 99% of the sequences in the decoy library (i.e., one of the top four out of 431 scoring sequences). The dotted line indicates where fold recognition success saturates. The X-axis indicates the number of amino acid clusters used to generate the scoring matrix for the associated fold recognition experiment. The large open square denotes the minimum number of amino acid groups necessary to encode the eight thermodynamic environments of the proteins in the H. sapiens database.
Figure 8.
Position-specific thermodynamics of heat shock protein 90 (PDB: 1BYQ). Six phenylalanine residues are represented in space-fill and colored according to their thermodynamic environment. The accompanying table summarizes the ensemble-averaged thermodynamics at each position as well as the static properties of these six residues. The thermodynamic environments do not report on static structural properties of the system (see text for details).
