BMC Bioinformatics. 2010 Feb 20;11:97.
doi: 10.1186/1471-2105-11-97.

Structural alphabets derived from attractors in conformational space

Alessandro Pandini et al. BMC Bioinformatics.

Abstract

Background: The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis.

Results: A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the protein data bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness.

Conclusions: The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics.
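As an illustration of the three-angle fragment description in the Results, the following minimal sketch computes two pseudo bond angles and one pseudo torsion angle from four consecutive Cα positions. The function names, the assignment of ϕ1 to atoms 1-2-3 and ϕ2 to atoms 2-3-4, and the sign convention of the torsion are assumptions made for this example; they are not taken from the paper's implementation.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bond_angle(a, b, c):
    """Angle at vertex b, in degrees, between the vectors b->a and b->c."""
    u, v = _sub(a, b), _sub(c, b)
    cos_angle = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def torsion_angle(a, b, c, d):
    """Dihedral angle a-b-c-d in degrees, in the range [-180, 180]."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = _norm(b2) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))

def fragment_angles(ca):
    """Return (phi1, phi2, theta) for four C-alpha coordinates.

    Assumed convention: phi1 is the pseudo bond angle at atom 2,
    phi2 the pseudo bond angle at atom 3, theta the pseudo torsion
    over all four atoms.
    """
    if len(ca) != 4:
        raise ValueError("a fragment needs exactly four C-alpha positions")
    phi1 = bond_angle(ca[0], ca[1], ca[2])
    phi2 = bond_angle(ca[1], ca[2], ca[3])
    theta = torsion_angle(ca[0], ca[1], ca[2], ca[3])
    return phi1, phi2, theta

# Toy coordinates (invented, not real protein data).
toy_fragment = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
phi1, phi2, theta = fragment_angles(toy_fragment)
```

Because θ is periodic, any distance or clustering computed in this angle space needs to treat -180° and 180° as the same point.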

Figures

Figure 1
Fragment definition. Cα atoms are represented as spheres. The conformation is entirely described by two pseudo bond angles (ϕ1, ϕ2) and one pseudo torsion angle (θ).
Figure 2
Projection of the data into the conformational space of the internal angles (ϕ1, ϕ2, θ). Each dot (orange) corresponds to a fragment. The plot is split at the periodic boundary -180/180° of the θ angle, while the angle range of the ϕ1 and ϕ2 dimensions has been cropped to the populated region. Fragments of the M32K25 alphabet (see text) are shown as labelled circles and four selected fragments (Y blue, U red, P cyan, A yellow) are rendered as ball-and-stick models on the right panel. The left models illustrate the [ϕ1, ϕ2] angles and the right models the θ angle. The relation between the two views is a 90° rotation around a vertical axis in the paper plane and an adjustment to align the two central atoms to a Newman projection. Atom '1' is positioned left (left models) and front (right models). The plot was produced with the R package scatterplot3d [63] and the side panel with PyMol [64].
Figure 3
Reachability Plot and clustering scheme for the alphabet M32K25. The bottom panel shows the Reachability Distance (neighbour distance) of all fragments in the order of the nearest-neighbour walk. Short distances correspond in general to high cluster density. The Reachability Distance scale is cropped to 0-14° to preserve details. The order of cluster extraction is illustrated in the top scheme, where each circle represents a cluster, its size inversely proportional to its Core Distance. The labels in the middle panel are those of the resulting Structural Alphabet; lines indicate the location of the cluster representative in the Reachability Plot. For each cluster (dent region in the plot) the corresponding representative was selected by lowest Core Distance (top scheme).
Figure 4
Comparison of fragment location for Structural Alphabets of four Cα atoms. The fragment representatives for M32K25, MSM2000 [8] and CGT2004 [11] are plotted in conformational space [ϕ1, ϕ2, θ].
Figure 5
An example of a typical local/global fit reconstruction. Alignment of the template SCOP domain 1fm0d_ (black) and the reconstructed structure (orange) for both the local fit (left) and the global fit (right). Fit cRMSD values are 0.19 Å (local) and 0.70 Å (global). The image was generated with PyMol [64].
Figure 6
Median global fit cRMSD against alphabet size (k). MK denotes the Structural Alphabets derived in this study. The test set comprises 798 high resolution protein structures. Symbols denote the alphabet type: (filled circle) the series of MxKy alphabets, (filled triangle) M32K25 alphabet, (empty circle) CGT2004 alphabet, (empty diamond) MSM2000 alphabet and (filled square) the alphabet resulting from the GA optimisation of all fragments contained in the MxKy series.
Figure 7
Akaike's Information Criterion (AIC) against alphabet size (k). Alphabets and test set are identical to those in Figure 6. Symbols denote the alphabet type: (filled circle) the series of MxKy alphabets, (filled triangle) M32K25 alphabet, (empty circle) CGT2004 alphabet, (empty diamond) MSM2000 alphabet and (filled square) the alphabet resulting from the GA optimisation of all fragments contained in the MxKy series.
Figure 8
Barplot of overall secondary structure contribution per letter of the Structural Alphabet M32K25. The secondary structure attribution is based on the annotation from STRIDE on the second Cα atom of each fragment.
Figure 9
Barplot of the Pearson correlation coefficients between RMSF and local-fit Shannon Entropy profiles. The 24 proteins are ordered according to the SCOP class and, within a given class, to decreasing fraction of structured DSSP [55] elements. M32K25 values are reported in black, CGT2004 in light blue and MSM2000 as empty bars.
Figure 10
Barplot of the Pearson correlation coefficients between RMSF and global-fit Shannon Entropy profiles. See Figure 9 caption.
Figure 11
Scatterplot of the average protein RMSF against the average Shannon entropy. Average RMSF values (in Å) calculated over all the residues of each of the 24 proteins are reported against the average Shannon Entropy (in bits) for the M32K25 (upper panel), the CGT2004 (middle panel) and the MSM2000 (lower panel) alphabets. Left and right scatterplots contain Shannon Entropies from the local and the global fit reconstructions, respectively. Empty squares are used for α-class proteins, empty circles for β, filled diamonds for α/β and filled triangles for α+β.
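The comparison in Figures 9 to 11 relates a per-position Shannon Entropy profile (in bits) to an RMSF profile through the Pearson correlation coefficient. A minimal sketch of that calculation follows, assuming an ensemble of structures has already been encoded as equal-length strings of alphabet letters. The function names and all input values are invented for illustration and do not reproduce the paper's data or code.

```python
import math
from collections import Counter

def entropy_profile(encoded_ensemble):
    """Per-position Shannon entropy (bits) over equal-length encoded strings."""
    if not encoded_ensemble:
        raise ValueError("ensemble is empty")
    length = len(encoded_ensemble[0])
    if any(len(s) != length for s in encoded_ensemble):
        raise ValueError("all encoded strings must have the same length")
    n = len(encoded_ensemble)
    profile = []
    # zip(*...) iterates over columns, i.e. one fragment position at a time.
    for column in zip(*encoded_ensemble):
        counts = Counter(column)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        profile.append(h + 0.0)  # turns a possible -0.0 into 0.0
    return profile

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy ensemble of four encoded snapshots (letters are arbitrary labels).
toy_ensemble = ["AAB", "ABB", "AAB", "ABC"]
toy_entropy = entropy_profile(toy_ensemble)

# Toy RMSF values in angstroms, one per position (invented numbers).
toy_rmsf = [0.4, 1.3, 0.9]
toy_correlation = pearson(toy_rmsf, toy_entropy)
```

A position that keeps the same letter in every snapshot has zero entropy, while a position split evenly between two letters has exactly one bit.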

References

    1. Corey RB, Pauling L. Fundamental dimensions of polypeptide chains. Proceedings Royal Society London, B, Biological Sciences. 1953;141(902):10–20. doi: 10.1098/rspb.1953.0011.
    2. Jones TA, Thirup S. Using known substructures in protein model building and crystallography. EMBO Journal. 1986;5(4):819–22.
    3. Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. Journal of Molecular Biology. 1963;7:95–9. doi: 10.1016/S0022-2836(63)80023-6.
    4. Walther D, Cohen FE. Conformational attractors on the Ramachandran map. Acta Crystallographica D Biological Crystallography. 1999;55(Pt 2):506–17. doi: 10.1107/S0907444998013353.
    5. Rooman MJ, Rodriguez J, Wodak SJ. Automatic definition of recurrent local structure motifs in proteins. Journal of Molecular Biology. 1990;213(2):327–36. doi: 10.1016/S0022-2836(05)80194-9.
