PLoS Comput Biol. 2006 Sep 22;2(9):e131.
doi: 10.1371/journal.pcbi.0020131. Epub 2006 Aug 21.

Sampling realistic protein conformations using local structural bias

Thomas Hamelryck et al. PLoS Comput Biol.

Abstract

The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods make use of a divide-and-conquer strategy to attack the problem: a conformational sampling method generates plausible candidate structures, which are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications for protein structure prediction, determination, simulation, and design.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic Representation of a Protein's Cα Backbone
The Cα positions are numbered, and the pseudo bond angles θ and pseudo dihedral angles τ are indicated. The segment has length 5, and is thus fully described by two pseudo dihedral and three pseudo bond angles. The numbering scheme of the angles is chosen so that the angle pair (θ_i, τ_i), associated with position i, specifies the position of the Cα atom at position i + 1.
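
The numbering scheme in this caption can be illustrated with a small reconstruction routine. The following is a minimal sketch, not the authors' code: it assumes a fixed Cα–Cα virtual bond length of 3.8 Å (not stated in the caption), angles in radians, and hypothetical function names. Each angle pair places the next Cα atom relative to the three atoms before it.

```python
import math

CA_CA = 3.8  # assumed Cα–Cα virtual bond length in Å (not given in the caption)

def _sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def _dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])
def _unit(a):
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def place_next(a, b, c, theta, tau, bond=CA_CA):
    """Place atom d so that angle(b, c, d) = theta and dihedral(a, b, c, d) = tau."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))   # normal of the plane through a, b, c
    m = _cross(n, bc)                   # completes a right-handed local frame
    x = -bond * math.cos(theta)
    y = bond * math.sin(theta) * math.cos(tau)
    z = bond * math.sin(theta) * math.sin(tau)
    return tuple(c[k] + x * bc[k] + y * m[k] + z * n[k] for k in range(3))

def build_ca_trace(thetas, taus, bond=CA_CA):
    """Build n Cα positions from n-2 pseudo bond angles and n-3 pseudo dihedrals."""
    if not thetas or len(thetas) != len(taus) + 1:
        raise ValueError("need one more bond angle than dihedral angles")
    t0 = thetas[0]
    # The first three atoms fix the frame: the first angle needs no dihedral.
    trace = [(0.0, 0.0, 0.0), (bond, 0.0, 0.0),
             (bond - bond * math.cos(t0), bond * math.sin(t0), 0.0)]
    for theta, tau in zip(thetas[1:], taus):
        trace.append(place_next(trace[-3], trace[-2], trace[-1], theta, tau, bond))
    return trace

def bond_angle(a, b, c):
    """Angle at b, in radians."""
    u, v = _unit(_sub(a, b)), _unit(_sub(c, b))
    return math.acos(max(-1.0, min(1.0, _dot(u, v))))

def dihedral(a, b, c, d):
    """Signed dihedral a-b-c-d, in radians."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    return math.atan2(_dot(_cross(n1, n2), _unit(b2)), _dot(n1, n2))
```

For a segment of length 5, `build_ca_trace` takes three pseudo bond angles and two pseudo dihedral angles, matching the count given in the caption. `bond_angle` and `dihedral` are included only so that the reconstruction can be checked by recovering the input angles.
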
Figure 2. Conditional Dependency Graph of FB5–HMM
Squares represent discrete nodes, circles represent the FB5 node with unit vector output. The arrows indicate conditional dependencies. Three slices are shown, corresponding to three consecutive amino acid positions. A possible set of node values is shown in color (v1, v2, and v3 are unit vectors). The hidden node sequence (34,34,3) corresponds to two C-terminal positions of an α-helix, followed by a coil residue. A, amino acid node; F, FB5 node; H, hidden node; S, secondary structure node.
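
The slice structure in this caption can be mimicked with a toy generative sampler. This is only a sketch under loud assumptions: it assumes the A, S, and F nodes each depend only on the hidden node of their own slice, it uses three hidden states with made-up probabilities rather than the trained model, and it replaces the FB5 emission with Gaussian jitter around an illustrative mean (θ,τ) pair. All names and numbers are hypothetical.

```python
import math
import random

# Hypothetical toy parameters: 3 hidden states (helix-like, strand-like, coil-like).
START = [0.4, 0.3, 0.3]
TRANS = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.3, 0.4]]
SS_LABELS = "HEC"
SS_EMIT = [[0.9, 0.0, 0.1],
           [0.0, 0.9, 0.1],
           [0.1, 0.1, 0.8]]
AMINO = "ACDEFGHIKLMNPQRSTVWY"
FAVORED = ["AELM", "VIYF", "GPNS"]                    # toy residue preferences per state
MEAN_ANGLES = [(1.6, 0.9), (2.1, -2.9), (1.9, 0.0)]   # illustrative (θ,τ) in radians

def aa_weights(h):
    """Unnormalized amino acid weights for hidden state h (made up)."""
    return [3.0 if aa in FAVORED[h] else 1.0 for aa in AMINO]

def noisy_direction(h, rng, sigma=0.15):
    """Stand-in for the FB5 emission: jitter the mean angles, return a unit vector."""
    theta0, tau0 = MEAN_ANGLES[h]
    theta = min(math.pi, max(0.0, rng.gauss(theta0, sigma)))
    tau = rng.gauss(tau0, sigma)
    s = math.sin(theta)
    return (math.cos(theta), s * math.cos(tau), s * math.sin(tau))

def sample_slices(n, rng):
    """Sample n slices; each slice holds hidden, amino acid, secondary structure, direction."""
    slices = []
    h = rng.choices(range(3), weights=START)[0]
    for _ in range(n):
        slices.append({
            "H": h,
            "A": rng.choices(AMINO, weights=aa_weights(h))[0],
            "S": rng.choices(SS_LABELS, weights=SS_EMIT[h])[0],
            "F": noisy_direction(h, rng),
        })
        h = rng.choices(range(3), weights=TRANS[h])[0]  # move to the next slice
    return slices
```

Calling `sample_slices(100, random.Random(0))` yields one hidden-state path together with an amino acid, a secondary structure label, and a unit vector per position. Conditioning on a known amino acid sequence, as opposed to sampling it, would require inference over the hidden nodes and is not shown here.
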
Figure 3. Transitions Occurring between (θ,τ) Angle Pairs in Proteins according to FB5–HMM
The graph shows some of the most important possible hidden node transitions in FB5–HMM. Each hidden node value is represented as a box, showing the associated mean direction as a pair of (θ,τ) angles. For clarity, only a subset of all transitions is shown: for each hidden node value, the incoming transition and the outgoing transition with the highest probability are each shown as an arrow. If one of them is a self-transition, the second-best incoming or outgoing transition is also shown. Hidden node values mainly associated with α-helices are shown in light red, with β-strands in light blue, and with coils in white.
Figure 4. Three Point Sets Sampled from the FB5 Distribution on the Sphere
The three sets consist of 1,000 unit vectors sampled from the FB5 distributions associated with hidden node values 3 (blue), 34 (red), and 44 (green), respectively. These three node values are typical representatives of coil, α-helix, and β-strand geometry. The samples were plotted on the unit sphere, and the mean directions of the three FB5 distributions are indicated with arrows.
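
Plotting angle pairs on the sphere relies on a mapping between a (θ,τ) pair and a unit vector. A minimal sketch of one such mapping follows. The axis convention (θ measured from the x-axis, τ as the azimuth around it) is an assumption made for illustration and may differ from the one used by the authors. Angles are in radians.

```python
import math

def angles_to_unit_vector(theta, tau):
    """Map theta in [0, pi] and tau in (-pi, pi] to a point on the unit sphere.

    Assumed convention: theta is the polar angle from the x-axis,
    tau is the azimuth around the x-axis.
    """
    s = math.sin(theta)
    return (math.cos(theta), s * math.cos(tau), s * math.sin(tau))

def unit_vector_to_angles(v):
    """Inverse mapping under the same assumed convention."""
    x, y, z = v
    theta = math.acos(max(-1.0, min(1.0, x)))  # clamp against rounding error
    tau = math.atan2(z, y)
    return theta, tau
```

With this mapping, a set of sampled unit vectors can be converted back to (θ,τ) pairs for a scatter plot, and vice versa. At θ = 0 or θ = π the azimuth τ is undefined, so the inverse returns an arbitrary value there.
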
Figure 5. Scatter Plot of the (θ,τ) Angles in a Sampled Dataset
The dataset consisted of 500 sequences of length 100 generated using FB5–HMM. The ideal (θ,τ) values of some conformations are indicated: α: α-helix, β: β-strand, π: π-helix, L: left-handed α-helix, 3: 3₁₀-helix, 1 & 2: polyproline helices of types I and II. The open circles indicate the mean directions of the 75 FB5 distributions. Angle pairs generated by hidden node values 3, 34, and 44 are plotted in blue, red, and green, respectively. These three hidden node values are typical representatives of hidden node values that correspond to coil, α-helix, and β-strand geometry, respectively.
Figure 6. Histograms of the (θ,τ) Angle Pairs
Histograms are shown for the training set (upper) and the decoy set (lower). The bin size is 1° × 1°. The color scale refers to the number of counts per bin. Bins with a count below 4 are white.
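
The binning described in this caption can be sketched in a few lines. This is an illustration only, with hypothetical function names: it assumes angle pairs given in degrees, bins them on a 1° × 1° grid, and drops bins with fewer than 4 counts, which corresponds to the bins drawn in white.

```python
import math
from collections import Counter

def angle_histogram(pairs_deg, bin_deg=1.0):
    """Count (theta, tau) pairs, given in degrees, on a square grid of bin_deg."""
    counts = Counter()
    for theta, tau in pairs_deg:
        key = (math.floor(theta / bin_deg), math.floor(tau / bin_deg))
        counts[key] += 1
    return counts

def visible_bins(counts, min_count=4):
    """Keep only bins that would be colored; sparser bins are left white."""
    return {b: c for b, c in counts.items() if c >= min_count}
```

Each key is the pair of bin indices, so with 1° bins the key `(91, 50)` covers 91° ≤ θ < 92° and 50° ≤ τ < 51°.
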
Figure 7. Histograms of Secondary Structure Element Length
Histograms of the lengths of the secondary structure elements in the training set (white bars) and the decoy set (black bars).
Figure 8. Best Compact Decoys Generated Using FB5–HMM
The best compact decoys generated using sequence information (Table 1, S) are shown for 1ENH (top) and 2CRO (bottom). From left to right: crystal structure, FB5–HMM, S0 baseline, M0 baseline, MS0 baseline. The N-terminus is shown in blue. The figure was made with PyMol (DeLano Scientific, http://www.delanoscientific.com).
Figure 9. Secondary Structure of the Target Proteins
(First line) Secondary structure assignment derived from the crystal structure. (Second line) Predicted secondary structure assignment.
Figure 10. Training FB5–HMM
(Left) ICL plotted versus hidden node size. For each hidden node size, four models were trained. The ICL reaches a maximum for one of the models with a hidden node size of 75 (indicated with a solid dot). (Right) Evolution of the LogLik of the completed data during training. The LogLik is plotted against the number of EM iterations.
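
Model selection of this kind can be sketched as picking the trained model with the highest score. The sketch below assumes a BIC-style approximation to the ICL, namely the completed-data log-likelihood minus half the number of free parameters times the log of the number of observations. Whether this is exactly the form used for the figure is an assumption, and the function names and input layout are hypothetical.

```python
import math

def icl_bic_style(completed_loglik, n_params, n_obs):
    """BIC-style ICL approximation (assumed form): higher is better."""
    return completed_loglik - 0.5 * n_params * math.log(n_obs)

def select_model(runs):
    """Pick the best run.

    runs: iterable of (hidden_size, completed_loglik, n_params, n_obs) tuples,
    with several runs allowed per hidden node size.
    """
    return max(runs, key=lambda r: icl_bic_style(r[1], r[2], r[3]))
```

Because the penalty grows with the number of parameters, a larger hidden node size is only selected when its gain in completed-data log-likelihood outweighs the added complexity.
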
