PLoS Comput Biol. 2006 Sep 22;2(9):e131.

doi: 10.1371/journal.pcbi.0020131. Epub 2006 Aug 21.

Sampling realistic protein conformations using local structural bias

Thomas Hamelryck¹, John T Kent, Anders Krogh

PMID: 17002495
PMCID: PMC1570370
DOI: 10.1371/journal.pcbi.0020131

Thomas Hamelryck et al. PLoS Comput Biol. 2006.

Affiliation

¹ Bioinformatics Center, Institute of Molecular Biology and Physiology, University of Copenhagen, Copenhagen, Denmark. thamelry@binf.ku.dk

Abstract

The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods make use of a divide-and-conquer strategy to attack the problem: a conformational sampling method generates plausible candidate structures, which are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications for protein structure prediction, determination, simulation, and design.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

**Figure 1. Schematic Representation of a Protein's Cα Backbone**
The Cα positions are numbered, and the pseudo bond angles θ and pseudo dihedral angles τ are indicated. The segment has length 5, and is thus fully described by two pseudo dihedral and three pseudo bond angles. The numbering scheme of the angles is chosen so that the angle pair (*θ_i*,*τ_i*), associated with position i, specifies the position of the Cα atom at position i + 1.
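
The angle convention in this caption can be illustrated with a minimal sketch that turns a list of pseudo bond angles θ and pseudo dihedral angles τ into Cα coordinates. It assumes a fixed Cα–Cα distance of 3.8 Å, which the caption does not state, and the function names `place_next` and `build_ca_trace` are hypothetical. It is not the authors' implementation, only one standard way to chain internal coordinates.

```python
import numpy as np

CA_CA_DISTANCE = 3.8  # assumed fixed Cα–Cα distance in Å (not given in the caption)


def place_next(a, b, c, theta, tau, bond=CA_CA_DISTANCE):
    """Place atom d so that angle(b, c, d) = theta and dihedral(a, b, c, d) = tau."""
    bc = (c - b) / np.linalg.norm(c - b)      # unit vector along the last bond
    n = np.cross(b - a, bc)                   # normal of the plane through a, b, c
    n /= np.linalg.norm(n)                    # undefined if a, b, c are collinear
    m = np.cross(n, bc)                       # completes a right-handed local frame
    local = bond * np.array([
        -np.cos(theta),
        np.sin(theta) * np.cos(tau),
        np.sin(theta) * np.sin(tau),
    ])
    return c + local[0] * bc + local[1] * m + local[2] * n


def build_ca_trace(thetas, taus, bond=CA_CA_DISTANCE):
    """Build a Cα trace from angles in radians.

    A trace of n atoms needs n - 2 bond angles and n - 3 dihedrals, so a
    segment of length 5 takes three thetas and two taus.
    """
    if len(thetas) != len(taus) + 1:
        raise ValueError("need exactly one more bond angle than dihedral angles")
    coords = [np.zeros(3), np.array([bond, 0.0, 0.0])]
    # The third atom has no dihedral; put it in the xy plane.
    coords.append(coords[1] + bond * np.array(
        [-np.cos(thetas[0]), np.sin(thetas[0]), 0.0]))
    for theta, tau in zip(thetas[1:], taus):
        coords.append(place_next(coords[-3], coords[-2], coords[-1],
                                 theta, tau, bond))
    return np.array(coords)
```

For example, `build_ca_trace(np.radians([90, 100, 110]), np.radians([50, -120]))` returns a 5 × 3 array, matching the five-residue segment described in the caption.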

**Figure 2. Conditional Dependency Graph of FB5–HMM**
Squares represent discrete nodes, circles represent the FB5 node with unit vector output. The arrows indicate conditional dependencies. Three slices are shown, corresponding to three consecutive amino acid positions. A possible set of node values is shown in color (v1, v2, and v3 are unit vectors). The hidden node sequence (34,34,3) corresponds to two C-terminal positions of an α-helix, followed by a coil residue. A, amino acid node; F, FB5 node; H, hidden node; S, secondary structure node.
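
To make the slice structure concrete, the following sketch draws a sequence by ancestral sampling from a small hidden Markov model. It assumes that each observed node depends only on the hidden node of its own slice and that each hidden node depends only on the previous one, as in a standard HMM. The function name `sample_slices` and all probability tables are hypothetical toy values, not the trained FB5–HMM parameters, and the FB5 node is left out here.

```python
import numpy as np


def sample_slices(n, start, trans, emit_aa, emit_ss, rng=None):
    """Sample n slices: hidden values, amino acid indices, secondary structure indices."""
    rng = np.random.default_rng() if rng is None else rng
    start = np.asarray(start, dtype=float)
    trans = np.asarray(trans, dtype=float)
    emit_aa = np.asarray(emit_aa, dtype=float)
    emit_ss = np.asarray(emit_ss, dtype=float)

    hidden = np.empty(n, dtype=int)
    hidden[0] = rng.choice(len(start), p=start)
    for i in range(1, n):
        # The hidden node of slice i depends only on the hidden node of slice i - 1.
        hidden[i] = rng.choice(trans.shape[1], p=trans[hidden[i - 1]])

    # Each observed node is drawn from a table row selected by the hidden value.
    aa = np.array([rng.choice(emit_aa.shape[1], p=emit_aa[h]) for h in hidden])
    ss = np.array([rng.choice(emit_ss.shape[1], p=emit_ss[h]) for h in hidden])
    return hidden, aa, ss


# Hypothetical toy model: 2 hidden values, 2 amino acid symbols, 3 structure classes.
TOY_START = [0.5, 0.5]
TOY_TRANS = [[0.9, 0.1], [0.2, 0.8]]
TOY_EMIT_AA = [[0.7, 0.3], [0.4, 0.6]]
TOY_EMIT_SS = [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]]
```

Calling `sample_slices(10, TOY_START, TOY_TRANS, TOY_EMIT_AA, TOY_EMIT_SS)` returns three integer arrays of length 10, one per node type. In the full model a unit vector would also be drawn for each slice from the FB5 distribution selected by the hidden value.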

**Figure 3. Transitions Occurring between (θ,τ) Angle Pairs in Proteins according to FB5–HMM**
The graph shows some of the most important hidden node transitions in FB5–HMM. Each hidden node value is represented as a box, showing the associated mean direction as a pair of (θ,τ) angles. For clarity, only a subset of all transitions is shown: for each hidden node value, the incoming and the outgoing transitions with the highest probability are shown as arrows. If one of them is a self-transition, the second best incoming or outgoing transition is also shown. Hidden node values mainly associated with α-helices are shown in light red, those associated with β-strands in light blue, and those associated with coils in white.

**Figure 4. Three Point Sets Sampled from the FB5 Distribution on the Sphere**
The three sets consist of 1,000 unit vectors sampled from the FB5 distributions associated with hidden node values 3 (blue), 34 (red), and 44 (green), respectively. These three node values are typical representatives of coil, α-helix, and β-strand geometry. The samples were plotted on the unit sphere, and the mean directions of the three FB5 distributions are indicated with arrows.
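
As an illustration of how such point sets can be drawn, here is a minimal sketch that samples unit vectors from an FB5 (Kent) distribution, whose density is proportional to exp(κ γ1·x + β[(γ2·x)² − (γ3·x)²]). It uses plain rejection sampling from the uniform distribution on the sphere, a simple technique swapped in for clarity rather than the sampling procedure of the paper, and it becomes slow for large κ. The function name `sample_fb5` and the parameter values in the usage note are hypothetical.

```python
import numpy as np


def sample_fb5(kappa, beta, gammas, size, rng=None):
    """Draw `size` unit vectors from FB5 by rejection from the uniform sphere.

    gammas: 3x3 array whose rows are the orthonormal vectors gamma1 (mean
    direction), gamma2 (major axis), and gamma3 (minor axis).
    """
    if not 0 <= 2 * beta < kappa:
        raise ValueError("this sketch requires 0 <= 2*beta < kappa")
    rng = np.random.default_rng() if rng is None else rng
    g = np.asarray(gammas, dtype=float)
    accepted, count = [], 0
    while count < size:
        # Normalised Gaussian vectors are uniform on the unit sphere.
        x = rng.normal(size=(4 * size, 3))
        x /= np.linalg.norm(x, axis=1, keepdims=True)
        p = x @ g.T  # projections onto gamma1, gamma2, gamma3
        # Log density ratio relative to its maximum (kappa, reached at gamma1).
        log_ratio = kappa * (p[:, 0] - 1.0) + beta * (p[:, 1] ** 2 - p[:, 2] ** 2)
        keep = np.log(rng.random(len(x))) < log_ratio
        accepted.append(x[keep])
        count += int(keep.sum())
    return np.concatenate(accepted)[:size]
```

With hypothetical parameters, `sample_fb5(10.0, 3.0, np.eye(3), 1000)` yields 1,000 points clustered around the first coordinate axis and spread more widely along the second axis than along the third.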

**Figure 5. Scatter Plot of the (θ,τ) Angles in a Sampled Dataset**
The dataset consisted of 500 sequences of length 100 generated using FB5–HMM. The ideal (θ,τ) values of some conformations are indicated: α: α-helix, β: β-strand, π: π-helix, L: left-handed α-helix, 3: 3₁₀-helix, 1 and 2: polyproline helices of types I and II. The open circles indicate the mean directions of the 75 FB5 distributions. Angle pairs generated by hidden node values 3, 34, and 44 are plotted in blue, red, and green, respectively. These three values are typical representatives of coil, α-helix, and β-strand geometry, respectively.

**Figure 6. Histograms of the (θ,τ) Angle Pairs**
Histograms are shown for the training set (upper) and the decoy set (lower). The bin size is 1° × 1°. The color scale refers to the number of counts per bin. Bins with a count below 4 are white.

**Figure 7. Histograms of Secondary Structure Element Length**
Histograms of the lengths of the secondary structure elements in the training set (white bars) and the decoy set (black bars).

**Figure 8. Best Compact Decoys Generated Using FB5–HMM**
The best compact decoys generated using sequence information (Table 1, S) are shown for 1ENH (top) and 2CRO (bottom). From left to right: crystal structure, FB5–HMM, S0 baseline, M0 baseline, MS0 baseline. The N-terminus is shown in blue. The figure was made with PyMol (DeLano Scientific, http://www.delanoscientific.com).

**Figure 9. Secondary Structure of the Target Proteins**
(First line) Secondary structure assignment derived from the crystal structure. (Second line) Predicted secondary structure assignment.

**Figure 10. Training FB5–HMM**
(Left) ICL plotted versus hidden node size. For each hidden node size, four models were trained. The ICL reaches a maximum for one of the models with a hidden node size of 75 (indicated with a solid dot). (Right) Evolution of the LogLik of the completed data during training. The LogLik is plotted against the number of EM iterations.

Cited by

Molecular dynamics analysis of biomolecular systems including nucleic acids.
Kameda T, Awazu A, Togashi Y. Biophys Physicobiol. 2022 Aug 23;19:e190027. doi: 10.2142/biophysico.bppb-v19.0027. eCollection 2022. PMID: 36349319.
Data-driven probabilistic definition of the low energy conformational states of protein residues.
Gavalda-Garcia J, Bickel D, Roca-Martinez J, Raimondi D, Orlando G, Vranken W. NAR Genom Bioinform. 2024 Jul 9;6(3):lqae082. doi: 10.1093/nargab/lqae082. eCollection 2024 Sep. PMID: 38984065.
A probabilistic fragment-based protein structure prediction algorithm.
Simoncini D, Berenger F, Shrestha R, Zhang KY. PLoS One. 2012;7(7):e38799. doi: 10.1371/journal.pone.0038799. Epub 2012 Jul 19. PMID: 22829868.
Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models.
Stovgaard K, Andreetta C, Ferkinghoff-Borg J, Hamelryck T. BMC Bioinformatics. 2010 Aug 18;11:429. doi: 10.1186/1471-2105-11-429. PMID: 20718956.
Plant microRNAs: Biogenesis, Homeostasis, and Degradation.
Wang J, Mei J, Ren G. Front Plant Sci. 2019 Mar 27;10:360. doi: 10.3389/fpls.2019.00360. eCollection 2019. PMID: 30972093. Review.
