Bioinformatics. 2014 Sep 1;30(17):i482-8.
doi: 10.1093/bioinformatics/btu458.

PconsFold: improved contact predictions improve protein models


Mirco Michel et al. Bioinformatics. 2014.

Abstract

Motivation: It has recently been shown that the quality of protein contact prediction from evolutionary information improves significantly when direct and indirect information are separated. For sufficiently large protein families, the predicted contacts contain enough information to predict the structure of many families. Contact prediction methods have, however, improved since those first studies. Here, we ask how much the final models improve when the improved contact predictions are used.

Results: In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models improve by 33% on average when using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that model quality improves by 15-30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved.

Availability: PconsFold is a fully automated pipeline for ab initio protein structure prediction based on evolutionary information. PconsFold is based on PconsC contact prediction and uses the Rosetta folding protocol. Due to its modularity, the contact prediction tool can be easily exchanged. The source code of PconsFold is available on GitHub at https://www.github.com/ElofssonLab/pcons-fold under the MIT license. PconsC is available from http://c.pcons.net/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
PconsFold pipeline. Based on a given protein sequence, amino acid contacts are predicted with PconsC. These contacts then facilitate protein folding with Rosetta. In the end, PconsFold outputs a structural model for the given sequence.
Fig. 2.
Model quality in TM-score for adjustments of two different Rosetta parameters. (a) Performance distributions for two different sample sizes of 20 000 (left) and 2000 (right) decoy structures. The black boxes indicate the upper and lower quartiles, with white dots at the medians of the distributions. For each protein in the full PSICOV dataset, the top-ranked model was selected from the decoys by its Rosetta score and compared with the native structure. (b) Effects of adjustments to the well-depth parameter of the FADE function. A low absolute well-depth (left side) puts low weight on predicted constraints. Higher absolute values of well-depth (right side) weight the constraints more strongly. A subset of 14 proteins of the PSICOV dataset was used here.
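
To illustrate how a well-depth parameter scales the weight of a predicted constraint, the following is a minimal sketch of a smoothed square-well function. The caption does not give the functional form, so the cubic smoothing used here is an assumption, and the names `fade`, `lb`, `ub`, `d`, `well_depth` and `offset` are hypothetical. It should not be read as the exact Rosetta implementation.

```python
def fade(x, lb, ub, d, well_depth, offset=0.0):
    """Smoothed square well (illustrative sketch, not the exact Rosetta code).

    Returns `well_depth + offset` on the flat bottom [lb + d, ub - d],
    `offset` outside (lb, ub), and a cubic interpolation in the two
    fade zones of width d. Assumes d > 0 and ub - lb >= 2 * d.
    """
    if x <= lb or x >= ub:
        return offset
    if x < lb + d:
        b = (lb + d - x) / d        # 0 at the well bottom, 1 at the lower bound
    elif x > ub - d:
        b = (x - (ub - d)) / d      # 0 at the well bottom, 1 at the upper bound
    else:
        b = 0.0                     # inside the flat part of the well
    z = 2 * b**3 - 3 * b**2 + 1     # smooth step: 1 at b = 0, 0 at b = 1
    return z * well_depth + offset


# A satisfied constraint (distance inside the well) scores the full well depth,
# so a larger absolute well depth gives the constraint more weight.
for depth in (-1.0, -15.0):
    print(depth, fade(4.0, lb=0.0, ub=8.0, d=2.0, well_depth=depth))
```

In this sketch a negative well depth acts as a bonus for satisfied constraints, which is one way to read the "absolute well-depth" wording in the caption.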
Fig. 3.
Folding performance on the full PSICOV dataset. (a) The number of contacts used in structure prediction is plotted against average TM-score for three different methods: PconsFold (green circles), Rosetta/plmDCA (blue triangles) and Rosetta/PSICOV (black squares). For each protein, the number of top-ranked contacts was selected relative to its sequence length. A value of 1.0 on the x-axis represents one contact per residue on average. Error bars indicate standard errors. (b) TM-scores are compared with the PPV of the underlying contact maps for PconsFold (using PconsC). The colours represent the four CATH fold classes. Lines are fitted to the data to illustrate performance differences between the fold classes.
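
Selecting top-ranked contacts relative to sequence length can be sketched as below. The function name `select_top_contacts`, the `(i, j, score)` tuple layout and the rounding rule are assumptions made for illustration; the caption only states that the number of contacts scales with sequence length.

```python
def select_top_contacts(contacts, seq_len, factor=1.0):
    """Keep the highest-scoring contacts, about `factor` per residue.

    `contacts` is a list of (i, j, score) tuples (hypothetical layout).
    The number kept is round(factor * seq_len); rounding is an assumption.
    """
    n_keep = round(factor * seq_len)
    ranked = sorted(contacts, key=lambda c: c[2], reverse=True)
    return ranked[:n_keep]


# Toy example: a "protein" of length 2 with factor 1.0 keeps two contacts.
toy = [(1, 5, 0.9), (2, 8, 0.2), (3, 9, 0.7), (1, 7, 0.5)]
print(select_top_contacts(toy, seq_len=2, factor=1.0))
```

With `factor=1.0` this corresponds to the value 1.0 on the x-axis of panel (a), that is, one contact per residue on average.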
Fig. 4.
Analysis of contact maps in native structures and top-ranked models. PPVs were calculated for the sets of contacts that were used during folding (1.0 · l top-ranked contacts) with a Cβ distance cutoff of 8 Å in the structures. (a) PPV values for PconsC contacts on native structures (x-axis) against PPVs on the top-ranked models from PconsFold (y-axis). The colours represent TM-scores of models against native structures. (b) Native structure of 1JWQ. Lines represent all predicted contacts. The colour scheme indicates spatial distances of residue pairs in the structure. The PPV is 0.83. (c) Predicted contacts in the top-ranked model for 1JWQ with the same colour scheme. This model has a TM-score of 0.62 and a PPV of 0.46.
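
The PPV calculation described in this caption can be sketched as the fraction of predicted residue pairs whose Cβ atoms lie within the distance cutoff in a given structure. The function name `contact_ppv`, the coordinate dictionary and the strict `<` comparison are assumptions for illustration; how the Cβ coordinates are read from a structure file is left to the caller.

```python
import math


def contact_ppv(predicted_pairs, cb_coords, cutoff=8.0):
    """Positive predictive value of predicted contacts in one structure.

    `predicted_pairs` is a list of (i, j) residue index pairs and
    `cb_coords` maps a residue index to its C-beta (x, y, z) position
    in angstroms (hypothetical layout). A pair counts as a true
    positive when its C-beta distance is below `cutoff`.
    """
    if not predicted_pairs:
        return 0.0  # no predictions: report 0.0 rather than dividing by zero
    hits = sum(
        1
        for i, j in predicted_pairs
        if math.dist(cb_coords[i], cb_coords[j]) < cutoff
    )
    return hits / len(predicted_pairs)


# Toy coordinates: residues 1 and 2 are 5 A apart, residues 1 and 3 are 20 A apart.
coords = {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0)}
print(contact_ppv([(1, 2), (1, 3)], coords))  # prints 0.5
```

Evaluating the same predicted pairs against the native structure and against a model gives the two PPV values plotted against each other in panel (a).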
Fig. 5.
TM-score comparison for top-ranked models of the proteins in the PSICOV dataset. The decoys for each method were re-ranked using Pcons to assess the performance of the structure prediction process independently of the model ranking scheme. The colours represent the four CATH fold classes. (a) PconsFold compared with EVfold-PLM. (b) Rosetta/plmDCA compared with EVfold-PLM.
