Proteins. 2015 Aug;83(8):1436-49.
doi: 10.1002/prot.24829. Epub 2015 Jun 6.

CONFOLD: Residue-residue contact-guided ab initio protein folding

Badri Adhikari et al. Proteins. 2015 Aug.

Abstract

Predicted protein residue-residue contacts can be used to build three-dimensional models and consequently to predict protein folds from scratch. Considerable effort is currently being spent on improving contact prediction accuracy, whereas few methods are available to construct protein tertiary structures from predicted contacts. Here, we present an ab initio protein folding method that builds three-dimensional models using predicted contacts and secondary structures. Our method first translates contacts and secondary structures into distance, dihedral angle, and hydrogen bond restraints according to a set of new conversion rules, and then provides these restraints as input to a distance geometry algorithm that builds tertiary structure models. The initially reconstructed models are used to regenerate a set of physically realistic contact restraints and to detect secondary structure patterns, which are then used to reconstruct the final structural models. This two-stage modeling approach of integrating contacts and secondary structures improves the quality and accuracy of structural models and, in particular, generates better β-sheets than other algorithms. We validate our method on two standard benchmark datasets using true contacts and secondary structures. Our method improves the TM-score of reconstructed protein models by 45% and 42% over the existing method on the two datasets, respectively. On the dataset for benchmarking reconstruction methods with predicted contacts and secondary structures, the average TM-score of the best models reconstructed by our method is 0.59, 5.5% higher than that of the existing method. The CONFOLD web server is available at http://protein.rnet.missouri.edu/confold/.
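The abstract says contacts are translated into distance restraints by a set of conversion rules, but it does not state those rules. The sketch below only illustrates the general shape of that step, under assumptions: a contact is taken to mean a Cβ-Cβ distance under 8 Å (the usual CASP convention, with Cα standing in for glycine), and every contact becomes a restraint with a hypothetical 3.5 Å lower bound and 8.0 Å upper bound. The names `DistanceRestraint` and `contacts_to_restraints` are illustrative and not part of CONFOLD.

```python
from dataclasses import dataclass

# Assumed bounds in Å; CONFOLD's actual conversion rules may differ.
LOWER_BOUND = 3.5
UPPER_BOUND = 8.0


@dataclass(frozen=True)
class DistanceRestraint:
    """Hypothetical restraint record: atom pair plus allowed distance range."""
    res_i: int
    atom_i: str
    res_j: int
    atom_j: str
    lower: float
    upper: float


def contact_atom(residue: str) -> str:
    """Glycine has no C-beta, so fall back to C-alpha (CASP convention)."""
    return "CA" if residue == "G" else "CB"


def contacts_to_restraints(sequence, contacts):
    """Turn 1-based (i, j) contact pairs into lower/upper-bound distance restraints."""
    restraints = []
    for i, j in contacts:
        if not (1 <= i < j <= len(sequence)):
            raise ValueError(f"bad contact pair ({i}, {j})")
        restraints.append(DistanceRestraint(
            i, contact_atom(sequence[i - 1]),
            j, contact_atom(sequence[j - 1]),
            LOWER_BOUND, UPPER_BOUND))
    return restraints


if __name__ == "__main__":
    seq = "MGKLVAGSTE"  # hypothetical 10-residue sequence
    for r in contacts_to_restraints(seq, [(1, 6), (2, 9)]):
        print(r)
```

A distance geometry program would then search for coordinates that satisfy all such bounds simultaneously; the dihedral-angle and hydrogen-bond restraints derived from secondary structure are not covered by this sketch.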

Keywords: ab initio protein folding; contact assisted protein structure prediction; optimization; protein residue-residue contacts; protein structure modeling.

Figures

Figure 1
The CONFOLD method for building models with contacts and secondary structures in two stages. When true contacts are the input, all contacts are used to reconstruct models. For predicted contacts, the top-xL contacts are used, where L is the sequence length and x ranges from 0.4 to 2.2 in steps of 0.2.
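Selecting the top-xL contacts is a simple ranking step. A minimal sketch, assuming contacts arrive as (i, j, score) tuples where a higher score means a more confident prediction, and assuming x·L is rounded down (the caption does not say how fractional counts are handled); `top_xl_subsets` is a hypothetical name.

```python
def top_xl_subsets(contacts, seq_len):
    """Return {x: top-(x*L) contacts} for x = 0.4, 0.6, ..., 2.2.

    contacts: list of (i, j, score) tuples (assumed format).
    seq_len:  protein sequence length L.
    Rounding x*L down to an integer is an assumption.
    """
    ranked = sorted(contacts, key=lambda c: c[2], reverse=True)
    subsets = {}
    # Step in integer tenths (4, 6, ..., 22) to avoid floating-point drift.
    for tenths in range(4, 23, 2):
        x = tenths / 10
        n = (tenths * seq_len) // 10
        subsets[x] = ranked[:n]
    return subsets
```

For a 100-residue protein this yields ten subsets, from 40 contacts (x = 0.4) up to 220 contacts (x = 2.2), each of which would seed a separate round of model building.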
Figure 2
Ten alternate hydrogen-bonding patterns for antiparallel (left) and parallel (right) pairing of two strands, each six residues long. The first strand spans residues 3 to 8, and the second spans residues 12 to 17 for antiparallel pairs and 23 to 28 for parallel pairs. The panels show the ideal hydrogen-bonding pattern (A), its alternate pattern (B), the top strand right-shifted by one residue (C), the alternate pattern for C (D), the top strand right-shifted by two residues (E), the alternate pattern for E (F), the top strand left-shifted by one residue (G), the alternate pattern for G (H), the top strand left-shifted by two residues (I), and the alternate pattern for I (J). In the case of parallel pairing (right), although DSSP requires one more hydrogen bond to consider the strands paired, we take a less strict approach and ignore this extra hydrogen bond, because we observed that doing so worked better when building models with predicted contacts. Black lines connecting residues show hydrogen bonds, and double-arrowed lines represent double hydrogen bonds.
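The shifted registers in panels C to J follow from the pairing geometry: in antiparallel strands the paired residues i and j keep i + j constant, while in parallel strands they keep j - i constant. The sketch below enumerates paired residues for a given shift, assuming that shift = +1 corresponds to the caption's "right-shifted by one residue". The `alternate` helper simply splits the pairs into two every-other-residue sets, mirroring how antiparallel sheets alternate between hydrogen-bonded and non-hydrogen-bonded pairs; it does not model the offset hydrogen-bond geometry of parallel strands. All function names are hypothetical.

```python
def antiparallel_pairs(s1, s2, shift=0):
    """Paired residues for two antiparallel strands.

    s1, s2: inclusive (start, end) residue ranges.
    Paired residues satisfy i + j = constant; the ideal register pairs the
    first residue of s1 with the last residue of s2, and `shift` moves it.
    """
    total = s1[0] + s2[1] + shift
    return [(i, total - i) for i in range(s1[0], s1[1] + 1)
            if s2[0] <= total - i <= s2[1]]


def parallel_pairs(s1, s2, shift=0):
    """Paired residues for two parallel strands: j - i = constant."""
    offset = s2[0] - s1[0] + shift
    return [(i, i + offset) for i in range(s1[0], s1[1] + 1)
            if s2[0] <= i + offset <= s2[1]]


def alternate(pairs, phase):
    """Keep every other pair, starting at index `phase` (0 or 1)."""
    return pairs[phase::2]
```

With the caption's antiparallel strands, `antiparallel_pairs((3, 8), (12, 17))` returns the ideal register 3-17, 4-16, ..., 8-12, and a shift of one leaves one residue at the end of each strand unpaired.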
Figure 3
Top models reconstructed for the proteins 2QOM and 1YPI using true secondary structure information along with beta-pairing information but without using any residue contact information. Secondary structure restraints are computed using λ = 0.5. Superposition of crystal structure (green) and reconstructed top model (orange) of the beta-alpha-beta barrel protein 1YPI (A) and antiparallel beta barrel protein 2QOM (B).
Figure 4
Best models reconstructed for the protein 5p21 using Modeller (A), Reconstruct (B), customized CNS DGSA protocol (C), and CONFOLD (D). All models are superimposed with native structure (green). The TM-scores of Models A, B, C, and D are 0.53, 0.86, 0.88, and 0.94, respectively. Model D reconstructed by CONFOLD has higher TM-score and also much better secondary structure quality than the other models.
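The TM-scores quoted in these captions follow the standard definition of Zhang and Skolnick: TM = (1/L) Σ 1/(1 + (d_i/d0)^2), where L is the length of the native structure, d_i is the distance between the i-th pair of aligned residues, and d0 = 1.24·(L - 15)^(1/3) - 1.8. A minimal sketch for one fixed superposition follows; the real TM-score program also searches over superpositions to maximize the score, which is omitted here, and the 0.5 Å floor on d0 is an assumption modeled on common implementations.

```python
def tm_score_fixed(distances, target_length):
    """TM-score for a single, already computed superposition.

    distances:     C-alpha distances (Å) between aligned model/native residue
                   pairs after superposition.
    target_length: number of residues in the native structure (L).
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    # Length-dependent distance scale; guard short chains, where the cube
    # root of a negative number would otherwise produce a complex value.
    if target_length > 15:
        d0 = max(1.24 * (target_length - 15) ** (1 / 3) - 1.8, 0.5)
    else:
        d0 = 0.5
    return sum(1 / (1 + (d / d0) ** 2) for d in distances) / target_length
```

Because the sum is divided by the native length, a model that matches half of a 100-residue protein perfectly and leaves the rest unaligned scores 0.5.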
Figure 5
Distribution of TM-scores of the best models reconstructed by the four methods for 150 FRAGFOLD proteins.
Figure 6
Best predicted models for the proteins RNH_ECOLI (A) and SPTB2_HUMAN (B) using EVFOLD (purple) and CONFOLD (orange), superimposed on the native structures (green). The TM-scores of these models are reported in Table 4. The CONFOLD models have higher TM-scores and better secondary structure quality than the EVFOLD models.
Figure 7
Distributions of model quality for the EVFOLD models and the models built by CONFOLD. Distributions for models built in the first stage of CONFOLD (stage1), the second stage with contact filtering only (rr filter), and the second stage with β-sheet detection only (sheet detect) are also shown. Each curve represents the distribution of 400 × 15 models. Because some models in the EVFOLD model pool have an RMSD greater than 20 Å, all models with an RMSD greater than 20 Å were filtered out of all four model pools.
Figure 8
Improvement in the accuracy of best models (left) and all 400 models (right) in the second stage of CONFOLD over the first stage for 150 proteins in FRAGFOLD dataset.
Figure 9
Contact filtering from stage 1 to stage 2 for the protein 1NRV. (A) Superimposition of the best stage-1 model, reconstructed by CONFOLD with the top-0.6L contacts (orange), on the native structure (green). The model has a TM-score of 0.50. Of the top-0.6L (60) contacts, 8 erroneous contacts were removed in stage 2; 5 of them are visualized in the native structure along with the distances between their Cβ-Cβ atoms. The filtered predicted contacts (20-59, 53-73, 30-36, 49-56, and 88-93) have Cβ-Cβ distances of 23, 23, 20, 12, and 9 Å, respectively, in the native structure. Each pair of residues predicted to be in contact is denoted by the same color. (B) Superimposition of the best stage-2 model, reconstructed by CONFOLD with the reduced (filtered) top-0.6L contacts (orange), on the native structure (green). The model has a TM-score of 0.61.
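The filtering step can be pictured as checking each predicted contact against a stage-1 model. A minimal sketch, assuming a contact is dropped when its Cβ-Cβ distance in that model exceeds a fixed cutoff; both the single-model criterion and the 12 Å cutoff are assumptions for illustration, since the caption only reports which contacts were removed. `filter_contacts` and its dictionary input format are hypothetical.

```python
import math


def filter_contacts(contacts, coords, max_dist=12.0):
    """Split predicted contacts into those a stage-1 model supports and those it does not.

    contacts: list of (i, j) residue pairs.
    coords:   dict mapping residue number -> (x, y, z) of its C-beta atom
              in the stage-1 model (hypothetical input format).
    max_dist: distance cutoff in Å (assumed value).
    """
    kept, removed = [], []
    for i, j in contacts:
        d = math.dist(coords[i], coords[j])  # Euclidean distance
        if d <= max_dist:
            kept.append((i, j))
        else:
            removed.append((i, j))
    return kept, removed
```

The kept contacts would then serve as the reduced restraint set for the second round of model building.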
Figure 10
Number of best models and the number of contacts used to build the best models for 150 proteins in FRAGFOLD dataset.
