Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks

doi:10.1186/1472-6807-9-5

BMC Struct Biol. 2009 Jan 30;9:5.

Ian Walsh¹, Davide Baù, Alberto J M Martin, Catherine Mooney, Alessandro Vullo, Gianluca Pollastri

PMID: 19183478
PMCID: PMC2654788
DOI: 10.1186/1472-6807-9-5

Ian Walsh et al. BMC Struct Biol. 2009.

Affiliation

¹ School of Computer Science and Informatics, University College Dublin, Dublin, Ireland. ian.walsh@ucd.ie

Abstract

Background: Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3-4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is, without exploiting information about potential homologues of known structure.

Results: We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, relying on the sequence and on evolutionary information; and one template-based, in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. Furthermore, we test both ab initio and template-based 8 Å predictions on the CASP7 targets using a pre-CASP7 PDB, and find that both predictors are state-of-the-art, with the template-based one far outperforming the best CASP7 systems if templates with sequence identity to the query of 10% or better are available. Although this is not the main focus of this paper, we also report on reconstructions of Cα traces based on both ab initio and template-based 4-class map predictions, showing that the latter are generally more accurate even when homology is dubious.

Conclusion: Accurate predictions of multi-class maps may provide valuable constraints for improved ab initio and template-based prediction of protein structures, naturally incorporate multiple templates, and yield state-of-the-art binary maps. Predictions of protein structures and 8 Å contact maps based on the multi-class distance map predictors described in this paper are freely available to academic users at the URL http://distill.ucd.ie/.
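
As a rough illustration of what a multi-class distance map is, the sketch below bins pairwise Cα distances into four classes and then reads off a binary contact map at 8 Å. Only the 8 Å contact threshold comes from the abstract; the remaining class boundaries (13 Å and 19 Å) and all function names are assumptions made for this example, not values or code taken from the paper.

```python
import numpy as np

# Assumed class boundaries in Å. Only the first one (the 8 Å contact
# cutoff) is stated in the abstract; 13 and 19 are illustrative choices.
CLASS_BOUNDS = (8.0, 13.0, 19.0)

def distance_matrix(ca_coords):
    """Pairwise Euclidean distances between C-alpha atoms, shape (N, N)."""
    coords = np.asarray(ca_coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def multiclass_map(ca_coords, bounds=CLASS_BOUNDS):
    """Label each residue pair with a class 0..len(bounds) by distance bin.

    With the default bounds: class 0 is [0, 8), class 1 is [8, 13),
    class 2 is [13, 19) and class 3 is 19 Å or more.
    """
    return np.digitize(distance_matrix(ca_coords), bounds)

def binary_contact_map(class_map):
    """Pairs in class 0 (closer than the first bound) count as contacts."""
    return np.asarray(class_map) == 0

# Toy "chain": five points spaced 5 Å apart along the x axis.
toy_coords = [[0, 0, 0], [5, 0, 0], [10, 0, 0], [15, 0, 0], [20, 0, 0]]
toy_classes = multiclass_map(toy_coords)
toy_contacts = binary_contact_map(toy_classes)
```

Because the first class boundary coincides with the contact cutoff, the binary map falls out of the multi-class map with no further prediction step, which mirrors the way the abstract describes extracting 8 Å maps from the 4-class predictors.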

Figures

**Figure 1**
**8 Å prediction for sequence separation between 6 and 11**. On the x axis the sequence identity between the query and the best template. The bins' height is proportional to the average F1 for the contact class. Red bins represent ab initio predictions, while blue ones are template-based. Results for sequence separations between 6 and 11 residues, inclusive.

**Figure 2**
**8 Å prediction for sequence separation between 12 and 23**. On the x axis the sequence identity between the query and the best template. The bins' height is proportional to the average F1 for the contact class. Red bins represent ab initio predictions, while blue ones are template-based. Results for sequence separations between 12 and 23 residues, inclusive.

**Figure 3**
**8 Å prediction for sequence separation of 24 and greater**. On the x axis the sequence identity between the query and the best template. The bins' height is proportional to the average F1 for the contact class. Red bins represent ab initio predictions, while blue ones are template-based. Results for sequence separations of 24 and greater.
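
The quantity plotted in Figures 1 to 3, F1 for the contact class restricted to a sequence separation range, can be sketched as follows. This is a minimal illustration assuming symmetric boolean contact maps and the standard F1 definition (harmonic mean of precision and recall); the function name `contact_f1` is hypothetical and the paper's exact evaluation procedure may differ.

```python
import numpy as np

def contact_f1(true_contacts, pred_contacts, min_sep, max_sep=None):
    """F1 of the contact class over pairs with min_sep <= |i - j| <= max_sep.

    max_sep=None means no upper limit (e.g. "24 and greater").
    Returns 0.0 when there are no true or predicted contacts in range.
    """
    true_contacts = np.asarray(true_contacts, dtype=bool)
    pred_contacts = np.asarray(pred_contacts, dtype=bool)
    n = true_contacts.shape[0]
    # Upper-triangle pairs with j - i >= min_sep.
    i, j = np.triu_indices(n, k=min_sep)
    if max_sep is not None:
        keep = (j - i) <= max_sep
        i, j = i[keep], j[keep]
    t = true_contacts[i, j]
    p = pred_contacts[i, j]
    tp = np.sum(t & p)    # contacts predicted as contacts
    fp = np.sum(~t & p)   # non-contacts predicted as contacts
    fn = np.sum(t & ~p)   # contacts that were missed
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Toy maps for a 30-residue chain (upper triangle only, for brevity).
true_map = np.zeros((30, 30), dtype=bool)
pred_map = np.zeros((30, 30), dtype=bool)
for a, b in [(0, 6), (0, 8), (0, 12)]:
    true_map[a, b] = True
for a, b in [(0, 6), (0, 7), (0, 12)]:
    pred_map[a, b] = True

f1_short = contact_f1(true_map, pred_map, 6, 11)
f1_medium = contact_f1(true_map, pred_map, 12, 23)
f1_long = contact_f1(true_map, pred_map, 24)
```

The three calls correspond to the three separation ranges used in the figures: 6 to 11, 12 to 23, and 24 or greater.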

**Figure 4**
**4-class predictions vs. TM-score of the best template**. On the x axis is the TM-score between the query and the best template found by PSI-BLAST. The bins' height is proportional to the average Q₄ of the map. Red bins represent ab initio predictions, while blue ones are template-based. Results for all sequence separations. The errors of individual bins are in the 0.7–2% range, with all differences greater than the sum of standard deviations of the template and ab initio bins except for the [0,0.1) interval.
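
The caption does not define Q₄. A common reading, assumed here, is the fraction of residue pairs whose distance class is predicted correctly. The sketch below computes that fraction over the upper triangle of two integer class maps; the function name `q_accuracy` and the `min_sep` parameter are hypothetical.

```python
import numpy as np

def q_accuracy(true_map, pred_map, min_sep=1):
    """Fraction of residue pairs (j - i >= min_sep) with the correct class.

    Assumed definition of Q4 when the maps hold four classes (0..3).
    Returns 0.0 if no pairs satisfy the separation constraint.
    """
    true_map = np.asarray(true_map)
    pred_map = np.asarray(pred_map)
    i, j = np.triu_indices(true_map.shape[0], k=min_sep)
    if i.size == 0:
        return 0.0
    return float(np.mean(true_map[i, j] == pred_map[i, j]))

# Toy 4-residue example: 6 pairs in the upper triangle, 2 mislabeled.
true_classes = np.array([[0, 0, 1, 2],
                         [0, 0, 0, 1],
                         [1, 0, 0, 0],
                         [2, 1, 0, 0]])
pred_classes = true_classes.copy()
pred_classes[0, 3] = 3  # wrong class for pair (0, 3)
pred_classes[1, 3] = 2  # wrong class for pair (1, 3)

q4 = q_accuracy(true_classes, pred_classes)
```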

**Figure 5**
**An example of ab initio and template-based 4-class map prediction**. Multi-class contact maps for protein 1B9LA, for ab initio (left) and template-based (right) predictions. The best template sequence identity is 22.7%. The top right of each map is the true map and the bottom left is predicted. In the predicted half, red, blue, green and yellow correspond to classes 0, 1, 2 and 3, respectively. The greyscale in the predicted half corresponds to falsely predicted classes. The three black lines correspond to |i - j| = 6, 12, 24.

**Figure 6**
**Reconstructions from 4-class contact maps**. Average RMSD vs. sequence length is shown for models derived from true 4-class maps (yellow bins), from 4-class maps predicted using information derived from homologues (M_TE) (green bins) and from 4-class maps predicted *ab initio* (red bins), together with the baseline (blue bins). Note that, since no templates are allowed that show a sequence identity greater than 95% to the query, the M_TE results are based on a mixture of good, bad and no templates (see Figure 7 for a sample distribution of template quality). Standard deviations are approximately 1.3 Å for the 40–60 class, 1.1 Å for the 60–80 one and less than 1 Å for the other classes.

**Figure 7**
**Best and Average template distribution**. Distribution of best-hit (blue) and average (red) sequence similarity in the PSI-BLAST templates for the S3129 set. Hits above 95% sequence similarity excluded.

**Figure 8**
**Quality of 3D models from 4-class maps vs. TM-score of the best template**. On the x axis is the fraction of residues in the query which are within 5 Å of the template. The bins' height is proportional to the average fraction of residues in either the 3D model (red bins) or the best template (blue bins) that are within 5 Å of the native structure.

Cited by

OMPcontact: An Outer Membrane Protein Inter-Barrel Residue Contact Prediction Method.
Zhang L, Wang H, Yan L, Su L, Xu D. J Comput Biol. 2017 Mar;24(3):217-228. doi: 10.1089/cmb.2015.0236. Epub 2016 Aug 11. PMID: 27513917.
Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction.
Suh D, Lee JW, Choi S, Lee Y. Int J Mol Sci. 2021 Jun 2;22(11):6032. doi: 10.3390/ijms22116032. PMID: 34199677. Review.
Accurate Ab Initio and Template-Based Prediction of Short Intrinsically-Disordered Regions by Bidirectional Recurrent Neural Networks Trained on Large-Scale Datasets.
Volpato V, Alshomrani B, Pollastri G. Int J Mol Sci. 2015 Aug 21;16(8):19868-85. doi: 10.3390/ijms160819868. PMID: 26307973.
SCLpredT: Ab initio and homology-based prediction of subcellular localization by N-to-1 neural networks.
Adelfio A, Volpato V, Pollastri G. Springerplus. 2013 Oct 3;2:502. doi: 10.1186/2193-1801-2-502. eCollection 2013. PMID: 24133649.
Toward an accurate prediction of inter-residue distances in proteins using 2D recursive neural networks.
Kukic P, Mirabello C, Tradigo G, Walsh I, Veltri P, Pollastri G. BMC Bioinformatics. 2014 Jan 10;15:6. doi: 10.1186/1471-2105-15-6. PMID: 24410833.
