Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information
- PMID: 17570843
- PMCID: PMC1913928
- DOI: 10.1186/1471-2105-8-201
Abstract
Background: Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio.
Results: Here we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems with their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility prediction quality is roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available.
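As a rough illustration of what a "simple structural frequency profile" could look like, the sketch below counts, for each query position, how often each secondary structure class occurs among the templates aligned to it. This is a minimal sketch under stated assumptions, not the authors' implementation: the three-class alphabet (H, E, C), the gap symbol `-`, the optional per-template weights, and the function names `frequency_profile` and `template_baseline` are all hypothetical choices made for the example. The second function mimics the kind of baseline mentioned above, in which the structure is read directly off the templates by taking the most frequent class.

```python
# Assumed three-class alphabet: H = helix, E = strand, C = coil.
SS_CLASSES = ("H", "E", "C")


def frequency_profile(aligned_templates, weights=None, classes=SS_CLASSES):
    """Per-position class frequencies over templates aligned to the query.

    Each template is a string as long as the query; any symbol outside
    `classes` (for example "-") means the template does not cover that
    position. `weights` is an optional, hypothetical per-template weight.
    """
    if not aligned_templates:
        return []
    length = len(aligned_templates[0])
    if any(len(t) != length for t in aligned_templates):
        raise ValueError("all aligned templates must have the query length")
    if weights is None:
        weights = [1.0] * len(aligned_templates)

    profile = []
    for i in range(length):
        totals = {c: 0.0 for c in classes}
        for template, weight in zip(aligned_templates, weights):
            if template[i] in totals:
                totals[template[i]] += weight
        covered = sum(totals.values())
        # Positions covered by no template get an all-zero row.
        profile.append([totals[c] / covered if covered else 0.0 for c in classes])
    return profile


def template_baseline(profile, classes=SS_CLASSES, unknown="?"):
    """Read a prediction directly off the profile: most frequent class."""
    labels = []
    for row in profile:
        if sum(row) == 0:
            labels.append(unknown)  # no template information here
        else:
            labels.append(classes[row.index(max(row))])
    return "".join(labels)


templates = ["HHE-", "HEE-", "HHC-"]
profile = frequency_profile(templates)
baseline = template_baseline(profile)
```

In a setup like the one described in the abstract, rows of such a profile would be supplied to the predictor alongside the sequence-derived inputs, while the majority-vote string serves only as a point of comparison.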
Conclusion: The predictive systems are publicly available at http://distill.ucd.ie.
Similar articles
- Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks. BMC Struct Biol. 2009 Jan 30;9:5. doi: 10.1186/1472-6807-9-5. PMID: 19183478. Free PMC article.
- Beyond the Twilight Zone: automated prediction of structural properties of proteins by recursive neural networks and remote homology information. Proteins. 2009 Oct;77(1):181-90. doi: 10.1002/prot.22429. PMID: 19422056.
- Combining sequence and structural profiles for protein solvent accessibility prediction. Comput Syst Bioinformatics Conf. 2008;7:195-202. PMID: 19642280. Free PMC article.
- Correlated substitution analysis and the prediction of amino acid structural contacts. Brief Bioinform. 2008 Jan;9(1):46-56. doi: 10.1093/bib/bbm052. Epub 2007 Nov 13. PMID: 18000015. Review.
- Bridging the protein sequence-structure gap by structure predictions. Annu Rev Biophys Biomol Struct. 1996;25:113-36. doi: 10.1146/annurev.bb.25.060196.000553. PMID: 8800466. Review.
Cited by
- Template-based protein modeling: recent methodological advances. Curr Top Med Chem. 2010;10(1):84-94. doi: 10.2174/156802610790232314. PMID: 19929829. Free PMC article. Review.
- A generic method for assignment of reliability scores applied to solvent accessibility predictions. BMC Struct Biol. 2009 Jul 31;9:51. doi: 10.1186/1472-6807-9-51. PMID: 19646261. Free PMC article.
- Cell cycle kinases predicted from conserved biophysical properties. Proteins. 2009 Feb 15;74(3):655-68. doi: 10.1002/prot.22181. PMID: 18704950. Free PMC article.
- SCLpredT: Ab initio and homology-based prediction of subcellular localization by N-to-1 neural networks. Springerplus. 2013 Oct 3;2:502. doi: 10.1186/2193-1801-2-502. eCollection 2013. PMID: 24133649. Free PMC article.
- Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Brief Bioinform. 2018 May 1;19(3):482-494. doi: 10.1093/bib/bbw129. PMID: 28040746. Free PMC article.