Accurate prediction of solvent accessibility using neural networks-based regression
- PMID: 15281128
- DOI: 10.1002/prot.20176
Abstract
Accurate prediction of relative solvent accessibilities (RSAs) of amino acid residues in proteins may be used to facilitate protein structure prediction and functional annotation. Toward that goal we developed a novel method for improved prediction of RSAs. Contrary to other machine learning-based methods from the literature, we do not impose a classification problem with arbitrary boundaries between the classes. Instead, we seek a continuous approximation of the real-value RSA using nonlinear regression, with several feed-forward and recurrent neural networks, which are then combined into a consensus predictor. A set of 860 protein structures derived from the PFAM database was used for training, whereas validation of the results was carefully performed on several nonredundant control sets comprising a total of 603 structures that were derived from new Protein Data Bank structures and had no homology to proteins included in the training. Two classes of alternative predictors were developed for comparison with the regression-based approach: one based on the standard classification approach and the other based on a semicontinuous approximation with the so-called thermometer encoding. Furthermore, a weighted approximation, with errors being scaled by the observed levels of variability in RSA for equivalent residues in families of homologous structures, was applied in order to improve the results. The effects of including evolutionary profiles and the growth of sequence databases were assessed. In accord with the observed levels of variability in RSA for different ranges of RSA values, the regression accuracy is higher for buried than for exposed residues, with overall mean absolute errors of 15.3-15.8% and correlation coefficients between the predicted and experimental values of 0.64-0.67 on different control sets.
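The thermometer encoding mentioned above can be illustrated with a minimal sketch. The abstract does not state how many output units or which level boundaries were used, so the ten equally spaced levels, the function names, and the 0.5 activation cutoff below are assumptions made purely for illustration, not the authors' implementation.

```python
def thermometer_encode(rsa_percent, n_units=10):
    """Encode an RSA value in [0, 100] as n_units binary targets.

    Unit k (0-based) is switched on when the value reaches level
    (k + 1) * 100 / n_units, so larger RSA values light up more units.
    The number of units is an illustrative assumption.
    """
    if not 0.0 <= rsa_percent <= 100.0:
        raise ValueError("RSA must be a percentage in [0, 100]")
    step = 100.0 / n_units
    return [1 if rsa_percent >= (k + 1) * step else 0 for k in range(n_units)]


def thermometer_decode(units, n_units=10, cutoff=0.5):
    """Map (possibly real-valued) unit activations back to an RSA percentage.

    Counts the units whose activation reaches the cutoff and returns the
    lower edge of the corresponding level. The cutoff is an assumption.
    """
    step = 100.0 / n_units
    active = sum(1 for u in units if u >= cutoff)
    return active * step


if __name__ == "__main__":
    code = thermometer_encode(25.0)
    print(code)                      # binary target vector for 25% RSA
    print(thermometer_decode(code))  # semicontinuous reconstruction
```

Because the decoded value can only take `n_units + 1` distinct levels, this representation sits between a hard two-class split and full real-valued regression, which is the sense in which the abstract calls it semicontinuous.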
The new method outperforms classification-based algorithms when the real value predictions are projected onto two-class classification problems with several commonly used thresholds to separate exposed and buried residues. For example, classification accuracy of about 77% is consistently achieved on all control sets with a threshold of 25% RSA. A web server that enables RSA prediction using the new method and provides customizable graphical representation of the results is available at http://sable.cchmc.org.
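The evaluation measures quoted in the abstract (mean absolute error, correlation coefficient, and two-class accuracy after thresholding real-valued predictions) can be sketched as follows. The function names and the toy values are hypothetical, and treating a residue as exposed when its RSA is at or above the threshold is an assumed convention; the abstract does not say whether the boundary is inclusive.

```python
import math


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed RSA (in % units)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def two_class_accuracy(predicted, observed, threshold=25.0):
    """Project real-valued RSAs onto buried/exposed classes and score agreement.

    A residue counts as exposed when RSA >= threshold (assumed convention).
    """
    hits = sum(
        (p >= threshold) == (o >= threshold)
        for p, o in zip(predicted, observed)
    )
    return hits / len(observed)


if __name__ == "__main__":
    # Made-up RSA percentages for four residues, for illustration only.
    predicted = [10.0, 30.0, 50.0, 20.0]
    observed = [5.0, 40.0, 45.0, 30.0]
    print(mean_absolute_error(predicted, observed))
    print(pearson_r(predicted, observed))
    print(two_class_accuracy(predicted, observed, threshold=25.0))
```

Changing `threshold` re-scores the same real-valued predictions against a different buried/exposed boundary without retraining, which is what makes a regression output convenient to compare with classifiers built for a fixed threshold.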
Copyright 2004 Wiley-Liss, Inc.