Identification of RNA-binding sites in proteins by integrating various sequence information
- PMID: 20549269
- DOI: 10.1007/s00726-010-0639-7
Abstract
RNA-protein interactions play a pivotal role in various biological processes, such as mRNA processing, protein synthesis, and the assembly and function of the ribosome. In this work, we introduce a computational method, based on support vector machines, for predicting RNA-binding sites in proteins. It uses a variety of features derived from amino acid sequence information, including position-specific scoring matrix (PSSM) profiles, physicochemical properties and predicted solvent accessibility. To account for the influence of the residues surrounding an amino acid and the dependency effect of neighboring amino acids, a sliding window and a smoothing window are used to encode the PSSM profiles. Evaluated by outer fivefold cross-validation on a data set of 77 RNA-binding proteins (RBP77), the method achieves an overall accuracy of 88.66% with a Matthews correlation coefficient (MCC) of 0.69. On an independent data set of 39 RNA-binding proteins (RBP39), used to further evaluate performance, it achieves an overall accuracy of 82.36% with an MCC of 0.44. These results show that our method generalizes well when predicting RNA-binding sites for novel proteins. Compared with previous methods, our method performs well on the same data set. The prediction results suggest that the features used are effective in predicting RNA-binding sites in proteins. The code and all data sets used in this article are freely available at http://cic.scu.edu.cn/bioinformatics/Predict_RBP.rar .
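The window-based encoding and the MCC metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the window sizes, the choice of averaging inside the smoothing window, truncation at the termini, and zero padding in the sliding window are all assumptions made for illustration. The resulting per-residue vectors would then be passed to a support vector machine.

```python
import math

def smooth_pssm(pssm, smooth_size=3):
    """Replace each PSSM row by the mean of the rows inside a centred window.

    Assumption: the window is truncated at the sequence termini and the
    rows are averaged; the paper's exact smoothing scheme may differ.
    """
    half = smooth_size // 2
    n, width = len(pssm), len(pssm[0])
    smoothed = []
    for i in range(n):
        rows = pssm[max(0, i - half):min(n, i + half + 1)]
        smoothed.append([sum(r[j] for r in rows) / len(rows) for j in range(width)])
    return smoothed

def window_features(pssm, window_size=5):
    """Concatenate the rows inside a sliding window centred on each residue.

    Assumption: positions that fall outside the sequence are zero-padded.
    """
    half = window_size // 2
    n, width = len(pssm), len(pssm[0])
    pad = [0.0] * width
    features = []
    for i in range(n):
        vec = []
        for k in range(i - half, i + half + 1):
            vec.extend(pssm[k] if 0 <= k < n else pad)
        features.append(vec)
    return features

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy PSSM: 4 residues x 2 columns (a real PSSM has 20 columns per residue).
toy_pssm = [[1, 2], [3, 4], [5, 6], [7, 8]]
toy_features = window_features(smooth_pssm(toy_pssm, smooth_size=3), window_size=3)
```

Each residue ends up with `window_size * width` values, so a 20-column PSSM and a window of 5 would give 100 features per residue under these assumptions.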
Similar articles
- Prediction of protein-RNA binding sites by a random forest method with combined features. Bioinformatics. 2010 Jul 1;26(13):1616-22. doi: 10.1093/bioinformatics/btq253. Epub 2010 May 18. PMID: 20483814
- Prediction of RNA-binding residues in proteins from primary sequence using an enriched random forest model with a novel hybrid feature. Proteins. 2011 Apr;79(4):1230-9. doi: 10.1002/prot.22958. Epub 2011 Jan 25. PMID: 21268114
- PRINTR: prediction of RNA binding sites in proteins using SVM and profiles. Amino Acids. 2008 Aug;35(2):295-302. doi: 10.1007/s00726-007-0634-9. Epub 2008 Jan 31. PMID: 18235992
- Computational methods for prediction of protein-RNA interactions. J Struct Biol. 2012 Sep;179(3):261-8. doi: 10.1016/j.jsb.2011.10.001. Epub 2011 Oct 12. PMID: 22019768 Review.
- Computational modeling of protein-RNA complex structures. Methods. 2014 Feb;65(3):310-9. doi: 10.1016/j.ymeth.2013.09.014. Epub 2013 Sep 29. PMID: 24083976 Review.
Cited by
- Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features. BMC Bioinformatics. 2016 Aug 24;17(1):316. doi: 10.1186/s12859-016-1185-4. PMID: 27553667 Free PMC article.
- RPI-Bind: a structure-based method for accurate identification of RNA-protein binding sites. Sci Rep. 2017 Apr 4;7(1):614. doi: 10.1038/s41598-017-00795-4. PMID: 28377624 Free PMC article.
- Comprehensive Survey and Comparative Assessment of RNA-Binding Residue Predictions with Analysis by RNA Type. Int J Mol Sci. 2020 Sep 19;21(18):6879. doi: 10.3390/ijms21186879. PMID: 32961749 Free PMC article. Review.
- Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences. Brief Bioinform. 2024 Nov 22;26(1):bbaf016. doi: 10.1093/bib/bbaf016. PMID: 39833102 Free PMC article. Review.
- A Large-Scale Assessment of Nucleic Acids Binding Site Prediction Programs. PLoS Comput Biol. 2015 Dec 17;11(12):e1004639. doi: 10.1371/journal.pcbi.1004639. eCollection 2015 Dec. PMID: 26681179 Free PMC article.