Prediction of nuclear proteins using SVM and HMM models
- PMID: 19152693
- PMCID: PMC2632991
- DOI: 10.1186/1471-2105-10-22
Abstract
Background: The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance and segregation, gene expression, RNA processing and export, and many other processes. Several methods for predicting nuclear proteins have been developed in the past. The aim of the present study is to develop a new method that predicts nuclear proteins with higher accuracy.
Results: All modules were trained and tested on a non-redundant dataset and evaluated using five-fold cross-validation. First, Support Vector Machine (SVM) based modules were developed using amino acid and dipeptide compositions; they achieved Matthews correlation coefficients (MCC) of 0.59 and 0.61, respectively. Second, SVM modules were developed using split amino acid composition (SAAC) and achieved a maximum MCC of 0.66. Third, a hidden Markov model (HMM) based module/profile was developed for searching for exclusively nuclear and non-nuclear domains in a protein. Finally, a hybrid module was developed by combining the SVM module and the HMM profile; it achieved an MCC of 0.87 with an accuracy of 94.61%. This method performs better than the existing methods when evaluated on blind/independent datasets. Our method estimated 31.51%, 21.89%, 26.31%, 25.72% and 24.95% of the proteins as nuclear proteins in the Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, mouse and human proteomes, respectively. Based on the above modules, we have developed a web server, NpPred, for predicting nuclear proteins (http://www.imtech.res.in/raghava/nppred/).
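The composition features and the MCC score named in the Results can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes the common definitions (compositions as fractions of standard residues or residue pairs, and the usual confusion-matrix formula for MCC), and the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue fractions.

    Assumption: non-standard characters are ignored.
    """
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair fractions.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Vectors like these would typically be passed to an SVM as fixed-length inputs; the kernel and parameters used in the study are not stated in the abstract and are not assumed here.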
Conclusion: This study describes a highly accurate method for predicting nuclear proteins. For the first time, an SVM module for predicting nuclear proteins has been developed using SAAC, in which the amino acid compositions of the N-terminus and of the remaining protein were computed separately. In addition, our study is the first to identify exclusively nuclear and non-nuclear domains and to use them for predicting nuclear proteins. Performance improved further when the two approaches were combined.
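The split amino acid composition described above, with the N-terminus and the remaining protein treated separately, could be sketched as follows. This is an illustrative Python sketch only: the abstract does not state the N-terminal length, so `n_term_len` and its default of 25 are hypothetical, as are the function names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment):
    """Fractions of the 20 standard residues in one segment."""
    residues = [c for c in segment.upper() if c in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def split_composition(seq, n_term_len=25):
    """Concatenate the N-terminal composition with that of the remainder.

    n_term_len is a hypothetical parameter; the value used in the study
    is not given in the abstract.
    """
    n_term, rest = seq[:n_term_len], seq[n_term_len:]
    # 20 values for the N-terminus followed by 20 for the remaining protein.
    return composition(n_term) + composition(rest)
```

Splitting the sequence this way lets a classifier see a compositional bias that is confined to the N-terminus, which a whole-protein composition would average out.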