BMC Bioinformatics. 2009 Jan 19:10:22.
doi: 10.1186/1471-2105-10-22.

Prediction of nuclear proteins using SVM and HMM models


Manish Kumar et al. BMC Bioinformatics.

Abstract

Background: The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance and segregation, gene expression, RNA processing and export, and many other processes. Several methods for predicting nuclear proteins have been developed in the past. The aim of the present study is to develop a new method that predicts nuclear proteins with higher accuracy.

Results: All modules were trained and tested on a non-redundant dataset and evaluated using five-fold cross-validation. First, Support Vector Machine (SVM) based modules were developed using amino acid and dipeptide compositions; they achieved Matthews correlation coefficients (MCC) of 0.59 and 0.61, respectively. Second, SVM modules were developed using split amino acid composition (SAAC) and achieved a maximum MCC of 0.66. Third, a hidden Markov model (HMM) based module/profile was developed for searching exclusively nuclear and non-nuclear domains in a protein. Finally, a hybrid module was developed by combining the SVM module and the HMM profile; it achieved an MCC of 0.87 with an accuracy of 94.61%. This method performs better than existing methods when evaluated on blind/independent datasets. Our method estimated 31.51%, 21.89%, 26.31%, 25.72% and 24.95% of the proteins as nuclear proteins in the Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, mouse and human proteomes, respectively. Based on the above modules, we have developed a web server, NpPred, for predicting nuclear proteins: http://www.imtech.res.in/raghava/nppred/.
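The MCC and accuracy figures reported above are both derived from the counts of a binary confusion matrix. As a minimal sketch, assuming "nuclear" is the positive class, the two measures could be computed as follows. The function names and the example counts are illustrative assumptions and do not come from the study.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here; the abstract does not say how this case is handled).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    tp, tn, fp, fn = 45, 45, 5, 5
    print(f"MCC = {mcc(tp, tn, fp, fn):.2f}")
    print(f"Accuracy = {accuracy(tp, tn, fp, fn):.2%}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the nuclear and non-nuclear classes differ in size.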

Conclusion: This study describes a highly accurate method for predicting nuclear proteins. An SVM module has been developed for the first time using SAAC for predicting nuclear proteins, where the amino acid compositions of the N-terminus and of the remaining protein were computed separately. In addition, our study is the first to identify exclusively nuclear and non-nuclear domains and to use them for predicting nuclear proteins. Combining both approaches improved the performance of the method further.
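The split amino acid composition described above can be illustrated with a short sketch: the composition of the N-terminal segment and that of the remaining sequence are computed separately and concatenated into one 40-dimensional feature vector. The abstract does not state the split length or the scaling, so the 25-residue N-terminal window, the percentage scaling, and the function names below are all assumptions made for illustration only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Percentage of each standard amino acid in seq (20 values).

    Non-standard characters are ignored; an empty segment gives zeros.
    """
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * c / total for c in counts]


def split_composition(seq, n_term_len=25):
    """Concatenate N-terminal and remainder compositions (40 values).

    n_term_len=25 is an assumed window, not a value from the abstract.
    """
    return composition(seq[:n_term_len]) + composition(seq[n_term_len:])


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    features = split_composition("M" * 30 + "A" * 10)
    print(len(features))
```

A vector of this kind could then be passed to any SVM implementation. Keeping the two segments separate preserves positional information, such as N-terminal sequence preferences, that a whole-protein composition averages away.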


Figures

Figure 1
Average amino acid composition of nuclear and non-nuclear protein sequences of the main dataset.
Figure 2
Variation in the preference of amino acids at the N-terminal 25 residues, the C-terminal 25 residues and the full length of nuclear and non-nuclear proteins.
Figure 3
Prediction of nuclear proteins using NpPred in the proteomes of Yeast (S. cerevisiae), Worm (C. elegans), Fly (D. melanogaster), Mouse (M. musculus) and Human (H. sapiens).


