Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition
- PMID: 16213466
- DOI: 10.1016/j.bbrc.2005.09.117
Abstract
The nucleus is the brain of the eukaryotic cell, guiding its life processes by issuing key instructions. For an in-depth understanding of the biochemical processes of the nucleus, knowledge of the localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, an automated method is highly desirable for rapidly annotating the subnuclear locations of the many newly found nuclear protein sequences, so that they can be used in a timely manner for basic research and drug discovery. In view of this, a novel approach is developed for predicting protein subnuclear location. It introduces a powerful classifier, the optimized evidence-theoretic K-nearest neighbor (OET-KNN) classifier, and represents protein samples by the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates obtained in both the re-substitution test and the jackknife cross-validation test are significantly higher than those of existing classifiers on the same working dataset. It is anticipated that this approach may also become a useful high-throughput tool for bridging the huge gap that has opened in the post-genomic era between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.
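To make the idea of a composition that also carries sequence-order information concrete, the following is a minimal Python sketch of a pseudo amino acid composition vector in the spirit of the cited Chou (2001) formulation. It is an illustration under stated assumptions, not the authors' implementation: it uses a single physicochemical property (the Kyte-Doolittle hydropathy scale) where the original formulation may combine several, and the function name `pse_aac`, the number of correlation tiers `lam`, and the weight `w` are hypothetical choices made here for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values. Using only this one property is an
# assumption of this sketch, not something stated in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    mu = mean(scale.values())
    sd = pstdev(scale.values())
    return {aa: (value - mu) / sd for aa, value in scale.items()}


H = _standardize(KYTE_DOOLITTLE)


def pse_aac(seq, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector.

    The first 20 components reflect the ordinary amino acid composition.
    The last `lam` components are sequence-order correlation factors:
    component k averages the squared property difference between residues
    that are k positions apart. `lam` and `w` are illustrative defaults.
    """
    seq = seq.upper()
    if any(aa not in H for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")

    counts = Counter(seq)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]

    # k-th tier correlation factor over all residue pairs (i, i + k).
    thetas = [
        mean((H[seq[i + k]] - H[seq[i]]) ** 2 for i in range(n - k))
        for k in range(1, lam + 1)
    ]

    # Joint normalization so that all 20 + lam components sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Two sequences with identical amino acid composition but different residue order, such as `"AAAALLLL"` and `"ALALALAL"`, receive different correlation components, which is the sequence-order effect that a plain 20-dimensional composition cannot capture. A vector of this kind could then be passed to any distance-based classifier.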