Predicting protein subcellular location by fusing multiple classifiers
- PMID: 16639720
- DOI: 10.1002/jcb.20879
Abstract
One of the fundamental goals in cell biology and proteomics is to identify the functions of proteins in the context of the compartments that organize them in the cellular environment. Knowledge of the subcellular locations of proteins can provide key hints for revealing their functions and for understanding how they interact with each other in cellular networks. Unfortunately, determining the localization of an uncharacterized protein in a living cell purely by experiment is both time-consuming and expensive. With the avalanche of newly found protein sequences emerging in the post-genomic era, we face a critical challenge: how to develop an automated method that identifies their subcellular locations quickly and reliably, so that they can be used in a timely manner for basic research and drug discovery. In view of this, an ensemble classifier was developed by fusing many basic individual classifiers through a voting system. Each of these basic classifiers was trained in a different dimension of the amphiphilic pseudo amino acid composition (Chou [2005] Bioinformatics 21: 10-19). As a demonstration, predictions were performed with the fusion classifier for proteins among the following 14 localizations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cytoplasm, (5) cytoskeleton, (6) endoplasmic reticulum, (7) extracellular, (8) Golgi apparatus, (9) lysosome, (10) mitochondria, (11) nucleus, (12) peroxisome, (13) plasma membrane, and (14) vacuole. The overall success rates obtained via the resubstitution test, jackknife test, and independent dataset test were all significantly higher than those of the existing classifiers. It is anticipated that the novel ensemble classifier may also become a very useful vehicle for classifying other attributes of proteins according to their sequences, such as membrane protein type, enzyme family/sub-family, G-protein coupled receptor (GPCR) type, and structural class, among many others.
The fusion ensemble classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.
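The voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the voting rule, the vote weights, or the type of basic classifier, so everything below is an assumption made for illustration: the function name `fuse_by_vote`, the optional per-classifier weights, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def fuse_by_vote(predictions: Sequence[str],
                 weights: Optional[Sequence[float]] = None) -> str:
    """Return the label with the largest (weighted) vote total.

    predictions[i] is the location predicted by the i-th basic classifier.
    weights[i] is that classifier's vote weight (assumed 1.0 when omitted).
    Ties go to the label that appears first in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one basic prediction")
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("weights and predictions must have equal length")

    # Accumulate the vote total for each predicted location.
    totals: Dict[str, float] = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # Dicts preserve insertion order, so max() resolves ties in favor of
    # the label seen first.
    return max(totals, key=totals.get)


# Hypothetical outputs of five basic classifiers for one query protein,
# each imagined as trained on a different feature dimension.
votes = ["nucleus", "cytoplasm", "nucleus", "mitochondria", "nucleus"]
fused = fuse_by_vote(votes)
print(fused)
```

With equal weights this reduces to a plain majority vote; passing `weights` lets a more trusted basic classifier count for more, which is one possible design choice rather than the published one.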
Copyright 2006 Wiley-Liss, Inc.