EpiLoc: a (working) text-based system for predicting protein subcellular location
- PMID: 18229719
Abstract
Motivation: Predicting the subcellular location of proteins is an active research area, as a protein's location within the cell provides meaningful cues about its function. Previous attempts to use text for protein subcellular location prediction have varied in method, applicability and performance level. In earlier work, we used a preliminary text classification system and focused on integrating text features into a sequence-based classifier to improve location prediction performance.
Results: Here the focus shifts to the text-based component itself. We introduce EpiLoc, a comprehensive text-based localization system. We provide an in-depth study of text-feature selection, and study several new ways to associate text with proteins, so that text-based location prediction can be performed for practically any protein. We show that EpiLoc's performance is comparable to (and may even exceed) that of state-of-the-art sequence-based systems. EpiLoc is available at: http://epiloc.cs.queensu.ca.