Supervised learning method for the prediction of subcellular localization of proteins using amino acid and amino acid pair composition
- PMID: 18366605
- PMCID: PMC2386058
- DOI: 10.1186/1471-2164-9-S1-S16
Abstract
Background: Knowing where a protein occurs in the cell is an important step in understanding its function. It is highly desirable to predict a protein's subcellular location automatically from its sequence. The most studied approaches to predicting subcellular localization rely on signal peptides, on sequence homology, and on the correlation between location and the total amino acid composition of proteins. Taking both amino acid composition and amino acid pair composition into consideration helps improve prediction accuracy.
Results: We constructed a dataset of protein sequences from the SWISS-PROT database and divided them into 12 classes based on their subcellular locations. SVM modules were trained to predict subcellular location from amino acid composition and amino acid pair composition. Results were calculated using 10-fold cross-validation. The Radial Basis Function (RBF) kernel outperformed the polynomial and linear kernels. Total prediction accuracy reached 71.8% for amino acid composition and 77.0% for amino acid pair composition. To observe the impact of the number of subcellular locations, we constructed two more datasets with nine and five subcellular locations. Total accuracy improved further, to 79.9% and 85.66%.
Conclusions: We present a new SVM-based approach that uses amino acid and amino acid pair composition. The results show that data simulation and taking more protein features into consideration improve accuracy considerably. We also found that the dataset needs to be constructed to take account of the distribution of data across all the classes.
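The two feature types named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define "amino acid pair", so the sketch assumes a pair is two adjacent residues (giving 400 ordered pairs), skips any residue outside the 20 standard amino acids, and uses hypothetical function names. The `rbf_kernel` helper shows the standard radial basis function formula; its `gamma` default is an arbitrary placeholder, not a value from the paper.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard amino acid (assumed definition)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def pair_composition(seq):
    """Return 400 fractions, one per ordered pair of adjacent residues.

    Assumption: a "pair" means two neighbouring residues; pairs that
    contain a non-standard residue are skipped.
    """
    s = seq.upper()
    pairs = [
        s[i:i + 2]
        for i in range(len(s) - 1)
        if s[i] in AMINO_ACIDS and s[i + 1] in AMINO_ACIDS
    ]
    n = len(pairs)
    if n == 0:
        return [0.0] * (len(AMINO_ACIDS) ** 2)
    counts = {}
    for p in pairs:
        counts[p] = counts.get(p, 0) + 1
    return [counts.get(a + b, 0) / n for a, b in product(AMINO_ACIDS, repeat=2)]


def rbf_kernel(x, y, gamma=1.0):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

For a toy sequence such as "ACAC", `amino_acid_composition` gives a 20-element vector and `pair_composition` a 400-element vector, each summing to 1. Vectors like these would be the inputs to an SVM, with `rbf_kernel` measuring the similarity between two proteins.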
Similar articles
- Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs. Bioinformatics. 2003 Sep 1;19(13):1656-63. doi: 10.1093/bioinformatics/btg222. PMID: 12967962
- pSLIP: SVM based protein subcellular localization prediction using multiple physicochemical properties. BMC Bioinformatics. 2005 Jun 17;6:152. doi: 10.1186/1471-2105-6-152. PMID: 15963230 Free PMC article.
- A complexity-based method for predicting protein subcellular location. Amino Acids. 2009 Jul;37(2):427-33. doi: 10.1007/s00726-008-0172-0. Epub 2008 Aug 22. PMID: 18719852
- Supervised ensembles of prediction methods for subcellular localization. J Bioinform Comput Biol. 2009 Apr;7(2):269-85. doi: 10.1142/s0219720009004072. PMID: 19340915 Review.
- Computational protein function prediction: are we making progress? Cell Mol Life Sci. 2007 Oct;64(19-20):2505-11. doi: 10.1007/s00018-007-7211-y. PMID: 17611711 Free PMC article. Review.
Cited by
- Genomics, molecular imaging, bioinformatics, and bio-nano-info integration are synergistic components of translational medicine and personalized healthcare research. BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):I1. doi: 10.1186/1471-2164-9-S2-I1. PMID: 18831773 Free PMC article.
- Promoting synergistic research and education in genomics and bioinformatics. BMC Genomics. 2008;9 Suppl 1(Suppl 1):I1. doi: 10.1186/1471-2164-9-S1-I1. PMID: 18366597 Free PMC article. Review.
- ESLpred2: improved method for predicting subcellular localization of eukaryotic proteins. BMC Bioinformatics. 2008 Nov 28;9:503. doi: 10.1186/1471-2105-9-503. PMID: 19038062 Free PMC article.
- PlantLoc: an accurate web server for predicting plant protein subcellular localization by substantiality motif. Nucleic Acids Res. 2013 Jul;41(Web Server issue):W441-7. doi: 10.1093/nar/gkt428. Epub 2013 May 31. PMID: 23729470 Free PMC article.
- Subcellular location prediction of apoptosis proteins using two novel feature extraction methods based on evolutionary information and LDA. BMC Bioinformatics. 2020 May 24;21(1):212. doi: 10.1186/s12859-020-3539-1. PMID: 32448129 Free PMC article.