Genetic programming for creating Chou's pseudo amino acid based features for submitochondria localization
- PMID: 18175047
- DOI: 10.1007/s00726-007-0018-1
Abstract
For a protein localized in the mitochondria, knowing its submitochondrial location is important for understanding its function. In this work, we propose a submitochondria localizer whose feature extraction method is based on Chou's pseudo-amino acid composition. The pseudo-amino acid based features are obtained by combining pseudo-amino acid compositions with hundreds of amino-acid indices and amino-acid substitution matrices; from this large set of features, a small set of 15 "artificial" features is then created. The feature creation is performed by genetic programming, which combines one or more "original" features by means of mathematical operators. Finally, the set of combined features is used to train a radial basis function support vector machine. This method is named GP-Loc. We also propose a method with very few parameters, named ALL-Loc, in which all the "original" features are used to train a linear support vector machine. The overall prediction accuracy obtained by GP-Loc is 89% under jackknife cross-validation, exceeding the accuracy reported in the literature (85.2%) on the same dataset. The overall prediction accuracy obtained by ALL-Loc is 83.9%.
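As a rough illustration of the kind of "original" feature the abstract refers to, the following sketch computes a pseudo-amino acid composition vector for one sequence from a single amino-acid index. It is not the authors' implementation: the function names, the choice of the Kyte-Doolittle hydropathy scale as the example index, and the defaults `lam=2` and `weight=0.05` are assumptions made here for illustration only. The genetic programming and support vector machine stages are not shown.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale, used only as an example amino-acid index
# (an assumption; the abstract mentions hundreds of indices without naming them).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(HYDROPATHY)


def normalize(index):
    """Standardize an index to zero mean and unit variance over the 20 residues."""
    mu = mean(index.values())
    sigma = pstdev(index.values())
    return {aa: (value - mu) / sigma for aa, value in index.items()}


def pseudo_aac(sequence, indices, lam=2, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    The first 20 components are weighted residue frequencies; the last `lam`
    components are weighted sequence-order correlation factors.
    """
    n = len(sequence)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    if any(residue not in HYDROPATHY for residue in sequence):
        raise ValueError("sequence contains a non-standard residue")

    props = [normalize(index) for index in indices]

    # theta_k: mean squared property difference between residues k positions apart
    thetas = []
    for k in range(1, lam + 1):
        total = 0.0
        for i in range(n - k):
            a, b = sequence[i], sequence[i + k]
            total += mean((p[a] - p[b]) ** 2 for p in props)
        thetas.append(total / (n - k))

    freqs = [sequence.count(aa) / n for aa in AMINO_ACIDS]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the paper's dataset.
    vector = pseudo_aac("MKVLAAGIVGLLLAQ", [HYDROPATHY], lam=3)
    print(len(vector))
```

Repeating the call with a different index, or with several indices at once, yields further feature vectors; a pool of this kind is what a feature-combination step such as the genetic programming stage described above would draw from.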