Identifying anticancer peptides by using a generalized chaos game representation
- PMID: 30291366
- DOI: 10.1007/s00285-018-1279-x
Abstract
We generalize chaos game representation (CGR) to higher-dimensional spaces while maintaining its bijection, keeping the method sufficiently representative and mathematically rigorous compared to previous attempts. We first state and prove the asymptotic property of CGR and of our generalized chaos game representation (GCGR) method. It follows that the dissimilarity between sequences that share an identical subsequence at different positions decreases exponentially with the length of that subsequence; this effect had been at work without researchers being aware of it. By highlighting it, we show that the effect fundamentally supports (G)CGR as a similarity measure or feature extraction technique. We develop two feature extraction techniques: GCGR-Centroid and GCGR-Variance. We use GCGR-Centroid to analyze the similarity between protein sequences on three datasets: 9 ND5, 24 TF, and 50 beta-globin proteins. The results are consistent with previous studies, which supports the significance of the method. Finally, using support vector machines, we train an anticancer peptide prediction model on both GCGR-Centroid and GCGR-Variance features and achieve significantly higher prediction performance on three well-studied anticancer peptide datasets.
Keywords: Anticancer peptides; Chaos game representation; Similarity analysis; Support vector machine.
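The exponential effect described in the abstract can be illustrated with the classic two-dimensional CGR for nucleotide sequences. The sketch below is not the GCGR method of the paper, whose construction is not given in the abstract; it assumes the usual unit-square convention with one corner per nucleotide and a contraction ratio of 1/2, and the names `CORNERS`, `cgr_points`, and `endpoint_distance` are hypothetical. Under these assumptions, each shared symbol halves the distance between the two current points, so two sequences that end with the same k symbols finish within sqrt(2) / 2^k of each other regardless of what came before.

```python
import math

# Assumed corner assignment for classic nucleotide CGR (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the CGR trajectory: each step moves halfway toward the symbol's corner."""
    x, y = start
    points = []
    for ch in seq:
        cx, cy = CORNERS[ch]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def endpoint_distance(seq_a, seq_b):
    """Euclidean distance between the final CGR points of two sequences."""
    ax, ay = cgr_points(seq_a)[-1]
    bx, by = cgr_points(seq_b)[-1]
    return math.hypot(ax - bx, ay - by)


if __name__ == "__main__":
    shared = "GATTACA"  # identical trailing subsequence, length k = 7
    a = "ACGT" + shared
    b = "TTTTTT" + shared
    k = len(shared)
    # The distance is bounded by the unit-square diagonal divided by 2**k.
    print(endpoint_distance(a, b), "<=", math.sqrt(2) / 2**k)
```

The bound holds because both trajectories apply the same halving map for every shared symbol, so whatever separation existed before the shared subsequence is divided by 2 at each step.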