A novel numerical representation for proteins: Three-dimensional Chaos Game Representation and its Extended Natural Vector
- PMID: 32774785
- PMCID: PMC7390779
- DOI: 10.1016/j.csbj.2020.07.004
Abstract
Chaos Game Representation (CGR) was first proposed as an image representation method for DNA and has since been extended to other biological macromolecules. Compared with CGR images of DNA, in which sequences are converted into a series of points in the unit square, the existing CGR images of proteins are less geometrically elegant, and the implications of the point distribution in the image are less obvious. In this study, by naturally distributing the twenty amino acids on the vertices of a regular dodecahedron, we introduce a novel three-dimensional image representation of protein sequences based on the CGR method. We also associate each CGR image with a vector in high-dimensional Euclidean space, called the extended natural vector (ENV), in order to analyze the information contained in the CGR images. Based on the results of protein classification and phylogenetic analysis, our method could serve as a precise method for discovering biological relationships between proteins.
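The construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which amino acid is placed on which vertex, what contraction ratio is used, or where the iteration starts, so the alphabetical assignment in `AMINO_ACIDS`, the ratio of 0.5 (borrowed from the classic DNA CGR), and the start at the center of the solid are all assumptions made for illustration. The function names are hypothetical.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

# Assumption: alphabetical one-letter codes mapped to vertices in generation order.
# The paper's actual amino-acid-to-vertex assignment is not given in the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dodecahedron_vertices():
    """Return the 20 vertices of a regular dodecahedron centered at the origin."""
    inv = 1 / PHI
    # Eight cube vertices (+-1, +-1, +-1).
    verts = [tuple(map(float, s)) for s in itertools.product((-1, 1), repeat=3)]
    # Twelve further vertices: cyclic permutations of (0, +-1/phi, +-phi).
    for a, b in itertools.product((-inv, inv), (-PHI, PHI)):
        verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    return verts


def cgr_points(sequence, ratio=0.5):
    """Map a protein sequence to a list of 3D chaos-game points.

    Each residue moves the current point a fraction `ratio` of the way
    toward that residue's vertex. Both `ratio` and the starting point
    (the origin) are illustrative assumptions.
    """
    vertex = dict(zip(AMINO_ACIDS, dodecahedron_vertices()))
    point = (0.0, 0.0, 0.0)
    points = []
    for residue in sequence.upper():
        if residue not in vertex:
            continue  # skip non-standard symbols such as X or B
        target = vertex[residue]
        point = tuple(p + ratio * (t - p) for p, t in zip(point, target))
        points.append(point)
    return points
```

For example, `cgr_points("MKV")` returns three points, one per residue. Because every step is a convex combination of the previous point and a vertex, all points stay inside the dodecahedron. Turning the resulting point cloud into the extended natural vector requires the definitions in the full text and is not attempted here.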
Keywords: Chaos Game Representation; Extended Natural Vector; Protein classification; Three-dimensional CGR.
© 2020 The Author(s).