Comput Struct Biotechnol J. 2020 Jul 15;18:1904-1913.
doi: 10.1016/j.csbj.2020.07.004. eCollection 2020.

A novel numerical representation for proteins: Three-dimensional Chaos Game Representation and its Extended Natural Vector

Zeju Sun et al. Comput Struct Biotechnol J. 2020.

Abstract

Chaos Game Representation (CGR) was first proposed as an image representation method for DNA and has since been extended to other biological macromolecules. In the CGR image of DNA, a sequence is converted into a series of points in the unit square. By comparison, the existing CGR images of proteins are less elegant geometrically, and the implications of the distribution of points in the image are less obvious. In this study, by naturally distributing the twenty amino acids on the vertices of a regular dodecahedron, we introduce a novel three-dimensional image representation of protein sequences based on the CGR method. We also associate each CGR image with a vector in high-dimensional Euclidean space, called the extended natural vector (ENV), in order to analyze the information contained in the CGR images. Based on the results of protein classification and phylogenetic analysis, our method could serve as a precise method for discovering biological relationships between proteins.
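
The construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the alphabetical assignment of amino acids to vertices is hypothetical (the paper's actual layout is the one shown in Fig. 1), and the contraction ratio of 0.5 is borrowed from the classical DNA chaos game and may differ from the value used in the paper. The names `dodecahedron_vertices`, `VERTEX_OF` and `cgr_3d` are introduced here for illustration only.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2          # golden ratio
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def dodecahedron_vertices():
    """Return the 20 vertices of a regular dodecahedron on the unit sphere."""
    # Eight cube corners (+-1, +-1, +-1).
    verts = [tuple(float(c) for c in v)
             for v in itertools.product((-1, 1), repeat=3)]
    # Twelve further vertices: cyclic permutations of (0, +-1/phi, +-phi).
    for a, b in itertools.product((-1, 1), repeat=2):
        verts.append((0.0, a / PHI, b * PHI))
        verts.append((a / PHI, b * PHI, 0.0))
        verts.append((a * PHI, 0.0, b / PHI))
    scale = math.sqrt(3)  # every vertex above has norm sqrt(3)
    return [tuple(c / scale for c in v) for v in verts]


# ASSUMPTION: alphabetical assignment, not the layout used in the paper.
VERTEX_OF = dict(zip(AMINO_ACIDS, dodecahedron_vertices()))


def cgr_3d(sequence, ratio=0.5):
    """Map a protein sequence to a list of 3D chaos game points.

    Starting from the centre, each residue moves the current point a
    fraction `ratio` of the way towards that residue's vertex.
    ASSUMPTION: ratio=0.5 follows the classical DNA CGR.
    Non-standard residue codes (e.g. 'X') raise KeyError.
    """
    point = (0.0, 0.0, 0.0)
    points = []
    for residue in sequence:
        vertex = VERTEX_OF[residue]
        point = tuple(p + ratio * (v - p) for p, v in zip(point, vertex))
        points.append(point)
    return points


# The first five residues of KNG1_HUMAN, as used in Fig. 2(a).
points = cgr_3d("MKLIT")
```

Each point depends on the whole prefix read so far, with the most recent residue determining the small region around its vertex in which the point falls. This is why the spatial distribution of points reflects the residue and dipeptide composition of the sequence.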

Keywords: Chaos Game Representation; Extended Natural Vector; Protein classification; Three-dimensional CGR.

Figures

Fig. 1. Vertices with identical letters will coincide when folding into a regular dodecahedron. (a) Distribution of amino acids on the vertices of a regular dodecahedron (3-dimensional view). (b) Distribution of amino acids on the vertices of a regular dodecahedron (extended image).

Fig. 2. (a) Three-dimensional Chaos Game Representation (CGR) of the first five amino acids of KNG1_HUMAN (P01042), “MKLIT”. (b) Three-dimensional Chaos Game Representation (CGR) of KNG1_HUMAN (P01042).

Fig. 3. (a) The ball controlled by 20 amino acids. (b) The ball B_L and the balls controlled by dipeptides ‘ωL’.

Fig. 4. Three-dimensional CGR image of KNG1_HUMAN. (a) shows the location of points in the ball B, (b) is an enlargement of B_L, and (c) is an enlargement of B_GL.

Fig. 5. Three-dimensional CGR image of (a) KNG1_HUMAN (P01042), (b) KNG2_BOVIN (P01045) and (c) UROM_HUMAN (P07911).

Fig. 6. Balls controlled by dipeptides in group 7 of KNG1_HUMAN (P01042).

Fig. 7. Phylogenetic tree constructed by our method, NV method and traditional alignment method. (a) is the tree of our method, (b) is the tree of NV method and (c) is the tree of Clustal W method.

Fig. 8. Linear regression of the RMSD of a pair of protein structures and the ENV distance of the corresponding protein sequences, with the RMSD on the x-axis and the ENV distance on the y-axis. (a) Dataset 1: 8 serine hydroxymethyltransferase proteins. (b) Dataset 2: 8 response regulator proteins.
