Front Microbiol. 2024 Mar 20;15:1339156.
doi: 10.3389/fmicb.2024.1339156. eCollection 2024.

CGRWDL: alignment-free phylogeny reconstruction method for viruses based on chaos game representation weighted by dynamical language model

Ting Wang et al. Front Microbiol.

Abstract

Traditional alignment-based methods face serious challenges in genome sequence comparison and phylogeny reconstruction because of their high computational complexity. Here, we propose a new alignment-free method for analyzing the phylogenetic relationships (classification) among species. In our method, the dynamical language (DL) model and the chaos game representation (CGR) method are used to characterize the frequency information and the context information of k-mers in a sequence, respectively. For each DNA or protein sequence in a dataset, our method converts the sequence into a feature vector, based on CGR weighted by the DL model, that represents the sequence information and is used to infer phylogenetic relationships. We name our method CGRWDL. Its performance was tested by constructing phylogenetic trees from both DNA and protein sequences in 8 virus datasets. For each dataset, we compared the Robinson-Foulds (RF) distance between the tree constructed by CGRWDL and the reference tree with the RF distances obtained by other advanced methods. The results show that the phylogenetic trees constructed by CGRWDL can accurately classify the viruses, and that the RF distances between these trees and the reference trees are smaller than those obtained with other methods.
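As an illustration of the two ingredients named in the abstract, the sketch below computes plain CGR coordinates and raw k-mer counts for a DNA sequence. It is a minimal sketch, not the CGRWDL implementation: the corner assignment (A, C, G, T on the unit square), the starting point at the centre, and the function names `cgr_points` and `kmer_counts` are assumptions made for this example, and the DL-model weighting is not reproduced because the abstract does not give its formula.

```python
from collections import Counter

# Assumed corner assignment on the unit square; the paper may use another layout.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory of a DNA sequence as a list of (x, y) points."""
    x, y = 0.5, 0.5  # assumed starting point: centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Each new point is the midpoint between the previous point and
        # the corner assigned to the current nucleotide.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence (raw counts, no DL weighting)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    print(cgr_points("ATGC"))
    print(kmer_counts("ATGCAT", 2))
```

Because each step halves the distance to a corner, the last point of the trajectory encodes the recent context of the sequence, which is why CGR is commonly used to capture k-mer context information.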

Keywords: alignment-free method; chaos game representation; dynamical language model; k-mers; virus phylogeny reconstruction.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. (A) CGR generation process (e.g., ATGC). (B) CGR of the complete DNA sequence of HRV: DQ473505.1.
Figure 2. CAFV values of the eight datasets as a function of k-mer length.
Figure 3. Normalized RF distance between the constructed phylogenetic tree and the reference tree for the eight datasets at different k-mer lengths.
Figure 4. Phylogenetic tree of HIV-1 complete protein-coding DNA sequences constructed by CGRWDL (k = 8).
Figure 5. Phylogenetic tree of HRV complete protein sequences constructed by CGRWDL (k = 4).
Figure 6. Phylogenetic tree of Coronaviruses complete DNA sequences constructed by CGRWDL (k = 10).
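
The RF distance used for evaluation counts the bipartitions (splits) that appear in one tree but not the other. The following is a minimal sketch under stated assumptions: trees are written as nested tuples of leaf-name strings, both trees share the same leaf set, and the normalization divides by the total number of non-trivial splits in the two trees. The function names and the normalization choice are assumptions for illustration and may differ from the procedure used in the paper.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial splits of a tree, treating it as unrooted."""
    all_leaves = leaves(tree)
    ref = min(all_leaves)  # fixed reference leaf to canonicalize each split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Store the side of the split that does not contain the reference leaf.
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


def normalized_rf(t1, t2):
    """RF distance divided by the total number of non-trivial splits (assumed)."""
    s1, s2 = splits(t1), splits(t2)
    total = len(s1) + len(s2)
    return len(s1 ^ s2) / total if total else 0.0


if __name__ == "__main__":
    built = ((("A", "B"), "C"), ("D", "E"))
    reference = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(built, reference), normalized_rf(built, reference))
```

A normalized value of 0 means the two topologies agree on every split, and 1 means they share none, so smaller values indicate a tree closer to the reference.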
