Gene. 2014 Aug 1;546(1):25-34.
doi: 10.1016/j.gene.2014.05.043. Epub 2014 May 22.

K-mer natural vector and its application to the phylogenetic analysis of genetic sequences

Jia Wen et al. Gene.

Abstract

Based on the well-known k-mer model, we propose a k-mer natural vector model that represents a genetic sequence by the numbers and distributions of the k-mers it contains. We show that there is a one-to-one correspondence between a genetic sequence and its associated k-mer natural vector. The k-mer natural vector method can be used easily and quickly to perform phylogenetic analysis of genetic sequences without requiring evolutionary models or human intervention. Whole or partial genomes can be handled more effectively with the proposed method. We apply it to the phylogenetic analysis of genetic sequences, and the results obtained demonstrate that the k-mer natural vector method is a powerful tool for analysing and annotating genetic sequences and for determining evolutionary relationships, in terms of both accuracy and efficiency.
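
The abstract does not give formulas, so the following is only a minimal sketch of what a vector built from "the numbers and distributions of k-mers" might look like. It assumes, following the general natural-vector idea, that each k-mer contributes three components: its count, the mean of its 1-based start positions, and a normalized second central moment of those positions. The function names `kmer_natural_vector` and `euclidean_distance`, the normalization by (count × number of windows), and the choice of Euclidean distance are all illustrative assumptions, not details confirmed by the paper.

```python
import math
from itertools import product


def kmer_natural_vector(seq, k, alphabet="ACGT"):
    """Map each possible k-mer to (count, mean position, second moment).

    Assumed definition (not taken from the abstract): positions are
    1-based start indices, and the second central moment is divided by
    count * number_of_windows.
    """
    seq = seq.upper()
    windows = len(seq) - k + 1  # number of k-mer start positions
    positions = {"".join(p): [] for p in product(alphabet, repeat=k)}

    # Record where each k-mer occurs; windows containing letters
    # outside the alphabet (e.g. N) are skipped.
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in positions:
            positions[kmer].append(i + 1)

    vector = {}
    for kmer, pos in positions.items():
        n = len(pos)
        if n == 0:
            vector[kmer] = (0, 0.0, 0.0)
            continue
        mu = sum(pos) / n
        d2 = sum((p - mu) ** 2 for p in pos) / (n * windows)
        vector[kmer] = (n, mu, d2)
    return vector


def euclidean_distance(v1, v2):
    """Euclidean distance between two vectors built with the same k and alphabet."""
    return math.sqrt(
        sum((a - b) ** 2 for kmer in v1 for a, b in zip(v1[kmer], v2[kmer]))
    )


if __name__ == "__main__":
    a = kmer_natural_vector("ACGTACGT", 2)
    b = kmer_natural_vector("ACGTTCGT", 2)
    print(a["AC"])
    print(euclidean_distance(a, b))
```

Pairwise distances computed this way could be collected into a distance matrix and passed to a standard distance-based tree-building method; the abstract does not say which method the authors used.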

Keywords: K-mer model; Natural vector; Phylogenetic analysis.

Figures

Figure 1
Phylogenetic tree of 31 mitochondrial genome sequences based on the 9-mer natural vector. All 31 genomes are correctly clustered into eight known clusters: Carnivora (red), Perissodactyla (blue), Artiodactyla (yellow), Cetacea (light green), Lagomorpha (light blue), Rodentia (purple), Primates (green) and Erinaceomorpha (light green), in agreement with standard biological taxonomy and the known evolutionary relationships of the species.
Figure 2
Phylogenetic tree of 31 mitochondrial genome sequences obtained by multiple sequence alignment (ClustalW).
Figure 3
Phylogenetic tree of 53 human mitochondrial genome sequences based on the 8-mer natural vector. The 53 mtDNAs are mainly divided into two parts, non-Africans (red and green) and Africans (blue, yellow, brown and purple), and the humans in each group cluster correctly, which is consistent with known evidence of human evolution and migration.
Figure 4
Phylogenetic tree of 53 human mitochondrial genome sequences obtained by multiple sequence alignment (ClustalW).
Figure 5
Phylogenetic tree of 40 18S rRNA sequences based on the 6-mer natural vector. The tree contains four clades: Birds (green), Crocodilians (blue), Mammals (red) and Amphibians (purple). The species in each clade group together correctly, in agreement with traditional classification.
Figure 6
Phylogenetic tree of 40 18S rRNA sequences obtained by multiple sequence alignment (ClustalW).
