PLoS One. 2015 Sep 18;10(9):e0136577.
doi: 10.1371/journal.pone.0136577. eCollection 2015.

Two Dimensional Yau-Hausdorff Distance with Applications on Comparison of DNA and Protein Sequences

Kun Tian et al. PLoS One.

Abstract

Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Although many methods are available for sequence comparison, few retain the information content of sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match between graphical curves of DNA or protein sequences. The complexity of this method is lower than that of any other two dimensional minimum Hausdorff algorithm. The Yau-Hausdorff method measures the similarity of DNA sequences using two tools: the Yau-Hausdorff distance and the graphical representation of DNA sequences. The graphical representations of DNA sequences conserve all sequence information, and the Yau-Hausdorff distance is mathematically proved to be a true metric. Therefore, the proposed distance can precisely measure the similarity of DNA sequences. Phylogenetic analyses of DNA sequences using the Yau-Hausdorff distance show the accuracy and stability of our approach in similarity comparison of DNA or protein sequences. This study demonstrates that the Yau-Hausdorff distance is a natural metric for DNA and protein sequences with a high level of stability. The approach can also be applied to similarity analysis of protein sequences by graphical representations, as well as to general two dimensional shape matching.
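
To make the two ingredients named in the abstract concrete, the sketch below computes the classical (fixed-position) Hausdorff distance between two planar curves built from sequences. It is not the Yau-Hausdorff distance itself, which additionally minimizes over translations and rotations. The step vectors in `STEPS` and the helper names `curve`, `directed_hausdorff`, and `hausdorff` are illustrative assumptions and are not taken from the paper.

```python
import math

# Hypothetical step vectors, one per nucleotide; the paper's actual
# graphical representation may use different vectors.
STEPS = {"A": (1.0, -1.0), "C": (1.0, 1.0), "G": (1.0, -0.5), "T": (1.0, 0.5)}


def curve(seq, steps=STEPS):
    """Turn a sequence into a planar curve by accumulating one step per base."""
    x, y = 0.0, 0.0
    points = [(x, y)]
    for base in seq:
        dx, dy = steps[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def directed_hausdorff(a, b):
    """Largest distance from any point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Classical Hausdorff distance between two finite point sets (no alignment)."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


if __name__ == "__main__":
    print(hausdorff(curve("ACGTAC"), curve("ACGTTC")))
```

This brute-force version takes time proportional to the product of the two curve lengths and compares the curves only in their given positions; a minimum Hausdorff distance in the sense of the abstract would search over rigid motions on top of this.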

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Graphical representation of human mitochondrial DNA (1–500 bp, GenBank:X93334).
Fig 2. Hierarchical tree of COI sequences (Yau-Hausdorff method and natural vector method).
Fig 3. Hierarchical tree of barcoding DNA sequences (Yau-Hausdorff method).
Fig 4. Hierarchical tree of barcoding DNA sequences (Feature vector method).
Fig 5. Hierarchical tree of H1N1 virus sequences (Yau-Hausdorff method).
Fig 6. Hierarchical tree of the influenza virus NA genes (Yau-Hausdorff method).
Fig 7. Natural graph of the PKC family (Yau-Hausdorff method).
Fig 8. Hierarchical tree of 50 β-globin sequences (Yau-Hausdorff method).
Fig 9. Hierarchical tree of 50 β-globin sequences (Moment vector method).
Fig 10. Natural graph of 50 β-globin sequences (Yau-Hausdorff method).
Fig 11. The relationship between Yau-Hausdorff distance and deletion length of sequence.
