Bioinformatics. 2019 Mar 15;35(6):914-922.
doi: 10.1093/bioinformatics/bty747.

Dot2dot: accurate whole-genome tandem repeats discovery

Loredana M Genovese et al. Bioinformatics.

Abstract

Motivation: Large-scale sequencing projects have confirmed the hypothesis that eukaryotic DNA is rich in repetitions whose functional role remains to be elucidated. In particular, tandem repeats (TRs), i.e. short, almost identical sequences that lie adjacent to each other, have been associated with many cellular processes and are also involved in several genetic disorders. The need for comprehensive lists of TRs for association studies and the absence of a computational model able to capture their variability have revived research on discovery algorithms.

Results: Building upon the idea that sequence similarities can be easily displayed using graphical methods, we formalized the structure that TRs induce in dot-plot matrices where a sequence is compared with itself. Leveraging the observation that a compact representation of these matrices can be built and searched in linear time, we developed Dot2dot, an accurate algorithm fast enough to be suitable for whole-genome discovery of TRs. Experiments on five manually curated collections of TRs have shown that Dot2dot is more accurate than other established methods and completes the analysis of the biggest known reference genome in about one day on a standard PC.
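To make the dot-plot idea concrete, the following is a minimal sketch, assuming exact matches only and a naive scan of each diagonal. It is not the Dot2dot algorithm: the compact linear-time representation and the handling of approximate repeats described in the paper are not reproduced here, and the function names `diagonal_runs` and `exact_tandem_repeats` are hypothetical. In a self-comparison matrix, cell (i, j) is set when the bases at positions i and j are equal, and an exact tandem repeat with period p shows up as a run of at least p consecutive set cells on the diagonal j = i + p.

```python
def diagonal_runs(seq, offset):
    """Yield (start, run_length) for each maximal run of matching cells
    on the diagonal j = i + offset of the self dot-plot of seq."""
    i, n = 0, len(seq)
    while i + offset < n:
        run = 0
        while i + run + offset < n and seq[i + run] == seq[i + run + offset]:
            run += 1
        if run:
            yield i, run
        i += run + 1  # the cell right after a run is a mismatch, so skip it


def exact_tandem_repeats(seq, min_period=2):
    """Return (start, period, length) for exact tandem repeats with at
    least two full copies. Naive and quadratic; for illustration only."""
    hits = []
    for period in range(min_period, len(seq) // 2 + 1):
        for start, run in diagonal_runs(seq, period):
            if run >= period:  # a run of >= period cells means >= 2 copies
                hits.append((start, period, run + period))
    return hits


if __name__ == "__main__":
    sequence = "TTACGACGTACGATGACGACGT"  # sequence from the Fig. 1 caption
    for start, period, length in exact_tandem_repeats(sequence):
        print(start, period, sequence[start:start + length])
```

Each reported triple gives the 0-based start, the period, and the total length of the repeated region. A practical tool would also have to tolerate mismatches and indels between copies, which this sketch deliberately ignores.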

Availability and implementation: Source code and datasets are freely available upon paper acceptance at https://github.com/Gege7177/Dot2dot.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Sample data structure for the matrix associated with the sequence TTACGACGTACGATGACGACGT
