Dot2dot: accurate whole-genome tandem repeats discovery
- PMID: 30165507
- PMCID: PMC6419916
- DOI: 10.1093/bioinformatics/bty747
Abstract
Motivation: Large-scale sequencing projects have confirmed the hypothesis that eukaryotic DNA is rich in repetitions whose functional role has yet to be elucidated. In particular, tandem repeats (TRs), i.e. short, almost identical sequences that lie adjacent to each other, have been associated with many cellular processes and are also involved in several genetic disorders. The need for comprehensive lists of TRs for association studies, and the absence of a computational model able to capture their variability, have revived research on discovery algorithms.
Results: Building upon the idea that sequence similarities can be easily displayed using graphical methods, we formalized the structure that TRs induce in dot-plot matrices in which a sequence is compared with itself. Leveraging the observation that a compact representation of these matrices can be built and searched in linear time, we developed Dot2dot: an accurate algorithm fast enough for whole-genome discovery of TRs. Experiments on five manually curated collections of TRs have shown that Dot2dot is more accurate than other established methods, and that it completes the analysis of the biggest known reference genome in about one day on a standard PC.
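To illustrate the structure that a TR induces in a self dot-plot, the following is a minimal sketch and not the Dot2dot algorithm itself. It assumes exact matches only, whereas Dot2dot targets almost identical copies, and the function names `diagonal_runs` and `find_exact_tandem_repeats` are hypothetical. In a self dot-plot, cell (i, j) is marked when seq[i] == seq[j]; a perfect TR with period p shows up as an unbroken run on the diagonal at offset p, so it is enough to scan each diagonal instead of building the full matrix.

```python
def diagonal_runs(seq, period, min_len):
    """Return (start, length) for maximal runs where seq[i] == seq[i + period].

    Each run is an unbroken stretch of dots on the diagonal at offset
    `period` of the self dot-plot of `seq`.
    """
    runs = []
    run_start = None
    n = len(seq) - period  # number of cells on this diagonal
    for i in range(n):
        if seq[i] == seq[i + period]:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                runs.append((run_start, i - run_start))
            run_start = None
    # Close a run that reaches the end of the diagonal.
    if run_start is not None and n - run_start >= min_len:
        runs.append((run_start, n - run_start))
    return runs


def find_exact_tandem_repeats(seq, max_period=6, min_copies=3):
    """Report exact tandem repeats as (start, period, span) tuples.

    A repeat with k copies of a motif of length p yields a diagonal run
    of length (k - 1) * p, so the repeat spans run_length + p bases.
    """
    hits = []
    for period in range(1, max_period + 1):
        min_len = (min_copies - 1) * period
        for start, length in diagonal_runs(seq, period, min_len):
            hits.append((start, period, length + period))
    return hits


if __name__ == "__main__":
    # Four copies of "ACG" flanked by unrelated bases.
    print(find_exact_tandem_repeats("GGTACGACGACGACGTTC"))
```

This sketch costs time proportional to the sequence length for each period tried, and a long repeat is also reported at multiples of its period. It does not tolerate substitutions or indels, which is the variability the abstract identifies as the hard part of TR discovery.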
Availability and implementation: Source code and datasets are freely available at https://github.com/Gege7177/Dot2dot.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press.