Tandem repeats over the edit distance
- PMID: 17237101
- DOI: 10.1093/bioinformatics/btl309
Abstract
Motivation: A tandem repeat in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats occur in the genomes of both eukaryotic and prokaryotic organisms. They are important in numerous fields including disease diagnosis, mapping studies, human identity testing (DNA fingerprinting), sequence homology and population studies. Although tandem repeats have been used by biologists for many years, there are few tools available for performing an exhaustive search for all tandem repeats in a given sequence.
Results: In this paper we describe an efficient algorithm for finding all tandem repeats within a sequence, under the edit distance measure. The contributions of this paper are two-fold: theoretical and practical. We present a precise definition for tandem repeats over the edit distance and an efficient, deterministic algorithm for finding these repeats.
Availability: The algorithm has been implemented in C++, and the software is available upon request and can be used at http://www.sci.brooklyn.cuny.edu/~sokol/trepeats. The use of this tool will assist biologists in discovering new ways that tandem repeats affect both the structure and function of DNA and protein molecules.
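To make the notion of a tandem repeat under the edit distance concrete, here is a minimal Python sketch. It is not the algorithm described in the paper, which is an efficient C++ implementation whose details are not given in this abstract. It is a naive brute-force illustration under a simplified assumption: two adjacent substrings count as a repeat if their Levenshtein distance is at most a chosen threshold. The names `edit_distance`, `naive_tandem_repeats`, `period`, and `max_edits` are hypothetical and introduced only for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def naive_tandem_repeats(seq: str, period: int, max_edits: int):
    """Brute-force scan (illustrative only, not the paper's method).

    For every start position, take seq[start:start + period] as the left
    copy and try right copies whose length differs from period by at most
    max_edits. Report the closest right copy if it is within max_edits.
    Returns a list of (start, left_copy, right_copy, distance) tuples.
    """
    hits = []
    for start in range(len(seq) - period + 1):
        left = seq[start:start + period]
        mid = start + period
        best = None
        shortest = max(1, period - max_edits)
        for right_len in range(shortest, period + max_edits + 1):
            if mid + right_len > len(seq):
                break  # longer right copies would run past the sequence
            right = seq[mid:mid + right_len]
            d = edit_distance(left, right)
            if d <= max_edits and (best is None or d < best[3]):
                best = (start, left, right, d)
        if best is not None:
            hits.append(best)
    return hits
```

For example, `naive_tandem_repeats("ACGTACT", period=4, max_edits=1)` pairs the left copy `ACGT` with the right copy `ACT`, which differs by one deletion. This scan computes a full dynamic-programming table for every candidate pair, so it is far too slow for genome-scale input; it only illustrates the kind of repeat an exhaustive search would report.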