J Comput Biol. 2001;8(1):1-18.

doi: 10.1089/106652701300099038.

An algorithm for approximate tandem repeats

G M Landau¹, J P Schmidt, D Sokol

PMID: 11339903
DOI: 10.1089/106652701300099038

G M Landau et al. J Comput Biol. 2001.

Affiliation

¹ Department of Computer Science, Haifa University, Haifa 31905, Israel. landau@poly.edu

Abstract

A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g., abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g., abcdaacd. In this paper we consider two criteria of similarity: the Hamming distance (k mismatches) and the edit distance (k differences). For a string S of length n and an integer k, our algorithm reports all locally optimal approximate repeats, r = ū û, for which the Hamming distance of ū and û is at most k, in O(nk log(n/k)) time, or all those for which the edit distance of ū and û is at most k, in O(nk log k log(n/k)) time. This paper concentrates on a more general type of repeat called multiple tandem repeats. A multiple tandem repeat in a sequence S is a (periodic) substring r of S of the form r = u^a u', where u is a prefix of r and u' is a prefix of u. An approximate multiple tandem repeat is a multiple repeat with errors; the repeated subsequences are similar but not identical. We precisely define approximate multiple repeats, and present an algorithm that finds all repeats that concur with our definition. The time complexity of the algorithm, when searching for repeats with up to k errors in a string S of length n, is O(nka log(n/k)), where a is the maximum number of periods in any reported repeat. We present some experimental results concerning the performance and sensitivity of our algorithm. The problem of finding repeats within a string is a computational problem with important applications in the field of molecular biology. Both exact and inexact repeats occur frequently in the genome, and certain repeats occurring in the genome are known to be related to human diseases.
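To make the definitions in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper: it runs in roughly cubic time rather than O(nk log(n/k)), it covers only the Hamming-distance criterion, and it reports every qualifying substring rather than only locally optimal repeats. The function names `hamming`, `naive_tandem_repeats`, and `has_period` are hypothetical and chosen for illustration only.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def naive_tandem_repeats(s: str, k: int) -> list[tuple[int, int, int]]:
    """Return (start, half_length, mismatches) for every substring
    s[start:start + 2 * half_length] whose two halves differ in at
    most k positions.

    Illustrative brute force only (assumption: small inputs); this is
    NOT the O(nk log(n/k)) method described in the abstract.
    """
    n = len(s)
    hits = []
    for half in range(1, n // 2 + 1):
        for start in range(n - 2 * half + 1):
            left = s[start:start + half]
            right = s[start + half:start + 2 * half]
            d = hamming(left, right)
            if d <= k:
                hits.append((start, half, d))
    return hits


def has_period(r: str, p: int) -> bool:
    """True if r has the exact form u^a u' with |u| = p, i.e. every
    character equals the one p positions later. Requires len(r) >= 2p
    so that at least two full copies of u are present (an assumption
    made here for the sketch)."""
    if p <= 0 or len(r) < 2 * p:
        return False
    return all(r[i] == r[i + p] for i in range(len(r) - p))
```

With the abstract's own examples, `naive_tandem_repeats("abcabc", 0)` finds the perfect repeat starting at position 0 with halves of length 3, and `naive_tandem_repeats("abcdaacd", 1)` includes the approximate repeat whose halves `abcd` and `aacd` differ in one position. `has_period("abcabcab", 3)` illustrates an exact multiple tandem repeat with u = abc and u' = ab.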

Cited by

Streamlining of Simple Sequence Repeat Data Mining Methodologies and Pipelines for Crop Scanning.
Geethanjali S, Kadirvel P, Anumalla M, Hemanth Sadhana N, Annamalai A, Ali J. Plants (Basel). 2024 Sep 19;13(18):2619. doi: 10.3390/plants13182619. PMID: 39339594. Free PMC article. Review.
XSTREAM: a practical algorithm for identification and architecture modeling of tandem repeats in protein sequences.
Newman AM, Cooper JB. BMC Bioinformatics. 2007 Oct 11;8:382. doi: 10.1186/1471-2105-8-382. PMID: 17931424. Free PMC article.
NTRFinder: a software tool to find nested tandem repeats.
Matroud AA, Hendy MD, Tuffley CP. Nucleic Acids Res. 2012 Feb;40(3):e17. doi: 10.1093/nar/gkr1070. Epub 2011 Nov 25. PMID: 22121222. Free PMC article.
mreps: Efficient and flexible detection of tandem repeats in DNA.
Kolpakov R, Bana G, Kucherov G. Nucleic Acids Res. 2003 Jul 1;31(13):3672-8. doi: 10.1093/nar/gkg617. PMID: 12824391. Free PMC article.
Consensus higher order repeats and frequency of string distributions in human genome.
Paar V, Basar I, Rosandić M, Gluncić M. Curr Genomics. 2007 Apr;8(2):93-111. doi: 10.2174/138920207780368169. PMID: 18660848. Free PMC article.
