Review
Front Bioeng Biotechnol. 2015 Mar 17:3:31.
doi: 10.3389/fbioe.2015.00031. eCollection 2015.

Statistical approaches to detecting and analyzing tandem repeats in genomic sequences

Maria Anisimova et al. Front Bioeng Biotechnol.

Abstract

Tandem repeats (TRs) are frequently observed in genomes across all domains of life. Evidence suggests that some TRs are crucial for proteins with fundamental biological functions and can be associated with virulence, resistance, and infectious or neurodegenerative diseases. Genome-scale systematic studies of TRs have the potential to unveil core mechanisms governing TR evolution and the roles of TRs in shaping genomes. However, TR-related studies are often non-trivial because TR regions are heterogeneous and sometimes fast-evolving. In this review, we discuss these intricacies and their consequences. We present our recent contributions to computational and statistical approaches for TR significance testing, sequence profile-based TR annotation, TR-aware sequence alignment, phylogenetic analyses of TR unit number and order, and TR benchmarks. Importantly, all these methods explicitly rely on the evolutionary definition of a tandem repeat as a sequence of adjacent repeat units stemming from a common ancestor. The discussed work focuses on protein TRs, yet is generally applicable to nucleic acid TRs, which share similar features.

Keywords: molecular evolution; protein domain; sequence profile model; tandem repeat annotation; tandem repeats.

Figures

Figure 1
Tandem repeats in genomic sequences. (A) An example TR with three units and the corresponding MSA of its units. (B) Different parts of a TR motif (R = right and L = left) have different histories after a single duplication with shifted TR units. Shown are these duplication histories as phylogenies of the right and left parts of the TR motif. (C) Five scenarios of overlapping and non-overlapping TR annotations.
Figure 2
Overview of a generic TR annotation workflow.
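
Illustrative sketch

The abstract defines a tandem repeat as a sequence of adjacent repeat units stemming from a common ancestor, and Figure 1A describes a TR with three units together with the alignment of those units. The Python sketch below is only a toy illustration of that idea and is not the authors' method: it assumes a hypothetical DNA region whose units all have the same length (no insertions or deletions), stacks the units as a gap-free alignment, and reports a column-wise consensus and the mean pairwise identity between units. The function names and the example sequence are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def split_units(region: str, unit_len: int) -> list[str]:
    """Cut a TR region into consecutive units of equal length.

    Assumption: the region contains no indels, so its length is an exact
    multiple of the unit length. Real TRs often violate this.
    """
    if unit_len <= 0 or len(region) % unit_len != 0:
        raise ValueError("region length must be a positive multiple of unit_len")
    return [region[i:i + unit_len] for i in range(0, len(region), unit_len)]


def consensus(units: list[str]) -> str:
    """Return the most frequent character in each alignment column."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*units))


def mean_pairwise_identity(units: list[str]) -> float:
    """Average fraction of identical positions over all pairs of units."""
    pairs = list(combinations(units, 2))
    if not pairs:
        raise ValueError("need at least two units")
    identities = [
        sum(x == y for x, y in zip(a, b)) / len(a) for a, b in pairs
    ]
    return sum(identities) / len(identities)


# Hypothetical region: three units of length 4, one of them with a substitution.
region = "ACGTACGAACGT"
units = split_units(region, 4)
unit_consensus = consensus(units)
identity = mean_pairwise_identity(units)

print(units)
print(unit_consensus)
print(round(identity, 3))
```

High identity between adjacent units is consistent with, but does not prove, common ancestry; the statistical significance tests discussed in the review address that question, and this sketch does not implement them.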
