Statistical approaches to detecting and analyzing tandem repeats in genomic sequences
- PMID: 25853125
- PMCID: PMC4362331
- DOI: 10.3389/fbioe.2015.00031
Abstract
Tandem repeats (TRs) are frequently observed in genomes across all domains of life. Evidence suggests that some TRs are crucial for proteins with fundamental biological functions and can be associated with virulence, resistance, and infectious or neurodegenerative diseases. Genome-scale systematic studies of TRs have the potential to unveil core mechanisms governing TR evolution and the roles of TRs in shaping genomes. However, TR-related studies are often non-trivial because TR regions are heterogeneous and sometimes fast-evolving. In this review, we discuss these intricacies and their consequences. We present our recent contributions to computational and statistical approaches for TR significance testing, sequence profile-based TR annotation, TR-aware sequence alignment, phylogenetic analyses of TR unit number and order, and TR benchmarks. Importantly, all these methods explicitly rely on the evolutionary definition of a tandem repeat as a sequence of adjacent repeat units stemming from a common ancestor. The work discussed focuses on protein TRs, yet it is generally applicable to nucleic acid TRs, which share similar features.
Keywords: molecular evolution; protein domain; sequence profile model; tandem repeat annotation; tandem repeats.
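To make the definition of a tandem repeat as a run of adjacent repeat units concrete, here is a minimal Python sketch that scans a sequence for exact adjacent copies of a short unit. It is only an illustration under strong assumptions: it requires perfect identity between units, whereas the statistical and profile-based methods discussed in the review are designed for units that have diverged. The function name `find_exact_tandem_repeats` and its parameters are hypothetical and do not come from the reviewed work.

```python
def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter string."""
    return unit not in (unit + unit)[1:-1]


def find_exact_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=2):
    """Return (start, unit, copies) for runs of exact adjacent unit copies.

    Assumption: units must match exactly; diverged units are not detected.
    For each unit length, the leftmost phase of a run is reported.
    """
    hits = []
    n = len(seq)
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + unit_len * min_copies <= n:
            unit = seq[i:i + unit_len]
            if not is_primitive(unit):
                # e.g. "AA" is already covered by unit "A"
                i += 1
                continue
            copies = 1
            # Extend the run while the next window equals the unit.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * unit_len  # jump past the reported run
            else:
                i += 1
    return hits


if __name__ == "__main__":
    print(find_exact_tandem_repeats("GGCAGCAGCAGTT", min_unit=3, max_unit=3))
```

A scan like this only finds perfect repeats; deciding whether imperfect, diverged units share a common ancestor is where significance testing and sequence profile models come in.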
Similar articles
- Deep conservation of human protein tandem repeats within the eukaryotes. Mol Biol Evol. 2014 May;31(5):1132-48. doi: 10.1093/molbev/msu062. Epub 2014 Feb 3. PMID: 24497029
- The evolution and function of protein tandem repeats in plants. New Phytol. 2015 Apr;206(1):397-410. doi: 10.1111/nph.13184. Epub 2014 Nov 24. PMID: 25420631
- Genome-wide analysis of tandem repeats in Daphnia pulex - a comparative approach. BMC Genomics. 2010 Apr 30;11:277. doi: 10.1186/1471-2164-11-277. PMID: 20433735
- Advancing genomic technologies and clinical awareness accelerates discovery of disease-associated tandem repeat sequences. Genome Res. 2022 Jan;32(1):1-27. doi: 10.1101/gr.269530.120. Epub 2021 Dec 29. PMID: 34965938. Review.
- In search of the boundary between repetitive and non-repetitive protein sequences. Biochem Soc Trans. 2015 Oct;43(5):807-11. doi: 10.1042/BST20150073. PMID: 26517886. Review.
Cited by
- Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res. 2019 Dec 2;47(21):10994-11006. doi: 10.1093/nar/gkz841. PMID: 31584084. Review.
- Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol. 2023 Feb;36(2):321-336. doi: 10.1111/jeb.14106. Epub 2022 Oct 26. PMID: 36289560. Review.
- A New Census of Protein Tandem Repeats and Their Relationship with Intrinsic Disorder. Genes (Basel). 2020 Apr 9;11(4):407. doi: 10.3390/genes11040407. PMID: 32283633
- Tandem Repeats in Proteins: Prediction Algorithms and Biological Role. Front Bioeng Biotechnol. 2015 Sep 24;3:143. doi: 10.3389/fbioe.2015.00143. eCollection 2015. PMID: 26442257. Review.
- Accuracy of short tandem repeats genotyping tools in whole exome sequencing data. F1000Res. 2020 Mar 23;9:200. doi: 10.12688/f1000research.22639.1. eCollection 2020. PMID: 32665844