Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression
- PMID: 16403795
- DOI: 10.1093/bioinformatics/btk032
Abstract
Motivation: Genomic sequences are highly redundant and contain many types of repetitive DNA. Fuzzy tandem repeats (FTRs) are of particular interest. They are found in regulatory regions of eukaryotic genes and are reported to interact with transcription factors. However, accurate assessment of FTR occurrences in different genome segments requires a specific algorithm for efficient FTR identification and classification.
Results: We have obtained formulas for P-values of FTR occurrence and developed an FTR identification algorithm implemented in TandemSWAN software. Using TandemSWAN we compared the structure and the occurrence of FTRs with short period length (up to 24 bp) in coding and non-coding regions including UTRs, heterochromatic, intergenic and enhancer sequences of Drosophila melanogaster and Drosophila pseudoobscura. Tandems with period three and its multiples were found in coding segments, whereas FTRs with periods that are multiples of six are overrepresented in all non-coding segments. Periods equal to 5-7 and 11-14 were characteristic of the enhancer regions and other non-coding regions close to genes.
Availability: TandemSWAN web page, stand-alone version and documentation can be found at http://bioinform.genetika.ru/projects/swan/www/
Supplementary information: Supplementary data are available at Bioinformatics online.
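Illustration: the abstract does not describe the TandemSWAN algorithm or its P-value formulas, so the sketch below is not that method. It is only a naive way to show what a fuzzy tandem repeat with a fixed period looks like: each base is compared with the base one period downstream, and a stretch is reported when the mismatch rate stays under a threshold. The function name `find_fuzzy_tandems` and the parameters `min_copies` and `max_mismatch_rate` are hypothetical, chosen for this example.

```python
def find_fuzzy_tandems(seq, period, min_copies=3, max_mismatch_rate=0.2):
    """Return merged (start, end) intervals that look like fuzzy tandem repeats.

    Naive illustration only: no statistical significance is computed, and this
    is not the TandemSWAN algorithm.
    """
    if period < 1 or min_copies < 2:
        raise ValueError("period must be >= 1 and min_copies must be >= 2")
    seq = seq.upper()
    span = period * min_copies            # length covered by min_copies copies
    if len(seq) < span:
        return []
    # agree[i] is 1 when base i equals the base one period downstream.
    agree = [1 if seq[i] == seq[i + period] else 0
             for i in range(len(seq) - period)]
    window = span - period                # comparisons inside one candidate
    hits = []
    matches = sum(agree[:window])
    for start in range(len(agree) - window + 1):
        if start > 0:
            # Slide the window by one position.
            matches += agree[start + window - 1] - agree[start - 1]
        if (window - matches) / window <= max_mismatch_rate:
            end = start + span
            if hits and start <= hits[-1][1]:
                hits[-1] = (hits[-1][0], end)   # merge with previous interval
            else:
                hits.append((start, end))
    return hits


# Period-3 repeat with one substitution (G -> T at position 5).
print(find_fuzzy_tandems("ACGACTACGACG", period=3, max_mismatch_rate=0.34))
```

Running the same call once per period from 1 to 24 would mirror the period range studied in the abstract, although a real tool would also need to rank hits by significance.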