BMC Bioinformatics. 2005 Jun 10;6:145.
doi: 10.1186/1471-2105-6-145.

Satellog: a database for the identification and prioritization of satellite repeats in disease association studies

Perseus I Missirlis et al. BMC Bioinformatics. 2005.

Abstract

Background: To date, 35 human diseases, some of which also exhibit anticipation, have been associated with unstable repeats. Anticipation has also been reported in a number of diseases in which repeat expansion may play a role in etiology. Despite the growing importance of unstable repeats in disease, no resource currently exists for prioritizing repeats. Here we present Satellog, a database that catalogs all pure satellite repeats in the human genome with a repeat unit of 1-16 nucleotides, along with supplementary data. Satellog analyzes each pure repeat in UniGene clusters for evidence of repeat polymorphism.
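To make "pure repeat" concrete, the following is a minimal Python sketch of scanning a DNA string for uninterrupted tandem repeats with a unit of 1-16 nucleotides. It is an illustration only and not the method Satellog uses: the function name `find_pure_repeats`, the `PureRepeat` record, and the `min_copies` threshold are assumptions introduced here.

```python
import re
from typing import Iterator, NamedTuple


class PureRepeat(NamedTuple):
    start: int   # 0-based offset of the first base of the repeat
    unit: str    # repeat unit, e.g. "CAG"
    copies: int  # number of uninterrupted copies of the unit


def find_pure_repeats(seq: str, max_period: int = 16,
                      min_copies: int = 3) -> Iterator[PureRepeat]:
    """Yield pure (uninterrupted) tandem repeats with unit length 1..max_period.

    min_copies is a hypothetical threshold chosen for illustration.
    """
    seq = seq.upper()
    for period in range(1, max_period + 1):
        # A unit of `period` bases followed by at least min_copies - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (period, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are multiples of a shorter unit ("CACA" is "CA" x 2),
            # so each repeat is reported once, at its smallest period.
            if any(period % p == 0 and unit == unit[:p] * (period // p)
                   for p in range(1, period)):
                continue
            yield PureRepeat(match.start(), unit, len(match.group(0)) // period)


# Toy sequence: five CAG copies flanked by unrelated bases.
hits = list(find_pure_repeats("TTGA" + "CAG" * 5 + "TTACG"))
```

Because the regular expression demands exact copies of the unit, a single interrupting base ends the match, which is the sense of "pure" assumed in this sketch.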

Results: A total of 5,546 such repeats were identified, providing the first indication of many novel polymorphic sites in the genome. Overall, polymorphic repeats were over-represented within 3'-UTR sequence relative to 5'-UTR and coding sequence. Interestingly, we observed that repeat polymorphism within coding sequence is restricted to trinucleotide repeats whereas UTR sequence tolerated a wider range of repeat period polymorphisms. For each pure repeat we also calculate its repeat length percentile rank, its location either within or adjacent to EnsEMBL genes, and its expression profile in normal tissues according to the GeneNote database.
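The two per-repeat statistics mentioned here can be sketched as follows. The abstract does not define how the percentile rank is computed or how polymorphism is scored, so this example assumes a mid-rank percentile within a repeat class and, following the figure captions, the standard deviation of observed copy numbers as the polymorphism measure. The function names and the choice of population (rather than sample) standard deviation are assumptions.

```python
from bisect import bisect_left, bisect_right
from statistics import pstdev
from typing import Sequence


def percentile_rank(length: int, class_lengths: Sequence[int]) -> float:
    """Mid-rank percentile of `length` among all lengths of the same repeat class."""
    ordered = sorted(class_lengths)
    below = bisect_left(ordered, length)            # strictly shorter repeats
    equal = bisect_right(ordered, length) - below   # repeats of the same length
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def length_sd(observed_copies: Sequence[int]) -> float:
    """Population standard deviation of copy numbers seen across cluster sequences."""
    return pstdev(observed_copies)


# Hypothetical toy values, not taken from Satellog.
rank = percentile_rank(19, [5, 6, 6, 7, 8, 9, 10, 12, 15, 19])
sd_variable = length_sd([10, 10, 12, 14])
sd_stable = length_sd([7, 7, 7])
```

Under these assumptions, a repeat whose copy number is identical in every sequence of its cluster scores 0 and would show no evidence of polymorphism.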

Conclusion: Satellog provides the ability to dynamically prioritize repeats based on any of their characteristics (i.e. repeat unit, class, period, length, repeat length percentile rank, genomic co-ordinates), polymorphism profile within UniGene, proximity to or presence within gene regions (i.e. cds, UTR, 15 kb upstream etc.), metadata of the genes they are detected within and gene expression profiles within normal human tissues. Unstable repeats associated with 31 diseases were analyzed in Satellog to evaluate their common repeat properties. The utility of Satellog was highlighted by prioritizing repeats for Huntington's disease and schizophrenia. Satellog is available online at http://satellog.bcgsc.ca.
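As a rough illustration of what prioritizing on several characteristics at once might look like, the sketch below filters in-memory records by genomic interval, repeat unit, and gene region, then ranks the survivors. The `Repeat` fields, the `prioritize` function, and the ranking rule (polymorphism first, then length percentile) are all hypothetical; they are not Satellog's schema or query interface, and the records are invented toy data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Repeat:
    chrom: str
    start: int
    end: int
    unit: str            # repeat unit, e.g. "CAG"
    region: str          # hypothetical labels: "cds", "5utr", "3utr"
    percentile: float    # repeat length percentile rank
    sd: Optional[float]  # None when no polymorphism evidence is available


def prioritize(repeats: Iterable[Repeat], chrom: str, start: int, end: int,
               unit: Optional[str] = None,
               region: Optional[str] = None) -> List[Repeat]:
    """Keep repeats inside the interval, optionally matching unit and region,
    ranked by polymorphism (sd) and then by length percentile, highest first."""
    hits = [
        r for r in repeats
        if r.chrom == chrom and r.start >= start and r.end <= end
        and (unit is None or r.unit == unit)
        and (region is None or r.region == region)
    ]
    return sorted(hits, key=lambda r: (r.sd or 0.0, r.percentile), reverse=True)


# Invented toy records for demonstration only.
a = Repeat("chr4", 3_000_000, 3_000_057, "CAG", "cds", 99.0, 2.5)
b = Repeat("chr4", 1_200_000, 1_200_021, "CAG", "cds", 60.0, None)
c = Repeat("chr4", 4_700_000, 4_700_030, "CAG", "cds", 90.0, 1.0)
d = Repeat("chr4", 2_000_000, 2_000_030, "CA", "3utr", 80.0, 3.0)

coding_cag = prioritize([a, b, c, d], "chr4", 1, 4_600_000, unit="CAG", region="cds")
```

The interval chr4:1-4,600,000 mirrors the Huntington's disease linkage region query described in the Figure 5 caption; record `c` falls outside it and is excluded.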


Figures

Figure 1
Genome-wide repeat lengths of disease-associated repeat classes. Genomic distribution of repeat lengths of all repeat classes associated with disease.
Figure 2
Boxplot comparison of polymorphic repeats from coding, 5'-UTR and 3'-UTR sequence. Median standard deviations (line through box) of all polymorphic repeats detected in coding, 5'-UTR, and 3'-UTR sequence. After controlling for sampling bias, coding and 5'-UTR standard deviations did not significantly differ from each other, but did significantly differ from 3'-UTR repeats implying that the 3'-UTR tolerates larger, more expanded repeats (P < 0.001).
Figure 3
Counts of unstable non-coding repeats at increasing instability cut-offs. Repeat period distribution of polymorphic non-coding repeats at increasing standard deviation (sd) cut-offs.
Figure 4
Counts of unstable coding repeats at increasing instability cut-offs. Repeat period distribution of polymorphic coding repeats at increasing standard deviation (sd) cut-offs.
Figure 5
Candidate repeats within Huntington's disease linkage region 4p16.3. Sample output from Satellog summarizing candidate repeats within the 4p16.3 Huntington's disease linkage region. Coding CAG-type repeats from chr4:1-4,600,000 were selected along with their peptide sequence, HUGO names and EnsEMBL gene IDs. The repeat encoding 19 glutamines has been associated with Huntington's disease progression.
Figure 6
repeatalyzer.pl flowchart. Flowchart outlining how repeatalyzer.pl populates the Satellog database.
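The tallies described in the captions for Figures 3 and 4 (repeat period distribution at increasing standard deviation cut-offs) can be sketched as below. This is an assumption-laden illustration: the function name, the input shape of `(period, sd)` pairs, the inclusive `>=` comparison, and the cut-off values are all chosen here and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def period_counts_by_cutoff(repeats: Iterable[Tuple[int, float]],
                            cutoffs: Sequence[float]) -> Dict[float, Counter]:
    """For each cut-off, count repeats per period whose sd meets the cut-off."""
    records = list(repeats)  # materialize so the data can be scanned per cut-off
    return {
        cutoff: Counter(period for period, sd in records if sd >= cutoff)
        for cutoff in cutoffs
    }


# Invented (period, sd) pairs for demonstration only.
toy = [(3, 0.5), (3, 2.1), (2, 1.2), (4, 3.0), (3, 3.5)]
counts = period_counts_by_cutoff(toy, [1.0, 2.0, 3.0])
```

Running the same function separately on coding and non-coding repeats would give the two distributions that the captions contrast.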

