Comparative Study
Genome Res. 2007 Dec;17(12):1787-96.
doi: 10.1101/gr.6554007. Epub 2007 Oct 31.

Sequence-based estimation of minisatellite and microsatellite repeat variability

Matthieu Legendre et al. Genome Res. 2007 Dec.

Abstract

Variable tandem repeats are frequently used for genetic mapping, genotyping, and forensics studies. Moreover, variation in some repeats underlies rapidly evolving traits or certain diseases. However, mutation rates vary greatly from repeat to repeat, and as a consequence, not all tandem repeats are suitable genetic markers or interesting unstable genetic modules. We developed a model, "SERV," that predicts the variability of a broad range of tandem repeats in a wide range of organisms. The nonlinear model uses three basic characteristics of the repeat (number of repeated units, unit length, and purity) to produce a numeric "VARscore" that correlates with repeat variability. SERV was experimentally validated using a large set of different artificial repeats located in the Saccharomyces cerevisiae URA3 gene. Further in silico analysis shows that SERV outperforms existing models and accurately predicts repeat variability in bacteria and eukaryotes, including plants and humans. Using SERV, we demonstrate significant enrichment of variable repeats within human genes involved in transcriptional regulation, chromatin remodeling, morphogenesis, and neurogenesis. Moreover, SERV allows identification of known and candidate genes involved in repeat-based diseases. In addition, we demonstrate the use of SERV for the selection and comparison of suitable variable repeats for genotyping and forensic purposes. Our analysis indicates that tandem repeats used for genotyping should have a VARscore between 1 and 3. SERV is publicly available from http://hulsweb1.cgr.harvard.edu/SERV/.
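The genotyping recommendation above can be illustrated with a minimal sketch. It assumes that VARscores have already been obtained from SERV (the model itself is not reimplemented here), that the bounds 1 and 3 are inclusive, and that each repeat is represented by a hypothetical `Repeat` record; the marker names and scores below are made-up illustration values, not data from the study.

```python
from typing import List, NamedTuple


class Repeat(NamedTuple):
    """Hypothetical record for one tandem repeat."""
    name: str        # illustrative marker label
    varscore: float  # VARscore as reported by SERV (not computed here)


def genotyping_candidates(repeats: List[Repeat],
                          low: float = 1.0,
                          high: float = 3.0) -> List[Repeat]:
    """Keep repeats whose VARscore lies in the suggested range.

    Treating the bounds as inclusive is an assumption; the abstract only
    says "between 1 and 3".
    """
    return [r for r in repeats if low <= r.varscore <= high]


# Made-up example values, for illustration only.
markers = [
    Repeat("m1", 0.4),
    Repeat("m2", 1.7),
    Repeat("m3", 2.9),
    Repeat("m4", 3.6),
]
selected = [r.name for r in genotyping_candidates(markers)]
print(selected)
```

With these illustrative values, only the two markers whose scores fall inside the range are retained.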

Figures

Figure 1.
VARscore correlates with repeat mutation rates. (A) To evaluate the correlation between the VARscore and experimentally determined mutation rates, a series of 30 different artificial repeats was inserted immediately after the START codon of the genomic URA3 gene of a haploid S. cerevisiae S288C yeast strain. Three classes of strains were constructed: (1) a series of "CA" dinucleotide repeats with varying numbers of units; (2) a series of CA repeats with a constant number of units but varying repeat purity; (3) a series of strains with a 10-mer or 20-mer unit length and varying numbers of units. (B) Since the number of nucleotides in each repeat unit is not a multiple of three, changes in the number of repeat units shift the URA3 reading frame, so that some strains will be Ura+ and others Ura−, depending on the number of repeat units they contain. Moreover, because tandem repeats are unstable, the number of repeat units changes in a fraction of mitotic divisions, resulting in frequent shifts between Ura+ and Ura− phenotypes. This can be demonstrated by growing cells in either SC–Ura or 5-FOA medium, which select for Ura+ and Ura− strains, respectively (see Methods for details). (C) Plotting the mutation rates in the various repeat classes shows an exponential increase in mutation events with increasing unit number and purity. (D) Plotting VARscores for each repeat against their experimental mutation rates shows the correlation between VARscore and mutation rate, indicating that VARscores can be used as a rough estimate of mutation rate.
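The reading-frame logic in panel B can be sketched in a few lines. This is a simplified illustration, not the authors' analysis: it assumes the repeat tract is the only insertion after the START codon (no additional flanking nucleotides) and that the tract contains no in-frame stop codon, so the strain is predicted to be Ura+ exactly when the total inserted length is a multiple of three. The function names are hypothetical.

```python
def in_frame(unit_length: int, n_units: int) -> bool:
    """True if a repeat tract of n_units units keeps URA3 in frame.

    Assumption: the tract is the only inserted sequence, so the frame is
    preserved when its total length is divisible by three.
    """
    return (unit_length * n_units) % 3 == 0


def predicted_phenotype(unit_length: int, n_units: int) -> str:
    """Predicted phenotype under the simplifying assumptions above."""
    return "Ura+" if in_frame(unit_length, n_units) else "Ura-"


# A "CA" dinucleotide tract: gaining or losing one unit shifts the frame,
# so the predicted phenotype switches as the unit count changes.
for n in range(3, 7):
    print(n, predicted_phenotype(2, n))
```

For the dinucleotide case, only unit counts divisible by three keep the gene in frame, which is why a single-unit gain or loss toggles the selectable phenotype.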
Figure 2.
VARscore as a benchmarking tool for genotypic markers. All tandem repeats in the P. vivax genome were plotted according to their VARscore. The circles represent the VARscores of the markers used in two independent genotyping studies. The top row shows the scores for the markers used by Leclerc et al. (2004), who found very little variability in these markers except for one, the marker with the highest VARscore (far right point). The markers used by Imwong et al. (2006) (bottom row) have significantly higher VARscores, which agrees with the observed variability of these markers.

References

    Al-Shahrour F., Minguez P., Vaquerizas J.M., Conde L., Dopazo J. BABELOMICS: A suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments. Nucleic Acids Res. 2005;33:W460–W464. doi: 10.1093/nar/gki456.
    Baldus S.E., Engelmann K., Hanisch F.G. MUC1 and the MUCs: A family of human mucins with impact in cancer biology. Crit. Rev. Clin. Lab. Sci. 2004;41:189–231.
    Becker K.G., Barnes K.C., Bright T.J., Wang S.A. The genetic association database. Nat. Genet. 2004;36:431–432.
    Benjamini Y., Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B. 1995;57:289–300.
    Benson G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580.
