Large-scale identification of polymorphic microsatellites using an in silico approach

Jifeng Tang¹, Samantha J Baldwin, Jeanne Me Jacobs, C Gerard van der Linden, Roeland E Voorrips, Jack Am Leunissen, Herman van Eck, Ben Vosman

PMID: 18793407
PMCID: PMC2562394
DOI: 10.1186/1471-2105-9-374

Jifeng Tang et al. BMC Bioinformatics. 2008 Sep 15;9:374.

Affiliation

¹ Laboratory of Bioinformatics, Wageningen University, PO Box 8128, 6700 ET Wageningen, the Netherlands. jifeng.tang@gmail.com

Abstract

Background: Simple Sequence Repeat (SSR) or microsatellite markers are valuable for genetic research. Experimental methods to develop SSR markers are laborious, time-consuming and expensive. In silico approaches have become a practicable and relatively inexpensive alternative during the last decade, although testing putative SSR markers is still time-consuming and expensive. In many species only a relatively small percentage of SSR markers turn out to be polymorphic. This is particularly true for markers derived from expressed sequence tags (ESTs). EST databases contain highly redundant sequences, and this redundancy may carry information on length polymorphisms in the SSRs they contain, and on whether the sequences were derived from heterozygotes or from different genotypes. Although a number of programs have been developed to identify SSRs in EST sequences, until now no software could detect putatively polymorphic SSRs.

Results: We have developed PolySSR, a new pipeline to identify polymorphic SSRs rather than just SSRs. Sequence information is obtained from public EST databases derived from heterozygous individuals and/or at least two different genotypes. The pipeline includes PCR-primer design for the putatively polymorphic SSR markers, taking into account Single Nucleotide Polymorphisms (SNPs) in the flanking regions, thereby improving the success rate of the potential markers. A large number of polymorphic SSRs were identified using publicly available EST sequences of potato, tomato, rice, Arabidopsis, Brassica and chicken. The SSRs obtained were divided into long and short classes based on the number of times the motif was repeated. Surprisingly, the frequency of polymorphic SSRs was much higher among the short SSRs.
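The core idea, calling an SSR putatively polymorphic when redundant ESTs from the same locus carry different repeat counts, can be illustrated with a minimal sketch. This is not the PolySSR implementation: it assumes the ESTs have already been clustered by locus, it only handles perfect repeats of a known motif, and the function names and the default thresholds (`min_repeats`, `min_seqs_per_allele`) are hypothetical choices made for illustration.

```python
import re
from collections import Counter


def repeat_count(seq, motif):
    """Repeat count of the longest perfect run of `motif` in `seq` (0 if absent)."""
    runs = re.findall(r"(?:%s)+" % re.escape(motif), seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def putative_alleles(cluster, motif, min_repeats=3, min_seqs_per_allele=2):
    """Map repeat count -> number of supporting ESTs.

    Assumption: `cluster` holds ESTs from a single locus. An allele is kept
    only if enough independent sequences support it, which guards against
    length differences caused by sequencing errors.
    """
    counts = Counter(repeat_count(seq, motif) for seq in cluster)
    return {
        n: support
        for n, support in counts.items()
        if n >= min_repeats and support >= min_seqs_per_allele
    }


def is_putatively_polymorphic(cluster, motif, **thresholds):
    """True if at least two well-supported alleles differ in repeat count."""
    return len(putative_alleles(cluster, motif, **thresholds)) >= 2


# Toy cluster: two ESTs with (AC)4, two with (AC)6, one singleton with (AC)3.
cluster = (
    ["TTG" + "AC" * 4 + "GGT"] * 2
    + ["TTG" + "AC" * 6 + "GGT"] * 2
    + ["TTG" + "AC" * 3 + "GGT"]
)
print(putative_alleles(cluster, "AC"))
print(is_putatively_polymorphic(cluster, "AC"))
```

In this toy cluster the singleton (AC)3 allele is discarded because only one sequence supports it, while the (AC)4 and (AC)6 alleles are each supported twice, so the locus is flagged as putatively polymorphic.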

Conclusion: PolySSR is a very effective tool to identify polymorphic SSRs. Using PolySSR, several hundred putative markers were developed and stored in a searchable database. Validation experiments showed that almost all markers that were indicated as putatively polymorphic by PolySSR were indeed polymorphic. This greatly improves the efficiency of marker development, especially in species where there are low levels of polymorphism, like tomato. When combined with the new sequencing technologies, PolySSR will have a big impact on the development of polymorphic SSRs in any species. PolySSR and the polymorphic SSR marker database are available from http://www.bioinformatics.nl/tools/polyssr/.

Figures

**Figure 1**
**An example of unreliable polymorphic SSRs**. Since the repeat chain in EST 3 and 4 does not extend to the end it is not clear whether these two ESTs represent a different (shorter) allele of the SSR or not. For that reason a minimum length for the flanking sequence used must be specified to reliably detect polymorphic SSRs.
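The flanking-sequence requirement described for Figure 1 can be sketched as a simple filter. This is an illustrative assumption about how such a check might look, not the PolySSR code: the function name, the default `min_flank` of 10 bases, and the restriction to perfect repeats are all hypothetical.

```python
import re


def reliable_repeat_count(seq, motif, min_flank=10):
    """Repeat count of the longest perfect run of `motif`, or None if unreliable.

    A run is treated as unreliable when fewer than `min_flank` bases lie on
    either side of it, because the read may have ended inside the repeat and
    the observed length would then underestimate the true allele length.
    """
    best = None
    for match in re.finditer(r"(?:%s)+" % re.escape(motif), seq.upper()):
        if best is None or len(match.group(0)) > len(best.group(0)):
            best = match
    if best is None:
        return None
    left_flank = best.start()
    right_flank = len(seq) - best.end()
    if left_flank < min_flank or right_flank < min_flank:
        return None
    return len(best.group(0)) // len(motif)


left, right = "GATTGCATCGTA", "TGCCATGGATCC"
full_read = left + "AC" * 5 + right      # repeat is flanked on both sides
truncated_read = left + "AC" * 3         # read ends inside the repeat
print(reliable_repeat_count(full_read, "AC"))
print(reliable_repeat_count(truncated_read, "AC"))
```

The truncated read is rejected rather than counted as a shorter (AC)3 allele, which is the situation the caption describes for ESTs 3 and 4.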

**Figure 2**
Flowchart of the PolySSR pipeline.

**Figure 3**
**Flowchart of the PolySSR core program**. The two parameters used in step 2 are the degree of matching in a repeat motif and the degree of matching in a repeat chain. The four parameters used in step 3 are the two parameters from step 2 plus the length of the flanking sequences of repeats and the minimum number of repeats for each repeat-motif length. The three parameters used in step 4 are the two parameters from step 2 plus the minimum number of sequences per allele. * Actions in steps 2, 3 and 4 all use the algorithm described in Figure 4 and in the Materials and Methods section.

**Figure 4**
**The flowchart used to identify perfect and imperfect repeat chains**. The parameter used in step 2 is the degree of matching in a repeat motif; the parameter used in step 3 is the degree of matching in a repeat.
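One way to read the two matching parameters named in the captions is sketched below. The abstract does not define them, so this interpretation is an assumption: a repeat unit is accepted if the fraction of its bases that agree with the motif reaches `motif_match`, and the chain keeps growing while the fraction of agreeing bases over the whole chain stays at or above `chain_match`. The function name, defaults, and the rule of trimming the chain back to its last perfect unit are hypothetical and may differ from the algorithm in the paper's Materials and Methods.

```python
def extend_chain(seq, start, motif, motif_match=0.5, chain_match=0.8):
    """Extend a possibly imperfect repeat chain from `start`.

    Returns (end, n_units), where `end` is exclusive and the chain is trimmed
    so that it always finishes on a perfect copy of the motif.
    """
    k = len(motif)
    matched = total = units = 0
    best_end, best_units = start, 0
    pos = start
    while pos + k <= len(seq):
        # Number of positions in this unit that agree with the motif.
        hits = sum(a == b for a, b in zip(seq[pos:pos + k], motif))
        unit_ok = hits / k >= motif_match
        chain_ok = (matched + hits) / (total + k) >= chain_match
        if not (unit_ok and chain_ok):
            break
        matched += hits
        total += k
        units += 1
        pos += k
        if hits == k:
            best_end, best_units = pos, units
    return best_end, best_units


# AC AC AT AC AC followed by non-repeat sequence: one imperfect unit (AT).
print(extend_chain("ACACATACACGGT", 0, "AC"))
# Requiring perfect units stops the chain at the first mismatch.
print(extend_chain("ACACATACACGGT", 0, "AC", motif_match=1.0))
```

With the default thresholds the imperfect unit is tolerated and the chain spans five units; with `motif_match=1.0` the same input yields only the first two perfect units.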
