BMC Genomics. 2010 Oct 15;11:569.
doi: 10.1186/1471-2164-11-569.

Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.)

Pablo F Cavagnaro et al. BMC Genomics.

Abstract

Background: Cucumber (Cucumis sativus L.) is an important vegetable crop worldwide. Until very recently, cucumber genetic and genomic resources, especially molecular markers, were very limited, impeding progress in cucumber breeding. Microsatellites, or simple sequence repeats (SSRs), are short tandemly repeated DNA sequences that are frequently favored as genetic markers because of their high level of polymorphism and codominant inheritance. Data from previously characterized genomes have shown that these repeats vary in frequency, motif sequence, and genomic location across taxa. During the last year, the genomes of two cucumber genotypes were sequenced: the Chinese fresh market type inbred line '9930' and the North American pickling type inbred line 'Gy14'. These sequences provide a powerful tool for developing markers on a large scale. In this study, we surveyed and characterized the distribution and frequency of perfect microsatellites in 203 Mbp of assembled Gy14 DNA sequence, representing 55% of its nuclear genome, and in cucumber EST sequences. Similar analyses were performed on genomic and EST data from seven other plant species, and the results were compared with those of cucumber.
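To make "perfect microsatellite" concrete, here is a minimal Python sketch of how uninterrupted tandem repeats can be located in a DNA string with a back-referencing regular expression. It is not the authors' pipeline: the function name `find_perfect_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration, since the abstract does not state the thresholds used.

```python
import re

# Hypothetical minimum repeat units per motif length (assumption; the
# abstract does not give the thresholds used in the study).
MIN_REPEATS = {2: 6, 3: 4, 4: 3}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, n_units) for each perfect tandem repeat found."""
    seq = seq.upper()
    hits = []
    for k, n_min in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures a candidate motif; \1{n_min-1,} requires the
        # same motif to follow in tandem with no interruption.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated ("AA", "ATAT").
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


example = "GG" + "AT" * 8 + "CCT" + "AAG" * 5 + "TT"
print(find_perfect_ssrs(example))
```

The sketch reports the motif as it first appears in the sequence; a real survey would usually also collapse cyclic permutations and reverse complements (for example AG, GA, CT and TC) into one motif class before counting.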

Results: A total of 112,073 perfect repeats were detected in the Gy14 cucumber genome sequence, accounting for 0.9% of the assembled Gy14 genome, with an overall density of 551.9 SSRs/Mbp. While tetranucleotides were the most frequent microsatellites in genomic DNA sequence, dinucleotide repeats, which had more repeat units than any other SSR type, had the highest cumulative sequence length. Coding regions (ESTs) of the cucumber genome had fewer microsatellites than its genomic sequence, with trinucleotides predominating in EST sequences. AAG was the most frequent repeat in cucumber ESTs. Overall, AT-rich motifs prevailed in both genomic and EST data. Compared to the other species examined, cucumber genomic sequence had the highest density of SSRs (although comparable to the density of poplar, grapevine and rice), and was richest in AT dinucleotides. Using an electronic PCR strategy, we investigated the polymorphism between 9930 and Gy14 at 1,006 SSR loci, and found an unexpectedly high degree of polymorphism (48.3%) between the two genotypes. The level of polymorphism seems to be positively associated with the number of repeat units in the microsatellite. The in silico PCR results were validated empirically in 660 of the 1,006 SSR loci. In addition, primer sequences for more than 83,000 newly discovered cucumber microsatellites, together with their exact positions in the Gy14 genome assembly, were made publicly available.
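The headline figures above can be cross-checked with simple arithmetic. The short Python sketch below is only a reader's consistency check under stated assumptions: the helper `ssr_density` is hypothetical, and the implied assembly size and the approximate number of polymorphic loci are back-calculated from the rounded values in the abstract, not reported by the authors.

```python
def ssr_density(n_ssrs, assembled_mbp):
    """SSRs per Mbp of assembled sequence."""
    return n_ssrs / assembled_mbp


n_ssrs = 112_073          # perfect repeats reported for Gy14
reported_density = 551.9  # SSRs/Mbp, as reported

# Assembly size implied by the reported count and density (back-calculated).
implied_mbp = n_ssrs / reported_density

# Density obtained when the rounded 203 Mbp figure is used instead.
density_from_rounded = ssr_density(n_ssrs, 203)

# Approximate number of polymorphic loci implied by 48.3% of 1,006 loci.
approx_polymorphic = round(0.483 * 1006)

print(f"implied assembly size: {implied_mbp:.2f} Mbp")
print(f"density using 203 Mbp: {density_from_rounded:.1f} SSRs/Mbp")
print(f"approx. polymorphic loci: {approx_polymorphic}")
```

The small gap between the two density values is consistent with "203 Mbp" being a rounded figure.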

Conclusions: The cucumber genome is rich in microsatellites; AT and AAG are the most abundant repeat motifs in genomic and EST sequences of cucumber, respectively. Considering all the species investigated, some commonalities were noted, especially within the monocot and dicot groups, although the distribution of motifs and the frequency of certain repeats were characteristic of the species examined. The large number of SSR markers developed from this study should be a significant contribution to the cucurbit research community.

Figures

Figure 1
Relative frequency (%) of SSR types, by number of repeats, in the cucumber genome. The graph was based on a total of N = 112,073 SSRs detected in 203 Mbp non-redundant genomic DNA sequence of the Gy14 genome.
Figure 2
Distribution of different di- and trinucleotide repeats in genomic and EST sequences of cucumber and seven other selected plant species. Frequency values are expressed as number of repeats per million base pairs of sequence. Species are arranged in order of their phylogenetic relationship with cucumber. All species considered, repeats of AT (T = 4.53, P = 0.0021), AC (T = 1.34, P = 0.018), and AAT (T = 2.68, P = 0.028) were significantly more frequent in genomic than in EST data, whereas ACT (T = 4.25, P < 0.001) and ACC (T = 7.09, P < 0.001) were more abundant in EST sequences. Detailed information on frequencies of individual di- and trinucleotide repeat motifs is provided in Additional file 1 (supplement Tables S2a and S3a).
Figure 3
Distribution of tetranucleotide repeats in genomic and EST sequences of cucumber and seven other plant species. Frequency values are expressed as number of repeats per million base pairs of sequence. Species are arranged in order of their phylogenetic relationship with cucumber. All species considered, repeats of AAAT (T = 3.46, P = 0.009), AATT (T = 3.90, P = 0.005), AAAC (T = 2.71, P = 0.017), ACAT (T = 3.07, P = 0.008), AACT (T = 4.04, P = 0.001), AATC (T = 3.67, P = 0.003), and AATG (T = 2.25, P = 0.041) were significantly more frequent in genomic than in EST sequences, whereas the AACC (T = 2.39, P = 0.031) tetranucleotides were more abundant in EST data. Refer to Additional file 1 (supplement Tables S2a and S3a) for details on frequencies of individual tetranucleotide motifs.
Figure 4
Frequency (A) and magnitude (B) of SSR expansions/contractions in cucumber genotype Gy14 versus 9930 in relation to the number of repeat units of the microsatellite. The allele from cucumber inbred line 9930 was considered as the reference (or initial state) of the repeat. For each class of 'number of repeat units', the percentage (bars) and actual counts (italic numbers) of SSRs showing expansions or contractions in Gy14 are presented in panel A. The magnitude of the change in allele size (∆al) between 9930 and Gy14 is presented in panel B (error bars denote standard deviations). For compound SSRs, repeat units in each uninterrupted repeat were summed if the allele had at least 10 repeat units (e.g., (AT)8(GT)12 = 20 repeat units).
Figure 5
Relationship between amplicon length differences (∆al) between Gy14 and 9930, and the number of repeat units in dinucleotide repeats. AT-poor SSRs include AT:GC balanced repeats and GC-rich motifs. Loci with compound SSRs were excluded from the analysis unless all simple repeats within the compound SSR were dinucleotides, in which case the number of repeat units was the sum of the repeat units in each simple dinucleotide repeat, for example, (AT)8(GT)12 = 20 repeat units.
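The counting rule for compound SSRs in the Figure 5 caption can be sketched in a few lines of Python. This is an illustration only, assuming SSRs are written in the caption's "(motif)count" notation; the function name `dinucleotide_repeat_units` is hypothetical and not taken from the paper.

```python
import re

# One "(motif)count" token, e.g. "(AT)8"; notation taken from the caption.
_TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)")


def dinucleotide_repeat_units(ssr):
    """Sum repeat units of an SSR such as '(AT)8(GT)12'.

    Returns None when any simple repeat is not a dinucleotide, mirroring
    the exclusion rule described in the caption.
    """
    parts = _TOKEN.findall(ssr)
    # Reject input that is not made up entirely of (motif)count tokens.
    if not parts or "".join("(%s)%s" % p for p in parts) != ssr:
        raise ValueError("unrecognized SSR notation: %r" % ssr)
    if any(len(motif) != 2 for motif, _ in parts):
        return None
    return sum(int(count) for _, count in parts)


print(dinucleotide_repeat_units("(AT)8(GT)12"))   # compound, all dinucleotides
print(dinucleotide_repeat_units("(AT)8(AAG)5"))   # excluded: contains a trinucleotide
```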
