PLoS One. 2011;6(9):e24811.

doi: 10.1371/journal.pone.0024811. Epub 2011 Sep 20.

Rapid genotyping of soybean cultivars using high throughput sequencing

Kranthi Varala¹, Kankshita Swaminathan, Ying Li, Matthew E Hudson

PMID: 21949759
PMCID: PMC3176760
DOI: 10.1371/journal.pone.0024811

Kranthi Varala et al. PLoS One. 2011.

Affiliation

¹ Department of Crop Sciences, University of Illinois, Urbana-Champaign, Illinois, United States of America.

Abstract

Soybean (Glycine max) breeding involves improving commercially grown varieties by introgressing important agronomic traits from poor-yielding accessions and/or wild relatives of soybean while minimizing the associated yield drag. Molecular markers associated with these traits are instrumental in increasing the efficiency of producing such crosses, and Single Nucleotide Polymorphisms (SNPs) are particularly well suited for this task, owing to their high density in the non-genic regions and thus the increased likelihood of finding a marker tightly linked to a given trait. A rapid method to develop SNP markers that can differentiate specific loci between any two parents in soybean is thus highly desirable. In this study we investigate such a protocol for developing SNP markers between multiple soybean accessions and the reference Williams 82 genome. To restrict sampling frequency, reduced representation libraries (RRLs) of genomic DNA were generated by restriction digestion followed by library construction. We chose to sequence four accessions, Dowling (PI 548663), Dwight (PI 597386), Komata (PI 200492) and PI 594538A, for their agronomic importance, as well as Williams 82 as a control. MseI was chosen to digest genomic DNA based on predictions that it will cut sparingly in the mathematically defined high-copy-number regions of the genome. All RRLs were sequenced on the Illumina Genome Analyzer. Reads were aligned to the Glyma1 reference assembly and SNP calls made from the alignments. We identified from 4294 to 14550 SNPs between the four accessions and the Williams 82 reference. In addition, a small number of SNPs (1142) were found by aligning Williams 82 reads to the reference assembly (Glyma1), suggesting limited genetic variation within the Williams 82 line. The SNP data allowed us to estimate genetic diversity between the four lines and Williams 82.
Restriction digestion of soybean genomic DNA with MseI followed by high throughput sequencing provides a rapid and reproducible method for generating SNP markers.
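
The restriction step behind a reduced representation library can be illustrated with an in silico digest. The sketch below is not the authors' pipeline. It assumes the known MseI recognition site TTAA with the cut after the first T, and the function name `msei_fragments` and its optional size window are hypothetical choices made for illustration.

```python
import re

MSEI_SITE = "TTAA"  # MseI recognition sequence
CUT_OFFSET = 1      # MseI cuts after the first T (T^TAA)

def msei_fragments(seq, min_len=0, max_len=None):
    """Return the fragments of a complete MseI digest of a linear sequence.

    min_len and max_len are an assumed size-selection window; the source
    does not state which fragment sizes were retained.
    """
    seq = seq.upper()
    # Position of every cut: start of each TTAA site plus the cut offset.
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(MSEI_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    frags = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    return [f for f in frags
            if len(f) >= min_len and (max_len is None or len(f) <= max_len)]

print(msei_fragments("GGTTAACCTTAAGG"))
```

Each fragment after the first begins with TAA, which reflects the cut position within the site. Applying a size window to the fragments is one way a digest could restrict the portion of the genome that is sampled.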

Conflict of interest statement

Competing Interests: MH is an academic editor of PLoS ONE.

Figures

**Figure 1. Sequence coverage at tagged sites across varieties subjected to genotyping by sequencing.**
Average depth of coverage across all bases covered by resequencing reads is shown by blue columns with the value indicated by the vertical axis on the right. The variation in depth observed between libraries is a combination of variation in the amount of reads obtained in a sequencing run and the number of loci tagged by a read in that library. The total number of 35mers from the genome that were tagged by at least one read is shown by the green columns. The vertical axis on the left depicts the number of sites tagged (in tens of thousands).

**Figure 2. Pedigree of Dwight.**
The Dwight variety of soybean was produced by crossing Jack and an experimental line. Tracing the parentage back three generations reveals that the Williams line served as an ancestor on both sides of the cross and was used as a recurrent parent to varying degrees. Numbers in parentheses indicate the number of times a line was used as a recurrent parent. Based on the parentage, the proportion of the genome that is expected to be from Williams is indicated in red. Lineage is depicted with the progenitors to the right.

**Figure 3. SNP loci genotyped across accessions.**
A. All high-confidence SNP loci identified from two accessions were pooled and filtered to retain only those loci which were genotyped in both. Loci were then classified based on the presence of the SNP into 3 categories: 1. SNP was observed in both accessions when compared to the reference sequence, 2. SNP was observed in the first accession but not the second and 3. SNP was observed in the second accession but not the first. The sum of SNP loci in categories 2 and 3 is the number of loci detected as polymorphic between these two accessions. B. The number of loci genotyped with ≥ 3 reads in 1–5 lines is shown.
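
The classification in panel A can be sketched with set operations. This is an illustrative reading of the caption and not the authors' code: the function name `classify_shared_loci` and the representation of a locus as a `(chromosome, position)` tuple are assumptions, and the sketch compares loci only, without checking whether both accessions carry the same alternate allele.

```python
def classify_shared_loci(snps_a, snps_b, genotyped_a, genotyped_b):
    """Count SNP loci by category for two accessions.

    All four arguments are sets of (chromosome, position) tuples:
    snps_*      loci where a SNP versus the reference was called
    genotyped_* loci with enough read support to be genotyped
    """
    # Keep only loci genotyped in both accessions.
    shared = genotyped_a & genotyped_b
    a = snps_a & shared
    b = snps_b & shared
    counts = {
        "both": len(a & b),         # category 1
        "first_only": len(a - b),   # category 2
        "second_only": len(b - a),  # category 3
    }
    # Loci polymorphic between the two accessions: categories 2 + 3.
    counts["polymorphic"] = counts["first_only"] + counts["second_only"]
    return counts
```

Restricting to loci genotyped in both accessions matters because a SNP that is absent from one accession's call set may simply not have been sampled there.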

**Figure 4. SNP loci shared between accessions.**
The number of high-confidence SNP loci shared between accessions is shown. A large proportion of SNP loci from any given accession seem unique to that accession. This unique portion is most likely an overestimate since the corresponding loci were simply not genotyped in the other accessions.

**Figure 5. SNPs polymorphic between each variety sequenced and the Glyma1 assembly of chromosome 3.**
High-confidence SNPs called from 3 or more reads aligned against the Glyma1 reference assembly of Gm03 (Linkage group N) are shown. At the right edge of the image, the centromere is indicated by a red bar and the % repetitive content of 100 kb blocks (ranging from 0–100%) is plotted as a Bézier curve. Note that repetitive sequences (predominant in the centromere) prevent unique mapping of sequence reads and thus show substantially reduced SNP density. Scale is in megabases (Mb) of physical distance. SNPs occurring in a one-megabase (Mb) bin are dithered, left-to-right, along the X axis based on the position of the SNP within that bin. The presence of a large number of SNPs and their non-random distribution on Gm03 for the Williams 82 data suggests that the Williams 82 line carries significant portions of the non-recurrent parent Kingwa. The distribution of SNPs in other lines shows a density proportional to sampling frequency and shared parentage, while the Williams 82 line shows higher diversity around the 5 Mb mark of Gm03 (arrow).
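
The binning and dithering described in this caption can be expressed in a few lines. The sketch below is a guess at the general idea and not the plotting code used for the figure: `bin_and_dither` is a hypothetical name, and positions are assumed to be 0-based base-pair coordinates on a single chromosome.

```python
def bin_and_dither(positions, bin_size=1_000_000):
    """Map each SNP position to (bin index, fractional offset within the bin).

    The bin index would place the SNP along the chromosome axis; the offset
    (0 <= offset < 1) would spread SNPs left-to-right inside that bin.
    """
    return [(pos // bin_size, (pos % bin_size) / bin_size)
            for pos in sorted(positions)]

print(bin_and_dither([5_250_000, 500_000]))
```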

**Figure 6. SNPs between resequenced accessions and the Glyma1 assembly.**
High-confidence SNPs called by aligning reads against the reference assembly are shown in concentric rings. The outermost ring depicts the known repeat elements as stacked blocks. The subsequent rings each depict the position of high-confidence SNPs from the lines Dowling, Dwight, Komata, PI 594538A and Williams 82 consecutively. The innermost ring depicts SNPs identified from exome capture sequencing of two Williams 82 individuals by Haun et al. (data kindly provided by Robert Stupar). Genomic regions rich in repeats, as shown by higher stacks of blocks on the outermost ring, lend themselves poorly to unambiguous read alignment and hence SNP calling. Outside these regions SNPs are distributed evenly across the genome in all the accessions sequenced except for Williams 82. Data from our study and Haun et al. concur on the regions of high heterogeneity, restricted mostly to Gm03 but also on Gm07, Gm14 and Gm15. The Haun et al. data additionally identify a heterogeneous region at the start of Gm20 that was not observed in our data.

**Figure 7. Transposon family divergence.**
All reads were aligned to the soybean transposable element database (soyTEdb) and grouped based on the transposon family they match. The number of reads assigned to each family was normalized to the total number of reads from that library to allow comparison across lines. Abbreviations for soybean genotypes: Do = Dowling, Dw = Dwight, Ko = Komata, PI = PI 594538A, W = Williams 82. A) CACTA and Copia families show a significant expansion in the Dowling accession. B) Elements of the Gypsy family have substantially increased numbers in the Dwight genome and are increased to a lesser extent in the Williams 82 genome.
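
The normalization step in this caption amounts to dividing each family's read count by the library size. The following is a minimal sketch under that reading; the function name `normalize_family_counts` and the example counts are hypothetical and are not data from the study.

```python
def normalize_family_counts(family_counts, total_reads):
    """Return the fraction of a library's reads assigned to each family.

    family_counts: dict mapping transposon family name -> read count
    total_reads:   total number of reads in that library
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {family: n / total_reads for family, n in family_counts.items()}

# Made-up counts for two libraries of different size.
lib_a = normalize_family_counts({"Gypsy": 50, "Copia": 25}, total_reads=1000)
lib_b = normalize_family_counts({"Gypsy": 60, "Copia": 90}, total_reads=3000)
print(lib_a, lib_b)
```

Comparing fractions instead of raw counts keeps a deeper sequencing run from being mistaken for an expansion of a transposon family.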
