Review
Proc Natl Acad Sci U S A. 2010 Dec 21;107(51):22032-7.
doi: 10.1073/pnas.1009526107. Epub 2010 Dec 3.

Whole-genome sequencing and intensive analysis of the undomesticated soybean (Glycine soja Sieb. and Zucc.) genome

Moon Young Kim et al. Proc Natl Acad Sci U S A. 2010.

Abstract

The genome of soybean (Glycine max), a commercially important crop, has recently been sequenced, making it one of six crop species with a sequenced genome. Here we report the genome sequence of G. soja, the undomesticated ancestor of G. max (specifically, G. soja var. IT182932). A total of 48.8 Gb of Illumina Genome Analyzer (Illumina-GA) short DNA reads were aligned to the G. max reference genome, and a consensus sequence was determined for G. soja. This consensus sequence spanned 915.4 Mb, covering 97.65% of the published G. max genome sequence at an average mapping depth of 43-fold. The G. soja genome contains 2.5 Mb of substituted bases and 406 kb of small insertions/deletions relative to G. max, so its nucleotide sequence differs from that of G. max by ∼0.31%. In addition to the mapped 915.4-Mb consensus sequence, 32.4 Mb of large deletions and 8.3 Mb of novel sequence contigs were also detected in the G. soja genome. Nucleotide variants of G. soja versus G. max called from Illumina-GA reads were checked by Roche Genome Sequencer FLX sequencing, which showed 99.99% concordance for single-nucleotide polymorphism calls and 98.82% agreement for insertion/deletion calls. Data presented in this study suggest that the G. soja/G. max complex may be at least 0.27 million y old, predating the relatively recent domestication event (6,000–9,000 y ago). This suggests that the history of soybean domestication is complex and that more in-depth population-genetic studies are needed. In any case, comparing the genomes of domesticated and undomesticated soybean can facilitate soybean improvement.
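The ∼0.31% difference can be roughly reproduced from the numbers quoted in the abstract. The sketch below is back-of-the-envelope arithmetic, not the authors' calculation; it assumes that the length of the published G. max sequence can be recovered as 915.4 Mb ÷ 0.9765, which the abstract does not state directly. The constant and function names are illustrative.

```python
# Back-of-the-envelope check of the abstract's figures (all lengths in Mb).
# Assumption: the published G. max length = consensus length / coverage fraction.

CONSENSUS_MB = 915.4        # G. soja consensus length
COVERAGE_FRACTION = 0.9765  # fraction of the published G. max sequence covered
SUBSTITUTED_MB = 2.5        # substituted bases relative to G. max
SMALL_INDEL_MB = 0.406      # 406 kb of small insertions/deletions


def implied_reference_mb(consensus_mb: float, coverage_fraction: float) -> float:
    """G. max sequence length implied by the consensus length and its coverage."""
    return consensus_mb / coverage_fraction


def percent_difference(variant_mb: float, denominator_mb: float) -> float:
    """Variant bases as a percentage of a chosen genome length."""
    return 100.0 * variant_mb / denominator_mb


reference_mb = implied_reference_mb(CONSENSUS_MB, COVERAGE_FRACTION)
variant_mb = SUBSTITUTED_MB + SMALL_INDEL_MB

print(f"implied G. max length: {reference_mb:.1f} Mb")
print(f"difference vs. implied G. max length: {percent_difference(variant_mb, reference_mb):.2f}%")
print(f"difference vs. consensus length: {percent_difference(variant_mb, CONSENSUS_MB):.2f}%")
```

With the implied G. max length (about 937.4 Mb) as the denominator the result is about 0.31%; with the 915.4-Mb consensus as the denominator it is about 0.32%. Which denominator the authors used is not stated in the abstract.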


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Prediction of duplicated regions of G. soja using read coverage and validation of large deletions by GS-FLX reads. (A) Identification of duplicated regions on chromosome 1 as regions whose observed coverage is higher than expected. Black and red lines show expected and observed coverage, respectively, across the given genome region. Shaded boxes mark regions with copy numbers 1, 2, 3, and 4, whose coverage is one, two, three, and four times the expected coverage, respectively. Regions with more than one copy are predicted to be duplicated in the soybean genome. (B) Proportion of duplicated regions in the whole G. soja genome by copy number. The number at each data point gives the corresponding length of G. soja genome sequence.
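The coverage-ratio logic in panel A can be sketched as follows, assuming coverage has already been computed per genomic window and that copy number is simply the observed-to-expected ratio rounded to the nearest integer and capped at 4 (the highest class shown). The `Window` type, the rounding rule, the cap, and the example windows are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """A genomic window with expected and observed coverage (hypothetical format)."""
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive
    expected: float   # coverage expected for a single-copy region
    observed: float   # coverage actually observed


def copy_number(window: Window, max_copy: int = 4) -> int:
    """Observed/expected coverage, rounded half up and capped at max_copy."""
    if window.expected <= 0:
        raise ValueError("expected coverage must be positive")
    ratio = window.observed / window.expected
    return min(max_copy, int(ratio + 0.5))


def is_duplicated(window: Window) -> bool:
    """Windows with more than one estimated copy are treated as duplicated."""
    return copy_number(window) > 1


def length_by_copy_number(windows: list[Window]) -> Counter:
    """Total bases assigned to each estimated copy number (cf. panel B)."""
    totals: Counter = Counter()
    for w in windows:
        totals[copy_number(w)] += w.end - w.start
    return totals


# Hypothetical windows: single-copy, roughly twofold, and roughly fourfold coverage.
windows = [
    Window(0, 10_000, expected=40.0, observed=41.0),
    Window(10_000, 20_000, expected=40.0, observed=83.0),
    Window(20_000, 25_000, expected=40.0, observed=158.0),
]
print(length_by_copy_number(windows))
print([is_duplicated(w) for w in windows])
```

Rounding half up (rather than Python's built-in banker's rounding) keeps a ratio of exactly 2.5 from being assigned to copy number 2; a real pipeline would likely also smooth coverage and filter repeats before calling copies.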
Fig. 2.
Validation of large deletions by GS-FLX reads. An example of a 634-bp deletion in G. soja validated by GS-FLX reads (Gm02, 10126582–10127216). The first and second tracks show Illumina-GA and GS-FLX reads, respectively, mapped to sequences flanking the deleted region. The third and fourth tracks represent the corresponding regions of the G. soja and G. max genomes, respectively.
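One way to express this kind of validation is to count reads whose alignment to G. max jumps from the left flank directly to the right flank of the deletion. The sketch below assumes half-open coordinates [start, end), which makes the caption's interval exactly 634 bp, and assumes each read has already been reduced to two aligned blocks. The `SplitAlignment` type, the 5-bp tolerance, and the example reads are hypothetical, not the authors' criteria.

```python
from typing import NamedTuple


class SplitAlignment(NamedTuple):
    """A G. soja read aligned to G. max as two blocks (hypothetical format)."""
    read_id: str
    left_block_end: int     # G. max coordinate where the left-flank block ends
    right_block_start: int  # G. max coordinate where the right-flank block resumes


def deletion_length(start: int, end: int) -> int:
    """Deletion length for half-open coordinates [start, end)."""
    return end - start


def supports_deletion(aln: SplitAlignment, start: int, end: int, tolerance: int = 5) -> bool:
    """True if both blocks abut the deletion boundaries within `tolerance` bp."""
    return (abs(aln.left_block_end - start) <= tolerance
            and abs(aln.right_block_start - end) <= tolerance)


def count_supporting_reads(alignments, start: int, end: int, tolerance: int = 5) -> int:
    """Number of reads consistent with the deletion."""
    return sum(supports_deletion(a, start, end, tolerance) for a in alignments)


DEL_START, DEL_END = 10_126_582, 10_127_216  # Gm02 interval from the caption

# Hypothetical reads: two span the deletion junction, one lies entirely in the left flank.
reads = [
    SplitAlignment("flx_read_1", 10_126_582, 10_127_216),
    SplitAlignment("flx_read_2", 10_126_580, 10_127_219),
    SplitAlignment("flx_read_3", 10_126_100, 10_126_400),
]
print(deletion_length(DEL_START, DEL_END))                # 634
print(count_supporting_reads(reads, DEL_START, DEL_END))  # 2
```

Longer GS-FLX reads are more likely than short Illumina-GA reads to span both flanks in a single alignment, which is one reason they are useful for confirming deletions of this size.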
Fig. 3.
Distribution of sequence variation on Chr 1 of G. soja. (A) Black and red lines indicate the total number of SNPs and the number of indels, respectively. Numbers of genes and of nonsynonymous SNPs on Chr 1 are shown as correspondingly colored bars. The gray area marks regions with a mapping depth greater than 100, and the black line indicates the number of repeats. To fit the lines and bars on one graph with 1-Mb bins on the x axis, SNP counts were scaled by 1/10 and repeat sizes by 1/50. (B) Effect of mapping depth (against the reference genome) on genome coverage and SNP detection. Coverage indicates the number of short reads matched to the reference. Red lines indicate the numbers of SNPs detected at each mapping depth.
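The binning and scaling described for panel A can be sketched as below: SNP positions are counted per 1-Mb bin and multiplied by 1/10, and repeat lengths are summed per bin (splitting any repeat that crosses a bin boundary) and multiplied by 1/50. The input formats, the assumption that repeat intervals do not overlap, and the example positions are hypothetical.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1-Mb bins, as in the caption
SNP_SCALE = 1 / 10
REPEAT_SCALE = 1 / 50


def count_per_bin(positions, bin_size=BIN_SIZE):
    """Number of variants whose 0-based position falls in each bin."""
    return Counter(pos // bin_size for pos in positions)


def bases_per_bin(intervals, bin_size=BIN_SIZE):
    """Bases covered by half-open intervals [start, end), split across bins.

    Assumes intervals do not overlap; overlapping repeats would be double-counted.
    """
    totals = Counter()
    for start, end in intervals:
        pos = start
        while pos < end:
            bin_index = pos // bin_size
            chunk_end = min(end, (bin_index + 1) * bin_size)
            totals[bin_index] += chunk_end - pos
            pos = chunk_end
    return totals


def scale(track, factor):
    """Multiply every bin value by `factor` so tracks share one y axis."""
    return {b: v * factor for b, v in sorted(track.items())}


# Hypothetical SNP positions and repeat intervals on one chromosome.
snp_positions = [12, 480_000, 999_999, 1_000_000, 2_750_000]
repeats = [(999_000, 1_001_000), (2_000_000, 2_050_000)]

snp_track = scale(count_per_bin(snp_positions), SNP_SCALE)
repeat_track = scale(bases_per_bin(repeats), REPEAT_SCALE)
print(snp_track)
print(repeat_track)
```

Scaling changes only how the tracks are drawn against a shared axis; the underlying per-bin counts are unchanged.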
Fig. 4.
Soybean domestication history. G. max is generally believed to have been domesticated from its wild relative, G. soja, 6,000–9,000 y ago. The G. soja/G. max complex diverged from a common ancestor 0.27 Mya. Divergence between G. soja and G. max thus predated domestication, indicating that cultivated soybean was domesticated from a preexisting G. soja/G. max complex.
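The caption does not say how the 0.27-My date was obtained. A common way to date a split is a strict molecular clock, T = K/(2r), where K is per-site divergence between the two lineages and r is the substitution rate per site per year. The sketch below assumes that formula and an illustrative rate of 6.1 × 10⁻⁹ substitutions per site per year, a value often quoted for legumes; neither the formula nor the rate is confirmed as the authors' method.

```python
# Strict molecular clock: T = K / (2r).
# ASSUMED_RATE is an illustrative assumption, not a value taken from this paper.
ASSUMED_RATE = 6.1e-9  # substitutions per site per year


def divergence_time_years(k: float, rate: float = ASSUMED_RATE) -> float:
    """Years since two lineages split, given per-site divergence k."""
    return k / (2.0 * rate)


def expected_divergence(t_years: float, rate: float = ASSUMED_RATE) -> float:
    """Per-site divergence expected after t_years under the same clock."""
    return 2.0 * rate * t_years


k_complex = expected_divergence(270_000)      # split of the G. soja/G. max complex
k_domestication = expected_divergence(9_000)  # upper end of the domestication window
print(f"K at 0.27 My: {k_complex:.4f}")
print(f"K at 9,000 y: {k_domestication:.6f}")
print(f"ratio: {k_complex / k_domestication:.0f}x")
```

Under any constant-rate clock, a 0.27-My split implies about 30 times more accumulated divergence than 9,000 years of separation would, which illustrates why divergence of that magnitude points to a split older than domestication.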
