Plant Biotechnol J. 2021 Feb;19(2):324-334.
doi: 10.1111/pbi.13466. Epub 2020 Sep 14.

Soybean (Glycine max) Haplotype Map (GmHapMap): a universal resource for soybean translational and functional genomics

Davoud Torkamaneh et al. Plant Biotechnol J. 2021 Feb.

Abstract

Here, we describe a worldwide haplotype map for soybean (GmHapMap) constructed using whole-genome sequence data for 1007 Glycine max accessions and yielding 14.9 million variants as well as 4.3 million tag single-nucleotide polymorphisms (SNPs). When sampling random subsets of these accessions, the number of variants and tag SNPs plateaued beyond approximately 800 and 600 accessions, respectively. This suggests extensive coverage of diversity within the cultivated soybean. GmHapMap variants were imputed onto 21 618 previously genotyped accessions with up to 96% success for common alleles. A local association analysis performed with the imputed data, using markers located in a 1-Mb region known to contribute to seed oil content, enabled us to identify a candidate causal SNP residing in the NPC1 gene. We determined gene-centric haplotypes (407 867 GCHs) for the 55 589 genes and showed that such haplotypes can help to identify alleles that differ in the resulting phenotype. Finally, we predicted 18 031 putative loss-of-function (LOF) mutations in 10 662 genes and illustrated how such a resource can be used to explore gene function. The GmHapMap provides a unique worldwide resource for applied soybean genomics and breeding.

Keywords: genetic variants; haplotype; haplotype map; imputation; loss-of-function mutation; soybean; whole-genome sequencing.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
Description of GmHapMap. (a) Geographical distribution of GmHapMap accessions. (b) Venn diagram representing the degree of overlap among variants called using the two collections of sequenced soybean accessions. (c) Population structure analysis using all SNPs representing six different subpopulations (K = 6) in the GmHapMap collection. (d) Distribution of genetic diversity among subpopulations of GmHapMap. [Colour figure can be viewed at wileyonlinelibrary.com]
Figure 2
(a) Average number of variants (pink) and tag SNPs (blue) detected in random subsets of N accessions (where N = 100, 200, etc.). Each average was derived from 20 rounds of subsampling. (b) Imputation accuracy as a function of allele frequency for six different scenarios: three different experimentally derived genotype datasets (SoySNP50K, GBS and GBS/SoySNP50K) and two reference panels (REF‐I and REF‐II). [Colour figure can be viewed at wileyonlinelibrary.com]
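The subsampling behind panel (a) of Figure 2 can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0/1 genotype coding (0 = reference allele, 1 = alternate allele) and the rule that a variant counts as "detected" when at least one sampled accession carries the alternate allele are all assumptions made for illustration. Only the idea of averaging over 20 random subsets of N accessions comes from the caption.

```python
import random
from statistics import mean


def count_variants(genotypes, subset):
    """Count sites where at least one accession in `subset` carries the
    alternate allele (assumed coding: 0 = reference, 1 = alternate)."""
    n_sites = len(genotypes[0])
    return sum(
        1 for site in range(n_sites)
        if any(genotypes[acc][site] == 1 for acc in subset)
    )


def saturation_curve(genotypes, sizes, n_reps=20, seed=0):
    """Average number of detected variants over `n_reps` random subsets
    of each size in `sizes` (hypothetical helper, not from the paper)."""
    rng = random.Random(seed)
    accessions = range(len(genotypes))
    curve = {}
    for n in sizes:
        counts = [
            count_variants(genotypes, rng.sample(accessions, n))
            for _ in range(n_reps)
        ]
        curve[n] = mean(counts)
    return curve


# Toy matrix: 3 accessions (rows) x 3 sites (columns); values are invented.
toy = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
curve = saturation_curve(toy, sizes=[1, 3], n_reps=20, seed=1)
```

Plotting `curve` against N would give the kind of saturation curve the caption describes, with the plateau indicating that additional accessions contribute few new variants.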
Figure 3
Description of gene‐centric haplotypes (GCHs) characterized in the GmHapMap dataset. (a) Distribution of the number of genes that have a given number of predicted GCHs. (b) Distribution of the number of SNPs residing in a 10‐kb window in and around genes in soybean according to the number of GCHs defined using HaplotypeMiner. (c) Distribution of the mean length of genes and GCHs according to the number of GCHs defined by HaplotypeMiner. Haplotype length is defined as the distance between the two retained SNP markers that lie on opposite sides of the middle of the gene and are the furthest apart from one another. (d) Schematic representation of predicted GCHs for GmGIa. [Colour figure can be viewed at wileyonlinelibrary.com]
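The haplotype-length definition in panel (c) of Figure 3 can be expressed as a short sketch. This is an illustration of the stated definition only, not HaplotypeMiner code: the function name, its arguments and the choice to return `None` when markers are missing on one side are assumptions.

```python
def haplotype_length(marker_positions, gene_start, gene_end):
    """Distance between the furthest-apart retained markers that lie on
    opposite sides of the gene midpoint (hypothetical helper).

    Returns None when no marker is retained on one of the two sides,
    which is an assumption about how such a case might be handled.
    """
    midpoint = (gene_start + gene_end) / 2
    left = [p for p in marker_positions if p < midpoint]
    right = [p for p in marker_positions if p > midpoint]
    if not left or not right:
        return None
    # The outermost marker on each side gives the largest possible span.
    return max(right) - min(left)


# Invented coordinates: gene spans 400-1000, so the midpoint is 700.
length = haplotype_length([100, 450, 900, 1300], gene_start=400, gene_end=1000)
```

With these invented coordinates the outermost markers are at 100 and 1300, so the haplotype spans 1200 bp and extends beyond the gene itself.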
Figure 4
Phenotypic variation observed between accessions with (blue) and without (red) a predicted LOF mutation in four different genes. (a) FAD3A, a key gene for linolenic acid synthesis; (b) GmJ, a key gene for the long juvenile trait; (c) GmGIa, a key gene controlling maturity; (d) KASIIa, a key gene in the oil biosynthesis pathway. In each case, the number of accessions sharing the same allele (and for which phenotypic data were available) is indicated. [Colour figure can be viewed at wileyonlinelibrary.com]
