Plant Cell. 2024 May 29;36(6):2160-2175.
doi: 10.1093/plcell/koae062.

Integrative omics analysis elucidates the genetic basis underlying seed weight and oil content in soybean

Xiaobo Yuan et al. Plant Cell. 2024.

Abstract

Synergistic optimization of key agronomic traits by traditional breeding has dramatically enhanced crop productivity in the past decades. However, the genetic basis underlying coordinated regulation of yield- and quality-related traits remains poorly understood. Here, we dissected the genetic architectures of seed weight and oil content by combining genome-wide association studies (GWAS) and transcriptome-wide association studies (TWAS) using 421 soybean (Glycine max) accessions. We identified 26 and 33 genetic loci significantly associated with seed weight and oil content by GWAS, respectively, and detected 5,276 expression quantitative trait loci (eQTLs) regulating expression of 3,347 genes based on population transcriptomes. Interestingly, a gene module (IC79), regulated by two eQTL hotspots, exhibited significant correlation with both seed weight and oil content. Twenty-two candidate causal genes for seed traits were further prioritized by TWAS, including Regulator of Weight and Oil of Seed 1 (GmRWOS1), which encodes a sodium pump protein. GmRWOS1 was verified to pleiotropically regulate seed weight and oil content by gene knockout and overexpression. Notably, allelic variations of GmRWOS1 were strongly selected during domestication of soybean. This study uncovers the genetic basis and network underlying regulation of seed weight and oil content in soybean and provides a valuable resource for improving soybean yield and quality by molecular breeding.

Conflict of interest statement

None declared.

Figures

Figure 1.
GWAS of 100-seed weight and oil content in soybean. A) The geographic distribution of 421 soybean accessions. B) PCA plot of the first two principal components of soybean accessions. C) Manhattan plots of GWAS for 100-seed weight (bottom panel) and oil content (top panel). The horizontal solid lines indicate the significance threshold of GWAS (−log10(P) = 5.20). The yellow dots represent the lead SNPs after LD-based result clumping. The number in brackets indicates the distance between a gene and its closest lead SNP. The known QTLs related to 100-seed weight and oil content that overlap GWAS peaks are labeled in black font using names assigned by SoyBase. D) Heatmap describing the genetic correlation between 100-seed weight and oil content. The value in the upper right square was calculated by bivariate GREML analysis, and the value in the lower left square represents the correlation of variant effects (signed t-value). E) The relationship between MAF and effect size of significant variants (P < 6.24 × 10^−6) for 100-seed weight (orange) and oil content (blue). MAF, minor allele frequency.
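
The Figure 1 caption states the GWAS significance threshold in two forms: as a height on the Manhattan plot (−log10(P) = 5.20) and as a P-value cutoff (P < 6.24 × 10−6). The short Python sketch below only checks that the two numbers agree. The helper name `is_significant` is hypothetical and is not taken from the paper.

```python
import math

# P-value cutoff quoted in panel E of the Figure 1 caption.
p_cutoff = 6.24e-6

# Height of the horizontal threshold line on the Manhattan plot.
threshold = -math.log10(p_cutoff)

def is_significant(p_value, cutoff=p_cutoff):
    """Hypothetical helper: True when a variant's P-value passes the cutoff."""
    return p_value < cutoff

print(f"{threshold:.2f}")  # prints 5.20
```

A variant with P = 1e-7 would pass this cutoff, while one with P = 1e-5 would not.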
Figure 2.
Genome-wide characterization of eQTLs. A) Dot plot displaying the associations of eQTLs and their regulated eGenes in 20 chromosomes. The color scale of each dot represents the significance (−log10(P)) of each eQTL–eGene pair. B) The fraction of local and distal eQTLs (top), and the percentage of eGenes regulated by different types of eQTLs (bottom). C) Split violin plot showing explained variance (r²) of local and distal eQTLs for expression of their regulated eGenes. The dashed lines indicate the distribution quantiles (0.25, 0.75). Three asterisks indicate a statistical significance level of P < 2.2e−16 (two-sided Wilcoxon rank sum test). D) Enrichment of eQTLs in open chromatin regions (OCRs) compared with random genomic regions, which were represented by 1,000 random permutations of OCR positions across the genome. The dashed line indicates 95% confidence interval values. E) Counts of eGenes regulated by different numbers of eQTLs. F) Number of eGenes regulated by distal eQTLs using 1-Mb windows across the genome. The horizontal dashed line indicates the threshold (15) of eGene numbers regulated by eQTL hotspot regions. The known genes and QTLs related to seed quality and yield are labeled in red and black font, respectively. G) Genetic network between eQTL hotspots and eGenes. H) Identification of selective hotspots during domestication and improvement. The vertical and horizontal dashed lines indicate the significant genome-wide threshold of selection signals for domestication (≥6.15) and improvement (≥2.68), respectively. Selective hotspots are marked in purple. WS, LS, and CS indicate wild, landrace, and improved soybeans, respectively.
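
Panel F of Figure 2 describes counting eGenes regulated by distal eQTLs in 1-Mb windows and flagging windows that reach a threshold of 15 eGenes. A minimal Python sketch of that kind of windowed count follows. It rests on assumptions not stated in the caption: windows are non-overlapping bins, a window qualifies when its count is at or above 15, and the input is a list of hypothetical `(chromosome, eQTL position, eGene id)` tuples. The function name `find_hotspots` and the toy data are illustrative only.

```python
from collections import defaultdict

WINDOW_BP = 1_000_000  # 1-Mb window, as in the caption
MIN_EGENES = 15        # hotspot threshold, as in the caption

def find_hotspots(pairs, window=WINDOW_BP, min_egenes=MIN_EGENES):
    """Count distinct eGenes per fixed window and keep windows at or above the threshold.

    pairs: iterable of (chrom, eqtl_position_bp, egene_id) for distal eQTLs.
    Returns {(chrom, window_start_bp): n_distinct_egenes}.
    Assumption: non-overlapping bins; the paper may use a different windowing scheme.
    """
    egenes_by_window = defaultdict(set)
    for chrom, pos, egene in pairs:
        start = (pos // window) * window  # left edge of the bin containing this eQTL
        egenes_by_window[(chrom, start)].add(egene)
    return {
        key: len(genes)
        for key, genes in egenes_by_window.items()
        if len(genes) >= min_egenes
    }

# Toy data: 15 eGenes map to eQTLs in one window, 3 eGenes to another.
toy_pairs = [("Chr01", 2_000_000 + i * 1_000, f"gene{i}") for i in range(15)]
toy_pairs += [("Chr01", 7_500_000 + i, f"other{i}") for i in range(3)]

hotspots = find_hotspots(toy_pairs)
print(hotspots)
```

With the toy data, only the window starting at 2,000,000 bp on Chr01 is reported, since it is the only one reaching 15 distinct eGenes.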
Figure 3.
Genetic network of module IC79 involved in regulation of seed weight and oil content. A) Identification of coexpression modules significantly associated with 100-seed weight and oil content. The first two rows of the heatmap represent correlations between modules and seed traits, and the last row indicates the fold enrichment of known trait-related genes. The thickness of each red solid line indicates the enrichment of genes distally regulated by the eQTL hotspot in each module. The number of genes regulated by each eQTL hotspot is marked on each line. B) Correlation network of genes in module IC79. The genes related to seed storage proteins, seed weight, and lipid synthesis are labeled in the network. C) Overrepresented GO (black) and KEGG (green) terms for genes in three submodules. D) Correlation of genes in different submodules with 100-seed weight (right) and oil content (left). In each box plot, borders represent the first and third quartiles, the center line denotes the median, and whiskers extend to 1.5 times the interquartile range beyond the quartiles.
Figure 4.
Identification of genes significantly associated with both seed weight and oil content. A) Scatter plot showing correlation of genes in submodule1 and submodule2 with 100-seed weight (y axis) and oil content (x axis). The genes significantly associated with both 100-seed weight and oil content are marked in blue (P < 0.01, Pearson correlation). B) Representative seeds of wild-type plants and Gmuspl1 mutants (CR1 and CR2). Scale bar, 1 cm. C) Seed oil content (top) and 100-seed weight (bottom) in wild type and Gmuspl1 mutants. Data are shown as mean ± SD (n = 6). Different letters denote significant differences (P < 0.05) from two-tailed t-test. The value of each replicate is represented by a dot. D) Manhattan plot of GWAS using the expression pattern of IC79 as a phenotype. The horizontal solid line indicates the significance threshold of GWAS (−log10(P) = 6.11). The yellow dots represent the lead SNPs after LD-based result clumping. E) Enrichment of BZR, MYC2, and bZIP44 motifs in the promoter sequences of genes in IC79.
Figure 5.
Regulation of seed weight and oil content by GmRWOS1. A and B) The aggregation of superior alleles in the top and bottom 50 accessions with the highest and lowest oil content (A) or 100-seed weight (B) in the soybean population. In each box plot (n = 50), borders represent the first and third quartiles, the center line denotes the median, and whiskers extend to 1.5 times the interquartile range beyond the quartiles. *** indicates P < 0.001 (Wilcoxon rank sum test). C) Local Manhattan plot (top) and linkage disequilibrium plot (bottom) for SNPs surrounding the peak on chromosome 12. Asterisks indicate positions of the two lead SNPs. D) Correlation of GmRWOS1 expression with 100-seed weight (right) and oil content (left). Each dot indicates one soybean accession. E) Representative seeds of wild type, Gmrwos1 mutant, and overexpression lines OX1/2. Scale bars, 1 cm. F and G) Seed oil content (F) and 100-seed weight (G) in wild type, Gmrwos1 mutant, and overexpression lines OX1/2. Data are shown as mean ± SD (n = 8 or 10). The value of each replicate is represented by a dot. Different letters denote significant differences (P < 0.05) from two-tailed t-test.
Figure 6.
Distribution and diversity analysis of GmRWOS1 alleles in soybean. A) The spectrum of two haplotypes (Hap1 and Hap2) for GmRWOS1 in 421 soybean accessions. The digits in brackets give the number of accessions carrying each haplotype. Vertical lines represent variants, and the four most significant variants are marked in red. B) Differences in oil content, 100-seed weight, and GmRWOS1 expression between individuals with different haplotypes. In each box plot, borders represent the first and third quartiles, the center line denotes the median, and whiskers extend to 1.5 times the interquartile range beyond the quartiles. The statistical significance was calculated by Wilcoxon rank sum test. C) Distribution of four haplotypes in 2,898 previously resequenced accessions and their proportions in wild soybeans, landraces, and cultivars. Inner to outer circles indicate wild soybeans, landraces, and cultivars. D) The π ratio and FST values of the region flanking GmRWOS1 in wild soybeans, landraces, and cultivars. The horizontal dashed lines indicate the genome-wide thresholds of wild soybeans vs landraces and landraces vs cultivars (top 10%). GmRWOS1 is labeled with a red dot.
