Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Jun 16:13:882732.
doi: 10.3389/fpls.2022.882732. eCollection 2022.

Identification of Candidate Genes and Genomic Selection for Seed Protein in Soybean Breeding Pipeline

Affiliations

Identification of Candidate Genes and Genomic Selection for Seed Protein in Soybean Breeding Pipeline

Jun Qin et al. Front Plant Sci. .

Abstract

Soybean is a primary meal protein for human consumption, poultry, and livestock feed. In this study, quantitative trait locus (QTL) controlling protein content was explored via genome-wide association studies (GWAS) and linkage mapping approaches based on 284 soybean accessions and 180 recombinant inbred lines (RILs), respectively, which were evaluated for protein content for 4 years. A total of 22 single nucleotide polymorphisms (SNPs) associated with protein content were detected using mixed linear model (MLM) and general linear model (GLM) methods in Tassel and 5 QTLs using Bayesian interval mapping (IM), single-trait multiple interval mapping (SMIM), single-trait composite interval mapping maximum likelihood estimation (SMLE), and single marker regression (SMR) models in Q-Gene and IciMapping. Major QTLs were detected on chromosomes 6 and 20 in both populations. The new QTL genomic region on chromosome 6 (Chr6_18844283-19315351) included 7 candidate genes and the Hap.X AA at the Chr6_19172961 position was associated with high protein content. Genomic selection (GS) of protein content was performed using Bayesian Lasso (BL) and ridge regression best linear unbiased prediction (rrBULP) based on all the SNPs and the SNPs significantly associated with protein content resulted from GWAS. The results showed that BL and rrBLUP performed similarly; GS accuracy was dependent on the SNP set and training population size. GS efficiency was higher for the SNPs derived from GWAS than random SNPs and reached a plateau when the number of markers was >2,000. The SNP markers identified in this study and other information were essential in establishing an efficient marker-assisted selection (MAS) and GS pipelines for improving soybean protein content.

Keywords: Glycine max; genome-wide association study; genomic selection; genotyping by sequencing; protein content; single nucleotide polymorphism.

PubMed Disclaimer

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
FIGURE 1
(A) QTL mapping of seed protein content in soybean chromosome 6 based on single-trait multiple IM (SMIM) in Qgene, (B) The QTL, qtl-chr6_prot was mapped on the combined map between physical distance and genetic position of the chromosome 6, where the x-axis shows physical distance (Mbp) and the y-axis shows the genetic position (cM).
FIGURE 2
FIGURE 2
Structure analysis: (A) delta K-values for different numbers of populations (K) from the STRUCTURE analysis, the x-axis shows different numbers of populations (K), the y-axis shows delta K-values for different numbers of subpopulations (K). (B) Classification of 284 accessions into four subpopulations using STRUCTURE version 2.3.4, where the x-axis shows accessions and the y-axis shows the probability (from 0 to 1) of each accession belonging to subpopulation (Q = K) membership. The membership of each accession belonging to subpopulations is indicated by different colors (Q1, red; Q2, green; Q3, blue; and Q4, yellow). (C) Principal component analysis (PCA) of the population structure. Distribution of the accessions in the association panel under PC1 and PC2.
FIGURE 3
FIGURE 3
(A) The extent of linkage disequilibrium (LD) in the regions based on pairwise r2 values. The r2 values are indicated using the color intensity index. Heatmap showing LD between each pair of markers that passed the Bonferroni threshold in genome-wide association study (GWAS). (B) Candidate genes for each single nucleotide polymorphism (SNP) locus. The bottom panel depicts the extent of linkage disequilibrium in the regions based on pairwise r2 values. The r2 values are indicated using the color intensity index shown. (C) Boxplot of seed protein based on different genotypes in soybean accessions. (D) Boxplot of seed protein based on Hap.XGG and Hap.XAA phenotypic differences between genotype combinations of the two SNPs.
FIGURE 4
FIGURE 4
Boxplots show the effect of different SNP density sets on genomic selection in the Bayesian Lasso Regression (BLR) model and ridge regression best linear unbiased prediction (rrBLUP) models.
FIGURE 5
FIGURE 5
Boxplots show the effect of training population size on genomic selection accuracy by conducting cross-validation at different folds with 100 replications for each cross-validation fold using rrBLUP.

Similar articles

Cited by

References

    1. Bandillo N., Jarquin D., Song Q., Nelson R. L., Cregan P., Specht J., et al. (2015). A population structure and genome-wide association analysis on the USDA soybean germplasm collection. Plant Genome 8 1–13. 10.3835/plantgenome2015.04.0024 - DOI - PubMed
    1. Bao Y., Vuong T., Meinhardt C., Tiffin P., Denny R., Chen S., et al. (2014). Potential of association mapping and genomic selection to explore PI 88788 derived soybean cyst nematode resistance. Plant Genome 7 2840–2854.
    1. Bradbury P. J., Zhang Z., Kroon D. E., Casstevens T. M., Ramdoss Y., Buckler E. S. (2007). TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23 2633–2635. 10.1093/bioinformatics/btm308 - DOI - PubMed
    1. Brummer E., Graef G., Orf J., Wilcox J., Shoemaker R. (1997). Mapping QTL for seed protein and oil content in eight soybean populations. Crop Sci. 37 370–378.
    1. Chapman A., Pantalone V., Ustun A., Allen F., Landau-Ellis D., Trigiano R., et al. (2003). Quantitative trait loci for agronomic and seed quality traits in an F 2 and F 4: 6 soybean population. Euphytica 129 387–393.

LinkOut - more resources