Identification of Candidate Genes and Genomic Selection for Seed Protein in Soybean Breeding Pipeline

Affiliations

¹ National Soybean Improvement Center Shijiazhuang Sub-Center, North China Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture, Hebei Laboratory of Crop Genetics and Breeding, Cereal & Oil Crop Institute, Hebei Academy of Agricultural and Forestry Sciences, Shijiazhuang, China.
² Department of Horticulture, University of Arkansas, Fayetteville, AR, United States.
³ Soybean Genomics and Improvement Lab, United States Department of Agriculture - Agricultural Research Service (USDA-ARS), Beltsville, MD, United States.
⁴ Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, United States.

PMID: 35783963
PMCID: PMC9244705
DOI: 10.3389/fpls.2022.882732

Jun Qin et al. Front Plant Sci. 2022.

Front Plant Sci. 2022 Jun 16;13:882732.

doi: 10.3389/fpls.2022.882732. eCollection 2022.

Abstract

Soybean is a primary source of meal protein for human consumption and for poultry and livestock feed. In this study, quantitative trait loci (QTLs) controlling protein content were explored via genome-wide association studies (GWAS) and linkage mapping, based on 284 soybean accessions and 180 recombinant inbred lines (RILs), respectively, which were evaluated for protein content for 4 years. A total of 22 single nucleotide polymorphisms (SNPs) associated with protein content were detected using mixed linear model (MLM) and general linear model (GLM) methods in TASSEL, and 5 QTLs were detected using Bayesian interval mapping (IM), single-trait multiple interval mapping (SMIM), single-trait composite interval mapping maximum likelihood estimation (SMLE), and single marker regression (SMR) models in Q-Gene and IciMapping. Major QTLs were detected on chromosomes 6 and 20 in both populations. The new QTL genomic region on chromosome 6 (Chr6_18844283-19315351) included 7 candidate genes, and Hap.X^AA at position Chr6_19172961 was associated with high protein content. Genomic selection (GS) of protein content was performed using Bayesian Lasso (BL) and ridge regression best linear unbiased prediction (rrBLUP), based on all the SNPs and on the SNPs significantly associated with protein content in the GWAS. The results showed that BL and rrBLUP performed similarly, and that GS accuracy depended on the SNP set and the training population size. GS efficiency was higher for the GWAS-derived SNPs than for random SNPs and reached a plateau when the number of markers was >2,000. The SNP markers identified in this study, together with the other information reported, are essential for establishing efficient marker-assisted selection (MAS) and GS pipelines for improving soybean protein content.
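The genomic selection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the genotypes and phenotypes are simulated, the function names (`simulate_panel`, `ridge_predict`, `cv_accuracy`) are hypothetical, and the shrinkage parameter `lam` is fixed by hand, whereas rrBLUP software typically estimates it from variance components. Accuracy is taken here as the Pearson correlation between predicted and observed values in held-out folds, which is an assumption about the metric.

```python
import numpy as np


def simulate_panel(n_lines=284, n_snps=500, n_causal=20, h2=0.6, seed=0):
    """Simulate SNP genotypes and a protein-like trait (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Genotypes coded -1/0/1 (homozygous ref, heterozygous, homozygous alt).
    X = rng.integers(0, 3, size=(n_lines, n_snps)).astype(float) - 1.0
    effects = np.zeros(n_snps)
    causal = rng.choice(n_snps, size=n_causal, replace=False)
    effects[causal] = rng.normal(0.0, 1.0, n_causal)
    g = X @ effects
    # Scale the residual variance so that heritability is roughly h2.
    noise_var = g.var() * (1.0 - h2) / h2
    y = g + rng.normal(0.0, np.sqrt(noise_var), n_lines)
    return X, y


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge-regression marker effects, solved in the n x n dual form."""
    mu = y_train.mean()
    x_mean = X_train.mean(axis=0)
    Xc = X_train - x_mean
    # beta = Xc' (Xc Xc' + lam I)^-1 (y - mu), equivalent to the primal
    # solution (Xc' Xc + lam I)^-1 Xc' (y - mu) but cheaper when p >> n.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    beta = Xc.T @ alpha
    return mu + (X_test - x_mean) @ beta


def cv_accuracy(X, y, k=5, lam=100.0, seed=0):
    """Mean correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = ridge_predict(X[train], y[train], X[test], lam)
        accs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accs))


if __name__ == "__main__":
    X, y = simulate_panel()
    # Fewer folds means a smaller training set per fold (k=2 trains on 1/2,
    # k=10 trains on 9/10 of the lines).
    for k in (2, 5, 10):
        print(k, round(cv_accuracy(X, y, k=k), 3))
```

Varying `k` is one simple way to mimic different training population sizes, and passing a column subset of `X` (for example, only markers flagged in an association scan) mimics different SNP sets.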

Keywords: Glycine max; genome-wide association study; genomic selection; genotyping by sequencing; protein content; single nucleotide polymorphism.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
**(A)** QTL mapping of seed protein content on soybean chromosome 6 based on single-trait multiple IM (SMIM) in Qgene. **(B)** The QTL qtl-chr6_prot mapped on the combined map of physical distance and genetic position for chromosome 6, where the x-axis shows physical distance (Mbp) and the y-axis shows the genetic position (cM).

**FIGURE 2**
Structure analysis: **(A)** delta K-values for different numbers of populations (K) from the STRUCTURE analysis, the x-axis shows different numbers of populations (K), the y-axis shows delta K-values for different numbers of subpopulations (K). **(B)** Classification of 284 accessions into four subpopulations using STRUCTURE version 2.3.4, where the x-axis shows accessions and the y-axis shows the probability (from 0 to 1) of each accession belonging to subpopulation (Q = K) membership. The membership of each accession belonging to subpopulations is indicated by different colors (Q1, red; Q2, green; Q3, blue; and Q4, yellow). **(C)** Principal component analysis (PCA) of the population structure. Distribution of the accessions in the association panel under PC1 and PC2.

**FIGURE 3**
**(A)** The extent of linkage disequilibrium (LD) in the regions based on pairwise r² values. The r² values are indicated using the color intensity index. Heatmap showing LD between each pair of markers that passed the Bonferroni threshold in genome-wide association study (GWAS). **(B)** Candidate genes for each single nucleotide polymorphism (SNP) locus. The bottom panel depicts the extent of linkage disequilibrium in the regions based on pairwise r² values. The r² values are indicated using the color intensity index shown. **(C)** Boxplot of seed protein based on different genotypes in soybean accessions. **(D)** Boxplot of seed protein based on Hap.X^GG and Hap.X^AA phenotypic differences between genotype combinations of the two SNPs.

**FIGURE 4**
Boxplots show the effect of different SNP density sets on genomic selection in the Bayesian Lasso Regression (BLR) and ridge regression best linear unbiased prediction (rrBLUP) models.

**FIGURE 5**
Boxplots show the effect of training population size on genomic selection accuracy by conducting cross-validation at different folds with 100 replications for each cross-validation fold using rrBLUP.

Cited by

Genetic mapping and functional genomics of soybean seed protein.
Liu S, Liu Z, Hou X, Li X. Mol Breed. 2023 Apr 12;43(4):29. doi: 10.1007/s11032-023-01373-5. eCollection 2023 Apr. PMID: 37313523 Free PMC article.
Discovery of genomic regions associated with grain yield and agronomic traits in Bi-parental populations of maize (Zea mays. L) Under optimum and low nitrogen conditions.
Kimutai C, Ndlovu N, Chaikam V, Ertiro BT, Das B, Beyene Y, Kiplagat O, Spillane C, Prasanna BM, Gowda M. Front Genet. 2023 Oct 26;14:1266402. doi: 10.3389/fgene.2023.1266402. eCollection 2023. PMID: 37964777 Free PMC article.
Soybean genetic resources contributing to sustainable protein production.
Guo B, Sun L, Jiang S, Ren H, Sun R, Wei Z, Hong H, Luan X, Wang J, Wang X, Xu D, Li W, Guo C, Qiu LJ. Theor Appl Genet. 2022 Nov;135(11):4095-4121. doi: 10.1007/s00122-022-04222-9. Epub 2022 Oct 14. PMID: 36239765 Free PMC article. Review.
Multiple-statistical genome-wide association analysis and genomic prediction of fruit aroma and agronomic traits in peaches.
Li X, Wang J, Su M, Zhang M, Hu Y, Du J, Zhou H, Yang X, Zhang X, Jia H, Gao Z, Ye Z. Hortic Res. 2023 May 31;10(7):uhad117. doi: 10.1093/hr/uhad117. eCollection 2023 Jul. PMID: 37577398 Free PMC article.
Genomic and phenomic prediction for soybean seed yield, protein, and oil.
Van der Laan L, Parmley K, Saadati M, Pacin HT, Panthulugiri S, Sarkar S, Ganapathysubramanian B, Lorenz A, Singh AK. Plant Genome. 2025 Mar;18(1):e70002. doi: 10.1002/tpg2.70002. PMID: 39972529 Free PMC article.
