G3 (Bethesda). 2016 Aug 9;6(8):2365-74.

doi: 10.1534/g3.116.029090.

The Use of Targeted Marker Subsets to Account for Population Structure and Relatedness in Genome-Wide Association Studies of Maize (Zea mays L.)

Angela H Chen¹, Alexander E Lipka²

Affiliations

¹ Department of Statistics, University of Illinois at Urbana-Champaign, Illinois 61801.
² Department of Crop Sciences, University of Illinois at Urbana-Champaign, Illinois 61801 alipka@illinois.edu.

PMID: 27233668
PMCID: PMC4978891
DOI: 10.1534/g3.116.029090

Angela H Chen et al. G3 (Bethesda). 2016.

Abstract

A typical plant genome-wide association study (GWAS) uses a mixed linear model (MLM) that includes a trait as the response variable, a marker as an explanatory variable, and fixed and random effect covariates accounting for population structure and relatedness. Although effective in controlling for false positive signals, this model typically fails to detect signals that are correlated with population structure or are located in high linkage disequilibrium (LD) genomic regions. This result likely arises from each tested marker being used to estimate population structure and relatedness. Previous work has demonstrated that it is possible to increase the power of the MLM by estimating relatedness (i.e., kinship) with markers that are not located on the chromosome where the tested marker resides. To quantify the number of additional significant signals one can expect using this so-called K_chr model, we reanalyzed Mendelian, polygenic, and complex traits in two maize (Zea mays L.) diversity panels that have been previously assessed using the traditional MLM. We demonstrated that the K_chr model could find more significant associations, especially in high LD regions. This finding is underscored by our identification of novel genomic signals proximal to the tocochromanol biosynthetic pathway gene ZmVTE1 that are associated with a ratio of tocotrienols. We conclude that the K_chr model can detect more intricate sources of allelic variation underlying agronomically important traits, and should therefore become more widely used for GWAS. To facilitate the implementation of the K_chr model, we provide code written in the R programming language.
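
The core idea of the K_chr model, estimating kinship only from markers that are not on the tested marker's chromosome, can be illustrated with a minimal Python sketch. This is not the authors' R code. The function names, the toy genotypes, and the kinship estimator (a simple centered cross-product averaged over markers) are assumptions made for illustration; the abstract does not state which estimator was used.

```python
from collections import defaultdict


def centered_kinship(genotypes, marker_idx):
    """Average centered cross-product over the selected markers.

    Assumption: a simple illustrative estimator, not necessarily the one
    used in the paper. `genotypes` is a list of rows (one per individual),
    each row a list of marker scores coded 0/1/2.
    """
    if not marker_idx:
        raise ValueError("no markers available to estimate kinship")
    n = len(genotypes)
    kin = [[0.0] * n for _ in range(n)]
    for j in marker_idx:
        mean_j = sum(row[j] for row in genotypes) / n
        centered = [row[j] - mean_j for row in genotypes]
        for a in range(n):
            for b in range(n):
                kin[a][b] += centered[a] * centered[b]
    m = len(marker_idx)
    return [[value / m for value in row] for row in kin]


def kinship_per_chromosome(genotypes, marker_chroms):
    """Map each chromosome to a kinship matrix built from all OTHER chromosomes."""
    by_chrom = defaultdict(list)
    for j, chrom in enumerate(marker_chroms):
        by_chrom[chrom].append(j)
    result = {}
    for chrom in by_chrom:
        # Leave out every marker located on the chromosome being tested.
        others = [j for c, idx in by_chrom.items() if c != chrom for j in idx]
        result[chrom] = centered_kinship(genotypes, others)
    return result


# Hypothetical toy data: 3 individuals, 4 markers on 2 chromosomes.
genotypes = [
    [0, 2, 1, 0],
    [2, 0, 1, 2],
    [1, 1, 0, 2],
]
marker_chroms = [1, 1, 2, 2]
kinships = kinship_per_chromosome(genotypes, marker_chroms)
```

When a marker on chromosome 1 is tested, the mixed model would use `kinships[1]`, which in this toy example is computed from the two chromosome 2 markers only.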

Keywords: GWAS; linkage disequilibrium; maize; marker subsets; mixed model.

Figures

**Figure 1**
Manhattan plots depicting all SNPs significantly associated with carotenoid (A) and tocochromanol (B) traits at 10% FDR using the K_chr model located in novel genomic regions. Such a SNP is in a novel genomic region if there are no SNPs within ± 250 kb significantly associated with that same trait at 10% FDR when using the traditional unified mixed linear model. (A) The X-axis depicts the B73 RefGen_v2 position along the maize genome and the Y-axis shows the −log(10) P-values for each significant SNP at 10% FDR located in a novel genomic region. The blue dots represent novel genomic signals for β-xanthophylls/α-xanthophylls, the light orange dot represents such a signal for α-carotene/zeinoxanthin, and the dark orange dots represent such genomic signals for zeinoxanthin/lutein. The minor allele frequencies of the SNPs depicted in the figure range from 0.09–0.45. (B) The X- and Y-axes are as described in (A). The blue dot represents novel genomic signals for γ-tocopherol/(γ-tocopherol + α-tocopherol), the light orange dots represent such signals for δ-tocotrienol/(γ-tocotrienol + α-tocotrienol), the dark orange dots represent such signals for δ-tocotrienol/γ-tocotrienol, and the purple dots represent such signals for α-tocopherol/γ-tocopherol. The minor allele frequencies of the SNPs depicted in the figure range from 0.08–0.48. The approximate B73 RefGen_v2 positions of relevant biosynthetic pathway genes are depicted by dotted gray arrows. FDR, false discovery rate; SNP, single nucleotide polymorphism.
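
The "novel genomic region" rule in this caption can be sketched as a short filter. The following Python example is a hedged illustration under stated assumptions: the function name, the `(chromosome, position)` tuple layout, the toy positions, and the choice to treat the ± 250 kb boundary as inclusive are all hypothetical and not taken from the source. Both hit lists are assumed to refer to the same trait and the same FDR threshold.

```python
WINDOW_BP = 250_000  # the +/- 250 kb window stated in the caption


def novel_signals(kchr_hits, mlm_hits, window_bp=WINDOW_BP):
    """Return K_chr hits with no traditional-MLM hit within the window.

    Each hit is a (chromosome, position_bp) tuple for ONE trait.
    Assumption: the window boundary is inclusive.
    """
    mlm_by_chrom = {}
    for chrom, pos in mlm_hits:
        mlm_by_chrom.setdefault(chrom, []).append(pos)
    novel = []
    for chrom, pos in kchr_hits:
        nearby = any(
            abs(pos - other) <= window_bp
            for other in mlm_by_chrom.get(chrom, [])
        )
        if not nearby:
            novel.append((chrom, pos))
    return novel


# Hypothetical positions for illustration only.
kchr_hits = [(5, 1_000_000), (5, 2_000_000), (8, 500_000)]
mlm_hits = [(5, 1_100_000), (4, 500_000)]
novel = novel_signals(kchr_hits, mlm_hits)
```

In this toy input, the first chromosome 5 hit lies 100 kb from an MLM hit and is therefore not novel, while the other two hits have no MLM hit within the window on their chromosome.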

**Figure 2**
Manhattan plot depicting all SNPs significantly associated with the traits evaluated in the North Central Regional Plant Introduction Station panel at 5% FDR using the K_chr model located in novel genomic regions. Such a SNP is in a novel genomic region if there are no SNPs within ± 250 kb significantly associated with that same trait at 5% FDR when using the traditional unified mixed linear model. The X-axis depicts the B73 RefGen_v2 position along the maize genome and the Y-axis shows the −log(10) P-values for each significant SNP at 5% FDR located in a novel genomic region. The blue dots represent novel genomic signals for sweet *vs.* starchy corn, the light orange dots represent such signals for days to silking, the red dots represent such signals for days to anthesis, the black dots represent such signals for plant height, and the purple dots represent such signals for ear height. The minor allele frequencies of the SNPs depicted in the figure range from 0.05–0.50. The approximate B73 RefGen_v2 positions of relevant candidate genes and regulatory elements are depicted by dotted gray arrows. FDR, false discovery rate; SNP, single nucleotide polymorphism.
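
Declaring SNPs significant "at 5% FDR" requires a multiple-testing procedure. The sketch below assumes the Benjamini-Hochberg step-up procedure, which is a common choice for FDR control but is an assumption here, since the caption does not name the procedure. The function name and the P-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking P-values significant at the given FDR.

    Assumption: Benjamini-Hochberg step-up procedure. The largest rank k
    with p_(k) <= (k / m) * fdr is found, and all P-values of rank <= k
    are declared significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for i in order[:cutoff_rank]:
        significant[i] = True
    return significant


# Hypothetical P-values for four SNPs.
p_values = [0.02, 0.024, 0.03, 0.9]
flags = benjamini_hochberg(p_values, fdr=0.05)
```

The step-up behavior matters here: the smallest P-value (0.02) exceeds its own rank threshold of 0.0125, yet it is still declared significant because the third-ranked P-value (0.03) falls below its threshold of 0.0375.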

**Figure 3**
Distribution of P-values obtained from the K_chr model and traditional unified mixed linear model (MLM) at six specific genomic regions, each of which contains at least one candidate gene. Each graph compares the distribution of P-values from the K_chr model (red box plot, left) to those from the traditional unified MLM (blue box plot, right). The −log(10) P-values are presented on the Y-axis. (A) Distribution of P-values from the K_chr model and MLM when markers within the chromosome 5 region surrounding *ZmVTE1* were tested for association with δ-tocotrienol/γ-tocotrienol. (B) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 1 region surrounding *lut1* were tested for association with zeinoxanthin. (C) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 4 region surrounding *Su1* were tested for associations with sweet *vs.* starchy corn. (D) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 5 region surrounding *ZmVTE4* were tested for associations with α-tocopherol. (E) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 2 region surrounding *zep1* were tested for associations with β-xanthophylls/α-xanthophylls. (F) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 8 region surrounding *ZCN8* and *ZmRap2.7* were tested for associations with days to silking. For the regions with high local linkage disequilibrium (LD; *i.e.*, those presented in A, B, and C), the distribution of P-values from the K_chr model is noticeably lower than that from the traditional unified MLM. The same trend is observed for the two regions analyzed using data from the powerful North Central Regional Plant Introduction Station panel (presented in C and F). Finally, the distributions of P-values from the two models are more similar in regions of lower LD (presented in D and E) analyzed using data from the smaller Goodman diversity panel.

Cited by

An assessment of true and false positive detection rates of stepwise epistatic model selection as a function of sample size and number of markers.
Chen AH, Ge W, Metcalf W, Jakobsson E, Mainzer LS, Lipka AE. Heredity (Edinb). 2019 May;122(5):660-671. doi: 10.1038/s41437-018-0162-2. Epub 2018 Nov 15. PMID: 30443009 Free PMC article.
The utility of metabolomics as a tool to inform maize biology.
Medeiros DB, Brotman Y, Fernie AR. Plant Commun. 2021 Apr 21;2(4):100187. doi: 10.1016/j.xplc.2021.100187. eCollection 2021 Jul 12. PMID: 34327322 Free PMC article. Review.
Linking anthocyanin diversity, hue, and genetics in purple corn.
Chatham LA, Juvik JA. G3 (Bethesda). 2021 Feb 9;11(2):jkaa062. doi: 10.1093/g3journal/jkaa062. PMID: 33585872 Free PMC article.
An assessment of the performance of the logistic mixed model for analyzing binary traits in maize and sorghum diversity panels.
Shenstone E, Cooper J, Rice B, Bohn M, Jamann TM, Lipka AE. PLoS One. 2018 Nov 21;13(11):e0207752. doi: 10.1371/journal.pone.0207752. eCollection 2018. PMID: 30462727 Free PMC article.
COMPILE: a GWAS computational pipeline for gene discovery in complex genomes.
Hill MJ, Penning BW, McCann MC, Carpita NC. BMC Plant Biol. 2022 Jul 2;22(1):315. doi: 10.1186/s12870-022-03668-9. PMID: 35778686 Free PMC article.

