G3 (Bethesda). 2016 Aug 9;6(8):2365-74.
doi: 10.1534/g3.116.029090.

The Use of Targeted Marker Subsets to Account for Population Structure and Relatedness in Genome-Wide Association Studies of Maize (Zea mays L.)


Angela H Chen et al., G3 (Bethesda).

Abstract

A typical plant genome-wide association study (GWAS) uses a mixed linear model (MLM) that includes a trait as the response variable, a marker as an explanatory variable, and fixed and random effect covariates accounting for population structure and relatedness. Although effective in controlling for false positive signals, this model typically fails to detect signals that are correlated with population structure or are located in genomic regions of high linkage disequilibrium (LD). This failure likely arises because each tested marker is also used to estimate population structure and relatedness. Previous work has demonstrated that it is possible to increase the power of the MLM by estimating relatedness (i.e., kinship) with markers that are not located on the chromosome where the tested marker resides. To quantify how many additional significant signals one can expect using this so-called K_chr model, we reanalyzed Mendelian, polygenic, and complex traits in two maize (Zea mays L.) diversity panels that had previously been assessed using the traditional MLM. We demonstrated that the K_chr model could find more significant associations, especially in high-LD regions. This finding is underscored by our identification of novel genomic signals proximal to the tocochromanol biosynthetic pathway gene ZmVTE1 that are associated with a ratio of tocotrienols. We conclude that the K_chr model can detect more intricate sources of allelic variation underlying agronomically important traits, and should therefore become more widely used for GWAS. To facilitate the implementation of the K_chr model, we provide code written in the R programming language.
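The core of the K_chr idea, building one kinship matrix per chromosome from the markers on all other chromosomes, can be sketched in a few lines. This is a minimal illustration, not the authors' R code. It assumes genotypes coded 0/1/2 in an (inbred lines × markers) array and uses a VanRaden-style centered cross-product as the kinship estimator, which may differ from the estimator used in the study. The function names and toy data are hypothetical.

```python
import numpy as np


def vanraden_kinship(genotypes: np.ndarray) -> np.ndarray:
    """Kinship from an (n_lines x n_markers) array coded 0/1/2 (VanRaden method 1)."""
    freqs = genotypes.mean(axis=0) / 2.0         # frequency of the counted allele per marker
    centered = genotypes - 2.0 * freqs           # subtract each marker's expected dosage
    scale = 2.0 * np.sum(freqs * (1.0 - freqs))  # sum of expected marker variances
    return centered @ centered.T / scale


def kchr_kinships(genotypes: np.ndarray, chromosomes) -> dict:
    """Map each chromosome c to a kinship estimated only from markers NOT on c.

    Markers on chromosome c would be tested with this matrix, so neither the
    tested marker nor the markers linked to it on c enter the kinship term.
    """
    chromosomes = np.asarray(chromosomes)
    return {
        int(c): vanraden_kinship(genotypes[:, chromosomes != c])
        for c in np.unique(chromosomes)
    }


# Hypothetical toy data: 20 inbred lines, 300 markers split over 3 chromosomes.
rng = np.random.default_rng(1)
geno = 2.0 * rng.integers(0, 2, size=(20, 300))  # inbreds are homozygous: 0 or 2
chrom_ids = np.repeat([1, 2, 3], 100)
k_by_chr = kchr_kinships(geno, chrom_ids)
print(sorted(k_by_chr), k_by_chr[1].shape)  # [1, 2, 3] (20, 20)
```

Each matrix in `k_by_chr` would then serve as the random-effect kinship when testing markers on the corresponding chromosome. Fitting the mixed model itself is left to a dedicated GWAS package.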

Keywords: GWAS; linkage disequilibrium; maize; marker subsets; mixed model.


Figures

Figure 1
Manhattan plots of all SNPs in novel genomic regions that are significantly associated with carotenoid (A) and tocochromanol (B) traits at 10% FDR under the K_chr model. A SNP is in a novel genomic region if no SNP within ± 250 kb is significantly associated with the same trait at 10% FDR under the traditional unified mixed linear model. (A) The X-axis shows the B73 RefGen_v2 position along the maize genome, and the Y-axis shows the −log10(P-value) of each significant SNP at 10% FDR located in a novel genomic region. The blue dots represent novel genomic signals for β-xanthophylls/α-xanthophylls, the light orange dot represents such a signal for α-carotene/zeinoxanthin, and the dark orange dots represent such signals for zeinoxanthin/lutein. The minor allele frequencies of the SNPs depicted range from 0.09 to 0.45. (B) The X- and Y-axes are as described in (A). The blue dot represents a novel genomic signal for γ-tocopherol/(γ-tocopherol + α-tocopherol), the light orange dots represent such signals for δ-tocotrienol/(γ-tocotrienol + α-tocotrienol), the dark orange dots represent such signals for δ-tocotrienol/γ-tocotrienol, and the purple dots represent such signals for α-tocopherol/γ-tocopherol. The minor allele frequencies of the SNPs depicted range from 0.08 to 0.48. Dotted gray arrows mark the approximate B73 RefGen_v2 positions of relevant biosynthetic pathway genes. FDR, false discovery rate; SNP, single nucleotide polymorphism.
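The "novel genomic region" rule in this caption (applied at 5% FDR in Figure 2) can be expressed as a short filter. This is a hedged sketch under stated assumptions. It assumes significance is declared with the Benjamini-Hochberg step-up procedure, since the captions state only the FDR level. It also assumes that each SNP is identified by chromosome and base-pair position, and that per-SNP P-values from both the K_chr model and the traditional MLM are already available for a single trait. The function names and example values are hypothetical.

```python
import numpy as np


def bh_significant(pvalues, fdr):
    """Benjamini-Hochberg step-up: boolean mask of P-values significant at `fdr`."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                         # ranks P-values from smallest to largest
    thresholds = fdr * np.arange(1, m + 1) / m    # (k / m) * q for rank k
    passed = p[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()           # largest rank meeting its threshold
        significant[order[: k + 1]] = True        # reject every hypothesis up to that rank
    return significant


def novel_hits(chrom, pos, p_kchr, p_mlm, fdr=0.10, window=250_000):
    """Indices of K_chr hits with no MLM hit for the same trait within +/- window bp."""
    chrom, pos = np.asarray(chrom), np.asarray(pos)
    sig_kchr = bh_significant(p_kchr, fdr)
    sig_mlm = bh_significant(p_mlm, fdr)
    novel = []
    for i in np.nonzero(sig_kchr)[0]:
        # SNPs on the same chromosome within the window around SNP i (including i).
        near = (chrom == chrom[i]) & (np.abs(pos - pos[i]) <= window)
        if not np.any(near & sig_mlm):
            novel.append(int(i))
    return novel


# Hypothetical single-trait example: five SNPs with made-up P-values.
chrom = [1, 1, 1, 2, 2]
pos = [1_000_000, 1_200_000, 5_000_000, 300_000, 900_000]
p_kchr = [1e-6, 2e-5, 3e-5, 1e-5, 0.8]
p_mlm = [0.5, 1e-6, 0.6, 0.7, 0.9]
print(novel_hits(chrom, pos, p_kchr, p_mlm, fdr=0.10))  # [2, 3]
```

SNPs 0 and 1 are excluded because the MLM hit at SNP 1 lies within 250 kb of both. Passing `fdr=0.05` reproduces the threshold stated for the North Central Regional Plant Introduction Station panel.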
Figure 2
Manhattan plot of all SNPs in novel genomic regions that are significantly associated with the traits evaluated in the North Central Regional Plant Introduction Station panel at 5% FDR under the K_chr model. A SNP is in a novel genomic region if no SNP within ± 250 kb is significantly associated with the same trait at 5% FDR under the traditional unified mixed linear model. The X-axis shows the B73 RefGen_v2 position along the maize genome, and the Y-axis shows the −log10(P-value) of each significant SNP at 5% FDR located in a novel genomic region. The blue dots represent novel genomic signals for sweet vs. starchy corn, the light orange dots represent such signals for days to silking, the red dots represent such signals for days to anthesis, the black dots represent such signals for plant height, and the purple dots represent such signals for ear height. The minor allele frequencies of the SNPs depicted range from 0.05 to 0.50. Dotted gray arrows mark the approximate B73 RefGen_v2 positions of relevant candidate genes and regulatory elements. FDR, false discovery rate; SNP, single nucleotide polymorphism.
Figure 3
Distribution of P-values obtained from the K_chr model and the traditional unified mixed linear model (MLM) at six genomic regions, each of which contains at least one candidate gene. Each graph compares the distribution of P-values from the K_chr model (red box plot, left) with that from the traditional unified MLM (blue box plot, right). The Y-axis shows −log10(P-value). (A) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 5 region surrounding ZmVTE1 were tested for association with δ-tocotrienol/γ-tocotrienol. (B) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 1 region surrounding lut1 were tested for association with zeinoxanthin. (C) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 4 region surrounding Su1 were tested for association with sweet vs. starchy corn. (D) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 5 region surrounding ZmVTE4 were tested for association with α-tocopherol. (E) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 2 region surrounding zep1 were tested for association with β-xanthophylls/α-xanthophylls. (F) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 8 region surrounding ZCN8 and ZmRap2.7 were tested for association with days to silking. For the regions with high local linkage disequilibrium (LD; i.e., those presented in A, B, and C), the P-values from the K_chr model are noticeably lower than those from the traditional unified MLM. The same trend is observed for the two regions analyzed using data from the powerful North Central Regional Plant Introduction Station panel (presented in C and F). Finally, in regions of lower LD analyzed using data from the smaller Goodman diversity panel (presented in D and E), the P-value distributions from the two models are more similar.
