Am J Hum Genet. 2016 Apr 7;98(4):653-66.
doi: 10.1016/j.ajhg.2016.02.012. Epub 2016 Mar 24.

Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models


Han Chen et al. Am J Hum Genet. 2016.

Abstract

Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM's constant residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs.
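The computational point in the abstract is that the null model (no variant effect) is fitted once, and each variant is then tested with a score statistic that needs only the null fitted values. The pure-Python sketch below shows that structure for the simplest possible case: a logistic null model with an intercept only and no random effect. In that case the fitted mean is the case proportion μ̂, and the score test for one SNP uses U = Σ g_i(y_i − μ̂) with variance μ̂(1 − μ̂) Σ (g_i − ḡ)², compared against a chi-square distribution with 1 degree of freedom. GMMAT itself also adjusts for covariates and adds a random effect for population structure and relatedness; that part of the model is not reproduced here, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def fit_null_intercept_only(y: Sequence[int]) -> float:
    """Fit logit(mu) = b0 under the null; its MLE gives mu = mean(y)."""
    return sum(y) / len(y)


def score_test(g: Sequence[float], y: Sequence[int], mu: float) -> float:
    """Score-test p value for one SNP, reusing the null fitted mean mu.

    Simplified sketch: no covariates and no random effect (unlike GMMAT).
    """
    n = len(y)
    g_bar = sum(g) / n
    # Score for the SNP effect evaluated at beta_G = 0.
    u = sum(gi * (yi - mu) for gi, yi in zip(g, y))
    # Variance of the score after projecting out the intercept.
    v = mu * (1.0 - mu) * sum((gi - g_bar) ** 2 for gi in g)
    if v == 0.0:
        return 1.0  # monomorphic SNP or all-case/all-control sample: no information
    t = u * u / v
    # Upper tail of chi-square(1): P(X > t) = erfc(sqrt(t / 2)).
    return math.erfc(math.sqrt(t / 2.0))


def scan(genotypes: List[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Fit the null model once, then score-test every SNP."""
    mu = fit_null_intercept_only(y)
    return [score_test(g, y, mu) for g in genotypes]


# Toy data: 4 individuals, 3 SNPs coded as 0/1/2 allele counts.
y = [1, 1, 0, 0]
snps = [[2, 2, 0, 0], [1, 1, 1, 1], [2, 0, 2, 0]]
print(scan(snps, y))  # the first SNP tracks case status; the other two carry no signal
```

Because the null fit does not depend on the variant, its cost is paid once per GWAS, and each additional SNP costs only a pass over the genotypes.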


Figures

Figure 1
Quantile-Quantile Plot of Association Test p Values from the Asthma GWAS Analysis in HCHS/SOL. (A) All SNPs. (B) Category 1: SNPs with a ratio of expected variances in Puerto Ricans over non-Puerto Ricans less than 0.8. (C) Category 2: SNPs with this ratio between 0.8 and 1.25. (D) Category 3: SNPs with this ratio greater than 1.25. Abbreviations: LMM, a joint LMM analysis of the combined samples; LMM meta, an inverse-variance-weighted fixed-effects meta-analysis combining LMM results from separate analyses of Puerto Ricans and non-Puerto Ricans.
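The caption sorts SNPs into three categories by a ratio of expected variances between the high-risk group (here, Puerto Ricans) and the rest, using cutoffs of 0.8 and 1.25. The sketch below assumes, since the caption does not define it, that "expected variance" means the Hardy-Weinberg variance 2p(1 − p) of an additive 0/1/2 genotype with allele frequency p in each group. It also assumes the middle category includes both endpoints. The function names are hypothetical.

```python
def expected_genotype_variance(p: float) -> float:
    """Assumed definition: Hardy-Weinberg variance 2p(1 - p) of a 0/1/2 genotype."""
    return 2.0 * p * (1.0 - p)


def variance_ratio_category(freq_high_risk: float, freq_low_risk: float,
                            lower: float = 0.8, upper: float = 1.25) -> int:
    """Return category 1, 2 or 3 using the caption's cutoffs.

    The ratio is high-risk-group variance over low-risk-group variance;
    treating 0.8 and 1.25 as belonging to category 2 is an assumption.
    """
    v_low = expected_genotype_variance(freq_low_risk)
    if v_low == 0.0:
        raise ValueError("SNP is monomorphic in the low-risk group")
    ratio = expected_genotype_variance(freq_high_risk) / v_low
    if ratio < lower:
        return 1
    if ratio > upper:
        return 3
    return 2


# Hypothetical allele frequencies (high-risk group, low-risk group).
for freqs in [(0.1, 0.3), (0.3, 0.3), (0.5, 0.1)]:
    print(freqs, variance_ratio_category(*freqs))
```

Splitting SNPs this way separates variants whose genotype variance is concentrated in the high-variance (high-prevalence) group from those where it is concentrated in the low-variance group, which is where a constant-residual-variance model would be expected to err in opposite directions.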
Figure 2
True Mean-Variance Relationship for a Binary Trait and the Constant Mean-Variance Relationship Assumed by Linear Models, Illustrated by the Example from the Asthma Data in HCHS/SOL. For a binary trait with mean π, the variance is π(1 − π), which changes with the mean. Logistic regression accounts for this heteroscedasticity; linear models instead assume that the variance of the trait is constant regardless of the mean (homoscedasticity). For example, the variance of the binary trait (asthma status) in Puerto Ricans is considerably larger than the variances in the other five populations, because Puerto Ricans have a much higher proportion of asthma cases than the other populations. This heteroscedasticity, caused by population stratification, makes the p values calculated from LMMs likely to be incorrect, whereas logistic mixed models fitted with GMMAT account for it properly.
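A small numerical illustration of the caption's point, using hypothetical prevalences rather than the HCHS/SOL estimates. Each group's binary-trait variance is π(1 − π). A linear model with group means and a single residual variance would, by maximum likelihood, estimate that variance as the size-weighted average of the group variances. That one number overstates the variance in the low-prevalence group and understates it in the high-prevalence group. The function names are hypothetical.

```python
from typing import Sequence


def bernoulli_variance(pi: float) -> float:
    """Variance of a 0/1 trait whose mean (prevalence) is pi."""
    return pi * (1.0 - pi)


def pooled_residual_variance(prevalences: Sequence[float],
                             sizes: Sequence[int]) -> float:
    """ML residual variance of a constant-variance linear model that fits
    each group's mean exactly: the size-weighted average of pi_k(1 - pi_k)."""
    total = sum(sizes)
    return sum(n * bernoulli_variance(p)
               for p, n in zip(prevalences, sizes)) / total


high, low = 0.20, 0.05  # hypothetical prevalences, not values from the paper
print(bernoulli_variance(high) / bernoulli_variance(low))   # roughly 3.4
print(pooled_residual_variance([high, low], [100, 100]))    # one shared value
```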
Figure 3
A Simulated Cohort Study with 10,000 Related Individuals. Quantile-quantile plots of association test p values under the null hypothesis of no genetic association; p values from 3,200 simulation replicates, each with 625,583 common SNPs, were pooled to obtain more than 2 billion null p values. (A) All SNPs. (B) Category 1: SNPs with a ratio of expected variances in population 1 (high risk) over population 2 (low risk) less than 0.8. (C) Category 2: SNPs with this ratio between 0.8 and 1.25. (D) Category 3: SNPs with this ratio greater than 1.25.
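Figures 1 and 3 are quantile-quantile plots of p values. As a check on the caption's count, 3,200 replicates × 625,583 SNPs = 2,001,865,600 null p values. The sketch below shows one common way to compute QQ-plot coordinates: sort the observed p values and pair each with the matching quantile of the Uniform(0, 1) distribution expected under the null, both on the −log10 scale. The plotting position (i − 0.5)/n is an assumed convention (i/(n + 1) is another); the paper does not state which it used, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def qq_points(pvalues: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (expected, observed) pairs on the -log10 scale for a QQ plot.

    Expected quantiles use the plotting position (i - 0.5) / n (an assumed
    convention); pairs are ordered from the smallest observed p value up.
    """
    n = len(pvalues)
    points = []
    for i, p in enumerate(sorted(pvalues), start=1):
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p value out of range: {p}")
        expected = (i - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# Under the null, points should scatter around the identity line y = x;
# systematic departures indicate inflation (above) or deflation (below).
for x, y in qq_points([0.5, 0.25, 0.9, 0.01]):
    print(f"expected {x:.3f}  observed {y:.3f}")
```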

