Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models

Affiliations

¹ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
² Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore.
³ Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
⁴ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Mathematics, Tsinghua University, Beijing 100084, P. R. China.
⁵ Division of Pediatric Pulmonary Medicine, Allergy and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA 15224, USA.
⁶ Division of Sleep and Circadian Disorders, Departments of Medicine and Neurology, Brigham and Women's Hospital, Boston, MA 02115, USA.
⁷ Prevention and Population Sciences Program, Division of Cardiovascular Sciences, National Heart, Lung, and Blood Institute, NIH, Bethesda, MD 20892, USA.
⁸ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA. Electronic address: xlin@hsph.harvard.edu.

PMID: 27018471
PMCID: PMC4833218
DOI: 10.1016/j.ajhg.2016.02.012

Han Chen et al. Am J Hum Genet. 2016.

Am J Hum Genet. 2016 Apr 7;98(4):653-66.

doi: 10.1016/j.ajhg.2016.02.012. Epub 2016 Mar 24.

Abstract

Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM's constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs.
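The fit-once, test-many structure described above can be illustrated with a deliberately simplified sketch. It is not the GMMAT implementation: it assumes an intercept-only logistic null model with no covariates and no random effects, so it does not account for relatedness or population structure. The function names and the toy phenotype and genotype values are hypothetical.

```python
import math


def fit_null_intercept_only(y):
    """Fit the null model once.

    Assumption: intercept-only logistic model, so the fitted mean for
    every individual is simply the observed case fraction.
    """
    return sum(y) / len(y)


def score_test(g, y, mu):
    """Score test for one variant, reusing the null fit `mu`.

    Returns the 1-df chi-square statistic and its p value.
    """
    n = len(y)
    g_bar = sum(g) / n
    # Score: genotype-weighted sum of null-model residuals.
    score = sum(gi * (yi - mu) for gi, yi in zip(g, y))
    # Variance of the score after adjusting the genotype for the intercept.
    var = mu * (1.0 - mu) * sum((gi - g_bar) ** 2 for gi in g)
    stat = score ** 2 / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical toy data: 1 = case, 0 = control; genotypes coded 0/1/2.
y = [1, 1, 0, 0]
variants = {"variant_a": [2, 1, 1, 0], "variant_b": [1, 0, 1, 0]}

mu = fit_null_intercept_only(y)  # fitted once
for name, g in variants.items():  # reused for every variant
    stat, p = score_test(g, y, mu)
    print(name, round(stat, 3), round(p, 4))
```

The null model is fitted a single time and each variant then needs only sums over the residuals, which is what makes a score test attractive at genome-wide scale.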

Figures

**Figure 1**
Quantile-Quantile Plot of Association Test p Values from the Asthma GWAS Analysis in HCHS/SOL. (A) All SNPs. (B) Category 1: SNPs with the ratio of expected variances in Puerto Ricans over non-Puerto Ricans less than 0.8. (C) Category 2: SNPs with the ratio of expected variances in Puerto Ricans over non-Puerto Ricans between 0.8 and 1.25. (D) Category 3: SNPs with the ratio of expected variances in Puerto Ricans over non-Puerto Ricans greater than 1.25. Abbreviations are as follows: LMM, a joint analysis using LMM on the combined samples; LMM meta, an inverse-variance weighted fixed effects meta-analysis approach to combine LMM results from analyzing Puerto Ricans and non-Puerto Ricans separately.
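The "LMM meta" comparison in this caption combines per-group results by inverse-variance weighting. A minimal sketch of the standard fixed-effects formula follows; the function name and the input values are hypothetical and are not taken from the study.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis.

    betas: per-group effect estimates
    ses:   per-group standard errors
    Returns the pooled estimate, its standard error, and a two-sided p value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# Hypothetical per-group estimates for one SNP (illustration only).
beta, se, p = ivw_fixed_effects(betas=[0.10, 0.30], ses=[0.10, 0.10])
print(beta, se, p)
```

Groups with smaller standard errors receive more weight, so the pooled estimate is pulled toward the more precisely estimated group.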

**Figure 2**
True Mean-Variance Relationship for a Binary Trait and the Constant Mean-Variance Relationship Assumed by Linear Models, Illustrated by the Example from the Asthma Data in HCHS/SOL. For a binary trait with mean π, the variance is π(1 − π), which varies with the mean. This heteroscedasticity is properly accounted for by logistic regression. Linear models inappropriately assume that the variance of the binary trait is constant and does not change with the mean (homoscedasticity). For example, the variance of the binary trait (asthma status) in Puerto Ricans is considerably larger than the variances in the other five populations, because Puerto Ricans have a much higher asthma disease proportion than the other populations. This heteroscedasticity, caused by population stratification, means that p values calculated from LMMs are likely to be incorrect, whereas logistic mixed models fitted with GMMAT properly take it into account.
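The mean-variance relationship in this caption is easy to check numerically. The sketch below evaluates π(1 − π) for two groups; the disease proportions are hypothetical placeholders chosen for illustration, not the HCHS/SOL values.

```python
def bernoulli_variance(pi):
    """Variance of a binary trait whose mean (disease proportion) is pi."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    return pi * (1.0 - pi)


# Hypothetical disease proportions (illustration only).
proportions = {"higher-prevalence group": 0.25, "lower-prevalence group": 0.05}

for label, pi in proportions.items():
    print(f"{label}: mean={pi:.2f}, variance={bernoulli_variance(pi):.4f}")

# A linear model would assume a single residual variance for both groups,
# although the two variances computed above differ several-fold.
```

Because the variance is determined by the mean, any stratification that shifts disease proportions between groups also shifts their variances.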

**Figure 3**
A Simulated Cohort Study with 10,000 Related Individuals. Quantile-quantile plots of association test p values from 3,200 simulation replicates under the null hypothesis of no genetic association, each with 625,583 common SNPs, were combined to get more than 2 billion null p values. (A) All SNPs. (B) Category 1: SNPs with the ratio of expected variances in population 1 (high risk) over population 2 (low risk) less than 0.8. (C) Category 2: SNPs with the ratio of expected variances in population 1 (high risk) over population 2 (low risk) between 0.8 and 1.25. (D) Category 3: SNPs with the ratio of expected variances in population 1 (high risk) over population 2 (low risk) greater than 1.25.
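The three SNP categories used in the figure panels amount to a simple binning of the variance ratio. A minimal sketch, assuming the ratio has already been computed for each SNP (how the expected variances are obtained is not described here) and assuming that both boundary values fall in Category 2, which the captions do not specify:

```python
def variance_ratio_category(ratio):
    """Assign a SNP to Category 1, 2, or 3 from its variance ratio.

    Assumption: ratios of exactly 0.8 or 1.25 are placed in Category 2.
    """
    if ratio <= 0.0:
        raise ValueError("a variance ratio must be positive")
    if ratio < 0.8:
        return 1
    if ratio <= 1.25:
        return 2
    return 3


# Hypothetical ratios (illustration only).
for r in (0.5, 1.0, 2.0):
    print(r, variance_ratio_category(r))
```

The cutoffs are reciprocal (1.25 = 1/0.8), so the middle category is symmetric around a ratio of 1 on the log scale.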
