Eur J Hum Genet. 2017 Feb;25(2):240-245.
doi: 10.1038/ejhg.2016.150. Epub 2016 Nov 16.

Guidance for the utility of linear models in meta-analysis of genetic association studies of binary phenotypes

James P Cook et al. Eur J Hum Genet. 2017 Feb.

Abstract

Linear mixed models are increasingly used for the analysis of genome-wide association studies (GWAS) of binary phenotypes because they can efficiently and robustly account for population stratification and relatedness through inclusion of random effects for a genetic relationship matrix. However, the utility of linear (mixed) models in the context of meta-analysis of GWAS of binary phenotypes has not been previously explored. In this investigation, we present simulations to compare the performance of linear and logistic regression models under alternative weighting schemes in a fixed-effects meta-analysis framework, considering designs that incorporate variable case-control imbalance, confounding factors and population stratification. Our results demonstrate that linear models can be used for meta-analysis of GWAS of binary phenotypes, without loss of power, even in the presence of extreme case-control imbalance, provided that one of the following schemes is used: (i) effective sample size weighting of Z-scores or (ii) inverse-variance weighting of allelic effect sizes after conversion onto the log-odds scale. Our conclusions thus provide essential recommendations for the development of robust protocols for meta-analysis of binary phenotypes with linear models.
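
The two weighting schemes named in the abstract can be illustrated with a minimal sketch. The formulas below are the conventional ones and are assumptions here, since the abstract does not state them: effective sample size is taken as 4 / (1/N_cases + 1/N_controls), Z-scores are combined with weights equal to the square root of the effective sample size, and a linear-model allelic effect is moved onto the log-odds scale by dividing by φ(1 − φ), where φ is the case fraction in the study. All function names and input values are hypothetical.

```python
import math


def effective_sample_size(n_cases, n_controls):
    """Assumed definition: 4 / (1/N_cases + 1/N_controls)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def z_score_meta(z_scores, n_effs):
    """Combine per-study Z-scores, weighting each by sqrt(effective sample size)."""
    weights = [math.sqrt(n) for n in n_effs]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def linear_to_log_odds(beta, se, case_fraction):
    """Approximate conversion of a linear-model effect to the log-odds scale.

    Assumption: beta_logOR ~ beta_linear / (phi * (1 - phi)), with the
    standard error rescaled by the same factor.
    """
    scale = case_fraction * (1.0 - case_fraction)
    return beta / scale, se / scale


def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance meta-analysis; returns (beta, se, z)."""
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# Hypothetical inputs: an imbalanced study (1000 cases, 99 000 controls)
# and a balanced study (1000 cases, 1000 controls).
n_eff = [effective_sample_size(1000, 99000), effective_sample_size(1000, 1000)]
z_combined = z_score_meta([3.1, 2.4], n_eff)

# Hypothetical linear-model estimates, converted before inverse-variance weighting.
converted = [
    linear_to_log_odds(0.0022, 0.0007, 0.01),
    linear_to_log_odds(0.0500, 0.0210, 0.50),
]
beta, se, z = inverse_variance_meta([b for b, _ in converted], [s for _, s in converted])
```

Under the assumed definition, the imbalanced study of 100 000 participants has an effective sample size of 3960, less than twice that of the balanced study of 2000 participants. This illustrates why total sample size is a poor weight under extreme case-control imbalance.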

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Power to detect association (at genome-wide significance, P < 5 × 10⁻⁸) of a binary phenotype with a causal SNP, in the absence of population stratification or confounders, using alternative meta-analysis strategies for summary statistics obtained from linear and logistic regression models without random effects for the GRM (Table 1). Results are presented as a function of the allelic OR, for a causal SNP with RAF in the range of 1–50% and for variable extent of case–control imbalance (defined in Table 2).
Figure 2
Power to detect association (at genome-wide significance, P < 5 × 10⁻⁸) of a binary phenotype with a causal SNP, in the presence of population stratification (cases and controls ascertained from sub-populations A and B), using alternative meta-analysis strategies for summary statistics obtained from linear regression models with random effects for the GRM (Table 1). Results are presented as a function of the probability that a case is ascertained from sub-population A, for a causal SNP with allelic OR of 1.15 for the binary phenotype and for variable extent of case–control imbalance (defined in Table 2).
Figure 3
Power to detect association (at genome-wide significance, P < 5 × 10⁻⁸) of a binary phenotype with a causal SNP, in the absence of population stratification or confounders, using alternative meta-analysis strategies for summary statistics obtained from linear and logistic regression models without random effects for the GRM (Table 1). Association summary statistics were aggregated from a population biobank of 100 000 participants with extreme case–control imbalance and a balanced case–control study of 2000 participants. Results are presented for a causal SNP with RAF 50% and an allelic OR of 1.25, as a function of the number of cases in the population biobank.
