Behav Genet. 2020 Nov;50(6):423-439.
doi: 10.1007/s10519-020-10010-2. Epub 2020 Aug 17.

Modeling the Dependence Structure in Genome Wide Association Studies of Binary Phenotypes in Family Data

Souvik Seal et al. Behav Genet. 2020 Nov.

Abstract

Genome-wide association studies (GWASs) are a popular tool for detecting associations between genetic variants, or single nucleotide polymorphisms (SNPs), and complex traits. Family data introduce complexity because family members are not independent. Methods for non-independent data are well established, but when a GWAS contains distinct family types, explicitly modeling between-family-type differences in the dependence structure significantly increases the computational burden. The situation is exacerbated with binary traits. In this paper, we perform several simulation studies to compare candidate methods for single-SNP association analysis with binary traits. We consider generalized estimating equations (GEE), generalized linear mixed models (GLMMs), and generalized least squares (GLS) approaches. We study how different working correlation structures for GEE influence the GWAS findings, and how the different analysis methods perform when conducting a GWAS with binary trait data in families. We discuss the merits of each approach with attention to its applicability in a GWAS. We also compare the performance of the methods on the alcoholism data from the Minnesota Center for Twin and Family Research (MCTFR) study.

Keywords: Family data; Generalized estimating equation; Generalized least squares; Generalized linear mixed effect model; Genome-wide scan; Population-based association analysis.
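As a rough illustration of the GEE approach named in the abstract, the sketch below fits a single-SNP logistic model with an independence working correlation and compares the naive standard error with a cluster-robust (sandwich) one. It is not the authors' implementation: the logit link, the function names (`score_and_information`, `gee_independence`, `simulate_twins`), and the toy data (twin pairs with identical genotype and trait, chosen as an extreme case of within-family dependence) are all assumptions made for this example.

```python
import math
import random


def score_and_information(families, b0, b1):
    """Return per-family score vectors and the summed information matrix."""
    scores = []
    h00 = h01 = h11 = 0.0
    for fam in families:
        s0 = s1 = 0.0
        for g, y in fam:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            s0 += y - p
            s1 += (y - p) * g
            w = p * (1.0 - p)
            h00 += w
            h01 += w * g
            h11 += w * g * g
        scores.append((s0, s1))
    return scores, (h00, h01, h11)


def gee_independence(families, max_iter=50, tol=1e-10):
    """Logistic GEE with an independence working correlation.

    families: list of families, each a list of (genotype, trait) pairs.
    Returns (snp_effect, naive_se, robust_se).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):  # Newton-Raphson on the estimating equations
        scores, (h00, h01, h11) = score_and_information(families, b0, b1)
        u0 = sum(s[0] for s in scores)
        u1 = sum(s[1] for s in scores)
        det = h00 * h11 - h01 * h01
        d0 = (h11 * u0 - h01 * u1) / det
        d1 = (h00 * u1 - h01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    scores, (h00, h01, h11) = score_and_information(families, b0, b1)
    det = h00 * h11 - h01 * h01
    # SNP row of the inverse information matrix.
    a0, a1 = -h01 / det, h00 / det
    naive_var = a1  # model-based variance, ignores family clustering
    # Sandwich variance: family-level scores form the "meat".
    robust_var = sum((a0 * s0 + a1 * s1) ** 2 for s0, s1 in scores)
    return b1, math.sqrt(naive_var), math.sqrt(robust_var)


def simulate_twins(n_pairs, maf=0.3, seed=1):
    """Hypothetical toy data: twin pairs sharing genotype and binary trait."""
    rng = random.Random(seed)
    families = []
    for _ in range(n_pairs):
        g = (rng.random() < maf) + (rng.random() < maf)  # minor-allele count
        p = 1.0 / (1.0 + math.exp(-(-0.5 + 0.4 * g)))
        y = int(rng.random() < p)
        families.append([(g, y), (g, y)])
    return families


twins = simulate_twins(400)
beta, naive_se, robust_se = gee_independence(twins)
print(beta, naive_se, robust_se)
```

In this extreme case the second twin adds no information, so the robust standard error matches the one obtained from one twin per pair, while the naive standard error is too small by a factor of sqrt(2). Structured working correlations (exchangeable, unstructured) change the estimating equations themselves and are not covered by this sketch.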

Figures

Figure 1: Simulation 1, Type I Error at α = 0.05.
IND = independence, EX = exchangeable, UN = unstructured, TU = totally unstructured. Each row is for the stated data type, while each column is for the stated sample size.
Figure 2: Simulation 1, Empirical Power.
IND = independence, EX = exchangeable, UN = unstructured, TU = totally unstructured. Each row is for the stated data type, while each column is for the stated sample size.
Figure 3: Simulation 2, Marginal Data Type I Error at α = 0.05 and Empirical Power.
Each row corresponds to a MAF-Prevalence combination, while the columns correspond to the 2 covariance conditions. MAF = minor allele frequency, P = trait prevalence; RI = random intercept, GenEff = generation-effects, GEE(UN) = GEE with unstructured working correlation matrix and sandwich covariance estimator, GEE(FIJ) = GEE with unstructured working correlation matrix and fully-iterated jackknife estimator, GEE(IND) = GEE with independence working correlation matrix.
Figure 4: Simulation 2, Mixed Data Type I Error at α = 0.05.
Each row corresponds to a MAF-Prevalence combination, while the columns correspond to the 2 covariance conditions. MAF = minor allele frequency, P = trait prevalence; RI = random intercept, GenEff = generation-effects, GEE(UN) = GEE with unstructured working correlation matrix and sandwich covariance estimator, GEE(FIJ) = GEE with unstructured working correlation matrix and fully-iterated jackknife estimator, GEE(IND) = GEE with independence working correlation matrix.
Figure 5: Simulation 2, Mixed Data Empirical Power.
Each row corresponds to a MAF-Prevalence combination, while the columns correspond to the 2 covariance conditions. MAF = minor allele frequency, P = trait prevalence; RI = random intercept, GenEff = generation-effects, GEE(UN) = GEE with unstructured working correlation matrix and sandwich covariance estimator, GEE(FIJ) = GEE with unstructured working correlation matrix and fully-iterated jackknife estimator, GEE(IND) = GEE with independence working correlation matrix.
Figure 6: ROC curves of the different methods under Case 1 and different sub-cases from Section (2.4).
Figure 7: ROC curves of the different methods under Case 2 and different sub-cases from Section (2.4).
Figure 8: Normal Quantile-Quantile Plots for the general GLMM, GEE(UN) and RFGLS Alcoholism GWAS.
