Modeling the Dependence Structure in Genome Wide Association Studies of Binary Phenotypes in Family Data

doi:10.1007/s10519-020-10010-2

Behav Genet. 2020 Nov;50(6):423-439.

doi: 10.1007/s10519-020-10010-2. Epub 2020 Aug 17.

Souvik Seal¹, Jeffrey A Boatman², Matt McGue³, Saonli Basu²

Affiliations

¹ Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA. sealx017@umn.edu.
² Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA.
³ Department of Psychology, University of Minnesota, Minneapolis, MN, USA.

PMID: 32804302
PMCID: PMC7581561
DOI: 10.1007/s10519-020-10010-2

Souvik Seal et al. Behav Genet. 2020 Nov.

Abstract

Genome-wide association studies (GWASs) are a popular tool for detecting associations between genetic variants, or single nucleotide polymorphisms (SNPs), and complex traits. Family data introduce complexity due to the non-independence of the family members. Methods for non-independent data are well established, but when the GWAS contains distinct family types, explicit modeling of between-family-type differences in the dependence structure comes at the cost of a significantly increased computational burden. The situation is exacerbated with binary traits. In this paper, we perform several simulation studies to compare multiple candidate methods for single-SNP association analysis with binary traits. We consider generalized estimating equations (GEE), generalized linear mixed models (GLMMs), and generalized least squares (GLS) approaches. We study the influence of different working correlation structures for GEE on the GWAS findings, as well as the performance of the different analysis methods for conducting a GWAS with binary trait data in families. We discuss the merits of each approach with attention to their applicability in a GWAS. We also compare the performance of the methods on the alcoholism data from the Minnesota Center for Twin and Family Research (MCTFR) study.

Keywords: Family data; Generalized estimating equation; Generalized least squares; Generalized linear mixed effect model; Genome-wide scan; Population-based association analysis.
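
To make the GEE idea in the abstract concrete, the following is a minimal sketch of a single-SNP test using GEE with an independence working correlation and a cluster-robust (sandwich) covariance, treating each family as a cluster. It is an illustration under stated assumptions, not the authors' implementation: the function names (`simulate_families`, `fit_gee_independence`), the simulation settings, and the simplification that genotypes are drawn independently within a family are all hypothetical choices made for this example.

```python
import math
import random


def simulate_families(n_families, family_size, maf, b0, b1, sd_family, rng):
    """Simulate binary traits with a shared family-level random intercept.

    Assumption for illustration: genotypes are drawn independently for each
    member, which ignores allele sharing between relatives.
    """
    families = []
    for _ in range(n_families):
        u = rng.gauss(0.0, sd_family)  # shared effect induces within-family correlation
        members = []
        for _ in range(family_size):
            g = sum(rng.random() < maf for _ in range(2))  # additive genotype 0/1/2
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g + u)))
            y = 1 if rng.random() < p else 0
            members.append((g, y))
        families.append(members)
    return families


def score_pieces(families, b0, b1):
    """Return the total score, the information matrix A, and the 'meat' matrix M."""
    s = [0.0, 0.0]
    a = [[0.0, 0.0], [0.0, 0.0]]
    m = [[0.0, 0.0], [0.0, 0.0]]
    for fam in families:
        u = [0.0, 0.0]  # score contribution of this family (cluster)
        for g, y in fam:
            x = (1.0, float(g))  # intercept and SNP genotype
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = mu * (1.0 - mu)
            for i in range(2):
                u[i] += x[i] * (y - mu)
                for j in range(2):
                    a[i][j] += w * x[i] * x[j]
        for i in range(2):
            s[i] += u[i]
            for j in range(2):
                m[i][j] += u[i] * u[j]  # outer product of the family score
    return s, a, m


def inv2(a):
    """Invert a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def matmul2(p, q):
    """Multiply two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def fit_gee_independence(families, max_iter=50, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * g by Newton steps, then compute the
    sandwich covariance A^{-1} M A^{-1} with families as clusters."""
    b = [0.0, 0.0]
    for _ in range(max_iter):
        s, a, _ = score_pieces(families, b[0], b[1])
        a_inv = inv2(a)
        step = [a_inv[i][0] * s[0] + a_inv[i][1] * s[1] for i in range(2)]
        b = [b[i] + step[i] for i in range(2)]
        if abs(step[0]) + abs(step[1]) < tol:
            break
    _, a, m = score_pieces(families, b[0], b[1])
    a_inv = inv2(a)
    cov = matmul2(matmul2(a_inv, m), a_inv)
    return b, cov


rng = random.Random(2020)
families = simulate_families(1000, 4, maf=0.3, b0=-1.0, b1=0.5, sd_family=1.0, rng=rng)
beta, cov = fit_gee_independence(families)
robust_se = math.sqrt(cov[1][1])
z = beta[1] / robust_se  # Wald statistic for the SNP effect
print(f"SNP log-odds: {beta[1]:.3f}, robust SE: {robust_se:.3f}, Wald z: {z:.2f}")
```

With an independence working correlation the point estimates coincide with ordinary logistic regression; only the standard errors change, because the sandwich estimator sums score contributions within each family before forming the outer products. The estimated SNP effect is a marginal (population-averaged) log-odds ratio, so it will generally be somewhat smaller in magnitude than the family-conditional value used to simulate the data. Other working correlation structures, such as exchangeable or unstructured, would require an additional step that estimates the within-family correlation, which this sketch omits.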

Figures

**Figure 1. Simulation 1 Type I error at α = 0.05.**
IND = independence, EX = exchangeable, UN = unstructured, TU = totally unstructured. Each row is for the stated data type, while each column is for the stated sample size.

**Figure 2. Simulation 1 Empirical Power.**
IND = independence, EX = exchangeable, UN = unstructured, TU = totally unstructured. Each row is for the stated data type, while each column is for the stated sample size.

**Figure 3. Simulation 2, Marginal Data Type I Error at α = 0.05 and Empirical Power.**
Each row corresponds to a MAF-Prevalence combination, while the columns correspond to the 2 covariance conditions. MAF = minor allele frequency, P = trait prevalence; RI = random intercept, GenEff = generation-effects, GEE(UN) = GEE with unstructured working correlation matrix and sandwich covariance estimator, GEE(FIJ) = GEE with unstructured working correlation matrix and fully-iterated jackknife estimator, GEE(IND) = GEE with independence working correlation matrix.

**Figure 4. Simulation 2, Mixed Data Type I Error at α = 0.05.**
Each row corresponds to a MAF-Prevalence combination, while the columns correspond to the 2 covariance conditions. MAF = minor allele frequency, P = trait prevalence; RI = random intercept, GenEff = generation-effects, GEE(UN) = GEE with unstructured working correlation matrix and sandwich covariance estimator, GEE(FIJ) = GEE with unstructured working correlation matrix and fully-iterated jackknife estimator, GEE(IND) = GEE with independence working correlation matrix.

**Figure 5. Simulation 2, Mixed Data Empirical Power.**
Each row corresponds to a MAF-Prevalence combination, while the columns correspond to the 2 covariance conditions. MAF = minor allele frequency, P = trait prevalence; RI = random intercept, GenEff = generation-effects, GEE(UN) = GEE with unstructured working correlation matrix and sandwich covariance estimator, GEE(FIJ) = GEE with unstructured working correlation matrix and fully-iterated jackknife estimator, GEE(IND) = GEE with independence working correlation matrix.

**Figure 6.**
ROC curves of the different methods under Case 1 and different sub-cases from Section (2.4).

**Figure 7.**
ROC curves of the different methods under Case 2 and different sub-cases from Section (2.4).

**Figure 8.**
Normal Quantile-Quantile Plots for the general GLMM, GEE(UN) and RFGLS Alcoholism GWAS.

Cited by

Efficient estimation of SNP heritability using Gaussian predictive process in large scale cohort studies.
Seal S, Datta A, Basu S. PLoS Genet. 2022 Apr 20;18(4):e1010151. doi: 10.1371/journal.pgen.1010151. eCollection 2022 Apr. PMID: 35442943. Free PMC article.
