Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies

Han Chen¹, Jennifer E Huffman², Jennifer A Brody³, Chaolong Wang⁴, Seunggeun Lee⁵, Zilin Li⁶, Stephanie M Gogarten⁷, Tamar Sofer⁸, Lawrence F Bielak⁹, Joshua C Bis³, John Blangero¹⁰, Russell P Bowler¹¹, Brian E Cade⁸, Michael H Cho¹², Adolfo Correa¹³, Joanne E Curran¹⁰, Paul S de Vries¹⁴, David C Glahn¹⁵, Xiuqing Guo¹⁶, Andrew D Johnson¹⁷, Sharon Kardia⁹, Charles Kooperberg¹⁸, Joshua P Lewis¹⁹, Xiaoming Liu²⁰, Rasika A Mathias²¹, Braxton D Mitchell²², Jeffrey R O'Connell¹⁹, Patricia A Peyser⁹, Wendy S Post²³, Alex P Reiner¹⁸, Stephen S Rich²⁴, Jerome I Rotter¹⁶, Edwin K Silverman¹², Jennifer A Smith⁹, Ramachandran S Vasan²⁵, James G Wilson²⁶, Lisa R Yanek²¹; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; TOPMed Hematology and Hemostasis Working Group; Susan Redline²⁷, Nicholas L Smith²⁸, Eric Boerwinkle²⁹, Ingrid B Borecki⁷, L Adrienne Cupples³⁰, Cathy C Laurie⁷, Alanna C Morrison¹⁴, Kenneth M Rice⁷, Xihong Lin³¹

Affiliations

¹ Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
² Center for Population Genomics, VA Boston Healthcare System, Jamaica Plain, MA 02130, USA.
³ Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101, USA.
⁴ Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China.
⁵ Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
⁶ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
⁷ Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
⁸ Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA 02115, USA; Division of Sleep Medicine, Harvard Medical School, Boston, MA 02115, USA.
⁹ Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.
¹⁰ Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX 78520, USA.
¹¹ Division of Pulmonary Medicine, Department of Medicine, National Jewish Health, Denver, CO 80206, USA.
¹² Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA.
¹³ Jackson Heart Study, Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA.
¹⁴ Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
¹⁵ Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06510, USA; Olin Neuropsychiatric Research Center, Institute of Living, Hartford Hospital, Hartford, CT 06106, USA.
¹⁶ The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA 90502, USA.
¹⁷ Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA 01702, USA.
¹⁸ Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
¹⁹ Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA.
²⁰ USF Genomics, College of Public Health, University of South Florida, Tampa, FL 33612, USA.
²¹ Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.
²² Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Geriatrics Research and Education Clinical Center, Baltimore VA Medical Center, Baltimore, MD 21201, USA.
²³ Division of Cardiology, Johns Hopkins University, Baltimore, MD 21287, USA.
²⁴ Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA.
²⁵ Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA 01702, USA; Sections of Preventive Medicine and Epidemiology, and of Cardiology, Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA; Department of Epidemiology, Boston University School of Public Health, Boston, MA 02118, USA.
²⁶ Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS 39216, USA.
²⁷ Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA 02115, USA; Division of Sleep Medicine, Harvard Medical School, Boston, MA 02115, USA; Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02115, USA.
²⁸ Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101, USA; Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, USA; Seattle Epidemiologic Research and Information Center, Department of Veterans Affairs Office of Research and Development, Seattle, WA 98108, USA; Department of Epidemiology, University of Washington, Seattle, WA 98195, USA.
²⁹ Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.
³⁰ Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA 01702, USA; Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA.
³¹ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Statistics, Harvard University, Cambridge, MA 02138, USA. Electronic address: xlin@hsph.harvard.edu.

PMID: 30639324
PMCID: PMC6372261
DOI: 10.1016/j.ajhg.2018.12.012

Han Chen et al. Am J Hum Genet. 2019 Feb 7;104(2):260-274. doi: 10.1016/j.ajhg.2018.12.012. Epub 2019 Jan 10.

Abstract

With advances in whole-genome sequencing (WGS) technology, more advanced statistical methods for testing genetic association with rare variants are being developed. Methods that group variants for analysis are also known as variant-set, gene-based, or aggregate unit tests. The burden test and the sequence kernel association test (SKAT) are two widely used variant-set tests; they were originally developed for samples of unrelated individuals and were later extended to family data with known pedigree structures. However, computationally efficient and powerful variant-set tests are needed to make analyses tractable in large-scale WGS studies with complex study samples. In this paper, we propose the variant-set mixed model association tests (SMMAT) for continuous and binary traits using the generalized linear mixed model framework. These tests can be applied to large-scale WGS studies involving samples with population structure and relatedness, such as those in the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program. SMMATs share the same null model across variant sets, and a virtue of this null model, which includes covariates only, is that it needs to be fit only once for all tests in each genome-wide analysis. Simulation studies show that all the proposed SMMATs correctly control type I error rates for both continuous and binary traits in the presence of population structure and relatedness. We also illustrate our tests with an analysis of plasma fibrinogen levels in the TOPMed program (n = 23,763) performed on the Analysis Commons, a cloud-based computing platform.

Keywords: TOPMed; generalized linear mixed model; population structure; rare variants; relatedness; variant set association test; whole-genome sequencing.
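The central efficiency claim above is that every SMMAT reuses one covariates-only null model. The sketch below illustrates this for a continuous trait under stated assumptions. It uses a linear mixed model whose kinship matrix K and variance components tau_g and tau_e are taken as already estimated (the REML fit itself is not shown), so the projection matrix P is built once and then reused for every variant set. All function names are hypothetical and do not come from the authors' software. The SKAT null distribution (a weighted sum of chi-square(1) variables) is evaluated here by Monte Carlo simulation, a simple stand-in for exact methods such as Davies' algorithm. The SMMAT-E construction (Fisher's combination of the burden test with a SKAT statistic adjusted for the burden score) reflects our reading of the full paper, not anything stated in this abstract. SMMAT-O is omitted.

```python
# Hypothetical helper names; a sketch under stated assumptions, not the authors' implementation.
import math
import numpy as np


def null_projection(X, K, tau_g, tau_e):
    """P = S^-1 - S^-1 X (X' S^-1 X)^-1 X' S^-1 for S = tau_g*K + tau_e*I.

    Assumes tau_g and tau_e were already estimated from a covariates-only
    null model; that fit happens once per genome-wide analysis.
    """
    n = X.shape[0]
    sigma_inv = np.linalg.inv(tau_g * K + tau_e * np.eye(n))
    sx = sigma_inv @ X
    return sigma_inv - sx @ np.linalg.solve(X.T @ sx, sx.T)


def mixture_pvalue(q, lambdas, n_draws=100_000, seed=1):
    """Monte Carlo P(sum_j lambda_j * chi2_1 >= q); illustration only, not Davies' method."""
    lambdas = np.clip(lambdas, 0.0, None)          # drop tiny negative eigenvalues
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(1, size=(n_draws, lambdas.size)) @ lambdas
    return (1 + np.count_nonzero(draws >= q)) / (1 + n_draws)


def variant_set_tests(y, G, P, w):
    """Burden (B), SKAT (S) and a Fisher-combined hybrid (E) p-value for one variant set."""
    U = G.T @ (P @ y)                   # score vector; P depends on the null model only
    V = G.T @ P @ G                     # covariance of U under H0
    Vw = V @ w
    wVw = float(w @ Vw)

    # Burden: collapse the weighted scores into a single 1-df statistic
    u_b = float(w @ U)
    p_b = math.erfc(math.sqrt(u_b ** 2 / wVw / 2))  # upper tail of chi2_1

    # SKAT: Q = sum_j (w_j U_j)^2, a weighted chi2_1 mixture under H0
    S = w * U
    cov_S = V * np.outer(w, w)          # equals W V W
    p_s = mixture_pvalue(float(S @ S), np.linalg.eigvalsh(cov_S))

    # Adjusted SKAT: remove the part of S explained by the burden score, leaving
    # a Gaussian remainder uncorrelated with (hence independent of) u_b
    S_adj = S - (w * Vw) * (u_b / wVw)
    cov_adj = cov_S - np.outer(w * Vw, w * Vw) / wVw
    p_adj = mixture_pvalue(float(S_adj @ S_adj), np.linalg.eigvalsh(cov_adj))

    # Fisher's method: -2 * sum(log p) ~ chi2_4, whose upper tail is exp(-x/2)(1 + x/2);
    # the 1e-300 floor only guards log() against floating-point underflow
    x = -2.0 * (math.log(max(p_b, 1e-300)) + math.log(p_adj))
    p_e = math.exp(-x / 2) * (1 + x / 2)
    return {"B": p_b, "S": p_s, "E": p_e}
```

In a genome-wide scan, only `null_projection` depends on fitting the null model; each call to `variant_set_tests` needs only products of P with one set's genotype matrix. For binary traits the same pattern would presumably use the working covariance and residuals of a fitted logistic mixed model instead, which this sketch does not cover.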

Figures

**Figure 1**
Map of Spatially Continuous Populations from Which Genotypes Were Simulated Based on the Coalescent Model. (A) Map for a single-cohort simulation study: the top-left 10 × 10 grid formed population 1, and the rest formed population 2. (B) Map for a meta-analysis simulation study: scenario A studies were unrelated individuals sampled from population 1 only; scenario B studies were related individuals sampled from specific regions in population 1 and population 2; scenario C studies were unrelated individuals sampled from specific regions in population 1 and population 2; and scenario D studies were related individuals sampled from specific regions in population 2 only.

**Figure 2**
Quantile-Quantile Plots of SMMAT-B, SMMAT-S, SMMAT-O, and SMMAT-E in the Analysis of 10,000 Samples in Single-Cohort Studies with Both Population Structure and Cryptic Relatedness, under the Null Hypothesis of No Genetic Association. (A) Continuous traits in linear mixed models. (B) Binary traits in logistic mixed models.

**Figure 3**
Quantile-Quantile Plots of SMMAT-B, SMMAT-S, SMMAT-O, and SMMAT-E in the Meta-analysis of 12 Studies with a Total Sample Size of 12,000, under the Null Hypothesis of No Genetic Association. (A) Continuous traits in linear mixed models, all studies in the same group. (B) Binary traits in logistic mixed models, all studies in the same group. (C) Continuous traits in linear mixed models, scenario A, B, C, and D studies in four separate groups. (D) Binary traits in logistic mixed models, scenario A, B, C, and D studies in four separate groups.

**Figure 4**
Empirical Power of Linear Mixed Model-Based SMMAT-B, SMMAT-S, SMMAT-O, SMMAT-E, and GLMM-MiST in Continuous Trait Analysis of 2,000, 5,000, and 10,000 Samples. (A–C) 10% causal variants with 100% (A), 80% (B), or 50% (C) negative effects. (D–F) 20% causal variants with 100% (D), 80% (E), or 50% (F) negative effects. (G–I) 50% causal variants with 100% (G), 80% (H), or 50% (I) negative effects. Effect sizes were simulated with the same parameter within each row but with different parameters across rows.

**Figure 5**
Empirical Power of Logistic Mixed Model-Based SMMAT-B, SMMAT-S, SMMAT-O, SMMAT-E, and GLMM-MiST in Binary Trait Analysis of 2,000, 5,000, and 10,000 Samples. (A–C) 10% causal variants with 100% (A), 80% (B), or 50% (C) negative effects. (D–F) 20% causal variants with 100% (D), 80% (E), or 50% (F) negative effects. (G–I) 50% causal variants with 100% (G), 80% (H), or 50% (I) negative effects. Effect sizes were simulated with the same parameter within each row but with different parameters across rows.

**Figure 6**
TOPMed Fibrinogen Level SMMAT Analysis Results via a Heteroscedastic Linear Mixed Model on Rare Variants with MAF < 5% in Non-overlapping 4 kb Sliding Windows on Chromosome 4 (n = 23,763). (A) Quantile-quantile plot. (B) p values on the log scale versus physical positions of the windows on chromosome 4 (build hg38).
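The Figure 6 analysis groups rare variants (MAF < 5%) into non-overlapping 4 kb windows, each of which then serves as one variant set tested against the shared null model. Below is a minimal sketch of that grouping step. It assumes 1-based positions, windows anchored at base 1 of the chromosome, and a hypothetical input of (position, alternate-allele frequency) pairs. The function names are illustrative, and the paper's actual window boundaries may differ.

```python
# Illustrative sketch; input format and window anchoring are assumptions.
from collections import defaultdict

WINDOW_BP = 4_000   # window width used in the Figure 6 analysis
MAF_CUTOFF = 0.05   # rare-variant threshold used in the Figure 6 analysis


def minor_allele_freq(alt_freq):
    """Fold an alternate-allele frequency into a minor-allele frequency."""
    return min(alt_freq, 1.0 - alt_freq)


def rare_variant_windows(variants, width=WINDOW_BP, maf_cutoff=MAF_CUTOFF):
    """Group rare variants into non-overlapping windows [start, start + width - 1].

    `variants` is an iterable of (position, alt_allele_freq) pairs with 1-based
    positions (a hypothetical input format). Returns {window_start: [positions]},
    sorted by window start. Monomorphic sites (MAF == 0) are skipped.
    """
    windows = defaultdict(list)
    for pos, alt_freq in variants:
        maf = minor_allele_freq(alt_freq)
        if 0.0 < maf < maf_cutoff:
            start = ((pos - 1) // width) * width + 1   # first base of this window
            windows[start].append(pos)
    return dict(sorted(windows.items()))
```

For example, `rare_variant_windows([(10, 0.01), (3999, 0.98), (4000, 0.3), (4001, 0.002), (8500, 0.0)])` keeps positions 10 and 3999 (MAF 0.01 and 0.02) in the window starting at base 1 and position 4001 in the window starting at base 4001. It drops the common variant at 4000 and the monomorphic site at 8500.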
