Nat Genet. 2019 Jan;51(1):180-186.
doi: 10.1038/s41588-018-0271-0. Epub 2018 Nov 26.

A linear mixed-model approach to study multivariate gene-environment interactions

Rachel Moore et al. Nat Genet. 2019 Jan.

Abstract

Different exposures, including diet, physical activity, or external conditions, can contribute to genotype-environment interactions (G×E). Although high-dimensional environmental data are increasingly available and multiple exposures have been implicated in G×E at the same loci, multi-environment tests for G×E are not established. Here, we propose the structured linear mixed model (StructLMM), a computationally efficient method to identify and characterize loci that interact with one or more environments. After validating our model using simulations, we applied StructLMM to body mass index in the UK Biobank, where our model yields previously known and novel G×E signals. Finally, in an application to a large blood eQTL dataset, we demonstrate that StructLMM can be used to study interactions with hundreds of environmental variables.
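As a rough illustration of the kind of model the abstract describes, the sketch below simulates a phenotype in which the allelic effect at one variant has a persistent part shared by all individuals and a G×E part that varies with each individual's environmental profile. It is a toy under stated assumptions, not the authors' implementation: the function name `simulate_phenotype`, its parameters, and the choice to fix the persistent effect at the square root of its variance share are all hypothetical. The G×E part is drawn so that its covariance across individuals is proportional to the product of the environment matrix with its own transpose, and `rho` plays the role of the fraction of genetic variance attributed to G×E.

```python
import random


def simulate_phenotype(genotypes, env, var_g=0.5, rho=0.7, noise_sd=1.0, seed=0):
    """Toy simulation of a phenotype with a persistent and a GxE allelic effect.

    genotypes: list of n allele counts (0, 1 or 2).
    env:       n x L list of lists of environmental measurements.
    rho:       assumed fraction of the genetic variance that is due to GxE.
    Returns (phenotype, per_individual_effect).
    """
    rng = random.Random(seed)
    n_env = len(env[0])

    # Persistent effect, identical for every individual (a simplification:
    # it is fixed at the square root of its variance share, not sampled).
    beta_g = ((1.0 - rho) * var_g) ** 0.5

    # One random weight per environment. The per-individual GxE effect is the
    # weighted sum of that individual's environments, so across individuals
    # its covariance is proportional to E E^T.
    gamma_sd = (rho * var_g / n_env) ** 0.5
    gamma = [rng.gauss(0.0, gamma_sd) for _ in range(n_env)]

    phenotype = []
    per_individual_effect = []
    for x, e_row in zip(genotypes, env):
        beta_gxe = sum(e * g for e, g in zip(e_row, gamma))
        effect = beta_g + beta_gxe
        per_individual_effect.append(effect)
        phenotype.append(x * effect + rng.gauss(0.0, noise_sd))
    return phenotype, per_individual_effect


if __name__ == "__main__":
    geno = [0, 1, 2, 1]
    env = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
    y, effects = simulate_phenotype(geno, env, rho=0.7)
    print(effects)
```

With `rho=0` every individual has the same allelic effect, and as `rho` grows the spread of per-individual effects widens, which is the heterogeneity that a multi-environment interaction test is meant to detect.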

Conflict of interest statement

Competing interests

F.P.C. was employed at Microsoft while performing the research.

Figures

Figure 1. Overview of the StructLMM model.
(a) Basic genotype-environment interaction, with a genetic effect that is specific to one of two groups (blue and orange lines correspond to the average phenotypes observed within two environmental groups for two alleles). (b) Interaction with multiple environmental groups or bins of continuous environmental states (average phenotypes for groups exerting increasing GxE effects from blue to orange for two alleles). (c) StructLMM accounts for possible heterogeneity in effect sizes due to GxE using a multivariate normal prior, where alternative choices of the environmental covariance Σ can capture discrete (two groups, group hierarchy; see a,b) or continuous substructure of environmental exposures in the population (multiple envs). (d,e) Different illustrative example analyses using StructLMM. (d) Estimation of per-individual allelic effects in the population at individual loci. The violin plot displays the density of estimated allelic effect sizes for individuals in the population. Median and the top and bottom 5% quantiles of the effect size distribution are indicated by the red and green bars, respectively. (e) Bayes factors between the full model and models with environmental variables removed, thereby identifying environments that are most relevant for GxE.
Figure 2. Assessment of statistical calibration and power using simulated data.
(a) QQ plots of negative log P values from the StructLMM interaction test (green, StructLMM-int) using phenotypes simulated from the null (no genetic effect) for 103,527 variants on chromosome 21. (b) Comparison of power for detecting GxE interactions for increasing fractions of the genetic variance explained by GxE (ρ). Compared are the StructLMM interaction test (StructLMM-int) and a single-environment interaction test (SingleEnv-Renv-int). (c) Analogous power analysis, when simulating GxE using increasing numbers of active environments with non-zero GxE effects (out of 60 environments total, considered in all tests; ρ=0.7). All 60 environments contribute to the simulated additive environment effect. Models were assessed in terms of power (at a family-wise error rate, FWER<1%) for detecting variants with true GxE effects (Methods). Stars denote default values of genetic parameters, which were retained when varying other parameters (Supp. Table 1). A synthetic CEU population of 5,000 individuals based on the 1000 Genomes Project was used for all experiments.
Figure 3. Applications to model GxE on body mass index (BMI) in UK Biobank.
(a) Scatter plot of negative log P values from GxE interaction tests at 97 GIANT variants, considering single-environment fixed-effect GxE tests (SingleEnv-Renv-int, x-axis, P values Bonferroni adjusted for the number of tested environments) versus the StructLMM interaction test (StructLMM-int, y-axis). Dashed lines correspond to α<0.05, Bonferroni adjusted for the number of tests. (b) Local Manhattan plots of an interaction identified by StructLMM-int at MC4R. From top to bottom: LMM association test (LMM-Renv), StructLMM interaction test (StructLMM-int), single-environment LMM interaction test (SingleEnv-Renv-int) for the environment with the strongest GxE effect at the GIANT SNP, age-adjusted vigorous physical activity (vigorous physical activity x age). The red vertical line and diamond symbol indicate the GIANT SNP as in a. (c) Scatter plot of genome-wide negative log P values from the LMM association test (LMM-Renv, x-axis) versus the StructLMM association test (y-axis). Dashed lines indicate genome-wide significance at P<5×10⁻⁸ and colour denotes the estimated extent of heterogeneity (fitted parameter ρ), where yellow/red corresponds to variants with low/high GxE components. The inset displays a zoom-in view of variants close to genome-wide significance. n = 252,188 unrelated individuals of European ancestry for all experiments.
Figure 4. Downstream analysis to explore identified GxE loci.
(a) Violin plots showing distributions of the in-sample estimated allelic effect size (effect of heterozygous versus homozygous reference carriers for environmental states realised in the population; n = 252,188 unrelated individuals of European ancestry for all experiments; Methods) on BMI for the four GIANT variants with GxE (α<0.05, Fig. 3a). Estimated persistent genetic effects are shown by the red bar and the green bars indicate top and bottom 5% quantiles of variation in effect sizes due to GxE. (b) Cumulative evidence of environmental variables that explain GxE at MC4R, showing Bayes factors between the full model and models with increasing numbers of environmental variables removed using backward elimination. The evidence for all 64 environmental variables is shown for comparison. ‘Alcohol frequency female’ is selected as the first environmental factor, followed by ‘Alcohol frequency x age’ and so on.
Figure 5. Gene-context interactions in a blood gene expression cohort.
(a) Cumulative fraction (top) and density (bottom) of eQTL with interactions (3,483 interaction eQTL; FDR<5%) as a function of the estimated extent of heterogeneity (fitted parameter ρ). (b-d) Example of an interaction eQTL for CTSW at the lead variant rs568617, which is in LD with rs568617 (r²=0.98, Supp. Fig. 23), a known risk variant for Crohn's disease. (b,c) Expression level of CTSW for different alleles at the lead eQTL variant, considering 10% strata of individuals (n = 204 independent samples) with the smallest (b) and largest (c) per-individual allelic effects as estimated using StructLMM, displaying the 25th, 50th and 75th percentiles, with whiskers extending to 1.5 times the interquartile range. (d) Scatter plot of CTSW expression level versus the aggregate environmental signal for the GxE effect at rs568617 (aggregate interacting environment), estimated using StructLMM (Supplementary Note). Individuals are stratified by the alleles at the eQTL lead variant. Solid lines denote regression lines for each genotype group.
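The figure captions rely on two generic devices: QQ plots of negative log P values to check calibration, and Bonferroni-adjusted significance thresholds. The following minimal sketch shows how such quantities could be computed. It is not taken from the paper; the function names `bonferroni_threshold` and `qq_points` are hypothetical, and the expected quantiles use the common i/(n+1) convention for uniform order statistics, which may differ from the convention used by the authors.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni adjustment."""
    return alpha / n_tests


def qq_points(p_values):
    """Pair expected and observed -log10 P values for a QQ plot.

    Under the null, P values are uniform on (0, 1), so the i-th smallest of
    n P values is expected near i / (n + 1). All P values must be > 0.
    Returns a list of (expected, observed) pairs in ascending order.
    """
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10(i / (n + 1)) for i in range(1, n + 1))
    return list(zip(expected, observed))


if __name__ == "__main__":
    # Threshold for alpha = 0.05 across 97 tested variants.
    print(bonferroni_threshold(0.05, 97))
    # Well-calibrated P values fall on the diagonal of the QQ plot.
    for exp, obs in qq_points([0.25, 0.5, 0.75]):
        print(exp, obs)
```

Points that rise above the diagonal across the whole range suggest inflated test statistics, whereas a departure only in the upper tail is what true signals on a calibrated test would produce.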
