Sci Rep. 2019 Feb 21;9(1):2455.
doi: 10.1038/s41598-019-38983-z.

Asymmetric independence modeling identifies novel gene-environment interactions

Guoqiang Yu et al. Sci Rep.

Abstract

Most genetic and environmental factors act jointly to determine complex disease risk. Detecting gene-environment interactions may allow us to elucidate novel and targetable molecular mechanisms by which environmental exposures modify genetic effects. Unfortunately, standard logistic regression (LR) assumes a mathematically convenient structure for the null hypothesis that nonetheless yields both poor detection power and inflated type I error, and LR is also susceptible to confounding from missing factors, imperfect surrogates, and disease heterogeneity. Here we describe a new baseline framework for case-control studies, the asymmetric independence model (AIM), and provide mathematical proofs and simulation studies verifying its validity across a wide range of conditions. We show that AIM, unlike LR, mathematically preserves the asymmetry between maintaining health and acquiring a disease, and is therefore more powerful and robust for detecting synergistic interactions. We present examples from four clinically distinct domains where AIM identified interactions that had previously been reported inconsistently or with less statistical certainty.
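For reference, the standard LR interaction test that the abstract contrasts with AIM can be sketched in a few lines: fit logit P(D) = b0 + b1·g + b2·e with and without a g·e product term and compare the two fits with a 1-degree-of-freedom likelihood-ratio test. This is a minimal pure-Python sketch, not the authors' code; the binary coding of g and e, the simulated parameters, and all function names are assumptions for illustration. In a case-control sample the intercept absorbs the sampling fraction, but the odds-ratio terms, including the interaction, are still estimated consistently.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def solve(a, b):
    # Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    # Maximum-likelihood logistic regression via Newton-Raphson.
    # Returns (coefficients, maximized log-likelihood).
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X^T W X
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += w * xi[a] * xi[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        ll += math.log(p) if yi else math.log(1.0 - p)
    return beta, ll


def lr_interaction_test(g, e, y):
    # Null: logit P(D) = b0 + b1*g + b2*e; alternative adds b3*g*e.
    X0 = [[1.0, gi, ei] for gi, ei in zip(g, e)]
    X1 = [[1.0, gi, ei, gi * ei] for gi, ei in zip(g, e)]
    _, ll0 = fit_logistic(X0, y)
    beta1, ll1 = fit_logistic(X1, y)
    stat = max(0.0, 2.0 * (ll1 - ll0))
    # Chi-square(1) tail probability: P(chi2_1 > s) = erfc(sqrt(s / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return beta1[3], stat, p


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    b = math.log(1.5)  # odds ratio 1.5 for every term (assumed)
    g = [float(rng.random() < 0.3) for _ in range(n)]
    e = [float(rng.random() < 0.5) for _ in range(n)]
    y = [int(rng.random() < sigmoid(-1.0 + b * gi + b * ei + b * gi * ei))
         for gi, ei in zip(g, e)]
    b3, stat, p = lr_interaction_test(g, e, y)
    print(f"interaction log-OR={b3:.3f}  LRT={stat:.3f}  p={p:.3g}")
```

The product term b3 is the quantity whose null value (b3 = 0, i.e. no departure from multiplicativity on the odds scale) is the "convenient mathematical structure" that AIM replaces with a different baseline.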

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Mathematical formulation and illustrative comparison between LR and AIM. (a) Theoretical discrepancy between Logistic Regression (LR) prediction and ground truth probability in the case of missing variables (Appendix B). (b) Theoretical capability of the Asymmetric Independence Model (AIM) to accurately predict the ground truth probability in the case of missing variables. (c) Mathematical expression of LR. (d) Mathematical expression of AIM.
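Panels (a) and (b) concern what happens when a risk factor is missing from the fitted model. The arithmetic behind that contrast can be sketched under stated assumptions: if the truth is logistic and an independent binary factor z is left out, the marginal odds ratio for x is pulled toward 1 (the odds ratio is non-collapsible); if instead the probability of staying healthy is a product of independent per-factor terms, leaving z out only rescales the baseline. The product-of-healthy-probabilities model is used here only as an assumed stand-in for an asymmetric independence model; the paper's exact AIM expression in panel (d) is not reproduced here and may differ. All names and parameter values are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_log_or_logistic(b0, b1, b2, pz):
    """Marginal log-OR of binary x when z is unobserved.

    Truth (assumed): logit P(D | x, z) = b0 + b1*x + b2*z, with binary
    z ~ Bernoulli(pz) independent of x. z is averaged out, as happens
    when it is missing from the fitted model.
    """
    def risk(x):
        return (1 - pz) * sigmoid(b0 + b1 * x) + pz * sigmoid(b0 + b1 * x + b2)
    return logit(risk(1)) - logit(risk(0))


def marginal_healthy_ratio_product(h0, h1, h2, pz):
    """Marginal P(healthy | x=1) / P(healthy | x=0) when z is unobserved.

    Truth (assumed): P(healthy | x, z) = h0 * h1**x * h2**z, i.e. independent
    multiplicative "escape" factors. Averaging over z only rescales h0 by
    E[h2**z] = (1 - pz) + pz * h2, so the ratio for x stays exactly h1.
    """
    def healthy(x):
        return h0 * h1 ** x * ((1 - pz) + pz * h2)
    return healthy(1) / healthy(0)


b1 = math.log(1.5)
print(marginal_log_or_logistic(-2.0, b1, 2.0, 0.3), b1)   # marginal value is attenuated toward 0
print(marginal_healthy_ratio_product(0.9, 0.8, 0.5, 0.3))  # equals h1 up to rounding
```

The missing factor here is not a confounder (it is independent of x), so the shift in the logistic case comes purely from the scale on which the model is defined.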
Figure 2
Comparative performance assessment of AIM and LR using extensive simulation datasets. Our simulation studies evaluate the type I error and detection power of AIM and LR in a controlled setting, under varying parameter settings that characterize the population being studied, as well as under the three confounding scenarios identified in this paper: missing factors, surrogate factors, and disease subtypes. The goal is to understand how different parameter settings and these scenarios affect the performance of both models. (a) Empirical type I error (evaluated when the null hypothesis of no interaction holds) at significance level 0.05; the gray region is the 95% confidence interval. (b) Power versus sample size, with an interaction effect size (odds ratio) of 1.5, a case fraction of 50%, and a main effect size of 1.5 for both risk factors. (c) Power versus case-control ratio. The fraction of cases is varied by adjusting the baseline (intercept) parameter in the LR model with an interaction term; the sample size is 2000, the interaction effect size is 1.5, and the main effect size for both risk factors is 1.5. (d) Power versus risk-allele frequency, with sample size 2000, main effect size 1.5 for both risk factors, interaction effect size 1.5, and case fraction 50%. (e) Power to detect an interaction versus correlation between the risk factors for AIM and LR; both methods achieve their greatest detection power when the risk factors are uncorrelated. (f) Power versus main effect size, with sample size 1000, interaction effect size 1.5, and case fraction 50%. (g) Sample size versus p-value threshold, with main effect size 1.5, interaction effect size 1.5, and case fraction 50%. (h) Statistical significance (log p-values) of five ground-truth interactions, as detected by AIM and LR (Appendix D–E).
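Two mechanical steps in this simulation design can be sketched directly. Panel (c) sets the case fraction by adjusting the LR intercept; because the expected fraction of cases is strictly increasing in the intercept, bisection finds the value that hits a target. Panel (a) shows a 95% band around the empirical type I error; a common choice, assumed here because the caption does not say how the band was computed, is the normal-approximation binomial interval around the nominal level for a given number of simulation replicates. Binary, independent g and e, their frequencies, the replicate count, and all function names are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def case_fraction(b0, log_or_g, log_or_e, log_or_ge, freq_g, freq_e):
    # Expected P(D) for binary g ~ Bernoulli(freq_g) and e ~ Bernoulli(freq_e), independent,
    # under logit P(D) = b0 + log_or_g*g + log_or_e*e + log_or_ge*g*e.
    total = 0.0
    for g in (0, 1):
        for e in (0, 1):
            w = (freq_g if g else 1 - freq_g) * (freq_e if e else 1 - freq_e)
            total += w * sigmoid(b0 + log_or_g * g + log_or_e * e + log_or_ge * g * e)
    return total


def calibrate_intercept(target, log_or_g, log_or_e, log_or_ge, freq_g, freq_e,
                        lo=-20.0, hi=20.0, tol=1e-10):
    # case_fraction is strictly increasing in b0, so bisection converges to the
    # intercept that yields the requested fraction of cases.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if case_fraction(mid, log_or_g, log_or_e, log_or_ge, freq_g, freq_e) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def type1_band(alpha, replicates, z=1.96):
    # Normal-approximation band for an empirical rejection rate whose true value is alpha.
    half = z * math.sqrt(alpha * (1 - alpha) / replicates)
    return alpha - half, alpha + half


b = math.log(1.5)  # main and interaction odds ratios of 1.5, as in the caption
b0 = calibrate_intercept(0.5, b, b, b, freq_g=0.3, freq_e=0.5)  # frequencies are assumed
print(b0, case_fraction(b0, b, b, b, 0.3, 0.5))
print(type1_band(0.05, replicates=1000))  # replicate count is hypothetical
```

An empirical type I error falling outside the band suggests the test is miscalibrated under that scenario rather than merely noisy.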
Figure 3
Empirical type I error rate at significance level 0.05 for LR (dark grey) and AIM (light grey). (a) A few missing factors with large effect size; (b) Surrogate markers with strong marginal effects; (c) Three subtypes.
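One way to see why disease heterogeneity (panel c) can mislead an LR interaction test is a deterministic calculation under assumed conditions: the disease is the occurrence of any of three subtypes, the subtypes arise independently, G raises the risk only of subtype A, and E only of subtype B. No factor acts on the same subtype as the other, yet the population log-odds interaction that a saturated LR model estimates is not zero. The subtype model, the risk values, and the function names are hypothetical and are not the paper's simulation settings.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def p_disease(g, e, risk_a, risk_b, risk_c):
    """P(any subtype | g, e) under an assumed three-subtype model.

    risk_a[g]: probability of subtype A given exposure g (0/1); only g acts on A.
    risk_b[e]: probability of subtype B given e; only e acts on B.
    risk_c:    probability of subtype C, unrelated to g and e.
    Subtypes are assumed to arise independently, so P(healthy) is a product.
    """
    healthy = (1 - risk_a[g]) * (1 - risk_b[e]) * (1 - risk_c)
    return 1 - healthy


def logit_interaction_contrast(risk_a, risk_b, risk_c):
    """Population log-odds interaction that a saturated LR with a g*e term estimates."""
    p = {(g, e): p_disease(g, e, risk_a, risk_b, risk_c) for g in (0, 1) for e in (0, 1)}
    return logit(p[1, 1]) - logit(p[1, 0]) - logit(p[0, 1]) + logit(p[0, 0])


# G and E each act on a different subtype, yet the LR-scale interaction is nonzero.
print(logit_interaction_contrast((0.05, 0.10), (0.05, 0.10), 0.02))
```

Under these assumptions the sign is always negative whenever both factors raise risk. Writing the healthy probabilities as q00 = c, q10 = cx, q01 = cy, q11 = cxy with x, y < 1, the healthy-side terms cancel and the contrast reduces to log[(1 − cxy)(1 − c) / ((1 − cx)(1 − cy))], while (1 − cxy)(1 − c) − (1 − cx)(1 − cy) = −c(1 − x)(1 − y) < 0.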
Figure 4
Re-analysis of the interaction between the ALDH2 gene and alcohol consumption.
