Sci Rep. 2019 Feb 21;9(1):2455.

doi: 10.1038/s41598-019-38983-z.

Asymmetric independence modeling identifies novel gene-environment interactions

Guoqiang Yu¹, David J Miller², Chiung-Ting Wu³, Eric P Hoffman⁴, Chunyu Liu⁵, David M Herrington⁶, Yue Wang³

Affiliations

¹ Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA. yug@vt.edu.
² Department of Electrical Engineering, The Pennsylvania State University, University Park, PA, 16802, USA.
³ Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA.
⁴ School of Pharmacy and Pharmaceutical Sciences, State University of New York, Binghamton, NY, 13902, USA.
⁵ Psychiatry and Behavioral Sciences, Upstate Medical University, Syracuse, NY, 13210, USA.
⁶ Department of Medicine, Wake Forest University, Winston-Salem, NC, 27157, USA.

PMID: 30792419
PMCID: PMC6385186
DOI: 10.1038/s41598-019-38983-z

Guoqiang Yu et al. Sci Rep. 2019.

Abstract

Most genetic and environmental factors work together in determining complex disease risk. Detecting gene-environment interactions may allow us to elucidate novel and targetable molecular mechanisms by which environmental exposures modify genetic effects. Unfortunately, standard logistic regression (LR) assumes a mathematically convenient structure for the null hypothesis that results in both poor detection power and poor control of type I error, and that is also susceptible to confounding from missing factors, imperfect surrogates, and disease heterogeneity. Here we describe a new baseline framework for case-control studies, the asymmetric independence model (AIM), and provide mathematical proofs and simulation studies verifying its validity across a wide range of conditions. We show that, unlike LR, AIM mathematically preserves the asymmetric nature of maintaining health versus acquiring a disease, and is thus more powerful and robust in detecting synergistic interactions. We present examples from four clinically discrete domains where AIM identified interactions that were previously either inconsistent or recognized with less statistical certainty.
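
For orientation, the conventional LR interaction test referred to in the abstract can be written in its textbook form. The notation is an assumption made here for illustration ($D$ for disease status, $G$ and $E$ for the genetic and environmental factors) and is not taken from the paper; the AIM formulation itself is not reproduced.

```latex
\operatorname{logit} P(D = 1 \mid G, E)
  = \beta_0 + \beta_G\, G + \beta_E\, E + \beta_{GE}\, G E,
\qquad H_0 :\ \beta_{GE} = 0 .
```

For binary $G$ and $E$, this null hypothesis says that the joint odds ratio equals the product of the two single-factor odds ratios, $\mathrm{OR}_{11} = \mathrm{OR}_{10}\,\mathrm{OR}_{01}$. This multiplicative-odds baseline is the "convenient mathematical structure" that the abstract contrasts with AIM.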

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Mathematical formulation and illustrative comparison between LR and AIM. (a) Theoretical discrepancy between Logistic Regression (LR) prediction and ground truth probability in the case of missing variables (Appendix B). (b) Theoretical capability of the Asymmetric Independence Model (AIM) to accurately predict the ground truth probability in the case of missing variables. (c) Mathematical expression of LR. (d) Mathematical expression of AIM.

**Figure 2**
Comparative performance assessment of AIM and LR on simulated datasets. The simulation studies evaluate the type I error and detection power of AIM and LR in a controlled setting, under varying parameter settings that characterize the study population, and under the three confounding scenarios highlighted in this paper: missing factors, surrogate factors, and disease subtypes. The goal is to understand how these parameter settings and scenarios affect both models. (a) Empirical type I error (evaluated when the null hypothesis of no interaction holds) at significance level 0.05. The gray region is the 95% confidence interval. (b) Power versus sample size, with interaction effect size (odds ratio) 1.5, case fraction 50%, and main effect size 1.5 for both risk factors. (c) Power versus case-control ratio. The fraction of cases is varied by adjusting the baseline parameter in the LR model possessing an interaction term. The sample size is 2000, the interaction effect size is 1.5, and the main effect size for both risk factors is 1.5. (d) Power versus frequency of risk allele, with sample size 2000, main effect size 1.5 for both risk factors, interaction effect size 1.5, and case fraction 50%. (e) Power to detect an interaction versus correlation between the risk factors for AIM and LR; both methods achieve their greatest detection power when the risk factors are uncorrelated. (f) Power versus main effect size, with sample size 1000, interaction effect size 1.5, and case fraction 50%. (g) Sample size versus p-value threshold, with main effect size 1.5, interaction effect size 1.5, and case fraction 50%. (h) Statistical significance (log p-values) of five ground-truth interactions, as detected by AIM and LR (Appendix D–E).
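
The kind of simulation described in this caption can be sketched for the LR side only. The following is a minimal illustration, not the authors' code: it assumes two independent binary risk factors with frequency 0.3, an intercept of -0.3 chosen to give a case fraction near 50%, and a Wald test on the interaction coefficient. All function names and defaults are hypothetical, and AIM is not implemented because its formula does not appear on this page.

```python
import math
import numpy as np

def simulate(n, or_main=1.5, or_int=1.5, freq=0.3, intercept=-0.3, rng=None):
    """Draw n subjects from a logistic model with an interaction term.
    Binary factors, freq, and intercept are illustrative assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    g = rng.binomial(1, freq, n)
    e = rng.binomial(1, freq, n)
    X = np.column_stack([np.ones(n), g, e, g * e]).astype(float)
    beta = np.array([intercept, math.log(or_main),
                     math.log(or_main), math.log(or_int)])
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    y = rng.binomial(1, p)
    return X, y

def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic fit by Newton-Raphson.
    Returns the coefficients and their estimated covariance matrix."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, cov

def wald_p(beta, cov, j):
    """Two-sided normal p-value for coefficient j."""
    z = beta[j] / math.sqrt(cov[j, j])
    return math.erfc(abs(z) / math.sqrt(2.0))

def rejection_rate(n, or_int, reps=200, alpha=0.05, seed=0):
    """Fraction of simulated datasets in which the interaction is declared
    significant: empirical type I error when or_int == 1, power otherwise."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        X, y = simulate(n, or_int=or_int, rng=rng)
        beta, cov = fit_logistic(X, y)
        hits += int(wald_p(beta, cov, 3) < alpha)
    return hits / reps

if __name__ == "__main__":
    print("empirical type I error:", rejection_rate(2000, or_int=1.0))
    print("power at OR 1.5:       ", rejection_rate(2000, or_int=1.5))
```

Because the data here are generated from the LR model itself, this sketch shows only the mechanics of estimating type I error and power; it does not reproduce the missing-factor, surrogate, or subtype scenarios in which the paper reports LR breaking down.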

**Figure 3**
Empirical type I error rate at significance level 0.05 for LR (dark grey) and AIM (light grey). (a) A few missing factors with large effect size; (b) Surrogate markers with strong marginal effects; (c) Three subtypes.

**Figure 4**
Re-analysis of the interaction between the ALDH2 gene and alcohol consumption.
