Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions
- PMID: 32101638
- DOI: 10.1002/sim.8505
Abstract
The statistical practice of modeling interaction with two linear main effects and a product term is ubiquitous in the statistical and epidemiological literature. Most data modelers are aware that misspecification of the main effects can cause severe type I error inflation in tests for interaction, leading to spurious detection of interactions. However, modeling practice has not changed. In this article, we focus on the specific situation where the main effects in the model are misspecified as linear terms and characterize its impact on common tests for statistical interaction. We then propose some simple alternatives that fix the potential type I error inflation in testing interaction due to main effect misspecification. We show that when the sandwich variance estimator is used for a linear regression model with a quantitative outcome and two independent factors, both the Wald and score tests asymptotically maintain the correct type I error rate. However, if the independence assumption does not hold or the outcome is binary, using the sandwich estimator does not fix the problem. We further demonstrate that flexibly modeling the main effects under a generalized additive model can largely reduce, and often remove, bias in the estimates and maintain the correct type I error rate for both quantitative and binary outcomes, regardless of the independence assumption. We show that, under the independence assumption and for a continuous outcome, overfitting and flexibly modeling the main effects do not lead to an asymptotic loss of power relative to a correctly specified main effect model. Our simulation study further demonstrates empirically that using flexible models for the main effects does not, in general, result in a significant loss of power for testing interaction. Our results provide an improved understanding of the strengths and limitations of tests of interaction in the presence of main effect misspecification.
Using data from a large biobank study "The Michigan Genomics Initiative", we present two examples of interaction analysis in support of our results.
Keywords: gene-environment interaction; generalized additive model (GAM); independence; joint tests; power; robust tests; sandwich variance estimator; type I error.
© 2020 John Wiley & Sons, Ltd.
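To illustrate the abstract's central claims for a quantitative outcome with two independent factors, the following NumPy sketch simulates data under the null of no G×E interaction but with a nonlinear main effect of E, and compares three Wald tests for the product term: linear main effects with the model-based variance, linear main effects with the HC0 sandwich variance, and a flexible main effect for E with the model-based variance. The data-generating model (G ~ Binomial(2, 0.3), E ~ N(0, 1), Y = E² + N(0, 1) noise), the sample sizes, and the helper names `wald_z_last`, `spline_basis`, and `simulate` are hypothetical choices for this sketch and are not taken from the article. The flexible fit uses an unpenalized cubic regression spline as a simple stand-in for the penalized generalized additive model described above. Because E² lies in the spline's span, this is a favorable case for the flexible model.

```python
import numpy as np


def wald_z_last(X, y, robust=False):
    """OLS fit; Wald z-statistic for the last column of X (the G*E product term).

    robust=False: model-based variance  s^2 (X'X)^{-1}
    robust=True:  HC0 sandwich variance (X'X)^{-1} X' diag(r_i^2) X (X'X)^{-1}
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    if robust:
        # "Meat" of the sandwich: sum_i r_i^2 x_i x_i'
        meat = (X * resid[:, None] ** 2).T @ X
        cov = XtX_inv @ meat @ XtX_inv
    else:
        cov = (resid @ resid / (n - p)) * XtX_inv
    return beta[-1] / np.sqrt(cov[-1, -1])


def spline_basis(e, n_knots=4):
    """Hypothetical helper: unpenalized cubic regression spline in E
    (truncated power basis, knots at interior quantiles of E)."""
    knots = np.quantile(e, np.linspace(0, 1, n_knots + 2)[1:-1])
    cols = [e, e ** 2, e ** 3] + [np.clip(e - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def simulate(n=2000, n_sims=1000, seed=2020, crit=1.959964):
    """Empirical type I error of the G*E Wald test at nominal level 0.05.

    Assumed data-generating model (hypothetical): G ~ Binomial(2, 0.3),
    E ~ N(0, 1), G independent of E, Y = E^2 + N(0, 1) noise.
    The true E main effect is nonlinear and there is no G*E interaction.
    """
    rng = np.random.default_rng(seed)
    rejections = {"linear_model_based": 0, "linear_sandwich": 0, "spline_model_based": 0}
    for _ in range(n_sims):
        g = rng.binomial(2, 0.3, size=n).astype(float)
        e = rng.standard_normal(n)
        y = e ** 2 + rng.standard_normal(n)
        ones = np.ones(n)
        # Misspecified linear main effects: 1, G, E, G*E
        X_lin = np.column_stack([ones, g, e, g * e])
        # Flexible main effect for E: 1, G, spline(E), G*E
        X_spl = np.column_stack([ones, g, spline_basis(e), g * e])
        rejections["linear_model_based"] += int(abs(wald_z_last(X_lin, y)) > crit)
        rejections["linear_sandwich"] += int(abs(wald_z_last(X_lin, y, robust=True)) > crit)
        rejections["spline_model_based"] += int(abs(wald_z_last(X_spl, y)) > crit)
    return {k: v / n_sims for k, v in rejections.items()}


if __name__ == "__main__":
    for name, rate in simulate().items():
        print(f"{name:>20s}: {rate:.3f}")
```

Run as a script, it prints the three empirical rejection rates. Under these assumptions one would expect the first rate to sit well above the nominal 0.05 and the other two to be close to it (the HC0 sandwich test may run slightly above nominal in finite samples). The sketch does not cover the binary-outcome or dependent-factor settings, where the abstract reports that the sandwich estimator alone does not fix the problem.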
Similar articles
- Testing for gene-environment interaction under exposure misspecification. Biometrics. 2018 Jun;74(2):653-662. doi: 10.1111/biom.12813. Epub 2017 Nov 9. PMID: 29120492.
- Model-implied simulation-based power estimation for correctly specified and distributionally misspecified models: Applications to nonlinear and linear structural equation models. Behav Res Methods. 2024 Dec;56(8):8955-8991. doi: 10.3758/s13428-024-02507-z. Epub 2024 Oct 1. PMID: 39354129.
- Robust Tests for Additive Gene-Environment Interaction in Case-Control Studies Using Gene-Environment Independence. Am J Epidemiol. 2018 Feb 1;187(2):366-377. doi: 10.1093/aje/kwx243. PMID: 28633381.
- Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives and false-negatives. Health Technol Assess. 2001;5(33):1-56. doi: 10.3310/hta5330. PMID: 11701102. Review.
- Generalized estimating equations in cluster randomized trials with a small number of clusters: Review of practice and simulation study. Clin Trials. 2016 Aug;13(4):445-9. doi: 10.1177/1740774516643498. Epub 2016 Apr 19. PMID: 27094487. Review.
Cited by
- Identification of Pancreatic Cancer Germline Risk Variants With Effects That Are Modified by Smoking. JCO Precis Oncol. 2024 Mar;8:e2300355. doi: 10.1200/PO.23.00355. PMID: 38564682.
- A robust and adaptive framework for interaction testing in quantitative traits between multiple genetic loci and exposure variables. PLoS Genet. 2022 Nov 16;18(11):e1010464. doi: 10.1371/journal.pgen.1010464. eCollection 2022 Nov. PMID: 36383614.
- GEM: scalable and flexible gene-environment interaction analysis in millions of samples. Bioinformatics. 2021 Oct 25;37(20):3514-3520. doi: 10.1093/bioinformatics/btab223. PMID: 34695175.
- A hierarchical integrative group least absolute shrinkage and selection operator for analyzing environmental mixtures. Environmetrics. 2021 Dec;32(8):e2698. doi: 10.1002/env.2698. Epub 2021 Jul 30. PMID: 34899005.
- Adjuvant nivolumab in muscle-invasive urothelial carcinoma: exploratory biomarker analysis of the randomized phase 3 CheckMate 274 trial. Nat Med. 2025 Aug 7. doi: 10.1038/s41591-025-03802-8. Online ahead of print. PMID: 40775055.