Accommodating missingness in environmental measurements in gene-environment interaction analysis
- PMID: 28657194
- PMCID: PMC5561007
- DOI: 10.1002/gepi.22055
Abstract
For the prognosis of complex diseases, beyond the main effects of genetic (G) and environmental (E) factors, gene-environment (G-E) interactions also play an important role. Many approaches have been developed for detecting important G-E interactions, most of which assume that measurements are complete. In practical data analysis, missingness in E measurements is not uncommon, and failing to properly accommodate such missingness leads to biased estimation and false marker identification. In this study, we conduct G-E interaction analysis with prognosis data under an accelerated failure time (AFT) model. To accommodate missingness in E measurements, we adopt a nonparametric kernel-based data augmentation approach. With a well-designed weighting scheme, a nice "byproduct" is that the proposed approach enjoys a certain robustness property. A penalization approach, which respects the "main effects, interactions" hierarchy, is adopted for selection (of important interactions and main effects) and regularized estimation. The proposed approach has sound interpretations and a solid statistical basis. It outperforms multiple alternatives in simulation. The analysis of TCGA data on lung cancer and melanoma leads to interesting findings and models with superior prediction.
Keywords: G-E interaction; data augmentation; missing data; penalized estimation; prognosis.
© 2017 WILEY PERIODICALS, INC.
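The abstract does not spell out how the kernel-based data augmentation works. As a rough illustration of the general idea only, the sketch below replaces each subject with a missing E measurement by a set of weighted pseudo-records that borrow observed E values from other subjects, with weights that decay with distance in the fully observed covariates. The function name `kernel_augment`, the covariate matrix `z`, the Gaussian kernel, and the `bandwidth` parameter are all assumptions made for illustration; this is not the authors' weighting scheme.

```python
import numpy as np


def kernel_augment(z, e, bandwidth=1.0):
    """Hypothetical sketch of kernel-weighted data augmentation.

    z : (n, p) array of fully observed covariates.
    e : (n,) array of one E measurement, with np.nan where missing.
    Returns a list of (subject index, E value, weight) records.
    """
    z = np.asarray(z, dtype=float)
    e = np.asarray(e, dtype=float)
    observed = ~np.isnan(e)
    donors = np.flatnonzero(observed)  # subjects whose E value is observed

    records = []
    for i in range(len(e)):
        if observed[i]:
            # Complete subjects keep their own value with full weight.
            records.append((i, e[i], 1.0))
            continue
        # Squared Euclidean distance from subject i to every donor.
        d2 = np.sum((z[donors] - z[i]) ** 2, axis=1)
        # Gaussian kernel (an assumption); shifting by the minimum distance
        # avoids underflow and leaves the normalized weights unchanged.
        k = np.exp(-0.5 * (d2 - d2.min()) / bandwidth ** 2)
        w = k / k.sum()
        # One pseudo-record per donor, weights summing to 1 for subject i.
        records.extend((i, e[j], w_j) for j, w_j in zip(donors, w))
    return records
```

The weighted records could then be passed to any weighted regression fit, so a subject with a missing E value contributes a total weight of one spread across plausible values rather than being dropped.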
Similar articles
- Robust semiparametric gene-environment interaction analysis using sparse boosting. Stat Med. 2019 Oct 15;38(23):4625-4641. doi: 10.1002/sim.8322. Epub 2019 Jul 29. PMID: 31359454. Free PMC article.
- Robust identification of gene-environment interactions for prognosis using a quantile partial correlation approach. Genomics. 2019 Sep;111(5):1115-1123. doi: 10.1016/j.ygeno.2018.07.006. Epub 2018 Jul 17. PMID: 30009922. Free PMC article.
- Identifying gene-gene interactions using penalized tensor regression. Stat Med. 2018 Feb 20;37(4):598-610. doi: 10.1002/sim.7523. Epub 2017 Oct 16. PMID: 29034516. Free PMC article.
- Hierarchical selection of genetic and gene by environment interaction effects in high-dimensional mixed models. Stat Methods Med Res. 2025 Jan;34(1):180-198. doi: 10.1177/09622802241293768. Epub 2024 Dec 10. PMID: 39659138. Free PMC article. Review.
- Robust genetic interaction analysis. Brief Bioinform. 2019 Mar 25;20(2):624-637. doi: 10.1093/bib/bby033. PMID: 29897421. Free PMC article. Review.
Cited by
- High-Dimensional Gene-Environment Interaction Analysis. Annu Rev Stat Appl. 2025 Mar;12:10.1146/annurev-statistics-112723-034315. doi: 10.1146/annurev-statistics-112723-034315. Epub 2024 Sep 11. PMID: 40881670.
- Aligned deep neural network for integrative analysis with high-dimensional input. J Biomed Inform. 2023 Aug;144:104434. doi: 10.1016/j.jbi.2023.104434. Epub 2023 Jun 28. PMID: 37391115. Free PMC article.
- Molecular Biomarker Identification Using a Network-Based Bioinformatics Approach That Links COVID-19 With Smoking. Bioinform Biol Insights. 2023 Jul 14;17:11779322231186481. doi: 10.1177/11779322231186481. eCollection 2023. PMID: 37461741. Free PMC article.
- A Selective Review of Multi-Level Omics Data Integration Using Variable Selection. High Throughput. 2019 Jan 18;8(1):4. doi: 10.3390/ht8010004. PMID: 30669303. Free PMC article. Review.
- Identifying gene-environment interactions incorporating prior information. Stat Med. 2019 Apr 30;38(9):1620-1633. doi: 10.1002/sim.8064. Epub 2019 Jan 13. PMID: 30637789. Free PMC article.