Genet Epidemiol. 2017 Sep;41(6):523-554.
doi: 10.1002/gepi.22055. Epub 2017 Jun 28.

Accommodating missingness in environmental measurements in gene-environment interaction analysis

Mengyun Wu et al. Genet Epidemiol. 2017 Sep.

Abstract

For the prognosis of complex diseases, beyond the main effects of genetic (G) and environmental (E) factors, gene-environment (G-E) interactions also play an important role. Many approaches have been developed for detecting important G-E interactions, most of which assume that measurements are complete. In practical data analysis, missingness in E measurements is not uncommon, and failing to properly accommodate such missingness leads to biased estimation and false marker identification. In this study, we conduct G-E interaction analysis with prognosis data under an accelerated failure time (AFT) model. To accommodate missingness in E measurements, we adopt a nonparametric kernel-based data augmentation approach. With a well-designed weighting scheme, a nice "byproduct" is that the proposed approach enjoys a certain robustness property. A penalization approach, which respects the "main effects, interactions" hierarchy, is adopted for selection (of important interactions and main effects) and regularized estimation. The proposed approach has sound interpretations and a solid statistical basis. It outperforms multiple alternatives in simulation. The analysis of TCGA data on lung cancer and melanoma leads to interesting findings and models with superior prediction.
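
To make the data augmentation idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single E variable, a Gaussian kernel on Euclidean distance between fully observed covariates, and a fixed bandwidth; the function name `augment_missing_e` and the toy data are hypothetical. A subject with a missing E measurement is expanded into several weighted pseudo-observations, one per subject with an observed E, with larger weights for donors whose observed covariates are closer.

```python
import numpy as np


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; normalization cancels in the weights."""
    return np.exp(-0.5 * u ** 2)


def augment_missing_e(g, e, bandwidth=1.0):
    """Kernel-weighted augmentation of a partially missing E variable.

    g: (n, p) array of fully observed covariates (an assumption of this sketch).
    e: (n,) array with np.nan marking missing E measurements.
    Returns a list of (subject_index, e_value, weight) rows.
    """
    observed = ~np.isnan(e)
    donors = np.flatnonzero(observed)
    rows = []
    for i in range(len(e)):
        if observed[i]:
            # Complete subjects keep their own measurement with weight 1.
            rows.append((i, float(e[i]), 1.0))
            continue
        # Distance from subject i to every donor in the observed covariates.
        dist = np.linalg.norm(g[donors] - g[i], axis=1)
        k = gaussian_kernel(dist / bandwidth)
        w = k / k.sum()  # weights for subject i sum to 1
        rows.extend((i, float(e[j]), float(w_j)) for j, w_j in zip(donors, w))
    return rows


# Toy data (hypothetical): subject 3 has a missing E measurement.
g = np.array([[0.0], [0.1], [2.0], [0.05]])
e = np.array([1.0, 1.2, 5.0, np.nan])
rows = augment_missing_e(g, e, bandwidth=0.5)
```

The weighted rows could then be passed to a weighted, penalized AFT fit. With high-dimensional G, a kernel on the raw covariates as used here would likely be too crude, so this shows only the weighting mechanics.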

Keywords: G-E interaction; data augmentation; missing data; penalized estimation; prognosis.

Figures

Figure A.1
Data analysis: the modified RV-coefficients between different approaches. Left: LUAD; Right: SKCM.
Figure A.2
Analysis of one simulation replicate based on the LUAD (left) and SKCM (right) data: the modified RV-coefficients between different approaches.
Figure 1
A small example: data analyzed under different approaches. Grey squares represent missing measurements and their augmented/imputed values. The numbers highlighted in red deviate from the majority of the observations (“outliers”).
