Comparative Study
Environ Health Perspect. 2016 Dec;124(12):1848-1856.
doi: 10.1289/EHP172. Epub 2016 May 24.

A Systematic Comparison of Linear Regression-Based Statistical Methods to Assess Exposome-Health Associations

Lydiane Agier et al. Environ Health Perspect. 2016 Dec.

Abstract

Background: The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures.

Objectives: We compared the performances of linear regression-based statistical methods in assessing exposome-health associations.

Methods: In a simulation study, we generated 237 exposure covariates with a realistic correlation structure and with a health outcome linearly related to 0 to 25 of these covariates. Statistical methods were compared primarily in terms of false discovery proportion (FDP) and sensitivity.
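The two primary criteria can be illustrated with a minimal sketch. It assumes the usual definitions: sensitivity is the share of true predictors that a method selects, and FDP is the share of selected covariates that are not true predictors. The function name, the covariate indices, and the conventions for empty sets are illustrative assumptions, not taken from the paper.

```python
def sensitivity_and_fdp(selected, true_predictors):
    """Return (sensitivity, FDP) for one simulation run.

    selected        -- indices of covariates flagged by a method
    true_predictors -- indices of covariates truly related to the outcome
    """
    selected = set(selected)
    true_predictors = set(true_predictors)

    true_pos = len(selected & true_predictors)
    false_pos = len(selected - true_predictors)

    # Assumption: sensitivity is undefined (None) when there is no true
    # predictor, as in a scenario with 0 true predictors.
    sensitivity = true_pos / len(true_predictors) if true_predictors else None

    # Assumption: FDP is taken to be 0 when nothing is selected.
    fdp = false_pos / len(selected) if selected else 0.0

    return sensitivity, fdp


# Hypothetical run: 4 true predictors among 237 covariates, 5 selected.
true_set = {3, 17, 42, 101}
selected_set = {3, 17, 42, 58, 200}
sens, fdp = sensitivity_and_fdp(selected_set, true_set)
print(sens, fdp)
```

In this made-up run, three of the four true predictors are recovered and two of the five selections are false, which is the kind of trade-off the Results summarize as averages over runs.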

Results: On average over all simulation settings, the elastic net and sparse partial least-squares regression showed a sensitivity of 76% and an FDP of 44%; Graphical Unit Evolutionary Stochastic Search (GUESS) and the deletion/substitution/addition (DSA) algorithm revealed a sensitivity of 81% and an FDP of 34%. The environment-wide association study (EWAS) underperformed these methods in terms of FDP (average FDP, 86%) despite a higher sensitivity. Performances decreased considerably when assuming an exposome exposure matrix with high levels of correlation between covariates.

Conclusions: Correlation between exposures is a challenge for exposome research, and the statistical methods investigated in this study were limited in their ability to efficiently differentiate true predictors from correlated covariates in a realistic exposome context. Although GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should also be considered when choosing between these methods.

Citation: Agier L, Portengen L, Chadeau-Hyam M, Basagaña X, Giorgis-Allemand L, Siroux V, Robinson O, Vlaanderen J, González JR, Nieuwenhuijsen MJ, Vineis P, Vrijheid M, Slama R, Vermeulen R. 2016. A systematic comparison of linear regression-based statistical methods to assess exposome-health associations. Environ Health Perspect 124:1848-1856; http://dx.doi.org/10.1289/EHP172.

Conflict of interest statement

The authors declare they have no actual or potential competing financial interests.

Figures

Figure 1
Performances of the statistical methods for scenario set 1. Model performances are summarized by their sensitivity (A), alternative sensitivity (AltSens, see "Methods") (B), false discovery proportion (FDP) (C), alternative FDP (AltFDP, see "Methods") (D), specificity (E) and mean absolute bias (F). For each scenario defined by a number of true predictors varying from 0 to 25, statistics over the 100 runs are summarized by their mean (dot), and the variability of each statistic is summarized by 1 standard error in both directions from the average value (vertical dotted line). DSA, deletion/substitution/addition; ENET, elastic net; EWAS, environment-wide association study; EWAS-MLR, EWAS-multiple linear regression; GUESS, Graphical Unit Evolutionary Stochastic Search; sPLS, sparse partial least squares.
Figure 2
Sensitivity and false discovery proportion (FDP) for scenario set 1. For each scenario defined by a number of true predictors varying from 0 to 25, for each statistical method, sensitivity and FDP over 100 runs are summarized by their mean values. DSA, Deletion/substitution/addition; ENET, elastic net; EWAS, environment-wide association study; EWAS-MLR, EWAS-multiple linear regression; GUESS, Graphical Unit Evolutionary Stochastic Search; sPLS, sparse partial least-squares.
Figure 3
Performances of the statistical methods according to the amount of correlation between the exposures. Model performances are summarized by their sensitivity (A), alternative sensitivity (AltSens, see "Methods") (B), false discovery proportion (FDP) (C), alternative FDP (AltFDP, see "Methods") (D), specificity (E) and mean absolute bias (F). The solid line connects results for exposures generated from a multivariate normal distribution with covariance matrix Σ (scenario set 1); the dashed line connects results obtained with covariance matrix Σ− (correlations divided by 2 compared with Σ, scenario set 4), and the dotted line connects results obtained with covariance matrix Σ+ (correlations multiplied by 2 compared with Σ and upper bounded by 1, scenario set 5). For each scenario defined by a number of true predictors varying from 0 to 25, statistics over the 100 runs are summarized by their mean (dot). DSA, deletion/substitution/addition; ENET, elastic net; EWAS, environment-wide association study; EWAS-MLR, EWAS-multiple linear regression; GUESS, Graphical Unit Evolutionary Stochastic Search; sPLS, sparse partial least squares.
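The halved and doubled correlation structures described in this caption can be sketched as follows. This is only an illustration under stated assumptions: the function name and the 3 x 3 example matrix are hypothetical, negative correlations are also bounded at -1 (the caption mentions only the upper bound), and the sketch does not check that the rescaled matrix remains positive semi-definite, which a real simulation would need to ensure before sampling from it.

```python
def scale_correlations(corr, factor):
    """Multiply off-diagonal correlations by `factor`, bounded to [-1, 1].

    corr   -- square correlation matrix as a list of lists
    factor -- 0.5 for the halved setting, 2 for the doubled setting
    """
    n = len(corr)
    scaled = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                # The diagonal of a correlation matrix stays at 1.
                row.append(1.0)
            else:
                # Assumption: clip at both -1 and 1.
                row.append(max(-1.0, min(1.0, corr[i][j] * factor)))
        scaled.append(row)
    return scaled


# Hypothetical correlation matrix for three exposures.
sigma = [
    [1.0, 0.6, -0.2],
    [0.6, 1.0, 0.3],
    [-0.2, 0.3, 1.0],
]
sigma_halved = scale_correlations(sigma, 0.5)
sigma_doubled = scale_correlations(sigma, 2)
print(sigma_halved)
print(sigma_doubled)
```

In the doubled setting, the 0.6 correlation would become 1.2 and is therefore capped at 1, which mirrors the "upper bounded by 1" rule in the caption.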
