Environ Health Perspect. 2020 May;128(5):55001.
doi: 10.1289/EHP6240. Epub 2020 May 6.

Challenges Raised by Mediation Analysis in a High-Dimension Setting

Michaël G B Blum et al. Environ Health Perspect. 2020 May.

Abstract

Background: Mediation analysis is used in epidemiology to identify pathways through which exposures influence health. The advent of high-throughput (omics) technologies gives opportunities to perform mediation analysis with a high-dimension pool of covariates.

Objective: We aimed to highlight some biostatistical issues of this expanding field of high-dimension mediation.

Discussion: The mediation techniques used for a single mediator cannot be generalized in a straightforward manner to high-dimension mediation. Causal knowledge on the relation between covariates is required for mediation analysis, and it is expected to be more limited as dimension and system complexity increase. The methods developed in high dimension can be distinguished according to whether mediators are considered separately or as a whole. Methods considering each potential mediator separately do not allow efficient identification of the indirect effects when mutual influences exist among the mediators, which is expected for many biological (e.g., epigenetic) parameters. In this context, methods considering all potential mediators simultaneously, based, for example, on data reduction techniques, are better suited to the causal inference framework. Their cost is a possible lack of ability to single out the causal mediators. Moreover, the ability of the mediators to predict the outcome can be overestimated, in particular because many machine-learning algorithms are optimized to increase predictive ability rather than their aptitude for causal inference. Given the lack of an overarching validated framework and the generally complex causal structure of high-dimension data, analysis of high-dimension mediation currently requires great caution and effort to incorporate a priori biological knowledge. https://doi.org/10.1289/EHP6240.


Figures

Figure 1 is a schematic displaying E connected to M, which is connected to Y, and E directly connected to Y. C subscript 1 is connected to E and Y, C subscript 2 is connected to E and M, and C subscript 3 is connected to M and Y. All are connected with arrows.
Figure 1.
Example of the effect of a single exposure E whose effect on outcome Y is mediated by a single mediator M. Exposure–outcome (C1), exposure–mediator (C2) and mediator–outcome confounders (C3) need to be controlled for. Adapted from VanderWeele (2015).
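As a minimal sketch of the single-mediator setting in Figure 1, the following Python simulation estimates the indirect effect with the product-of-coefficients approach. It assumes linear models and no unmeasured confounding (the confounders C1 to C3 are omitted for brevity). The effect sizes, sample size, and function names are hypothetical and do not come from the article.

```python
import random


def center(xs):
    """Subtract the mean so that intercepts can be ignored."""
    mu = sum(xs) / len(xs)
    return [x - mu for x in xs]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mediation_paths(e, m, y):
    """Least-squares estimates for M ~ E and Y ~ E + M.

    Returns (a, b, direct): a is the slope of M on E, b is the slope
    of Y on M adjusted for E, and direct is the slope of Y on E
    adjusted for M.
    """
    e, m, y = center(e), center(m), center(y)
    see, smm, sem = dot(e, e), dot(m, m), dot(e, m)
    sey, smy = dot(e, y), dot(m, y)
    a = sem / see
    # Solve the 2x2 normal equations of Y ~ E + M in closed form.
    det = see * smm - sem ** 2
    direct = (smm * sey - sem * smy) / det
    b = (see * smy - sem * sey) / det
    return a, b, direct


# Hypothetical data-generating model: E -> M -> Y plus a direct E -> Y path.
rng = random.Random(0)
n = 5000
true_a, true_b, true_direct = 0.5, 0.8, 0.3
e = [rng.gauss(0, 1) for _ in range(n)]
m = [true_a * ei + rng.gauss(0, 1) for ei in e]
y = [true_direct * ei + true_b * mi + rng.gauss(0, 1) for ei, mi in zip(e, m)]

a_hat, b_hat, direct_hat = mediation_paths(e, m, y)
indirect_hat = a_hat * b_hat  # product-of-coefficients indirect effect
print(f"indirect: {indirect_hat:.3f} (simulated truth {true_a * true_b:.3f})")
print(f"direct:   {direct_hat:.3f} (simulated truth {true_direct:.3f})")
```

With real data, the confounders would enter both regressions as additional covariates; this sketch leaves them out only to keep the algebra visible.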
Figure 2 is a schematic displaying E connected to M subscript 2 and M subscript 1, which are connected to Y. E and Y and M subscript 1 and M subscript 2 are directly connected with each other. C subscript 1 is connected to E and Y, C subscript 2 is connected to E and M subscript 2, and C subscript 3 is connected to M subscript 2 and Y. All are connected with arrows.
Figure 2.
Example of mediation with two mediators M1 and M2 influencing each other.
Figure 3 is a schematic displaying E connected to M subscript 1 to M subscript i and M subscript j to M subscript p, which are connected to Y. E and Y and M subscript j and M subscript i are directly connected with each other. C subscript 1 is connected to E and Y, C subscript 2 is connected to E and M subscript j to M subscript p, and C subscript 3 is connected to M subscript j to M subscript p and Y. All are connected with arrows.
Figure 3.
High-dimension mediation. Hypothesized relation between an exposure E; a health outcome Y; an exposure–outcome confounder C1; a high-dimension mediator $M = (M_i)_{i \le p}$; an exposure–mediator confounder C2; and a mediator–outcome confounder C3. Causal influences also exist among the candidate mediators (here, Mj influences Mi). p is typically much larger than the number of observations n in the data set.
Figure 4 is a bar graph plotting Counts ranging from 0 to 150 in increments of 50 (y-axis) across p-values ranging from 0.00 to 1.00 in increments of 0.25 (x-axis) for raw p-values and p-values after adjustment using fdrtool.
Figure 4.
Raw distribution of the p-values of the Sobel mediation test for 5,000 simulated variables that are putative mediators (in red, not uniform) and corrected distribution (blue) after using the fdrtool package (R version 3.6.1; R Development Core Team). After correction, the distribution is closer to that expected under the simulated causal model, which assumes the presence of mediators, so that one observes a mixture of a uniform distribution and a distribution with an excess of small p-values. The distribution of the raw p-values should be uniform except for an excess of small p-values corresponding to true mediators. The fact that the (red) distribution is not uniform may indicate several deviations from the null model such as confounding factors or poor standardization of the test statistic. The red histogram indicates that the Sobel test is too conservative (MacKinnon et al. 1995). Here we use the R package fdrtool that implements an empirical null distribution approach to transform initial p-values to uniformly distributed p-values and that provides control of the false discovery rate (Strimmer 2008). To perform simulations, we consider the mediation model of Equation 4, where there are 500 random mediators influenced by the environment that affect the simulated outcome according to Equation 4. We considered 4,500 additional putative mediators distributed according to a multivariate distribution that did not depend on environment and outcome (see code on GitHub https://github.com/mblumuga/opinion_mediation/blob/master/Simus_Sobel_FDR.R).
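The conservativeness of the Sobel test described in the caption can be illustrated with a small simulation. The sketch below is not the authors' R script and does not reproduce Equation 4 or the fdrtool correction. It assumes a hypothetical linear model with 20 true mediators among 200 candidates, tests each candidate separately, and compares how often null and true mediators reach p < 0.05. All sizes and effect values are illustrative assumptions.

```python
import math
import random


def center(xs):
    mu = sum(xs) / len(xs)
    return [x - mu for x in xs]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sobel_p_value(e, m, y):
    """Two-sided Sobel test p-value for one candidate mediator."""
    n = len(e)
    e, m, y = center(e), center(m), center(y)
    see, smm, syy = dot(e, e), dot(m, m), dot(y, y)
    sem, sey, smy = dot(e, m), dot(e, y), dot(m, y)
    # Path a: regression M ~ E.
    a = sem / see
    var_a = (smm - a * sem) / (n - 2) / see
    # Path b: coefficient of M in the regression Y ~ E + M.
    det = see * smm - sem ** 2
    beta_e = (smm * sey - sem * smy) / det
    b = (see * smy - sem * sey) / det
    rss = syy - beta_e * sey - b * smy
    var_b = rss / (n - 3) * see / det
    # Sobel statistic: a*b divided by its first-order standard error.
    z = a * b / math.sqrt(a * a * var_b + b * b * var_a)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical simulation settings (much smaller than the article's 5,000).
rng = random.Random(1)
n, p, p_true = 400, 200, 20
e = [rng.gauss(0, 1) for _ in range(n)]
mediators = []
for j in range(p):
    a_j = 0.5 if j < p_true else 0.0  # only the first p_true depend on E
    mediators.append([a_j * ei + rng.gauss(0, 1) for ei in e])
y = [sum(0.3 * mediators[j][i] for j in range(p_true)) + rng.gauss(0, 1)
     for i in range(n)]

p_values = [sobel_p_value(e, m_j, y) for m_j in mediators]
true_rate = sum(pv < 0.05 for pv in p_values[:p_true]) / p_true
null_rate = sum(pv < 0.05 for pv in p_values[p_true:]) / (p - p_true)
print(f"share with p < 0.05 among true mediators: {true_rate:.2f}")
print(f"share with p < 0.05 among null mediators: {null_rate:.3f}")
```

For a null candidate with both paths equal to zero, the Sobel statistic is bounded in absolute value by the smaller of the two path t-statistics, so the null rejection rate falls well below the nominal 5%. That deficit of small null p-values is the non-uniformity the empirical-null correction is meant to address.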

