Bioinformatics. 2023 Jan 1;39(1):btad023.
doi: 10.1093/bioinformatics/btad023.

A unified mediation analysis framework for integrative cancer proteogenomics with clinical outcomes

Licai Huang et al. Bioinformatics. 2023.

Abstract

Motivation: Multilevel molecular profiling of tumors and the integrative analysis with clinical outcomes have enabled a deeper characterization of cancer treatment. Mediation analysis has emerged as a promising statistical tool to identify and quantify the intermediate mechanisms by which a gene affects an outcome. However, existing methods lack a unified approach to handle various types of outcome variables, making them unsuitable for high-throughput molecular profiling data with highly interconnected variables.

Results: We develop a general mediation analysis framework for proteogenomic data that handles multiple exposures and multivariate mediators, with effects reported on scales appropriate for continuous, binary and survival outcomes. Our estimation method avoids imposing constraints on model parameters such as the rare disease assumption, while accommodating multiple exposures and high-dimensional mediators. We compare our approach to other methods in extensive simulation studies across a range of sample sizes, disease prevalences and numbers of false mediators. Using kidney renal clear cell carcinoma proteogenomic data, we identify genes whose effects are mediated by proteins and the underlying mechanisms for various survival outcomes that capture short- and long-term disease-specific clinical characteristics.

Availability and implementation: Software is made available in an R package (https://github.com/longjp/mediateR).

Supplementary information: Supplementary data are available at Bioinformatics online.
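
As a rough illustration of the mediation setup described in the abstract, the following Python sketch decomposes the effect of one exposure on a continuous outcome into a direct effect and an indirect effect through several mediators, using ordinary least squares and the product-of-coefficients rule. It is not the authors' implementation (their software is the R package linked above) and it does not cover binary or survival outcomes, multiple exposures, or ridge penalties. The function names `fit_ols` and `linear_mediation`, the simulated data, and all coefficient values are assumptions made for this example only.

```python
import numpy as np


def fit_ols(design, response):
    """Least-squares coefficients for response ~ design (design holds the intercept column)."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


def linear_mediation(x, m, y, c=None):
    """Direct, indirect and total effect of x on y in a linear mediation model.

    x: exposure, shape (n,); m: mediators, shape (n, r); y: outcome, shape (n,);
    c: optional confounders, shape (n, q). Hypothetical helper, linear models only.
    """
    n = x.shape[0]
    parts = [np.ones((n, 1)), x.reshape(n, 1)]
    if c is not None:
        parts.append(c.reshape(n, -1))
    base = np.hstack(parts)          # columns: intercept, exposure, confounders
    m = m.reshape(n, -1)

    # Mediator models M_j ~ 1 + X + C: row 1 holds the exposure coefficient of each mediator.
    alpha = fit_ols(base, m)[1]

    # Outcome model Y ~ 1 + X + C + M.
    beta = fit_ols(np.hstack([base, m]), y)
    direct = float(beta[1])
    beta_m = beta[base.shape[1]:]    # coefficients of the mediators

    indirect = float(alpha @ beta_m)          # sum over mediators of alpha_j * beta_j
    total = float(fit_ols(base, y)[1])        # outcome model without the mediators
    return {"direct": direct, "indirect": indirect, "total": total}


# Simulated data with made-up coefficients: true direct effect 1.0,
# true indirect effect 0.8 * 0.6 + (-0.5) * 0.4 = 0.28.
rng = np.random.default_rng(0)
n = 5000
c = rng.normal(size=n)
x = 0.5 * c + rng.normal(size=n)
m1 = 0.8 * x + 0.3 * c + rng.normal(size=n)
m2 = -0.5 * x + rng.normal(size=n)
y = 1.0 * x + 0.6 * m1 + 0.4 * m2 + 0.2 * c + rng.normal(size=n)

effects = linear_mediation(x, np.column_stack([m1, m2]), y, c)
print(effects)
```

In this linear special case the estimated total effect equals the sum of the estimated direct and indirect effects. That identity does not carry over to logistic or survival models in general, which is part of what motivates the numeric approach the abstract describes.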


Figures

Fig. 1.
Illustration of the directed acyclic graph for mediation analysis, where four disjoint sets of variables (nodes), covariates ($C$), exposures ($X$), mediators ($M$) and response ($Y$), have a unique order, $C < X < M < Y$. The goal is to assess the causal impact of changing any single exposure $X \in \{X_1, \ldots, X_p\}$ on an outcome $Y$ and to quantify how much of this effect is mediated by the set of mediators $M = \{M_1, \ldots, M_r\}$. The variables $C = \{C_1, \ldots, C_q\}$ represent potential confounders. Our model assumes that the causal agents may be linked by unobserved factors ($H$) and permits mediators to have internal causal or correlation structure
Fig. 2.
Comparison of methods for computing the (a) direct effect and (b) indirect effect with logistic models. Numeric approximation has lower bias than the rare disease approximation and the probit approximation
Fig. 3.
Power in simulations with strong mediators and with weak mediators, using ridge penalties and 100 mediators. The true indirect effect is −695 for the strong mediators and −429 for the weak mediators
Fig. 4.
(a) Sankey diagram illustrating the indirect and direct effects (in days) of mRNA expression on three clinical survival outcomes as mediated by protein expression (grouped into pathways). Nodes on the left are mRNAs (color coded by pathway), cyan nodes in the middle are proteins (grouped into protein pathways), and nodes on the right are the three survival endpoints. Edges are color coded by mediation analysis, with edge widths proportional to the absolute value of the estimated coefficients in regression without ridge penalties. Significant total, direct or indirect effects with ridge penalties are marked with a star in the color of the corresponding survival outcome. (b) Multilayered network of the PTEN gene on PFI mediated by proteins. A path PTEN → protein A → PFI is drawn if protein A is a significant mediator and the magnitude of the product of the path coefficients is larger than 0.02. Two proteins are connected if the P-value of their partial correlation is less than 0.001. Red indicates positive coefficients and blue indicates negative coefficients
