Bioinformatics. 2024 Jun 28;40(Suppl 1):i189-i198.
doi: 10.1093/bioinformatics/btae216.

Probabilistic pathway-based multimodal factor analysis


Alexander Immer et al. Bioinformatics.

Abstract

Motivation: Multimodal profiling strategies promise more informative insights into biomedical cohorts by integrating the information that each modality contributes. Performing this integration, however, requires novel analytical strategies. Multimodal profiling often comes at the expense of smaller sample sizes, which makes it harder for methods to uncover signals shared across a cohort. Factor analysis approaches are commonly used to analyze high-dimensional data in molecular biology; however, they typically do not yield directly interpretable representations, whereas many research questions center on the pathways associated with specific observations.

Results: We develop PathFA, a novel approach for multimodal factor analysis over the space of pathways. PathFA produces integrative and interpretable views across multimodal profiling technologies, from which concrete hypotheses can be derived. It combines a pathway-learning approach with integrative multimodal capability under a Bayesian procedure that is efficient, hyperparameter-free, and able to infer observation noise automatically from the data. We demonstrate strong performance at small sample sizes, both within our simulation framework and on matched proteomics and transcriptomics profiles from real tumor samples provided by the Swiss Tumor Profiler consortium. On a subcohort of melanoma patients, PathFA recovers pathway activity that has been independently associated with poor outcome. We further demonstrate that the approach identifies pathways associated with the presence of specific cell types as well as with tumor heterogeneity. Our results show that PathFA captures known biology, making it well suited for analyzing multimodal sample cohorts.
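To make the hierarchy described for Figure 1 more concrete (latent variable, then pathway abundances, then RNA and proteomics observations), the following Python sketch draws synthetic data from a simplified linear-Gaussian version of it. It assumes a low-dimensional Gaussian latent per sample, a linear map from latents to pathway abundances, a 0/1 prior pathway-membership matrix per modality that projects pathways onto markers, and marker-specific (heteroscedastic) Gaussian noise. This is a hypothetical illustration, not the PathFA model or the code in the repository; the function name `sample_pathfa_like`, the linear maps, and the Gaussian priors are all assumptions chosen for the example.

```python
import numpy as np


def sample_pathfa_like(n_samples, n_latent, membership, noise_sd, rng):
    """Draw synthetic data from a simplified pathway-based factor model.

    Hypothetical sketch, not the PathFA implementation:
      z_i  ~ N(0, I_K)                      low-dimensional latent per sample
      p_i  = W z_i                          pathway abundances (W: P x K)
      x_im = C_m p_i + eps,  eps ~ N(0, diag(noise_sd[m] ** 2))
    where C_m (markers x pathways) is a 0/1 prior pathway-membership matrix
    for modality m, so each marker depends only on the pathways it belongs to.
    """
    n_pathways = next(iter(membership.values())).shape[1]
    W = rng.normal(size=(n_pathways, n_latent))   # latent -> pathway loadings
    Z = rng.normal(size=(n_samples, n_latent))    # one latent row per sample
    P = Z @ W.T                                   # pathway abundances (N x P)
    data = {}
    for modality, C in membership.items():
        mean = P @ C.T                            # project pathways onto markers
        eps = rng.normal(size=mean.shape) * noise_sd[modality]  # per-marker noise
        data[modality] = mean + eps
    return Z, P, data


# Hypothetical example: 50 genes and 20 proteins sharing 8 pathways.
rng = np.random.default_rng(0)
membership = {
    "rna": rng.integers(0, 2, size=(50, 8)).astype(float),
    "prot": rng.integers(0, 2, size=(20, 8)).astype(float),
}
# Wider spread of noise levels for RNA than for proteomics (cf. Figure 1B).
noise_sd = {"rna": rng.uniform(0.5, 2.0, 50), "prot": rng.uniform(0.8, 1.2, 20)}
Z, P, data = sample_pathfa_like(30, 3, membership, noise_sd, rng)
print(data["rna"].shape, data["prot"].shape)  # (30, 50) (30, 20)
```

Inference (recovering the latents, loadings, and noise levels from `data`) is the hard part that the paper's Bayesian procedure addresses; the sketch only shows the direction of the generative hierarchy.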

Availability and implementation: The tool is implemented in Python and available at https://github.com/ratschlab/path-fa.


Conflict of interest statement

None declared.

Figures

Figure 1.
Schematic overview of our method. (A) Samples are represented hierarchically via pathways and latent variables inferred jointly from transcriptomics (RNA) and proteomics observations. The corresponding prior pathways are translated into the space of both observed modalities. (B) Density plot of observation noise for RNA and proteomics data, showing heteroscedasticity within and across modalities. Proteomics markers vary less in precision, whereas RNA markers show a wider spread. (C) Hierarchy of representations for a single sample. A sample is represented by a low-dimensional latent variable, which is projected into pathway abundances that are in turn reconstructed into both proteomics and RNA observations.
Figure 2.
Reconstruction log-likelihood on the synthetic benchmark data as a function of the number of samples for proteomics and RNA, averaged over 20 runs. Shaded regions denote two standard errors about the mean. Unimodal PathFA, unimodal PLIER, and MOFA are compared with the optimal attainable performance (dotted line).
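The quantities in this caption can be sketched as follows, assuming Gaussian observations with marker-specific variances (consistent with the heteroscedastic noise described for Figure 1B). How the paper normalizes the log-likelihood is not stated here, so summing over markers and averaging over samples is an assumption, as are the helper names `gaussian_loglik` and `mean_and_band`. The band is the mean over runs plus or minus two standard errors, as the caption describes.

```python
import numpy as np


def gaussian_loglik(x, x_hat, noise_var):
    """Average per-sample Gaussian log-likelihood with marker-specific variances.

    x, x_hat: (n_samples, n_markers) observations and reconstructions
    noise_var: (n_markers,) per-marker noise variances (heteroscedastic)
    Assumption: sum over markers, then average over samples.
    """
    resid_sq = (x - x_hat) ** 2
    ll = -0.5 * (np.log(2 * np.pi * noise_var) + resid_sq / noise_var)
    return ll.sum(axis=1).mean()


def mean_and_band(values):
    """Mean over repeated runs with a +/- 2 standard-error band."""
    values = np.asarray(values, dtype=float)
    se = values.std(ddof=1) / np.sqrt(len(values))
    m = values.mean()
    return m, m - 2 * se, m + 2 * se


# Hypothetical example: 20 runs scored against a fixed reconstruction.
rng = np.random.default_rng(0)
x_hat = rng.normal(size=(40, 10))               # hypothetical reconstruction
noise_var = rng.uniform(0.5, 2.0, size=10)      # hypothetical per-marker variances
runs = [
    gaussian_loglik(x_hat + rng.normal(size=x_hat.shape) * np.sqrt(noise_var),
                    x_hat, noise_var)
    for _ in range(20)
]
print(mean_and_band(runs))
```

Scoring held-out data with the generating means and variances, as in this example, gives one natural reference point for an "optimal attainable" curve; whether the dotted line in the figure is computed this way is an assumption.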
Figure 3.
Reconstruction log-likelihood of multimodal PathFA compared with the unimodal variant, in the same setting as Fig. 2.
Figure 4.
Average absolute error of marker-level observation noise on the synthetic benchmark for Multi-PathFA and MOFA, for both proteomics and RNA. Lines show the average over 20 runs; shaded regions denote two standard errors.
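As a rough illustration of the metric in this caption, the sketch below estimates per-marker noise from residuals and compares it with the true noise levels. It deliberately uses a simple stand-in: the maximum-likelihood variance estimate given a fixed reconstruction, not PathFA's Bayesian noise inference. Measuring the error on standard deviations (rather than variances or precisions) is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def ml_noise_var(x, x_hat):
    """Maximum-likelihood per-marker noise variance given a fixed reconstruction.
    Simple stand-in only; not the Bayesian noise inference used by PathFA."""
    return ((x - x_hat) ** 2).mean(axis=0)


def mean_abs_noise_error(est_var, true_var):
    """Average absolute error between estimated and true per-marker noise SDs."""
    return np.abs(np.sqrt(est_var) - np.sqrt(true_var)).mean()


# Hypothetical example: 25 heteroscedastic markers, 200 samples.
rng = np.random.default_rng(0)
true_sd = rng.uniform(0.5, 2.0, size=25)
x_hat = rng.normal(size=(200, 25))              # treated as a perfect reconstruction
x = x_hat + rng.normal(size=x_hat.shape) * true_sd
est = ml_noise_var(x, x_hat)
print(mean_abs_noise_error(est, true_sd ** 2))  # typically shrinks with more samples
```

In practice the reconstruction is itself estimated from few samples, which is where a model that infers noise jointly with the factors, as the abstract describes for PathFA, can matter.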
Figure 5.
Pearson correlation coefficients of pathway loadings with cell-type content (ground truth computed from CyTOF data) across 42 ovarian cancer samples for the four most common cell types. MSigDB cell-type pathways were used for PathFA in all experiments. Multi refers to the multimodal setting, while RNA and Prot refer to results based on transcriptomic or proteomic data only. FA refers to factor analysis in the unimodal settings and to MOFA in the multimodal setting; for both, the correlation with latent factors is computed. PLIER can only be used in the unimodal setting.
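The per-cell-type correlations in this caption can be sketched as a column-wise Pearson correlation between a samples-by-pathways loading matrix and a samples-by-cell-types content matrix. The sketch assumes each cell type is paired with one matching pathway column; the 42 samples and four cell types mirror the caption, but the data are synthetic and the helper name `pearson_by_column` is hypothetical.

```python
import numpy as np


def pearson_by_column(loadings, content):
    """Pearson r between column j of `loadings` (samples x pathways) and
    column j of `content` (samples x cell types), for every j."""
    a = loadings - loadings.mean(axis=0)
    b = content - content.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))


# Hypothetical data: 42 samples, 4 cell types, one matched pathway per cell type.
rng = np.random.default_rng(0)
content = rng.dirichlet(np.ones(4), size=42)                    # synthetic cell-type fractions
loadings = content + rng.normal(scale=0.1, size=content.shape)  # synthetic noisy loadings
r = pearson_by_column(loadings, content)
print(np.round(r, 2))
```

Computing all columns at once avoids a loop over `np.corrcoef` calls; significance testing (for example, the P-value reported for Figure 7) would need a separate step.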
Figure 6.
Significant associations of survival and progression with pathway loadings from PathFA. The color denotes the normalized mean difference between the two patient groups in each column, shown for significant associations only. The clustering tree suggests three clusters: two appear to be more highly expressed [top (5 pathways) and middle (12 pathways)], and the remaining one appears to be down-regulated relative to the corresponding patient group (y-axis).
Figure 7.
Scatter plot of Jensen–Shannon tumor heterogeneity computed on CyTOF measurements of tumor cells (y-axis) against the mTORC1 pathway loading generated by PathFA. Each dot represents an ovarian cancer patient sample. The Pearson correlation is 0.41 (P = .0075).

