Cell Genom. 2023 Jul 10;3(8):100359.
doi: 10.1016/j.xgen.2023.100359. eCollection 2023 Aug 9.

Multiset correlation and factor analysis enables exploration of multi-omics data

Brielin C Brown et al. Cell Genom. 2023.

Abstract

Multi-omics datasets are becoming more common, necessitating better integration methods to realize their revolutionary potential. Here, we introduce multi-set correlation and factor analysis (MCFA), an unsupervised integration method tailored to the unique challenges of high-dimensional genomics data that enables fast inference of shared and private factors. We used MCFA to integrate methylation markers, protein expression, RNA expression, and metabolite levels in 614 diverse samples from the Trans-Omics for Precision Medicine/Multi-Ethnic Study of Atherosclerosis multi-omics pilot. Samples cluster strongly by ancestry in the shared space, even in the absence of genetic information, while private spaces frequently capture dataset-specific technical variation. Finally, we integrated genetic data by conducting a genome-wide association study (GWAS) of our inferred factors, observing that several factors are enriched for GWAS hits and trans-expression quantitative trait loci. Two of these factors appear to be related to metabolic disease. Our study provides a foundation and framework for further integrative analysis of ever larger multi-modal genomic datasets.

Conflict of interest statement

T.L. is a paid adviser or consultant of GSK, Pfizer, and Goldfinch Bio and has equity in Variant Bio. F.A. is an employee and shareholder of Illumina, Inc.

Figures

Figure 1
Overview of MCFA integration results. (A) The MCFA model. Each observed data mode (Ym) has contributions from two latent factors, one private to it (Xm) and one shared with other modes (Z). (B) Breakdown of the variance in four omics types captured by the inferred space, as well as the per-mode contribution to each shared factor. (C) UMAP embedding of the shared and private spaces, annotated with the most relevant feature set. Broadly, the top shared factors capture demographics, while the top private factors capture technical variation. (D) Variance in sample metadata explained by each learned space. This shows that the shared space also captures inferred cell-type composition estimates as well as clinical biomarkers.
Figure 2
Comparison of MCFA with other methods. (A) UMAP embeddings of MOFA (left) and MMAE (right) shared space show that these methods fail to separate meaningful information from technical variation. (B) Variance in sample metadata explained by the MOFA2 (top) and MMAE (bottom) shared spaces. MOFA2 primarily learns factors related to the methylation dataset, while the MMAE additionally incorporates some factors related to RNA sequencing. (C) Correlation of each inferred factor with each metadata sample for MOFA (top) and the MMAE (bottom).
Figure 3
Factor interpretation and integration with GWAS data. (A) QQ-plot of a GWAS for factors 1, 2, 6, and 7. Genetic associations with these factors are enriched for known GWAS loci (1, 6, and 7), trans-eQTLs (1 and 7), or highly influential trans-eQTLs (2 and 7). (B and C) Correlation of factors 6 (B) and 7 (C) with morphological, immune-composition, and clinical metadata reveals that factor 6 is related to body composition and lipid profile, while factor 7 is related to body composition, inferred blood cell-type composition, and inflammatory biomarkers. (D) Z-transformed correlation of individual protein and metabolite data with factor 6 reveals genes and metabolites related to insulin resistance and metabolic syndrome. (E) Z-transformed correlation of individual methylation values with factor 7. Many genes colocated to these CpGs are involved in lipid metabolism.
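
The shared/private structure described in the Figure 1A caption (each observed mode Ym receives a contribution from a shared factor Z and from a private factor Xm) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the linear-Gaussian form, the loading matrices `W` and `L`, the function name `simulate_modes`, the mode dimensions, and the noise level are all hypothetical choices made for illustration. Only the sample count (614) and the number of modes (four) come from the abstract.

```python
import numpy as np


def simulate_modes(n_samples, mode_dims, k_shared, k_private, noise_sd=0.1, seed=0):
    """Simulate data modes with shared and private latent factors.

    Assumed (hypothetical) linear form for each mode m:
        Y_m = Z @ W_m + X_m @ L_m + noise
    where Z is shared across modes and X_m is private to mode m.
    """
    rng = np.random.default_rng(seed)
    # Shared latent factors: one row per sample, common to every mode.
    Z = rng.standard_normal((n_samples, k_shared))
    X, Y = [], []
    for p in mode_dims:
        W = rng.standard_normal((k_shared, p))    # hypothetical shared loadings
        L = rng.standard_normal((k_private, p))   # hypothetical private loadings
        X_m = rng.standard_normal((n_samples, k_private))  # private factors
        noise = noise_sd * rng.standard_normal((n_samples, p))
        Y.append(Z @ W + X_m @ L + noise)
        X.append(X_m)
    return Z, X, Y


# Four modes (e.g. methylation, protein, RNA, metabolite) with arbitrary widths.
Z, X, Y = simulate_modes(n_samples=614, mode_dims=[50, 40, 30, 20],
                         k_shared=3, k_private=2)

# With the true factors known, each mode is explained up to the noise level.
A = np.hstack([Z, X[0]])
coef, *_ = np.linalg.lstsq(A, Y[0], rcond=None)
residual_sd = (Y[0] - A @ coef).std()
print(Z.shape, [y.shape for y in Y], round(float(residual_sd), 3))
```

In this toy setting the factors are known by construction; an integration method such as MCFA has to infer Z and each Xm from the observed modes alone.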

