A supervised Bayesian factor model for the identification of multi-omics signatures
- PMID: 38603606
- PMCID: PMC11078774
- DOI: 10.1093/bioinformatics/btae202
Abstract
Motivation: Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful.
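The sequential workflow described above (unsupervised dimensionality reduction followed by a separate predictive model) can be illustrated with a minimal sketch. This is not SPEAR and not the authors' code: it assumes plain PCA as a stand-in for a latent factor model and ordinary least squares as the predictor, and the function name `two_step_signature` and the simulated "genes" and "proteins" blocks are hypothetical. It only shows why the factors in a two-step pipeline are built without reference to the response.

```python
import numpy as np

def two_step_signature(blocks, y, n_factors=2):
    """Sequential baseline: unsupervised factors first, then a predictive model."""
    # Standardize each assay (block) separately, then concatenate features.
    scaled = []
    for X in blocks:
        X = np.asarray(X, dtype=float)
        sd = X.std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant features
        scaled.append((X - X.mean(axis=0)) / sd)
    Z = np.hstack(scaled)

    # Step 1 (unsupervised): PCA via SVD; the factors never see the response y.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    factors = U[:, :n_factors] * S[:n_factors]  # samples x factors
    loadings = Vt[:n_factors].T                 # features x factors

    # Step 2 (supervised): least-squares regression of y on the factors.
    design = np.column_stack([np.ones(len(y)), factors])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return factors, loadings, coef, design @ coef


# Simulated data (hypothetical): two hidden factors drive both assays and y.
rng = np.random.default_rng(0)
n = 60
latent = rng.normal(size=(n, 2))
genes = latent @ rng.normal(size=(2, 30)) + 0.1 * rng.normal(size=(n, 30))
proteins = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(n, 10))
y = latent[:, 0] - 0.5 * latent[:, 1] + 0.1 * rng.normal(size=n)

factors, loadings, coef, y_hat = two_step_signature([genes, proteins], y)
```

In this toy setting the same hidden factors dominate the variance of both assays and also drive the response, so the two-step approach works well. When the response depends on a low-variance direction, factors chosen purely by explained variance can miss it, which is the limitation a supervised factor model is meant to address.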
Results: We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of coronavirus disease 2019 severity and breast cancer tumor subtypes.
Availability and implementation: SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.
© The Author(s) 2024. Published by Oxford University Press.
Conflict of interest statement
S.H.K. receives consulting fees from Peraton. All other authors declare that they have no competing interests.
Update of
- A supervised Bayesian factor model for the identification of multi-omics signatures. bioRxiv [Preprint]. 2023 Sep 27:2023.01.25.525545. doi: 10.1101/2023.01.25.525545. Update in: Bioinformatics. 2024 May 2;40(5):btae202. doi: 10.1093/bioinformatics/btae202. PMID: 36747790.