A review on machine learning principles for multi-view biological data integration
- PMID: 28011753
- DOI: 10.1093/bib/bbw113
Abstract
Driven by high-throughput sequencing techniques, modern genomic and clinical studies strongly need integrative machine learning models that make better use of vast volumes of heterogeneous information, both for a deeper understanding of biological systems and for the development of predictive models. How data from multiple sources (called multi-view data) are incorporated into a learning system is key to a successful analysis. In this article, we provide a comprehensive review of omics and clinical data integration techniques, from a machine learning perspective, for analyses such as prediction, clustering, dimension reduction and association. We show that Bayesian models can use prior information and model measurements with various distributions; tree-based methods can either build a single tree from all features or make a collective final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views into a final similarity matrix or learning model; network-based fusion methods can infer direct and indirect associations in a heterogeneous network; matrix factorization models have the potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning to capture the complex mechanisms of biological systems.
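As an illustration of the kernel-fusion idea mentioned in the abstract, the following is a minimal sketch, not the method of any specific paper covered by the review. It assumes each view is a small samples-by-features table, uses an RBF kernel as the per-view similarity, and fuses the views by a weighted average. The function names `rbf_kernel` and `fuse_kernels`, the default bandwidth, and the toy data are all hypothetical choices made for this example.

```python
import math


def rbf_kernel(view, gamma=None):
    """Return the RBF similarity matrix between the samples (rows) of one view."""
    if gamma is None:
        gamma = 1.0 / len(view[0])  # assumed default: 1 / number of features

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(a, b)) for b in view] for a in view]


def fuse_kernels(kernels, weights=None):
    """Fuse per-view similarity matrices by a normalized weighted average."""
    if weights is None:
        weights = [1.0] * len(kernels)  # assumed default: equal weight per view
    total = sum(weights)
    weights = [w / total for w in weights]
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Toy data (made up): the same 3 samples measured in two views.
expression = [[0.0, 1.0], [0.1, 0.9], [2.0, -1.0]]
methylation = [[0.2, 0.4, 0.1], [0.3, 0.5, 0.1], [0.9, 0.0, 0.7]]

K_fused = fuse_kernels([rbf_kernel(expression), rbf_kernel(methylation)])
```

The fused matrix `K_fused` could then be passed to any learner that accepts a precomputed similarity matrix. Learning the view weights from data, rather than fixing them, is a common refinement that this sketch does not attempt.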
Similar articles
- Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE). BMC Genomics. 2019 Dec 20;20(Suppl 11):944. doi: 10.1186/s12864-019-6285-x. PMID: 31856727 Free PMC article.
- Systems Biology and Machine Learning in Plant-Pathogen Interactions. Mol Plant Microbe Interact. 2019 Jan;32(1):45-55. doi: 10.1094/MPMI-08-18-0221-FI. Epub 2018 Nov 12. PMID: 30418085 Review.
- A tree-like Bayesian structure learning algorithm for small-sample datasets from complex biological model systems. BMC Syst Biol. 2015 Aug 28;9:49. doi: 10.1186/s12918-015-0194-7. PMID: 26310492 Free PMC article.
- Big data in yeast systems biology. FEMS Yeast Res. 2019 Nov 1;19(7):foz070. doi: 10.1093/femsyr/foz070. PMID: 31603503 Review.
- Seeing the wood for the trees: a forest of methods for optimization and omic-network integration in metabolic modelling. Brief Bioinform. 2018 Nov 27;19(6):1218-1235. doi: 10.1093/bib/bbx053. PMID: 28575143
Cited by
- Artificial Intelligence to Decode Cancer Mechanism: Beyond Patient Stratification for Precision Oncology. Front Pharmacol. 2020 Aug 12;11:1177. doi: 10.3389/fphar.2020.01177. eCollection 2020. PMID: 32903628 Free PMC article. Review.
- A New Era of Neuro-Oncology Research Pioneered by Multi-Omics Analysis and Machine Learning. Biomolecules. 2021 Apr 12;11(4):565. doi: 10.3390/biom11040565. PMID: 33921457 Free PMC article. Review.
- Multi-view based integrative analysis of gene expression data for identifying biomarkers. Sci Rep. 2019 Sep 18;9(1):13504. doi: 10.1038/s41598-019-49967-4. PMID: 31534156 Free PMC article.
- Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. BMC Bioinformatics. 2018 May 31;19(1):202. doi: 10.1186/s12859-018-2187-1. PMID: 29855387 Free PMC article.
- Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders. BMC Bioinformatics. 2021 Sep 25;22(1):460. doi: 10.1186/s12859-021-04359-2. PMID: 34563116 Free PMC article.