Review

PLoS Comput Biol. 2020 Apr 2;16(4):e1007677.

doi: 10.1371/journal.pcbi.1007677. eCollection 2020 Apr.

Multiview learning for understanding functional multiomics

Nam D Nguyen¹, Daifeng Wang²,³

Affiliations

¹ Department of Computer Science, Stony Brook University, Stony Brook, New York, United States of America.
² Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.
³ Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.

PMID: 32240163
PMCID: PMC7117667
DOI: 10.1371/journal.pcbi.1007677

Nam D Nguyen et al. PLoS Comput Biol. 2020.

Abstract

The molecular mechanisms and functions of complex biological systems remain elusive. Recent high-throughput techniques, such as next-generation sequencing, have generated a wide variety of multiomics datasets that enable the identification of biological functions and mechanisms from multiple facets. However, integrating these large-scale multiomics data and extracting functional insights remain challenging tasks. To address these challenges, machine learning has been broadly applied to analyze multiomics data. This review introduces multiview learning, an emerging machine learning field, and envisions its potentially powerful applications to multiomics. In particular, multiview learning is more effective than previous integrative methods at learning the heterogeneity of the data and revealing cross-talk patterns. Although it has been applied in various contexts, such as computer vision and speech recognition, multiview learning has not yet been widely applied to biological data, specifically multiomics data. Therefore, this paper first reviews recent multiview learning methods and unifies them in a framework called multiview empirical risk minimization (MV-ERM). We further discuss the potential applications of each method to multiomics, including genomics, transcriptomics, and epigenomics, with the aim of discovering functional and mechanistic interpretations across omics. Second, we explore possible applications to different biological systems, including human diseases (e.g., brain disorders and cancers), plants, and single-cell analysis, and discuss both the benefits and caveats of using multiview learning to discover the molecular mechanisms and functions of these systems.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Multiview learning deciphers mechanisms across functional omics.**
Molecular mechanisms (center) result from the interactions within and across multiomics, shown here in green, orange, and blue. The interactions within each omics are illustrated by colored links that match the color of that omics; the interactions across different omics are shown by black links. Directed edges represent causal relationships. Edge weights represent relationship strengths. Single-view learning methods (right) can only learn the within-view interactions separately for each omics via the functions f^(k), k = 1,2,3 (e.g., $p_{i, j}^{(3)} = {[f^{(3)} (X^{(3)})]}_{i, j}$). Multiview learning methods (left) can reveal the cross-talk patterns among various omics, providing complete mechanistic insights into biological functions, e.g., via the co-regularization terms Ω_co. Each facet of learning contributes to these cross-talk patterns, in either alignment-based or factorization-based methods. For example, a gene regulatory mechanism can relate to genomics (e.g., regulatory variants), transcriptomics (e.g., gene expression), and proteomics (e.g., TFs). Ω_co(f⁽²⁾,f⁽³⁾) then captures how variants (e.g., SNPs) break TFBSs (e.g., $p_{i, j}^{(2,3)} = {[Ω_{co} (f^{(2)} (X^{(2)}), f^{(3)} (X^{(3)}))]}_{i, j}$ as in the figure), Ω_co(f⁽¹⁾,f⁽³⁾) captures how variants affect gene expression (e.g., eQTLs), and Ω_co(f⁽¹⁾,f⁽²⁾) captures how TFs control target gene expression. Multiview learning can thus predict gene regulatory mechanisms across omics, i.e., how variants break TFBSs to affect gene expression. eQTL, expression quantitative trait locus; SNP, single-nucleotide polymorphism; TF, transcription factor; TFBS, transcription factor binding site.

**Fig 2. MV-ERM.**
(A) ERM for single-view learning. A general single-view learning algorithm (based on the ERM estimator) takes one dataset $X^{(1)}$ as input, adopts a hypothesis space $F^{(1)}$ and a loss function $l$, and outputs a function $f^{(1)} \in F^{(1)}$ that predicts the label associated with any new datapoint x as f⁽¹⁾(x). (B) MV-ERM. A general multiview learning algorithm (based on the MV-ERM estimator) takes v datasets $X^{(1)}, \dots, X^{(v)}$ as v views, adopts v hypothesis spaces $F^{(1)}, \dots, F^{(v)}$ associated with the v views, and outputs v functions $(f^{(1)}, \dots, f^{(v)}) \in F^{(1)} \times \dots \times F^{(v)}$ that reveal the interactions within each dataset and between each pair of datasets (via the terms Ω_co(f⁽ⁱ⁾,f^(j))). The consensus and complementary principles are implemented by the terms Ω(f⁽ⁱ⁾) and Ω_co(f⁽ⁱ⁾,f^(j)), respectively. Note that in the MV-ERM estimator, the loss function is optional because multiview learning can be unsupervised. ERM, empirical risk minimization; MV-ERM, multiview empirical risk minimization.
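To make the roles of the loss, Ω, and Ω_co concrete, the following is a minimal sketch of one possible supervised instance of an MV-ERM-style objective. It is not the authors' implementation. The choices here are all assumptions made for illustration: linear functions f^(k)(x) = x·w_k, a squared-error loss, Ω as the squared L2 norm of the weights, Ω_co as the mean squared disagreement between the predictions of two views, the weights `lam` and `lam_co`, and plain gradient descent as the optimizer.

```python
import numpy as np

def disagreement(preds):
    """Illustrative co-regularizer: mean squared difference of predictions, summed over view pairs."""
    n = len(preds[0])
    return sum(
        np.sum((preds[i] - preds[j]) ** 2) / n
        for i in range(len(preds))
        for j in range(i + 1, len(preds))
    )

def mv_erm_objective(weights, views, y, lam=0.1, lam_co=1.0):
    """Per-view squared loss + lam * sum_k ||w_k||^2 + lam_co * pairwise disagreement."""
    n = len(y)
    preds = [X @ w for X, w in zip(views, weights)]
    loss = sum(np.sum((p - y) ** 2) / n for p in preds)
    reg = lam * sum(np.sum(w ** 2) for w in weights)
    return loss + reg + lam_co * disagreement(preds)

def fit_mv_erm(views, y, lam=0.1, lam_co=1.0, lr=0.02, n_iter=2000):
    """Minimize the objective above by gradient descent over all views jointly."""
    n = len(y)
    weights = [np.zeros(X.shape[1]) for X in views]
    for _ in range(n_iter):
        preds = [X @ w for X, w in zip(views, weights)]
        grads = []
        for k, X in enumerate(views):
            resid = preds[k] - y  # gradient signal from the loss of view k
            # gradient signal from every pair that involves view k
            co_resid = sum(preds[k] - preds[j] for j in range(len(views)) if j != k)
            grads.append(2.0 / n * X.T @ (resid + lam_co * co_resid) + 2.0 * lam * weights[k])
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights
```

Setting `lam_co=0` decouples the views and reduces the fit to v independent single-view ERM problems, as in panel (A). An unsupervised variant would drop the loss term and keep only the regularizers.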

**Fig 3. Factorization-based versus alignment-based methods.**
(A) Factorization-based single-view learning methods. They typically factorize a data matrix X from a single view (e.g., a gene expression matrix of samples by genes) into a product of a matrix G (coefficient matrix) and a matrix $\tilde{F}$ (dictionary matrix or pattern matrix). Because matrix factorization has an intrinsic clustering property [8], the matrix $\tilde{F}$ can represent a clustering structure of the view (i.e., the soft clustering assignments or indicators). For example, $\tilde{F}$ reveals 3 different gene clusters, a, b, and c, as denoted in the figure. (B) Factorization-based multiview learning methods. They factorize different matrices from multiomics, e.g., gene expression X⁽¹⁾ (green matrix), protein expression X⁽²⁾ (blue matrix), and chromatin accessibility X⁽³⁾ (orange matrix), into a product of different coefficient matrices G^(k) (k = 1,2,3) and the common dictionary matrix $\tilde{F}$. This common representation reveals cross-talk patterns among genes, proteins (more precisely, TFs), and regulatory elements (i.e., enhancers); e.g., a TF binds to a region to regulate a gene's expression. (C) Alignment-based multiview learning methods. The 3 input omic matrices are projected via functions f^(k) (k = 1,2,3) onto spaces where their internal relationships are revealed. These representations of different omics are pairwise coordinated with each other via the term Ω_co. For example, the figure shows the pairwise alignments between X⁽¹⁾ and X⁽²⁾ and between X⁽²⁾ and X⁽³⁾, which reveal cross-talk patterns between TFs and enhancers, and between enhancers and gene expression. (The alignment between X⁽¹⁾ and X⁽³⁾ is omitted to keep the figure concise.) TF, transcription factor.
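The shared-dictionary idea in panel (B) can be sketched in a few lines. The example below is an illustrative assumption, not the method of any specific paper reviewed: it uses nonnegative matrix factorization with standard multiplicative updates, assumes all views share the same column dimension, and uses hypothetical names (`joint_nmf`, `rank`, `n_iter`). Each view X^(k) is approximated as G^(k) F, with one dictionary matrix F common to all views.

```python
import numpy as np

def joint_nmf(views, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize each nonnegative view X_k (n_k x m) as G_k (n_k x rank) @ F (rank x m), with F shared."""
    rng = np.random.default_rng(seed)
    n_cols = views[0].shape[1]
    if any(X.shape[1] != n_cols for X in views):
        raise ValueError("all views must share the same number of columns")
    G = [rng.random((X.shape[0], rank)) for X in views]
    F = rng.random((rank, n_cols))
    errors = []
    for _ in range(n_iter):
        # update each view-specific coefficient matrix with F held fixed
        for k, X in enumerate(views):
            G[k] *= (X @ F.T) / (G[k] @ F @ F.T + eps)
        # update the shared dictionary using evidence pooled from all views
        numer = sum(Gk.T @ X for Gk, X in zip(G, views))
        denom = sum(Gk.T @ Gk for Gk in G) @ F + eps
        F *= numer / denom
        # total squared reconstruction error across views
        errors.append(sum(np.linalg.norm(X - Gk @ F) ** 2 for Gk, X in zip(G, views)))
    return G, F, errors
```

Because F is updated from the pooled numerator and denominator, a pattern in F is retained only if it helps reconstruct the views jointly, which is what allows the rows of F to be read as soft cluster indicators shared across omics.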

References

1. Koonin EV. Does the central dogma still stand? Biol Direct. 2012;7(1):27.
2. Bussard AE. A scientific revolution? EMBO reports. 2005;6(8):691–694. doi: 10.1038/sj.embor.7400497
3. Trunk GV. A problem of dimensionality: A simple example. IEEE Trans Pattern Anal Mach Intell. 1979;(3):306–307. doi: 10.1109/tpami.1979.4766926
4. de Sa VR. Learning classification with unlabeled data. In: Advances in neural information processing systems. [Internet]. NIPS 1993. 1994 [cited 2020 Mar 17]. p. 112–119. Available from: https://papers.nips.cc/paper/831-learning-classification-with-unlabeled-...
5. Rappoport N, Shamir R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res. 2018;46(20):10546–10562. doi: 10.1093/nar/gky889
