Review
PLoS Comput Biol. 2020 Apr 2;16(4):e1007677.
doi: 10.1371/journal.pcbi.1007677. eCollection 2020 Apr.

Multiview learning for understanding functional multiomics

Nam D Nguyen et al. PLoS Comput Biol. 2020.

Abstract

The molecular mechanisms and functions of complex biological systems remain elusive. Recent high-throughput techniques, such as next-generation sequencing, have generated a wide variety of multiomics datasets that enable the identification of biological functions and mechanisms from multiple facets. However, integrating these large-scale multiomics data and extracting functional insights from them remain challenging. To address these challenges, machine learning has been broadly applied to analyze multiomics. This review introduces multiview learning, an emerging machine learning field, and envisions its potentially powerful applications to multiomics. In particular, multiview learning is more effective than previous integrative methods at learning data heterogeneity and revealing cross-talk patterns. Although it has been applied in various contexts, such as computer vision and speech recognition, multiview learning has not yet been widely applied to biological data, specifically multiomics data. Therefore, this paper first reviews recent multiview learning methods and unifies them in a framework called multiview empirical risk minimization (MV-ERM). We further discuss the potential applications of each method to multiomics, including genomics, transcriptomics, and epigenomics, with the aim of discovering functional and mechanistic interpretations across omics. Second, we explore possible applications to different biological systems, including human diseases (e.g., brain disorders and cancers), plants, and single-cell analysis, and discuss both the benefits and caveats of using multiview learning to discover the molecular mechanisms and functions of these systems.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Multiview learning deciphers mechanisms across functional omics.
Molecular mechanisms (Center) result from interactions within and across multiomics, shown in green, orange, and blue. Interactions within each omics are drawn as links in that omics' color; interactions across different omics are drawn as black links. Directed edges represent causal relationships, and edge weights represent relationship strengths. Single-view learning methods (Right) can learn only within-view interactions, separately for each omics, via the functions $f^{(k)}, k = 1,2,3$ (e.g., $p^{(3)}_{i,j} = [f^{(3)}(X^{(3)})]_{i,j}$). Multiview learning methods (Left) can also reveal cross-talk patterns among omics, e.g., through co-regularization terms $\Omega_{co}$, providing complete mechanistic insights into biological functions. These cross-talk patterns are captured by the learning components of either alignment-based or factorization-based methods. For example, gene regulatory mechanisms can involve genomics (e.g., regulatory variants), transcriptomics (e.g., gene expression), and proteomics (e.g., TFs). Here, $\Omega_{co}(f^{(2)}, f^{(3)})$ represents variants (e.g., SNPs) disrupting TFBSs (e.g., $p^{(2,3)}_{i,j} = [\Omega_{co}(f^{(2)}(X^{(2)}), f^{(3)}(X^{(3)}))]_{i,j}$ in the figure); $\Omega_{co}(f^{(1)}, f^{(3)})$ represents variants affecting gene expression (e.g., eQTLs); and $\Omega_{co}(f^{(1)}, f^{(2)})$ represents TFs controlling target gene expression. Multiview learning can thus predict gene regulatory mechanisms across omics, i.e., how variants disrupt TFBSs to affect gene expression. eQTL, expression quantitative trait loci; SNP, single-nucleotide polymorphism; TF, transcription factor; TFBS, transcription factor binding site.
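To make the single-view versus cross-view distinction concrete, the sketch below uses plain Pearson correlation as a stand-in for the learned functions. This is an illustrative assumption, not a method from the review: the hypothetical `within_view_interactions` plays the role of a single-view $f^{(k)}$, and the hypothetical `cross_view_interactions` gives a crude cross-talk matrix between two views profiled on the same samples. The toy data are invented for illustration.

```python
import numpy as np

def within_view_interactions(X):
    """Stand-in for a single-view f^(k): feature-by-feature Pearson correlations
    inside one omics matrix X (samples x features); entry [i, j] ~ p^(k)_{i,j}."""
    return np.corrcoef(X, rowvar=False)

def cross_view_interactions(X_a, X_b):
    """Crude cross-talk between two views measured on the same samples:
    Pearson correlation of every feature of X_a with every feature of X_b."""
    Za = (X_a - X_a.mean(axis=0)) / X_a.std(axis=0)
    Zb = (X_b - X_b.mean(axis=0)) / X_b.std(axis=0)
    return Za.T @ Zb / X_a.shape[0]

# Hypothetical toy data: 500 samples, 3 TF features and 4 gene-expression features.
rng = np.random.default_rng(0)
n = 500
tf = rng.normal(size=(n, 3))
expr = rng.normal(size=(n, 4))
expr[:, 0] = 2.0 * tf[:, 1] + 0.1 * rng.normal(size=n)  # gene 0 is driven by TF 1

within_expr = within_view_interactions(expr)  # 4 x 4: sees genes only
cross = cross_view_interactions(tf, expr)     # 3 x 4: TF-to-gene links
print(np.round(cross, 2))
```

In this toy setup the within-view expression matrix shows nothing special about gene 0, while the cross-view matrix exposes the TF 1 to gene 0 link. Correlation is symmetric and linear, so unlike the directed edges in the figure it says nothing about causality.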
Fig 2. MV-ERM.
(A) ERM for single-view learning. A general single-view learning algorithm (based on the ERM estimator) takes one dataset $X^{(1)}$ as input, adopts a hypothesis space $\mathcal{F}^{(1)}$ and a loss function $l$, and outputs a function $f^{(1)} \in \mathcal{F}^{(1)}$ that predicts the label of any new data point $x$ as $f^{(1)}(x)$. (B) MV-ERM. A general multiview learning algorithm (based on the MV-ERM estimator) takes $v$ datasets $X^{(1)}, \ldots, X^{(v)}$ as $v$ views, adopts $v$ hypothesis spaces $\mathcal{F}^{(1)}, \ldots, \mathcal{F}^{(v)}$ associated with the views, and outputs $v$ functions $(f^{(1)}, \ldots, f^{(v)}) \in \mathcal{F}^{(1)} \times \cdots \times \mathcal{F}^{(v)}$ that reveal the interactions within each dataset and between each pair of datasets (via the terms $\Omega_{co}(f^{(i)}, f^{(j)})$). The consensus and complementary principles are implemented by the terms $\Omega(f^{(i)})$ and $\Omega_{co}(f^{(i)}, f^{(j)})$, respectively. Note that the loss function is optional in the MV-ERM estimator because multiview learning can be unsupervised. ERM, empirical risk minimization; MV-ERM, multiview empirical risk minimization.
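One generic way to write an objective with the shape described in panel B, assuming $n$ labeled samples, losses averaged over samples, and scalar weights $\lambda_k$ and $\mu_{ij}$ (the weights and exact form are assumptions, not taken from the review), is

$$\min_{f^{(1)},\ldots,f^{(v)}} \sum_{k=1}^{v}\Big[\frac{1}{n}\sum_{s=1}^{n} l\big(f^{(k)}(x^{(k)}_s), y_s\big) + \lambda_k\,\Omega(f^{(k)})\Big] + \sum_{i<j} \mu_{ij}\,\Omega_{co}\big(f^{(i)}, f^{(j)}\big)$$

The sketch below instantiates this for $v = 2$ linear views with squared loss, $\Omega(f^{(k)}) = \lVert w_k \rVert^2$, and a prediction-disagreement penalty as $\Omega_{co}$, which is the textbook co-regularized least squares setup. The function names `ridge_erm` (the single-view ERM of panel A) and `co_regularized_mv_erm` are hypothetical.

```python
import numpy as np

def ridge_erm(X, y, lam):
    """Single-view ERM (panel A): mean squared loss + lam * ||w||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def co_regularized_mv_erm(X1, X2, y, lam, mu):
    """Two-view MV-ERM sketch (panel B) with linear f^(k)(x) = x @ w_k.

    Minimizes  sum_k [ ||X_k w_k - y||^2 / n + lam * ||w_k||^2 ]
             + mu * ||X1 w1 - X2 w2||^2 / n,
    where the last term stands in for Omega_co(f^(1), f^(2)).
    Setting both gradients to zero yields one symmetric linear system.
    """
    n, d1 = X1.shape
    d2 = X2.shape[1]
    A11 = (1 + mu) * X1.T @ X1 / n + lam * np.eye(d1)
    A22 = (1 + mu) * X2.T @ X2 / n + lam * np.eye(d2)
    A12 = -mu * X1.T @ X2 / n
    A = np.block([[A11, A12], [A12.T, A22]])
    b = np.concatenate([X1.T @ y, X2.T @ y]) / n
    w = np.linalg.solve(A, b)
    return w[:d1], w[d1:]

# Hypothetical toy data: both views carry a noisy copy of a shared signal z.
rng = np.random.default_rng(1)
n = 100
z = rng.normal(size=n)
X1 = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])
X2 = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=(n, 2))])
y = z + 0.1 * rng.normal(size=n)

for mu in (0.0, 10.0):
    w1, w2 = co_regularized_mv_erm(X1, X2, y, lam=0.1, mu=mu)
    disagreement = np.mean((X1 @ w1 - X2 @ w2) ** 2)
    print(f"mu={mu}: view disagreement = {disagreement:.4f}")
```

With `mu=0.0` the system decouples into two independent ridge problems, i.e., single-view ERM on each view; increasing `mu` trades per-view fit for agreement between the two predictors.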
Fig 3. Factorization-based versus alignment-based methods.
(A) Factorization-based single-view learning methods. These methods typically factorize a data matrix $X$ from a single view (e.g., a samples-by-genes gene expression matrix) into the product of a matrix $G$ (coefficient matrix) and a matrix $\tilde{F}$ (dictionary or pattern matrix). Because matrix factorization has an intrinsic clustering property [8], $\tilde{F}$ can represent the clustering structure of the view (i.e., soft cluster assignments or indicators). For example, $\tilde{F}$ reveals 3 gene clusters, a, b, and c, as denoted in the figure. (B) Factorization-based multiview learning methods. These methods factorize matrices from different omics, e.g., gene expression $X^{(1)}$ (green matrix), protein expression $X^{(2)}$ (blue matrix), and chromatin accessibility $X^{(3)}$ (orange matrix), into products of view-specific coefficient matrices $G^{(k)}$ ($k = 1,2,3$) and a common dictionary matrix $\tilde{F}$. This common representation enables the discovery of cross-talk patterns among genes, proteins (more precisely, TFs), and regulatory elements (i.e., enhancers); e.g., a TF binds to a region to regulate a gene's expression. (C) Alignment-based multiview learning methods. The 3 input omics matrices are projected via functions $f^{(k)}$ ($k = 1,2,3$) onto spaces where their internal relationships are revealed. These representations of different omics are pairwise coordinated via the term $\Omega_{co}$. For example, the figure shows the pairwise alignments between $X^{(1)}$ and $X^{(2)}$ and between $X^{(2)}$ and $X^{(3)}$ to reveal cross-talk patterns between TFs and enhancers, and between enhancers and gene expression. (The alignment between $X^{(1)}$ and $X^{(3)}$ is omitted for concision.) TF, transcription factor.
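Panels B and C can each be sketched with a standard technique, chosen here for illustration rather than taken from the review. For panel B, a joint nonnegative matrix factorization $X^{(k)} \approx G^{(k)}\tilde{F}$ with one shared $\tilde{F}$, fitted with Lee and Seung style multiplicative updates; this assumes all views share the same column index (for instance, features already mapped to a common set of genes), which the caption does not specify. For panel C, linear canonical correlation analysis (CCA) as one possible alignment-based choice: each view is projected linearly and the projections are aligned by maximizing their correlation. The names `joint_nmf` and `linear_cca` and all data are hypothetical.

```python
import numpy as np

def joint_nmf(views, rank, n_iter=300, seed=0, eps=1e-12):
    """Factorize every X^(k) (nonnegative, same number of columns) as G^(k) @ F,
    with one dictionary F shared by all views. Multiplicative updates that
    minimize sum_k ||X^(k) - G^(k) F||_F^2 and keep all factors nonnegative."""
    rng = np.random.default_rng(seed)
    F = rng.random((rank, views[0].shape[1]))
    Gs = [rng.random((X.shape[0], rank)) for X in views]

    def total_error():
        return sum(np.linalg.norm(X - G @ F) ** 2 for X, G in zip(views, Gs))

    errors = [total_error()]
    for _ in range(n_iter):
        for k, X in enumerate(views):  # view-specific coefficients, F fixed
            Gs[k] = Gs[k] * (X @ F.T) / (Gs[k] @ F @ F.T + eps)
        num = sum(G.T @ X for X, G in zip(views, Gs))  # shared F pools all views
        den = sum(G.T @ G for G in Gs) @ F + eps
        F = F * num / den
        errors.append(total_error())
    return Gs, F, errors

def _inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def linear_cca(X, Y, k=1, reg=1e-6):
    """Linear CCA: projections A, B of two views (same samples) whose projected
    scores are maximally correlated; also returns the canonical correlations."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Kx = _inv_sqrt(Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1]))
    Ky = _inv_sqrt(Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1]))
    U, s, Vt = np.linalg.svd(Kx @ (Xc.T @ Yc / (n - 1)) @ Ky)
    return Kx @ U[:, :k], Ky @ Vt.T[:, :k], s[:k]

# Hypothetical panel-B data: three nonnegative views built from one shared dictionary.
rng = np.random.default_rng(2)
F_true = rng.random((3, 30))
views = [rng.random((rows, 3)) @ F_true for rows in (20, 15, 10)]
Gs, F, errors = joint_nmf(views, rank=3)
clusters = np.argmax(F, axis=0)  # hard cluster label for each of the 30 columns

# Hypothetical panel-C data: a latent signal z appears in one feature of each view.
n = 300
z = rng.normal(size=n)
X = np.column_stack([z + 0.3 * rng.normal(size=n), rng.normal(size=(n, 2))])
Y = np.column_stack([rng.normal(size=n), -z + 0.3 * rng.normal(size=n)])
A, B, canon_corr = linear_cca(X, Y, k=1)
print(f"NMF error {errors[0]:.2f} -> {errors[-1]:.4f}; first canonical corr {canon_corr[0]:.3f}")
```

Sharing $\tilde{F}$ across views is equivalent to running ordinary NMF on the vertically stacked matrix of all views, which is why taking `argmax` over each column of `F` yields one cluster label per shared column. Linear CCA can only capture linear cross-view relationships; kernel or deep variants relax that restriction.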

References

    1. Koonin EV. Does the central dogma still stand? Biol Direct. 2012;7(1):27.
    2. Bussard AE. A scientific revolution? EMBO Rep. 2005;6(8):691–694. doi: 10.1038/sj.embor.7400497
    3. Trunk GV. A problem of dimensionality: A simple example. IEEE Trans Pattern Anal Mach Intell. 1979;(3):306–307. doi: 10.1109/tpami.1979.4766926
    4. de Sa VR. Learning classification with unlabeled data. In: Advances in neural information processing systems. [Internet]. NIPS 1993. 1994 [cited 2020 Mar 17]. p. 112–119. Available from: https://papers.nips.cc/paper/831-learning-classification-with-unlabeled-...
    5. Rappoport N, Shamir R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res. 2018;46(20):10546–10562. doi: 10.1093/nar/gky889
