Review

Multimodal deep learning for biomedical data fusion: a review

Sören Richard Stahlschmidt et al. Brief Bioinform. 2022 Mar 10;23(2):bbab569. doi: 10.1093/bib/bbab569.

Abstract

Biomedical data are becoming increasingly multimodal and thereby capture the underlying complex relationships among biological processes. Deep learning (DL)-based data fusion strategies are a popular approach for modeling these nonlinear relationships. Therefore, we review the current state-of-the-art of such methods and propose a detailed taxonomy that facilitates more informed choices of fusion strategies for biomedical applications, as well as research on novel methods. By doing so, we find that deep fusion strategies often outperform unimodal and shallow approaches. Additionally, the proposed subcategories of fusion strategies show different advantages and drawbacks. The review of current methods has shown that, especially for intermediate fusion strategies, joint representation learning is the preferred approach as it effectively models the complex interactions of different levels of biological organization. Finally, we note that gradual fusion, based on prior biological knowledge or on search strategies, is a promising future research path. Similarly, utilizing transfer learning might overcome sample size limitations of multimodal data sets. As these data sets become increasingly available, multimodal DL approaches present the opportunity to train holistic models that can learn the complex regulatory dynamics behind health and disease.

Keywords: data integration; deep neural networks; fusion strategies; multi-omics; multimodal machine learning; representation learning.

Figures

Figure 1
Development of technologies and multimodal deep learning (DL). ‘Omics’ and ‘multi-omics’ data have become increasingly prominent in the scientific literature. To fully utilize the growing number of multimodal data sets, data fusion methods based on DL are evolving into an important approach in the biomedical field. This unprecedented generation of data has been made possible by high-throughput technologies such as microarrays and next-generation sequencing [7]. The development of bulk RNA-seq was followed by several related sequencing technologies, such as single-cell RNA-seq and ATAC-seq [8]. Currently, spatial transcriptomics [9] and single-cell multi-omics [10] are increasingly used.
Figure 2
DL-based fusion strategies. Layers marked in blue are shared between modalities and learn joint representations. (a) Early fusion strategies take as input a concatenated vector. No marginal representations are learned. (b) Intermediate fusion strategies first learn marginal representations and fuse these later inside the network. This can occur in one layer or gradually. (c) Late fusion strategies combine decisions by sub-models for each modality. Figure adapted from [2].
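The three strategies in Figure 2 can be sketched as minimal forward passes. This is an illustrative toy example, not code from the review: the two modalities, their dimensionalities, the single linear layers, and the averaging of decisions in late fusion are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy modalities with different dimensionalities (hypothetical sizes),
# e.g. a transcriptomic vector and a clinical-feature vector.
x_a = rng.normal(size=5)
x_b = rng.normal(size=3)

def relu(z):
    return np.maximum(z, 0.0)

# (a) Early fusion: concatenate raw inputs, then one shared network.
#     No marginal representations are learned.
W_early = rng.normal(size=(4, 8))
h_early = relu(W_early @ np.concatenate([x_a, x_b]))

# (b) Intermediate fusion: learn a marginal representation per modality,
#     then fuse them inside the network in a shared (joint) layer.
W_a, W_b = rng.normal(size=(4, 5)), rng.normal(size=(4, 3))
h_a, h_b = relu(W_a @ x_a), relu(W_b @ x_b)
W_joint = rng.normal(size=(4, 8))
h_joint = relu(W_joint @ np.concatenate([h_a, h_b]))

# (c) Late fusion: each modality's sub-model makes its own decision;
#     the decisions are combined, here by simple averaging.
w_dec_a, w_dec_b = rng.normal(size=4), rng.normal(size=4)
p_a = 1.0 / (1.0 + np.exp(-(w_dec_a @ h_a)))
p_b = 1.0 / (1.0 + np.exp(-(w_dec_b @ h_b)))
p_late = (p_a + p_b) / 2.0
```

In practice each `W` would be a trained multi-layer network, and late fusion often uses a learned combiner rather than a plain average; the sketch only fixes where in the pipeline the modalities meet.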
Figure 3
Early fusion strategies. (a) Unimodal vector stacking alternatives. dim(M) is the combined dimensionality of the set of modalities M. m is the number of modalities and t the number of steps. (b) Architecture of a regular AE for early fusion with fusion layer marked in blue. (c) Visualization of the assumptions underlying variational AEs.
Figure 4
Intermediate fusion strategies. (a) Joint intermediate fusion with shared layer in blue. Subsequent to marginal representations, joint representations are learned (top). In marginal intermediate fusion, marginal representations are directly input to the decision function (bottom). (b) Marginal AE where marginal representations are concatenated and input into a decision function. (c) Joint AE in which a joint representation is learned in the shared layer marked in blue.
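The joint autoencoder of Figure 4c can likewise be sketched as a forward pass: marginal encoders feed a shared bottleneck, and modality-specific decoders reconstruct each input from the joint representation. All layer sizes and the squared-error objective here are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two toy modalities (hypothetical dimensionalities).
x_a, x_b = rng.normal(size=6), rng.normal(size=4)

def relu(z):
    return np.maximum(z, 0.0)

# Marginal encoders: one per modality.
E_a, E_b = rng.normal(size=(3, 6)), rng.normal(size=(3, 4))
h_a, h_b = relu(E_a @ x_a), relu(E_b @ x_b)

# Shared layer: the joint representation of both modalities.
S = rng.normal(size=(2, 6))
z = relu(S @ np.concatenate([h_a, h_b]))

# Modality-specific decoders reconstruct each input from z.
D_a, D_b = rng.normal(size=(6, 2)), rng.normal(size=(4, 2))
x_a_hat, x_b_hat = D_a @ z, D_b @ z

# Training would minimize the summed reconstruction error over both
# modalities, forcing z to capture cross-modal structure.
loss = np.sum((x_a - x_a_hat) ** 2) + np.sum((x_b - x_b_hat) ** 2)
```

The marginal AE of Figure 4b differs only in that `h_a` and `h_b` are concatenated and passed straight to a decision function, with no shared layer `z` learned between them.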

References

    1. Maayan A. Complex systems biology. J R Soc Interface 2017;14(134):20170391.
    2. Ramachandram D, Taylor GW. Deep multimodal learning: a survey on recent advances and trends. IEEE Signal Process Mag 2017;34(6):96–108.
    3. Hall DL, Llinas J. An introduction to multisensor data fusion. Proc IEEE 1997;85(1):6–23.
    4. Durrant-Whyte HF. Sensor models and multisensor integration. Int J Robot Res 1988;7:97–113.
    5. Castanedo F. A review of data fusion techniques. Sci World J 2013;2013:704504.
