Comput Methods Programs Biomed. 2025 Mar;260:108568. doi: 10.1016/j.cmpb.2024.108568. Epub 2024 Dec 18.

Robust multi-modal fusion architecture for medical data with knowledge distillation

Muyu Wang et al. Comput Methods Programs Biomed. 2025 Mar.

Abstract

Background: The fusion of multi-modal data has been shown to significantly enhance the performance of deep learning models, particularly on medical data. However, missing modalities are common in medical data due to patient specificity, which poses a substantial challenge to the application of these models.

Objective: This study aimed to develop a novel and efficient multi-modal fusion framework for medical datasets that maintains consistent performance, even in the absence of one or more modalities.

Methods: In this paper, we fused three modalities: chest X-ray radiographs, history-of-present-illness text, and tabular data such as demographics and laboratory tests. We proposed a multi-modal fusion module based on pooled bottleneck (PB) attention, combined with knowledge distillation (KD) to preserve model inference when modalities are missing. In addition, we introduced a gradient modulation (GM) method to address imbalanced optimization during multi-modal model training. Finally, we designed comparison and ablation experiments to evaluate the fusion effect, the model's robustness to missing modalities, and the contribution of each component (PB, KD, and GM). The evaluation experiments were performed on the MIMIC-IV dataset with the task of predicting in-hospital mortality risk. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
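To make the two core ideas above concrete, the following is a minimal NumPy sketch of (a) attention-based fusion through a small set of shared bottleneck tokens and (b) a soft-label knowledge-distillation loss. All names, shapes, and random weights here are illustrative assumptions; this is not a reproduction of the authors' architecture or training code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pooled_bottleneck_fusion(modalities, n_bottleneck=4, d=8, seed=0):
    """Fuse per-modality token embeddings through a small set of shared
    bottleneck tokens: each bottleneck token attends over one modality's
    tokens, and the attended outputs are mean-pooled into one fused vector.
    Weights are random here purely for illustration."""
    rng = np.random.default_rng(seed)
    bottleneck = rng.normal(size=(n_bottleneck, d))   # shared query tokens
    fused = []
    for tokens in modalities:                         # tokens: (n_tokens, d)
        scores = bottleneck @ tokens.T / np.sqrt(d)   # (n_bottleneck, n_tokens)
        attn = softmax(scores, axis=-1)
        fused.append(attn @ tokens)                   # (n_bottleneck, d)
    return np.mean(fused, axis=0).mean(axis=0)        # pooled fused vector (d,)

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label knowledge distillation: KL(teacher || student) at
    temperature T, scaled by T^2 as is conventional."""
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    return float(T * T * np.sum(p_t * (np.log(p_t) - np.log(p_s))))
```

In a missing-modality setting of this general kind, a teacher seeing all modalities produces `teacher_logits`, and a student seeing a subset is trained to match them via `kd_loss` in addition to the task loss.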

Results: The proposed multi-modal fusion framework achieved an AUROC of 0.886 and AUPRC of 0.459, significantly surpassing the performance of baseline models. Even when one or two modalities were missing, our model consistently outperformed the reference models. Ablation of each of the three components resulted in varying degrees of performance degradation, highlighting their distinct contributions to the model's overall effectiveness.
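As a note on the reported metrics, AUROC equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one (the Mann-Whitney U statistic). A minimal pure-Python sketch, unrelated to the authors' evaluation code:

```python
def auroc(y_true, scores):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs where the positive is scored higher,
    counting ties as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: of the four positive-negative pairs, three are ranked correctly.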

Conclusions: This multi-modal fusion architecture demonstrated robustness to missing modalities and excellent performance in fusing three medical modalities for patient outcome prediction. This study provides a novel approach to the challenge of missing modalities and has the potential to be scaled to additional modalities.

Keywords: Clinical prediction; Deep learning; Knowledge distillation; Missing modalities; Multi-modal fusion; Transformer.


Conflict of interest statement

Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
