TVNet: Multimodal medical image fusion by dual-branch network with vision transformer and one-shot aggregation
- PMID: 41185898
- PMCID: PMC12586861
- DOI: 10.1177/00368504251375188
Abstract
The task of medical image fusion is to synthesize complementary information from medical images of different modalities, which is of great significance for clinical diagnosis. Existing medical image fusion algorithms rely heavily on convolution operations and cannot establish long-range dependencies across the source images, which can lead to edge blurring and loss of detail in the fused images. Because the Transformer can effectively model long-range dependencies through self-attention, a novel and effective dual-branch feature enhancement network called TVNet is proposed to fuse multimodal medical images. This network combines a Vision Transformer and a Convolutional Neural Network to extract global context and local information, preserving detailed textures and highlighting the structural characteristics of the source images. Furthermore, to capture multiscale information, an enhancement module is used to obtain multiscale characterization, and the information from the two branches is efficiently aggregated at the same time. In addition, a hybrid loss function is designed to optimize the fusion results at the structure, feature, and gradient levels. Experimental results show that the proposed fusion network outperforms seven state-of-the-art methods in both subjective visual quality and objective metrics. Our code is available at https://github.com/sineagles/TVNet.
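To make the gradient term of the hybrid loss concrete, the sketch below shows one common formulation used in fusion work: penalize the L1 distance between the fused image's gradient magnitude and the element-wise maximum of the source images' gradient magnitudes, so the fused result is pushed to retain the strongest edge from either modality. This is an illustrative NumPy sketch, not the authors' exact loss; the kernel choice (Sobel) and the max-of-sources target are assumptions.

```python
import numpy as np

def sobel_gradients(img):
    """Approximate horizontal/vertical gradients with 3x3 Sobel kernels
    (zero padding at the borders). Illustrative, not optimized."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img.astype(float), 1)
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return gx, gy

def grad_magnitude(img):
    """L1 gradient magnitude |gx| + |gy|."""
    gx, gy = sobel_gradients(img)
    return np.abs(gx) + np.abs(gy)

def gradient_loss(fused, src_a, src_b):
    """Assumed gradient-level loss: mean L1 distance between the fused
    image's gradient magnitude and the element-wise maximum of the two
    source gradient magnitudes (keep the sharpest edge from either source)."""
    target = np.maximum(grad_magnitude(src_a), grad_magnitude(src_b))
    return np.abs(grad_magnitude(fused) - target).mean()
```

In practice this term would be combined with a structural term (e.g. SSIM-based) and a feature term into the weighted hybrid loss the abstract mentions; the loss is zero only when the fused image reproduces the dominant gradients of both modalities everywhere.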
Keywords: Medical image fusion; convolution neural network; long-range dependencies; multiscale features; vision transformer.
Conflict of interest statement
Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.