Sci Prog. 2025 Oct-Dec;108(4):368504251375188.
doi: 10.1177/00368504251375188. Epub 2025 Nov 4.

TVNet: Multimodal medical image fusion by dual-branch network with vision transformer and one-shot aggregation



Jianguo Wang et al. Sci Prog. 2025 Oct-Dec.

Abstract

The task of medical image fusion involves synthesizing complementary information from medical images of different modalities, which is of great significance for clinical diagnosis. Existing medical image fusion algorithms rely heavily on convolution operations and cannot establish long-range dependencies across the source images, which can lead to blurred edges and loss of detail in the fused images. Because the Transformer can effectively model long-range dependencies through self-attention, a novel and effective dual-branch feature enhancement network called TVNet is proposed to fuse multimodal medical images. This network combines a Vision Transformer and a Convolutional Neural Network to extract global context information and local information, preserving detailed textures and highlighting the structural characteristics of the source images. Furthermore, to capture the multiscale information of images, an enhancement module is used to obtain multiscale representations, while the information from the two branches is efficiently aggregated at the same time. In addition, a hybrid loss function is designed to optimize the fusion results at three levels: structure, feature, and gradient. Experimental results show that the proposed fusion network outperforms seven state-of-the-art methods in both subjective visual quality and objective metrics. Our code is available at https://github.com/sineagles/TVNet.
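The abstract states that the hybrid loss acts at the structure, feature, and gradient levels but does not give its formulation. Purely as an illustration, the sketch below combines three terms that are common in image fusion: a structural term based on a simplified whole-image SSIM (standard SSIM uses local windows), a feature term comparing the fused image's features with the element-wise maximum of the source features, and a gradient term pulling the fused gradient magnitude toward the stronger of the two source gradients. All function names, the equal default weights, the max-selection targets, and the identity stand-in for the feature extractor (where a pretrained network would normally go) are assumptions, not the authors' implementation; the repository linked above holds the actual code.

```python
import numpy as np

# Hypothetical sketch of a three-level fusion loss; NOT TVNet's published loss.

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over the whole image (simplified; real SSIM uses local windows)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def gradient_magnitude(img):
    """L1 approximation of gradient magnitude using central differences."""
    gy, gx = np.gradient(img)  # derivatives along rows (axis 0) and columns (axis 1)
    return np.abs(gx) + np.abs(gy)

def structure_loss(fused, a, b):
    # Encourage structural similarity to both source images.
    return 1.0 - 0.5 * (global_ssim(fused, a) + global_ssim(fused, b))

def feature_loss(fused, a, b, extract=lambda img: img):
    # Assumption: `extract` stands in for a feature network; identity by default.
    fa, fb, ff = extract(a), extract(b), extract(fused)
    return np.mean((ff - np.maximum(fa, fb)) ** 2)

def gradient_loss(fused, a, b):
    # Keep the sharper edge from either source at every pixel.
    target = np.maximum(gradient_magnitude(a), gradient_magnitude(b))
    return np.mean(np.abs(gradient_magnitude(fused) - target))

def hybrid_loss(fused, a, b, weights=(1.0, 1.0, 1.0), extract=lambda img: img):
    """Weighted sum of structure, feature, and gradient terms (hypothetical weights)."""
    w_s, w_f, w_g = weights
    return (w_s * structure_loss(fused, a, b)
            + w_f * feature_loss(fused, a, b, extract)
            + w_g * gradient_loss(fused, a, b))
```

When the fused image equals both sources, every term is approximately zero; a blank fused image is penalized by all three terms. In a trained network these terms would be computed on framework tensors so that they are differentiable, with weights tuned by ablation.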

Keywords: Medical image fusion; convolutional neural network; long-range dependencies; multiscale features; vision transformer.


Conflict of interest statement

Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Architecture of the proposed TVNet.
Figure 2.
One-Shot Aggregation (OSA) module.
Figure 3.
Vision Transformer (ViT) module.
Figure 4.
Coordinate attention block.
Figure 5.
Subpixel context enhancement module.
Figure 6.
Comparison on computed tomography-magnetic resonance imaging (CT-MRI) fusion.
Figure 7.
Comparison on positron emission tomography-magnetic resonance imaging (PET-MRI) fusion.
Figure 8.
Comparison on single-photon emission computed tomography-magnetic resonance imaging (SPECT-MRI) fusion.
Figure 9.
Fusion performance of different disease images.
Figure 10.
Histogram of average values of six metrics.
Figure 11.
Fused images of computed tomography-magnetic resonance imaging (CT-MRI), positron emission tomography-magnetic resonance imaging (PET-MRI), and single-photon emission computed tomography-magnetic resonance imaging (SPECT-MRI) using different modules.
Figure 12.
Ablation experiments based on different loss functions.

