PLoS One. 2025 Aug 22;20(8):e0330328. doi: 10.1371/journal.pone.0330328. eCollection 2025.

Progressive decomposition of infrared and visible image fusion network with joint transformer and Resnet


Fang Zhu et al. PLoS One. 2025.

Abstract

The objective of image fusion is to synthesize information from multiple source images into a single, information-rich, high-quality composite, thereby enhancing both human visual interpretation and machine perception. This process also establishes a robust foundation for downstream image-related tasks. Nevertheless, current deep learning-based networks frequently neglect the distinctive features inherent in the source images, making it difficult to balance basic and detailed features effectively. To address this limitation, we introduce a progressive decomposition network that integrates the Lite Transformer (LT) and a ResNet architecture for infrared and visible image fusion (IVIF). Our methodology unfolds in three principal stages. Initially, a foundational convolutional neural network (CNN) is deployed to extract coarse-scale features from the source images; the LT is then employed to bifurcate these coarse features into basic and detailed feature components. In the second stage, to augment the detail information extracted across layers, we replace the conventional ResNet preprocessing with a combination of the coarse features and the LT module. Cascaded LT operations are implemented following the first two ResNet blocks (ResB), enabling two-branch feature extraction from these reconfigured blocks. The final stage involves the design of specialized fusion sub-networks to process the basic and detail information blocks extracted from different layers. These processed feature blocks are then channeled through a semantic injection module (SIM) and Transformer decoders to generate the fused image. Complementing this architecture, we have developed a semantic information extraction module that aligns with the progressive inter-layer detail extraction framework.
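The abstract does not specify the internals of the LT split, but the basic/detail decomposition it describes can be illustrated with a toy stand-in: a moving-average low-pass filter plays the role of the "basic" branch and the residual plays the role of the "detail" branch (an assumption for illustration only, not the paper's Lite Transformer).

```python
def decompose(signal, k=3):
    """Toy base/detail split over a 1-D feature vector.

    A moving average of window k stands in for the 'basic' (low-frequency)
    component; the residual stands in for the 'detail' component. By
    construction, basic[i] + detail[i] reconstructs signal[i] exactly.
    """
    n = len(signal)
    basic = []
    for i in range(n):
        lo, hi = max(0, i - k // 2), min(n, i + k // 2 + 1)
        basic.append(sum(signal[lo:hi]) / (hi - lo))
    detail = [s - b for s, b in zip(signal, basic)]
    return basic, detail
```

Any low-pass/residual pair has this lossless-reconstruction property, which is what lets the fused image be rebuilt from separately processed basic and detail streams.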
The LT module is strategically embedded within the ResNet network architecture to optimize the extraction of both basic and detailed features across diverse layers. Moreover, we introduce a novel correlation loss function that operates on the basic and detail information between layers, facilitating the correlation of basic features while maintaining the independence of detail features across layers. Through comprehensive qualitative and quantitative analyses conducted on multiple infrared-visible datasets, we demonstrate the superior potential of our proposed network for advanced visual tasks. Our network exhibits remarkable performance in detail extraction, significantly outperforming existing deep learning methodologies in this domain.
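The exact formulation of the correlation loss is not given in the abstract. As a minimal pure-Python sketch, assuming Pearson correlation as the correlation measure, a loss that pulls inter-layer basic features toward correlation 1 while pushing inter-layer detail features toward correlation 0 might look like:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy + 1e-12)  # eps guards against zero variance

def correlation_loss(basic_a, basic_b, detail_a, detail_b):
    """Hypothetical inter-layer correlation loss (sketch, not the paper's).

    basic_term is minimized when basic features of the two layers are
    perfectly correlated; detail_term is minimized when detail features
    are uncorrelated (independent across layers).
    """
    basic_term = 1.0 - pearson(basic_a, basic_b)
    detail_term = abs(pearson(detail_a, detail_b))
    return basic_term + detail_term
```

With perfectly correlated basic vectors and uncorrelated detail vectors, both terms vanish, matching the stated goal of correlated basic features and independent detail features across layers.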


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Provides a visual comparison of different fusion methods.
Fig 2. A comparative analysis of streamlined fusion frameworks.
Fig 3. Overall framework diagram.
Fig 4. Annotated supplementary documentation for the comprehensive framework schematic.
Fig 5. A practical demonstration of key modules.
Fig 6. IR & VIS image encoding flowchart.
Fig 7. IR & VIS image decoding flowchart.
Fig 8. Qualitative comparison of ten fusion methods on scene 05005 from the RoadScene dataset.
Fig 9. Localized zoom effects for ten fusion results of the 05005 scene.
Fig 10. Cumulative distribution of the six metrics on the RoadScene dataset.
Fig 11. Qualitative comparison of ten fusion methods on scene 00754N from the MSRS dataset.
Fig 12. Cumulative distribution of the six metrics on the MSRS dataset.
Fig 13. Qualitative comparison of ten fusion methods on scene 01443 from the M3FD dataset.
Fig 14. Cumulative distribution of the six metrics on the M3FD dataset.
Fig 15. Segmentation results of ten fusion methods under BANet.
Fig 16. Visualization results of ablation experiments.

