Intensive vision-guided network for radiology report generation
- PMID: 38157546
- DOI: 10.1088/1361-6560/ad1995
Abstract
Objective. Automatic radiology report generation is booming due to its huge application potential for the healthcare industry. However, existing computer vision and natural language processing approaches to this problem are limited in two respects. First, when extracting image features, most of them neglect multi-view reasoning and model only a single-view structure of medical images, such as the space view or channel view; yet in daily clinical diagnosis, clinicians rely on multi-view imaging information for comprehensive judgment. Second, when generating reports, they overlook context reasoning with multi-modal information and focus on purely textual optimization using retrieval-based methods. We aim to address these two issues by proposing a model that better simulates clinicians' perspectives and generates more accurate reports.
Approach. To address the above limitation in feature extraction, we propose a globally-intensive attention (GIA) module in the medical image encoder to simulate and integrate multi-view visual perception. GIA learns three types of vision perception: depth view, space view, and pixel view. To address the above problem in report generation, we explore how to involve multi-modal signals in generating precisely matched reports, i.e. how to integrate previously predicted words with region-aware visual content when predicting the next word. Specifically, we design a visual knowledge-guided decoder (VKGD), which adaptively determines how much the model should rely on visual information and on previously predicted text to assist next-word prediction. Our final intensive vision-guided network framework thus comprises a GIA-guided visual encoder and the VKGD.
Main results. Experiments on two commonly used datasets, IU X-RAY and MIMIC-CXR, demonstrate the superior performance of our method compared with other state-of-the-art approaches.
Significance. Our model explores the potential of simulating clinicians' perspectives and automatically generates more accurate reports, which promotes the exploration of medical automation and intelligence.
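The abstract does not give implementation details for the GIA module. The following is a minimal PyTorch sketch of one plausible reading, in which depth-view (channel-wise), space-view (spatial), and pixel-view (per-position) attention branches each re-weight a CNN feature map and are fused by summation. The class name GIABlock, the branch designs, and the additive fusion are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of a globally-intensive attention (GIA) block:
    # three attention "views" over one feature map, fused by summation.
    import torch
    import torch.nn as nn

    class GIABlock(nn.Module):  # name and design are assumptions
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            # Depth view: channel attention (squeeze-and-excitation style).
            self.depth = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1),
                nn.Sigmoid(),
            )
            # Space view: a single-channel spatial attention map.
            self.space = nn.Sequential(
                nn.Conv2d(channels, 1, kernel_size=7, padding=3),
                nn.Sigmoid(),
            )
            # Pixel view: per-position, per-channel gating.
            self.pixel = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Each branch re-weights the input from a different view;
            # summing the three re-weighted maps is one plausible fusion.
            return x * self.depth(x) + x * self.space(x) + x * self.pixel(x)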
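Similarly, the adaptive behavior described for the VKGD (weighing visual information against previously predicted text at each decoding step) can be sketched as a learned gate. The sketch below, assuming hypothetical names AdaptiveVisualGate, text_state, and visual_ctx, shows the general gating idea rather than the paper's actual decoder.

    # Hypothetical sketch of the VKGD's adaptive fusion idea: a learned
    # gate mixes region-aware visual context with the textual hidden state
    # before predicting the next word.
    import torch
    import torch.nn as nn

    class AdaptiveVisualGate(nn.Module):  # illustrative, not the authors' code
        def __init__(self, hidden_dim: int, vocab_size: int):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Linear(2 * hidden_dim, hidden_dim),
                nn.Sigmoid(),
            )
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, text_state: torch.Tensor, visual_ctx: torch.Tensor):
            # text_state: (B, H) summary of previously predicted words;
            # visual_ctx: (B, H) attended region-aware visual features.
            g = self.gate(torch.cat([text_state, visual_ctx], dim=-1))
            fused = g * visual_ctx + (1.0 - g) * text_state  # per-dim mixing
            return self.out(fused)  # next-word logits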
Keywords: multimodal learning; radiology report generation; visual reasoning; x-ray images.
© 2024 Institute of Physics and Engineering in Medicine.
Similar articles
- Translating medical image to radiological report: Adaptive multilevel multi-attention approach. Comput Methods Programs Biomed. 2022 Jun;221:106853. doi: 10.1016/j.cmpb.2022.106853. PMID: 35561439.
- S3-Net: A Self-Supervised Dual-Stream Network for Radiology Report Generation. IEEE J Biomed Health Inform. 2024 Mar;28(3):1448-1459. doi: 10.1109/JBHI.2023.3345932. PMID: 38133975.
- Radiology report generation with a learned knowledge base and multi-modal alignment. Med Image Anal. 2023 May;86:102798. doi: 10.1016/j.media.2023.102798. PMID: 36989850.
- A survey of deep-learning-based radiology report generation using multimodal inputs. Med Image Anal. 2025 Jul;103:103627. doi: 10.1016/j.media.2025.103627. PMID: 40382855. Review.
- Integrating language into medical visual recognition and reasoning: A survey. Med Image Anal. 2025 May;102:103514. doi: 10.1016/j.media.2025.103514. PMID: 40023891. Review.
Cited by
- A vision attention driven Language framework for medical report generation. Sci Rep. 2025 Mar 28;15(1):10704. doi: 10.1038/s41598-025-95666-8. PMID: 40155699.