Sci Rep. 2023 Mar 13;13(1):4171.
doi: 10.1038/s41598-023-31223-5.

Medical image captioning via generative pretrained transformers

Alexander Selivanov et al. Sci Rep.

Abstract

The proposed model for automatic clinical image caption generation combines the analysis of radiological scans with structured patient information from textual records. It uses two language models, Show-Attend-Tell (SAT) and GPT-3, to generate comprehensive and descriptive radiology records. The generated textual summary contains essential information about the pathologies found and their locations, and is accompanied by 2D heatmaps that localize each pathology on the scans. The model was tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO dataset. Results measured with natural language assessment metrics indicate that it applies effectively to chest X-ray image captioning.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Attention module used in SAT.
Figure 2. The first approach. Learn the joint distribution of two models.
Figure 3. The second approach. Pretrained GPT-3 (B) continues text generated by SAT (A).
Figure 4. Image sample cases with the disease classes (DC) along with original (ground truth) and generated reports by the proposed SAT + GPT-3 model implemented as in Approach 1 and 2, respectively. Insets in the upper corners of the original images feature localization heatmaps. Heatmaps are generated using Matplotlib v.3.7.0.
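
The attention module named in Figure 1 and the localization heatmaps mentioned in Figure 4 can be illustrated with a minimal, dependency-free sketch of soft attention in the style of the original Show-Attend-Tell formulation. This is not the authors' implementation. The additive scoring function, the parameter names (`W_a`, `W_h`, `v`), the grid size, and the random inputs are all assumptions made for illustration. The sketch scores each image region against the decoder state, normalizes the scores with a softmax, and reshapes the resulting weights into a 2D grid that could be rendered as a heatmap.

```python
import math
import random


def soft_attention(features, hidden, W_a, W_h, v):
    """Return (alpha, context) for one decoding step.

    features: list of L region vectors, each of length D
    hidden:   decoder state of length H
    W_a (D x K), W_h (H x K), v (length K): hypothetical learned parameters
    """
    K = len(v)
    # Projection of the decoder state, shared by all regions.
    h_proj = [sum(hidden[j] * W_h[j][k] for j in range(len(hidden)))
              for k in range(K)]
    scores = []
    for a in features:
        a_proj = [sum(a[d] * W_a[d][k] for d in range(len(a)))
                  for k in range(K)]
        # Additive scoring (assumed): e_i = v . tanh(W_a a_i + W_h h)
        scores.append(sum(v[k] * math.tanh(a_proj[k] + h_proj[k])
                          for k in range(K)))
    # Softmax over regions, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Context vector: attention-weighted average of the region features.
    D = len(features[0])
    context = [sum(alpha[i] * features[i][d] for i in range(len(features)))
               for d in range(D)]
    return alpha, context


def to_heatmap(alpha, side):
    """Reshape a flat list of side*side attention weights into a 2D grid."""
    return [alpha[r * side:(r + 1) * side] for r in range(side)]


def rand_matrix(rows, cols, rng):
    """Random stand-in for trained weights (illustration only)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
SIDE, D, H, K = 4, 8, 6, 5  # toy sizes, chosen arbitrarily
features = rand_matrix(SIDE * SIDE, D, rng)  # one vector per image region
hidden = [rng.gauss(0.0, 1.0) for _ in range(H)]
W_a, W_h = rand_matrix(D, K, rng), rand_matrix(H, K, rng)
v = [rng.gauss(0.0, 1.0) for _ in range(K)]

alpha, context = soft_attention(features, hidden, W_a, W_h, v)
heatmap = to_heatmap(alpha, SIDE)
```

In a setup like this, the `heatmap` grid would typically be upsampled to the scan resolution and overlaid on the image, for example with Matplotlib's `imshow`. How the paper's heatmaps are actually derived from the model is not stated in this record.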

