Medical image captioning via generative pretrained transformers

Alexander Selivanov^#¹,², Oleg Y Rogov^#¹, Daniil Chesakov¹,³, Artem Shelmanov¹,³, Irina Fedulova², Dmitry V Dylov⁴

Affiliations

¹ Skolkovo Institute of Science and Technology, Bolshoy blvd., 30/1, Moscow, 121205, Russia.
² Philips (Russia), Skolkovo Technopark 42, Building 1, Bolshoi Boulevard, Moscow, 121205, Russia.
³ AIRI, Kutuzovsky Ave, 32 bld. 1, Moscow, 121170, Russia.
⁴ Skolkovo Institute of Science and Technology, Bolshoy blvd., 30/1, Moscow, 121205, Russia. d.dylov@skoltech.ru.

^# Contributed equally.

PMID: 36914733
PMCID: PMC10010644
DOI: 10.1038/s41598-023-31223-5

Free PMC article

Alexander Selivanov et al. Sci Rep. 2023 Mar 13;13(1):4171.

Abstract

The proposed model for automatic clinical image caption generation combines the analysis of radiological scans with structured patient information from textual records. It uses two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records. The generated textual summary contains essential information about the pathologies found and their locations, along with 2D heatmaps that localize each pathology on the scans. The model was tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO dataset. Results measured with natural language assessment metrics demonstrated its applicability to chest X-ray image captioning.
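The abstract does not name the natural language assessment metrics used. As a rough illustration of how such metrics score a generated caption against a reference report, the sketch below computes clipped n-gram precision, the overlap quantity underlying BLEU-style scores. The function name and both example sentences are hypothetical and are not taken from the paper or its datasets.

```python
from collections import Counter


def clipped_ngram_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (count clipping).
    """
    def ngrams(tokens):
        # Count every contiguous run of n tokens.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.lower().split())
    ref = ngrams(reference.lower().split())
    total = sum(cand.values())
    if total == 0:
        # Candidate is shorter than n tokens: nothing to match.
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical captions, invented for illustration only.
generated = "mild cardiomegaly with no pleural effusion"
reference = "mild cardiomegaly no pleural effusion or pneumothorax"

unigram_score = clipped_ngram_precision(generated, reference, n=1)
bigram_score = clipped_ngram_precision(generated, reference, n=2)
print(f"unigram precision: {unigram_score:.3f}")
print(f"bigram precision:  {bigram_score:.3f}")
```

Precision alone rewards short captions, so full metrics of this family usually combine several n-gram orders with a length penalty. The sketch omits both for brevity.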

