Med Image Underst Anal. 2022 Jul;13413:187-198. doi: 10.1007/978-3-031-12053-4_14.

Weakly Supervised Captioning of Ultrasound Images


Mohammad Alsharid et al. Med Image Underst Anal. 2022 Jul.

Abstract

Medical image captioning models generate text describing the semantic content of an image, helping non-experts understand and interpret it. We propose a weakly supervised approach that improves the performance of image captioning models on small image-text datasets by leveraging a large anatomically-labelled image classification dataset. Our method uses an encoder-decoder sequence-to-sequence model to generate pseudo-captions (weak labels) for caption-less but anatomically-labelled (class-labelled) images. The augmented dataset is then used to train an image captioning model in a weakly supervised manner. For fetal ultrasound, the proposed augmentation outperforms the baseline on semantics- and syntax-based metrics, with nearly twice the improvement on BLEU-1 and ROUGE-L. Moreover, models trained with the proposed data augmentation are superior to those trained with existing regularization techniques. This work enables seamless automatic annotation of images that lack human-prepared descriptive captions for training image captioning models. Pseudo-captions in the training data are particularly useful for medical image captioning, where obtaining real captions demands significant time and effort from medical experts.
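The augmentation workflow described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: `make_pseudo_caption` stands in for the trained sequence-to-sequence model with a simple template, and the dataset shapes and field names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Example:
    image_id: str
    caption: str      # real caption or pseudo-caption
    is_pseudo: bool   # weak-label flag


def make_pseudo_caption(label: str, nouns: list[str]) -> str:
    """Placeholder for the trained sequence-to-sequence model, which maps an
    anatomical label plus extracted nouns to a pseudo-caption. Here a fixed
    template keeps the sketch runnable."""
    return f"this is the {label} " + " ".join(nouns)


def augment(captioned, class_labelled):
    """Merge the small image-text dataset with weak labels generated for the
    caption-less, class-labelled images."""
    data = [Example(img, cap, False) for img, cap in captioned]
    data += [Example(img, make_pseudo_caption(label, nouns), True)
             for img, (label, nouns) in class_labelled]
    return data


# Illustrative inputs (hypothetical image ids and text).
real = [("img001", "the fetal spine is visible here")]
weak = [("img002", ("spine", ["end", "spine"]))]
train_set = augment(real, weak)   # two examples: one real, one pseudo
```

The captioning model is then trained on `train_set`, treating the pseudo-captioned examples as weak supervision alongside the real image-text pairs.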

Keywords: Data Augmentation; Fetal Ultrasound; Image Captioning.


Figures

Fig. 1
The sequence-to-sequence model architecture translates the input sequence (the anatomical label followed by the extracted nouns) into a pseudo-caption. In the example input sequence, the first ‘spine’ is the label, and ‘end’ and the second ‘spine’ are the extracted nouns. Both the encoder and decoder consist of 100 LSTM units. The word embedding size is 300.
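The encoder-decoder described in the caption can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions, not the authors' code (their framework is not named here): the vocabulary size and token ids are placeholders, while the 300-dim embeddings and 100 LSTM units follow the caption.

```python
import torch
import torch.nn as nn


class Seq2Seq(nn.Module):
    """Encoder-decoder sketch matching the figure: 300-dim word embeddings,
    100 LSTM units in both encoder and decoder. vocab_size is an assumption."""

    def __init__(self, vocab_size=1000, embed_dim=300, hidden=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        # Encode the label + extracted nouns; pass the final state to the decoder.
        _, state = self.encoder(self.embed(src))
        dec_out, _ = self.decoder(self.embed(tgt), state)
        return self.out(dec_out)   # logits over the vocabulary per target step


model = Seq2Seq()
src = torch.tensor([[1, 2, 1]])      # e.g. ids for 'spine', 'end', 'spine'
tgt = torch.tensor([[3, 4, 5, 6]])   # shifted pseudo-caption tokens
logits = model(src, tgt)             # shape (1, 4, vocab_size)
```

At generation time the decoder would instead be run autoregressively, feeding back its own predictions until an end token is produced.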
Fig. 2
The image captioning model. ‘Max length’ is the maximum number of words a caption for the given anatomical structure may contain. The LSTM-RNN consists of 300 units.
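A merge-style captioning model of the kind the caption describes can be sketched as below. Again a hedged sketch, not the authors' architecture: the image-feature dimension, vocabulary size, and the concatenation scheme are assumptions; only the 300 LSTM units come from the caption.

```python
import torch
import torch.nn as nn


class CaptioningModel(nn.Module):
    """Sketch of an image captioning model: a 300-unit LSTM encodes the
    partial caption, image features are projected to the same width, and
    the two are merged to predict the next word. feat_dim and vocab_size
    are placeholder assumptions."""

    def __init__(self, feat_dim=2048, vocab_size=1000, embed_dim=300, hidden=300):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, hidden)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, img_feats, words):
        txt, _ = self.lstm(self.embed(words))         # (B, T, 300)
        img = self.img_proj(img_feats).unsqueeze(1)   # (B, 1, 300)
        img = img.expand(-1, txt.size(1), -1)         # repeat per time step
        return self.out(torch.cat([txt, img], dim=-1))  # next-word logits


model = CaptioningModel()
feats = torch.randn(2, 2048)            # hypothetical CNN image features
words = torch.randint(0, 1000, (2, 5))  # partial captions, batch of 2
logits = model(feats, words)            # shape (2, 5, vocab_size)
```

Decoding would proceed word by word up to ‘max length’ for the anatomical structure in question.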
Fig. 3
Qualitative results for different images. GT: ground truth as spoken by a sonographer; NP: model trained with no pseudo-captions; WD: model regularized with word dropout; WP: model trained with pseudo-captions (our proposed method).

