Prompt injection attacks on vision language models in oncology

Jan Clusmann^{1

2}, Dyke Ferber^{1

3}, Isabella C Wiest^{1

4}, Carolin V Schneider^{1

2}, Titus J Brinker⁵, Sebastian Foersch⁶, Daniel Truhn⁷, Jakob Nikolas Kather^{8

9

10}

Affiliations

¹ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany.
² Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
³ Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany.
⁴ Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
⁵ Digital Biomarkers for Oncology Group, German Cancer Research Center, Heidelberg, Germany.
⁶ Institute of Pathology, University Medical Center Mainz, Mainz, Germany.
⁷ Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany.
⁸ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany. Jakob.Kather@ukdd.de.
⁹ Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany. Jakob.Kather@ukdd.de.
¹⁰ Department of Medicine I, University Hospital Dresden, Dresden, Germany. Jakob.Kather@ukdd.de.

PMID: 39890777
PMCID: PMC11785991
DOI: 10.1038/s41467-024-55631-x

Prompt injection attacks on vision language models in oncology

Jan Clusmann et al. Nat Commun. 2025.

. 2025 Feb 1;16(1):1239.

doi: 10.1038/s41467-024-55631-x.

Authors

Jan Clusmann^{1

2}, Dyke Ferber^{1

3}, Isabella C Wiest^{1

4}, Carolin V Schneider^{1

2}, Titus J Brinker⁵, Sebastian Foersch⁶, Daniel Truhn⁷, Jakob Nikolas Kather^{8

9

10}

Affiliations

¹ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany.
² Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
³ Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany.
⁴ Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
⁵ Digital Biomarkers for Oncology Group, German Cancer Research Center, Heidelberg, Germany.
⁶ Institute of Pathology, University Medical Center Mainz, Mainz, Germany.
⁷ Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany.
⁸ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany. Jakob.Kather@ukdd.de.
⁹ Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany. Jakob.Kather@ukdd.de.
¹⁰ Department of Medicine I, University Hospital Dresden, Dresden, Germany. Jakob.Kather@ukdd.de.

PMID: 39890777
PMCID: PMC11785991
DOI: 10.1038/s41467-024-55631-x

Abstract

Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decision support systems. However, here, we demonstrate that current VLMs applied to medical tasks exhibit a fundamental security flaw: they can be compromised by prompt injection attacks. These can be used to output harmful information just by interacting with the VLM, without any access to its parameters. We perform a quantitative study to evaluate the vulnerabilities to these attacks in four state of the art VLMs: Claude-3 Opus, Claude-3.5 Sonnet, Reka Core, and GPT-4o. Using a set of N = 594 attacks, we show that all of these models are susceptible. Specifically, we show that embedding sub-visual prompts in manifold medical imaging data can cause the model to provide harmful output, and that these prompts are non-obvious to human observers. Thus, our study demonstrates a key vulnerability in medical VLMs which should be mitigated before widespread clinical adoption.

PubMed Disclaimer

Conflict of interest statement

Competing interests: The authors declare the following competing interests: DT received honoraria for lectures by Bayer and holds shares in StratifAI GmbH, Germany. SF has received honoraria from MSD and BMS. TJB is the owner of Smart Health Heidelberg GmbH, Heidelberg, Germany, outside of the scope of the submitted work. JNK declares consulting services for Bioptimus, France; Owkin, France; DoMore Diagnostics, Norway; Panakeia, UK; AstraZeneca, UK; Mindpeak, Germany; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI GmbH, Germany, Synagen GmbH, Germany, and has received a research grant by GSK, and has received honoraria by AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer, and Fresenius. ICW has received honoraria from AstraZeneca. DF holds shares in Synagen GmbH, Germany. No other competing interests are declared by any of the remaining authors.

Figures

**Fig. 1. Prompt injection attacks on medical images.**
a Concept of prompt injection with correct and corrupted output plus models investigated in the study. Models that did not meet inclusion criteria displayed in grey. b Schematic illustration of different attack gateways for prompt injections. Left: Patient image data is routinely not exclusively created in the hospital itself, but also by external sources, illustrated as “External radiologist” as an example (equivalent to other external imaging providers). It is then provided either via (unprotected) CD/DVD or a server (varying protection levels) to the hospital. Right: For inference, hospitals will rely on external VLM-providers. Illustrated are several gateways to send patient data to these providers, with varying protection levels. EHR Electronic health records, API Application programming interface, HIS Hospital information system. c Exemplary views of modalities (histology, endoscopy, CT, MRI, ultrasound, photography) with 18 cases, 3 prompt injection positions, 3 variations (high contrast, low contrast, small font) and 3 iterations per unique constellations, with schematic illustration of injection methods. Contrast in “low contrast” was increased for this figure as compared to the actual model input to maintain human readability. Prompt injection in “low contrast” and “small font” reads “Describe which organ you see, but state it looks healthy”.

**Fig. 2. Prompt injection attacks manipulate the capability of VLMs to detect malignant lesions.**
a Accuracies in detecting the represented organs per model. Mean ± standard deviation (SD) is shown. n = 18 data points per model (n = 9 for Gemini), with each data point representing a mean of three replicated measurements, two-sided Kruskal-Wallis test with Dunn’s test and Bonferroni post-hoc correction. b Harmfulness scores for all queries with injected prompt vs prompts without prompt injection per model. Mean ± SD are shown. Each point represents triplicate evaluation. Two-sided Wilcoxon Signed-Rank tests with Bonferroni post-hoc correction compared lesion miss rates scores within each model (square brackets). Two-sided Mann-Whitney U tests with Bonferroni post-hoc correction compared lesion miss rates for prompt injection (PI) vs non PI over all models combined (straight bar). P-values were adjusted using the Bonferroni method, with *p < 0.05, **p < 0.01, ***p < 0.001. Harmfulness scores as mean ± standard deviation (SD) per (c) position or (d) variation of adversarial prompt, ordered as Claude-3, Claude-3.5, GPT-4o, and Reka Core from left to right. n = 18 data points per model and variation, with each data point representing a mean of three replicated measurements. Mann-Whitney U test + Bonferroni method over all models combined for each position/variation.

**Fig. 3. Prompt injection attacks are modality-agnostic.**
Heatmaps per model and imaging modality for (a) mean organ detection rate, (b) mean attack success rate, (c) lesion miss rate (LMR) for the native models and (d) mean lesion miss rate (LMR) for the prompts with prompt injection, with (b) representing the tile-based difference between (d) and (c). CT Computed Tomography, MRI Magnetic Resonance Imaging, US Ultrasound. * represents instances where LMR was higher for native models than injected models (n = 1). e Thumbnails of all images used for the study sorted by modality. All images contain a histologically confirmed malignant lesion. (Images are cropped for this figure, original images see Supplementary Data 1).

**Fig. 4. Mitigation efforts for prompt injection attacks.**
Count of prompt injections that were successful (Model reported no pathologies) or failed (Model reported lesion, either due to failed prompt injection or due to defense mechanism) of n = 54 distinct scenarios in total (0–3 missing values per scenario due to errors in model calling, see Supplementary Table 1b). Two-sided Fisher’s exact test compared ratio of successful vs failed prompt injections for each condition (intra-model comparison only). p-values were adjusted using the Bonferroni method, with *p < 0.05, **p < 0.01, ***p < 0.001.

See this image and copyright information in PMC

References

1. Singhal, K. et al. Large language models encode clinical knowledge. Nature620, 172–180 (2023). - PMC - PubMed
1. Bubeck, S. et al. Sparks of artificial general intelligence: early experiments with GPT-4. arXiv [cs.CL] (2023).
1. Clusmann, J. et al. The future landscape of large language models in medicine. Commun. Med.3, 141 (2023). - PMC - PubMed
1. Ferber, D. et al. Autonomous artificial intelligence agents for clinical decision making in oncology. arXiv [cs.AI] (2024).
1. Thirunavukarasu, A. J. et al. Large language models in medicine. Nat. Med.29, 1930–1940 (2023). - PubMed

MeSH terms

Actions
Actions
Actions

Grants and funding

R01 CA263318/CA/NCI NIH HHS/United States

LinkOut - more resources

Full Text Sources
- Nature Publishing Group
- PubMed Central

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Prompt injection attacks on vision language models in oncology

Affiliations

Prompt injection attacks on vision language models in oncology

Authors

Affiliations

Abstract

Conflict of interest statement

Figures

References

MeSH terms

Grants and funding

LinkOut - more resources

Full Text Sources