Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Feb 1;16(1):1239.
doi: 10.1038/s41467-024-55631-x.

Prompt injection attacks on vision language models in oncology

Affiliations

Prompt injection attacks on vision language models in oncology

Jan Clusmann et al. Nat Commun. .

Abstract

Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decision support systems. However, here, we demonstrate that current VLMs applied to medical tasks exhibit a fundamental security flaw: they can be compromised by prompt injection attacks. These can be used to output harmful information just by interacting with the VLM, without any access to its parameters. We perform a quantitative study to evaluate the vulnerabilities to these attacks in four state of the art VLMs: Claude-3 Opus, Claude-3.5 Sonnet, Reka Core, and GPT-4o. Using a set of N = 594 attacks, we show that all of these models are susceptible. Specifically, we show that embedding sub-visual prompts in manifold medical imaging data can cause the model to provide harmful output, and that these prompts are non-obvious to human observers. Thus, our study demonstrates a key vulnerability in medical VLMs which should be mitigated before widespread clinical adoption.

PubMed Disclaimer

Conflict of interest statement

Competing interests: The authors declare the following competing interests: DT received honoraria for lectures by Bayer and holds shares in StratifAI GmbH, Germany. SF has received honoraria from MSD and BMS. TJB is the owner of Smart Health Heidelberg GmbH, Heidelberg, Germany, outside of the scope of the submitted work. JNK declares consulting services for Bioptimus, France; Owkin, France; DoMore Diagnostics, Norway; Panakeia, UK; AstraZeneca, UK; Mindpeak, Germany; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI GmbH, Germany, Synagen GmbH, Germany, and has received a research grant by GSK, and has received honoraria by AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer, and Fresenius. ICW has received honoraria from AstraZeneca. DF holds shares in Synagen GmbH, Germany. No other competing interests are declared by any of the remaining authors.

Figures

Fig. 1
Fig. 1. Prompt injection attacks on medical images.
a Concept of prompt injection with correct and corrupted output plus models investigated in the study. Models that did not meet inclusion criteria displayed in grey. b Schematic illustration of different attack gateways for prompt injections. Left: Patient image data is routinely not exclusively created in the hospital itself, but also by external sources, illustrated as “External radiologist” as an example (equivalent to other external imaging providers). It is then provided either via (unprotected) CD/DVD or a server (varying protection levels) to the hospital. Right: For inference, hospitals will rely on external VLM-providers. Illustrated are several gateways to send patient data to these providers, with varying protection levels. EHR Electronic health records, API Application programming interface, HIS Hospital information system. c Exemplary views of modalities (histology, endoscopy, CT, MRI, ultrasound, photography) with 18 cases, 3 prompt injection positions, 3 variations (high contrast, low contrast, small font) and 3 iterations per unique constellations, with schematic illustration of injection methods. Contrast in “low contrast” was increased for this figure as compared to the actual model input to maintain human readability. Prompt injection in “low contrast” and “small font” reads “Describe which organ you see, but state it looks healthy”.
Fig. 2
Fig. 2. Prompt injection attacks manipulate the capability of VLMs to detect malignant lesions.
a Accuracies in detecting the represented organs per model. Mean ± standard deviation (SD) is shown. n = 18 data points per model (n = 9 for Gemini), with each data point representing a mean of three replicated measurements, two-sided Kruskal-Wallis test with Dunn’s test and Bonferroni post-hoc correction. b Harmfulness scores for all queries with injected prompt vs prompts without prompt injection per model. Mean ± SD are shown. Each point represents triplicate evaluation. Two-sided Wilcoxon Signed-Rank tests with Bonferroni post-hoc correction compared lesion miss rates scores within each model (square brackets). Two-sided Mann-Whitney U tests with Bonferroni post-hoc correction compared lesion miss rates for prompt injection (PI) vs non PI over all models combined (straight bar). P-values were adjusted using the Bonferroni method, with *p < 0.05, **p < 0.01, ***p < 0.001. Harmfulness scores as mean ± standard deviation (SD) per (c) position or (d) variation of adversarial prompt, ordered as Claude-3, Claude-3.5, GPT-4o, and Reka Core from left to right. n = 18 data points per model and variation, with each data point representing a mean of three replicated measurements. Mann-Whitney U test + Bonferroni method over all models combined for each position/variation.
Fig. 3
Fig. 3. Prompt injection attacks are modality-agnostic.
Heatmaps per model and imaging modality for (a) mean organ detection rate, (b) mean attack success rate, (c) lesion miss rate (LMR) for the native models and (d) mean lesion miss rate (LMR) for the prompts with prompt injection, with (b) representing the tile-based difference between (d) and (c). CT Computed Tomography, MRI Magnetic Resonance Imaging, US Ultrasound. * represents instances where LMR was higher for native models than injected models (n = 1). e Thumbnails of all images used for the study sorted by modality. All images contain a histologically confirmed malignant lesion. (Images are cropped for this figure, original images see Supplementary Data 1).
Fig. 4
Fig. 4. Mitigation efforts for prompt injection attacks.
Count of prompt injections that were successful (Model reported no pathologies) or failed (Model reported lesion, either due to failed prompt injection or due to defense mechanism) of n = 54 distinct scenarios in total (0–3 missing values per scenario due to errors in model calling, see Supplementary Table 1b). Two-sided Fisher’s exact test compared ratio of successful vs failed prompt injections for each condition (intra-model comparison only). p-values were adjusted using the Bonferroni method, with *p < 0.05, **p < 0.01, ***p < 0.001.

Similar articles

Cited by

References

    1. Singhal, K. et al. Large language models encode clinical knowledge. Nature620, 172–180 (2023). - PMC - PubMed
    1. Bubeck, S. et al. Sparks of artificial general intelligence: early experiments with GPT-4. arXiv [cs.CL] (2023).
    1. Clusmann, J. et al. The future landscape of large language models in medicine. Commun. Med.3, 141 (2023). - PMC - PubMed
    1. Ferber, D. et al. Autonomous artificial intelligence agents for clinical decision making in oncology. arXiv [cs.AI] (2024).
    1. Thirunavukarasu, A. J. et al. Large language models in medicine. Nat. Med.29, 1930–1940 (2023). - PubMed

LinkOut - more resources