Red teaming ChatGPT in medicine to yield real-world insights on model behavior
- PMID: 40055532
- PMCID: PMC11889229
- DOI: 10.1038/s41746-025-01542-0
Abstract
Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical to improving the equity and accuracy of large language models, but red teaming by groups unaffiliated with model creators is scant in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants total) to stress-test models with real-world clinical cases and to categorize inappropriate responses along the axes of safety, privacy, hallucinations/accuracy, and bias. Six medically trained reviewers re-analyzed the prompt-response pairs and added qualitative annotations. Of 376 unique prompts (1504 responses), 20.1% were inappropriate (GPT-3.5: 25.8%; GPT-4.0: 16%; GPT-4.0 with Internet: 17.8%). Subsequently, we show the utility of our benchmark by testing GPT-4o, a model released after our event (20.4% inappropriate). 21.5% of responses that were appropriate with GPT-3.5 were inappropriate in updated models. We share insights for constructing red teaming prompts and present our benchmark for iterative model assessments.
© 2025. The Author(s).
Conflict of interest statement
Competing interests: RD has served as an advisor to MDAlgorithms and Revea; received consulting fees from Pfizer, L’Oreal, Frazier Healthcare Partners, and DWA, and research funding from UCB; and declares no non-financial competing interests. All other authors declare no financial or non-financial competing interests.