Influence of medical educational background on the diagnostic quality of ChatGPT-4 responses in internal medicine: A pilot study
- PMID: 40922522
- PMCID: PMC12517243
- DOI: 10.1111/eci.70113
Abstract
This pilot study evaluated the influence of medical background on the diagnostic quality of ChatGPT-4's responses in Internal Medicine. Third-year students, residents and specialists summarised five complex NEJM clinical cases before querying ChatGPT-4. Diagnostic ranking, assessed by independent experts, revealed that residents significantly outperformed students (OR 2.33, p = .007), although overall performance was low. These findings indicate that user expertise and concise case summaries are critical for optimising AI diagnostics, highlighting the need for enhanced AI training and user interaction strategies.
Keywords: ChatGPT‐4; artificial intelligence; clinical decision making; diagnostic ranking; internal medicine; large language models.
© 2025 The Author(s). European Journal of Clinical Investigation published by John Wiley & Sons Ltd on behalf of Stichting European Society for Clinical Investigation Journal Foundation.
Conflict of interest statement
Two co‐authors of this manuscript are members of the Editorial Board of European Journal of Clinical Investigation. Fabrizio Montecucco is Editor in Chief of European Journal of Clinical Investigation and a co‐author of this article; Federico Carbone is an Editorial Board member of European Journal of Clinical Investigation and a co‐author of this article. To minimize bias, they were excluded from all editorial decision‐making related to the acceptance of this article for publication. The remaining authors declare no significant conflict of interest with this study.
