Assessing ChatGPT's theoretical knowledge and prescriptive accuracy in bacterial infections: a comparative study with infectious diseases residents and specialists
- PMID: 38995551
- PMCID: PMC12137519
- DOI: 10.1007/s15010-024-02350-6
Abstract
Objectives: Advancements in Artificial Intelligence (AI) have made platforms like ChatGPT increasingly relevant in medicine. This study assesses ChatGPT's utility in addressing bacterial infection-related questions and antibiogram-based clinical cases.
Methods: This study was a collaborative effort involving infectious disease (ID) specialists and residents. A group of experts formulated six true/false questions, six open-ended questions, and six clinical cases with antibiograms for each of four types of infections (endocarditis, pneumonia, intra-abdominal infections, and bloodstream infection), for a total of 96 questions. The questions were submitted to four senior residents and four specialists in ID and inputted into ChatGPT-4 and a trained version of ChatGPT-4. A total of 720 responses were obtained and reviewed by a blinded panel of experts in antibiotic treatments, who evaluated the responses for accuracy and completeness, the ability to identify correct resistance mechanisms from antibiograms, and the appropriateness of antibiotic prescriptions.
Results: No significant difference was noted among the four groups for the true/false questions, with approximately 70% correct answers. The trained ChatGPT-4 and ChatGPT-4 offered more accurate and complete answers to the open-ended questions than both the residents and the specialists. Regarding the clinical cases, ChatGPT-4 showed lower accuracy in recognizing the correct resistance mechanisms. ChatGPT-4 tended not to prescribe newer antibiotics like cefiderocol or imipenem/cilastatin/relebactam, favoring less recommended options like colistin. Both the trained ChatGPT-4 and ChatGPT-4 recommended longer than necessary treatment periods (p-value = 0.022).
Conclusions: This study highlights ChatGPT's capabilities and limitations in medical decision-making, specifically regarding bacterial infections and antibiogram analysis. While ChatGPT demonstrated proficiency in answering theoretical questions, it did not consistently align with expert decisions in clinical case management. Despite these limitations, the potential of ChatGPT as a supportive tool in ID education and preliminary analysis is evident. However, it should not replace expert consultation, especially in complex clinical decision-making.
Keywords: Abdominal infection; Antibiotic resistance; Antimicrobial stewardship; Artificial intelligence; Bacterial infections; Blood-stream infection; ChatGPT; Endocarditis; Infectious diseases; Pneumonia.
© 2024. The Author(s).
Conflict of interest statement
Competing interests: The authors declare no competing interests.
Similar articles
- Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study. JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514. PMID: 38335017. Free PMC article.
- Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine. J Eval Clin Pract. 2024 Sep;30(6):1017-1023. doi: 10.1111/jep.14011. Epub 2024 May 19. PMID: 38764369.
- ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice. Front Med (Lausanne). 2023 Dec 13;10:1296615. doi: 10.3389/fmed.2023.1296615. eCollection 2023. PMID: 38155661. Free PMC article.
- Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam. Int J Nurs Stud. 2024 May;153:104717. doi: 10.1016/j.ijnurstu.2024.104717. Epub 2024 Feb 8. PMID: 38401366.
- The impact of the large language model ChatGPT in oral and maxillofacial surgery: a systematic review. Br J Oral Maxillofac Surg. 2025 Jun;63(5):357-362. doi: 10.1016/j.bjoms.2025.03.006. Epub 2025 Mar 24. PMID: 40251084.
Cited by
- Advantages and limitations of large language models for antibiotic prescribing and antimicrobial stewardship. NPJ Antimicrob Resist. 2025 Feb 27;3(1):14. doi: 10.1038/s44259-025-00084-5. PMID: 40016394. Free PMC article. Review.
- Antibiotics and Artificial Intelligence: Clinical Considerations on a Rapidly Evolving Landscape. Infect Dis Ther. 2025 Mar;14(3):493-500. doi: 10.1007/s40121-025-01114-5. Epub 2025 Feb 15. PMID: 39954227. Free PMC article.
- Assessing the accuracy and clinical utility of GPT-4O in abnormal blood cell morphology recognition. Digit Health. 2024 Nov 5;10:20552076241298503. doi: 10.1177/20552076241298503. eCollection 2024 Jan-Dec. PMID: 39502485. Free PMC article.
- Enhancing AI Chatbot Responses in Health Care: The SMART Prompt Structure in Head and Neck Surgery. OTO Open. 2025 Jan 16;9(1):e70075. doi: 10.1002/oto2.70075. eCollection 2025 Jan-Mar. PMID: 39822375. Free PMC article.
- Can we rely on artificial intelligence to guide antimicrobial therapy? A systematic literature review. Antimicrob Steward Healthc Epidemiol. 2025 Mar 31;5(1):e90. doi: 10.1017/ash.2025.47. eCollection 2025. PMID: 40226293. Free PMC article.