Comparative Study
Infection. 2025 Jun;53(3):873-881.
doi: 10.1007/s15010-024-02350-6. Epub 2024 Jul 12.

Assessing ChatGPT's theoretical knowledge and prescriptive accuracy in bacterial infections: a comparative study with infectious diseases residents and specialists

Andrea De Vito et al. Infection. 2025 Jun.

Abstract

Objectives: Advancements in Artificial Intelligence (AI) have made platforms like ChatGPT increasingly relevant in medicine. This study assesses ChatGPT's utility in addressing bacterial infection-related questions and antibiogram-based clinical cases.

Methods: This study was a collaboration between infectious disease (ID) specialists and residents. A group of experts formulated six true/false questions, six open-ended questions, and six clinical cases with antibiograms for four types of infections (endocarditis, pneumonia, intra-abdominal infections, and bloodstream infection) for a total of 96 questions. The questions were submitted to four senior residents and four specialists in ID and entered into ChatGPT-4 and a trained version of ChatGPT-4. A total of 720 responses were obtained and reviewed by a blinded panel of experts in antibiotic treatments. The panel evaluated the responses for accuracy and completeness, the ability to identify correct resistance mechanisms from antibiograms, and the appropriateness of antibiotic prescriptions.

Results: No significant difference was noted among the four groups for true/false questions, with approximately 70% correct answers. The trained ChatGPT-4 and ChatGPT-4 offered more accurate and complete answers to the open-ended questions than both the residents and specialists. In the clinical cases, ChatGPT-4 was less accurate at recognizing the correct resistance mechanism. ChatGPT-4 tended not to prescribe newer antibiotics like cefiderocol or imipenem/cilastatin/relebactam, favoring less recommended options like colistin. Both trained ChatGPT-4 and ChatGPT-4 recommended longer-than-necessary treatment periods (p-value = 0.022).

Conclusions: This study highlights ChatGPT's capabilities and limitations in medical decision-making, specifically regarding bacterial infections and antibiogram analysis. While ChatGPT demonstrated proficiency in answering theoretical questions, it did not consistently align with expert decisions in clinical case management. Despite these limitations, the potential of ChatGPT as a supportive tool in ID education and preliminary analysis is evident. However, it should not replace expert consultation, especially in complex clinical decision-making.

Keywords: Abdominal infection; Antibiotic resistance; Antimicrobial stewardship; Artificial intelligence; Bacterial infections; Blood-stream infection; ChatGPT; Endocarditis; Infectious diseases; Pneumonia.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Percentage of correct and wrong answers for the true or false questions in the different groups.

Fig. 2. Performance of Infectious Diseases residents and specialists, ChatGPT-4, and trained ChatGPT-4 in answering open questions regarding antibiotic treatment. p-value < 0.001 (Kruskal-Wallis test).

Fig. 3. Performance of Infectious Diseases residents, specialists, ChatGPT-4, and trained ChatGPT-4 in prescribing the correct antibiotic treatment according to the clinical cases and the antibiograms. p-value = 0.068.

Fig. 4. Performance of Infectious Diseases residents, specialists, ChatGPT-4, and trained ChatGPT-4 in prescribing the correct length of treatment according to the clinical cases. p-value = 0.022.
