J Med Libr Assoc. 2025 Jan 14;113(1):65-77.
doi: 10.5195/jmla.2025.1985.

Evaluating a large language model's ability to answer clinicians' requests for evidence summaries

Mallory N Blasingame et al. J Med Libr Assoc.

Abstract

Objective: This study investigated the performance of a generative artificial intelligence (AI) tool using GPT-4 in answering clinical questions in comparison with medical librarians' gold-standard evidence syntheses.

Methods: Questions were extracted from an in-house database of clinical evidence requests previously answered by medical librarians. Questions with multiple parts were subdivided into individual topics. A standardized prompt was developed using the COSTAR framework. Librarians submitted each question to aiChat, an internally managed chat tool using GPT-4, and recorded the responses. The summaries generated by aiChat were evaluated on whether they contained the critical elements of the librarian's established gold-standard summary. A subset of questions was randomly selected for verification of the references provided by aiChat.
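A COSTAR-style prompt (Context, Objective, Style, Tone, Audience, Response) is a fixed set of labeled sections with the individual question appended to it. The sketch below shows one way to hold such a template so that every question is submitted with identical framing. It is a hypothetical illustration only. The study's actual prompt is shown in its Figure 1 and is not reproduced here. The field wording, the `# SECTION #` markers, and the `CostarPrompt` class are all assumptions. No call to aiChat or any GPT-4 API is made.

```python
from dataclasses import dataclass


@dataclass
class CostarPrompt:
    """Hypothetical container for the six COSTAR sections of a standardized prompt."""
    context: str
    objective: str
    style: str
    tone: str
    audience: str
    response: str

    def render(self, question: str) -> str:
        # Emit each section under an assumed "# NAME #" marker, in COSTAR order,
        # then append the individual clinical question at the end.
        sections = [
            ("CONTEXT", self.context),
            ("OBJECTIVE", self.objective),
            ("STYLE", self.style),
            ("TONE", self.tone),
            ("AUDIENCE", self.audience),
            ("RESPONSE", self.response),
        ]
        body = "\n\n".join(f"# {name} #\n{text}" for name, text in sections)
        return f"{body}\n\n# QUESTION #\n{question}"


# Hypothetical field contents; the study's real wording is in its Figure 1.
template = CostarPrompt(
    context="You are assisting medical librarians who answer clinicians' evidence requests.",
    objective="Summarize the published evidence relevant to the clinical question.",
    style="Concise evidence summary with citations.",
    tone="Neutral and professional.",
    audience="Practicing clinicians.",
    response="A short paragraph followed by a reference list.",
)
prompt = template.render("Is drug X effective for condition Y?")
print(prompt)
```

Keeping the template fixed and varying only the question mirrors the study's use of one standardized prompt for every submission. That makes differences between responses attributable to the questions rather than to the prompt wording.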

Results: Of the 216 evaluated questions, aiChat's response was assessed as "correct" for 180 (83.3%) questions, "partially correct" for 35 (16.2%) questions, and "incorrect" for 1 (0.5%) question. No significant differences were observed in question ratings by question category (p=0.73). For a subset of 30% (n=66) of questions, 162 references were provided in the aiChat summaries, and 60 (37%) were confirmed as nonfabricated.
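The reported percentages follow directly from the counts in the Results. The short check below recomputes them from those counts alone and rounds to one decimal place. It adds no new data. Note that the reference-verification subset of 66 questions works out to 30.6% of 216, which the abstract reports as 30%. The helper name `pct` is illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total = 216
ratings = {"correct": 180, "partially correct": 35, "incorrect": 1}
# The three rating categories should account for every evaluated question.
assert sum(ratings.values()) == total

for label, n in ratings.items():
    print(f"{label}: {n}/{total} = {pct(n, total)}%")

subset = 66                       # questions checked for reference verification
refs_total, refs_verified = 162, 60
print(f"reference subset: {subset}/{total} = {pct(subset, total)}% of questions")
print(f"nonfabricated references: {refs_verified}/{refs_total} = {pct(refs_verified, refs_total)}%")
```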

Conclusions: Overall, the performance of a generative AI tool was promising. However, many included references could not be independently verified, and no attempt was made to assess whether any additional concepts introduced by aiChat were factually accurate. Thus, we envision this being the first of a series of investigations designed to further our understanding of how current and future versions of generative AI can be used and integrated into medical librarians' workflow.

Keywords: Artificial Intelligence; Biomedical Informatics; Evidence Synthesis; Generative AI; Information Science; LLMs; Large Language Models; Library Science.


Figure 1. Standardized prompt, in COSTAR format, used to submit each question to aiChat.
