"Can ChatGPT Answer Patient's Questions?": A Preliminary Analysis
- PMID: 40776131
- DOI: 10.3233/SHTI251114
"Can ChatGPT Answer Patient's Questions?": A Preliminary Analysis
Abstract
Whether ChatGPT's answers to medical questions are accurate, reliable, and trustworthy, and whether members of the public without a health background know how to evaluate those answers, remains unclear. This study assessed ChatGPT's performance in answering medical questions posed by the public. An existing dataset of consumer questions from the NIH Genetic and Rare Diseases Information Center (GARD) was used. API calls produced 1,467 question-answer pairs for GPT-4-0613 (ChatGPT-4.0), from which 100 pairs were randomly selected as the study sample. Each pair was evaluated on two criteria, Scientific Accuracy and Comprehensiveness, on scales from 0 to 5. ChatGPT-4.0 received above-average ratings (scale points 4 and 5) for about 90% of answers on Scientific Accuracy and 84% on Comprehensiveness, with approximately 7% and 14%, respectively, rated at scale point 3 on these two criteria. No statistically significant difference was found between the quality of answers to questions that followed a question framework and those that did not. These results indicate that healthcare consumers should consult healthcare providers and/or other reliable information resources to verify ChatGPT's answers and gauge their applicability to individual situations. Further studies could investigate the impact of how medical questions are phrased on the quality of ChatGPT's answers, and compare healthcare consumers' evaluations with those of healthcare and information professionals.
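The abstract states that the question-answer pairs were generated through API calls to GPT-4-0613 but does not describe the prompting setup. The sketch below shows one plausible way such pairs could be collected using the OpenAI Chat Completions API; the input file name, its column name, and the output file are hypothetical assumptions, not details taken from the study.

    # Hypothetical sketch: generating question-answer pairs from GPT-4-0613.
    # Assumes the OpenAI Chat Completions API; "gard_questions.csv" and its
    # "question" column are illustrative placeholders for the GARD dataset.
    import csv
    import json

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    pairs = []
    with open("gard_questions.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            question = row["question"]
            response = client.chat.completions.create(
                model="gpt-4-0613",
                messages=[{"role": "user", "content": question}],
            )
            pairs.append({
                "question": question,
                "answer": response.choices[0].message.content,
            })

    with open("qa_pairs.json", "w", encoding="utf-8") as out:
        json.dump(pairs, out, indent=2)

A subset of the resulting pairs (100 in the study) could then be sampled for manual rating on the Scientific Accuracy and Comprehensiveness scales described above.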
Keywords: ChatGPT; Consumer Informatics; Evaluation; Information Quality.