"Can ChatGPT Answer Patient's Questions?": A Preliminary Analysis
- PMID: 40776131
- DOI: 10.3233/SHTI251114
"Can ChatGPT Answer Patient's Questions?": A Preliminary Analysis
Abstract
Whether ChatGPT's answers to medical questions are accurate, reliable, and trustworthy, and whether members of the public without a health background know how to evaluate those answers, remains unclear. This study assessed ChatGPT's performance in answering medical questions posed by the public. An existing dataset of consumer questions from the NIH Genetic and Rare Diseases Information Center (GARD) was used. API calls produced 1,467 question-answer pairs for GPT-4-0613 (ChatGPT-4.0), from which 100 pairs were randomly selected as the study sample. Each pair was evaluated on two criteria, Scientific Accuracy and Comprehensiveness, on scales from 0 to 5. ChatGPT-4.0 received above-average ratings (scale points 4 and 5) for about 90% of answers on Scientific Accuracy and 84% on Comprehensiveness, with approximately 7% and 14%, respectively, rated at scale point 3 on these two criteria. No statistically significant difference was found between the quality of answers to questions that followed a question framework and those that did not. These results indicate that healthcare consumers should consult healthcare providers and/or other reliable information resources to verify ChatGPT's answers and gauge their applicability to individual situations. Further studies could investigate the impact of how medical questions are phrased on the quality of ChatGPT's answers, and compare healthcare consumers' evaluations with those of healthcare and information professionals.
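The abstract states that the question-answer pairs were generated through API calls to GPT-4-0613 but does not describe the prompting setup. The sketch below shows one plausible way such pairs could be collected using the OpenAI Chat Completions API; the input file name, its column name, and the output file are hypothetical assumptions, not details taken from the study.

    # Hypothetical sketch: generating question-answer pairs from GPT-4-0613.
    # Assumes the OpenAI Chat Completions API; "gard_questions.csv" and its
    # "question" column are illustrative placeholders for the GARD dataset.
    import csv
    import json

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    pairs = []
    with open("gard_questions.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            question = row["question"]
            response = client.chat.completions.create(
                model="gpt-4-0613",
                messages=[{"role": "user", "content": question}],
            )
            pairs.append({
                "question": question,
                "answer": response.choices[0].message.content,
            })

    with open("qa_pairs.json", "w", encoding="utf-8") as out:
        json.dump(pairs, out, indent=2)

A subset of the resulting pairs (100 in the study) could then be sampled for manual rating on the Scientific Accuracy and Comprehensiveness scales described above.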
Keywords: ChatGPT; Consumer Informatics; Evaluation; Information Quality.