"Doctor ChatGPT, Can You Help Me?" The Patient's Perspective: Cross-Sectional Study

Jonas Armbruster et al. J Med Internet Res. 2024 Oct 1;26:e58831. doi: 10.2196/58831.
Abstract

Background: Artificial intelligence and the language models derived from it, such as ChatGPT, offer immense possibilities, particularly in the field of medicine. It is already evident that ChatGPT can provide adequate and, in some cases, expert-level responses to health-related queries and advice for patients. However, it is currently unknown how patients perceive these capabilities, whether they can derive benefit from them, and whether they can detect potential risks such as harmful suggestions.

Objective: This study aims to clarify whether patients can obtain useful and safe health care advice from an artificial intelligence chatbot assistant.

Methods: This cross-sectional study was conducted using 100 publicly available health-related questions from 5 medical specialties (trauma, general surgery, otolaryngology, pediatrics, and internal medicine), drawn from a web-based platform for patients. Responses generated by ChatGPT-4.0 and preexisting responses from an expert panel (EP) of experienced physicians on the same platform were compiled into 10 sets of 10 questions each. Patients performed a blinded evaluation of empathy and usefulness (assessed through the question "Would this answer have helped you?") on a scale from 1 to 5. As a control, the evaluation was also performed by 3 physicians in each respective medical specialty, who were additionally asked to rate each response's correctness and potential for harm.
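
For illustration, the generation and blinding steps could be scripted along the following lines (Python). This is a minimal sketch rather than the authors' published code: the use of the OpenAI Python client, the model identifier "gpt-4" (a stand-in for ChatGPT-4.0 as named above), and the data structures for the questions and preexisting EP answers are all assumptions.

import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_answer(question_text):
    """Submit a patient question verbatim and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4",  # stand-in for "ChatGPT-4.0" as named in the study
        messages=[{"role": "user", "content": question_text}],
    )
    return response.choices[0].message.content

def build_blinded_sets(questions, ep_answers, n_sets=10, per_set=10):
    """Pair each question with its EP and ChatGPT answers in random order,
    hide the source labels, and split the questions into fixed-size sets
    for blinded rating."""
    entries = []
    for q in questions:  # q is a hypothetical dict like {"id": 1, "text": "..."}
        answers = [("EP", ep_answers[q["id"]]),
                   ("ChatGPT", generate_answer(q["text"]))]
        random.shuffle(answers)  # randomize answer order per question
        entries.append({
            "question": q["text"],
            "answers": [text for _, text in answers],  # shown to raters
            "sources": [src for src, _ in answers],    # kept for unblinding
        })
    random.shuffle(entries)
    return [entries[i * per_set:(i + 1) * per_set] for i in range(n_sets)]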

Results: In total, 200 sets of questions were submitted by 64 patients (mean age 45.7, SD 15.9 years; 29/64, 45.3% male), yielding 2000 evaluated answers each for ChatGPT and the EP. ChatGPT scored higher in terms of empathy (4.18 vs 2.7; P<.001) and usefulness (4.04 vs 2.98; P<.001). Subanalysis revealed a small bias, with women giving higher empathy ratings than men (4.46 vs 4.14; P=.049). Ratings of ChatGPT were high regardless of participant age. The same highly significant results were observed in the evaluations by the respective specialist physicians, and ChatGPT significantly outperformed the EP in correctness (4.51 vs 3.55; P<.001). For potentially harmful responses from ChatGPT, specialists rated usefulness (3.93 vs 4.59) and correctness (4.62 vs 3.84) significantly lower (P<.001); this was not the case among patients.
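
The group comparisons and the age correlation reported above can be illustrated with standard tools, as in the following sketch. The abstract does not name the statistical tests used, so the Mann-Whitney U test (a common choice for ordinal 1-to-5 ratings) is an assumption here, and the rating vectors below are simulated placeholders rather than study data.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder rating vectors: one empathy score per evaluated answer
# (2000 each, matching the counts reported above).
chatgpt_empathy = rng.integers(3, 6, size=2000).astype(float)
ep_empathy = rng.integers(1, 5, size=2000).astype(float)

# Nonparametric two-sample comparison of the ordinal 1-5 ratings.
u_stat, p_value = stats.mannwhitneyu(chatgpt_empathy, ep_empathy,
                                     alternative="two-sided")
print(f"ChatGPT mean {chatgpt_empathy.mean():.2f} vs "
      f"EP mean {ep_empathy.mean():.2f}, P = {p_value:.3g}")

# Age relationship, analogous to the Pearson correlations in Figure 7.
ages = rng.normal(45.7, 15.9, size=2000)  # cohort mean and SD from above
r, p_age = stats.pearsonr(ages, chatgpt_empathy)
print(f"Pearson r = {r:.3f}, P = {p_age:.3g}")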

Conclusions: The results indicate that ChatGPT is capable of supporting patients in health-related queries better than physicians, at least in terms of written advice through a web-based platform. In this study, ChatGPT's responses had a lower percentage of potentially harmful advice than the web-based EP. However, it is crucial to note that this finding is based on a specific study design and may not generalize to all health care settings. Alarmingly, patients are not able to independently recognize these potential dangers.

Keywords: AI; ChatGPT; LLM; artificial intelligence; chatbot; chatbots; empathy; large language models; patient education; patient information; patient perceptions.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Study workflow. (A) Identification of 100 patient questions, 20 questions per specialty. (B + C) Collection of existing responses from a web-based EP (B) and generation of new responses from ChatGPT (C). (D) Building a database of anonymized questions and responses. (E + F) Assembly of specialty-specific packages for physicians (E) and mixed packages for patients (F). (G + H) Data collection: patients rated responses for empathy and usefulness, while physicians provided feedback encompassing empathy, usefulness, correctness, and potential harm. ENT: otolaryngology; EP: expert panel; GS: general surgery; Internal: internal medicine; Ped: pediatrics; trauma: traumatology.
Figure 2
Rating of ChatGPT versus EP by specialists in their respective fields—combined specialties. (A) Empathy. (B) Usefulness. (C) Correctness. (D) Potential harm. EP: expert panel.
Figure 3
Rating of ChatGPT by specialists in their respective fields—specialties separated. (A) Empathy. (B) Usefulness. (C) Correctness. (D) Potential harm. P values of the Bonferroni post hoc test were >0.99 each, except empathy ENT versus Internal (P=.826). ENT: otolaryngology; GS: general surgery; Internal: internal medicine; NS: not significant; Ped: pediatrics; trauma: traumatology.
Figure 4
Rating of ChatGPT versus EP by patients—combined specialties. (A) Empathy. (B) Usefulness. EP: expert panel.
Figure 5
Rating of ChatGPT by patients—specialties separated. (A) Empathy. (B) Usefulness. P values of the Bonferroni post hoc test were >0.99 each. ENT: otolaryngology; GS: general surgery; Internal: internal medicine; NS: not significant; Ped: pediatrics; trauma: traumatology.
Figure 6
Rating of ChatGPT by patients—gender separated. (A) Empathy. (B) Usefulness.
Figure 7
Rating of ChatGPT by patients in correlation with age. (A) Empathy, Pearson correlation: –0.067. (B) Usefulness, Pearson correlation: –0.109.
Figure 8
Rating of ChatGPT by physicians and patients—potentially harmful and nonharmful advice separated. (A) Empathy—patients. (B) Usefulness—patients. (C) Empathy—physicians. (D) Usefulness—physicians. (E) Correctness—physicians. Δ indicates differences of means.
