Chatbots as Patient Education Resources for Aesthetic Facial Plastic Surgery: Evaluation of ChatGPT and Google Bard Responses
- PMID: 38946595
- DOI: 10.1089/fpsam.2023.0368
Abstract
Background: ChatGPT and Google Bard™ are popular artificial intelligence chatbots with utility for patients, including those undergoing aesthetic facial plastic surgery.
Objective: To compare the accuracy and readability of chatbot-generated responses to patient education questions regarding aesthetic facial plastic surgery using a response accuracy scale and readability testing.
Methods: ChatGPT and Google Bard™ were asked 28 identical questions under four prompting conditions: none, patient friendly, eighth-grade level, and references. Accuracy was assessed using the Global Quality Scale (range: 1-5). The Flesch-Kincaid grade level of each response was calculated, and chatbot-provided references were analyzed for veracity.
Results: Overall, 59.8% of responses were of good quality (Global Quality Scale ≥4), and ChatGPT generated more accurate responses than Google Bard™ under patient-friendly prompting (p < 0.001). Google Bard™ responses were written at a significantly lower grade level than those of ChatGPT for all prompts (p < 0.05). Despite eighth-grade prompting, the response grade level of both chatbots remained high: ChatGPT (10.5 ± 1.8) and Google Bard™ (9.6 ± 1.3). Prompting for references yielded 108 chatbot-generated references, of which 41 (38.0%) were legitimate citations and only 20 (18.5%) accurately reported information from the cited source.
Conclusion: Although ChatGPT produced more accurate responses than Google Bard™, it did so at a higher reading level; both chatbots provided responses above recommended grade levels for patients and failed to provide accurate references.
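For context, the abstract does not specify which implementation of the readability metric was used; the grade-level values above presumably reflect the standard Flesch-Kincaid grade level formula:

$$\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59$$

Under this formula, longer sentences and more polysyllabic words both raise the estimated grade level, which is why chatbot responses rich in clinical terminology tend to score above the eighth-grade target despite simplification prompts.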