The performance of artificial intelligence large language model-linked chatbots in surgical decision-making for gastroesophageal reflux disease

Bright Huo¹, Elisa Calabrese², Patricia Sylla³, Sunjay Kumar⁴, Romeo C Ignacio⁵, Rodolfo Oviedo⁶ ⁷ ⁸, Imran Hassan⁹, Bethany J Slater¹⁰, Andreas Kaiser¹¹, Danielle S Walsh¹², Wesley Vosburg¹³

Affiliations

¹ Division of General Surgery, Department of Surgery, McMaster University, Hamilton, ON, Canada.
² University of California South California, East Bay, Oakland, CA, USA.
³ Division of Colon and Rectal Surgery, Department of Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁴ Department of General Surgery, Thomas Jefferson University Hospital, Philadelphia, PA, USA.
⁵ Division of Pediatric Surgery/Department of Surgery, San Diego School of Medicine, University of California, California, CA, USA.
⁶ Nacogdoches Center for Metabolic and Weight Loss Surgery, Nacogdoches, TX, USA.
⁷ University of Houston Tilman J. Fertitta Family College of Medicine, Houston, TX, USA.
⁸ Sam Houston State University College of Osteopathic Medicine, Conroe, TX, USA.
⁹ University of Iowa, Iowa City, IA, USA.
¹⁰ Department of Surgery, University of Chicago, Chicago, IL, USA.
¹¹ Division of Colorectal Surgery, Department of Surgery, City of Hope National Medical Center, Duarte, CA, USA.
¹² Department of Surgery, University of Kentucky, Lexington, KY, USA.
¹³ Department of Surgery, Harvard Medical School, Mount Auburn Hospital, Cambridge, MA, USA. wesvosburg@gmail.com.

PMID: 38630178
DOI: 10.1007/s00464-024-10807-w


Bright Huo et al. Surg Endosc. 2024 May;38(5):2320-2330. doi: 10.1007/s00464-024-10807-w. Epub 2024 Apr 17.


Abstract

Background: Large language model (LLM)-linked chatbots may be an efficient source of clinical recommendations for healthcare providers and patients. This study evaluated the performance of LLM-linked chatbots in providing recommendations for the surgical management of gastroesophageal reflux disease (GERD).

Methods: Nine patient cases were created based on key questions (KQs) addressed by the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) guidelines for the surgical treatment of GERD. ChatGPT-3.5, ChatGPT-4, Copilot, Google Bard, and Perplexity AI were queried on November 16th, 2023, for recommendations regarding the surgical management of GERD. Chatbot accuracy was measured as the number of responses aligning with SAGES guideline recommendations. Outcomes were reported with counts and percentages.

Results: According to the SAGES guidelines, surgeons were given accurate recommendations for the surgical management of GERD in an adult patient for 5/7 (71.4%) KQs by ChatGPT-4, 3/7 (42.9%) KQs by Copilot, 6/7 (85.7%) KQs by Google Bard, and 3/7 (42.9%) KQs by Perplexity. Patients were given accurate recommendations for 3/5 (60.0%) KQs by ChatGPT-4, 2/5 (40.0%) KQs by Copilot, 4/5 (80.0%) KQs by Google Bard, and 1/5 (20.0%) KQs by Perplexity. In a pediatric patient, surgeons were given accurate recommendations for 2/3 (66.7%) KQs by ChatGPT-4, 3/3 (100.0%) KQs by Copilot, 3/3 (100.0%) KQs by Google Bard, and 2/3 (66.7%) KQs by Perplexity. Patients were given appropriate guidance for 2/2 (100.0%) KQs by ChatGPT-4, 2/2 (100.0%) KQs by Copilot, 1/2 (50.0%) KQs by Google Bard, and 1/2 (50.0%) KQs by Perplexity.
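
The percentages reported above follow directly from the counts. As a minimal sketch (not the authors' analysis code), the Python snippet below recomputes them. The counts are transcribed from the Results paragraph; the names `COUNTS`, `accuracy_pct`, and `summarize` are hypothetical and introduced only for illustration, and reading the second sentence of the Results as the adult-patient scenario is an assumption based on its position in the text.

```python
# Illustrative sketch only: recomputes the percentages reported in the Results.
# Each value is (accurate responses, key questions asked), copied from the abstract.
# Scenario labels are an assumption about how the Results sentences are grouped.
COUNTS = {
    ("adult", "surgeon"): {
        "ChatGPT-4": (5, 7), "Copilot": (3, 7),
        "Google Bard": (6, 7), "Perplexity": (3, 7),
    },
    ("adult", "patient"): {
        "ChatGPT-4": (3, 5), "Copilot": (2, 5),
        "Google Bard": (4, 5), "Perplexity": (1, 5),
    },
    ("pediatric", "surgeon"): {
        "ChatGPT-4": (2, 3), "Copilot": (3, 3),
        "Google Bard": (3, 3), "Perplexity": (2, 3),
    },
    ("pediatric", "patient"): {
        "ChatGPT-4": (2, 2), "Copilot": (2, 2),
        "Google Bard": (1, 2), "Perplexity": (1, 2),
    },
}


def accuracy_pct(accurate: int, total: int) -> float:
    """Return the share of guideline-aligned responses as a percentage (1 decimal)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * accurate / total, 1)


def summarize(counts):
    """Map each (population, audience) scenario to per-chatbot percentages."""
    return {
        scenario: {bot: accuracy_pct(a, t) for bot, (a, t) in bots.items()}
        for scenario, bots in counts.items()
    }


if __name__ == "__main__":
    for (population, audience), bots in summarize(COUNTS).items():
        for bot, pct in bots.items():
            a, t = COUNTS[(population, audience)][bot]
            print(f"{population:9s} {audience:8s} {bot:12s} {a}/{t} ({pct:.1f}%)")
```

With only two to seven key questions per scenario, a single response changes a percentage by 14 to 50 points, so the counts are more informative than the percentages alone.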

Conclusions: Gastrointestinal surgeons, gastroenterologists, and patients should recognize both the promise and pitfalls of LLMs when used for advice on surgical management of GERD. Additional training of LLMs using evidence-based health information is needed.

Keywords: ChatGPT; GERD; Generative artificial intelligence; Guidelines; Large language models; Natural language processing; Surgery.


