Clinical artificial intelligence: teaching a large language model to generate recommendations that align with guidelines for the surgical management of GERD

Bright Huo¹, Nana Marfo², Patricia Sylla³, Elisa Calabrese⁴, Sunjay Kumar⁵, Bethany J Slater⁶, Danielle S Walsh⁷, Wesley Vosburg⁸

Affiliations

¹ Division of General Surgery, Department of Surgery, McMaster University, Hamilton, ON, Canada.
² Ross University School of Medicine, Miramar, FL, USA.
³ Division of Colon and Rectal Surgery, Department of Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁴ University of Adelaide, Adelaide, SA, Australia.
⁵ Department of General Surgery, Thomas Jefferson University Hospital, Philadelphia, PA, USA.
⁶ Department of Surgery, University of Chicago, Chicago, IL, USA.
⁷ Department of Surgery, University of Kentucky, Lexington, KY, USA.
⁸ Department of Surgery, Mount Auburn Hospital, Harvard Medical School, Cambridge, MA, USA. wesvosburg@gmail.com.

PMID: 39134725
DOI: 10.1007/s00464-024-11155-5

Bright Huo et al. Surg Endosc. 2024 Oct;38(10):5668-5677. doi: 10.1007/s00464-024-11155-5. Epub 2024 Aug 12.

Abstract

Background: Large Language Models (LLMs) provide clinical guidance with inconsistent accuracy due to limitations with their training dataset. LLMs are "teachable" through customization. We compared the ability of the generic ChatGPT-4 model and a customized version of ChatGPT-4 to provide recommendations for the surgical management of gastroesophageal reflux disease (GERD) to both surgeons and patients.

Methods: Sixty patient cases were developed using eligibility criteria from the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) & United European Gastroenterology (UEG)-European Association for Endoscopic Surgery (EAES) guidelines for the surgical management of GERD. Standardized prompts were engineered for physicians as the end-user, with separate layperson prompts for patients. A customized GPT, called the GERD Tool for Surgery (GTS), was developed to generate recommendations based on the guidelines. Both the GTS and generic ChatGPT-4 were queried on July 21st, 2024. Model performance was evaluated by comparing responses to SAGES & UEG-EAES guideline recommendations. Outcome data were presented using descriptive statistics, including counts and percentages.

Results: The GTS provided accurate recommendations for the surgical management of GERD for 60/60 (100.0%) surgeon inquiries and 40/40 (100.0%) patient inquiries based on guideline recommendations. The generic ChatGPT-4 model generated accurate guidance for 40/60 (66.7%) surgeon inquiries and 19/40 (47.5%) patient inquiries. The GTS based its recommendations on the 2021 SAGES & UEG-EAES guidelines on the surgical management of GERD, while the generic ChatGPT-4 model generated guidance without citing evidence to support its recommendations.
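
The percentages reported above follow directly from the stated counts. As a minimal sketch (not the authors' analysis code; the dictionary layout and the `accuracy_pct` helper are illustrative assumptions), the descriptive statistics can be reproduced as follows:

```python
# Counts copied from the Results paragraph: (accurate responses, total inquiries).
# The data structure and helper name are hypothetical, chosen for illustration.
reported = {
    ("GTS", "surgeon"): (60, 60),
    ("GTS", "patient"): (40, 40),
    ("generic ChatGPT-4", "surgeon"): (40, 60),
    ("generic ChatGPT-4", "patient"): (19, 40),
}


def accuracy_pct(accurate, total):
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * accurate / total, 1)


# Percentage of guideline-concordant responses per model and audience.
summary = {key: accuracy_pct(*counts) for key, counts in reported.items()}

for (model, audience), pct in summary.items():
    print(f"{model:18s} {audience:8s} {pct:5.1f}%")
```

The computed values match the figures in the abstract: 100.0% for both GTS groups, and 66.7% and 47.5% for the generic model.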

Conclusion: ChatGPT-4 can be customized to overcome limitations with its training dataset and provide recommendations for the surgical management of GERD with reliable accuracy and consistency. Training LLMs in this way can help integrate this efficient technology into the creation of robust and accurate information for both surgeons and patients. Prospective data are needed to assess its effectiveness in a pragmatic clinical environment.

Keywords: Artificial intelligence; ChatGPT; GERD; Guidelines; Large language models; Natural language processing; Surgery.
