Use of Large Language Models to Determine the Surveillance Colonoscopy Interval: A Bi-institutional Validation Study

Vedant Acharya¹, Shivan J Mehta², Daniel A Sussman³, Vignesh Kumaresan⁴, Jonathan England⁵, Tessa S Cook¹, S Barry Issenberg⁶, Amar R Deshpande³

Affiliations

¹ Department of Radiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA.
² Division of Gastroenterology, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA.
³ Division of Digestive Health and Liver Diseases, Department of Medicine, University of Miami Miller School of Medicine, Miami, Florida, USA.
⁴ V Labs, Sunnyvale, California, USA.
⁵ Gastromed, Miami, Florida, USA.
⁶ Michael S Gordon Center for Simulation and Innovation in Medical Education, University of Miami Miller School of Medicine, Miami, Florida, USA.

PMID: 41351229
DOI: 10.14309/ajg.0000000000003864

Vedant Acharya et al. Am J Gastroenterol. 2025 Nov 24. doi: 10.14309/ajg.0000000000003864. Online ahead of print.

Abstract

Introduction: To determine the appropriate post-polypectomy colonoscopy surveillance interval, endoscopists synthesize information from the colonoscopy and pathology report impressions and then apply guideline-recommended interval algorithms, such as those developed by the United States Multi-Society Task Force (USMSTF). Because these guidelines are complex, this manual process is error-prone, creating a need for automated tools, such as large language models (LLMs), that improve guideline adherence.
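
Interval algorithms of this kind are rule tables keyed on polyp count, size, and histology. As a rough illustration, the sketch below encodes only the adenoma rows of the 2020 USMSTF post-polypectomy recommendations; the abstract does not say which version or which rows the study's prompt contained. The `Polyp` class and `adenoma_interval` function are hypothetical names, serrated and hyperplastic polyps, piecemeal resection, and prior history are deliberately not handled, and the sketch is not a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class Polyp:
    """A resected polyp as abstracted from the colonoscopy and pathology reports (hypothetical)."""
    kind: str                          # e.g. "adenoma", "sessile serrated", "hyperplastic"
    size_mm: float
    advanced_histology: bool = False   # villous/tubulovillous component or high-grade dysplasia


def adenoma_interval(polyps: list[Polyp]) -> str:
    """Simplified, adenoma-only subset of the 2020 USMSTF rules (illustrative, not clinical)."""
    if any(p.kind != "adenoma" for p in polyps):
        # Serrated and hyperplastic polyps follow separate rules not encoded here.
        raise ValueError("non-adenoma polyps are outside this sketch")
    n = len(polyps)
    if n == 0:
        return "10 years"      # no polyps found
    if n > 10:
        return "1 year"        # more than 10 adenomas
    if n >= 5 or any(p.size_mm >= 10 or p.advanced_histology for p in polyps):
        return "3 years"       # 5-10 adenomas, any adenoma >= 10 mm, or advanced histology
    if n >= 3:
        return "3-5 years"     # 3-4 adenomas, all < 10 mm
    return "7-10 years"        # 1-2 adenomas, all < 10 mm
```

For example, three 4 mm tubular adenomas map to "3-5 years". In the study, rules like these were given to the LLM as prose in a custom prompt rather than as code, and the model had to extract the polyp attributes from free-text impressions itself.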

Objective: The primary aim of this study was to evaluate LLM performance in determining the guideline-concordant post-polypectomy surveillance interval in a cohort of 1000 real-world colonoscopy and pathology report impressions.

Methods: Data from patients who underwent screening or surveillance colonoscopy in 2023-2024 at two academic health centers were included. Using a custom prompt outlining the USMSTF post-polypectomy surveillance algorithm, the LLM (GPT-4o) was asked to determine the appropriate surveillance interval for each of the 1000 examples in the dataset. This experiment, using the same model, prompt, and dataset, was repeated 10 times; all experiments were conducted between January 27, 2025, and February 3, 2025.
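
The Methods describe a repeated-evaluation design: one model, one prompt, one dataset, ten runs, with accuracy averaged across runs. A minimal sketch of that bookkeeping follows. `predict` is a hypothetical stand-in for a GPT-4o call with the study's custom prompt (neither the prompt text nor the API call is given in the abstract), and exact string match against a reference interval is an assumed scoring rule.

```python
from statistics import mean
from typing import Callable, Sequence


def run_accuracy(predict: Callable[[str], str],
                 reports: Sequence[str],
                 reference: Sequence[str]) -> float:
    """Fraction of examples whose predicted interval exactly matches the reference interval."""
    if len(reports) != len(reference):
        raise ValueError("reports and reference must be the same length")
    correct = sum(predict(r) == ref for r, ref in zip(reports, reference))
    return correct / len(reports)


def repeated_accuracy(predict: Callable[[str], str],
                      reports: Sequence[str],
                      reference: Sequence[str],
                      n_runs: int = 10) -> tuple[list[float], float]:
    """Re-run the same predictor on the same dataset n_runs times; return per-run and mean accuracy."""
    per_run = [run_accuracy(predict, reports, reference) for _ in range(n_runs)]
    return per_run, mean(per_run)


def always_three_years(report: str) -> str:
    """Hypothetical stand-in for the LLM call; always answers "3 years"."""
    return "3 years"


per_run, avg = repeated_accuracy(
    always_three_years,
    reports=["r1", "r2", "r3", "r4"],
    reference=["3 years", "10 years", "3 years", "3 years"],
)
# avg == 0.75 (3 of 4 intervals match in every run)
```

Because an LLM's answers can vary between calls, `per_run` is where run-to-run spread would appear; the deterministic stub here naturally yields identical runs.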

Results: Across 10 experiments, the average accuracy was 94.6%. Accuracy did not differ significantly by the institution from which the data originated or by the presence of synchronous upper GI endoscopy data in the pathology report impression. Examples with 1-3 colon polyps had an average accuracy of 95.8%, whereas examples with 4 or more colon polyps had an average accuracy of 88.2% (combined p-value < 0.001).
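
The abstract reports a combined p-value for the polyp-count comparison without naming the test or the combining rule. One plausible reconstruction, assumed here and not confirmed by the source, is to compare the 1-3 and 4-or-more polyp subgroups within each run using a pooled two-proportion z-test and then combine the ten per-run p-values with Fisher's method. All function names are hypothetical, and excluding examples with no polyps from the comparison is also an assumption.

```python
import math
from typing import Iterable


def subgroup_counts(records: Iterable[tuple[int, bool]], threshold: int = 4):
    """Split (polyp_count, is_correct) records into 1..threshold-1 and >= threshold polyps.

    Returns (correct_low, n_low, correct_high, n_high).
    """
    k_lo = n_lo = k_hi = n_hi = 0
    for count, ok in records:
        if count < 1:
            continue  # examples without colon polyps are left out of this comparison (assumption)
        if count < threshold:
            n_lo += 1
            k_lo += ok
        else:
            n_hi += 1
            k_hi += ok
    return k_lo, n_lo, k_hi, n_hi


def two_proportion_p(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both groups all correct or all wrong: no evidence of a difference
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def fisher_combined_p(p_values: list[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under the null.

    For even degrees of freedom the chi-square survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!. All p-values must be > 0.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

Fisher's method assumes independent p-values; runs that reuse the same dataset are correlated, so a combined value from this sketch is best read as approximate.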

Conclusion: LLMs with a custom prompt achieve consistently high accuracy in determining the guideline-based surveillance colonoscopy interval.
