Challenges in Implementing Endoscopic Artificial Intelligence: The Impact of Real-World Imaging Conditions on Barrett's Neoplasia Detection
- PMID: 40116287
- PMCID: PMC12269737
- DOI: 10.1002/ueg2.12760
Abstract
Background: Endoscopic deep learning systems are often developed using high-quality imagery obtained from expert centers. Therefore, they may underperform in community hospitals where image quality is more heterogeneous.
Objective: This study aimed to quantify the performance degradation of a computer aided detection system for Barrett's neoplasia, trained on expert images, when exposed to more heterogeneous imaging conditions representative of daily clinical practice. Further, we evaluated strategies to mitigate this performance loss.
Methods: We developed a computer aided detection system using 1011 high-quality, expert-acquired images from 373 Barrett's patients. We assessed its performance on high, moderate and low image quality test sets, each containing images from an independent group of 117 Barrett's patients. These test sets reflected the varied image quality of routine patient care and contained artefacts such as insufficient mucosal cleaning and inadequate esophageal expansion. We then applied three methods to improve the algorithm's robustness to data heterogeneity: inclusion of more diverse training data, domain-specific pretraining and architectural optimization.
Results: The computer aided detection system, when trained exclusively on high-quality data, achieved area under the curve (AUC), sensitivity and specificity scores of 83%, 85% and 67% on the high-quality test set. AUC and sensitivity were significantly lower, at 80% (p < 0.001) and 62% (p = 0.002) on the moderate-quality test set and 71% (p < 0.001) and 47% (p = 0.002) on the low-quality test set. Incorporating robustness-enhancing strategies significantly improved the AUC, sensitivity and specificity to 92% (p = 0.004), 88% (p = 0.84) and 81% (p = 0.003) on the high-quality test set, 93% (p = 0.006), 86% (p = 0.01) and 83% (p = 0.09) on the moderate-quality test set and 84% (p = 0.001), 78% (p = 0.002) and 77% (p = 0.23) on the low-quality test set.
Conclusion: Endoscopic deep learning systems trained solely on high-quality images may not perform well when exposed to heterogeneous imagery, as found in routine practice. Robustness-enhancing training strategies can increase the likelihood of successful clinical implementation.
Keywords: Barrett's esophagus; artificial intelligence; computer aided detection; deep learning systems; endoscopy; esophageal adenocarcinoma.
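For readers less familiar with the metrics reported in the Results, the following is a minimal illustrative sketch of how AUC, sensitivity and specificity can be computed separately for each image-quality test set. It is not the authors' evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions made for illustration, and the values do not correspond to the study data.

```python
from typing import Dict, List, Tuple


def sensitivity_specificity(
    labels: List[int], scores: List[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity at a fixed threshold (assumed 0.5 here).

    labels: 1 = neoplastic, 0 = non-dysplastic; scores: model output in [0, 1].
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for two quality tiers; NOT taken from the study.
test_sets: Dict[str, Tuple[List[int], List[float]]] = {
    "high": ([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.6, 0.4, 0.3, 0.7]),
    "low": ([1, 1, 1, 0, 0, 0], [0.7, 0.4, 0.3, 0.2, 0.6, 0.1]),
}

results = {}
for tier, (labels, scores) in test_sets.items():
    sens, spec = sensitivity_specificity(labels, scores)
    results[tier] = {"auc": auc(labels, scores), "sens": sens, "spec": spec}
    print(f"{tier}: AUC={results[tier]['auc']:.2f} "
          f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Evaluating each quality tier as its own test set, as in this sketch, is what makes a drop in sensitivity on lower-quality images visible rather than averaged away in a pooled score.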
© 2025 The Author(s). United European Gastroenterology Journal published by Wiley Periodicals LLC on behalf of United European Gastroenterology.
Conflict of interest statement
JJB reports financial support for IRB approved research from C2Therapeutics, Pentax Medical, Medtronic, Olympus and Aqua Medical. PHW received financial support for IRB approved research from Olympus.
