United European Gastroenterol J. 2025 Jul;13(6):929-937.
doi: 10.1002/ueg2.12760. Epub 2025 Mar 21.

Challenges in Implementing Endoscopic Artificial Intelligence: The Impact of Real-World Imaging Conditions on Barrett's Neoplasia Detection

M R Jong et al. United European Gastroenterol J. 2025 Jul.

Abstract

Background: Endoscopic deep learning systems are often developed using high-quality imagery obtained from expert centers. Therefore, they may underperform in community hospitals where image quality is more heterogeneous.

Objective: This study aimed to quantify the performance degradation of a computer aided detection system for Barrett's neoplasia, trained on expert images, when exposed to more heterogeneous imaging conditions representative of daily clinical practice. Further, we evaluated strategies to mitigate this performance loss.

Methods: We developed a computer aided detection system using 1011 high-quality, expert-acquired images from 373 Barrett's patients. We assessed its performance on high, moderate and low image quality test sets, each containing images from an independent group of 117 Barrett's patients. These test sets reflected the varied image quality of routine patient care and contained artefacts such as insufficient mucosal cleaning and inadequate esophageal expansion. We then applied three methods to improve the algorithm's robustness to data heterogeneity: inclusion of more diverse training data, domain-specific pretraining and architectural optimization.

Results: The computer aided detection system, when trained exclusively on high-quality data, achieved area under the curve (AUC), sensitivity and specificity scores of 83%, 85% and 67% on the high-quality test set. AUC and sensitivity were significantly lower on the moderate-quality test set, at 80% (p < 0.001) and 62% (p = 0.002), and on the low-quality test set, at 71% (p < 0.001) and 47% (p = 0.002). Incorporating robustness-enhancing strategies significantly improved the AUC, sensitivity and specificity to 92% (p = 0.004), 88% (p = 0.84) and 81% (p = 0.003) on the high-quality test set, to 93% (p = 0.006), 86% (p = 0.01) and 83% (p = 0.09) on the moderate-quality test set, and to 84% (p = 0.001), 78% (p = 0.002) and 77% (p = 0.23) on the low-quality test set.
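
As an illustration of how the three reported metrics relate to per-image model outputs, the following is a minimal sketch, not the authors' evaluation code. The toy labels and scores, the 0.5 decision threshold, and the names `TOY_TEST_SETS` and `evaluate` are all hypothetical; the abstract does not state how the operating threshold was chosen.

```python
from typing import Dict, Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity at a fixed decision threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random neoplastic image scores higher
    than a random non-neoplastic image (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: label 1 = neoplasia, 0 = non-dysplastic Barrett's.
# These numbers are invented for illustration and are NOT from the study.
TOY_TEST_SETS = {
    "high-quality": ([1, 1, 1, 1, 0, 0, 0, 0],
                     [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]),
    "low-quality": ([1, 1, 1, 1, 0, 0, 0, 0],
                    [0.9, 0.6, 0.4, 0.2, 0.7, 0.3, 0.5, 0.1]),
}


def evaluate(test_sets, threshold: float = 0.5) -> Dict[str, Dict[str, float]]:
    """Compute AUC, sensitivity and specificity separately for each test set."""
    results = {}
    for name, (labels, scores) in test_sets.items():
        sens, spec = sensitivity_specificity(labels, scores, threshold)
        results[name] = {
            "auc": auc(labels, scores),
            "sensitivity": sens,
            "specificity": spec,
        }
    return results


RESULTS = evaluate(TOY_TEST_SETS)
for name, metrics in RESULTS.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Reporting the metrics per quality stratum, rather than pooled, is what makes a drop such as the one described above visible: AUC is threshold-independent, while sensitivity and specificity depend on the chosen operating point.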

Conclusion: Endoscopic deep learning systems trained solely on high-quality images may not perform well when exposed to heterogeneous imagery, as found in routine practice. Robustness-enhancing training strategies can increase the likelihood of successful clinical implementation.

Keywords: Barrett's esophagus; artificial intelligence; computer aided detection; deep learning systems; endoscopy; esophageal adenocarcinoma.

Conflict of interest statement

JJB reports financial support for IRB approved research from C2Therapeutics, Pentax Medical, Medtronic, Olympus and Aqua Medical. PHW received financial support for IRB approved research from Olympus.

Figures

FIGURE 1
Example cases displaying the consequences of minimal image quality variation. The CADe system suffers from significant performance loss when confronted with lower quality images of the same patient.
FIGURE 2
Comparison of a conventionally trained CADe system with a robust CADe system across three test sets comprising the complete spectrum of image quality.
FIGURE 3
Representative neoplasia cases from three different quality test sets: high‐quality (left), moderate‐quality (center), and low‐quality (right).
FIGURE 4
Results of the conventionally trained CADe system versus the robust CADe system. Dashed bars represent the scores on the high‐quality test set.

