Clin Transl Gastroenterol. 2023 Oct 1;14(10):e00637.
doi: 10.14309/ctg.0000000000000637.

Development of Electronic Health Record-Based Machine Learning Models to Predict Barrett's Esophagus and Esophageal Adenocarcinoma Risk

Prasad G Iyer et al. Clin Transl Gastroenterol.

Abstract

Introduction: Screening for Barrett's esophagus (BE) is suggested in those with risk factors, but remains underutilized. BE/esophageal adenocarcinoma (EAC) risk prediction tools integrating multiple risk factors have been described. However, accuracy remains modest (area under the receiver operating characteristic curve [AUROC] ≤0.7), and clinical implementation has been challenging. We aimed to develop machine learning (ML) BE/EAC risk prediction models from an electronic health record (EHR) database.

Methods: The Clinical Data Analytics Platform, a deidentified EHR database of 6 million Mayo Clinic patients, was used to predict BE and EAC risk. BE and EAC cases and controls were identified using International Classification of Diseases codes and augmented curation (natural language processing) techniques applied to clinical, endoscopy, laboratory, and pathology notes. Cases were propensity score matched to 5 independent randomly selected control groups. An ensemble transformer-based ML model architecture was used to develop predictive models.
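
The abstract does not say which matching algorithm was used, so the following is only a minimal sketch of one common approach: greedy nearest-neighbor propensity score matching without replacement. The function name `match_controls`, the `caliper` parameter, and the toy propensity scores are all hypothetical and are not taken from the study.

```python
from typing import Dict, List


def match_controls(case_scores: Dict[str, float],
                   control_scores: Dict[str, float],
                   k: int = 5,
                   caliper: float = 0.05) -> Dict[str, List[str]]:
    """Greedy nearest-neighbor matching on propensity score, without replacement.

    Each case receives up to k controls whose propensity score lies within
    `caliper` of the case's score. A matched control is removed from the pool.
    """
    available = dict(control_scores)  # copy so the caller's pool is untouched
    matches: Dict[str, List[str]] = {}
    for case_id, ps in case_scores.items():
        # Rank the remaining controls by distance to this case's score.
        nearest = sorted(available, key=lambda c: abs(available[c] - ps))[:k]
        chosen = [c for c in nearest if abs(available[c] - ps) <= caliper]
        for c in chosen:
            del available[c]
        matches[case_id] = chosen
    return matches


# Toy propensity scores (illustrative only).
cases = {"case1": 0.30, "case2": 0.60}
controls = {"c1": 0.29, "c2": 0.32, "c3": 0.61, "c4": 0.58, "c5": 0.90}
matched = match_controls(cases, controls, k=2, caliper=0.05)
```

In this toy run, control `c5` stays unmatched because no case has a score within the caliper. Greedy matching depends on the order in which cases are processed, which is one reason optimal matching is sometimes preferred.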

Results: We identified 8,476 BE cases, 1,539 EAC cases, and 252,276 controls. The BE ML transformer model had an overall sensitivity, specificity, and AUROC of 76%, 76%, and 0.84, respectively. The EAC ML transformer model had an overall sensitivity, specificity, and AUROC of 84%, 70%, and 0.84, respectively. Predictors of BE and EAC included conventional risk factors and additional novel factors, such as coronary artery disease, serum triglycerides, and electrolytes.
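
For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and AUROC can be computed from labels and predicted risk scores. The 0.5 threshold and the toy data are assumptions chosen for illustration; the abstract does not report the operating threshold used by the authors.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, not from the study: 1 = case, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
sens, spec = sensitivity_specificity(labels, scores)
area = auroc(labels, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds. This is why a model can show different sensitivity and specificity pairs at the same AUROC, as with the BE and EAC models above.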

Discussion: ML models developed on an EHR database can predict incident BE and EAC risk with improved accuracy compared with conventional risk factor-based risk scores. Such a model may enable effective implementation of a minimally invasive screening technology.

Conflict of interest statement

Guarantor of the article: Prasad G. Iyer, MD, MSc, FACG.

Specific author contributions: P.G.I.: concept, obtaining funding, writing the initial draft, and revisions. K.S.: data collection. C.L., D.C.C., H.A., K.A., and J.B.K.: editing the manuscript. S. Asfahan, S. Awasthi, P.A., P.K.M., S.P.S., S.S., S.B., C.M., and S.Y.: model development. N.S. and C.P.: model development and editing the manuscript.

Financial support: Supported in part by an NIH grant (NCI R01CA241164), the Mayo Foundation, and the Freeman Foundation.

Potential competing interests: P.G.I.: research funding: Exact Sciences, Pentax Medical, CDx Medical, and Castle Biosciences; consultant: Exact Sciences, Pentax Medical, CDx Medical, Castle Biosciences, Ambu, and Symple Surgical. C.L.: consultant: Verily Life Sciences. J.B.K.: research funding and intellectual property: Exact Sciences. The remaining authors have no disclosures.

IRB approval: Approved by the Mayo Clinic IRB.

Figures

Figure 1.
Process of identifying case (BE/EAC) and control cohorts from the Clinical Data Analytics Platform. BE, Barrett's esophagus; BERT, Bidirectional Encoder Representations from Transformer; EAC, esophageal adenocarcinoma.
Figure 2.
Time line of inclusion of data for patients with BE or EAC included in model development. Anchor date was the date of diagnosis for a patient with BE or EAC. Lead time refers to a period of 1 year before the anchor date. Events in the lead time period were not used to train the model. Observation time is the period 5 years preceding the lead date. All events in the observation period were used to train the model. BE, Barrett's esophagus; EAC, esophageal adenocarcinoma.
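
The time line described in this caption can be sketched as a simple date filter. This is an illustration only: the helper names, the use of calendar years, the half-open interval (events on the lead date itself are excluded), and the event codes are all assumptions not stated in the caption.

```python
from datetime import date
from typing import List, Tuple


def years_before(d: date, years: int) -> date:
    """Shift a date back by whole calendar years, mapping Feb 29 to Feb 28 if needed."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # Feb 29 in a target year that is not a leap year
        return d.replace(year=d.year - years, day=28)


def observation_window(anchor: date, lead_years: int = 1,
                       obs_years: int = 5) -> Tuple[date, date]:
    """Return (observation_start, lead_date) for a given anchor (diagnosis) date."""
    lead_date = years_before(anchor, lead_years)
    obs_start = years_before(lead_date, obs_years)
    return obs_start, lead_date


def training_events(events: List[Tuple[date, str]],
                    anchor: date) -> List[Tuple[date, str]]:
    """Keep events inside the observation period and drop lead-time events."""
    start, end = observation_window(anchor)
    return [(d, code) for d, code in events if start <= d < end]


# Hypothetical patient diagnosed on 2020-06-15 with placeholder event codes.
anchor = date(2020, 6, 15)
events = [
    (date(2013, 1, 1), "A"),   # before the observation period
    (date(2015, 3, 1), "B"),   # inside the observation period
    (date(2019, 6, 14), "C"),  # last day of the observation period
    (date(2019, 6, 15), "D"),  # lead date: excluded under the half-open assumption
    (date(2020, 1, 1), "E"),   # inside the lead time: excluded
]
kept = training_events(events, anchor)
```

Excluding the year before diagnosis keeps the model from learning from the diagnostic workup itself, so that predictions reflect risk ahead of the diagnosis.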
Figure 3.
Description of case and control cohort utilization in model development. (a) Ensemble model development and architecture. Five independent control cohorts were created. Five control patients were matched to each patient with BE and 10 control patients matched to each patient with EAC. Five transformer models were developed by pairing the BE and EAC case cohort with 5 independent control cohorts. These 5 transformer models were then integrated into a single ensemble model using logistic regression. (b) Schematic showing the layers of the transformer model used to build the BE and EAC machine learning predictive models. BE, Barrett's esophagus; EAC, esophageal adenocarcinoma.
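
The ensemble step in panel (a), in which five model outputs are combined by logistic regression, can be illustrated with a small stacking sketch. The base-model scores below are synthetic stand-ins rather than transformer outputs, and the plain gradient-descent fit, learning rate, and epoch count are assumptions made for a self-contained example; the caption does not describe how the authors fitted the regression.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def ensemble_predict(w: Sequence[float], b: float,
                     base_scores: Sequence[float]) -> float:
    """Combine the five base-model scores into one ensemble risk estimate."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, base_scores)) + b)


# Synthetic scores from five hypothetical base models:
# cases cluster near 0.7, controls near 0.3, clipped to [0, 1].
rng = random.Random(0)
y = [1] * 100 + [0] * 100
X = [[min(1.0, max(0.0, rng.gauss(0.7 if yi else 0.3, 0.15))) for _ in range(5)]
     for yi in y]
w, b = fit_logistic(X, y)
high_risk = ensemble_predict(w, b, [0.9] * 5)
low_risk = ensemble_predict(w, b, [0.1] * 5)
```

In practice the stacking regression would be fitted on held-out predictions rather than on the data used to train the base models, to avoid overstating ensemble performance.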
Figure 4.
Distribution of CDAP data used for model development and testing. CDAP, Clinical Data Analytics Platform.
Figure 5.
Sequential identification of BE and EAC cases from the CDAP, with the application of prespecified data sufficiency, inclusion criteria, and exclusion criteria. BE, Barrett's esophagus; CDAP, Clinical Data Analytics Platform; EAC, esophageal adenocarcinoma.

