Gastroenterology. 2023 Dec;165(6):1420-1429.e10.
doi: 10.1053/j.gastro.2023.08.011. Epub 2023 Aug 18.

Predicting Incident Adenocarcinoma of the Esophagus or Gastric Cardia Using Machine Learning of Electronic Health Records

Joel H Rubenstein et al. Gastroenterology. 2023 Dec.

Abstract

Background & aims: Tools that can automatically predict incident esophageal adenocarcinoma (EAC) and gastric cardia adenocarcinoma (GCA) using electronic health records to guide screening decisions are needed.

Methods: The Veterans Health Administration (VHA) Corporate Data Warehouse was accessed to identify Veterans with 1 or more encounters between 2005 and 2018. Patients diagnosed with EAC (n = 8430) or GCA (n = 2965) were identified in the VHA Central Cancer Registry and compared with 10,256,887 controls. Predictors included demographic characteristics, prescriptions, laboratory results, and diagnoses between 1 and 5 years before the index date. The Kettles Esophageal and Cardia Adenocarcinoma predictioN (K-ECAN) tool was developed and internally validated using simple random sampling imputation and extreme gradient boosting, a machine learning method. Training was performed in 50% of the data, preliminary validation in 25% of the data, and final testing in 25% of the data.
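The partitioning and imputation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `split_50_25_25` and `random_sample_impute` are hypothetical, records are identified only by integer indices, and missing values are drawn from the observed values of the same column (the abstract does not say whether the donor pool was restricted to the training partition). The extreme gradient boosting model itself is not shown; in practice it would be fitted with a library such as xgboost on the training partition.

```python
from __future__ import annotations

import random
from typing import Optional


def split_50_25_25(n: int, seed: int = 0) -> tuple[list[int], list[int], list[int]]:
    """Randomly assign record indices 0..n-1 to training (50%), validation (25%), and testing (25%)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = n // 2
    n_val = n // 4
    # Any remainder from integer division falls into the testing partition.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def random_sample_impute(values: list[Optional[float]], seed: int = 0) -> list[float]:
    """Fill each missing entry (None) with a value drawn at random from the observed entries.

    ASSUMPTION: the donor pool is simply the observed values passed in; the
    study's exact donor pool is not described in the abstract.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    return [v if v is not None else rng.choice(observed) for v in values]


train, val, test = split_50_25_25(8)
print(len(train), len(val), len(test))  # 4 2 2
print(random_sample_impute([1.0, None, 3.0, None]))
```

Random-sample imputation keeps each variable's marginal distribution intact, which is one reason it is a simple choice for very large tabular datasets.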

Results: K-ECAN was well calibrated and had better discrimination (area under the receiver operating characteristic curve [AuROC], 0.77) than previously validated models, such as the Nord-Trøndelag Health Study (HUNT) model (AuROC, 0.68) and the Kunzmann model (AuROC, 0.64), or published guidelines. Using only data from 3 to 5 years before the index date diminished its accuracy slightly (AuROC, 0.75). When men were undersampled to simulate a non-VHA population, the AuROCs of the HUNT and Kunzmann models improved, but K-ECAN remained the most accurate (AuROC, 0.85). Although gastroesophageal reflux disease was strongly associated with EAC, it contributed only a small proportion of the gain in information for prediction.
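The AuROC values reported above can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it with the Mann-Whitney rank-sum identity, giving tied scores their average rank. It is an illustrative implementation, not the study's evaluation code; the function name `auroc` and the 1 = case, 0 = control label coding are assumptions.

```python
from __future__ import annotations


def auroc(scores: list[float], labels: list[int]) -> float:
    """AuROC via the Mann-Whitney rank-sum identity (labels: 1 = case, 0 = control)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_cases = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_cases - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The rank-based form runs in O(n log n), which matters for a testing set of roughly 2.5 million records; a naive comparison of every case with every control would be quadratic.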

Conclusions: K-ECAN is a novel, internally validated tool predicting incident EAC and GCA using electronic health records data. Further work is needed to validate K-ECAN outside VHA and to assess how best to implement it within electronic health records.

Keywords: Electronic Health Records; Esophageal Neoplasms; Gastroesophageal Reflux Disease; Mass Screening; Stomach Neoplasms.

Conflict of interest statement

Potential Conflicts of Interest:

JHR has received research support from Lucid Diagnostics. LPW is a consultant for Gilead Sciences. None of the other authors have any potential conflicts of interest.

Figures

Figure 1. Flow Diagram of Identification of Cases and Controls
Figure 2. Comparison of K-ECAN to Guidelines and Validated Models
The sensitivity and specificity of each guideline and the area under the receiver operating characteristic curve of each model are displayed, with 95% confidence intervals in brackets. All analyses use the held-out 25% final testing dataset. ACG: American College of Gastroenterology, ACP: American College of Physicians, AGA: American Gastroenterological Association, ASGE: American Society for Gastrointestinal Endoscopy, BSG: British Society of Gastroenterology, ESGE: European Society for Gastrointestinal Endoscopy, HUNT: Nord-Trøndelag Health Study
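A guideline is a yes/no screening rule, so it yields a single sensitivity/specificity pair, whereas a model produces a continuous risk score whose discrimination is summarized by the AuROC. The sketch below shows how such a pair is computed for a binary rule. The `Patient` fields and the rule `hypothetical_rule` (GERD, male, age 50 or older) are hypothetical illustrations and do not reproduce the criteria of any guideline listed in the figure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Patient:
    male: bool
    age: int
    gerd: bool
    case: bool  # True if EAC or GCA occurred during follow-up


def hypothetical_rule(p: Patient) -> bool:
    # HYPOTHETICAL screening criterion for illustration only; it is not the
    # wording of ACG, ACP, AGA, ASGE, BSG, or ESGE guidance.
    return p.gerd and p.male and p.age >= 50


def sensitivity_specificity(
    patients: list[Patient], rule: Callable[[Patient], bool]
) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for p in patients if p.case and rule(p))
    fn = sum(1 for p in patients if p.case and not rule(p))
    tn = sum(1 for p in patients if not p.case and not rule(p))
    fp = sum(1 for p in patients if not p.case and rule(p))
    return tp / (tp + fn), tn / (tn + fp)


cohort = [
    Patient(male=True, age=60, gerd=True, case=True),    # flagged case (TP)
    Patient(male=False, age=60, gerd=True, case=True),   # missed case (FN)
    Patient(male=True, age=55, gerd=True, case=False),   # flagged control (FP)
    Patient(male=True, age=40, gerd=True, case=False),   # unflagged control (TN)
    Patient(male=False, age=70, gerd=False, case=False), # unflagged control (TN)
]
print(sensitivity_specificity(cohort, hypothetical_rule))
```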
Figure 3. Receiver Operating Characteristic Curves Stratified by Sex (A) and Race (B)
The area under the receiver operating characteristic curve is displayed for each model or stratum, with 95% confidence intervals in brackets. Analyses use the held-out 25% final testing dataset. H/PI: Hawaiian or Pacific Islander, NA/AN: Native American or Alaskan Native
Figure 4. Calibration Plot
The x-axis is the predicted cumulative incidence of EAC or GCA per 100,000 individuals in the testing set over the 14 years of ascertainment. The y-axis is the observed cumulative incidence per 100,000 individuals. Each dot represents 2% of the testing set (51,398 individuals). This demonstrates that the observed risk is very close to the predicted risk across the range of predicted risks.
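The binning behind a calibration plot like this one can be sketched as: sort individuals by predicted risk, cut them into 50 equal-sized groups (2% each), and compare each group's mean predicted risk with its observed proportion of cases, both scaled per 100,000. This is an assumption-laden sketch: `calibration_bins` is a hypothetical name, predicted risk is taken as a probability over the whole ascertainment window, and the observed value is a simple proportion that ignores censoring, which the study may have handled differently.

```python
from __future__ import annotations


def calibration_bins(
    pred_risk: list[float], outcome: list[int], n_bins: int = 50
) -> list[tuple[float, float]]:
    """Return (mean predicted, observed) incidence per 100,000 for equal-size risk groups.

    ASSUMPTION: outcome is 1 for an incident EAC/GCA and 0 otherwise; censoring is ignored.
    """
    if len(pred_risk) != len(outcome):
        raise ValueError("pred_risk and outcome must have the same length")
    order = sorted(range(len(pred_risk)), key=lambda i: pred_risk[i])
    n = len(order)
    points = []
    for b in range(n_bins):
        # Integer arithmetic spreads any remainder evenly across the groups.
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            continue
        mean_pred = sum(pred_risk[i] for i in members) / len(members)
        observed = sum(outcome[i] for i in members) / len(members)
        points.append((mean_pred * 100_000, observed * 100_000))
    return points


print(calibration_bins([0.0, 0.0, 1.0, 1.0], [0, 0, 1, 1], n_bins=2))
```

A well-calibrated model places these points near the diagonal, which is what the caption reports for K-ECAN.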
Figure 5. Analysis Simulating a Non-Veteran Population
Because 92% of patients were male, male sex is an important risk factor for EAC and GCA, and the HUNT and Kunzmann models were previously developed in general populations with nearly equal female representation, a test set was designed to more closely represent a general population. All female cases and controls in the 25% final testing dataset were used; male controls were randomly down-sampled to equal the number of female controls, and male cases were randomly down-sampled to yield the expected odds ratio of male sex for EAC. This resulted in 215 cases and 429,144 controls. The sensitivity and specificity of each guideline and the area under the receiver operating characteristic curve of each model are displayed, with 95% confidence intervals in brackets. ACG: American College of Gastroenterology, ACP: American College of Physicians, AGA: American Gastroenterological Association, ASGE: American Society for Gastrointestinal Endoscopy, BSG: British Society of Gastroenterology, ESGE: European Society for Gastrointestinal Endoscopy, HUNT: Nord-Trøndelag Health Study
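The down-sampling arithmetic can be made explicit. The odds ratio of male sex is (male cases / male controls) / (female cases / female controls); once male controls are set equal to female controls, the target number of male cases is simply the odds ratio times the number of female cases. The sketch below is a hypothetical illustration: the function name, the list-of-IDs inputs, and the odds ratio used in the example (3.0) are assumptions, since the caption does not state the expected odds ratio that was used.

```python
from __future__ import annotations

import random


def simulate_general_population(
    male_cases: list[str],
    male_controls: list[str],
    female_cases: list[str],
    female_controls: list[str],
    odds_ratio_male: float,
    seed: int = 0,
) -> tuple[list[str], list[str]]:
    """Keep all female records; down-sample men to a target sex ratio and odds ratio."""
    rng = random.Random(seed)
    n_male_controls = len(female_controls)
    # OR = (m_cases / m_controls) / (f_cases / f_controls)
    #   => m_cases = OR * f_cases * m_controls / f_controls
    n_male_cases = round(odds_ratio_male * len(female_cases) * n_male_controls / len(female_controls))
    if n_male_controls > len(male_controls) or n_male_cases > len(male_cases):
        raise ValueError("not enough male records to down-sample to the target size")
    kept_controls = rng.sample(male_controls, n_male_controls)
    kept_cases = rng.sample(male_cases, n_male_cases)
    return female_cases + kept_cases, female_controls + kept_controls


# Toy example with a HYPOTHETICAL odds ratio of 3.0.
cases, controls = simulate_general_population(
    [f"mc{i}" for i in range(100)], [f"mk{i}" for i in range(1000)],
    [f"fc{i}" for i in range(5)], [f"fk{i}" for i in range(200)],
    odds_ratio_male=3.0,
)
print(len(cases), len(controls))  # 20 400
```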
Figure 6. Rankings of Variable Importance in K-ECAN
Aside from demographics and ICD diagnoses, features were constructed for each variable collected longitudinally for each subject (e.g., mean, maximum, minimum, maximum difference, variance) for use by the machine learning model. In this analysis, all the features of each variable were grouped together (e.g., all the features of serum potassium, K) and ranked by the mean Shapley Additive Explanations value (SHAP, a measure of the magnitude of association of each variable; panel A) and by the proportion of the gain in information in K-ECAN attributed to each group of features (panel B). Both rankings were led by age. GERD diagnosis ranked highly in strength of association but lower in how much information it contributed to the model. AlkPhos: alkaline phosphatase, BMI: body mass index, BUN: blood urea nitrogen, CHF: congestive heart failure, COPD: chronic obstructive pulmonary disease, CO2: serum bicarbonate, CRP: C-reactive protein, CTD: connective tissue disease, DM: diabetes mellitus, GERD: gastroesophageal reflux disease, HDL: high density lipoprotein, Hgb: hemoglobin, HIV: human immunodeficiency virus, H2R: histamine type 2 receptor antagonist, LDL: low density lipoprotein, MCH: mean corpuscular hemoglobin, MCHC: mean corpuscular hemoglobin concentration, MCV: mean corpuscular volume, MPV: mean platelet volume, PPI: proton pump inhibitor, PUD: peptic ulcer disease, PVD: peripheral vascular disease, WBC: white blood cell count
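The feature construction and grouping described in this caption can be sketched in two steps: summarize each subject's repeated measurements of a variable into a few features, then sum per-feature importance (mean SHAP or gain) back to the variable level. Several details are assumptions not stated in the source and are labeled in the code: "maximum difference" is taken as the largest absolute change between consecutive measurements, variance is the population variance, and group importance is a plain sum over a caller-supplied feature-to-variable mapping. The function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean, pvariance


def longitudinal_features(name: str, values: list[float]) -> dict[str, float]:
    """Summarize one subject's repeated measurements of one variable."""
    if not values:
        return {}  # no measurements; left for the imputation step
    return {
        f"{name}_mean": mean(values),
        f"{name}_max": max(values),
        f"{name}_min": min(values),
        # ASSUMPTION: largest absolute change between consecutive measurements.
        f"{name}_maxdiff": max((abs(b - a) for a, b in zip(values, values[1:])), default=0.0),
        # ASSUMPTION: population variance (0.0 for a single measurement).
        f"{name}_var": pvariance(values),
    }


def group_importance(
    feature_importance: dict[str, float], feature_to_variable: dict[str, str]
) -> dict[str, float]:
    """Sum per-feature importance into per-variable totals, sorted from largest to smallest.

    ASSUMPTION: features missing from the mapping (e.g., age) form their own group.
    """
    totals: dict[str, float] = {}
    for feature, value in feature_importance.items():
        variable = feature_to_variable.get(feature, feature)
        totals[variable] = totals.get(variable, 0.0) + value
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


print(longitudinal_features("K", [2.0, 6.0, 4.0]))
print(group_importance({"K_mean": 0.1, "K_max": 0.2, "age": 0.5}, {"K_mean": "K", "K_max": "K"}))
```

Grouping in this way is what allows a single variable such as serum potassium to appear once in each ranking even though the model sees several features derived from it.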
