Lancet Digit Health. 2020 Jan 1;2(1):E37-E48.
doi: 10.1016/S2589-7500(19)30216-X. Epub 2019 Dec 5.

Development and validation of a risk prediction model to diagnose Barrett's oesophagus (MARK-BE): a case-control machine learning approach

Avi Rosenfeld et al. Lancet Digit Health.

Abstract

Background: Screening for Barrett's oesophagus (BE) relies on endoscopy, which is invasive and has a low yield. This study aimed to develop and externally validate a simple symptom and risk-factor questionnaire to screen patients for BE.

Methods: Questionnaires from 1299 patients in the BEST2 case-control study were analysed: 880 had BE, including 40 with invasive oesophageal adenocarcinoma (OAC), and 419 were controls. This dataset was randomly split into a training cohort of 776 patients and an internal validation cohort of 523 patients. External validation used 398 patients from the BOOST case-control study: 198 with BE (23 with OAC) and 200 controls. Independently important diagnostic features were identified using two machine learning techniques, information gain (IG) and correlation-based feature selection (CFS). Multiple classification algorithms were assessed to create a multivariable risk prediction model. Internal validation was followed by external validation in the independent dataset.
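
To make the two feature-selection criteria concrete, here is a minimal Python sketch using only the standard library. It assumes each questionnaire answer has already been coded as a discrete category, computes information gain as H(Y) - H(Y | X), and scores a feature subset with the CFS merit in Hall's formulation, using symmetrical uncertainty as the correlation measure. All function names are hypothetical; this illustrates the techniques named above rather than reproducing the authors' code, and it omits the subset search (for example, best-first) that CFS runs over candidate subsets.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(y: Sequence[Hashable], x: Sequence[Hashable]) -> float:
    """H(Y | X): entropy of y within each level of x, weighted by level frequency."""
    n = len(y)
    total = 0.0
    for level, count in Counter(x).items():
        subset = [yi for xi, yi in zip(x, y) if xi == level]
        total += (count / n) * entropy(subset)
    return total


def information_gain(y: Sequence[Hashable], x: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - H(Y | X); this is mutual information, so it is symmetric."""
    return entropy(y) - conditional_entropy(y, x)


def symmetrical_uncertainty(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """2 * IG / (H(a) + H(b)), scaled to [0, 1]; 0 if both columns are constant."""
    denom = entropy(a) + entropy(b)
    return 0.0 if denom == 0 else 2.0 * information_gain(a, b) / denom


def cfs_merit(features: Dict[str, List[Hashable]], y: Sequence[Hashable]) -> float:
    """CFS merit of a subset: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    if not features:
        raise ValueError("subset must contain at least one feature")
    cols = list(features.values())
    k = len(cols)
    # Mean feature-class correlation: rewards features that predict the label.
    r_cf = sum(symmetrical_uncertainty(y, c) for c in cols) / k
    # Mean feature-feature correlation: penalises features that duplicate each other.
    pairs = list(combinations(cols, 2))
    r_ff = sum(symmetrical_uncertainty(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

With two identical, perfectly predictive columns the merit stays at 1.0 rather than rising: the denominator penalises inter-feature correlation, which is how CFS can discard features that each carry information gain but largely repeat one another.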

Findings: The BEST2 study questionnaire included 40 features. Of these, 24 provided information gain, but after CFS only 8 demonstrated independent diagnostic value: age, gender, smoking, waist circumference, frequency of stomach pain, duration of heartburn and acid taste, and taking acid-suppression medicines. Logistic regression offered the highest prediction quality, with an AUC (area under the receiver operating characteristic curve) of 0.87 in the training cohort. In the internal validation set, the AUC was 0.86; in the BOOST external validation set, it was 0.81.
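
The AUC values above can be read as the probability that a randomly chosen patient with BE receives a higher predicted risk than a randomly chosen control. The following standard-library Python sketch, with hypothetical function names, computes AUC directly from that pairwise definition (ties count as one half) together with sensitivity and specificity at a chosen risk threshold. It is quadratic in cohort size, which is adequate for cohorts of a few hundred patients but is not how optimised libraries compute it.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random case > score of a random control), ties = 0.5."""
    cases = [s for lab, s in zip(labels, scores) if lab == 1]
    controls = [s for lab, s in zip(labels, scores) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Treat score >= threshold as a positive call; return (sensitivity, specificity)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for lab, s in pairs if lab == 1 and s >= threshold)
    fn = sum(1 for lab, s in pairs if lab == 1 and s < threshold)
    tn = sum(1 for lab, s in pairs if lab == 0 and s < threshold)
    fp = sum(1 for lab, s in pairs if lab == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)
```

Sweeping the threshold over every observed score and plotting sensitivity against 1 - specificity traces the ROC curve whose area `roc_auc` returns.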

Interpretation: The diagnostic model offers valid predictions of a BE diagnosis in patients with symptomatic gastro-oesophageal reflux, helping to identify which patients should proceed to invasive testing. Overweight men who have been taking stomach medicines for a long time may merit particular consideration for further testing. The risk prediction tool is quick and simple to administer but will need further calibration and validation in a prospective study in primary care.
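
A logistic-regression questionnaire score is simple to administer because each coded answer is multiplied by a fitted coefficient, the products are summed with an intercept, and the logistic function maps that sum to a probability. The Python sketch below assumes the caller supplies the intercept and coefficients; the fitted MARK-BE coefficients are not reproduced here, and the feature names and referral threshold are hypothetical parameters.

```python
import math
from typing import Mapping


def predicted_risk(
    intercept: float,
    coefficients: Mapping[str, float],
    answers: Mapping[str, float],
) -> float:
    """Logistic model: P(BE) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i))).

    `coefficients` and `answers` are keyed by (hypothetical) feature names;
    every coefficient must have a matching coded answer, else KeyError.
    """
    z = intercept + sum(b * answers[name] for name, b in coefficients.items())
    # Numerically stable logistic: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def refer_for_endoscopy(risk: float, threshold: float) -> bool:
    """Hypothetical triage rule: refer when predicted risk reaches the threshold."""
    return risk >= threshold
```

Choosing the threshold trades sensitivity against specificity, so in practice it would be set during the prospective calibration the authors call for rather than fixed in advance.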

Funding: Charles Wolfson Trust and Guts UK.


Conflict of interest statement

Declarations of Interest: The Cytosponge device was designed by RCF and her research team between 2009 and 2010. Patents and a trademark were filed in 2010 by the Medical Research Council (MRC). The BEST2 study was designed in 2010, and the device was manufactured for the specific purpose of this study following a letter of no objection from the Medicines and Healthcare products Regulatory Agency. In 2013 the MRC licensed the technology to Covidien GI Solutions, now part of Medtronic Inc., which has had no influence in any way on the design, conduct or analysis of this study. RCF is a named inventor on patents pertaining to the Cytosponge and related assays; she has not received any financial benefits to date. All other authors have no conflicts of interest to declare.

Figures

Figure 1. Workflow Schema
The workflow for filtering the data and creating the models is shown for both the entire dataset and the smaller case-control cohort analyses. The number of features remaining at each stage is shown, together with the area under the curve (AUC), sensitivity and specificity following logistic regression.
Figure 2. The discriminatory panels and analyses performed.
Panel A shows the 8 features selected by CFS for the BEST2 training set, together with the direction of each feature's association with the presence of BE. Panel B shows which features are retained in the CFS model after the datasets are re-created to exclude any potential age, sex, race and symptom-duration biases.
Figure 3. The Curse of Dimensionality
The model's AUC (y-axis) is plotted against the number of features used in the model (x-axis) within the BEST2 training dataset. Adding features strengthens the model up to a plateau, which is reached at around 8 features. The AUC then remains unaffected as further features are added, up to a total of 25.
Figure 4. Comparing the model’s AUC with different machine learning classification algorithms.
Five classification algorithms were compared. Shown here are the models for the BEST2 training dataset with 13 features, using logistic regression (LR), decision tree (DT), naïve Bayes (NB), support vector machine (SVM) and random forest (RF). Logistic regression performed best and was therefore used for the rest of the analyses.


