. 2025 Jul;57(3):821-829.
doi: 10.4143/crt.2024.843. Epub 2024 Dec 16.

A Machine Learning Risk Prediction Model for Gastric Cancer with SHapley Additive exPlanations

Bomi Park et al. Cancer Res Treat. 2025 Jul.

Abstract

Purpose: Gastric cancer (GC) prediction models hold potential for enhancing early detection by enabling the identification of high-risk individuals, facilitating personalized risk-based screening, and optimizing the allocation of healthcare resources.

Materials and methods: In this study, we developed a machine learning-based GC prediction model utilizing data from the Korean National Health Insurance Service, encompassing 10,515,949 adults who had not been diagnosed with GC and underwent GC screening during 2013-2014, with a follow-up period of 5 years. The cohort was divided into training and test datasets at an 8:2 ratio, and class imbalance was mitigated through random oversampling.
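The 8:2 split and random oversampling described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `split_and_oversample`, the seed, and the toy cohort are hypothetical, and oversampling only the training portion is an assumption, since the abstract does not say where it was applied.

```python
import random


def split_and_oversample(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx): an 8:2 split with the training part oversampled."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    # Hold out the first 20% of the shuffled indices as the test set.
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    positives = [i for i in train_idx if labels[i] == 1]
    negatives = [i for i in train_idx if labels[i] == 0]
    minority, majority = sorted((positives, negatives), key=len)

    # Draw minority cases with replacement until both classes are the same size.
    # Assumption: only the training set is rebalanced; the test set is untouched.
    shortfall = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(shortfall)] if minority else []
    return train_idx + extra, test_idx


# Toy cohort: 1,000 hypothetical subjects, 5% of whom develop the outcome.
labels = [1] * 50 + [0] * 950
train_idx, test_idx = split_and_oversample(labels)
```

Leaving the test set at its natural class ratio keeps the reported AUC representative of the screened population; duplicated minority cases appear only in the training indices.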

Results: Among various models, logistic regression demonstrated the highest predictive performance, with an area under the receiver operating characteristic curve (AUC) of 0.708, which was consistent with the AUC obtained in external validation (0.669). Importantly, the outcomes were robust to missing data imputation and variable selection. The SHapley Additive exPlanations (SHAP) algorithm enhanced the explainability of the model, identifying advancing age, male sex, Helicobacter pylori infection, current smoking, and a family history of GC as key predictors of elevated risk.
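For readers unfamiliar with the AUC reported above, it can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the four toy scores are hypothetical and unrelated to the study data.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count as half."""
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    non_cases = [s for y, s in zip(y_true, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("AUC needs at least one case and one non-case")

    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Toy example: two non-cases followed by two cases, with hypothetical predicted risks.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic and only suitable for small illustrations; a cohort of millions would call for a rank-based computation instead. On this scale, 0.5 corresponds to chance ranking and 1.0 to perfect separation.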

Conclusion: This predictive model could significantly contribute to the early identification of individuals at elevated risk for GC, thereby enabling the implementation of targeted preventive strategies. Furthermore, the integration of noninvasive and cost-effective predictors enhances the clinical utility of the model, supporting its potential application in routine healthcare settings.

Keywords: Machine learning; Prediction model; SHapley Additive exPlanations; Stomach neoplasms.

Conflict of interest statement

Conflict of interest relevant to this article was not reported.

Figures

Fig. 1.
Flowchart of the study participants. BMI, body mass index.
Fig. 2.
AUROCs of the prediction models. AUROC, area under the receiver operating characteristic curve; DT, decision tree; LR, logistic regression; XGB, eXtreme Gradient Boosting.
Fig. 3.
Summary of SHapley Additive exPlanations (SHAP) values. (A) Each dot represents the impact of a feature on one subject. The dot’s color indicates the feature’s value, while its position on the x-axis indicates the SHAP value, reflecting the feature’s contribution to altering the model’s prediction for that individual. Features are plotted on the y-axis and organized in descending order based on mean SHAP values. Variables and coding for analysis: age group (1, 40-44; 2, 45-49; 3, 50-54; 4, 55-59; 5, 60-64; 6, 65-69; 7, 70-74), sex (1, male; 2, female), Helicobacter pylori infection (1, infection; 0, no infection), smoking status (1, nonsmoker; 2, former smoker; 3, current smoker), family history of gastric cancer (1, yes; 0, no), body mass index (BMI; 1, < 23; 2, 23-24.9; 3, 25-29.9; 4, ≥ 30), alcohol consumption (0, no drinking; 1, ≤ 3 times/wk; 2, ≥ 4 times/wk), disease history or family history of disease (1, yes; 0, no). (B) Mean absolute SHAP values. The five most influential features are age, sex, H. pylori, smoking, and family history of gastric cancer (GC). CC, colorectal cancer; HTN, hypertension; Hx, history; LC, liver cancer; MI, myocardial infarction.
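The ranking by mean absolute SHAP value in panel B can be illustrated for a linear model such as logistic regression. The sketch below assumes independent features, for which the SHAP value of feature j on the log-odds scale reduces to its coefficient times the deviation of the subject's value from the feature mean. The coefficients, the three-feature subset, and the four toy subjects are hypothetical; they follow the coding in the caption but are not the study's estimates or data.

```python
def linear_shap(weights, background, x):
    """Per-feature SHAP values for a linear model, assuming independent features."""
    means = [sum(col) / len(col) for col in zip(*background)]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


def rank_by_mean_abs_shap(names, weights, data):
    """Order features by mean absolute SHAP value across all subjects, largest first."""
    per_subject = [linear_shap(weights, data, row) for row in data]
    mean_abs = [sum(abs(v) for v in col) / len(col) for col in zip(*per_subject)]
    return sorted(zip(names, mean_abs), key=lambda pair: pair[1], reverse=True)


names = ["age_group", "sex", "h_pylori"]
weights = [0.30, -0.50, 0.40]  # hypothetical log-odds coefficients, not from the paper
data = [  # hypothetical subjects: age group 1-7, sex 1 male / 2 female, H. pylori 0/1
    [1, 1, 0],
    [4, 2, 1],
    [7, 1, 0],
    [3, 2, 1],
]
ranking = rank_by_mean_abs_shap(names, weights, data)
```

Each subject's SHAP values sum to the difference between that subject's log-odds and the cohort average, which is the additivity property that lets the per-dot contributions in panel A be read as shares of a single prediction.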
