A Machine Learning Risk Prediction Model for Gastric Cancer with SHapley Additive exPlanations
- PMID: 39701090
- PMCID: PMC12263238
- DOI: 10.4143/crt.2024.843
Abstract
Purpose: Gastric cancer (GC) prediction models hold potential for enhancing early detection by enabling the identification of high-risk individuals, facilitating personalized risk-based screening, and optimizing the allocation of healthcare resources.
Materials and methods: In this study, we developed a machine learning-based GC prediction model utilizing data from the Korean National Health Insurance Service, encompassing 10,515,949 adults who had not been diagnosed with GC and underwent GC screening during 2013-2014, with a follow-up period of 5 years. The cohort was divided into training and test datasets at an 8:2 ratio, and class imbalance was mitigated through random oversampling.
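The split and resampling step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `split_and_oversample`, the fixed seed, and the choice to oversample only the training portion (so the test set keeps its natural class balance) are assumptions, since the abstract does not specify these details.

```python
import random


def split_and_oversample(labels, test_fraction=0.2, seed=0):
    """Split indices 8:2 and randomly oversample the minority class in the training part.

    labels: list of 0/1 outcomes (1 = event during follow-up).
    Returns (train_indices, test_indices); train_indices may contain repeats.
    Assumption: oversampling is applied to the training set only.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        # Nothing to resample from; return the plain split.
        return train_idx, test_idx

    # Random oversampling: draw minority rows with replacement until classes match.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = train_idx + extra
    rng.shuffle(balanced)
    return balanced, test_idx
```

Keeping the test indices untouched means any reported AUC reflects the original event rate rather than the artificially balanced one.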
Results: Among various models, logistic regression demonstrated the highest predictive performance, with an area under the receiver operating characteristic curve (AUC) of 0.708, which was consistent with the AUC obtained in external validation (0.669). Importantly, the outcomes were robust to missing data imputation and variable selection. The SHapley Additive exPlanations (SHAP) algorithm enhanced the explainability of the model, identifying advancing age, being male, Helicobacter pylori infection, current smoking, and a family history of GC as key predictors of elevated risk.
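To make the two quantities in the results concrete, the sketch below computes an AUC from scores and labels, and per-feature SHAP contributions for a logistic regression model. It relies on the standard result that, for a linear model with features treated as independent, the SHAP value of a feature on the log-odds scale is its coefficient times the deviation of the feature from its background mean. The function names are illustrative, any coefficients or feature values passed in are hypothetical, and nothing here reproduces the study's fitted model.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive scores above a random negative.

    Ties count as 0.5. Quadratic in sample size, so suitable only for small examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_odds(weights, intercept, x):
    """Linear predictor of a logistic regression model."""
    return intercept + sum(w * x[name] for name, w in weights.items())


def linear_shap(weights, x, background_mean):
    """SHAP values on the log-odds scale for a linear model.

    Assumption: features are treated as independent, so each contribution is
    weight * (feature value - background mean of that feature).
    """
    return {name: w * (x[name] - background_mean[name]) for name, w in weights.items()}
```

By construction the contributions sum to the difference between an individual's log-odds and the log-odds at the background mean, which is the additivity property that lets SHAP rank predictors such as age or smoking status for a given person.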
Conclusion: This predictive model could significantly contribute to the early identification of individuals at elevated risk for GC, thereby enabling the implementation of targeted preventive strategies. Furthermore, the integration of noninvasive and cost-effective predictors enhances the clinical utility of the model, supporting its potential application in routine healthcare settings.
Keywords: Machine learning; Prediction model; SHapley Additive exPlanations; Stomach neoplasms.
Conflict of interest statement
Conflict of interest relevant to this article was not reported.