Cancers (Basel). 2024 Dec 25;17(1):30.
doi: 10.3390/cancers17010030.

Predictive Mortality and Gastric Cancer Risk Using Clinical and Socio-Economic Data: A Nationwide Multicenter Cohort Study

Seong Uk Kang et al. Cancers (Basel).

Abstract

Background/Objectives: Gastric cancer is a leading cause of cancer-related mortality, particularly in East Asia, with a notable burden in the Republic of Korea. This study aimed to develop machine learning models to predict gastric cancer mortality and to identify risk factors.

Methods: All data were acquired from the Korean Clinical Data Utilization for Research Excellence, covering multiple medical centers in South Korea. A total of 23,717 gastric cancer patients were divided into two groups by cause of mortality (all-cause, n = 2664; disease-specific, n = 1620) and investigated. We used comprehensive data integrating clinical, pathological, lifestyle, and socio-economic factors. Cox proportional hazards analysis was conducted to estimate hazard ratios for mortality. Five machine learning models (random forest, gradient boosting machine, XGBoost, Light GBM, and CatBoost) were developed to predict mortality. The models were interpreted using SHAP (SHapley Additive exPlanations), an explainable AI technique.

Results: For all-cause mortality, the gradient boosting machine model demonstrated the highest performance, with an AUC-ROC of 0.795. For disease-specific mortality, the Light GBM model outperformed the others, achieving an AUC-ROC of 0.867. Significant predictors included the AJCC7 stage, tumor size, lymph node count, and lifestyle factors such as smoking, drinking, and diabetes.

Conclusions: This study underscores the importance of integrating both clinical and lifestyle data to enhance mortality prediction accuracy in gastric cancer patients. The findings highlight the need for personalized treatment approaches in the Korean population and emphasize the role of demographic-specific data in predictive modeling.

Keywords: cohort study; gastric cancer; lifestyle factors; machine learning; mortality.
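
For readers unfamiliar with the Cox proportional hazards analysis named in the abstract, the standard textbook form of the model and the hazard ratio it yields can be written as below. The symbols (baseline hazard h_0, coefficient vector beta, covariate vector x) are generic notation and are assumptions here, not taken from the article; the abstract does not list which covariates entered the authors' model.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
\mathrm{HR}_j = \frac{h(t \mid x_j + 1,\ \mathbf{x}_{-j})}{h(t \mid x_j,\ \mathbf{x}_{-j})} = \exp(\beta_j)
```

Under this form, a hazard ratio above 1 for covariate j corresponds to a higher instantaneous risk of death per unit increase in that covariate, with the other covariates held fixed.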

Conflict of interest statement

The author Sang Won Park was employed by the company Weknew Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
The scheme of study flow.
Figure 2
The scheme of patient selection. A total of 23,717 patients were diagnosed for the first time between 2002 and 2019. All 6907 investigated patients were divided into two groups for analysis after 1:3 propensity score matching. Group 1: all-cause mortality; Group 2: disease-specific mortality.
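
The caption above mentions 1:3 propensity score matching but does not say which algorithm was used. The following is a minimal sketch of one common approach, greedy nearest-neighbour matching without replacement, assuming the propensity scores have already been estimated. The function name `match_one_to_k`, the caliper value, and the example scores are all hypothetical and are not drawn from the study.

```python
def match_one_to_k(case_scores, control_scores, k=3, caliper=0.1):
    """Greedy nearest-neighbour matching without replacement.

    Returns a dict mapping each matched case index to a list of k
    control indices. Cases with fewer than k controls inside the
    caliper are left unmatched (an assumption of this sketch).
    """
    available = set(range(len(control_scores)))
    matches = {}
    for i, ps in enumerate(case_scores):
        # Controls still unused and within the caliper, nearest first.
        candidates = sorted(
            (j for j in available if abs(control_scores[j] - ps) <= caliper),
            key=lambda j: (abs(control_scores[j] - ps), j),
        )
        if len(candidates) < k:
            continue
        chosen = candidates[:k]
        matches[i] = chosen
        available.difference_update(chosen)
    return matches


# Hypothetical propensity scores for 2 cases and 7 candidate controls.
cases = [0.30, 0.70]
controls = [0.28, 0.31, 0.35, 0.66, 0.69, 0.72, 0.95]
matched = match_one_to_k(cases, controls, k=3, caliper=0.1)
print(matched)
```

Greedy matching depends on the order in which cases are processed; optimal matching or matching with replacement are alternatives that trade simplicity for balance.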
Figure 3
AUROC of model performance. (A) For all-cause mortality, GBM performs best (AUC = 0.795), followed by Light GBM (AUC = 0.787), while RF has the lowest performance (AUC = 0.728). (B) For disease-specific mortality, Light GBM performs best (AUC = 0.867), followed by GBM (AUC = 0.830) and XGBoost (AUC = 0.803); RF again has the lowest performance (AUC = 0.771). Abbreviations: AUROC, Area Under the Receiver Operating Characteristic Curve; RF, Random Forest; GBM, Gradient Boosting Machine; XGBoost, Extreme Gradient Boosting; Light GBM, Light Gradient Boosting Machine; CAT Boost, Categorical Boosting.
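
To make the AUC values above concrete: the AUROC equals the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived, with ties counted as half. The sketch below computes it directly from that definition in plain Python. The function name `auc_roc` and the toy labels and scores are illustrative assumptions, not the study's data or code.

```python
def auc_roc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = died, 0 = survived; scores are predicted risks.
died = [0, 0, 1, 1]
risk = [0.10, 0.40, 0.35, 0.80]
print(auc_roc(died, risk))  # 0.75
```

The pairwise loop is quadratic in the number of patients; for a cohort of this size a rank-based implementation from a standard library would be the practical choice.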
Figure 4
Variables interpretation by SHAP. (A) Gradient boosting machine model for all-cause mortality. The plot illustrates the contribution of features that affect the prediction of mortality, such as tumor size, AJCC7 stage, CEA, and smoking. (B) Light gradient boosting machine model for disease-specific mortality. The plot illustrates the contribution of features that affect the prediction of mortality, such as AJCC7 stage, lymph node counts, tumor size, CEA, CA19-9, and smoking. Abbreviations: BMI, body mass index; CEA, carcinoembryonic antigen; CA19-9, CA 19-9 antigen; AJCC7 STAGE, AJCC Cancer Staging (7th edition); GRADE, Tumor Grade.
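
SHAP attributes a prediction to individual features using Shapley values. As a rough illustration of the underlying idea, the sketch below computes exact Shapley values by brute force, averaging each feature's marginal contribution over every feature ordering. The risk function `toy_risk`, its coefficients, and the patient and baseline values are invented for illustration; the study's models and the SHAP library's faster tree-based algorithms are not reproduced here.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance, relative to a baseline.

    Features absent from a coalition take their baseline value.
    Cost grows factorially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for j in order:
            current[j] = x[j]          # switch feature j to its actual value
            new = predict(current)
            phi[j] += new - prev       # marginal contribution of feature j
            prev = new
    return [p / len(orders) for p in phi]


def toy_risk(features):
    # Hypothetical risk score with one interaction term.
    stage, size_cm, smoker = features
    return 0.10 * stage + 0.02 * size_cm + 0.05 * smoker + 0.01 * stage * smoker


patient = [3, 4.0, 1]     # stage, tumor size in cm, smoker flag
reference = [1, 2.0, 0]   # baseline patient used for comparison
phi = shapley_values(toy_risk, patient, reference)
print(phi)
```

The attributions sum to the difference between the patient's prediction and the baseline prediction, which is the property that lets plots like Figure 4 be read as additive contributions.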
