Transl Lung Cancer Res. 2025 Jun 30;14(6):2011-2030.
doi: 10.21037/tlcr-2025-152. Epub 2025 Jun 26.

An explainable AI approach to surgical and radiotherapy interventions for optimized treatment decision-making in early-stage non-small cell lung cancer

Qunzhe Ding et al.

Abstract

Background: For individual patients with early-stage non-small cell lung cancer (NSCLC), robust evidence to guide treatment selection between surgery and stereotactic body radiotherapy (SBRT) remains limited. This study aimed to develop machine learning-driven predictive models using the Surveillance, Epidemiology, and End Results (SEER) database to evaluate the efficacy of these treatments, thereby providing a data-driven foundation for personalized treatment decisions.

Methods: Stage I or IIA NSCLC patients diagnosed between 2012 and 2018 were identified from the SEER database. Six machine learning models, ranging from classical to advanced approaches, were used to predict 1-, 3-, and 5-year survival, and their performance was assessed using seven metrics. The SHAP (SHapley Additive exPlanations) interpretability method was applied to the best-performing model, with a focus on how surgical and radiotherapy treatments differ across patient and tumor factors, to provide insights for optimizing treatment strategies. Patients diagnosed between 2019 and 2021 served as an external validation cohort to assess the generalizability and robustness of the developed models.
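To make the SHAP step concrete, the following is a minimal sketch of exact Shapley value attribution for a single prediction. It is not the authors' pipeline: the toy risk function, its coefficients, the feature order, and the single-baseline convention are all assumptions chosen for illustration, and in practice the `shap` library's tree explainer computes such attributions far more efficiently for gradient-boosted models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features outside a coalition are set to their baseline value
    (an illustrative convention, not necessarily the one used in the study).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk_score(features):
    """Hypothetical model: features are (tumor size in cm, age, surgery flag)."""
    size_cm, age, surgery = features
    return 0.1 * size_cm + 0.01 * age - 0.2 * surgery + 0.05 * size_cm * surgery


# Synthetic patient and reference point, invented for this example.
x = [3.0, 70.0, 1.0]
baseline = [2.0, 65.0, 0.0]
phi = shapley_values(toy_risk_score, x, baseline)
```

The attributions in `phi` sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additivity property that lets SHAP plots decompose a single prediction into per-feature contributions.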

Results: A total of 26,566 patients were included in the training and internal testing cohort. LightGBM (light gradient boosting machine) outperformed the other models across most metrics for survival prediction. The SHAP interpretability analysis revealed that tumor location, tumor size, pathology, and treatment type were significant factors for 3- and 5-year predictions. At the 3- and 5-year intervals, the efficacy of radiotherapy was comparable to surgery for left upper lobe tumors, while radiotherapy appeared slightly inferior to surgery for right lower lobe tumors. For tumors <1.5 cm or 3.5-5 cm, lobectomy exhibited the best efficacy, while for tumors measuring 1.5-3.5 cm, lobectomy appeared slightly inferior to radiotherapy and sublobar resection. For adenocarcinoma and squamous cell carcinoma, radiotherapy and lobectomy, respectively, could be regarded as the preferred treatment methods. In addition, for patients <45 or >75 years old, sublobar resection showed the best efficacy at the 5-year interval. The external validation cohort of 11,927 patients further confirmed the effectiveness of the models in predicting 1-, 3-, and 5-year survival outcomes, supporting their reliability and applicability in clinical decision-making.
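The subgroup comparisons above amount to averaging the treatment-type SHAP value within strata such as tumor-size bins. A minimal sketch of that aggregation follows, assuming per-patient SHAP values are already available. The records are synthetic, the bin boundary handling (lower bound inclusive, 5 cm included in the last bin) is an assumption the abstract does not specify, and how the sign of a SHAP value maps to better or worse survival depends on how the outcome was coded, which is also not stated here.

```python
from collections import defaultdict

# Tumor-size bins taken from the abstract; boundary handling is assumed.
SIZE_BINS = [(0.0, 1.5, "<1.5 cm"), (1.5, 3.5, "1.5-3.5 cm"), (3.5, 5.0, "3.5-5 cm")]


def size_bin(size_cm):
    """Return the bin label for a tumor size, or None if it is out of range."""
    for low, high, label in SIZE_BINS:
        if low <= size_cm < high:
            return label
    if size_cm == 5.0:
        return "3.5-5 cm"
    return None


def mean_shap_by_stratum(records):
    """Average the treatment SHAP value per (size bin, treatment) pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for size_cm, treatment, shap_value in records:
        label = size_bin(size_cm)
        if label is None:
            continue  # skip tumors outside the studied size range
        totals[(label, treatment)] += shap_value
        counts[(label, treatment)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Synthetic records: (tumor size in cm, treatment, SHAP value of treatment type).
records = [
    (1.0, "lobectomy", 0.2),
    (1.2, "lobectomy", 0.4),
    (1.1, "SBRT", 0.1),
    (2.0, "SBRT", 0.3),
    (2.5, "sublobar resection", 0.5),
    (6.0, "lobectomy", 0.9),  # outside 0-5 cm, ignored
]
means = mean_shap_by_stratum(records)
```

Comparing the entries of `means` within one size bin gives the kind of per-stratum treatment ranking reported in the abstract, although such SHAP contrasts describe model behavior on observational data rather than causal treatment effects.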

Conclusions: This study provides valuable insights into treatment decision-making for stage I and IIA NSCLC. The LightGBM model is a reliable tool for survival prediction in early-stage NSCLC. The model indicates that tumor location, tumor size, pathological type, and age are key factors influencing the choice of treatment.

Keywords: Non-small cell lung cancer (NSCLC); SHAP interpretability; machine learning; stereotactic body radiotherapy (SBRT); surgery.


Conflict of interest statement

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-152/coif). The authors have no conflicts of interest to declare.

Figures

Figure 1
Flow chart of the research process. CatBoost, categorical boosting; GBM, gradient boosting machine; LightGBM, light gradient boosting machine; LR, logistic regression; NSCLC, non-small cell lung cancer; RF, random forest; SEER, Surveillance, Epidemiology, and End Results; SHAP, SHapley Additive exPlanations; XGBoost, extreme gradient boosting.
Figure 2
Comparison of AUCs (A,D,G), P-R curves (B,E,H), and calibration performance (C,F,I) in the internal cohort across six prediction models. AUC, area under the curve; CatBoost, categorical boosting; GBM, gradient boosting machine; LightGBM, light gradient boosting machine; LR, logistic regression; XGBoost, extreme gradient boosting; P-R, precision-recall.
Figure 3
SHAP-based global interpretation of the LightGBM model. (A,C,E) Feature importance plots for 1-year, 3-year, and 5-year survival predictions, respectively, highlighting the most influential factors based on SHAP values. (B,D,F) SHAP summary plots corresponding to 1-year, 3-year, and 5-year survival predictions, illustrating the impact of individual features on model output. LightGBM, light gradient boosting machine; SHAP, SHapley Additive exPlanations.
Figure 4
SHAP-based impact of treatment type on survival predictions. (A) SHAP value distribution for different treatment types in 1-year survival prediction. (B) SHAP value distribution for different treatment types in 3-year survival prediction. (C) SHAP value distribution for different treatment types in 5-year survival prediction. SHAP, SHapley Additive exPlanations.
Figure 5
SHAP interaction plot for tumor location and treatment type. (A,C,E) SHAP value distributions illustrating the interaction between tumor location and treatment type for 1-year, 3-year, and 5-year survival predictions, respectively. (B,D,F) Corresponding box plots showing the SHAP value variations of different treatment types across various tumor locations for 1-year, 3-year, and 5-year survival predictions. SHAP, SHapley Additive exPlanations.
Figure 6
SHAP interaction plot for tumor size and treatment type. (A) SHAP value distribution illustrating the interaction between tumor size and treatment type in 1-year survival prediction. (B) SHAP value distribution illustrating the interaction between tumor size and treatment type in 3-year survival prediction. (C) SHAP value distribution illustrating the interaction between tumor size and treatment type in 5-year survival prediction. SHAP, SHapley Additive exPlanations.
Figure 7
SHAP interaction plot for pathology and treatment type. (A,C,E) SHAP value distributions illustrating the interaction between pathology and treatment type for 1-year, 3-year, and 5-year survival predictions, respectively. (B,D,F) Corresponding box plots showing the SHAP value variations of different treatment types across pathology types for 1-year, 3-year, and 5-year survival predictions. SHAP, SHapley Additive exPlanations.
Figure 8
SHAP interaction plot for age and treatment type. (A) SHAP value distribution illustrating the interaction between age and treatment type in 1-year survival prediction. (B) SHAP value distribution illustrating the interaction between age and treatment type in 3-year survival prediction. (C) SHAP value distribution illustrating the interaction between age and treatment type in 5-year survival prediction. SHAP, SHapley Additive exPlanations.
Figure 9
Representative SHAP force plots for individual model predictions. (A-D) Four randomly selected patients from the test set, demonstrating the contribution of each feature to the predicted outcome. The base value represents the mean prediction across all samples. Feature values are displayed at the bottom of each plot, where red indicates a positive contribution to the prediction and blue indicates a negative contribution. SHAP, SHapley Additive exPlanations.
Figure 10
Comparison of AUCs (A,D,G), P-R curves (B,E,H), and calibration performance (C,F,I) in the external validation cohort across six prediction models. AUC, area under the curve; CatBoost, categorical boosting; GBM, gradient boosting machine; LightGBM, light gradient boosting machine; LR, logistic regression; XGBoost, extreme gradient boosting; P-R, precision-recall.
