Life (Basel). 2025 Nov 14;15(11):1753.
doi: 10.3390/life15111753.

Using Radiomics and Explainable Ensemble Learning to Predict Radiation Pneumonitis and Survival in NSCLC Patients Post-VMAT


Tsair-Fwu Lee et al. Life (Basel).

Abstract

Purpose: This study aimed to develop an accurate predictive model for the risk of radiation pneumonitis (RP) and three-year survival in patients with non-small cell lung cancer (NSCLC) treated with volumetric modulated arc therapy (VMAT). Radiomic features, ensemble stacking, and explainable artificial intelligence (XAI) were integrated to improve predictive performance and clinical interpretability.
Materials and Methods: A retrospective cohort of 221 NSCLC patients treated with VMAT at Kaohsiung Veterans General Hospital between 2013 and 2023 was analyzed, including 168 patients for RP prediction (47 with ≥grade 2 RP) and 118 patients for survival prediction (34 deaths). Clinical variables, dose-volume histogram (DVH) parameters, and radiomic features (original, Laplacian of Gaussian [LoG]-filtered, and wavelet-filtered) were extracted. ANOVA was used for initial feature reduction, followed by LASSO and Boruta-SHAP for feature selection, yielding 10 feature subsets. The data were split 8:2 into training and test sets; SMOTE was used for class balancing and 10-fold cross-validation for parameter optimization. Six models, namely logistic regression (LR), random forest (RF), support vector machine (SVM), k-nearest neighbors (KNN), XGBoost, and Ensemble Stacking, were evaluated in terms of the AUC, accuracy (ACC), negative predictive value (NPV), precision, and F1 score. SHAP analysis was applied to interpret feature contributions.
Results: For RP prediction, the LASSO-selected radiomic subset (FR) combined with Ensemble Stacking achieved the best performance (AUC 0.91, ACC 0.89), and SHAP identified V40 Firstorder_Min as the most influential feature. For survival prediction, the FR subset yielded an AUC of 0.97, an ACC of 0.92, and an NPV of 1.00, with V10 Wavelet Firstorder_Min as the top contributor. The multimodal subset (FC+R) also performed strongly, with an AUC of 0.91 for RP and 0.96 for survival.
Conclusions: This study demonstrated the superior performance of radiomics combined with Ensemble Stacking and XAI for the prediction of RP and survival following VMAT in patients with NSCLC. SHAP-based interpretation enhances transparency and clinical trust, offering a robust foundation for personalized radiotherapy and precision medicine.
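The abstract reports AUC, ACC, NPV, precision, and F1. As a reminder of how these follow from the confusion matrix and the ranked scores, here is a minimal plain-Python sketch. The function names are hypothetical, label 1 is assumed to mark the event (≥grade 2 RP or death), and the authors' own evaluation code is not described in this abstract.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating label 1 as the event (e.g. grade >=2 RP or death)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Threshold-based metrics; a zero denominator yields 0.0 by convention."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
    npv = tn / (tn + fn) if tn + fn else 0.0        # negative predictive value
    recall = tp / (tp + fn) if tp + fn else 0.0     # sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"ACC": (tp + tn) / len(y_true), "precision": precision, "NPV": npv, "F1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because NPV = TN / (TN + FN), the survival model's reported NPV of 1.00 means there were no false negatives on the test set.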

Keywords: ensemble learning; explainable artificial intelligence; lung cancer; machine learning; radiation pneumonitis; radiomics; survival analysis; volumetric modulated arc therapy.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Overall study workflow segmented into five steps (Step 1–5) for improved readability. Step 1: Data acquisition (clinical, dosimetric, imaging); Step 2: Feature extraction (DVH, radiomics with filters); Step 3: Preprocessing (SMOTE oversampling); Step 4: Feature selection (ANOVA, Boruta-SHAP, LASSO); Step 5: Model construction, evaluation, and XAI (Ensemble Stacking, metrics, SHAP). The figure annotates the fold-wise preprocessing pipeline, indicating that ANOVA filtering, LASSO/Boruta-SHAP feature selection, and SMOTE oversampling were performed independently for each training fold during cross-validation, with the test set evaluated only once after model training. Abbreviations: VMAT, Volumetric Modulated Arc Therapy; BMI, Body Mass Index; RP, Radiation Pneumonitis; TN, Tumor Node Staging; DICOM, Digital Imaging and Communications in Medicine; RT, Radiation Therapy; CT, Computed Tomography; DVH, Dose-Volume Histogram; Gy, Gray; PTV, Planning Target Volume; ROI, Region of Interest; SMOTE, Synthetic Minority Oversampling Technique; O, Original image; W, Wavelet; LoG, Laplacian of Gaussian; ANOVA, Analysis of Variance; LASSO, Least Absolute Shrinkage and Selection Operator; LR, Logistic Regression; RF, Random Forest; SVM, Support Vector Machine; KNN, K-Nearest Neighbors; XGBoost, eXtreme Gradient Boosting; AUC, Area Under the ROC Curve; ROC, Receiver Operating Characteristic; ACC, Accuracy; NPV, Negative Predictive Value; F1, F1-score; SHAP, SHapley Additive exPlanations.
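The caption stresses that ANOVA filtering, feature selection, and SMOTE were fitted inside each training fold. The plain-Python sketch below illustrates only the ANOVA step under that rule: the F statistic for each feature is computed from training-fold rows alone, so the held-out fold cannot leak into selection. The helper names, the round-robin stratification, and `top_k` are illustrative assumptions, not the authors' settings.

```python
from __future__ import annotations

import random
from statistics import mean


def stratified_folds(y: list[int], k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle each class separately and deal its indices round-robin into k folds."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def anova_f(values: list[float], labels: list[int]) -> float:
    """One-way ANOVA F statistic of a single feature across the label groups."""
    groups: dict[int, list[float]] = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    grand = mean(values)
    means = {g: mean(vs) for g, vs in groups.items()}
    n, k = len(values), len(groups)
    ss_between = sum(len(vs) * (means[g] - grand) ** 2 for g, vs in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(X: list[list[float]], y: list[int], train_idx: list[int], top_k: int) -> list[int]:
    """Rank features by ANOVA F on training-fold rows only; return the top_k column indices."""
    labels = [y[i] for i in train_idx]
    scored = [(anova_f([X[i][j] for i in train_idx], labels), j) for j in range(len(X[0]))]
    scored.sort(key=lambda t: t[0], reverse=True)
    return sorted(j for _, j in scored[:top_k])


def foldwise_selection(X: list[list[float]], y: list[int], k: int = 10, top_k: int = 5, seed: int = 0) -> list[list[int]]:
    """Repeat selection inside every CV split; the held-out fold is never passed to anova_f."""
    selections = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        selections.append(select_top_features(X, y, train_idx, top_k))
        # LASSO/Boruta-SHAP, SMOTE and model fitting would also use train_idx only (omitted).
    return selections
```

Different folds may select different features; that variability is itself a useful stability check.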
Figure 2
Workflow of the ensemble-stacking model for classification. Abbreviations: LR, Logistic Regression; RF, Random Forest; KNN, K-Nearest Neighbors; SVM, Support Vector Machine; XGBoost, eXtreme Gradient Boosting.
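In stacking, each base learner's out-of-fold predicted probability becomes an input feature for a second-level meta-learner. The minimal plain-Python sketch below assumes only two base learners (KNN and a gradient-descent logistic regression) and a logistic-regression meta-learner. The study's stack also includes RF, SVM, and XGBoost, and its meta-learner and fold scheme are not stated here. All function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Callable, List

Row = List[float]
Predictor = Callable[[Row], float]                     # returns P(y = 1 | x)
Learner = Callable[[List[Row], List[int]], Predictor]  # fit(X, y) -> predictor


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def knn_learner(k: int) -> Learner:
    def fit(X: List[Row], y: List[int]) -> Predictor:
        def predict(x: Row) -> float:
            # Fraction of positives among the k nearest training rows (squared Euclidean distance)
            nearest = sorted((sum((a - b) ** 2 for a, b in zip(x, xi)), yi) for xi, yi in zip(X, y))[:k]
            return sum(yi for _, yi in nearest) / len(nearest)
        return predict
    return fit


def logistic_learner(lr: float = 0.5, epochs: int = 500) -> Learner:
    def fit(X: List[Row], y: List[int]) -> Predictor:
        w, b, n = [0.0] * len(X[0]), 0.0, len(X)
        for _ in range(epochs):  # full-batch gradient descent on the log-loss
            gw, gb = [0.0] * len(w), 0.0
            for xi, yi in zip(X, y):
                err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
                gw = [g + err * xj for g, xj in zip(gw, xi)]
                gb += err
            w = [wj - lr * g / n for wj, g in zip(w, gw)]
            b -= lr * gb / n
        return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return fit


def fit_stacking(X: List[Row], y: List[int], bases: List[Learner], meta: Learner, n_folds: int = 5) -> Predictor:
    n = len(X)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]  # interleaved folds (illustrative)
    # Level-1 training data: out-of-fold probability from each base learner
    meta_X = [[0.0] * len(bases) for _ in range(n)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        for m, learner in enumerate(bases):
            predict = learner([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                meta_X[i][m] = predict(X[i])
    meta_predict = meta(meta_X, y)
    # Refit every base learner on all rows for use on new patients
    fitted = [learner(X, y) for learner in bases]
    return lambda x: meta_predict([p(x) for p in fitted])
```

Using out-of-fold rather than in-sample base predictions matters: a KNN scored on its own training points looks near-perfect, and the meta-learner would then over-trust it.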
Figure 3
Flowchart of machine learning model construction, training, validation, and testing. Abbreviation: SMOTE, Synthetic Minority Oversampling Technique.
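SMOTE balances classes by creating synthetic minority samples on line segments between a minority sample and one of its nearest minority neighbors. The plain-Python sketch below is a simplified variant (random base sample, Euclidean distance, hypothetical function name `smote`), not the implementation the authors used.

```python
from __future__ import annotations

import math
import random


def smote(minority: list[list[float]], n_synthetic: int, k: int = 5, seed: int = 0) -> list[list[float]]:
    """Return n_synthetic new minority-class rows; the input rows are not modified."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: list[list[float]] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # The k nearest *minority* neighbors of x (Euclidean), excluding x itself
        dists = sorted((math.dist(x, minority[j]), j) for j in range(len(minority)) if j != i)
        neighbors = [j for _, j in dists[:k]]
        nb = minority[rng.choice(neighbors)]
        gap = rng.random()  # uniform in [0, 1): position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

In line with the Figure 1 caption, this would be applied to the minority class of each training fold only; oversampling before the split would let synthetic points derived from test patients enter training.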
Figure 4
SHAP beeswarm plot for RP prediction using the LASSO-selected radiomics subset (FR) in the Ensemble Stacking model. Each dot represents a feature contribution for a sample; red dots indicate positive SHAP values (higher risk of RP), blue dots indicate negative values (lower risk). Abbreviations: RP, Radiation Pneumonitis; VMAT, Volumetric Modulated Arc Therapy; NSCLC, Non-Small Cell Lung Cancer; LASSO, Least Absolute Shrinkage and Selection Operator; SHAP, SHapley Additive exPlanations; FR, Filtered Radiomics; R, Radiomics; F, Feature; LoG, Laplacian of Gaussian; LoG-Sigma-1-5 mm-3D, Laplacian of Gaussian filter with sigma 1.5 mm in 3D; W, Wavelet; Wavelet-HLH, Wavelet filter in HLH direction; V40, volume receiving ≥40 Gy; GLCM, Gray Level Co-occurrence Matrix; GLRLM, Gray Level Run Length Matrix; GLSZM, Gray Level Size Zone Matrix; GLDM, Gray Level Dependence Matrix; NGTDM, Neighboring Gray Tone Difference Matrix; Firstorder_Min, minimum voxel intensity (first-order statistics); Contrast, NGTDM contrast feature; ZonePercentage, GLSZM zone percentage feature; SAE, Small Area Emphasis; SRLGLE, Short Run Low Gray Level Emphasis; SDLGLE, Small Dependence Low Gray Level Emphasis; LDLGLE, Large Dependence Low Gray Level Emphasis; LRLGLE, Long Run Low Gray Level Emphasis; LAHGLE, Large Area High Gray Level Emphasis; LDHGLE, Large Dependence High Gray Level Emphasis; MP, Maximum Probability; RMS, Root Mean Squared.
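Each dot in a beeswarm plot is a Shapley value: feature i's marginal contribution v(S ∪ {i}) - v(S), averaged over all subsets S of the other features with weight |S|!(n-|S|-1)!/n!. For a handful of features this can be computed exactly by enumeration, as in the plain-Python sketch below. It assumes "absent" features are replaced by a single baseline vector; libraries such as shap typically average over a background dataset and use approximate or model-specific algorithms, and the authors' explainer settings are not given here. The name `exact_shapley` is hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Evaluate f with coalition features taken from x and the rest from the baseline
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = frozenset(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

The values sum to f(x) - f(baseline) (the efficiency property), which is why a sample's dots in the beeswarm add up to its prediction's deviation from the reference. The cost grows as 2^n, so enumeration is only practical for small n.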
Figure 5
SHAP beeswarm plot for survival prediction using the LASSO-selected radiomics subset (FR) in the Ensemble Stacking model. Each dot represents a feature contribution for a sample; red dots indicate positive SHAP values (higher risk of death), blue dots indicate negative values (better survival). Abbreviations: LASSO, Least Absolute Shrinkage and Selection Operator; FR, Filtered Radiomics; R, Radiomics; SHAP, SHapley Additive exPlanations; W, Wavelet; LoG, Laplacian of Gaussian; LoG-Sigma-1-0 mm-3D, Laplacian of Gaussian filter with sigma 1.0 mm in 3D; V10, volume receiving ≥10 Gy; Firstorder_Min, minimum voxel intensity (first-order statistics); Skewness, distribution skewness; GLCM, Gray Level Co-occurrence Matrix; Maximum Probability, GLCM-based feature; GLRLM, Gray Level Run Length Matrix; GLSZM, Gray Level Size Zone Matrix; GLDM, Gray Level Dependence Matrix; NGTDM, Neighboring Gray Tone Difference Matrix; SDLGLE, Small Dependence Low Gray Level Emphasis; SRHGLE, Short Run High Gray Level Emphasis; LAHGLE, Large Area High Gray Level Emphasis; DV, Dependence Variance; DE, Dependence Entropy; Max3DD, Maximum 3D Diameter; Max2DDS, Maximum 2D Diameter Slice; GLN, Gray Level Non-uniformity; IQR, Interquartile Range; LAL, Least Axis Length.
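Both SHAP figures use the LASSO-selected radiomics subset (FR). LASSO keeps only the features whose coefficients survive an L1 penalty. A minimal cyclic coordinate-descent sketch in plain Python follows; it assumes a squared-error objective (1/2n)·||y - Xw - b||² + alpha·||w||₁ on centered data, with `alpha` as a hypothetical penalty. The study's exact LASSO variant (for example, logistic), standardization, and penalty tuning are not described in this abstract.

```python
from __future__ import annotations


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X: list[list[float]], y: list[float], alpha: float, n_iter: int = 200) -> list[float]:
    """Minimize (1/2n)||y - Xw - b||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centering removes the intercept
    z = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    resid = [v - y_mean for v in y]  # residual y - Xw while w = 0
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0:
                continue  # constant feature carries no information
            # Correlation of feature j with the partial residual (its own contribution added back)
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / z[j]
            if new_wj != w[j]:
                delta = new_wj - w[j]
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                w[j] = new_wj
    return w


def selected_features(w: list[float], tol: float = 0.0) -> list[int]:
    """Indices of features with nonzero LASSO coefficients."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]
```

Raising `alpha` drives more coefficients to exactly zero, which is what makes LASSO usable as a feature selector rather than only a shrinkage method.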
