Thorac Cancer. 2025 Aug;16(15):e70128.

doi: 10.1111/1759-7714.70128.

Machine Learning Model for Predicting Pathological Invasiveness of Pulmonary Ground-Glass Nodules Based on AI-Extracted Radiomic Features

Guozhen Yang¹, Yuanheng Huang¹, Huiguo Chen¹, Weibin Wu¹, Yonghui Wu¹, Kai Zhang¹, Xiaojun Li¹, Jiannan Xu¹, Jian Zhang¹

PMID: 40745923
PMCID: PMC12313823
DOI: 10.1111/1759-7714.70128

Guozhen Yang et al. Thorac Cancer. 2025 Aug.

Affiliation

¹ Department of Cardiothoracic Surgery, Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.

Abstract

Background: With the widespread adoption of low-dose CT screening, the detection of pulmonary ground-glass nodules (GGNs) has risen markedly, presenting diagnostic challenges in distinguishing preinvasive lesions from invasive adenocarcinomas (IAC). This study aimed to develop a machine learning (ML)-based model using artificial intelligence (AI)-extracted CT radiomic features to predict the invasiveness of GGNs.

Methods: A retrospective cohort of 285 patients (148 with preinvasive lesions, 137 with IAC) from the Lingnan Campus was divided into training and internal validation sets (8:2). An independent cohort of 210 patients (118 with preinvasive lesions, 92 with IAC) from the Tianhe Campus served as external validation. Nineteen radiomic features were extracted and filtered using Boruta and LASSO algorithms. Seven ML classifiers were evaluated using AUC-ROC, decision curve analysis (DCA), and SHAP interpretability.
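As a rough illustration of two steps named in the Methods, the sketch below performs an 8:2 random split and computes AUC-ROC from its rank-based definition. It is not the authors' code: the function names `split_8_2` and `auc_roc`, the seed, and the toy labels and scores are hypothetical, and the abstract does not say whether the split was stratified or which software was used.

```python
import random


def split_8_2(n_patients, seed=0):
    """Shuffle patient indices and cut them 8:2 (hypothetical helper).

    Assumption: a plain random split; the study may have stratified by class.
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    cut = n_patients * 8 // 10  # integer arithmetic avoids float rounding
    return indices[:cut], indices[cut:]


def auc_roc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = IAC, 0 = preinvasive lesion.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

train_idx, val_idx = split_8_2(285)
print(len(train_idx), len(val_idx))
print(round(auc_roc(toy_labels, toy_scores), 3))
```

Under this rounding, 285 patients yield 228 training and 57 internal validation cases; the abstract itself does not report the exact counts.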

Results: Median CT value, skewness, 3D long-axis diameter, and transverse diameter were ultimately selected for model construction. Among all classifiers, the Gradient Boosting Machine (GBM) model achieved the best performance (AUC = 0.965 training, 0.908 internal validation, and 0.965 external validation). It demonstrated strong accuracy (88.1%), specificity (80.7%), and F1 score (0.87) in the external validation cohort. The GBM model demonstrated superior net clinical benefit. SHAP analysis identified median CT value and skewness as the most influential predictors.
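For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy, specificity, and F1 score follow from confusion-matrix counts. The counts are hypothetical and are not the study's data; the abstract does not report the confusion matrix, and the function name `classification_metrics` is an illustrative choice.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one case per class
    and at least one positive prediction).
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # recall for the positive class
    specificity = tn / (tn + fp)   # recall for the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, with IAC treated as the positive class.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```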

Conclusion: This study presents a simplified ML model using AI-extracted radiomic features, which has strong predictive performance and biological interpretability for preoperative risk stratification of GGNs. By leveraging median CT value, skewness, 3D long-axis diameter, and transverse diameter, the model enables accurate and noninvasive differentiation between IAC and indolent lesions, supporting precise surgical planning.

Keywords: artificial intelligence; invasiveness; pulmonary ground‐glass nodules; radiomics.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

**FIGURE 1**
The feature selection process. (A) Ranking of features for predicting invasiveness of pulmonary nodules by the Boruta algorithm. Boxplots show important attributes in green, tentative attributes in yellow, unimportant attributes in red, and shadow attributes in blue. The vertical axis lists the name of each variable, and the horizontal axis is the Z‐value. (B) Features determined by LASSO analysis (n = 4). (C) LASSO coefficient distribution of all features. (D) Coefficients for the four key features in the LASSO model.

**FIGURE 2**
ROC curves for predicting pathological invasiveness of pulmonary ground‐glass nodules using different machine learning algorithms. (A) ROC curves for training cohort. (B) ROC curves for internal validation cohort. (C) ROC curves for external validation cohort.

**FIGURE 3**
Decision curve analysis of the GBM model of the training cohort (A), internal validation cohort (B), and external validation cohort (C). The vertical axis is the net benefit after intervention; the horizontal axis is the threshold.
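The net benefit plotted in a decision curve is conventionally defined as TP/N − (FP/N) × pt/(1 − pt), where pt is the threshold probability. The sketch below applies that standard definition to hypothetical toy data; the function names and values are illustrative and are not taken from the study.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating every case with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: intervene on every patient."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Hypothetical toy data: 1 = IAC, 0 = preinvasive lesion.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_probs = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

for pt in (0.1, 0.3, 0.5):
    model_nb = net_benefit(toy_labels, toy_probs, pt)
    all_nb = net_benefit_treat_all(toy_labels, pt)
    print(f"threshold={pt:.1f} model={model_nb:.3f} treat_all={all_nb:.3f}")
```

A model is usually considered clinically useful over the range of thresholds where its curve lies above both the treat-all curve and the treat-none line (net benefit of zero).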

**FIGURE 4**
The mean SHAP values of features for the GBM model. The horizontal axis represents the average SHAP value, and the vertical axis represents the predictor in the GBM model.

**FIGURE 5**
Single‐feature SHAP dependence plots. The horizontal axis represents the value range of a single feature, the vertical axis represents the SHAP value of that feature, and each point represents one sample.
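SHAP values are based on Shapley values from cooperative game theory. As a minimal sketch of the idea, the code below computes exact Shapley values for one sample by enumerating feature coalitions. It rests on several assumptions: the linear toy model and its weights are hypothetical stand-ins for the GBM, the sample values are invented, and "absent" features are set to a single baseline rather than averaged over a background dataset as SHAP implementations typically do.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating all coalitions.

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features, so this suits small models only.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Feature names follow the abstract; weights and values are hypothetical.
FEATURES = ["median CT value", "skewness", "3D long-axis diameter",
            "transverse diameter"]


def toy_model(v):
    """Hypothetical linear risk score, not the study's GBM."""
    return 0.5 * v[0] - 0.2 * v[1] + 0.1 * v[2] + 0.05 * v[3]


sample = [2.0, 1.0, 3.0, 4.0]
reference = [0.0, 0.0, 0.0, 0.0]
for name, value in zip(FEATURES, shapley_values(toy_model, sample, reference)):
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that makes per-feature plots such as Figures 4 and 5 interpretable.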
