Thorac Cancer. 2025 Aug;16(15):e70128.
doi: 10.1111/1759-7714.70128.

Machine Learning Model for Predicting Pathological Invasiveness of Pulmonary Ground-Glass Nodules Based on AI-Extracted Radiomic Features

Guozhen Yang et al. Thorac Cancer. 2025 Aug.

Abstract

Background: With the widespread adoption of low-dose CT screening, the detection of pulmonary ground-glass nodules (GGNs) has risen markedly, presenting diagnostic challenges in distinguishing preinvasive lesions from invasive adenocarcinomas (IAC). This study aimed to develop a machine learning (ML)-based model using artificial intelligence (AI)-extracted CT radiomic features to predict the invasiveness of GGNs.

Methods: A retrospective cohort of 285 patients (148 with preinvasive lesions, 137 with IAC) from the Lingnan Campus was divided into training and internal validation sets at an 8:2 ratio. An independent cohort of 210 patients (118 with preinvasive lesions, 92 with IAC) from the Tianhe Campus served as the external validation set. Nineteen AI-extracted radiomic features were filtered using the Boruta and LASSO algorithms. Seven ML classifiers were evaluated using the area under the receiver operating characteristic curve (AUC-ROC), decision curve analysis (DCA), and SHAP (SHapley Additive exPlanations) interpretability analysis.
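The 8:2 division of the Lingnan cohort can be sketched as a random split performed separately within each class. The abstract states only the ratio; stratifying by label, the seed, and the function name `stratified_split` are assumptions made for illustration.

```python
import random


def stratified_split(labels, train_frac=0.8, seed=42):
    """Return (train_idx, val_idx), splitting each class separately.

    Stratifying by label (an assumption; the abstract only states an 8:2
    ratio) keeps the preinvasive/IAC proportion similar in both sets.
    """
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(val_idx)


# Cohort sizes from the abstract: 148 preinvasive (label 0) and 137 IAC (label 1).
labels = [0] * 148 + [1] * 137
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # prints: 228 57
```

Splitting within each class avoids a validation set that, by chance, over- or under-represents IAC cases.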

Results: Median CT value, skewness, 3D long-axis diameter, and transverse diameter were ultimately selected for model construction. Among all classifiers, the Gradient Boosting Machine (GBM) model achieved the best performance, with AUCs of 0.965, 0.908, and 0.965 in the training, internal validation, and external validation sets, respectively. In the external validation cohort, it achieved an accuracy of 88.1%, a specificity of 80.7%, and an F1 score of 0.87. In decision curve analysis, the GBM model demonstrated superior net clinical benefit. SHAP analysis identified median CT value and skewness as the most influential predictors.
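Accuracy, specificity, and F1 score all derive from the 2×2 confusion matrix, here with IAC taken as the positive class. The counts in the example are hypothetical placeholders, not the study's data; the sketch only shows how the reported metrics relate to one another.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common metrics from confusion-matrix counts (IAC = positive)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall: IAC cases correctly flagged
    specificity = tn / (tn + fp)   # preinvasive lesions correctly cleared
    precision = tp / (tp + fp)     # flagged cases that are truly IAC
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


# Hypothetical counts for illustration only.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print({k: round(v, 3) for k, v in m.items()})
```

Because F1 combines precision and sensitivity, it does not use true negatives, so it can diverge from specificity when the two classes are handled unevenly.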

Conclusion: This study presents a simplified ML model using AI-extracted radiomic features, which has strong predictive performance and biological interpretability for preoperative risk stratification of GGNs. By leveraging median CT value, skewness, 3D long-axis diameter, and transverse diameter, the model enables accurate and noninvasive differentiation between IAC and indolent lesions, supporting precise surgical planning.
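Median CT value and skewness are first-order (histogram) statistics of the attenuation values, in Hounsfield units (HU), of the voxels inside a segmented nodule. The sketch below assumes a flat list of HU values from an already-segmented nodule and uses the population (Fisher-Pearson) skewness; the AI software used in the study may define or compute these features differently.

```python
import statistics


def first_order_features(hu_values):
    """Median and skewness of voxel attenuation (HU) within a nodule mask."""
    n = len(hu_values)
    mean = statistics.fmean(hu_values)
    # Population central moments (divide by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in hu_values) / n
    m3 = sum((x - mean) ** 3 for x in hu_values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {"median_hu": statistics.median(hu_values), "skewness": skewness}


# Hypothetical voxel values: mostly ground-glass attenuation, plus a few
# denser voxels that stretch the upper tail toward higher attenuation.
voxels = [-650, -620, -600, -580, -560, -540, -300, -50]
print(first_order_features(voxels))
```

Here the few dense voxels give a positive skewness while barely moving the median, which is why the two statistics carry complementary information.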

Keywords: artificial intelligence; invasiveness; pulmonary ground‐glass nodules; radiomics.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
The feature selection process. (A) Boruta ranking of features for predicting the invasiveness of pulmonary nodules. Boxplots show important attributes in green, tentative attributes in yellow, unimportant attributes in red, and shadow attributes in blue. The vertical axis lists each variable, and the horizontal axis shows the Z‐value. (B) Features selected by LASSO analysis (n = 4). (C) LASSO coefficient distribution of all features. (D) Coefficients of the four key features in the LASSO model.
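LASSO adds an L1 penalty that drives the coefficients of uninformative features exactly to zero, which is how a shortlist such as the four features in panels B to D arises. The sketch below is a minimal coordinate-descent LASSO for a squared-error loss on centered features, in plain Python for illustration. The abstract does not state the loss, software, or penalty tuning the authors used; the penalty `lam` and the toy data are arbitrary assumptions.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values inside [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    X is a list of rows; columns and y are assumed centered (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    resid = list(y)  # residual y - X @ beta; equals y while beta is all zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of feature j with the partial residual
            # (the residual with feature j's current contribution added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical data: y depends only on feature 0; feature 1 is unrelated.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-4, -2, 0, 2, 4]
print(lasso_cd(X, y, lam=0.5))  # prints [1.75, 0.0]
```

The informative coefficient is shrunk from its least-squares value of 2.0 to 1.75, while the unrelated feature is set exactly to zero, the selection behavior the LASSO panels illustrate.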
FIGURE 2
ROC curves for predicting pathological invasiveness of pulmonary ground‐glass nodules using different machine learning algorithms. (A) ROC curves for training cohort. (B) ROC curves for internal validation cohort. (C) ROC curves for external validation cohort.
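The area under each ROC curve equals the probability that a randomly chosen IAC case receives a higher predicted score than a randomly chosen preinvasive case, with ties counted as one half (the Mann-Whitney formulation). This pairwise O(n·m) sketch is for illustration only; the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie) over all pos/neg pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities (1 = IAC, 0 = preinvasive).
labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.35, 0.9]
print(roc_auc(labels, scores))
```

This rank-based view explains why AUC is insensitive to any monotone rescaling of the model's scores.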
FIGURE 3
Decision curve analysis of the GBM model in the training cohort (A), internal validation cohort (B), and external validation cohort (C). The vertical axis shows the net benefit of intervention; the horizontal axis shows the threshold probability.
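In decision curve analysis, patients whose predicted risk is at or above a threshold probability p_t are treated as if they had IAC, and net benefit is TP/n − (FP/n)·p_t/(1 − p_t), so false positives are penalized by the odds of the threshold. The sketch below computes this for a model and for the "treat all" reference curve ("treat none" is always 0). The data are hypothetical, and the function names are illustrative.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # exchange rate: harm of FP vs benefit of TP
    return tp / n - fp / n * odds


def net_benefit_treat_all(labels, threshold):
    """Reference curve: treat every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical labels (1 = IAC) and predicted probabilities.
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2]
for t in (0.3, 0.5, 0.7):
    print(t, net_benefit(labels, probs, t), net_benefit_treat_all(labels, t))
```

A model is clinically useful over the threshold range where its curve lies above both the "treat all" curve and zero.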
FIGURE 4
Mean SHAP values of the features in the GBM model. The horizontal axis represents the mean SHAP value, and the vertical axis lists the predictors in the GBM model.
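SHAP attributes each prediction to features using Shapley values. With only four predictors, exact Shapley values can be computed by enumerating every feature coalition, as sketched below, with "absent" features set to a background (baseline) value, which is one common convention among several. Features are then ranked by mean absolute Shapley value, the usual quantity in SHAP bar plots; whether the figure averages absolute values is an assumption. The linear `toy_model`, its weights, the baseline, and the samples are all hypothetical stand-ins for the study's GBM.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample by enumerating every coalition.

    Features outside a coalition take their baseline value (one common
    convention). Cost grows as 2**p, which is fine for p = 4 predictors.
    """
    p = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return model(z)

    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):  # coalition sizes 0 .. p-1 drawn from the others
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


def mean_abs_shap(model, samples, baseline):
    """Average |Shapley value| per feature across samples (bar-plot quantity)."""
    rows = [shapley_values(model, x, baseline) for x in samples]
    return [sum(abs(r[j]) for r in rows) / len(rows) for j in range(len(baseline))]


# Hypothetical stand-in for the GBM: a linear score with arbitrary weights over
# (median CT value, skewness, 3D long-axis diameter, transverse diameter).
WEIGHTS = [0.004, 0.8, 0.05, 0.02]


def toy_model(z):
    return sum(w * v for w, v in zip(WEIGHTS, z))


baseline = [-600, 0.8, 15, 12]  # hypothetical background values
samples = [[-500, 0.5, 20, 15], [-650, 1.2, 12, 10]]
print(mean_abs_shap(toy_model, samples, baseline))
```

For a linear model, each Shapley value reduces to w_j(x_j − b_j), a convenient sanity check; for real tree ensembles, tree-specific algorithms such as those in the `shap` package avoid the 2^p enumeration.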
FIGURE 5
SHAP dependence plots for individual features. The horizontal axis represents the value range of each feature, the vertical axis represents that feature's SHAP value, and each point represents one sample.
