Biomark Res. 2023 Nov 23;11(1):102.
doi: 10.1186/s40364-023-00539-9.

Improving the prediction of Spreading Through Air Spaces (STAS) in primary lung cancer with a dynamic dual-delta hybrid machine learning model: a multicenter cohort study

Weiqiu Jin et al. Biomark Res.

Abstract

Background: Reliable pre-surgical prediction of spreading through air spaces (STAS) in primary lung cancer is essential for precision treatment and surgical decision-making. We aimed to develop and validate a dual-delta deep-learning and radiomics model, based on pretreatment computed tomography (CT) image series, to predict STAS in patients with lung cancer.

Method: Six hundred seventy-four patients with pre-surgery CT follow-up scans (with a minimum interval of two weeks) and primary lung cancer diagnosed by surgery were retrospectively recruited from three Chinese hospitals. The training cohort and internal validation cohort, comprising 509 and 76 patients respectively, were selected from Shanghai Chest Hospital; the two external validation cohorts comprised 36 and 53 patients from two other centers, respectively. Four imaging signatures reflecting STAS status (classic radiomics features, deep learning [DL] features, delta-radiomics features, and delta-DL features) were constructed from the pretreatment CT images by methods including handcrafting, 3D view extraction, image registration, and subtraction. A stepwise-optimized three-step procedure was applied for signature building and methodology optimization: feature extraction (by DL and time-based radiomics slope), feature selection (by reproducibility check and 45 selection algorithms), and classification (32 classifiers considered). The interpretability of the proposed model was further assessed with Grad-CAM for DL features and feature ranking for radiomics features.
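The abstract does not define the time-based radiomics slope. A minimal sketch, assuming it is the per-day change of each radiomics feature between the baseline and follow-up scans, might look like the following. The function name `delta_slope`, the feature names, and all values are hypothetical and serve only as illustration.

```python
from datetime import date


def delta_slope(baseline, follow_up, scan_dates):
    """Per-feature rate of change (per day) between two scans.

    Assumption: the slope is (follow-up value - baseline value) / interval in days.
    The source abstract does not give the exact definition.
    """
    first_scan, second_scan = scan_dates
    interval_days = (second_scan - first_scan).days
    if interval_days <= 0:
        raise ValueError("follow-up scan must be later than the baseline scan")
    # Only features present in both scans can yield a delta.
    return {
        name: (follow_up[name] - baseline[name]) / interval_days
        for name in baseline
        if name in follow_up
    }


# Hypothetical feature values for one tumor, scanned 28 days apart.
baseline = {"volume_mm3": 1200.0, "glcm_entropy": 4.10}
follow_up = {"volume_mm3": 1480.0, "glcm_entropy": 4.38}
slopes = delta_slope(baseline, follow_up, (date(2021, 3, 1), date(2021, 3, 29)))
print(slopes)
```

Dividing by the interval rather than using the raw difference keeps patients with different follow-up intervals comparable, which matters here because the cohort only requires a minimum interval of two weeks.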

Results: The dual-delta model showed satisfactory discrimination between STAS and non-STAS, yielding areas under the receiver operating characteristic curve (AUCs) of 0.94 (95% CI, 0.92-0.96), 0.84 (95% CI, 0.82-0.86), and 0.84 (95% CI, 0.83-0.85) in the internal and two external validation cohorts, respectively, with interpretable core feature sets and feature maps.
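The abstract reports AUCs with 95% confidence intervals but does not say how the intervals were obtained. As an illustration only, the sketch below computes a rank-based AUC and a percentile-bootstrap interval, which is one common choice and an assumption here. The labels and scores are made-up toy data, not study results.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int(round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)
    return stats[lo_idx], stats[hi_idx]


# Toy data: 1 = STAS, 0 = non-STAS; scores are hypothetical model outputs.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.7, 0.1]
point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(point, low, high)
```

With only eight cases the interval is very wide. The narrow intervals in the abstract reflect far larger cohorts and possibly a different estimation procedure.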

Conclusion: Coupling the delta-DL model with delta-radiomics features enriches the available information, such as the anisotropy of tumor growth and heterogeneous changes within the tumor during radiological follow-up, and could provide valuable information for STAS prediction in primary lung cancer.

Keywords: Deep learning; Lung cancer; Radiomics; Spreading through air spaces (STAS).

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
The framework of this study. This work comprised three steps: (1) feature extraction by delta-radiomics and deep learning from the delta-images of lung tumors acquired by image registration; (2) feature merging and selection, in which ICC values were used to retain reliable features and 45 methods were applied to optimize the feature set; and (3) optimized classification, in which 32 AI methods were tested. Model performance was evaluated by ROC AUC and confusion matrices, with two external cohorts (ZS Cohort and Ninth Cohort) used for cross-center validation
Fig. 2
Construction of the CHEST cohort and its baseline information. A The enrollment eligibility for the CHEST cohort and the data allocation in the experimental and validation setups. B The baseline information of the studied cohort (gender, age, smoking history, AJCC stages, invasion status, molecular and paraffin pathology with LUAD subtypes, and the CT signs in the baseline scan). Abbreviations: lepidic, L; acinar, A; papillary, P; micropapillary, MP; solid, S; complex glandular pattern, C; Not applicable, N.A.; American Joint Committee on Cancer, AJCC
Fig. 3
The feature extraction processes in this study and the characteristics of the extracted features. A The extraction of dual-delta hybrid features. B The characteristics (pros and cons) of deep-learning-extracted features and radiomics features compared from three aspects: interpretability and repeatability, stability, vulnerability and data-demand
Fig. 4
Registration results. A SSIM scores of various registration methods. B SSIM scores and examples for three different types of registration methods (shown in Green-Magenta pattern)
Fig. 5
Results of five-fold cross-validation and real-world in-center validation. A Training curve of the AlexNet model extracting the delta-DL features (ICV accuracy vs. ICV loss during training). B A representative confusion matrix of a near-average classification result by the dual-delta machine learning model. C t-SNE unsupervised clustering of features. D Five-fold cross-validation ROC curves and their AUC values. E In-center validation ROC curves and their AUC values. F AUC values and feature numbers for the combinations of feature selection algorithms and their optimal classification models. Also shown are the LASSO cross-validation plot and LASSO trajectory plots of variables (green vertical lines mark the number of features corresponding to MSEmin) and the feature weights ranked by ReliefF (pie charts show the compositions of the essential feature sets selected by LASSO and ReliefF)
Fig. 6
Effect of different follow-up intervals on model performance and the cross-center performance of the model. A Frequency distribution of different follow-up time groups. B Performances of different follow-up time groups by different models. C A schematic diagram showing possible relationships between the feature effectiveness and follow-up time interval in different models. D External validation results showing the ROC curves and confusion matrices of Zhongshan cohort and Ninth Hospital cohort
Fig. 7
The model interpretability. A A representative example of Grad-CAM visualization for CNN classification, where the attention distributions of the classic DL model and the delta-DL model are shown in annotated images. B The essential feature sets selected by LASSO and ReliefF and their compositions
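The Fig. 4 caption reports SSIM scores for the registration methods without stating how they were computed. The sketch below is a simplified, single-window (global) form of the structural similarity index with the conventional constants K1 = 0.01 and K2 = 0.03. Standard SSIM averages the same expression over local windows, so this is an illustrative approximation and not the study's implementation. The function name and toy images are hypothetical.

```python
def global_ssim(img_a, img_b, data_range=255.0):
    """Single-window SSIM over two equally sized 2D images given as lists of rows.

    Simplification: standard SSIM averages this expression over local windows.
    """
    a = [p for row in img_a for p in row]
    b = [p for row in img_b for p in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Conventional stabilizing constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Toy 3x3 "images": a fixed image and a copy shifted by one column,
# standing in for a well-registered and a misaligned follow-up scan.
fixed = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
misaligned = [[20, 30, 10], [50, 60, 40], [80, 90, 70]]
score_aligned = global_ssim(fixed, fixed)
score_misaligned = global_ssim(fixed, misaligned)
print(score_aligned, score_misaligned)
```

A score close to 1 indicates that the registered follow-up image closely matches the baseline image, which is the property the delta-image subtraction step depends on.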
