J Transl Med. 2025 Jul 25;23(1):842.
doi: 10.1186/s12967-025-06860-1.

Spatial features of tumor-infiltrating lymphocytes in primary lesions of lung adenocarcinoma predict lymph node metastasis


Huibo Zhang et al. J Transl Med. 2025.

Abstract

Background: Lymph node metastasis (LNM) is critical for staging, prognosis, and treatment decisions in lung adenocarcinoma (LUAD). While tumor-infiltrating lymphocytes (TILs) have demonstrated prognostic value, their role in LNM risk remains uninvestigated. This study evaluates the relationship between TIL features from primary tumor whole slide images (WSIs) and LNM in LUAD.

Methods: TILScout was utilized to derive patch-level TIL scores and generate global TIL maps from primary tumor WSIs. Hot spot analysis and deep learning-based feature extraction followed by K-means clustering were applied to identify and characterize spatial TIL clusters (sTILCs) from the global TIL maps. Random forest models incorporating clinical/pathological data with (M1) and without (M2) TIL features (TIL scores and sTILCs) were developed on a training cohort (N = 312) to predict LNM, and performance was compared across validation (N = 78) and independent test cohorts (N = 148).

Results: Two sTILC types ("TIL-cold" cluster [sTILC1] and "TIL-hot" cluster [sTILC2]) were identified. Model M1 significantly improved LNM prediction over M2, with AUCs increasing from 0.63 to 0.78 (Z = 5.366, P < 0.001) and from 0.61 to 0.72 (Z = 1.999, P = 0.046) in the training and validation cohorts, respectively, and from 0.69 to 0.80 (Z = 3.030, P = 0.002) in the test cohort. Decision curve analysis indicated that M1 provided greater net benefit across a broad spectrum of threshold probabilities. Importantly, patients with lower TIL scores and/or classified as sTILC1 consistently had an increased risk of LNM.
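
As a quick consistency check on the reported statistics, a Z value can be converted to a two-sided P value under a standard normal reference distribution. The abstract does not state which test produced the Z statistics, so the normal-reference assumption and the helper name `two_sided_p` are assumptions of this sketch, not the authors' method.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided P value for a Z statistic, assuming a standard normal reference."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), where Phi is the normal CDF.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z statistics reported in the abstract for the M1 vs. M2 AUC comparisons.
reported_z = {"training": 5.366, "validation": 1.999, "test": 3.030}

for cohort, z in reported_z.items():
    print(f"{cohort}: Z = {z:.3f}, two-sided P = {two_sided_p(z):.3g}")
```

Under this assumption, the validation and test values round to 0.046 and 0.002, in line with the reported P values, and the training value falls well below 0.001.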

Conclusions: Spatial TIL features in primary tumors are linked to LNM in LUAD, thereby enabling the identification of high-risk patients and guiding personalized treatment strategies.

Keywords: Lung adenocarcinoma; Lymph node metastasis; TILScout; Tumor-infiltrating lymphocytes; Whole slide images.


Conflict of interest statement

Declarations. Ethics approval and consent to participate: The retrospective study on the XY cohort received approval from the institutional committee of The Third Xiangya Hospital, Central South University, and informed consent was waived. Consent for publication: Not applicable. Competing interests: The authors declare no conflict of interest.

Figures

Fig. 1
Computational pipeline for predicting LNM. In the first step, TILScout was utilized to calculate TIL scores and generate TIL maps from WSIs. Hot spot maps were derived from the TIL maps, highlighting regions with high density of TIL infiltration. In the second step, these hot spot maps were further processed to extract spatial TIL clusters using a deep encoder architecture after being resized to dimensions of 1024 × 1024 pixels. The encoder consists of four convolutional layers, each followed by max-pooling operations, progressively down-sampling the spatial dimensions and producing a 16,384-dimensional feature vector per image. Next, Principal Component Analysis (PCA) was applied to further reduce the feature space to 244 principal components per image. K-means clustering was performed to identify two spatial clusters based on consensus voting across three clustering evaluation metrics: Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Index. In the final step, Node Metastasis Random Forest Models (NMRFMs) were trained on the training cohort and validated and tested on separate validation and test cohorts. Two models were developed: Model M1, incorporating clinical/pathological data (age, T stage) and TIL features (TIL scores and spatial TIL clusters), and Model M2, trained solely on clinical/pathological data (age, T stage). The performance of both models was evaluated and compared across cohorts
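
The consensus-voting step described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the majority rule (at least two of three metrics agreeing), and all metric values below are invented placeholders, not results from the study. The only fixed facts used are the metric directions: higher is better for the Silhouette Score and the Calinski-Harabasz Index, lower is better for the Davies-Bouldin Index.

```python
from collections import Counter
from typing import Mapping


def best_k(scores: Mapping[int, float], higher_is_better: bool) -> int:
    """Return the candidate k that a single metric prefers."""
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)


def consensus_k(
    silhouette: Mapping[int, float],
    davies_bouldin: Mapping[int, float],
    calinski_harabasz: Mapping[int, float],
) -> int:
    """Choose k by majority vote across the three clustering metrics."""
    votes = [
        best_k(silhouette, higher_is_better=True),
        best_k(davies_bouldin, higher_is_better=False),
        best_k(calinski_harabasz, higher_is_better=True),
    ]
    k, count = Counter(votes).most_common(1)[0]
    if count < 2:
        # Assumed tie rule: with three different votes there is no consensus.
        raise ValueError(f"no majority among votes {votes}")
    return k


# Placeholder metric values per candidate k (hypothetical, for illustration only).
silhouette = {2: 0.41, 3: 0.35, 4: 0.30}
davies_bouldin = {2: 0.95, 3: 1.10, 4: 1.25}
calinski_harabasz = {2: 150.0, 3: 160.0, 4: 120.0}

print(consensus_k(silhouette, davies_bouldin, calinski_harabasz))
```

With these placeholder numbers, two of the three metrics prefer k = 2, so the vote selects two clusters even though one metric disagrees.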
Fig. 2
Hot spot maps and spatial TIL clusters. A Examples of hot spot maps. A gradient color scale from blue to red represents varying TIL densities. Red areas highlight regions of high TIL density. B Two examples of WSIs with the corresponding hot spot maps and sTILCs. Two clusters were identified: the “TIL-cold” cluster, characterized by sparse TIL density across hot spot maps; and the “TIL-hot” cluster, defined by densely concentrated TIL distribution
Fig. 3
Patient enrollment process for the training, validation, and test cohorts
Fig. 4
Performance of the two models and feature impact for model M1 derived by SHAP. Figure 4A-C, ROC curves illustrating the performance of the two models on the training, validation, and test cohorts, with the corresponding values. Figure 4D-F, SHAP summary plots highlighting the contribution of 4 features to the predictions of model M1 across the training, validation, and test cohorts. SHAP values were calculated based on the trained model; positive values represent a favorable impact on the prediction, while negative values indicate a contrary effect. Variables were encoded with the following feature values: Age, 0: < 45, 1: 45–65, 2: > 65; T stage, 0: T1, 1: T2, 2: T3, 3: T4; sTILCs, 1: “TIL-cold” cluster, 2: “TIL-hot” cluster. Figure 4G-I, SHAP feature importance plots displaying the mean absolute SHAP value for each feature and ranking the features by their average impact on the output of model M1 across the cohorts
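
The feature coding listed in the caption can be written down as a small encoder. This is a sketch under stated assumptions: the function and constant names are hypothetical, the feature order is arbitrary, T-stage substages are not handled, and the TIL score is passed through unchanged because the caption gives no coding for it.

```python
# Codes taken from the Fig. 4 caption; names are hypothetical.
T_STAGE_CODES = {"T1": 0, "T2": 1, "T3": 2, "T4": 3}
STILC_CODES = {"TIL-cold": 1, "TIL-hot": 2}


def encode_age(age_years: float) -> int:
    """Age groups from the caption: 0 for < 45, 1 for 45-65, 2 for > 65."""
    if age_years < 45:
        return 0
    if age_years <= 65:
        return 1
    return 2


def encode_patient(age_years: float, t_stage: str, stilc: str, til_score: float) -> list:
    """Build one feature row: [age group, T stage code, sTILC code, TIL score]."""
    return [
        encode_age(age_years),
        T_STAGE_CODES[t_stage],  # raises KeyError for labels outside T1-T4
        STILC_CODES[stilc],
        til_score,  # assumed to be used as a raw numeric value
    ]


# Hypothetical patient, for illustration only.
print(encode_patient(70, "T2", "TIL-cold", 0.12))
```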
Fig. 5
Decision Curve Analysis (DCA) comparing models M1 and M2 across the training, validation, and test cohorts. The DCA plots indicate that model M1 demonstrates a consistently higher net benefit compared to model M2, particularly within the threshold probability range of 0.2 to 0.7 in the training cohort, 0.2 to 0.5 in the validation cohort, and 0.2 to 0.7 in the test cohort. Threshold probability is the minimum probability of a positive outcome at which a patient would opt for an intervention
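
For readers unfamiliar with DCA, net benefit at a threshold probability pt is commonly computed as TP/N − (FP/N) × pt/(1 − pt). The sketch below implements that standard formula in plain Python. The function names, the convention that a predicted probability at or above the threshold counts as positive, and the toy data are assumptions of this example and do not come from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of a model at one threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy labels and predicted probabilities (hypothetical, for illustration only).
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]

for pt in (0.3, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A decision curve plots these values over a range of thresholds; a model is useful at a given threshold when its net benefit exceeds both the treat-all line and zero (treat none).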
Fig. 6
Correlation between TIL scores and immune cell infiltration. PCCs, Pearson correlation coefficients. “*”, P < 0.05; “**”, P < 0.01; “***”, P < 0.001
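
The quantities in this caption can be illustrated with a short sketch: a plain-Python Pearson correlation coefficient and a helper that maps a P value to the star notation defined above. Both function names are hypothetical, the input vectors are toy values, and the P value is taken as given rather than computed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def significance_stars(p_value):
    """Star notation from the caption: * P < 0.05, ** P < 0.01, *** P < 0.001."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Toy vectors (hypothetical): a TIL score and an immune-cell infiltration estimate.
til_scores = [0.1, 0.2, 0.3, 0.4]
infiltration = [0.2, 0.4, 0.6, 0.8]
print(pearson_r(til_scores, infiltration), significance_stars(0.004))
```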
Fig. 7
Biological relevance of TIL scores. A GO analysis. The circular plot consists of four concentric rings. The outermost ring represents the top 10 enriched GO terms from three different functional categories (biological process, BP; cellular component, CC; molecular function, MF). The scale on the outer ring indicates the number of genes within each GO term, and the colored blocks are labeled with the GO term IDs. In the second ring, moving inward, the length of each rectangle represents the number of genes enriched in each GO term (for example, in the top 1000 genes most correlated with TIL scores, the GO:0002443 term contains 118 enriched genes), with the length proportional to the scale in the outer ring. The color of the rectangles indicates the enrichment p-value of each term, with darker colors representing smaller p-values. The third ring shows the number of up-regulated and down-regulated genes for each term (the total number of up-regulated and down-regulated genes for each term corresponds to the gene count in the second ring). The innermost ring displays the enrichment score bars. The height of the bars represents the proportion of the gene count in the second ring to the total number of genes for the respective GO term. The scale of bars is set from 0 to 0.15. B The cnet plot of KEGG analysis. The top 10 enriched KEGG pathways are shown. The value of size represents the number of genes enriched under each pathway. C-D Gene set enrichment analysis (GSEA). Figure 7C shows the top five up-regulated (left) and down-regulated (right) Reactome pathways. Figure 7D shows the top five up-regulated (left) and down-regulated (right) Hallmark pathways. NES, normalized enrichment score


