Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Apr;31(4):1351-1363.
doi: 10.1038/s41591-025-03530-z. Epub 2025 Feb 27.

Single-cell and spatial genomic landscape of non-small cell lung cancer brain metastases

Affiliations

Single-cell and spatial genomic landscape of non-small cell lung cancer brain metastases

Somnath Tagore et al. Nat Med. 2025 Apr.

Abstract

Brain metastases frequently develop in patients with non-small cell lung cancer (NSCLC) and are a common cause of cancer-related deaths, yet our understanding of the underlying human biology is limited. Here we performed multimodal single-nucleus RNA and T cell receptor, single-cell spatial and whole-genome sequencing of brain metastases and primary tumors of patients with treatment-naive NSCLC. Chromosomal instability (CIN) is a distinguishing genomic feature of brain metastases compared with primary tumors, which we validated through integrated analysis of molecular profiling and clinical data in 4,869 independent patients, and a new cohort of 12,275 patients with NSCLC. Unbiased analyses revealed transcriptional neural-like programs that strongly enriched in cancer cells from brain metastases, including a recurring, CINhigh cell subpopulation that preexists in primary tumors but strongly enriched in brain metastases, which was also recovered in matched single-cell spatial transcriptomics. Using multiplexed immunofluorescence in an independent cohort of treatment-naive pairs of primary tumors and brain metastases from the same patients with NSCLC, we validated genomic and tumor-microenvironmental findings and identified a cancer cell population characterized by neural features strongly enriched in brain metastases. This comprehensive analysis provides insights into human NSCLC brain metastasis biology and serves as an important resource for additional discovery.

PubMed Disclaimer

Conflict of interest statement

Competing interests: B.I. has received consulting fees and honoraria from Volastra Therapeutics Inc, Merck, AstraZeneca, Novartis, Eisai and Janssen Pharmaceuticals and has received research funding to Columbia University from Alkermes, Arcus Biosciences, Checkmate Pharmaceuticals, Compugen, Immunocore, Merck, Regeneron and Synthekine. B.I. is a founder of Basima Therapeutics, Inc. C.G. has received consulting fees from Watershed Informatics. B.H. received consulting fees and honoraria from Amgen, Eisai and MJH Life Sciences. A.M.T. received research funding from Ono Pharmaceuticals. N.A.R. is currently an employee and shareholder of Synthekine Inc. B.S.H. has received consulting fees from AstraZeneca, Ideaya, Jazz Pharmaceuticals, Sorrento Therapeutics, Genentech-Roche, OncLive, Veeva, Athenium, Boxer, Dava Oncology and SAI-Med and research funding to Columbia University from Neximmune, Inc, Janssen and Genentech-Roche. A.D.A. is now an employee of Adaptimmune. J.B. is now an employee of Pfizer. S.W., .S.K.D, and G.S. are employees of Caris Life Sciences. A.S. received consulting fees and honoraria from Abbvie, Bristol Myers Squibb, Veracyte, Genentech, Medscape and Physician Education Resource, and research support from Boehringer Ingelheim. All remaining authors report no competing interests.

Figures

Extended Data Fig. 1 |
Extended Data Fig. 1 |. snRNA-seq quality control and refinement of malignant cell assignment.
a, Basic quality control measures of samples (n = 43; PTs = 12, BMs = 31) processed for snRNA-seq and b, expression of a variety of different artifactual stress signatures, each separated by tissue origin. Upper and lower edges of boxplot indicate 75th and 25th percentiles, respectively, and middle line indicates median. One-sided Wilcoxon tests were performed, with p values as indicated on each graph. c, Representative schematic of refinement of malignant cell identity using URBAN (see Methods). An exemplary case is shown with initial cell type assignment (malignant vs. non-malignant) projected in two UMAP dimensions, a schematic summary of URBAN integrating inferred copy number variants (CNV) in single-cell transcriptome data, and refinement using matched low-pass whole-genome sequencing (lpWGS), that enables more accurate determination of malignant cells. This approach is performed in each sample individually and in an iterative fashion across the cohort. d, Exemplary UMAP embedding indicating PTPRC/CD45 and EPCAM expression across individual samples. e, ichorCNA plots for selected samples. X axis indicates chromosome number and y axis log2 ratio for ploidy, where 0 indicates diploidy, +1 gain, and –1 loss.
Extended Data Fig. 2 |
Extended Data Fig. 2 |. Chromosomal instability in NSCLC BMs.
a-b, Enrichment of CNVs per genomic location profiled from each individual PT and BM sample, with an upper consensus plot which totals all the CNVs observed at a given genomic location. The color strip at the top of the plot represents the plurality consensus at each genomic location (red = amp, grey = loh, blue = del, orange = bamp, skyblue = bdel) across the chromosomal landscape (columns) for all samples (rows) from a, primary tumors and b, brain metastases. c, Kaplan Meier survival curve of lung adenocarcinoma patient survival, stratified by CINhigh (upper quartile, red line) and CINlow (lower quartile, blue line) status as estimated by measurement of fraction of genome altered (FGA) from whole-exome sequencing data in the cancer genome atlas (TCGA) lung adenocarcinoma cohort. d, Pearson’s correlation of Caris’ panel DNA sequence of ~700 genes with TCGA WES data based on FGA (R = 0.99, p = 0). One-sided Wilcoxon test of significance performed., e-f, CIN70 signature expression (z-score) (e) and STING signature expression (z-score) (f) in LUAD RNAseq data from Caris Life Sciences across disease site (PTs = 5867, BMs = 617, ECMs = 2612), g-h, CIN70 signature expression (z-score) (g) and STING signature expression (z-score) (h) in LUSC RNAseq data from Caris Life Sciences across disease site (PTs = 2641, BMs = 91, ECMs = 747). Upper and lower edges of boxplots in e-h indicate 75th and 25th percentiles, respectively, and middle line indicates median. Tails extend to the 95% confidence interval. Statistical test and significance level as indicated on top of each boxplot.
Extended Data Fig. 3 |
Extended Data Fig. 3 |. NMF metaprograms.
a, Gene set enrichment analysis indicating cancer hallmarks and gene ontology signatures enriched in each of the 8 metaprograms identified by NMF. Q score is FDR adjusted p-value. b, Expression of mixed PT/BM MP signatures (MPs 1, 2, 3, 6) across healthy and malignant tissues (normal lymph node, normal lung, pleural effusion, early/late stage lung cancer, lymph node metastases, and BMs) profiled in Kim et al 2020. Upper and lower edges of boxplots indicate 75th and 25th percentiles, respectively, and middle line indicates median. Tails extend to the 95% confidence interval. Statistical test and significance level as indicated on each boxplot.
Extended Data Fig. 4 |
Extended Data Fig. 4 |. Quality control of rare malignant cell population (Cluster 21).
a-f, Quality control measures of rare, malignant cell population (Cluster 21). Box plots depicting a, gene counts, b, mitochondrial reads, c, cluster 21 signature expression across non-malignant groups, d, doublet score, and e-f, processing-related stress signature expression. g, Copy number alteration (CNA) patterns of cluster 21 cells compared to remaining malignant cells in the patient’s original cluster. Columns represent chromosomes, rows individual cell transcriptomes with inferred CNAs (blue = deletion, red = amplification). Left bar indicates rows occupied by cluster 21 cells (green) or other malignant cells from the representative patient (yellow). h, Gene set enrichment analysis (GSEA) of the cluster 21 population based on differential gene expression, Q score is FDR adjusted p-value. i, Stacked bar plots denoting percentage of PT or BM cells that comprise cluster 21 compared to the proportion of non-cluster 21 cells. Two-sided Fisher’s exact test, significance level as indicated. j, CIN70 signature expression in cancer cells from cluster 21 compared to all others. k-l, Boxplots indicating cluster 21 signature expression across matched sample pairs, PA060 (BM): N254 (PT) (k) and PA067 (BM): N586 (PT) (l). m, Cluster 21 signature expression in 44 snRNA-seq profiles in Kim et al dataset. n, Box plots indicating the expression of the cluster 21 signature in cell lines derived from NSCLC primary tumors (n = 91) or metastatic lesions (n = 101) of the cancer cell line encyclopedia (CCLE). o, Kaplan Meier survival curve of lung adenocarcinoma patients, stratified by expression of the cluster 21 signature. Data derived from TCGA. p, UMAP embedding of healthy lung epithelial cell types coupled with density gradients to highlight distinct cellular neighborhoods. q, Computation of probability of random walks starting from mixed lineage cells (ML1-ML15) reaching a class of labeled cells (AT1, AT2-Main, AT2-high MT, Airway Ciliated, Other), followed by assigning each unlabeled cell in ML group a cell state label based maximum probability. Upper and lower edges of boxplots indicate 75th and 25th percentiles, respectively, and middle line indicates median. Tails extend to the 95% confidence interval. Statistical test and significance level as indicated on each boxplot. One-side Wilcoxon tests were performed when utilized.
Extended Data Fig. 5 |
Extended Data Fig. 5 |. The tumor microenvironment of NSCLC PTs and BMs.
a, Dot plot of representative T cell marker genes, separated by disease site. Rows indicate selected genes, while columns indicate refined cell type assignment. b, UMAP embedding of CD8 + T cell clusters indicating expression of TCF7 and TOX. c-d, Diffusion component (D.C) analysis of CD8 + T cells colored by TCF7 and TOX expression ratio (c), separated by disease site (d). e, Dot plot of representative Myeloid marker genes, separated by disease site. Rows indicate selected genes, while columns indicate refined cell type assignment. f, Dot plot of representative CNS marker genes. Rows indicate selected genes, while columns indicate refined cell type assignment. g, Gene set enrichment analysis indicating cancer hallmarks and gene ontology signatures enriched in CNS: cancer cell, CNS: myeloid, and CNS:T cells interactions identified in the ContactTracing analysis. Q score is FDR adjusted p-value.
Extended Data Fig. 6 |
Extended Data Fig. 6 |. The spatial landscape of NSCLC PTs and BMs.
a-f, Correlation of TME cell fraction discovered in spatial transcriptomics (x axis) and snRNA-seq (y axis) analysis. Pearson’s correlation and significance as indicated on plot, Shaded areas surrounding line of best fit indicate 95% confidence interval. One-sided Wilcoxon tests of significance were performed. g, Recurrent drivers of spatial variability (selected genes indicated on the left) across samples profiled by spatial transcriptomics (columns). Top bar indicates cell types (in this case, including only malignant cells), and the bottom bar tissue origin (BM vs. PT), h-q, Spatial cell-type annotation across all other samples profiled, with colors indicating cell type assignment.
Fig. 1 |
Fig. 1 |. Experimental design and refinement of malignant cell assignment.
a, Outline of this study, including profiling methods used for treatment-naive NSCLC PTs and BMs (left), a summary of analytical approaches (middle) and validation cohorts (right). WES, whole-exome sequencing; CCLE, Cancer Cell Line Encyclopedia; MSKCC, Memorial Sloan Kettering Cancer Center; IF, immunofluorescence b, UMAP embedding of integrated cell transcriptomes from all profiled samples, and assignment of malignant and nonmalignant cells based on URBAN-refined labels. c, UMAP embedding with major cell type labels. CAFs, cancer-associated fibroblasts. CNS, central nervous system. d, Cell transcriptome origin from PTs and BMs. Drawings in a were created with BioRender.com.
Fig. 2 |
Fig. 2 |. Genomic instability in NSCLC BMs.
a, FGA based on the lp-WGS of NSCLC PTs and BMs in our cohort. b, Expression of the CIN70 signature in our cohort. c, FGA across metastatic sites in 4,710 NSCLC specimens with CNA data. d, Integration of clinical and FGA data of cohort shown in c. Plus sign in brackets indicates either present or absent. FGA across sequenced PTs, ECMs and/or BMs. Samples are stratified based on the site of the sequenced sample and whether patients had concurrent ECMs and/or BMs that were not sequenced but clinically diagnosed. e, FGA derived from WES data of 86 patients with cancer (left), including 38 patients with NSCLC (right), in which matched PT and BM samples, and separate healthy control samples, were available. f, FGA derived from WES data of 73 NSCLC BMs (left) in which there are 58 matched patient PTs and BMs (right). g, Percentage of human LUAD tumors marked by LOH according to site (PTs = 5,867; BMs = 617; ECMs = 2,612) in the Caris Life Sciences cohort. h, Percentage of LUAD tumors designated as high LOH (percentage of tumors in the top LOH quartile) in this cohort. i, Percentage of human LUSC tumors marked by LOH according to site (PTs = 2,641, BMs = 91, ECMs = 747) in the Caris Life Sciences cohort. j, Percentage of LUSC tumors designated as high LOH (percentage of tumors in the top LOH quartile) in this cohort. k, Representative immunofluorescence images showing nuclei (DAPI), EpCAM and cGAS expression in matched pairs (n = 18 pairs) of PT and BM samples. l, Fraction of EpCAM+ cells with detected cGAS+ puncta out of the total EpCAM+ cancer cells measured by immunofluorescence with statistical significance as indicated on the graph; two-sided Wilcoxon test was performed. The upper and lower edges of the box plots indicate the 75th and 25th percentiles, respectively, and the middle line indicates the median. The tails extend to the 95% confidence interval. The statistical test and significance level are indicated on top of each boxplot, with one-sided Wilcoxon tests used in a, b, e and f.
Fig. 3 |
Fig. 3 |. Cancer cell programs of NSCLC BMs.
a, Spearman correlation of individual programs combined into 8 MPs (indicated as boxes within the co-correlation plot) using NMF as described in ref. with annotations for tissue origin. The outer bars indicate MPs (left and top) and the inner bar (top) indicates the origin of the program (BM or PT). b, Stack plot showing the contribution of individual samples to MPs. c, Number and tissue origin of programs contributing to each of the eight MPs. The green trendline indicates the expression strength of the CIN70 signature in each MP. d, Normalized gene contribution to MPs (columns). Indicated are selected, representative genes (rows) from related MPs and biological functions corresponding to these genes, individual MPs or groups of MPs. e, Correlation (based on the Jaccard similarity coefficient) of biological functions annotated in a recently described study of the transcriptional hallmarks of cancer with each MP. f, Expression of BM-specific MP signatures (MPs 4, 5, 7, 8) across healthy and malignant tissues (normal lymph node, normal lung, pleural effusion, early- and late-stage lung cancer, lymph node metastases and BMs) profiled in ref. . The upper and lower edges of the boxplots indicate the 75th and 25th percentiles, respectively, and the middle line indicates the median. The tails extend to the 95% confidence interval. The statistical test and significance level are indicated on the graphs.
Fig. 4 |
Fig. 4 |. Discovery of a malignant cell population shared across patients with NSCLC.
a,b, UMAP embedding of all malignant cells after unsupervised clustering (a). The black arrow indicates a population of malignant cells from different patients forming a unique cluster (cluster 21). The clusters are colored by patient of origin. Cluster 21 is indicated by red coloring for enhanced visualization (b). c, Pie chart indicating the contribution (number of cells) to cluster 21 by each patient (inner chart). The outer chart indicates the tissue origin (PTs versus BMs). d, Volcano plot comparing the gene expression of cells in cluster 21 compared with those of all other cancer cells not part of this cluster, based on DGE using a nonparametric Wilcoxon rank-sum test with Bonferroni-adjusted P values. Indicated are selected genes with higher (red) or lower (blue) expression in cluster 21, respectively, highlighting notable down- and upregulated genes in the cluster 21 population. e, Representative immunofluorescence image showing nuclei (DAPI), EpCAM and GFAP expression in a BM sample. f, Fraction of EpCAM+GFAP+ cells in an independent cohort of matched PT and BM samples (n = 18 pairs) measured by IF. A two-sided Wilcoxon test was performed, with significance as indicated on the graph. g, Expression of an alveolar epithelial cell signature across healthy tissue, tumor cells not in cluster 21 and cluster 21 tumor cells (x axis). The upper and lower edges of the boxplots indicate the 75th and 25th percentiles, respectively, and the middle line indicates the median. The tails extend to the 95% confidence interval. The statistical test and significance level are indicated. h, UMAP embedding of a normal lung reference atlas of normal cell types with their respective neighborhoods as indicated by density contour outlines; MT, mitochondria. Projected are malignant cell states and their closest relationship to normal cell neighborhoods of normal lung epithelial cell types as defined in an snRNA-seq reference . Integration of all malignant cells was performed (gray). i, Same as in h but with a separate projection of the cluster 21 cell population. Neighborhoods occupied by the cluster 21 cell population (red) and all other sequenced malignant cells (gray). j, Cell state assignment probabilities per cell, as calculated by Markov absorption probabilities.
Fig. 5 |
Fig. 5 |. The tumor immune landscape of NSCLC BMs.
a, UMAP embedding of integrated cells showing intermediate cell-type assignment of nonmalignant cells. b, UMAP embedding of integrated T cells colored by cell-type assignment. EM, effector memory; NKT, Natural killer T cells. c, Milo fractional analysis of refined T cell clusters. The dot size corresponds to neighborhood size. Directionality (violin plot) and weighted mean (positive (red) = toward BM, negative (blue) = toward PT) indicate the disease site where there is a greater relative abundance of the indicated cell type. The x axis represents log2FC of the relative cell abundance; the y axis indicates individual T cell types. d, Representative image showing nuclei (DAPI), cancer cell (EpCAM+), T cell (CD8+) and macrophage (CD163+) expression in a matched PT and BM. e, Fraction of CD8+ T cells in matched pairs (n = 18) of PT and BM samples, measured by IF. Statistical significance is indicated on the graph. A two-sided Wilcoxon test was used. f, DC analysis of CD8+ T cells colored by degree of clonal expansion. g, UMAP embedding of integrated myeloid cells colored by cell type assignment. h, Milo fractional analysis of myeloid clusters. The dot size corresponds to neighborhood size. Directionality and weighted mean (positive = toward BM, negative = toward PT) indicate the tissue site where there is a greater relative abundance of the indicated cell type. The x axis represents log2FC of relative cell abundance; the y axis indicates individual myeloid cell types. i, DC analysis of myeloid cells, colored by refined cell type and tissue origin (inset, upper left). j, Intensity of CD163 protein expression measured by IF in the same cohort as e with statistical significance indicated on the graph. A two-sided Wilcoxon test was used. k, UMAP embedding of integrated CNS cells colored by cell type assignment and tissue origin (inset, upper right). l, Ligand–receptor binding analysis using ContactTracing, highlighting inferred cellular interactions significantly enriched in BM or PT samples across all major cell types. The ribbon thickness is proportional to the number of cells expressing a BM- or PT-dependent interaction effect. Ligand (black) and receptors (gray) are labeled at ribbon endpoints.
Fig. 6 |
Fig. 6 |. Spatial landscape of NSCLC BMs.
a, Schematic of the Slide-seqV2 experimental procedure. Selected treatment-naive NSCLC PTs and BMs were retrieved from a tissue bank, cut and placed onto immobilized capture beads with known locations on a glass slide. Sample mRNA gets captured in a spatially preserved fashion during hybridization. Beads are removed from the slide after reverse transcription (RT) and sample digestion and serve as input for next generation sequencing (NGS) library construction. b, Computational analysis pipeline design. Concurrently collected spatial and snRNA-seq data were used for cell-type assignment using RCTD. Following integration, cell–cell co-occurrences and respective programs of interactions are determined for selected cell types. c, Pearson’s correlation of malignant cell fraction discovered in spatial transcriptomics (x axis) and snRNA-seq (y axis) analysis. The correlation coefficient and significance are indicated on the plot. The shaded area surrounding the line indicates the 95% confidence interval. A one-sided Wilcoxon test of significance was performed. dg, Exemplary slideseq of samples KRAS_10 (PT) (d,e) and STK_14 (BM) (f,g) showing cell type composition (d,f) and distribution of expression of the cluster 21 signature (e,g). Each dot within spatial transcriptomics represents a deconvolved discrete single cell with superimposed cell identity or cluster 21 expression (the latter, only in malignant cells). h,i,l,m, Exemplary pucks (KRAS_12 (PT) (h,i) and PA054 (BM) (l,m)) recapitulating spatial localization and density of the composition of the myeloid compartment in PTs and BMs. j,k,n,o, Spatial DGE in malignant cells co-localized in myeloid-high compared with myeloid-low regions in exemplary PTs and BMs (KRAS_12 (PT) (j,k) and PA054 (BM) (n,o)). Density contour outlines representing GSEA pathway enrichment of malignant cells near myeloid-enriched locations in BMs were enriched for MYC target expression, TGF-beta signaling and EMT, whereas malignant cells located near myeloid-enriched regions in PTs upregulated genes involved in interferon alpha response, antigen presentation, terpenoid metabolism and interferon gamma response. p, Pearson’s correlation of the average cluster 21 score discovered in spatial transcriptomics (x axis) and average CIN70 score in spatial transcriptomics (y axis) analysis. The correlation coefficient and significance are indicated on the plot. The shaded area surrounding the line indicates the 95% confidence interval. A one-sided Wilcoxon test of significance was performed. Panel a created with BioRender.com.

References

    1. Sperduto PW et al. Diagnosis-specific prognostic factors, indexes, and treatment outcomes for patients with newly diagnosed brain metastases: a multi-institutional analysis of 4,259 patients. Int. J. Radiat. Oncol. Biol. Phys 77, 655–661 (2010). - PubMed
    1. Moravan MJ et al. Current multidisciplinary management of brain metastases. Cancer 126, 1390–1406 (2020). - PubMed
    1. Achrol AS et al. Brain metastases. Nat. Rev. Dis. Primers 5, 5 (2019). - PubMed
    1. Gridelli C et al. Non-small cell lung cancer. Nat. Rev. Dis. Primers 1, 15009 (2015). - PubMed
    1. Abdallah SM-B & Wong A Brain metastases in non-small cell lung cancer: are tyrosine kinase inhibitors and checkpoint inhibitors now viable options? Curr. Oncol 25, S103–S114 (2018). - PMC - PubMed