Nat Genet. 2025 Apr;57(4):897-909.
doi: 10.1038/s41588-025-02120-6. Epub 2025 Mar 18.

Quantitative characterization of cell niches in spatially resolved omics data

Sebastian Birk et al. Nat Genet. 2025 Apr.

Abstract

Spatial omics enable the characterization of colocalized cell communities that coordinate specific functions within tissues. These communities, or niches, are shaped by interactions between neighboring cells, yet existing computational methods rarely leverage such interactions for their identification and characterization. To address this gap, here we introduce NicheCompass, a graph deep-learning method that models cellular communication to learn interpretable cell embeddings that encode signaling events, enabling the identification of niches and their underlying processes. Unlike existing methods, NicheCompass quantitatively characterizes niches based on communication pathways and consistently outperforms alternatives. We show its versatility by mapping tissue architecture during mouse embryonic development and delineating tumor niches in human cancers, including a spatial reference mapping application. Finally, we extend its capabilities to spatial multi-omics, demonstrate cross-technology integration with datasets from different sequencing platforms and construct a whole mouse brain spatial atlas comprising 8.4 million cells, highlighting NicheCompass' scalability. Overall, NicheCompass provides a scalable framework for identifying and analyzing niches through signaling events.

Conflict of interest statement

Competing interests: S.B. is a part-time employee at Avanade Deutschland. M.L. owns interests in Relation Therapeutics and is a scientific cofounder and part-time employee at AIVIVO. F.J.T. consults for Immunai Inc., Singularity Bio B.V., CytoReason Ltd. and Omniscope and has an ownership interest in Dermagnostix GmbH and Cellarity. As of 1 February, 2025, C.T-L. is an employee at Cellzome GmbH/GSK. His contributions were done while being at the University of Würzburg. The other authors declare no competing interests.

Figures

Fig. 1. Overview of NicheCompass.
a, NicheCompass takes single-sample or multi-sample spatial omics data with cell-level or spot-level observations as input. Using the 2D coordinates, it constructs a spatial neighborhood graph (represented with a binary adjacency matrix), with each cell or spot representing a node. Each observation includes omics features (gene expression and optionally paired chromatin accessibility) and covariates to account for confounders (for example, sample). b, A graph neural network (GNN) encoder generates cell embeddings, with covariates embedded for removal of confounding effects. c, The model is incentivized to learn an embedding in which each feature represents the activity of a spatially localized interaction pathway retrieved from domain knowledge, represented as a prior program. In addition to prior programs, the model can discover de novo programs, which learn a set of spatially co-occurring genes and peaks. GPs, gene programs. d, GPs, derived from databases or experts, are classified into three categories and comprise neighborhood components and self components to reflect intercellular and intracellular interactions. The neighborhood component contains genes linked to the interaction source of intercellular interactions, and the self component contains genes linked to the interaction target of intercellular interactions and genes linked to intracellular interactions. Peaks are associated with genes if locationally proximal. TF, transcription factor. e, Decoders reconstruct spatial and molecular information while constraining embedding features to represent the activity of a specific program: a graph decoder reconstructs sample-specific input adjacencies, and omics decoders reconstruct a node’s omics counts and aggregated counts of its neighborhood. Omics decoders are linear and masked based on programs, thus enabling interpretability (exemplified by a combined interaction program). 
f, NicheCompass facilitates critical downstream applications in spatial omics data analysis. Illustrations of cells were created with BioRender.com.
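The graph construction described in panel a can be sketched as follows. This is a minimal illustration, not the NicheCompass implementation: the function name `knn_adjacency`, the toy coordinates and the choice to symmetrize the graph are assumptions made here for clarity.

```python
import math

def knn_adjacency(coords, k):
    """Build a binary adjacency matrix (list of lists) that links each
    cell or spot to its k nearest neighbors in 2D Euclidean space.

    Symmetrization is an assumption of this sketch: an edge is kept if
    either endpoint selects the other as a neighbor.
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Distances from node i to every other node, sorted ascending
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = 1
            adj[j][i] = 1
    return adj

# Hypothetical 2D coordinates for four cells
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 4.0)]
adj = knn_adjacency(coords, k=1)
```

Each row of `adj` marks the neighbors of one node, so the matrix can serve directly as the binary adjacency input that the caption describes. The brute-force distance computation is only suitable for small examples.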
Fig. 2. NicheCompass reveals cellular interactions shaping tissue organization in mouse development.
a, Uniform manifold approximation and projection (UMAP) of integrated NicheCompass embeddings and the three embryo tissues, colored by niches annotated using characterizing programs (gene names in niche annotations refer to characterizing programs that are upregulated in the niche compared to all other niches). The floor plate niche is outlined and labeled. b, Same UMAP as a but colored by original cell type or region annotations. ExE endoderm, extraembryonic endoderm; NMP, neuromesodermal progenitor. c, Cell proportions from each section across niches. d, Dendrogram of average program activities showing a functional higher-order hierarchy. e, Heatmap of normalized activities for two characterizing programs per niche, showing gradients along the hierarchy. f, Cell type proportions for each niche (colors from b). g,h, Activities of characterizing programs differentiating ventral and dorsal gut niches (g) and CNS niches (h), with correlated expression of ligand-encoding and receptor-encoding genes. i,j, Cell–cell communication analysis for a ventral gut program (i) and a floor plate program (j), showing inferred communication strengths between niches and consistent member gene expression. Nodes represent niches and edges the strength (width) and direction (arrowheads) of the interaction. Com. strength, communication strength.
Fig. 3. Benchmarking NicheCompass across diverse scenarios.
a, Coronal mouse brain image from the Allen Brain Atlas (left) and a SlideSeqV2 hippocampus tissue (right), showing corresponding niches identified by NicheCompass. CA1sp, CA1 pyramidal layer; CA2sp, CA2 pyramidal layer; CA3sp, CA3 pyramidal layer. b, Dendrogram of average program activities reveals a hierarchy of anatomically and molecularly similar niches, and their cell type compositions. c, Top: mouse hippocampus tissue colored by niches identified using four methods. Cluster colors match with a. Bottom: the corresponding dendrograms computed on each method’s embeddings. d, Performance comparison across six metrics for spatial consistency and niche coherence, aggregated into an overall score (Methods). e, Integration performance of NicheCompass, CellCharter, BANKSY, GraphST and STACI on a NanoString CosMx NSCLC dataset subsample. Top: UMAPs colored by data source highlight endothelial and stroma niches integrated only by NicheCompass. Bottom: lung tissue replicates display differences in batch effect removal and niche resolution. Highlighted is the first field of view (FoV) across all three replicates where other methods show FoV effects hindering integration. Niche annotations below tissue sections refer to niches identified by the respective method. For methods other than NicheCompass, only differences compared to NicheCompass are displayed. f,g, Performance summary metrics of NicheCompass and similar methods on four single-sample (f) and three multi-sample (g) datasets. Metrics were computed for n = 8 training runs per dataset and method while varying sizes of the k-nearest neighbors graph (two runs per k with k = 4, 8, 12, 16). Missing boxes indicate training failures resulting from memory constraints. Numbers on the right indicate mean score differences between NicheCompass and the second-best performing method on each dataset (green, NicheCompass performs better; yellow, NicheCompass is on par).
Fig. 4. NicheCompass identifies meaningful niches and de novo programs in human breast cancer.
a, Top: UMAP of the NicheCompass embedding space after integrating two replicates of a 313-probe Xenium dataset. Bottom: tissue replicates colored by identified niches. Niches include FB-Epi (fibroblast-epithelial), CD4+T (CD4+T cells), EMT-Immune, Epi-Immune (epithelial-immune), FB-EMT (fibroblast-EMT), FB-Lymphoid (fibroblast-lymphoid), FB-Myeloid (fibroblast-myeloid), FB-Endo (fibroblast-endothelial), Mast-Stromal (mast cells-stromal), EMT-Mɸ (EMT-macrophage), EMT-Endo (EMT-endothelial), Epi-Bcells (epithelial-B cells), Stromal and Endo-Lymphoid (endothelial-lymphoid). b, Same UMAP as a, colored by cell types. DC, dendritic cell; Mɸ, macrophage; NK, natural killer. c, UMAP colored by data source, showing successful integration and proportion of cells from each data source across niches. d, Annotated H&E slides of the breast cancer tumor resection. e, Heatmap of normalized activities for characterizing programs associated with cancer progression and pathological histology. f,g, Program activity and expression of key genes for de novo 37 (f) and 86 (g) programs, showing correlations between activity and gene expression. h,i, Sunburst plots of gene weights for de novo 37 (h) and 86 (i) programs. De novo 37 program highlights keratin genes and an uncharacterized gene (C5orf46). De novo 86 program reveals a KRT8-driven program with links to fatty acid metabolism (FASN, ABCC1) and ELF3 as a potential regulator. The scale represents inferred gene weights.
Fig. 5. NicheCompass spatial reference mapping contextualizes new donors and reveals emergent niches.
a–c, UMAP of NicheCompass embeddings for six NSCLC lung samples, colored by identified niches (a), pre-annotated cell types (b) and donor or donor replicate (c). d, Spatial visualization of tissue sections from donors 9 and 12, showing niches, cell types and CXCL1 ligand–receptor (LR) program activity, distinguishing tumor niches interacting with stromal tissue (niche 1) or neutrophils (niche 3). e, Spatial visualization of tissue sections colored by niche and cell type, highlighting shared and donor-specific stromal structures across donors. f, UMAP of NicheCompass spatial reference with query cells mapped by fine-tuning. g,h, UMAPs of mapped query cells colored by pre-annotated cell types (g) and niche labels as predicted by a k-NN classifier trained on the reference, including prediction probabilities (h). i, Joint UMAP of reference and query embeddings, colored by niches as identified by re-clustering. In addition, bar plots represent the donor distribution of the niches the query sample maps to. j, Spatial visualization of query tissue (donor 13) and its most similar reference samples, colored by cell type (key at bottom) and niche (colored as in i), comparing newly identified niches to reference counterparts. k, Neighborhood composition in tumor niches (niche 1, 89,814 cells; niche 2, 60,131 cells; niche 3, 39,500 cells; niche 4, 41,864 cells; niche 5, 14,516 cells; niche 15, 25,271 cells). A boxplot per tumor niche and neighboring cell type represents the niche-specific distribution of cells of a given cell type among the 25 physically closest cells. Only cell types composing on average more than 5% and less than 60% of the neighborhood are shown. The query tumor niche is highlighted. l, Joint UMAP of reference and query embeddings, colored by SPP1 LR and combined interaction program activity, and expression of the ligand-encoding and receptor-encoding genes. 
m, Heatmap of SPP1 LR communication strengths between niches in the query (donor 13) and reference (donor 6) samples, the two donors with highest macrophage infiltration.
Extended Data Fig. 1. Enriched programs in gut and brain niches.
a, Programs enriched in gut niches show strong spatial correlation with the expression of their ligand- and receptor-encoding genes. Model-reconstructed gene expression closely matches the original while providing a smoothing effect. b, Similarly, programs enriched in brain niches exhibit strong spatial correlation with the expression of ligand- and receptor-encoding genes. The reconstructed expression aligns closely with the original but is smoother. GP: gene program.
Extended Data Fig. 2. Niche and program inference reproducibility, generalizability and robustness.
a, b, Embryo 2 niches and program activities inferred by NicheCompass with different random seeds. Displayed are characterizing programs from the main analysis in Fig. 2. Missing programs were filtered by program pruning in the respective model. Overall, there is good robustness of inferred niches and program activities across random seeds; however, there are also minor differences, most pronounced in the Hindbrain niche. c, Embryo 2 niches identified by NicheCompass when leaving out embryo 3 during reference model training. Next to it, the inferred program activity for the characterizing programs from the main analysis in Fig. 2. d, Same as c but when leaving out embryo 2 during reference model training. Overall, there is high robustness of inferred niches and program activities providing evidence for generalizability. e, Embryo 2 niches and program activities inferred by NicheCompass with a longer-range k-NN graph (k = 12). Displayed are characterizing programs from the main analysis in Fig. 2. f, Same as e but with a shorter-range k-NN graph (k = 4). g, Embryo 2 niches and program activities inferred by NicheCompass with a radius-based neighborhood graph (average number of neighbors ~9). Missing programs were filtered by program pruning in the respective model. Overall, there is good robustness of inferred niches and program activities across neighborhood graphs; however, there are also minor differences, most pronounced in the Hindbrain niche. GP: gene program. k-NN: k-nearest neighbors.
Extended Data Fig. 3. Data simulation.
a, The reference-based simulated tissue and a UMAP representation of the gene expression space reduced by principal component analysis, colored by ground truth niches. b, Same as a but colored by ground truth cell types. c, Example of an injected ground truth program which was upregulated in Niche 2 via an additive gene expression model. The target genes were upregulated in all Cell Type 3 cells if they had Cell Type 2 cells in their k neighborhood (with k = 6). Equally, the source genes were upregulated in Cell Type 2 cells. The increment factor determined the strength of upregulation. d, The reference-based simulated tissue colored by the predicted niches of each method. e, Metrics from the NicheCompass benchmarking suite (left) and metrics that measure the performance of the predicted niches compared to the ground truth (right). The overall score and ground truth prediction score are computed by min-max normalization and subsequent aggregation of the individual metrics. The ranking of methods is largely consistent between the two metrics suites. f, F1 scores between inferred and ground truth upregulated programs across n = 8 training runs for each workflow to infer niche-specific programs, with varying random seeds and a k-nearest neighbors graph with k = 6 (the ground truth cell interaction range). NicheCompass considerably outperforms alternative methods, providing evidence that it is useful to integrate pathways during training. GP: gene program.
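The F1 score in panel f compares the set of programs a workflow reports as upregulated in a niche with the injected ground truth set. A minimal sketch of that comparison is shown below. The function name and the program identifiers (`gp1` to `gp5`) are hypothetical, and treating each program as a set element is an assumption of this sketch rather than a detail taken from the paper.

```python
def f1_score(inferred, truth):
    """F1 between a set of inferred programs and a ground truth set."""
    inferred, truth = set(inferred), set(truth)
    true_positives = len(inferred & truth)
    if true_positives == 0:
        return 0.0  # no overlap: precision and recall are both zero
    precision = true_positives / len(inferred)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)

# Hypothetical program identifiers for one niche
truth = {"gp1", "gp2", "gp3", "gp4"}
inferred = {"gp1", "gp2", "gp5"}
score = f1_score(inferred, truth)  # precision 2/3, recall 1/2
```

With these toy sets, two of three inferred programs are correct and two of four true programs are recovered, giving an F1 of 4/7.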
Extended Data Fig. 4. Benchmarking on the NanoString CosMx human NSCLC 10% subsample.
a, UMAP representation after applying principal component analysis (PCA) to the raw gene expression of the three lung replicates, showing the presence of strong batch effects in the first field of view of the second replicate. b, Cell type composition of niches identified by each method. NicheCompass identified Lymphoid Structures and Tumor-Stroma Boundary niches and could differentiate between Stroma enriched by endothelial cells and Stroma enriched by plasmablast cells. CellCharter could not separate Plasmablast/Stroma from the Lymphoid Structures. BANKSY could not identify the Lymphoid Structures and Plasmablast/Stroma but instead identified artifact clusters. GraphST separated two Endothelial-enriched Stroma niches due to batch effects; however, these niches had very similar cell type composition, suggesting they should be unified. In addition, plasmablast cells were misallocated to one of those niches. STACI showed a similar failure to unify the two Endothelial-enriched Stroma niches. c, Comparison of the integration performance of further method variants. Illustrated are the UMAP representations of the learned embedding spaces and the tissue, colored by annotated niches. Niches in the first field of view are highlighted, showing differences in batch effect removal capabilities. UMAP representations colored by data source further emphasize differences in batch effect removal for the first field of view. FoV: field of view. GraphST (No Prior Alignment) was trained without prior alignment through PASTE. d, Metrics for the training runs from c and Fig. 3d. The overall score is computed by aggregating min-max-normalized individual metrics into the two categories spatial consistency and niche coherence, followed by equal weighting of these categories. NicheCompass Light is a variation of our model that uses graph convolutional layers instead of dynamic graph attention layers. NSCLC: non-small cell lung cancer.
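The overall score described in panel d (min-max normalization per metric, averaging within the two categories, then equal weighting of the categories) can be sketched as below. The metric names and values are made-up placeholders, and the sketch assumes that higher is better for every metric and that a metric with identical values across methods normalizes to zero.

```python
def min_max(values):
    """Rescale a list of scores to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: constant metrics carry no signal
    return [(v - lo) / (hi - lo) for v in values]

def overall_scores(metrics, categories):
    """metrics: {metric name: [score per method]};
    categories: {category name: [metric names]}.
    Returns one overall score per method."""
    normalized = {name: min_max(vals) for name, vals in metrics.items()}
    n_methods = len(next(iter(metrics.values())))
    # Average the normalized metrics within each category
    category_means = [
        [sum(normalized[m][i] for m in names) / len(names) for i in range(n_methods)]
        for names in categories.values()
    ]
    # Weight the categories equally
    return [
        sum(means[i] for means in category_means) / len(category_means)
        for i in range(n_methods)
    ]

# Placeholder metrics for three hypothetical methods
metrics = {
    "metric_a": [2.0, 4.0, 6.0],
    "metric_b": [10.0, 30.0, 20.0],
    "metric_c": [0.5, 0.5, 1.0],
}
categories = {
    "spatial_consistency": ["metric_a", "metric_b"],
    "niche_coherence": ["metric_c"],
}
scores = overall_scores(metrics, categories)
```

Averaging within a category before combining categories keeps a category with many metrics from dominating the overall score.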
Extended Data Fig. 5. Analysis of inter-tumoral heterogeneity.
a, A dendrogram computed based on average program activities, showing a hierarchy of niches. b, UMAP representation of the reference atlas, colored by niches identified with NicheCompass. c, d, Bar plots representing the cellular composition (c) and donor composition (d) of the identified niches. e, Spatial visualization of the six tissue sections included in the reference, colored by cell type and identified niche. f, Dot plot showing the five most differential genes expressed in each tumor niche compared to the rest. The dot size represents the fraction of cells in a niche with expression higher than 0, while the dot color represents the mean expression level within expressing cells. g, Cell type composition in the spatial neighborhood of all cells in tumor niches 1 to 5 (niche 1: n = 81,577 cells, niche 2: n = 59,263 cells, niche 3: n = 38,937 cells, niche 4: n = 34,920 cells, niche 5: n = 10,820 cells), using a symmetric k-nearest neighbors graph with 25 neighbors. In this dataset, tumor niches consist of spatially segregated tumor cells, reflected by the identification of pure tumor niches where cells only have tumor cells in their spatial neighborhood.
Extended Data Fig. 6. Characterization of stromal niches.
a, Each row represents a niche. The bar plots on the left represent cell proportions for the most abundant cell types in that niche (that is, more than 10% of the cells in the niche). The length of the bars is proportional to the cell abundance within the niche and the color is proportional to the cell abundance across all 7 stroma niches (ranging from epithelial cells with 14,922 cells to fibroblasts with 52,910 cells). The heatmaps show mean expression of selected gene markers across cell types in each niche separately, with color representing mean gene expression. Shown are selected marker genes per cell type that are differential in that cell type compared to the rest, considering all the niches together. Indicated at the top are the cell types represented by each set of markers. b, Niche cell type composition for all the samples where the niche is present (that is, more than 5% of the cells in the niche are from that sample). Top bar plots show the cell type composition and bottom bar plots show the proportion of the cells from each niche in each of the samples.
Extended Data Fig. 7. Niches identified in the mouse brain are consistent across sections and correspond to regions from a reference atlas.
a, Sagittal tissue sections ordered by 3D position and colored by identified niches, showing consistency across sequential tissue sections. Below it the number of cells occurring in each tissue section for each niche. b, Same as a but for the coronal tissue sections (spinal cord is not shown). Cell numbers are scaled separately for coronal and sagittal tissue sections. c, Number of cells of different cell types in each niche. 10,683 of 1,091,280 cells are not assigned to a niche and are not shown. d, Coronal section showing NicheCompass niches obtained through clustering of the embedding space (left) and regions from the Allen Brain Atlas (right). The isocortex is highlighted. e, Magnified view showing cells assigned to the isocortex, based on the Allen Brain Atlas annotations. Sub-niches with more than 250 cells annotated in this tissue section are shown. Sub-niches are obtained through clustering of cells in a niche and correspond with regions in the reference annotation.
Extended Data Fig. 8. NicheCompass integrates 8.4 million cells across 239 tissue sections.
a, UMAP representation of the NicheCompass (Light) embedding space, colored by identified niches. Around it, randomly selected tissue slices for each major brain region, colored by identified niches. Only cells belonging to the specific region are shown. Scale bars, 1 mm. b, c, UMAP representations colored by major brain regions (b) and donor mouse (c), showing successful integration of cells in matching brain regions across donors.
Extended Data Fig. 9. NicheCompass integrates samples across different spatial transcriptomics technologies.
a, UMAP representation of the NicheCompass embedding space after integrating the MERFISH mouse brain and STARmap PLUS mouse CNS datasets, colored by dataset/sequencing technology. b, Composition of niches in terms of cells from each of the two technologies, showing that all niches except niche 9 were present in both datasets. Only niches with more than 100,000 cells are displayed. c, Two example tissue slices of the same brain region, one from the MERFISH mouse brain dataset and the other from the STARmap PLUS mouse CNS dataset, highlighting consistent anatomical niches. d, Zoom in on four specific niches that emphasize the consistency in niche identification across technologies. e, Two additional pairs of tissue slices showing consistent NicheCompass niches across technologies.
Extended Data Fig. 10. seqFISH mouse organogenesis spatial reference mapping.
a, Power analysis using different dataset proportions of the mouse embryos 1 and 2 as reference while holding out embryo 3 as query. Embryo 3 is mapped onto the reference using weight-restricted fine-tuning. UMAPs represent the integrated embedding space. BLISI quantifies the integration performance. Label transfer from reference to query is performed via a k-nearest neighbors (k-NN) classifier trained on the reference. The prediction probability of this k-NN classifier quantifies uncertainty in niche label transfer. NMI quantifies niche prediction performance based on niche labels from the full analysis in Fig. 3. b, Metrics from the scenarios in a per number of cells in the reference. NMI significantly reduces at a size of ~80,000 reference cells. c, Comparison of niche detection of the Presomitic Mesoderm niche in scenarios 1 and 2. In scenario 1, this niche is seen in the reference, and we recover the same characterizing programs as in the analysis on the full dataset, supported by expression of the respective ligand-encoding genes. In scenario 2, this niche is not seen in the reference, yet it is detected as a novel niche; however, the same programs could not be recovered as these were not relevant during reference training. GP: gene program.
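The label transfer step in panel a, where a k-NN classifier trained on the reference predicts niche labels for query cells and its prediction probability expresses uncertainty, can be sketched as follows. The function name, the toy embeddings and the use of an unweighted majority vote are assumptions of this sketch, as the caption does not specify how neighbors are weighted.

```python
import math
from collections import Counter

def knn_transfer(ref_embeddings, ref_labels, query_embedding, k):
    """Predict a niche label for one query cell from its k nearest
    reference cells, and return the fraction of neighbors that agree
    as a simple prediction probability."""
    dists = sorted(
        (math.dist(ref, query_embedding), i)
        for i, ref in enumerate(ref_embeddings)
    )
    votes = Counter(ref_labels[i] for _, i in dists[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k

# Hypothetical 2D embeddings and niche labels for five reference cells
ref_embeddings = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
ref_labels = ["A", "A", "A", "B", "B"]

confident = knn_transfer(ref_embeddings, ref_labels, (0.2, 0.2), k=3)
uncertain = knn_transfer(ref_embeddings, ref_labels, (4.0, 4.0), k=3)
```

A query cell surrounded by one niche receives a probability of 1, while a cell whose neighbors disagree receives a lower value, which is the kind of signal that can flag cells belonging to a niche absent from the reference.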
