[Preprint]. 2024 May 14:2024.05.12.593710.
doi: 10.1101/2024.05.12.593710.

Cross-domain information fusion for enhanced cell population delineation in single-cell spatial-omics data

Bokai Zhu et al. bioRxiv.

Abstract

Cell population delineation and identification is an essential step in single-cell and spatial-omics studies. Spatial-omics technologies can simultaneously measure information from three complementary domains related to this task: expression levels of a panel of molecular biomarkers at single-cell resolution, relative positions of cells, and images of tissue sections, but existing computational methods for performing this task on single-cell spatial-omics datasets often relinquish information from one or more domains. The additional reliance on the availability of "atlas" training or reference datasets limits cell type discovery to well-defined but limited cell population labels, thus posing major challenges for using these methods in practice. Successful integration of all three domains presents an opportunity for uncovering cell populations that are functionally stratified by their spatial contexts at cellular and tissue levels: the key motivation for employing spatial-omics technologies in the first place. In this work, we introduce Cell Spatio- and Neighborhood-informed Annotation and Patterning (CellSNAP), a self-supervised computational method that learns a representation vector for each cell in tissue samples measured by spatial-omics technologies at the single-cell or finer resolution. The learned representation vector fuses information about the corresponding cell across all three aforementioned domains. By applying CellSNAP to datasets spanning both spatial proteomic and spatial transcriptomic modalities, and across different tissue types and disease settings, we show that CellSNAP markedly enhances de novo discovery of biologically relevant cell populations at fine granularity, beyond current approaches, by fully integrating cells' molecular profiles with cellular neighborhood and tissue image information.

Keywords: cellular neighborhoods; clustering; graph neural network; multiplexed imaging; spatial-omics.

Conflict of interest statement

S.J. is a co-founder of Elucidate Bio Inc, has received speaking honorariums from Cell Signaling Technology, and has received research support from Roche unrelated to this work. G.P.N. received research grants from Pfizer, Inc.; Vaxart, Inc.; Celgene, Inc.; and Juno Therapeutics, Inc. during the time of and unrelated to this work. G.P.N. is a co-founder of Akoya Biosciences, Inc. and of Ionpath Inc., inventor on patent US9909167, and is a Scientific Advisory Board member for Akoya Biosciences, Inc. A.K.S. reports compensation for consulting and/or scientific advisory board membership from Honeycomb Biotechnologies, Cellarity, Ochre Bio, Relation Therapeutics, IntrECate Biotherapeutics, Bio-Rad Laboratories, Fog pharma, Passkey Therapeutics, and Dahlia Biosciences unrelated to this work. S.J.R. receives research support from Bristol-Myers-Squibb and KITE/Gilead. S.J.R. is a member of the SAB of Immunitas Therapeutics. The other authors declare no competing interests.

Figures

Figure 1: Illustration of the CellSNAP pipeline.
CellSNAP is compatible with imaging-based spatial-omics modalities with single-cell or finer resolutions (e.g., CODEX and CosMx). Information from three domains is extracted from each individual cell and its surroundings: 1) single-cell expression profile (e.g., measured protein or mRNA features); 2) single-cell location information (e.g., cellular neighborhood composition); 3) single-cell local tissue image information (e.g., local images from nuclear and membrane channels). CellSNAP takes these three types of information as input. It first utilizes a CNN encoder to extract features from images of local tissues surrounding each cell. Next, two separate GNN models are constructed in parallel: 1) a ‘Spatial-GNN’, where each node represents a cell, with the initial node vector assigned as CNN-extracted local image features, and nodes are connected according to spatial adjacency. 2) an ‘Expression-GNN’, where each node represents a cell, with the initial node vector assigned as the expression profile, and nodes are connected according to expression similarity. The two GNNs are connected by an overarching MLP head which combines the message passing outputs of the two GNNs for predicting the target vector of each cell, that is the concatenation of the cell’s feature-based population identity (one-hot) and its neighborhood composition (percentage) vectors. After training, the last layers of the two GNN models are extracted, combined, and reduced (via SVD) to form the final, tri-domain integrated representation vector for each cell. This multi-domain fused representation vector is then used in downstream analysis for cell type identification purposes, which is compatible with commonly used unsupervised clustering methods (e.g., Leiden clustering). Detailed illustration of the involved model architectures can be found in Supp. Fig. 1.
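The training target described in the caption (a cell's one-hot population identity concatenated with its neighborhood composition) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the brute-force k-nearest-neighbor search, the choice to exclude the cell itself from its neighborhood, and the toy coordinates and labels are all assumptions made for illustration. Population labels are taken as given here, whereas the caption says they come from feature-based clustering.

```python
import math
from collections import Counter


def neighborhood_composition(coords, labels, n_labels, k):
    """For each cell, return the fraction of each label among its k nearest cells.

    Assumption: the cell itself is excluded from its own neighborhood.
    """
    comps = []
    for i, (xi, yi) in enumerate(coords):
        # Brute-force distances to every other cell; fine for a toy example only.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        counts = Counter(labels[j] for j in neighbors)
        comps.append([counts.get(c, 0) / len(neighbors) for c in range(n_labels)])
    return comps


def one_hot(label, n_labels):
    """One-hot encoding of a population label."""
    return [1.0 if c == label else 0.0 for c in range(n_labels)]


def target_vectors(coords, labels, n_labels, k):
    """Concatenate one-hot identity with neighborhood composition for each cell."""
    comps = neighborhood_composition(coords, labels, n_labels, k)
    return [one_hot(lab, n_labels) + comp for lab, comp in zip(labels, comps)]


# Hypothetical toy data: five cells, two populations (0 and 1).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
labels = [0, 0, 1, 1, 1]
targets = target_vectors(coords, labels, n_labels=2, k=2)
```

Each target has length 2 × n_labels: the first half identifies the cell's own population, and the second half sums to 1 and describes its surroundings. In the pipeline described above, a vector of this kind is what the MLP head is trained to predict from the two GNN outputs.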
Figure 2: Refined B cell subpopulations discovered by CellSNAP in a healthy mouse spleen CODEX dataset.
(A) Metric-based evaluations of cell population delineation performance on CODEX mouse spleen tissue. Representations of cells from 5 different methods were used as input: CellSNAP representation, feature (protein expression table), concat (protein expression + neighborhood composition table), SpiceMix representation, and MUSE representation (details in Materials & Methods). A total of 5 batches, each with 10,000 randomly selected cells, were tested. Solid lines indicate the average and shaded bands indicate the 95% CI of the scores. (B) UMAP visualizations of representations and Leiden clustering results. Cell types of the CellSNAP or feature-only clusters were annotated based on the average expression profiles of the clusters. Left panel: CellSNAP representation; Right panel: feature expression. Dotted lines indicate B cell subpopulations. (C) UMAP visualizations and B cell-related protein expression profiles of clusters annotated as B cells. Left panel: B cell clusters from CellSNAP and their expression heatmap. Right panel: B cell clusters from feature expression and their expression heatmap. (D) Comparison of spatial locations of different B cell clusters identified by CellSNAP representation vs. feature expression in the spleen tissue. In each plot, red dots indicate cells from a specific cluster, and green lines indicate germinal center boundaries. Upper panel: Spatial locations of B cell subpopulations identified by CellSNAP representation clusters. Lower panel: Spatial locations of B cell subpopulations identified by feature expression clusters.
Figure 3: Refined T cell subpopulations in tumor microenvironments discovered by CellSNAP in a cHL tumor CODEX dataset.
(A) Metric-based evaluations of cell population delineation performance on CODEX human cHL tissue. Representations of cells from 5 different methods were used as input: CellSNAP representation, feature (protein expression table), concat (protein expression + neighborhood composition table), SpiceMix representation, and MUSE representation (details in Materials & Methods). A total of 5 batches, each with 10,000 randomly selected cells, were tested. Solid lines indicate the average and shaded bands indicate the 95% CI of the scores. (B) UMAP visualizations of representations and Leiden clustering results. Cell types of the CellSNAP or the feature-only clusters were annotated based on the average expression profiles of the clusters. Left panel: CellSNAP representation; Right panel: feature expression. (C) Visualization of cell type spatial locations in the cHL tissue, colored by annotations on CellSNAP clusters. Black regions are empty spaces. White lines indicate the borders of the cHL tumor regions. (D) Visualization of the spatial locations of different CD4 T cell subpopulations identified by CellSNAP representation clusters. Black lines indicate borders of the cHL tumor regions.
Figure 4: CellSNAP-enabled delineation of biologically distinct macrophage subpopulations in an HCC tumor CosMx-SMI dataset.
(A) Metric-based evaluations of cell population delineation performance on CosMx-SMI human HCC tissue. Representations of cells from 5 different methods were used as input: CellSNAP representation, feature (protein expression table), concat (protein expression + neighborhood composition table), SpiceMix representation, and MUSE representation (details in Materials & Methods). A total of 5 batches, each with 10,000 randomly selected cells, were tested. Solid lines indicate the average and shaded bands indicate the 95% CI of the scores. (B) Visualizations of spatial locations of different cell populations, including all cell types (the first panel; colored by cell type annotation obtained from CellSNAP clusters; black regions indicate empty spaces) and different macrophage subpopulations identified by CellSNAP representation clusters (the second to the fourth panels). In each of the second to the fourth panels, all tumor cells and the macrophages from a specific CellSNAP cluster are colored, while other cells and empty spaces are in black. (C) Volcano plot of differentially expressed genes between the CellSNAP-c6 cluster and other macrophage clusters. (D) Comparison of module score values (43) between CellSNAP-c6 and all other macrophage cells. ‘M1-like’ and ‘M2-like’ scores were calculated using genes from (44). Splenic macrophage-specific ‘pro-inflammatory’ and ‘immunoregulatory’ scores were calculated using genes from (45). The unpaired Wilcoxon test was used to produce p values. (E) Visualization of the spatial distribution of all macrophages and their respective ligand-receptor interaction detection score levels. The detection score was calculated based on significant ligand-receptor interaction pairs between macrophages and tumor cells (46). (F) Top 10 most frequent ligand-receptor interaction pairs associated with CellSNAP-c6 macrophages. (G) GEP usage scores (47) among tumor cells, stratified by infiltration (by macrophage) status. The unpaired Wilcoxon test was used to produce p values.

References

    1. Giesen Charlotte, Wang Hao AO, Schapiro Denis, Zivanovic Nevena, Jacobs Andrea, Hattendorf Bodo, Schüffler Peter J, Grolimund Daniel, Buhmann Joachim M, Brandt Simone, et al. Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nature Methods, 11(4):417–422, 2014.
    2. Angelo Michael, Bendall Sean C, Finck Rachel, Hale Matthew B, Hitzman Chuck, Borowsky Alexander D, Levenson Richard M, Lowe John B, Liu Scot D, Zhao Shuchun, et al. Multiplexed ion beam imaging of human breast tumors. Nature Medicine, 20(4):436–442, 2014.
    3. Goltsev Yury, Samusik Nikolay, Kennedy-Darling Julia, Bhate Salil, Hale Matthew, Vazquez Gustavo, Black Sarah, and Nolan Garry P. Deep profiling of mouse splenic architecture with CODEX multiplexed imaging. Cell, 174(4):968–981, 2018.
    4. Chen Kok Hao, Boettiger Alistair N, Moffitt Jeffrey R, Wang Siyuan, and Zhuang Xiaowei. Spatially resolved, highly multiplexed RNA profiling in single cells. Science, 348(6233):aaa6090, 2015.
    5. Wang Xiao, Allen William E, Wright Matthew A, Sylwestrak Emily L, Samusik Nikolay, Vesuna Sam, Evans Kathryn, Liu Cindy, Ramakrishnan Charu, Liu Jia, et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science, 361(6400):eaat5691, 2018.