Nat Commun. 2022 Jan 19;13(1):385.
doi: 10.1038/s41467-022-28020-5.

Advances in mixed cell deconvolution enable quantification of cell types in spatial transcriptomic data

Patrick Danaher et al. Nat Commun. 2022.

Abstract

Mapping cell types across a tissue is a central concern of spatial biology, but cell type abundance is difficult to extract from spatial gene expression data. We introduce SpatialDecon, an algorithm for quantifying cell populations defined by single cell sequencing within the regions of spatial gene expression studies. SpatialDecon incorporates several advancements in gene expression deconvolution. We propose an algorithm harnessing log-normal regression and modelling background, outperforming classical least-squares methods. We compile cell profile matrices for 75 tissue types. We identify genes whose minimal expression by cancer cells makes them suitable for immune deconvolution in tumors. Using lung tumors, we create a dataset for benchmarking deconvolution methods against marker proteins. SpatialDecon is a simple and flexible tool for mapping cell types in spatial gene expression studies. It obtains cell abundance estimates that are spatially resolved, granular, and paired with highly multiplexed gene expression data.
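The idea of log-normal regression with a background term can be illustrated on a toy two-cell-line mixture. The sketch below is not the SpatialDecon implementation: it assumes a single unknown mixing proportion, a known constant background, and a simple grid search, and every profile value, noise factor, and function name in it is hypothetical. It only shows the loss being minimized on the log scale, which treats errors as multiplicative rather than additive.

```python
import math


def fit_mixing_proportion(observed, profile_a, profile_b, background, steps=1000):
    """Grid-search the proportion p of cell line A that minimises the squared
    difference between log(observed) and log(p*A + (1-p)*B + background)."""
    best_p, best_loss = 0.0, float("inf")
    for i in range(steps + 1):
        p = i / steps
        loss = 0.0
        for y, a, b in zip(observed, profile_a, profile_b):
            expected = p * a + (1 - p) * b + background
            loss += (math.log(y) - math.log(expected)) ** 2
        if loss < best_loss:
            best_p, best_loss = p, loss
    return best_p


# Hypothetical expression profiles of two cell lines over five genes.
profile_a = [100.0, 5.0, 40.0, 1.0, 800.0]
profile_b = [2.0, 60.0, 40.0, 300.0, 10.0]
background = 3.0  # assumed known, constant background counts per gene
true_p = 0.3

# Hypothetical multiplicative noise factors, one per gene.
noise = [1.10, 0.90, 1.05, 0.95, 1.00]
observed = [
    (true_p * a + (1 - true_p) * b + background) * f
    for a, b, f in zip(profile_a, profile_b, noise)
]

estimate = fit_mixing_proportion(observed, profile_a, profile_b, background)
print(f"true proportion: {true_p}, estimated proportion: {estimate}")
```

Because the loss is computed on log-transformed values, a 10% error on a highly expressed gene counts the same as a 10% error on a weakly expressed one, whereas an ordinary least-squares loss would be dominated by the highest-expressed genes.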

Conflict of interest statement

All authors were employees and shareholders of NanoString Technologies while performing this work. No authors had nonfinancial competing interests. J.M.B. is listed on the relevant U.S. patent 10640816, “Simultaneous quantification of gene expression in a user-defined region of a cross-sectioned tissue”.

Figures

Fig. 1. Overview of algorithm and advancements to the deconvolution field.
The image summarizes the deconvolution workflow. Text boxes summarize developments proposed in this manuscript. Source data are provided as a Source Data file.
Fig. 2. Comparison of deconvolution algorithms in mixtures of two cell lines.
The cell lines HEK293T and CCRF-CEM were mixed in varying proportions and profiled with the GeoMx platform. a True mixing proportions plotted against estimates from four deconvolution algorithms: non-negative least squares (NNLS), ν-support vector regression (ν-SVR), Dampened Weighted Least Squares regression (DWLS), and log-normal regression. b Influence of each gene on the deconvolution result from a single cell pellet with a 50–50 mix. Point size shows the change in estimated mixing proportion when each gene is removed. Source data are provided as a Source Data file.
Fig. 3. Genes’ proportions of cancer cell-derived expression in tumors.
a For each cancer type, density of genes’ percent of transcripts attributed to cancer cells. b For all genes in all cancer types, estimated percent of transcripts attributed to cancer cells. c Averaged across all non-immune tumors, genes’ mean expression and percent of transcripts attributed to cancer cells. Panels show gene lists from CIBERSORT, EPIC, MCP-counter, quanTIseq, Timer, xCELL, Danaher (2017), and SafeTME, the tumor-immune deconvolution cell profile matrix developed here. Source data are provided as a Source Data file.
Fig. 4. Benchmarking of immune deconvolution against the expression of canonical marker proteins.
a Expression of marker proteins (horizontal axis) against cell abundance estimates from the application of SpatialDecon to gene expression data (vertical axis). Each column of panels shows results from a single protein/cell pair; each row shows results from a different lung tumor. Tumor segments are shown in blue, microenvironment segments in red. b Pearson correlation between protein and cell abundance estimates for different deconvolution algorithms. NNLS non-negative least squares, ν-SVR ν-support vector regression, DWLS Dampened Weighted Least Squares. CD8 T cells and macrophages are unavailable under the Stereoscope + scRNA-seq lung profiles method. c Mean correlations between deconvolution methods and protein expression. Lines show 95% confidence intervals from n = 30 correlations. Source data are provided as a Source Data file.
Fig. 5. Immune cell deconvolution in 191 microenvironment segments of a NSCLC tumor.
a Image of the tumor, with segments superimposed. Green = Pan-cytokeratin+ (tumor) segments; red = Pan-cytokeratin− (microenvironment) segments. b Color key for panels c, d, f, and g. c Abundance estimates of 18 cell types in the microenvironment segments within 191 regions of the tumor. Wedge size is proportional to estimated cell counts. d Abundance estimates of 12 cell populations in microenvironment segments. Point size is proportional to estimated cell counts within each panel; scale of point size is not consistent across panels. e Dendrogram showing clustering of microenvironment segments’ abundance estimates. f Proportions of cell populations in microenvironment segments. g, h Estimated absolute numbers of cell populations in microenvironment segments. i Spatial distribution of microenvironment segment clusters. Point color indicates cluster from (e); point size is proportional to total estimated immune and stromal cells in microenvironment segments. Source data are provided as a Source Data file.
Fig. 6. Results of reverse deconvolution in a NSCLC tumor.
a Schematic of reverse deconvolution approach: gene expression is predicted from cell abundance estimates using the SpatialDecon algorithm, obtaining fitted values and residuals. b Genes’ dependence on cell mixing. The horizontal axis shows Pearson correlation between observed expression and fitted expression based on cell abundance. Vertical axis shows the standard deviation of the log2-scale residuals from the reverse deconvolution fit. c Example genes from the extremes of the space of panel (b) are shown, with observed expression (vertical axis) plotted against fitted expression (horizontal axis). Color scale applies to panels c, d, e, f, h, i. d–f For CXCL13, LYZ and CCL17, observed expression is plotted against fitted expression (left), and observed expression is plotted in the space of the tissue (right). In all panels, point color indicates residuals. In panels on the right, point size is proportional to observed expression level. g Pearson correlation matrices of genes in log-scale normalized data (top) and in residual space (below). h, i Spatial expression of gene clusters defined by high correlation in residuals of reverse deconvolution. Wedge color shows genes’ residual values; wedge size is proportional to genes’ expression levels. Source data are provided as a Source Data file.
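The two per-gene summaries named in panel b of Fig. 6 can be sketched as follows. This is an illustration only, not the authors' code: the fitted values are taken as given rather than produced by a deconvolution fit, the correlation is computed on untransformed values (an assumption, since the caption does not state the scale), and the function names and numbers are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def reverse_decon_summary(observed, fitted):
    """Return (correlation, SD of log2 residuals, log2 residuals) for one gene."""
    residuals = [math.log2(o) - math.log2(f) for o, f in zip(observed, fitted)]
    return pearson(observed, fitted), statistics.stdev(residuals), residuals


# Hypothetical expression of one gene across four segments.
observed = [12.0, 18.0, 44.0, 76.0]
# Hypothetical fitted expression predicted from cell abundance estimates.
fitted = [10.0, 20.0, 40.0, 80.0]

r, resid_sd, residuals = reverse_decon_summary(observed, fitted)
print(f"correlation: {r:.3f}, SD of log2 residuals: {resid_sd:.3f}")
```

A gene with high correlation and low residual SD is largely explained by cell mixing; a gene with low correlation or high residual SD varies for reasons beyond cell type abundance, and its residuals can then be examined spatially.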