Nucleic Acids Res. 2021 May 21;49(9):e50.
doi: 10.1093/nar/gkab043.

SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes

Marc Elosua-Bayes et al. Nucleic Acids Res. 2021.

Abstract

Spatially resolved gene expression profiles are key to understanding tissue organization and function. However, spatial transcriptomics (ST) profiling techniques lack single-cell resolution and must be combined with single-cell RNA sequencing (scRNA-seq) information to deconvolute the spatially indexed datasets. Leveraging the strengths of both data types, we developed SPOTlight, a computational tool that enables the integration of ST with scRNA-seq data to infer the location of cell types and states within a complex tissue. SPOTlight is centered around a seeded non-negative matrix factorization (NMF) regression, initialized using cell-type marker genes, followed by non-negative least squares (NNLS) to deconvolute ST capture locations (spots). Simulating varying reference quantities and qualities, we confirmed high prediction accuracy even with shallowly sequenced or small-sized scRNA-seq reference datasets. SPOTlight deconvolution of the mouse brain correctly mapped subtle neuronal cell states of the cortical layers and the defined architecture of the hippocampus. In human pancreatic cancer, we successfully segmented patient sections and further fine-mapped normal and neoplastic cell states. Trained on an external single-cell pancreatic tumor reference, we further charted the localization of clinically relevant and tumor-specific immune cell states, an illustrative example of its flexible application spectrum and future potential in digital pathology.

Figures

Graphical Abstract
Spatial Transcriptomics (ST) technologies generate gene expression profiles while retaining the tissue context. However, most methods lack single-cell resolution and encompass multiple cells within their capture sites (spots). SPOTlight uses seeded NMF regression to integrate single-cell RNA sequencing and ST datasets. SPOTlight learns topic signatures from single-cell data and finds the optimal weighted combinations of cell types to explain a spot's cellular composition.
Figure 1.
SPOTlight scheme. Step-by-step illustration of SPOTlight's algorithm. At the beginning of this process we have a count matrix, V, for the scRNA-seq data and a set of marker genes for the identified cell types. First, we use prior information to initialize the basis and coefficient matrices, W and H respectively. We assume the number of topics, k, to be equal to the number of cell types in the dataset. Each topic is then associated with a cell type; columns in W are initialized with the marker genes of the cell type associated with that topic, while rows in H are initialized with the membership of each cell to its associated topic. Second, we proceed with the matrix factorization, from which we obtain gene distributions for each topic in W and topic profiles for each cell in H. Third, we use W to map the ST data, V’, by means of non-negative least squares (NNLS) to obtain H’. Columns in H’ represent the topic profile for each spot. Fourth, from the H matrix obtained from the scRNA-seq data we consolidate all the cells from the same cell type to obtain cell type-specific topic profiles. Lastly, we use NNLS to find which combination of cell type-specific topic profiles best resembles each spot's topic profile.
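The five steps in this caption can be sketched on a toy example. What follows is a minimal illustration under stated assumptions, not the SPOTlight implementation: the function names (`seed_matrices`, `nmf`, `nnls`, `deconvolute`), the seed value `low`, the iteration counts, the use of the mean to consolidate cells of one type, and the toy data are all hypothetical. The factorization uses standard multiplicative updates, and the NNLS steps are approximated with the same multiplicative update while the basis is held fixed, swapped in here for a dedicated NNLS solver to keep the sketch dependency-light.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def seed_matrices(n_genes, cell_labels, markers, low=0.01):
    """Step 1: seed W (genes x topics) from markers, H (topics x cells) from labels."""
    n_topics = len(markers)  # one topic per cell type
    W = np.full((n_genes, n_topics), low)
    for k, genes in enumerate(markers):
        W[genes, k] = 1.0  # marker genes of cell type k seed topic k
    H = np.full((n_topics, len(cell_labels)), low)
    H[cell_labels, np.arange(len(cell_labels))] = 1.0  # each cell seeds its own topic
    return W, H


def nmf(V, W, H, n_iter=200):
    """Step 2: refine the seeded W and H so that V is approximately W @ H."""
    for _ in range(n_iter):
        H = H * (W.T @ V) / (W.T @ W @ H + EPS)
        W = W * (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def nnls(A, B, n_iter=500):
    """Approximate min ||A @ X - B|| with X >= 0 (A fixed), column by column."""
    X = np.ones((A.shape[1], B.shape[1]))
    AtB, AtA = A.T @ B, A.T @ A
    for _ in range(n_iter):
        X = X * AtB / (AtA @ X + EPS)
    return X


def deconvolute(V, cell_labels, markers, V_spots):
    cell_labels = np.asarray(cell_labels)
    W, H = seed_matrices(V.shape[0], cell_labels, markers)
    W, H = nmf(V, W, H)
    H_spots = nnls(W, V_spots)  # step 3: topic profile of each spot (H')
    # Step 4: consolidate cells of the same type (mean is an assumption here).
    Q = np.column_stack(
        [H[:, cell_labels == k].mean(axis=1) for k in range(len(markers))]
    )
    X = nnls(Q, H_spots)  # step 5: weights of cell-type topic profiles per spot
    return X / X.sum(axis=0, keepdims=True)  # rescale weights to proportions


# Toy data: 6 genes, 2 cell types with disjoint marker genes, 5 cells per type.
profiles = np.array([[10, 8, 6, 0, 0, 0], [0, 0, 0, 9, 7, 5]], dtype=float).T
scales = np.array([0.8, 0.9, 1.0, 1.1, 1.2])
labels = [0] * 5 + [1] * 5
V = np.hstack([np.outer(profiles[:, 0], scales), np.outer(profiles[:, 1], scales)])
true_props = np.array([[0.75, 0.2], [0.25, 0.8]])  # cell types x spots
V_spots = profiles @ true_props
props = deconvolute(V, labels, [[0, 1, 2], [3, 4, 5]], V_spots)
print(np.round(props, 2))
```

On this noise-free toy input the recovered proportions are expected to land close to `true_props`. Real count data would additionally need normalization and gene selection, which this sketch leaves out.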
Figure 2.
Benchmarking SPOTlight under different technical conditions and parameter optimization. For the classification metrics, the mean of 10 iterations is shown; for the Jensen-Shannon divergence (JSD), the results of each iteration are shown. (A) The performance of SPOTlight was assessed with different scRNA-seq protocols to identify optimal input protocols (20 cells per cell type). (B) Benchmarking SPOTlight on the same dataset downsampled to different sequencing depths (20 cells per cell type). Performance improved at higher depths, but no sharp decrease was observed at shallow sequencing depths. (C) Optimizing the number of cells per cell type used to train the SPOTlight model. Peak performance was obtained using 100 cells per cell type; fewer cells decreased performance, while more cells increased computational time without improving performance. (D) Optimizing the gene sets used to train the SPOTlight model. Optimal performance was obtained when using the union of the marker genes and the 3000 highly variable genes (HVG). The unsupervised approach using only the 3000 HVG performed the worst. (E) Benchmarking classification performance of bulk and single-cell deconvolution tools on 1000 synthetic mixtures. SPOTlight proved to be the most accurate, with the highest F1 score. (F) Proportion prediction performance of the different deconvolution tools on 1000 synthetic mixtures.
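The two kinds of metric named in this caption can be illustrated on a single synthetic spot. This is a hedged sketch, not the benchmarking code behind the figure: the function names `jsd` and `presence_f1`, the base-2 logarithm, the detection `threshold`, and the example proportions are assumptions made for illustration.

```python
import math


def jsd(p, q):
    """Jensen-Shannon divergence (log base 2) between two proportion vectors.

    Assumes p and q are non-negative and each sums to 1; the result lies in [0, 1].
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the Kullback-Leibler sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def presence_f1(true_props, pred_props, threshold=0.0):
    """F1 score for calling a cell type present or absent in a spot.

    A type is truly present if its true proportion is above 0 and predicted
    present if its predicted proportion is above `threshold` (hypothetical).
    """
    true_on = [t > 0 for t in true_props]
    pred_on = [p > threshold for p in pred_props]
    tp = sum(t and p for t, p in zip(true_on, pred_on))
    fp = sum(p and not t for t, p in zip(true_on, pred_on))
    fn = sum(t and not p for t, p in zip(true_on, pred_on))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Made-up proportions for one synthetic spot with four candidate cell types.
true_spot = [0.5, 0.3, 0.2, 0.0]
pred_spot = [0.45, 0.35, 0.15, 0.05]
print(jsd(true_spot, pred_spot), presence_f1(true_spot, pred_spot))
```

A lower JSD means the predicted proportions sit closer to the true ones, while the F1 score only asks whether each cell type was correctly called present or absent.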
Figure 3.
Cell type mapping on sagittal adult mouse brain anterior and posterior slices. (A) Spatial scatter pie plot representing the proportions of the cells from the reference atlas within capture locations in the adult mouse brain; the substructures of anatomical regions in the brain can be observed, as defined by their specific cell types. (B) Proportions of the cortical cells from the reference atlas within capture locations; SPOTlight captures the cortical structure and discerns between highly similar neuronal cell types. (C–J) Proportion within each capture location of each specific cortical neuron type.
Figure 4.
Mapping cell subpopulations across the tissue, charting tumoral and immune cell distribution to identify differential immune microenvironments in tumoral versus non-tumoral regions. (A) UMAP projections of 1926 cells from PDAC-A, paired data from tissue slices. Cells are colored and labelled according to the cell type annotations from the original paper. (B) Spatial scatter pie plot representing the proportions of the cell types in the paired inDrop dataset within the capture locations. (C) Predicted proportion within each capture location for cancer clones S100A4 and TM4SF1 and for centroacinar and hypoxic ductal cells. (D) UMAP projections of pancreatic immune reference cells mapped onto PDAC-A ST1. (E) Spatial scatter pie plot representing the proportions of the immune cells within the capture locations. (F) Predicted proportion within each capture location for proliferative T-cells and pre-exhausted CD8 cells, as well as proinflammatory and M2 TAMs. (G) Tissue stratification into tumoral and non-tumoral capture locations; the stratification coincides with the pathologist's annotation. (H) Cell type proportion comparison within each spot between tumoral and non-tumoral sections. (I) Proportion of capture locations containing each immune cell type within the tumoral and non-tumoral sections.
