Review
Genomics Proteomics Bioinformatics. 2023 Feb;21(1):24-47.
doi: 10.1016/j.gpb.2022.10.001. Epub 2022 Oct 14.

Computational Approaches and Challenges in Spatial Transcriptomics

Shuangsang Fang et al. Genomics Proteomics Bioinformatics. 2023 Feb.

Abstract

The development of spatial transcriptomics (ST) technologies has transformed genetic research from a single-cell data level to a two-dimensional spatial coordinate system and facilitated the study of the composition and function of various cell subsets in different environments and organs. The large-scale data generated by these ST technologies, which contain spatial gene expression information, have elicited the need for spatially resolved approaches to meet the requirements of computational and biological data interpretation. These requirements include dealing with the explosive growth of data to determine the cell-level and gene-level expression, correcting the inner batch effect and loss of expression to improve the data quality, conducting efficient interpretation and in-depth knowledge mining both at the single-cell and tissue-wide levels, and conducting multi-omics integration analysis to provide an extensible framework toward the in-depth understanding of biological processes. However, algorithms designed specifically for ST technologies to meet these requirements are still in their infancy. Here, we review computational approaches to these problems in light of corresponding issues and challenges, and present forward-looking insights into algorithm development.

Keywords: Computational approach; Data interpretation; Data quality; Multi-omics integration; Spatial transcriptomics.

Conflict of interest statement

All authors are current employees of BGI Group Ltd.

Figures

Figure 1
The main sections in ST data analysis. Schematic of the five critical sections involved in ST data analysis, including (1) big data acquisition, visualization, storage, and access; (2) data quality control; (3) single cell-level and tissue-level definition; (4) tissue-wide data interpretation; and (5) spatial multi-omics integration. ST, spatial transcriptomics; ISH, in situ hybridization; ISS, in situ sequencing; SVG, spatial variable gene; CCI, cell–cell interaction; 3D, three-dimensional.
Figure 2
Data computing to obtain gene expression information from raw sequencing or image data. The raw sequencing data generated from spatial barcode-based transcriptomics technologies contain two kinds of sequence information: barcodes and RNA sequences. The barcodes are mapped back to the spatial location, and the corresponding RNA sequences are aligned with a genome reference. UMI counting is performed to count the number of aligned genes belonging to each cell or spot, and the gene expression profile is generated. The imaging data generated using ISS-based or ISH-based technologies can be transformed into images containing RNA signals by alignment or decoding. Image segmentation is used to isolate the RNA signals, and each RNA is then assigned to a spot or cell. This yields gene expression information with spatial locations. UMI, unique molecular identifier.
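The barcode mapping and UMI counting steps in this caption can be sketched as follows. This is a minimal illustration, not the pipeline used by any specific ST platform: it assumes reads have already been aligned and reduced to hypothetical `(barcode, umi, gene)` tuples, and that a hypothetical `barcode_to_xy` lookup table maps each spatial barcode to a coordinate.

```python
from collections import defaultdict


def count_umis(reads, barcode_to_xy):
    """Count unique UMIs per (spatial location, gene).

    reads: iterable of (barcode, umi, gene) tuples for aligned reads
           (hypothetical input format, assumed for this sketch).
    barcode_to_xy: dict mapping a spatial barcode to an (x, y) coordinate.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        xy = barcode_to_xy.get(barcode)
        if xy is None:
            # Barcode does not map to a known location: discard the read.
            continue
        # A set collapses reads that share the same UMI (PCR duplicates).
        seen[(xy, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


reads = [
    ("AAAC", "u1", "Actb"),
    ("AAAC", "u1", "Actb"),   # duplicate of the same molecule
    ("AAAC", "u2", "Actb"),
    ("AAAG", "u1", "Gapdh"),
    ("TTTT", "u9", "Actb"),   # barcode not in the lookup table
]
barcode_to_xy = {"AAAC": (0, 0), "AAAG": (0, 1)}

counts = count_umis(reads, barcode_to_xy)
print(counts)  # {((0, 0), 'Actb'): 2, ((0, 1), 'Gapdh'): 1}
```

Real pipelines additionally tolerate sequencing errors in barcodes and UMIs; that correction is omitted here.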
Figure 3
Data preprocessing for ST data quality control. Data quality assessment is the first step to evaluate and determine whether the data are ready for the next analysis. Image data with high resolution and expression data with enough supported reads, including valid reads mapped to the tissue-covered region (valid tissue reads), clean reads, unique mapping reads, and the total gene number, are checked. Then, the data after tissue segmentation, which contain cells/bins and genes, are further screened to eliminate low-expression cells/spots and genes. Finally, data normalization and data imputation are applied to improve the data quality.
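The screening and normalization steps in this caption might look like the sketch below. It assumes a dense spots-by-genes count matrix and uses library-size scaling followed by a log transform, which is one common normalization choice rather than the one prescribed by the review. The function name and the thresholds `min_counts`, `min_spots`, and `target_sum` are illustrative assumptions, and imputation is not covered.

```python
import numpy as np


def filter_and_normalize(counts, min_counts=3, min_spots=2, target_sum=1e4):
    """Drop low-expression spots and genes, then normalize.

    counts: array of shape (n_spots, n_genes) holding raw UMI counts.
    Returns the normalized matrix plus boolean masks of kept spots and genes.
    """
    counts = np.asarray(counts, dtype=float)

    # Keep spots whose total count reaches the threshold.
    keep_spots = counts.sum(axis=1) >= min_counts
    counts = counts[keep_spots]

    # Keep genes detected in enough of the remaining spots.
    keep_genes = (counts > 0).sum(axis=0) >= min_spots
    counts = counts[:, keep_genes]

    # Scale each spot to the same total, then apply log(1 + x).
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty spots
    norm = np.log1p(counts / totals * target_sum)
    return norm, keep_spots, keep_genes


raw = [
    [5, 0, 3],
    [0, 1, 0],   # total count 1: removed as a low-expression spot
    [4, 2, 0],
    [6, 0, 1],
]
norm, keep_spots, keep_genes = filter_and_normalize(raw)
print(norm.shape)  # (3, 2)
```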
Figure 4
Cell- and tissue-level annotations for high- or low-resolution data. There are mainly two solutions for cell annotation of the ST data: marker gene-based annotation and reference-based annotation. Marker gene-based annotation includes cell region definition, cell clustering, and cell cluster annotation. With this solution, the ST data with subcellular resolution are used to perform cell identification. This is implemented by cell segmentation based on stained images to obtain the cell boundaries and to identify the areas with relatively concentrated gene expression signals. For low-resolution data, computational approaches can be applied to enhance the resolution. Clustering and annotation based on marker genes are then applied to provide more comprehensive definitions of the cells. Reference-based annotation integrates scRNA-seq data to annotate cell types or cell-type composition by deconvolution for low-resolution ST data or by direct mapping for high-resolution ST data. Tissue architecture annotation can be further applied to obtain the tissue architecture based on cell clustering information. scRNA-seq, single-cell RNA sequencing; DEG, differentially expressed gene; HR, high resolution; LR, low resolution.
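To make the idea of reference-based deconvolution concrete, the sketch below estimates cell-type proportions for one low-resolution spot by non-negative least squares, solved here with a simple projected gradient loop. This is a stand-in technique chosen for brevity, not the method of any particular published deconvolution tool. The signature matrix, the function name, and the iteration count are assumptions made for the example.

```python
import numpy as np


def deconvolve_spot(signatures, spot, n_iter=500):
    """Estimate cell-type proportions for a single spot.

    signatures: array (n_genes, n_types), mean expression per cell type,
                e.g. derived from an scRNA-seq reference (assumed given).
    spot: array (n_genes,), expression measured at one spot.
    Minimizes ||signatures @ w - spot||^2 subject to w >= 0.
    """
    S = np.asarray(signatures, dtype=float)
    b = np.asarray(spot, dtype=float)
    # Step size 1/L, where L is the largest eigenvalue of S^T S.
    step = 1.0 / (np.linalg.norm(S, 2) ** 2)
    w = np.zeros(S.shape[1])
    for _ in range(n_iter):
        grad = S.T @ (S @ w - b)
        # Gradient step followed by projection onto w >= 0.
        w = np.maximum(0.0, w - step * grad)
    total = w.sum()
    return w / total if total > 0 else w


# Toy reference: 4 genes x 3 cell types (made-up numbers).
signatures = np.array([
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 2.0, 9.0],
    [3.0, 3.0, 3.0],
])
true_props = np.array([0.6, 0.3, 0.1])
spot = signatures @ true_props  # noiseless mixture for illustration

props = deconvolve_spot(signatures, spot)
print(np.round(props, 3))  # close to the mixing proportions 0.6, 0.3, 0.1
```

On real data the mixture is noisy and the reference and ST platforms differ in sensitivity, so the recovered proportions are estimates only.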
Figure 5
Tissue-wide data interpretation includes single or multiple slices in two- or three-dimensional space. A. Cell communication and gene behavior can be predicted based on spatial gene expression. The receptors and ligands can be identified to further predict cell communication, and key signaling genes and SVGs can be identified as another aspect to study cell behaviors. B. Trajectory analysis and RNA velocity are used to predict the cell state dynamics and infer the cell fate particularly in time-series slices to study cell development. C. 3D reconstruction can be divided into three steps, namely tissue segmentation, registration, and 3D visualization. A 3D view of the tissue makes it possible to perform the aforementioned data mining from 2D to 3D data. 2D, two-dimensional.
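As a rough illustration of panel A, the sketch below scores one ligand-receptor pair by multiplying ligand expression in a sender spot with receptor expression in each spatially neighboring spot. The scoring rule, the function name, and the `radius` cutoff are assumptions for this example; published CCI tools use more elaborate statistics and permutation tests.

```python
import numpy as np


def ligand_receptor_score(coords, ligand, receptor, radius=1.5):
    """Score ligand-receptor co-expression between neighboring spots.

    coords: array (n_spots, 2) of spatial coordinates.
    ligand, receptor: arrays (n_spots,) with expression of each gene.
    Returns the total score and the (sender, receiver) score matrix.
    """
    coords = np.asarray(coords, dtype=float)
    ligand = np.asarray(ligand, dtype=float)
    receptor = np.asarray(receptor, dtype=float)

    # Pairwise Euclidean distances between all spots.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Neighbors lie within the radius; a spot is not its own neighbor.
    neighbors = (dist <= radius) & ~np.eye(len(coords), dtype=bool)

    # Entry [i, j] is ligand in sender i times receptor in receiver j.
    pair_scores = ligand[:, None] * receptor[None, :] * neighbors
    return pair_scores.sum(), pair_scores


coords = [(0, 0), (1, 0), (5, 5)]
ligand = [2, 0, 3]
receptor = [0, 4, 1]

total, pair_scores = ligand_receptor_score(coords, ligand, receptor)
print(total)  # 8.0 (only spots 0 and 1 are neighbors: 2 * 4)
```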
