Brief Bioinform. 2022 Jan 17;23(1):bbab466.
doi: 10.1093/bib/bbab466.

SC-MEB: spatial clustering with hidden Markov random field using empirical Bayes

Yi Yang et al. Brief Bioinform. 2022.

Abstract

Spatial transcriptomics has emerged as a powerful technique for resolving gene expression profiles while retaining tissue spatial information. Spatially resolved transcriptomic data make it feasible to examine the complex multicellular systems of different microenvironments. To answer scientific questions with spatial transcriptomics and expand our understanding of how cell types and states are regulated by the microenvironment, the first step is to identify cell clusters by integrating the available spatial information. Here, we introduce SC-MEB, an empirical Bayes approach for spatial clustering analysis using a hidden Markov random field. We have also derived an efficient expectation-maximization algorithm based on an iterative conditional mode for SC-MEB. In contrast to BayesSpace, a recently developed method, SC-MEB is not only computationally efficient and scalable to large sample sizes but is also capable of choosing the smoothness parameter and the number of clusters. We performed comprehensive simulation studies to demonstrate the superiority of SC-MEB over some existing methods. We applied SC-MEB to analyze the spatial transcriptome of human dorsolateral prefrontal cortex tissues and the mouse hypothalamic preoptic region. Our results showed that SC-MEB can achieve clustering performance similar to or better than that of BayesSpace, even when BayesSpace is given the true number of clusters and uses a fixed smoothness parameter. Moreover, SC-MEB is scalable to large sample sizes. We then employed SC-MEB to analyze a colon dataset from a patient with colorectal cancer (CRC) and COVID-19, and further performed differential expression analysis to identify signature genes related to the clustering results. The heatmap of identified signature genes showed that the clusters identified using SC-MEB were more separable than those obtained with BayesSpace. Using pathway analysis, we identified three immune-related clusters, and in a further comparison, found that the mean expression of COVID-19 signature genes was greater in immune than in non-immune regions of colon tissue. SC-MEB provides a valuable computational tool for investigating the structural organization of tissues from spatial transcriptomic data.

Keywords: cell phenotype; empirical Bayes; expectation-maximization algorithm; hidden Markov random field; spatial transcriptomics.
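
The abstract describes clustering with a hidden Markov random field fitted through iterative conditional modes (ICM). The following is a minimal sketch of one ICM label sweep, not the authors' implementation. It assumes Gaussian cluster densities and a standard Potts-type prior in which each neighbor sharing a label adds `beta` to the log-posterior; the exact parameterization in SC-MEB may differ. The function name `icm_sweep` and all argument names are hypothetical.

```python
import numpy as np

def icm_sweep(y, labels, neighbors, means, covs, beta):
    """One ICM sweep (illustrative; assumes Gaussian emissions and a Potts prior).

    y         : (n, d) low-dimensional features, e.g. principal components
    labels    : (n,) current integer cluster labels in 0..K-1
    neighbors : list of n integer arrays with the neighbor indices of each spot
    means     : (K, d) cluster means
    covs      : (K, d, d) cluster covariance matrices
    beta      : smoothness parameter (assumed Potts interaction strength)
    """
    n, d = y.shape
    K = means.shape[0]

    # Gaussian log-density of every spot under every cluster, shape (n, K).
    loglik = np.empty((n, K))
    for k in range(K):
        diff = y - means[k]
        _, logdet = np.linalg.slogdet(covs[k])
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(covs[k]), diff)
        loglik[:, k] = -0.5 * (maha + logdet + d * np.log(2 * np.pi))

    # Update spots one at a time, conditioning on the neighbors' current labels.
    new = labels.copy()
    for i in range(n):
        counts = np.bincount(new[neighbors[i]], minlength=K)
        new[i] = np.argmax(loglik[i] + beta * counts)
    return new
```

With `beta = 0` the sweep reduces to assigning each spot to its most likely Gaussian component; larger values of `beta` pull each spot toward the labels of its neighbors, which is the role the abstract assigns to the smoothness parameter.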

Figures

Figure 1
SC-MEB workflow. A. The SC-MEB workflow comprises three main steps: data preprocessing, spatial clustering using the hidden MRF model and a series of downstream analyses. B. Data preprocessing: log-transformation and dimension reduction. C. The hidden MRF model. For the Visium dataset, we used six neighbors for each spot. D. The SC-MEB outputs: a scatter plot of MBIC for all K, a tissue plot with spots colored by cluster and a heatmap of DEGs.
Figure 2
Summary of clustering accuracy of the six methods in the analysis of simulated data. A. Example 1, Gaussian: PCs were sampled from a GMM. B. Example 1, t: PCs were sampled from a Student’s t mixture model. C. Example 2, Gaussian: PCs were sampled from a GMM. D. Example 2, t: PCs were sampled from a Student’s t mixture model.
Figure 3
The computation time of SC-MEB increases linearly with sample size. The number of iterations was set to 50 for the different sample sizes.
Figure 4
Clustering results for a colon sample. (A) Original H&E-stained tissue image for the colon sample. (B–F) Heatmaps for clustering assignments in the colon sample using the proposed SC-MEB, BayesSpace, Giotto, Louvain and GMM, respectively. The eight clusters identified included two stromal regions, a muscle region, two epithelial-cell regions and three immune-cell regions.
Figure 5
Heatmaps of normalized expression of signature genes identified in the DE analysis based on two clustering analysis methods: (A) SC-MEB and (B) BayesSpace. In both subfigures, S1 and S2 represent Stroma 1 and 2, respectively; M is Muscle; E1 and E2 are Epithelial 1 and 2, respectively; and I1, I2 and I3 are Immune 1, 2 and 3, respectively.
Figure 6
Boxplots of mean expression of COVID-19 signature genes in immune and non-immune regions.
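
The preprocessing step named in the Figure 1 caption (log-transformation followed by dimension reduction) can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: library-size normalization with a scale factor of 10,000, `log1p` as the transform, and PCA computed by singular value decomposition are all assumed choices, and the function name `log_normalize_pca` and its parameters are hypothetical.

```python
import numpy as np

def log_normalize_pca(counts, n_pcs=15, scale=1e4):
    """Log-transform a spots-by-genes count matrix and return its top PCs.

    Assumed choices (not taken from the source): library-size normalization,
    log1p transform, and PCA via SVD of the column-centered matrix.
    """
    counts = np.asarray(counts, dtype=float)

    # Normalize each spot by its total count; guard against empty spots.
    lib = counts.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0
    logx = np.log1p(counts / lib * scale)

    # Center each gene, then project onto the leading principal directions.
    centered = logx - logx.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]
```

The returned matrix has one row per spot and one column per principal component, which is the kind of low-dimensional input a hidden MRF clustering step would consume.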
