Genome Biol. 2024 Feb 23;25(1):55.
doi: 10.1186/s13059-024-03180-3.

SHARE-Topic: Bayesian interpretable modeling of single-cell multi-omic data

Nour El Kazwini et al. Genome Biol.

Abstract

Multi-omic single-cell technologies, which simultaneously measure the transcriptional and epigenomic state of the same cell, enable the study of epigenetic mechanisms of gene regulation. However, noisy and sparse data pose fundamental statistical challenges to extracting biological knowledge from complex datasets. SHARE-Topic, a Bayesian generative model of multi-omic single-cell data based on topic models, aims to address these challenges. SHARE-Topic identifies common patterns of co-variation between different omic layers, providing interpretable explanations for the complexity of the data. Tested on data from different technological platforms, SHARE-Topic provides low-dimensional representations that recapitulate known biology and defines associations between genes and distal regulators in individual cells.

Keywords: Bayesian modeling; Gene regulation; Gene regulator in cancer; Interpretability; Lymphoma; Single-cell multi-omics.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Workflow of SHARE-Topic: the binary scATAC-seq data and the scRNA-seq expression matrix are fed to SHARE-Topic. SHARE-Topic extracts a latent representation in topic space for each cell, gene, and region in the data. The latent representation of the cells is used to visualize the heterogeneity of cell types using UMAP. The latent representations of genes and regions are used to extract biological interactions between genes and regions that shape the regulatory mechanisms in the cells.
Fig. 2
The graphical model of SHARE-Topic: a graphical representation of the SHARE-Topic model illustrating the relationships between latent topics, observed gene expression reads (n_gc), and observed chromatin regions (r_c). The model depicts, for a given cell c, the interaction between its transcriptomic profile and its accessible chromatin region profile. According to SHARE-Topic, these observations are generated as follows: each cell c is a different mixture of topics (θ_tc). Given the contribution of a topic t, the gene count n_gc in the cell is sampled from a Poisson distribution with an expected number of reads λ_gt. Likewise, given the contribution of topic t in a cell, the likelihood of finding region r_c open is ϕ_rt. The priors are shown at the top layer of the model and descend hierarchically to the observations.
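The generative story in the Fig. 2 caption can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation. The caption does not state how topics are combined or which priors are used, so the mixing rule (rates and open-region probabilities are averaged over topics with weights θ_tc) and the Dirichlet, Gamma, and Beta priors are assumptions made for illustration. The function name `sample_share_topic_like` and all of its parameters are hypothetical.

```python
import numpy as np


def sample_share_topic_like(n_cells=5, n_genes=8, n_regions=10, n_topics=3, seed=0):
    """Simulate toy multi-omic data from a topic-style generative model.

    ASSUMPTIONS (not taken from the source): Dirichlet prior on theta,
    Gamma prior on lambda, Beta prior on phi, and linear mixing over topics.
    """
    rng = np.random.default_rng(seed)

    # theta[c, t]: topic mixture of cell c (each row sums to 1).
    theta = rng.dirichlet(np.ones(n_topics), size=n_cells)
    # lam[g, t]: expected number of reads of gene g under topic t.
    lam = rng.gamma(shape=1.0, scale=2.0, size=(n_genes, n_topics))
    # phi[r, t]: probability that region r is open under topic t.
    phi = rng.beta(1.0, 1.0, size=(n_regions, n_topics))

    # Assumed mixing: topic-weighted average of rates and probabilities.
    rate = theta @ lam.T                          # shape (n_cells, n_genes)
    p_open = np.clip(theta @ phi.T, 0.0, 1.0)     # shape (n_cells, n_regions)

    counts = rng.poisson(rate)                    # n_gc, gene read counts
    open_regions = rng.binomial(1, p_open)        # r_c, binary accessibility
    return theta, lam, phi, counts, open_regions


theta, lam, phi, counts, open_regions = sample_share_topic_like()
print(counts.shape, open_regions.shape)
```

Because each row of `theta` sums to one, the mixed open-region probabilities stay in the unit interval; the clip only guards against floating-point rounding.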
Fig. 3
UMAP embedding of SHARE-Topic based on the cell-topic distribution (θ_c). a SHARE-seq mouse brain data set: embedding of 2781 cells from a topic space of dimension 30. b SHARE-seq mouse skin data set: embedding of 27,782 cells from a topic space of dimension 60. c B-cell lymphoma data set: embedding of 14,566 cells from a topic space of dimension 45. d SNARE-seq mouse cortex data set: 9161 cells embedded in 50 dimensions. e 10x Genomics human PBMC10k data set: 9631 cells embedded in 45 dimensions.
Fig. 4
a UMAP embedding of the B-lymphoma dataset showing the enrichment of topic 13 across cells. Topic 13 is relatively highly enriched in the tumor B cells, which may indicate that topic 13 captures biological processes specific to B-lymphoma. b SHARE-Topic score P_rg in the B-lymphoma dataset for gene-region pairs, as a function of the distance d of the region from the gene starting site (GSS). The regions are selected to lie within a window of 10^5 from the gene. The SHARE-Topic score captures distance dependence: it decays with increasing distance from the GSS.
Fig. 5
a Analysis of enhancer activity in the gene FOXP1. According to the SCREEN database, the regions intersect distal enhancer-like sites and sometimes also CTCF-bound sites. The SHARE-Topic score is shown as a scatter over the open chromatin regions (annotated as enhancers) from the B-lymphoma dataset. The enhancer regions shown are located within a window of 10^5 within and around FOXP1. The curve is fitted by averaging the SHARE-Topic score over intervals of length 10^3 nucleotides. According to the SHARE-Topic score, the enhancers at the starting site of FOXP1 have a higher contribution to the gene activity. b SHARE-Topic score for regions within a distance of 10^5 from two genes (CD35, XRCC5) in the B-lymphoma dataset. The regions are annotated using the SCREEN database as promoter-like sites (PLS), CTCF-bound sites, and proximal/distal enhancer-like sites (p/dELS).
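The curve described in the Fig. 5a caption, an average of the score over intervals of 10^3 nucleotides within a window of 10^5, can be sketched as a simple binned mean. This is an illustrative sketch only. The function name `binned_mean_score`, the use of signed distances relative to the gene starting site, and the symmetric window are assumptions; the caption does not specify how the bins are placed.

```python
import numpy as np


def binned_mean_score(distances, scores, window=10**5, bin_width=10**3):
    """Average scores in fixed-width distance bins inside [-window, window].

    `distances` are signed offsets (in nucleotides) of regions from the gene
    starting site; this sign convention is an assumption for illustration.
    Returns bin centers and per-bin mean scores (NaN where a bin is empty).
    """
    distances = np.asarray(distances, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Keep only regions that fall inside the window around the gene.
    keep = np.abs(distances) <= window
    d, s = distances[keep], scores[keep]

    # Bin 0 covers [-window, -window + bin_width); the final bin holds d == window.
    idx = np.floor((d + window) / bin_width).astype(int)
    n_bins = 2 * window // bin_width + 1

    sums = np.bincount(idx, weights=s, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centers = -window + (np.arange(n_bins) + 0.5) * bin_width

    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts  # empty bins become NaN
    return centers, means


# Toy input, not data from the paper.
centers, means = binned_mean_score(
    distances=[-100000, -99500, 0, 500, 100000, 200000],
    scores=[1.0, 3.0, 2.0, 4.0, 5.0, 9.0],
)
print(means[0], means[100], means[200])
```

Empty bins are returned as NaN rather than zero, so that a plotting step can skip them instead of drawing a misleading dip in the curve.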
