Cell Genom. 2023 Aug 23;3(9):100388.
doi: 10.1016/j.xgen.2023.100388. eCollection 2023 Sep 13.

Unraveling dynamically encoded latent transcriptomic patterns in pancreatic cancer cells by topic modeling

Yichen Zhang et al. Cell Genom.

Abstract

Topic modeling has become an important research tool in single-cell genomics. A topic model decomposes expression data into distinctive cell topics shared across many cells, and the gene programs implicated by each topic can later serve as a predictive model in translational studies. Here, we present a Bayesian topic model that can uncover short-term RNA velocity patterns from spliced and unspliced single-cell RNA-sequencing (RNA-seq) counts. We show that modeling both types of RNA counts can improve the robustness of statistical estimation and can reveal aspects of dynamic change that static analysis can miss. We demonstrate that the modeling framework can identify statistically significant dynamic gene programs in pancreatic cancer data. We found that seven dynamic gene programs (topics) are highly correlated with cancer prognosis and are generally enriched for immune cell types and pathways.

Keywords: RNA velocity; machine learning; pancreatic cancer; pancreatic ductal adenocarcinoma; single-cell RNA-seq; topic model; variational autoencoder.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Modeling single-cell transcription dynamics with sparse probabilistic topic models.
(A) BALSAM: given a raw gene expression count matrix, BALSAM uses neural networks to learn cell topics that represent cell types or cell states. The encoder transforms the expression space into a latent topic space through a stack of non-linear layers (NN1). The decoder (the data-generative component) models single-cell data vectors with a probabilistic topic model (Dirichlet-multinomial). The Dirichlet parameters follow a generalized linear model with a log link function: a linear combination of cell-topic-specific sparse factors ρ weighted by the topic proportions θ.
(B) Representative examples of gene expression dynamics in the PDAC data, shown as scatterplots of spliced and unspliced counts. The x axis: the spliced gene count (log1p transformed); the y axis: the unspliced gene count (log1p transformed). The red dashed line marks where the spliced and unspliced counts are equal (not a steady state).
(C) DeltaTopic: given the spliced and unspliced gene expression count matrices, DeltaTopic’s encoder layers embed each pair of spliced and unspliced count vectors into latent spaces (NN1, NN2) and combine the information into a shared latent space through a fusion layer (NN3). The decoder generates two sets of sparse gene factors, one static and one dynamic, and constructs two gene-by-topic matrices, one for the spliced counts and one for the unspliced counts. The static topic matrix ρ sets a background level for both the spliced and unspliced gene expression. For the spliced expression counts, the dynamic topic loading matrix is added to the static loading matrix to account for the divergence between the spliced and unspliced counts.
(D) Model evaluation by held-out data likelihood (spliced and unspliced). The y axis: the average held-out data likelihood with 95% confidence interval; the x axis: the prior sparsity probability.
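The decoder described in panels A and C can be illustrated with a small sketch. It is one reading of the caption, not the authors' implementation: it assumes the log link means log α_g = Σ_k θ_k · loading[k][g], that the unspliced counts use the static matrix ρ alone, and that the spliced counts use ρ plus a dynamic matrix (called `delta` here, a hypothetical name). All numeric values are toy values chosen for illustration.

```python
import math

def dirichlet_params(theta, loadings):
    """Log-link GLM (assumed form): log(alpha_g) = sum_k theta[k] * loadings[k][g]."""
    n_genes = len(loadings[0])
    return [
        math.exp(sum(t * row[g] for t, row in zip(theta, loadings)))
        for g in range(n_genes)
    ]

def dirichlet_multinomial_loglik(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n, a0 = sum(counts), sum(alpha)
    ll = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    for x, a in zip(counts, alpha):
        ll += math.lgamma(x + a) - math.lgamma(a) - math.lgamma(x + 1)
    return ll

# Toy example: 2 topics, 3 genes (all values hypothetical).
theta = [0.7, 0.3]                            # topic proportions for one cell
rho = [[1.0, 0.0, -1.0], [0.0, 0.5, 0.0]]     # static topic-by-gene loadings
delta = [[0.0, 0.4, 0.0], [0.0, 0.0, 0.0]]    # dynamic loadings (assumed name)

# Unspliced counts use the static loadings only.
alpha_unspliced = dirichlet_params(theta, rho)

# Spliced counts use static plus dynamic loadings, per the caption.
spliced_loadings = [[r + d for r, d in zip(rr, dd)] for rr, dd in zip(rho, delta)]
alpha_spliced = dirichlet_params(theta, spliced_loadings)

# Joint log-likelihood of one cell's paired count vectors.
loglik = (
    dirichlet_multinomial_loglik([3, 1, 0], alpha_unspliced)
    + dirichlet_multinomial_loglik([5, 4, 1], alpha_spliced)
)
```

In this toy setup only the second gene has a nonzero dynamic loading, so it is the only gene whose spliced and unspliced Dirichlet parameters differ.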
Figure 2
DeltaTopic approach identifies disease-relevant cell topics, implicating putative causative regulatory programs.
(A) Topic proportions estimated by the DeltaTopic model across 227,331 cells in the PDAC data.
(B) Kaplan-Meier survival curves for 234 donors in the ICGC data, stratified by positive and negative topic-specific risk scores implicated by DeltaTopic gene factors. The p values are computed by a log-rank test comparing survival probability between the positive and negative risk groups.
(C) A volcano plot summarizing the hazard ratios and p values from tests of association between topic-specific risk scores (derived from different topic models) and observed survival times across donors in three cancer cohorts (ICGC-PDAC-US, ICGC-PDAC-CA, and ICGC-PDAC-AU; see the text). Each point represents an aggregated hazard ratio and p value from the meta-analysis. The x axis: the hazard ratio estimate from the Cox proportional hazards model; the y axis: p values on a negative log10 scale. Survival-relevant cell topics are colored red and blue for up- and down-regulation with respect to PDAC survival time. The two vertical dashed lines mark the hazard ratio cutoffs at ±1.5 × 10⁻³. The horizontal line marks the p value cutoff at 0.05.
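The Kaplan-Meier curves in panel B can be reproduced in principle with the standard product-limit estimator, S(t) = Π (1 − d_i / n_i) over event times t_i ≤ t. The sketch below is a minimal illustration, not the authors' analysis code: the donor times, event flags, and risk scores are invented, and splitting donors by the sign of the risk score is an assumption based on the caption's "positive and negative" groups. The log-rank test is not shown.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each distinct event time.

    times  -- follow-up time per donor
    events -- 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every donor whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Both deaths and censored donors leave the risk set.
        at_risk -= removed
    return curve

# Hypothetical donors: follow-up time, event flag, topic-specific risk score.
times = [5, 8, 12, 3, 9, 15]
events = [1, 1, 0, 1, 0, 1]
risk = [0.4, 0.9, 0.1, -0.2, -0.7, -0.3]

# Assumed grouping rule: positive versus non-positive risk score.
pos = [(t, e) for t, e, r in zip(times, events, risk) if r > 0]
neg = [(t, e) for t, e, r in zip(times, events, risk) if r <= 0]

curve_pos = kaplan_meier([t for t, _ in pos], [e for _, e in pos])
curve_neg = kaplan_meier([t for t, _ in neg], [e for _, e in neg])
```

Censored donors lower the number at risk without producing a step in the curve, which is why `curve_neg` has no entry at time 9.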
Figure 3
DeltaTopic approach uncovers both static and dynamic transcriptome patterns.
(A) The top heatmap: the static topic-by-gene parameters ρ; the bars on the left scale in proportion to the size of each topic (log10 scale). For comparison, the bottom heatmap shows marker genes for the ten cell types assigned by the original PDAC study.
(B) Gene set enrichment analysis of the dynamic loading matrix. B1, ImmuneSig gene sets; B2, KEGG gene sets; B3, Hallmark gene sets. All three gene set collections are from the MSigDB database. For brevity, only significant gene sets and their corresponding cell topics are displayed.
Figure 4
Velocities derived from DeltaTopic identify distinct cell trajectories for disease development and cell-type differentiation.
(A) Each segment corresponds to one cell; cells are uniformly sampled within each topic at a 0.5% rate to avoid visual clutter. The length of each segment (colored red, blue, or gray) scales in proportion to the estimated velocity projected onto two principal-component axes. Cells belonging to the disease-relevant topics identified by the survival analysis are highlighted: red for up-regulated topics and blue for down-regulated topics.
(B) The same velocity plot colored by cell type.
Figure 5
Benchmark results confirm that DeltaTopic and BALSAM can accurately predict cell-type labels and recapitulate true dynamic and static gene programs.
(A) Normalized mutual information (NMI) scores between the predicted labels and true labels. The mean NMI scores and 95% confidence intervals are plotted for each method.
(B) Mean precision scores for static and dynamic gene activity identification. The mean and 95% confidence intervals are plotted for each method.
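For readers unfamiliar with the NMI score in panel A, the following is a minimal sketch of how it can be computed from two label vectors. The caption does not say which normalization the authors used; this version divides the mutual information by the arithmetic mean of the two label entropies, which is one common convention and an assumption here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a label vector."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (in nats) between two label vectors of equal length."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in count_ab.items()
    )

def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both labelings are a single cluster, so they trivially agree
    return mutual_information(a, b) / ((h_a + h_b) / 2.0)

# Hypothetical labels: the prediction matches the truth up to renaming clusters.
true_labels = ["T", "T", "B", "B", "NK", "NK"]
pred_labels = [2, 2, 0, 0, 1, 1]
score = nmi(true_labels, pred_labels)
```

Because NMI depends only on how the cells are partitioned, a perfect clustering scores 1 even when the cluster identifiers differ from the true label names.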
