Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Mar;41(3):387-398.
doi: 10.1038/s41587-022-01476-y. Epub 2022 Oct 13.

Multi-omic single-cell velocity models epigenome-transcriptome interactions and improves cell fate prediction

Affiliations

Multi-omic single-cell velocity models epigenome-transcriptome interactions and improves cell fate prediction

Chen Li et al. Nat Biotechnol. 2023 Mar.

Abstract

Multi-omic single-cell datasets, in which multiple molecular modalities are profiled within the same cell, offer an opportunity to understand the temporal relationship between epigenome and transcriptome. To realize this potential, we developed MultiVelo, a differential equation model of gene expression that extends the RNA velocity framework to incorporate epigenomic data. MultiVelo uses a probabilistic latent variable model to estimate the switch time and rate parameters of chromatin accessibility and gene expression and improves the accuracy of cell fate prediction compared to velocity estimates from RNA only. Application to multi-omic single-cell datasets from brain, skin and blood cells reveals two distinct classes of genes distinguished by whether chromatin closes before or after transcription ceases. We also find four types of cell states: two states in which epigenome and transcriptome are coupled and two distinct decoupled states. Finally, we identify time lags between transcription factor expression and binding site accessibility and between disease-associated SNP accessibility and expression of the linked genes. MultiVelo is available on PyPI, Bioconda and GitHub ( https://github.com/welch-lab/MultiVelo ).

PubMed Disclaimer

Figures

Extended Data Fig. 1 |
Extended Data Fig. 1 |. Additional figures for mouse brain dataset.
a, Canonical marker gene expression for embryonic mouse brain cell types. b, Comparison of Cdh13 fits from scVelo and MultiVelo. An elevating transcription rate due to opening of chromatin produces a more linear fit and better captures the observed phase portrait. c, Scatterplot of gene likelihood against log total spliced count. Gene likelihood is not significantly affected by model assignment or trajectory type. Likelihood does increase with spliced count, as this usually indicates higher quality or highly variable genes. d, Switch times can be used to rank genes by the length of priming and decoupling intervals. Each range is scaled to 1 with outliers (n=1) removed. Top two rows: Histogram of priming intervals. Pbx3 and Celsr1 possess short and long priming phases, respectively. Bottom two rows: Histogram of decoupled intervals. While Rspo3 has a short decoupling phase with few cells within, Tgfbr1’s decoupling phase extends from RNA induction to RNA repression, and up to the end of the trajectory.
Extended Data Fig. 2 |
Extended Data Fig. 2 |. Additional figures for HSPC dataset.
a, Canonical marker gene expression for HSPCs. b, Cell cycle (S phase and G2M phase) scores and total unspliced ratio (U/(U+S)) plotted on UMAP coordinates. These factors were regressed out of the total RNA expression (but not the unspliced and spliced counts) during the preprocessing step as they do not appear to be cell-type or lineage specific. c, Box plots of histone modification levels from bulk ChIP-seq of FACS-purified HSCs (center line, median; box, Q1 and Q3; whiskers, 1.5x IQR; points, outliers). Each point in the box plot represents the sum of histone modification signal at chromatin accessibility peaks linked to a Model 1 or Model 2 gene. P-values are from a one-sided Wilcoxon rank-sum test. d, Velocity stream plot from MultiVelo analysis of Day 0 and Day 7 HSPC samples (Top). The majority of arrows go from Day 0 stem cells toward more differentiated Day 7 cells. UMAP coordinates colored by cell-type labels (Middle). UMAP coordinates colored by expression of CD133 (PROM1), an HSPC marker (Bottom).
Extended Data Fig. 3 |
Extended Data Fig. 3 |. Additional figures for human brain dataset.
a, Validation of the direction of MEF2C. Left: UMAP with cell types. Top: scVelo’s MEF2C fit produces inconsistency between gene time and global latent time. Bottom: MultiVelo’s results show consistent progression from nIPC to deeper layer (ExDp). b, DTW and UMAP results for EOMES and FOXP2 transcription factors. c, Additional motif DTW alignment results showing time lags between TF gene expression and corresponding motif accessibility. d, The accessibility of TF motifs binned across latent time. The latent time scale was split into 20 equalsized bins, and the average motif accessibility of cells in each bin was computed and plotted. The motif sequence logos (downloaded from jaspar2020.genereg. net) are shown next to the TF names. e, Time-lag analysis of transcription factors and the expression of their validated downstream target genes. Top: UMAP plots colored by TF and target gene expression. Bottom: Line plots of TF and target gene expression, with correspondences from DTW alignment shown as dotted lines. Magenta: TFs. Cyan: target genes.
Extended Data Fig. 4 |
Extended Data Fig. 4 |. Chromatin dynamics, Model 0, the necessity of chromatin preprocessing, and pre-fitting illustrations.
a, Chromatin dynamics illustration: chromatin opening and closing are modeled as asymptotically approaching fully opened (1) or fully closed (0) starting from any initial value. b, Chromatin accessibility change as a function latent time inferred by scVelo using only the RNA portion of the 10X multiome mouse brain dataset (colored by mouse brain cell types). Black lines connect the mean accessibilities within 20 equal-sized windows. The shapes of the ATAC trends are qualitatively very similar to the ODE model we propose. c, Simulation of Model 0 samples. The long delay between chromatin closing and transcription initiation is unlikely to happen in real biological systems. In the rare cases when high chromatin accessibility but low expression or high expression but low accessibility pattern is observed, it is likely due to technical issues such as dropout or background noise. d, The need for normalization as a preprocessing step for ATAC-seq. e, The need for smoothing as a preprocessing step for ATAC-seq. f, Chromatin accessibility results after peak-to-gene aggregation, TF-IDF normalization, and WNN smoothing. It is the same as Fig. S2E. g, Illustration of bi-modal expression pattern for complete genes. Cells at the lower quantile can be far apart in low-dimensional embedded expression space. h, Simplified illustration of model predetermination reasoning. The highest chromatin accessibility region appears in different RNA phases in M1 and M2 genes. i, Illustration of the internal unspliced modality rescaling factor initialization.
Extended Data Fig. 5 |
Extended Data Fig. 5 |. Simulation study to assess parameter estimation and model determination.
A total of 1000 genes were simulated with various parameters for both model 1 and model 2. a, C-U view of noiseless simulations of 2000 time-points in the 0–20 hr range. b, U-S view of noiseless simulations from A. c, 3D view of noiseless simulations from A. d, Noise added to simulated points to mimic real data. e, f, Model 1 and Model 2 fits for the same simulated gene (S17). The likelihood is higher under Model 1, consistent with the ground truth. e, Left: 3D view of the fit Model 1 trajectory colored by states, along with predicted switch time points. Middle: simulation with ground-truth switch times. Right: U-S view of fitted trajectory colored by log(c). f, Similar to e, but the fit shown is for Model 2 (the incorrect model). g, h, Model fits for simulated gene S41, similar to e and f, but this time, Model 2 is the ground truth model. MultiVelo correctly identifies the sample to be Model 2 with accurate switch time estimations. The model assignments of 985/1000 samples were correctly predicted based on likelihood.
Fig. 1 |
Fig. 1 |. Schematic of MultiVelo approach.
a, System of three ODEs summarizes the temporal relationship among c,u and s values during the gene expression process. b, Two different models (abbreviated as M1 and M2) describe two potential orderings of chromatin and RNA state changes. Chromatin accessibility starts to drop before transcriptional repression begins in M1, and the reverse happens in M2. c, Priming occurs when chromatin opens before transcription initiates. d, Decoupling occurs when chromatin closing and transcription repression begin at different times (example shown for model 1). e, Phase portraits predicted by the ODE model, showing the four possible states each gene can occupy. Gene expression and chromatin accessibility are coupled in the orange and blue states and decoupled in the red and green states. f,g, Simulated (c,u,s) values for a model 1 (f) and a model 2 (g) gene.
Fig. 2 |
Fig. 2 |. MultiVelo reveals two distinct mechanisms of gene regulation.
a, UMAP coordinates with stream plot of velocity vectors (left) and latent time (right) from MultiVelo. OPC, oligodendrocyte progenitor cells; Astro, astrocytes; V-SVZ, ventricular–subventricular zone. b, Stream plot of velocity vectors estimated from RNA only by scVelo. c, Cell cycle score indicating active dividing and cycling population (arrow). d, Chromatin values better separate differentiating cells when chromatin opening precedes transcription. e, RNA phase portraits (u versus s) colored by c values show clear differences between model 1 (left) and model 2 (right) genes. f, Additional phase portraits for the genes shown in panel e. g, Heatmaps of model 1 and model 2 gene expressions as a function of latent time. Color represents smoothed spliced counts. Model 2 genes tend to achieve highest expression earlier in latent time than model 1 genes. h, Relative proportion of each type of kinetics across all fit genes (n=865). Note that genes with partial kinetics (induction only or repression only) cannot be identified as model 1 or model 2. i, MultiVelo predicts 3D velocity vectors, which can be visualized as three-dimensional arrow plots.
Fig. 3 |
Fig. 3 |. MultiVelo captures epigenomic priming and decoupling in embryonic mouse brain.
a, Three-dimensional phase portraits overlaid with MultiVelo fits (solid lines) and inferred states (colors). Each point represents the (c,u,s) values observed for one gene in one cell. b, UMAP plots colored by c (left), u (middle) and state assignments (right) for genes predicted by MultiVelo to have notable priming or decoupling intervals. Regions with priming or decoupling are circled. c, Observed values for c (left), u (middle) and s (right) plotted as a function of latent time and colored by state assignment. Vertical lines indicate inferred switch times. d, UMAP plots colored by the number of genes in each cell assigned to each of the four states. e, Box plots summarizing the lengths of each of the four states across all fit genes (center line, median; box, Q1 and Q3; whiskers, 1.5× IQR; points, outliers). f, Box plot summarizing the ratio between chromatin closing rate αcc and opening rate αco across all fit genes.
Fig. 4 |
Fig. 4 |. MultiVelo quantifies epigenomic priming in mouse skin.
a, UMAP coordinates with stream plot of velocity vectors (left) and latent time (right) from MultiVelo. b, Velocity stream plot from RNA-only model (scVelo). c, Relative proportion of each type of kinetics across all fit genes (n=960). d, UMAP coordinates colored by c (left), u (middle) and s (right) values for Wnt3. e, Examples of genes showing priming or decoupling. Observed c (left), u (middle) and s (right) values plotted as a function of latent time and colored by state assignment. Vertical lines indicate inferred switch times. f, Dynamic time warping alignment of c and s values (top) and u and s values (middle) for Wnt3. Dotted gray lines indicate corresponding time points after alignment. Bottom panel shows instantaneous time lags computed by subtracting times of aligned time points from the previous two panels. Norm values, normalized values.
Fig. 5 |
Fig. 5 |. MultiVelo identifies priming in HSPCs.
a, UMAP coordinates with stream plot of velocity vectors inferred by MultiVelo (left) and an RNA-only model (scVelo; right). Cell types were annotated based on marker gene expression (Extended Data Fig. 2a). Prog., progenitors; MK, megakaryocyte. b, Relative proportion of each type of kinetics across all fit genes (n=936). c, Box plots summarizing the lengths of each of the four states across all fit genes (see Fig. 3e). d, Several G2/M cell cycle phase markers show model 2 expression pattern toward different lineages. e, Examples of genes showing priming or decoupling. Observed c,u and s values plotted as a function of latent time and colored by cell type. f, Corresponding velocity vectors of the same genes as in panel e. Cell velocities and times have been smoothed by RNA neighbors. Note that all velocity values are non-negative, and the lowest velocities are not necessarily at 0. dc/dt, chromatin velocity; du/dt, unspliced velocity; ds/dt, spliced velocity. g, RNA phase portraits of the same genes as in panels e and f.
Fig. 6 |
Fig. 6 |. MultiVelo infers epigenome and transcriptome dynamics in fetal human brain.
a, UMAP coordinates with stream plot of velocity vectors (left) and latent time (right) from MultiVelo. nIPC/ExN, intermediate progenitor cells/newborn excitatory neurons; ExUp, upper-layer neurons; SP, subplate; ExM, maturing neurons; RG/Astro, radial glia/astrocytes; ExDp, deep-layer neurons; Cyc., cycling progenitors; mGPC/OPC, multipotent glial progenitor cells/oligodendrocyte progenitor cells. b, Velocity streamplot from RNA-only model (scVelo). c, RNA phase portraits (u versus s) colored by c values show clear differences between model 1 (ROBO2) and model 2 (MEF2C) genes. Arrows indicate where chromatin closing begins. d, Relative proportion of each type of kinetics across all fit genes (n=747). e, Dynamic time warping alignment of TF gene expression and the accessibility of predicted binding sites for four TFs. Dotted gray lines indicate corresponding time points after alignment. Inset UMAPs colored by TF expression and motif accessibility are shown for two of the TFs, EGR1 and PBX3. f, Quantiles (q) of TF motif time lags inferred by DTW across all expressed TFs. The median time lag across TFs is positive at most times, indicating that TF expression generally precedes motif accessibility. g, Classification of SNPs according to the relationship between maximum accessibility time and time of maximum linked gene expression. The contour lines indicate density, and three main groups of SNPs are visible. Inset UMAP plots are shown for one example SNP from each group.

References

    1. Cao J et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019). - PMC - PubMed
    1. Street K et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genom. 19, 477 (2018). - PMC - PubMed
    1. Ji Z & Ji H Tscan: pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44, 117 (2016). - PMC - PubMed
    1. Setty M et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019). - PMC - PubMed
    1. Welch JD, Hartemink AJ & Prins JF SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data. Genome Biol. 17, 106 (2016). - PMC - PubMed

Publication types