Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Sep;18(9):e11129.
doi: 10.15252/msb.202211129.

Probing cell identity hierarchies by fate titration and collision during direct reprogramming

Affiliations

Probing cell identity hierarchies by fate titration and collision during direct reprogramming

Bob A Hersbach et al. Mol Syst Biol. 2022 Sep.

Abstract

Despite the therapeutic promise of direct reprogramming, basic principles concerning fate erasure and the mechanisms to resolve cell identity conflicts remain unclear. To tackle these fundamental questions, we established a single-cell protocol for the simultaneous analysis of multiple cell fate conversion events based on combinatorial and traceable reprogramming factor expression: Collide-seq. Collide-seq revealed the lack of a common mechanism through which fibroblast-specific gene expression loss is initiated. Moreover, we found that the transcriptome of converting cells abruptly changes when a critical level of each reprogramming factor is attained, with higher or lower levels not contributing to major changes. By simultaneously inducing multiple competing reprogramming factors, we also found a deterministic system, in which titration of fates against each other yields dominant or colliding fates. By investigating one collision in detail, we show that reprogramming factors can disturb cell identity programs independent of their ability to bind their target genes. Taken together, Collide-seq has shed light on several fundamental principles of fate conversion that may aid in improving current reprogramming paradigms.

Keywords: Collide-seq; direct reprogramming; fate conversion; reprogramming; scRNA-seq.

PubMed Disclaimer

Figures

Figure 1
Figure 1. A multiplexed strategy for the comparison and collision of cell fate conversions
  1. A

    Schematic depiction of Waddington's landscape highlighting key transcription factors during development, induction of pluripotency, and direct reprogramming.

  2. B

    Schematic outline describing key aspects of Collide‐seq including (i) PiggyBac vectors used (top left), (ii) multiplexing of reprogramming factors (top right), (iii) selection of positive cells (bottom right) (iv) pooling of cells and transgene induction (bottom middle) and (v) scRNA‐seq (bottom left).

  3. C

    Overview of all experimental conditions included in the Collide‐seq experiment including single, double, and triple transcription factor combinations.

  4. D

    Representative immunofluorescence images of MEFs 48 h after transgene induction stained for the indicated reprogramming factors. Scale bar represents 50 μm (n = 3 biological replicates).

  5. E

    Log fluorophore expression superimposed on Uniform Manifold Approximation and Projection (UMAP) embedding of the dataset. Shown are fluorophore expression levels (gene exp.) on a logarithmic (ln) scale.

  6. F

    Log transgene expression superimposed on a UMAP embedding of the dataset. Shown are transgene expression levels (gene exp.) on a logarithmic (ln) scale.

  7. G

    UMAP embedding of scRNA‐seq dataset colored by technical replicate. Orange: Replicate 1, Blue: Replicate 2.

  8. H

    Cell‐cycle score superimposed on UMAP embedding of the dataset. See Unsupervised analysis of single‐cell RNA‐seq section in Materials and Methods for further details.

Source data are available online for this figure.
Figure EV1
Figure EV1. Analysis of transgene induction and detection
  1. A

    Quantification of transgene induction using qRT–PCR. Shown is the mean fold change of doxycycline‐induced samples over noninduced samples for the indicated transcription factors after 48 h (n = 2/3). Error bars represent 95% confidence interval. Primers used for qPCR are listed in Appendix Table S2.

  2. B

    Fraction of UMIs mapped to transgenic allele within cells expressing a single reprogramming factor. Reads for transgenic transcripts were distinguished from endogenous transcripts using a custom annotation based on single nucleotide polymorphisms (See Generation of a modified gene annotation section in Materials and Methods for further details). Shown are 15,768 cells per barplot with the 95% confidence interval as error bars.

  3. C

    Uniform Manifold Approximation and Projection (UMAP) embedding of scRNA‐seq data colored by technical replicate (Replicate 1: blue, Replicate 2: orange) after ambient RNA correction.

  4. D

    Cell‐cycle score superimposed on UMAP embedding of ambient corrected data (See Unsupervised analysis of single‐cell RNA‐seq data in Materials and Methods).

  5. E

    Log fluorophore expression superimposed on a UMAP embedding after ambient RNA correction. Shown are fluorophore expression levels (gene exp.) on a logarithmic (ln) scale.

  6. F

    Log transgene expression superimposed on a UMAP embedding after ambient RNA correction. Shown are transgene expression (gene exp.) levels on a logarithmic (ln) scale.

Source data are available online for this figure.
Figure 2
Figure 2. Demultiplexing of combinatorial reprogramming factor expression reveals discrete induction of distinct lineages
  1. A

    Visualization of condition assignment outcome (see Computational demultiplexing section in Materials and Methods for further details) superimposed on a Uniform Manifold Approximation and Projection (UMAP) embedding of the dataset. Depicted are the total population of cells carrying one, two, or three factors (left panels, blue shades) and the individual conditions (right panels, red shades) for the indicated conditions. For individual conditions, numbers indicate the total number of cells per condition.

  2. B

    Louvain clustering of the entire dataset (left panel) separated into clusters corresponding to the expression of the individual transcription factors, the ground fibroblast state, and the collision of reprogramming factors (right panels). Clusters are labeled according to the predominant transcription factor or cell state.

Figure 3
Figure 3. Comparing principles of fate conversion between distinct lineages
  1. A

    Bar plot of transgene‐wise coefficient vector magnitude as a measure of induced transcriptomic changes by individual reprogramming transcription factors (see Differential expression analysis section in Materials and Methods for further details, A = Ascl1, M = MyoD1, F = FoxA2, S = Sox2, O = Oct4).

  2. B

    Latent pseudotime (lineage progression) superimposed on a Uniform Manifold Approximation and Projection (UMAP) of the dataset computed with scVelo (Bergen et al, ; see RNA velocity and CellRank analysis section in Materials and Methods for further details).

  3. C

    Heatmap showing the intersection of upregulated genes in individual factor conditions (y‐axis) vs. putative binding sites derived from a publicly available ChIP dataset for MyoD1 (M), Ascl1 (A), FoxA2 (F), and Sox2 (S) (x‐axis; Oki et al, 2018).

  4. D

    Matrixplot showing the relative mean expression of key target genes related to the different lineages for the indicated single factor conditions and control vector expressing fibroblasts (Fib.).

  5. E

    Fibroblast score (see Unsupervised analysis of single‐cell RNA‐seq data in Materials and Methods and Appendix Table S1 for further details) for single factor conditions compared with fibroblasts expressing only control vectors (Fib. = Fibroblasts). The number of data points per violin plot is the number of cells per matched condition shown in Fig 2A. For each violin, the center dot represents the median, the centerline defines the range and the solid box marks the interquartile range (IQR).

  6. F

    Venn diagram of downregulated gene overlap between single factor conditions. Downregulated genes were determined by performing differential expression between each condition and the control vector carrying fibroblasts (See Differential expression section in Materials and Methods for further details).

Figure 4
Figure 4. Factor collision reveals antagonistic effects of a deterministic system
  1. A

    Fibroblast score (see Unsupervised analysis of single‐cell RNA‐seq data section in Materials and Methods for further details) for single and double factor conditions in comparison to control vector carrying fibroblasts. The number of data points per violin plot is the number of cells per matched condition shown in Fig 2A. For each violin, the center dot represents the median, the centerline defines the range and the solid box marks the interquartile range (IQR).

  2. B

    PCA scatter plot of first (PC1, vertical) and second (PC2, horizontal) loading of the different conditions after collapsing into pseudobulk samples. Colored rectangles correspond to clusters defined in Fig EV2B. CV: Control vector expressing fibroblasts.

  3. C

    Correlation matrix of linear model effects for single (left panel) and double factor (right panel) collisions. Color defines direction of correlation of gene‐wise coefficient vectors (negative = blue, positive = red) and color tone depicts size of correlation. See Differential expression analysis section in Materials and Methods for further details.

  4. D

    Stacked violin plots showing the standardized median expression of several lineage marker genes for single factor conditions (bold) and upon collision with the indicated factors (A = Ascl1, M = MyoD1, F = FoxA2, S = Sox2).

  5. E

    Conceptual summary of fate titration analysis (upper left panel) and data‐derived examples for the indicated combinations of factors. Shown are the Louvain cluster assignments for the indicated collisions (upper panels) and the decision boundaries for indicated intermediate fates according to transcription factor levels (lower panels). Transcription factor levels shown on the x‐ and y‐axis are log‐normalized expression values scaled into the dynamic range of the single‐positive condition. Cells are colored according to their Louvain cluster identity. See Fate titration section in Materials and Methods for further details.

  6. F, G

    Predictive performance for a linear model of the categorical condition variable (L‐C: Linear condition.), a linear model of the log of the transgene expression (L‐E: Linear expression), and a nonlinear model of the log of the transgene expression (NL‐E: Nonlinear expression) in randomly held‐out test cells (n = 1,184 cells). For the task of Louvain cluster assignment prediction (F), shown are the area under the receiver–operator characteristic curve (ROC AUC), top 3 accuracy (Top 3 Acc.), Accuracy (Acc.), and class‐balanced accuracy (Bal. Acc.). For the task of prediction of log‐normalized expression values of highly variable genes (G), shown is the cell‐wise R2 (G). See Supervised modeling section in Materials and Methods for further details. For each box in G, the centerline defines the median, the height of the box is given by the interquartile range (IQR), the whiskers are given by 1.5 * IQR, and the outliers are given as points beyond the minimum or maximum whisker.

  7. H, I

    Predictive performance on held‐out reprogramming conditions for models and metrics as described in (H), with the exception of the baseline model, which is a linear model of binary transgene presence (L‐B: Linear binary). The hold‐out task in (H) is a particular triple‐positive condition as indicated in the legend (N = 4). The hold‐out task in (I) is the set of all triple‐positive conditions and models are either trained on single‐positive conditions only or on single‐ and double‐positive conditions (N = 4). See Supervised modeling section in Materials and Methods for further details. For each box, the centerline defines the median, the height of the box is given by the interquartile range (IQR), the whiskers are given by 1.5 * IQR, and the outliers are given as points beyond the minimum or maximum whisker.

Figure EV2
Figure EV2. Transcriptomic effects of factor collision and general determinism in Collide‐seq data
  1. A

    Fibroblast score (see Unsupervised analysis of single‐cell RNA‐seq data section in Materials and Methods and Appendix Table S1 for further details) for indicated conditions. The number of data points per violin plot is the number of cells per matched condition shown in Fig 2A.

  2. B

    Similarity of pseudobulk (mean expression per condition) principal components per condition clustered hierarchically. Colors of clusters correspond to those used in Fig 4B.

  3. C

    Correlation matrix of linear model effects for correlation between individual factors and collision state (see Differential expression analysis section in Materials and Methods for further details). Color defines direction of correlation (negative = blue, positive = red), and color tone depicts size of correlation (darker = higher).

  4. D

    Correlation of predictive accuracy, measured as explained variance, on held‐out test data with the predicted uncertainty of the model, measured as the mean of the predicted standard deviation by gene.

  5. E

    Explained variance of model prediction by condition and replicate. For each violin, the center dot represents the median, the centerline defines the range and the solid box marks the interquartile range (IQR).

  6. F, G

    Explained variance of cells binned by individual transgene expression (F) or individually with trend fit (G). In (G), the color indicates the predicted uncertainty measured as the mean standard deviation over output genes. For each violin in F, the center dot represents the median, the centerline defines the range and the solid box marks the interquartile range (IQR).

Figure 5
Figure 5. Fate competition between the bHLH factors MyoD1 and Ascl1
  1. A

    Louvain clustering superimposed on a Uniform Manifold Approximation and Projection (UMAP) embedding of the second Collide‐seq experiment. Clusters are labeled according to the predominant transcription factor or cell state.

  2. B

    Condition‐based RNA velocity and CellRank for the indicated conditions. Shown are UMAP embeddings of all cells in the dataset (gray) with cells positive for the indicated transcription factor(s) colored by the terminal fate assigned by CellRank to indicate lineage endpoints (Bergen et al, 2020). See RNA velocity and CellRank analysis section of Materials and Methods for further details.

  3. C

    Intersections of genes bound by Ascl1 and/or MyoD1 based on ChIP‐seq data (Lee et al, ; see ChIP‐seq data analysis and synergism and antagonism annotation sections in Materials and Methods for further details) with gene sets that correspond to synergistic and antagonistic up‐ and downregulation. We defined synergism and antagonism based on differential expression analysis (See Differential expression analysis section in Materials and Methods for further details). Genes were considered synergistic or antagonistic when their expression in double‐positive conditions was higher (synergism) or lower (antagonism) than expected by adding their expression in the single factor conditions together. Shown is the size of the intersection (number of genes) and the relative size of this intersection with respect to the respective differentially expressed gene sets as defined by the column.

  4. D

    Log‐normalized expression z‐score of genes up‐ (top panels) and downregulated (bottom panels) by Ascl1 (left, blue violins) and MyoD1 (right, yellow violins) vs. their scaled expression levels (x‐axis) binned into 10% intervals. For each violin, the center dot represents the median, the centerline defines the range and the solid box marks the interquartile range (IQR).

  5. E

    Fate titration plot of Ascl1 and MyoD1 double‐positive cells (see Fate titration section in Materials and Methods for further details). Shown are decision boundaries for indicated fates according to transcription factor levels. MyoD1 and Ascl1 expressions shown on the x‐ and y‐axis are log‐normalized expression values scaled into the dynamic range of the single‐positive condition (see Fate titration analysis section in Materials and Methods for further details). Cells are colored according to their lineage endpoint as determined by CellRank.

  6. F

    Representative immunofluorescence images of cells 3 days after transgene induction. Cells were stained for the fluorescent reporter present on the PiggyBac construct of the reprogramming factor. Arrows indicate multinucleated cells. Scale bar represents 50 μm. mutAscl1 = mutant Ascl1.

  7. G

    Quantification of the percentage of cells with more than one nucleus among all transfected cells within a given condition. Data points represent biological replicates (n = 4). For each box, the centerline defines the median, the height of the box is given by the interquartile range (IQR), the whiskers are given by 1.5 * IQR, and the outliers are given as points beyond the minimum or maximum whisker. Pairwise comparisons were performed with the Mann–Whitney U test and correction for multiple testing performed with Benjamini–Hochberg correction. *P < 0.05 (MyoD1 vs. Ascl1: P = 0.03, MyoD1 vs. mutAscl1: P = 0.03, MyoD1 vs. Ascl1 and MyoD1: P = 0.03, MyoD1 vs. mutAscl1 and MyoD1: P = 0.03, Ascl1 vs. Ascl1 and MyoD1: P = 0.03, Ascl1 vs. mutAscl1 and MyoD1: P = 0.03, mutAscl1 vs. Ascl1 and MyoD1: P = 0.03, mutAscl1 vs. mutAscl1 and MyoD1: P = 0.03).

Source data are available online for this figure.
Figure EV3
Figure EV3. Investigation of Ascl1 and MyoD1 collision at three defined time points
  1. A

    Schematic overview of experimental conditions in second Collide‐seq experiment.

  2. B

    Visualization of assignment outcome (see Computational demultiplexing section in Materials and Methods for further details) for individual cells superimposed on a Uniform Manifold Approximation and Projection (UMAP) embedding of second Collide‐seq dataset. Depicted are cells positive for the indicated conditions and their total number.

  3. C

    Condition‐based RNA velocity trajectories. Shown are RNA velocities computed within single positive cells, projected on a UMAP embedding and computed on the transcriptomes of all cells (see RNA velocity and CellRank section in Materials and Methods for further details). Superimposed is the latent pseudotime (lineage progression coordinate) computed with scVelo (Bergen et al, 2020) in the context of the velocity computation.

  4. D

    Relative abundance of cells from Ascl1 (left) and MyoD1 (right) single‐positive conditions within Louvain clusters (x‐axis) colored by the time point of their collection.

  5. E

    Fraction of cells per Louvain cluster colored according to whether they were actual or in silico simulated double‐positive cells.

  6. F

    Simulated double positives and actual double positives colored and superimposed on UMAP.

  7. G

    Neuronal (N.score) and myogenic (M.score) score for Ascl1 and MyoD1 double‐positive cells residing in collision states.

  8. H

    Comparison of target gene induction by Ascl1 and mutAscl1 using qRT–PCR. Shown is the fold change of Ascl1 and mutAscl1 induced gene expression for the indicated genes as compared to untransfected fibroblasts 48 h after induction. Each panel represents a biological replicate (n1‐3). Primers used for qRT–PCR are listed in Appendix Table S2. Error bars represent standard deviation for technical replicates within each biological replicate. n = 3 biological replicates.

  9. I

    Representative images of Desmin fluorescence intensity after 72 h of transgene induction for the indicated conditions. Scale bars represent 100 μm. mutAscl1 = mutant Ascl1.

  10. J

    Normalized Desmin fluorescence intensity for indicated conditions (see Fluorescence intensity quantification section in Materials and Methods for further details). For each box, the centerline defines the median, the height of the box is given by the interquartile range (IQR), the whiskers are given by 1.5 * IQR, and the outliers are given as points beyond the minimum or maximum whisker. Pairwise comparisons were performed with the Mann–Whitney U test and correction for multiple testing performed with Benjamini–Hochberg correction. *P < 0.05 (MyoD1 vs. Ascl1 and MyoD1: P = 0.048, MyoD1 vs. mutAscl1 and MyoD1: P = 0.048, Ascl1 and MyoD1 vs. mutAscl1 and MyoD1 = 0.5). n = 3 biological replicates.

Source data are available online for this figure.
Figure EV4
Figure EV4. Validation of collision effects in a biological replicate
  1. A

    Log‐normalized transgene expression superimposed on Uniform Manifold Approximation and Projection (UMAP) embedding of separate 48 h dataset. Shown are total count‐normalized transgene expression levels (gene exp.) on a logarithmic (ln) scale.

  2. B

    Louvain clustering superimposed on UMAP embedding.

  3. C

    Condition‐based RNA velocity on a UMAP computed on cells from designated conditions. Shown is a UMAP of all cells in the dataset (gray). Superimposed are cells from the indicated condition colored in with the terminal fate assigned by CellRank (Lange et al, 2022), see RNA velocity and CellRank section in Materials and Methods for further details).

Figure 6
Figure 6. Competitive inhibition between Ascl1 and MyoD1 impairs pioneer factor activity
  1. A

    Schematic overview of third Collide‐seq experiment.

  2. B

    Visualization of assignment outcome (see Computational demultiplexing section in Materials and Methods for further details) for individual cells superimposed on Uniform Manifold Approximation and Projection (UMAP) embedding of the scRNA‐seq part of the experiment. Depicted are cells positive for the indicated conditions using all cells (top panels) and the top 500 highest transgene expressing cells (bottom panels). Bottom right panel depicts grouped Louvain clustering based on the factors present. MyoD1 = MyoD1 only, Collision = Ascl1 & MyoD1 and mAscl1 & MyoD1, Control = control vector carrying fibroblasts.

  3. C

    Matrixplot showing the relative expression of key myogenic genes between different clusters as indicated in Fig 6B. Myo = MyoD1, Col = Collision, Con = Control.

  4. D

    Log2 fold change of myogenic gene expression upon Ascl1 & MyoD1 (blue) and mAscl1 & MyoD1 (orange) collision as compared to MyoD1 only expressing cells. The number of data points per violin plot is the number of cells per matched condition shown in Fig 6B. For each violin, the center dot represents the median, the centerline defines the range and the solid box marks the interquartile range (IQR).

  5. E

    Fraction of cells for each experimental condition among the different Louvain clusters.

  6. F

    Visualization of assignment outcome (see Computational demultiplexing section in Materials and Methods for further details) for individual cells superimposed on a UMAP embedding of the scATAC‐seq part of the experiment. Depicted are cells positive for the indicated conditions using all cells (top panels) and the top 500 highest transgene expressing cells (bottom panels). Bottom right panel depicts grouped Louvain clustering based on the factors present MyoD1 = MyoD1, Collision = Ascl1 & MyoD1 and mAscl1 & MyoD1, Control = Control vector carrying fibroblasts.

  7. G

    Number of CUT&RUN peaks from a condition on the y‐axis overlapping with CUT&RUN peaks from a condition on the x‐axis. The presented number of overlapping peaks is the average over two CUT&RUN replicates of both conditions. (M = MyoD1, +A = Ascl1 & MyoD1, +mA = mAscl1 & MyoD1).

  8. H

    CUT&RUN peak classification averaged between two independent biological replicates depicted as a fraction of the total number of CUT&RUN peaks (Enh., Enhancers; Prom., Promoters). For each barplot, the 95% confidence interval is shown as error bars.

  9. I

    Representative Integrative Genome Browser (IGV, see CUT&RUN peak visualization in Materials and Methods for further details; Robinson et al, 2011) tracks for the indicated samples in CUT&RUN replicate 1 out of 2.

  10. J

    Bar plot showing the number of averaged CUT&RUN peaks (blue) over two independent biological replicates that are covered by scATAC‐seq (orange).

  11. K

    CUT&RUN (left panel), scATAC (middle panel), and scRNA‐seq (right panel) signal for key myogenic marker genes for the indicated conditions. CUT&RUN signal is averaged for two biological replicates. The number of data points per barplot of the CUT&RUN data is two replicates, and per box plot of ATAC and RNA data is the number of cells per matched condition shown in Fig 6B. For each barplot, the 95% confidence interval is shown as error bars. For each box, the centerline defines the median, the height of the box is given by the interquartile range (IQR), the whiskers are given by 1.5 * IQR, and the outliers are given as points beyond the minimum or maximum whisker.

Figure EV5
Figure EV5. Multimodal analysis of Ascl1 and MyoD1 collision
  1. A

    Schematic overview of experimental conditions for third Collide‐seq experiment.

  2. B

    Log transgene expression superimposed on a Uniform Manifold Approximation and Projection (UMAP) embedding for Ascl1 (left panel) and MyoD1 (right panel). Shown are transgene expression levels on a logarithmic (ln) scale.

  3. C

    Visualization of assignment outcome (see Computational demultiplexing section in Materials and Methods for further details) for individual cells superimposed on scRNA‐seq UMAP embedding (top panels) and scATAC‐seq UMAP embedding (bottom panels). Depicted are cells positive for the indicated conditions.

  4. D

    Number of CUT&RUN peaks from a condition on the y‐axis overlapping with CUT&RUN peaks from a condition on the x‐axis. The presented number of overlapping peaks is presented for both CUT&RUN replicates shown in Fig 6G.

  5. E

    Bar plot showing the number of CUT&RUN peaks covered by scATAC‐seq for two biological replicates.

  6. F

    Confusion matrix showing the number of ATAC peaks in the different experimental conditions.

Figure 7
Figure 7. Proposed model
  1. A

    Schematic summary of factor hierarchies deducted from our experiments. Strength of interaction between factors is reflected by the line width. Nature of interaction (occurrence of collision state or dominance) is depicted by arrowheads (collision state) or bar ends (dominance).

  2. B

    Schematic overview of the possible mechanisms for Ascl1 and MyoD1 collision. (i) collision between transcriptional programs, (ii) competition between (shared) binding sites, (iii) formation of an inactive heterodimer, and (iv) cofactor competition.

References

    1. Bergen V, Lange M, Peidli S, Wolf FA, Theis FJ (2020) Generalizing RNA velocity to transient cell states through dynamical modeling. Nat Biotechnol 38: 1408–1414 - PubMed
    1. Brown KE, Fisher AG (2021) Reprogramming lineage identity through cell–cell fusion. Curr Opin Genet Dev 70: 15–23 - PubMed
    1. Buganim Y, Faddah DA, Cheng AW, Itskovich E, Markoulaki S, Ganz K, Klemm SL, van Oudenaarden A, Jaenisch R (2012) Single‐cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150: 1209–1222 - PMC - PubMed
    1. Caiazzo M, Dell'Anno MT, Dvoretskova E, Lazarevic D, Taverna S, Leo D, Sotnikova TD, Menegon A, Roncaglia P, Colciago G et al (2011) Direct generation of functional dopaminergic neurons from mouse and human fibroblasts. Nature 476: 224–227 - PubMed
    1. Casey BH, Kollipara RK, Pozo K, Johnson JE (2018) Intrinsic DNA binding properties demonstrated for lineage‐specifying basic helix‐loop‐helix transcription factors. Genome Res 28: 484–496 - PMC - PubMed

Publication types

MeSH terms