Nature. 2024 Aug;632(8027):1073-1081.
doi: 10.1038/s41586-024-07706-4. Epub 2024 Jul 17.

Symbolic recording of signalling and cis-regulatory element activity to DNA

Wei Chen et al. Nature. 2024 Aug.

Abstract

Measurements of gene expression or signal transduction activity are conventionally performed using methods that require either the destruction or live imaging of a biological sample within the timeframe of interest. Here we demonstrate an alternative paradigm in which such biological activities are stably recorded to the genome. Enhancer-driven genomic recording of transcriptional activity in multiplex (ENGRAM) is based on the signal-dependent production of prime editing guide RNAs that mediate the insertion of signal-specific barcodes (symbols) into a genomically encoded recording unit. We show how this strategy can be used for multiplex recording of the cell-type-specific activities of dozens to hundreds of cis-regulatory elements with high fidelity, sensitivity and reproducibility. Leveraging signal transduction pathway-responsive cis-regulatory elements, we also demonstrate time- and concentration-dependent genomic recording of WNT, NF-κB and Tet-On activities. By coupling ENGRAM to sequential genome editing via DNA Typewriter (ref. 1), we stably record information about the temporal dynamics of two orthogonal signalling pathways to genomic DNA. Finally, we apply ENGRAM to integratively record the transient activity of nearly 100 transcription factor consensus motifs across daily windows spanning the differentiation of mouse embryonic stem cells into gastruloids, an in vitro model of early mammalian development. Although these are proof-of-concept experiments and much work remains to fully realize the possibilities, the symbolic recording of biological signals or states within cells, to the genome and over time, has broad potential to complement contemporary paradigms for how we make measurements in biological systems.

Conflict of interest statement

The University of Washington has filed a patent application partially based on this work, in which J.C., W.C. and J.S. are listed as inventors. J.S. is on the scientific advisory board, a consultant and/or a cofounder of Adaptive Biotechnologies, Cajal Neuroscience, Camp4 Therapeutics, Guardant Health, Maze Therapeutics, Pacific Biosciences, Phase Genomics, Prime Medicine, Scale Biosciences, Somite Therapeutics and Sixth Street Capital. The remaining authors declare no competing interests.

Figures

Fig. 1. ENGRAM.
a, Schematic of ENGRAM. Endogenous or designed CREs drive signal-dependent, Pol2-mediated production of a Csy4 transcript bearing an embedded pegRNA. Csy4 cleaves two 17 bp csy4 hairpins from its own transcript, liberating the pegRNA to write a CRE-specific insertional barcode to DNA Tape. b, Three ENGRAM architectures were tested. Solid lines correspond to Csy4 targeting csy4 hairpins; dashed lines correspond to potential for cleavage events to mediate autoregulatory negative feedback on Csy4 levels. c, ENGRAM recorders, driven only by minP and encoding a degenerate 5-mer insertion to the HEK3 locus, were integrated to PE2(+) HEK293T cells. Background accumulation at HEK3 was monitored for 20 days. d, NF-κB recorders were integrated to PE2(+) HEK293T cells. Recording at HEK3 was measured in the presence versus absence of 10 ng ml−1 TNF. P values derived from two-tailed t-test. Data in c and d are mean and s.d. from n = 3 integration replicates. e–h, Insertional barcodes predictably bias recording efficiency. e, A 5′ ENGRAM recorder library with constitutive (PGK-driven) production of pegRNAs encoding a degenerate 5-mer insertion into HEK3 was integrated to PE2(+) HEK293T cells. f, The log-scaled abundances of individual 5-mer insertions at HEK3 were highly correlated between transfection replicates (rep1 and rep2). g, Editing scores were calculated as (genomic reads with insertion/total edited HEK3 reads)/(plasmid reads with insertion/total plasmid reads) and are plotted here for 948 5-mers. h, Predicted versus observed editing scores for 5-mer insertions. A linear lasso regression model was trained on one-hot encoded single and dinucleotide content of the 5-mer, together with the MFE of the predicted secondary structure. The model was trained with tenfold cross-validation on a 680 barcode training set and then applied to predict editing scores on a held-out 268 barcode test set.
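The editing score defined in panel g is a ratio of two read fractions, which can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the function name, argument names and the read counts in the example are hypothetical.

```python
def editing_score(genomic_reads_with_insertion, total_edited_reads,
                  plasmid_reads_with_insertion, total_plasmid_reads):
    """Editing score as worded in the Fig. 1g caption.

    Numerator: share of edited HEK3 reads carrying a given insertion.
    Denominator: share of plasmid-library reads carrying the same insertion.
    A score above 1 means the barcode is over-represented in the genome
    relative to its abundance in the plasmid pool.
    """
    if total_edited_reads <= 0 or total_plasmid_reads <= 0:
        raise ValueError("read totals must be positive")
    if plasmid_reads_with_insertion <= 0:
        raise ValueError("barcode absent from plasmid pool; score undefined")
    genomic_fraction = genomic_reads_with_insertion / total_edited_reads
    plasmid_fraction = plasmid_reads_with_insertion / total_plasmid_reads
    return genomic_fraction / plasmid_fraction


# Hypothetical counts: 2% of edited reads versus 1% of plasmid reads.
example_score = editing_score(200, 10_000, 100, 10_000)
```

Normalizing by the plasmid fraction separates how efficiently a barcode is written from how abundant its pegRNA construct was in the library.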
Fig. 2. Multiplex recording of CRE activities with ENGRAM.
a, A library of ENGRAM reporters bearing various CREs was constructed and integrated into PE2(+) K562 cells. CRE activity was recorded at an endogenous or synthetic (piggyBac) DNA Tape. For benchmarking, relative activities of CREs were measured via either recording (ENGRAM) or reporting (MPRA). b, Each of 300 CREs was linked to a distinct pegRNA-encoded 6-mer insertional barcode. c, ENGRAM-recorded barcode proportions were highly correlated with MPRA-reported barcode proportions. Correction of ENGRAM-recorded proportions by MFE of corresponding pegRNAs did not markedly alter correlation (r = 0.860 versus 0.889 with versus without MFE correction). d, ENGRAM preserves overall rank order of CRE activity reasonably well. Top, CREs ranked by MPRA-reported activity; bottom, ENGRAM-recorded activities plotted in the same order. e, Boxplot of Spearman correlations within each quartile of CRE activity. CREs were split into four quartiles based on MPRA-reported activities. Within each quartile, 20 CREs were randomly sampled and their rank order compared for MPRA versus ENGRAM. Points represent sampling iterations (n = 10), boxes represent 25th, 50th and 75th percentiles, whiskers represent 1.5× interquartile range. P values derived from two-tailed t-test. f,g, ENGRAM recording of cell-type-specific activities of 98 synthetic CREs. f, Design of synthetic CREs. Each synthetic CRE is homotypic, bearing tandem copies of one transcription factor binding site motif, and is linked to a pegRNA encoding a 5-mer insertional barcode. The recorder library was transiently transfected to PE2(+) K562 or HEK293T cells in triplicate. Genomic DNA was harvested 48 h later, followed by PCR and sequencing. g, Volcano plot of differentially recorded activity in K562 versus HEK293T cells. Red points indicate significant and substantial differences (Wald test with Benjamini–Hochberg correction, P < 0.001 for fold difference above 2). Labels correspond to names of transcription factor representatives for synthetic CRE motifs (Supplementary Table 3). NS, not significant.
Fig. 3. Multiplex recording of intensity and duration of signalling pathway activity.
a, ENGRAM recorders driven by signal-responsive CREs for doxycycline (Tet-On, TRE), TNF (NF-κB response element) and CHIR99021 (TCF-LEF response element, WNT signalling) were constructed. Each recorder was linked to one or two unique barcodes. b–d, Recording levels are dependent on agonist concentration. Recorders were integrated to PE2(+) HEK293T cells, which were exposed to a serial twofold dilution series of doxycycline (b), TNF (c) or CHIR99021 (d), with starting concentrations of 8 μg ml−1, 64 ng ml−1 and 32 μM, respectively, for 48 h in triplicate. For CHIR99021, additional concentrations were sampled between 1 and 4 μM. The half-maximal effective concentrations for doxycycline, TNF and CHIR99021 are 0.17 μg ml−1, 2.5 ng ml−1 and 2.2 μM, respectively. Data were fitted to sigmoid curves using nonlinear regression. e, Fold difference in editing levels observed for the three signalling pathway recorders with versus without the maximum dose of the corresponding agonist. Data in b–e are mean and s.d. from n = 3 integration replicates. f,g, Heatmap showing editing efficiencies observed in matrix experiments on NF-κB (f) and WNT (g) recorders in which both agonist concentration and exposure duration were varied. h, Schematic of multiplex recording of signalling pathway activities. The three recorders shown in a were mixed at an equimolar ratio and integrated to PE2(+) HEK293T cells. The recorders write different barcodes to the same DNA Tape (endogenous HEK3). i, These cells were exposed to all possible combinations of three agonists for 48 h, followed by sequencing-based measurement of recording levels based on signal-specific barcodes written to HEK3. Coloured shapes as in a. Concentrations used were 500 ng ml−1, 10 ng ml−1 and 3 μM for doxycycline, TNF and CHIR99021, respectively.
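To make the dose-response description in panels b–d concrete, the sketch below builds a twofold dilution series and evaluates a sigmoid at each concentration. The caption says only that data were fitted to sigmoid curves; the Hill form, the slope of 1, the normalized 0 to 1 response range and the eight-step series length are assumptions made here for illustration. Only the 8 μg ml−1 starting concentration and the 0.17 μg ml−1 half-maximal effective concentration for doxycycline come from the caption.

```python
def twofold_dilution_series(start, n_steps):
    """Concentrations start, start/2, start/4, ... (n_steps values)."""
    return [start / (2 ** i) for i in range(n_steps)]


def hill_response(concentration, ec50, bottom=0.0, top=1.0, hill_slope=1.0):
    """Sigmoid (Hill) dose-response curve; the functional form is assumed."""
    if concentration <= 0:
        return bottom
    return bottom + (top - bottom) / (1.0 + (ec50 / concentration) ** hill_slope)


# Doxycycline: starting concentration and EC50 from the caption (ug/ml);
# the number of dilution steps is a hypothetical choice.
dox_series = twofold_dilution_series(8.0, 8)
dox_curve = [(c, hill_response(c, ec50=0.17)) for c in dox_series]
```

By construction the response is half-maximal when the concentration equals the EC50, which is how the fitted values quoted in the caption should be read.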
Fig. 4. Combining ENGRAM and DNA Typewriter.
a, Tet-On (orange) and WNT (blue) ENGRAM recorders were modified to drive the expression of pegRNAs that write to DTT. With DNA Typewriter, insertional edits include a barcode but also a key that shifts the type guide position to the next unit of the DTT. Temporal dynamics (for example, the order of two serially applied agonists) should be captured by the order in which the corresponding symbols appear in DTT. b,c, Modified ENGRAM recorders and five-unit DTT were sequentially integrated to PEmax(+) HEK293T cells. We designed and tested serial (b) and layered (c) programmes in which cells were exposed to different patterns of either 100 ng ml−1 doxycycline or 3 μM CHIR99021 (left-hand columns) across 36 cell populations (two patterns × three intervals × two possible orders × three integration replicates). By sequencing the DTT after 6 days, we calculated the log ratio of (Tet-On → WNT) versus (WNT → Tet-On) bigrams at sites 1 and 2, predicting and observing positive values when Tet-On activation preceded WNT activation, and negative values when WNT activation preceded Tet-On activation (right-hand columns). d, As in b,c, but for pulse programmes (three pulse timings × three integration replicates) in which these cells were exposed to 500 ng ml−1 doxycycline for 24 h against a background of continuous 3 μM CHIR99021 stimulation. Data in b–d are mean and s.d. from n = 3 integration replicates. e, PCA on proportions of unigrams and bigrams observed at each of five DTT positions and four DTT position-pairs, respectively, across 45 cell populations subjected to various patterns of exposure to doxycycline and CHIR99021 (15 programmes, executed in triplicate). Circled subsets correspond to serial and layered programmes in either order, or to pulse programmes. The top three PCs are plotted, collectively explaining 90% of variance.
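The bigram log ratio in panels b,c can be sketched as below. This is an illustrative reading of the caption rather than the authors' pipeline: tapes are represented as hypothetical five-character strings ("T" for the Tet-On symbol, "W" for the WNT symbol, "-" for unedited), and the base-2 logarithm and the pseudocount of 1 are assumptions, since the caption states neither.

```python
import math
from collections import Counter


def bigram_log_ratio(tapes, first="T", second="W", pseudocount=1.0):
    """log2 of (first -> second) versus (second -> first) bigrams at sites 1-2.

    Each tape is a string with one character per DTT unit. Only the first
    two sites are inspected. The pseudocount (an assumption) avoids
    division by zero when one bigram is never observed.
    """
    counts = Counter((tape[0], tape[1]) for tape in tapes if len(tape) >= 2)
    forward = counts[(first, second)] + pseudocount
    reverse = counts[(second, first)] + pseudocount
    return math.log2(forward / reverse)


# Hypothetical population in which Tet-On was activated before WNT.
tapes = ["TW---"] * 30 + ["WT---"] * 10 + ["TT---"] * 5 + ["-----"] * 55
ratio = bigram_log_ratio(tapes)  # positive when T -> W bigrams dominate
```

A positive value indicates that Tet-On symbols tend to precede WNT symbols on the tape, and a negative value indicates the reverse order.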
Fig. 5. Biologically conditional recording in mES cells and gastruloids.
a, Schematic of polyclonal ENGRAM mES cells, with each cell bearing multiple copies of doxycycline-inducible PEmax, ENGRAM recorders and synthetic DNA Tape (HEK3). b,c, Volcano plot of differential activity of ENGRAM recorders in cultured mES cells versus K562 cells (b) and in cultured mES cells versus HEK293T cells (c). Red points indicate significant and substantial differences (Wald test with Benjamini–Hochberg correction, P < 0.001 for fold difference above 2). Labels correspond to names of transcription factor representatives for synthetic CRE motifs (Supplementary Table 3). mES cell data were corrected for relative abundance of recorders in the polyclonal mES cell line versus the plasmid pool for transient transfection of K562 and HEK293T cells (Extended Data Fig. 9b). d, Polyclonal ENGRAM mES cells were differentiated to gastruloids. For each of the five 24-h windows, PEmax was activated by the addition of 50 ng ml−1 doxycycline; gastruloids were harvested 24 h later. Each recording window was tested in duplicate. e, Hierarchically clustered heatmap showing recorded activities across each 24 h interval (rows) for 17 of the 98 ENGRAM recorders exhibiting significant and substantial differences (Wald test with Benjamini–Hochberg correction, P < 0.1 for fold difference above 2) in one or more of the five windows (columns) relative to cultured mES cells. Values are log-scaled barcode proportion ratios. *P < 0.10, **P < 0.01 and ***P < 0.001. f, Dynamics of selected ENGRAM recorders during gastruloid induction. Labels are representative of transcription factor(s) thought to bind each motif, and it remains uncertain which are driving the activity of each synthetic CRE recorder (Supplementary Table 3). Plotted on the y axis are the log2-scaled barcode proportion ratios for gastruloids with windowed recording in a particular 24 h interval (x axis) versus cultured mES cells. Dots and line shadow represent two integration replicates and 95% confidence interval, respectively.
Extended Data Fig. 1. ENGRAM architecture.
(a) Sequence of the pegRNA predicted to be liberated from a Pol-2 transcript by Csy4. The csy4-hairpin residuals on both ends are shown in lower case. The spacer and primer binding sequence (PBS) are highlighted in orange. The reverse transcription template (RTT) consists of a homology arm (blue) and barcode (red). (b) Schematic of ENGRAM recorder. A pegRNA writing unit is flanked by csy4 hairpins and embedded within the 3′ or 5′ UTR of a Pol-2-driven Csy4-encoding mRNA. PE2 (or PEmax) is constitutively expressed from a separate locus. When the ENGRAM recorder is active, Csy4 is produced, cleaves at the csy4 hairpins and releases the active pegRNA.
Extended Data Fig. 2. ENGRAM installs insertional barcodes with reproducible, predictable efficiencies.
(a-c) The relative proportions of 1,023 5N barcodes installed by ENGRAM driven by the constitutive Pol-2 PGK promoter were measured in triplicate. Log-scaled insertion proportions (calculated as the proportion of edited HEK3 sites with a given insertion) were strongly correlated between pairs of transfection replicates. (d-e) Predicted secondary structures for pegRNAs with the lowest (left) and highest (right) insertional efficiencies. Sequences shown above are those observed in DNA Tape, which are the reverse complement of sequences in pegRNAs. (f) The rank-ordered coefficients of the linear lasso regression. Positional information of single nucleotides and dinucleotides and minimum free energy (MFE) of secondary structure were used as input features for training. In addition to MFE, which received the highest coefficient, the top 4 and bottom 4 coefficients for sequence features are annotated (e.g. 3-TC means TC dinucleotide starting at position 3). (g) MFE alone can explain 70% of the variance in editing scores observed for different insertional barcodes.
Extended Data Fig. 3. Benchmarking of ENGRAM against reporter assays.
(a) ENGRAM recorders with highly vs. lowly active CRE fragments (as previously measured via MPRA) upstream of a minP, together with minP-only and promoter-less constructs, were cloned, each driving expression of two distinct pegRNA-encoded barcodes. (b) Barplot showing the editing efficiency of individual barcodes associated with each of the eight members of the CRE library (4 architectures x 2 barcodes each). Fold differences were calculated by first summing the counts for the pair of barcodes associated with each architecture, and then calculating the ratio between pairs of architectures. Barcodes corresponding to the highly active CRE were 41.3-fold, 23.6-fold, and 15.1-fold more abundant than barcodes corresponding to promoter-less, minP-only or lowly active CRE controls, respectively. P-values were from two-tailed t-test. (c) Insertion efficiency of various barcodes at synthetic (1.8%) vs. endogenous (3.1%) HEK3 loci are highly correlated. Of note, for synthetic HEK3 sites, the observed efficiencies reflect an average across many genomic contexts. The center and error bars in b-c correspond to mean and standard deviations, from n = 3 integration replicates. (d) Log-scaled insertion proportions for 300 6-mer barcodes were highly correlated between DNA Tape sites located at synthetic vs. endogenous HEK3 loci. (e) Log-scaled insertion proportions for 300 6-mer barcodes were highly reproducible between integration replicates. Each value corresponds to the proportion of barcodes read out from synthetic DNA Tape. (f) Log-scaled RNA proportions for 300 6-mer barcodes were highly reproducible across integration replicates. Each value corresponds to the proportion of barcodes read out at the RNA level from transcribed pegRNAs. (g) The log-scaled proportions of ENGRAM events recorded to DNA were highly correlated with log-scaled proportions of barcodes measured directly from RNA.
Extended Data Fig. 4. Further benchmarking of ENGRAM.
(a) Comparison of CRE ranks for ENGRAM vs. MPRA across quartiles. Eight CREs were randomly sampled from each of four quartiles based on the RNA-based activity measurement (i.e. MPRA). The relative activity based on reporters (MPRA) for each set of eight is shown at the top, and the activity for the same CREs based on recorders (ENGRAM) is shown at the bottom. Overall, ENGRAM reasonably preserved the rank of CREs when comparing the quartiles to one another. (b-c) Different cell numbers were sampled (6,000, 12,000, 24,000, 48,000, 96,000 cells) prior to measuring ENGRAM recorded activity of 300 CREs, from either endogenous or synthetic DNA Tape, and then recovery (b) and reproducibility (c) were assessed. (d-e) Sequencing data from synthetic DNA Tape and the 96,000 cell input condition were downsampled, and then recovery (d) and reproducibility (e) were assessed. The error bars in b-e correspond to standard deviations, from n = 3 integration replicates.
Extended Data Fig. 5. Multiplex recording of cell-type-specific activities of synthetic CREs with ENGRAM.
(a) Recording efficiency of synthetic CREs at the endogenous DNA Tape site (HEK3 locus) in HEK293T (12.6%) and K562 (1.0%) cells. The difference in overall recording between cell lines is likely attributable to differences in transfection efficiency. The center and error bars correspond to mean and standard deviations, from n = 3 transfection replicates. (b-c) Log-scaled insertion proportions for 5-mer barcodes linked to the 98 synthetic CREs were highly reproducible across transfection replicates for both HEK293T (b) and K562 (c) cells. Each value corresponds to the proportion of barcodes read out at the DNA level from the endogenous HEK3 locus. As the same number of cells were sampled for recording, the lower reproducibility in K562 cells is likely secondary to lower transfection/editing efficiency. (d-e) Differential expression of TFs in HEK293T vs. K562 cells. As many TFs share similar binding motifs, here we show expression ratios between the cell lines for all expressed TFs (normalized transcripts per million > 0.5 in one or both cell lines) assigned to the corresponding motif by JASPAR, for each of the 17 differentially active synthetic CRE recorders (Fig. 2g). In the bottom row of each plot, we show an expression ratio based on summing the read counts of all the motif-associated TFs in bulk RNA-seq data from these cell lines, except for GCM1, as its JASPAR-associated TFs (GCM1, GCM2) are not detected as expressed in either cell line. Pink, higher expression in K562 cells; Blue, higher expression in HEK293T cells. Most K562-specific (d) and HEK293T-specific (e) recording activities are directionally concordant with the summed expression of the TFs that JASPAR associates with the motif embedded in the synthetic CRE (all but the HOXB9, POU2F1, ZNF449, and GCM1-named synthetic CRE recorders; 13/17; p = 0.02; binomial test).
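The binomial test on the 13 of 17 concordant recorders can be illustrated with an exact tail sum. The caption does not say whether the test was one- or two-sided or which null probability was used; the sketch below assumes a one-sided test against a null concordance probability of 0.5, which gives a value of about 0.025, the same order as the reported p = 0.02. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p_null=0.5):
    """Exact P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null ** k * (1.0 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 13 of 17 recorders directionally concordant (counts from the caption);
# the one-sided alternative and p_null = 0.5 are assumptions.
p_value = binomial_upper_tail(13, 17)
```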
Extended Data Fig. 6. Recording of the intensity and duration of signaling pathway activity.
(a) We observed minimal background recording in the absence of stimulus with signal-responsive ENGRAM recorders after 7 or 14 days. This background did not accumulate over time, consistent with the hypothesis that it primarily accumulates shortly after transfection, potentially due to ORI-driven, plasmid-mediated transcription. Plotted points correspond to three integration replicates. (b-c) Histograms, broken out by ligand exposure time and agonist concentration, showing editing efficiencies resulting from matrix experiment on the NF-κB (b) and Wnt (c) recorders, in which both stimulant concentrations and durations of exposure were varied (2 recorders x 8 concentrations x 8 durations x 3 integration replicates = 384 conditions). (d) Estimating the multiplicity of integration (MOI) of piggyBac transposase integration with qPCR. Cells were transfected with a piggyBac-GFP construct, either with or without piggyBac transposase. GFP DNA abundance was measured using qPCR with two pairs of GFP-specific primers (together with a pair of primers directed at native RPPH1 locus as an internal control) over the course of 15 days. The estimated levels of GFP abundance in the no-transposase control decreased to background levels after 7-9 days. DNA-level GFP abundance with transposase present, at timepoints where controls have gone to background, suggests that we were achieving an MOI on the order of 15-20. (e) The estimated proportion of cells with at least one copy of each of the three recorders as a function of MOI, assuming a Poisson distribution. At an MOI of 15, over 98% of cells are predicted to bear at least one copy of each of the three recorders. (f) Barcode composition of DNA Tape from cells treated with different combinations of stimuli. Of note, the recorders did not exhibit any discernible crosstalk, suggesting that the underlying signaling pathways are truly orthogonal (e.g. stimulating with CHIR does not lead to appreciable recording by the NF-κB recorder). (g) Cells bearing multiple recorders were exposed to all possible combinations of high, medium or low concentrations of three stimuli for 48 hrs, followed by harvesting and sequencing-based quantification of the levels of signal-specific barcodes. For Dox, 62.5, 250 or 1000 ng/ml were used; for TNFα, 1, 4 or 16 ng/ml; and for CHIR99021, 1, 2 or 2.5 μM. (h) Heatmap visualization of the data presented in panel g. The center and error bars in b,c,g correspond to mean and standard deviations, from n = 3 integration replicates.
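The Poisson estimate in panel e can be sketched as follows. The caption states only that a Poisson distribution was assumed; treating the MOI as the total number of integrations per cell, split equally and independently among the three equimolar recorders, is an assumption made here. Under that reading the calculation gives roughly 98% at an MOI of 15, in line with the caption, although the authors' exact calculation may differ.

```python
import math


def fraction_with_all_recorders(moi, n_recorders=3):
    """Fraction of cells carrying at least one copy of every recorder.

    Assumes integrations per cell are Poisson with mean `moi`, shared
    equally among `n_recorders` recorders, so each recorder's copy number
    is Poisson with mean moi / n_recorders (an assumption, not stated in
    the caption).
    """
    per_recorder_mean = moi / n_recorders
    p_at_least_one = 1.0 - math.exp(-per_recorder_mean)
    return p_at_least_one ** n_recorders


# Coverage across a range of MOIs, including the 15-20 range estimated by qPCR.
coverage_by_moi = {moi: fraction_with_all_recorders(moi) for moi in (1, 5, 10, 15, 20)}
```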
Extended Data Fig. 7. Combining ENGRAM and DNA Typewriter to record the temporal dynamics of biological signals.
(a) Sequence of the predicted pegRNA modified for compatibility with DNA Typewriter. This pegRNA is similar to the one shown in Extended Data Fig. 1a except that the spacer and PBS are modified to target the DNA Typewriter Tape’s active type guide and the encoded insert is modified to include both a symbol (3-bp in this case) and a key sequence (3-bp). (b-d) Overall editing efficiencies for the serial (b), layered (c) and pulse (d) programs, stratified by position along the 5-unit DNA Typewriter Tape. (e) Heatmap showing proportions at which two possible unigrams (left columns) or two possible heterogeneous bigrams (right columns) were observed. Note that proportions as shown here are calculated separately for those classes in isolation, i.e. for each row, the left two columns sum to 100%, and the right two columns sum to 100%. (f-h) Modified ENGRAM recorders and 5-unit DTT were sequentially integrated to PEmax(+) HEK293T cells. We designed and tested serial (f) and layered (g) programs in which these cells were exposed to different patterns of 100 ng/ml doxycycline or 3 μM CHIR-99021 (left columns) across a total of 36 cell populations (2 patterns * 3 intervals * 2 possible orders * 3 integration replicates). Harvesting, amplifying and sequencing the DTT region after 6 days, we show here the absolute fractions of [Tet-On→Wnt] and [Wnt→Tet-On] bigrams at Sites 1-2 (right columns). See also Fig. 4b,c. (h) Same as panels f-g, but for pulse programs in which these cells were exposed to 500 ng/ml doxycycline for 24 hrs against the background of continuous stimulation with 3 μM CHIR-99021. A total of 9 cell populations are represented (3 pulse timings * 3 integration replicates). See also Fig. 4d. (i) Longer signal durations are associated with longer homopolymeric runs of the signal-specific symbol. Focusing on symbols corresponding to the first signal applied in Serial programs, we calculated the proportion of homopolymeric runs of various lengths, i.e. consecutive, identical symbols beginning at the first position in the DTT. Log-scaled proportions for homopolymeric runs of 1-4 are plotted for signal durations of 1, 2 or 3 days. We did not observe any “5-in-a-row” homopolymeric instances in the data. The center and error bars in b-d, f-i correspond to mean and standard deviations, from n = 3 integration replicates.
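The homopolymeric-run measurement in panel i can be sketched as below. The tape representation (one character per DTT unit, "T" for Tet-On, "W" for Wnt, "-" for unedited) is hypothetical, and taking the proportion over all sequenced tapes is an assumption, since the caption does not state the denominator.

```python
from collections import Counter


def leading_run_length(tape, symbol):
    """Number of consecutive copies of `symbol` starting at the first DTT position."""
    length = 0
    for unit in tape:
        if unit != symbol:
            break
        length += 1
    return length


def run_length_proportions(tapes, symbol):
    """Map run length (>= 1) to the fraction of all tapes showing that leading run."""
    counts = Counter(leading_run_length(tape, symbol) for tape in tapes)
    total = len(tapes)
    return {length: n / total for length, n in sorted(counts.items()) if length >= 1}


# Hypothetical tapes from a serial program in which Tet-On was applied first.
tapes = ["TTW--", "T----", "W----", "TTT--"]
proportions = run_length_proportions(tapes, "T")
```

Longer exposure to the first signal would be expected to shift weight toward longer runs, which is the trend the caption describes.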
Extended Data Fig. 8. Decoding dynamic signaling programs based on ensembles of editing patterns resulting from the combination of ENGRAM and DNA Typewriter.
(a-c) Comparison of different encoding strategies. DTT recording data was encoded as either the proportions of various unigrams/bigrams at each position (2*5 = 10 unigrams; 4*4 = 16 bigrams; 26 values) (left), the proportions of each possible state sequence that is consistent with ordered recording (2^0 + 2^1 + 2^2 + 2^3 + 2^4 + 2^5 = 63 values) (middle), or the proportions of all possible sequences of three states (unedited, Tet-on symbol, Wnt symbol) across 5 positions (3^5 = 243 values) (right). (a) Scree plots showing the proportion of variance explained by the top 5 principal components with each of the three strategies, which explain 96%, 89% and 50% of the variance, respectively. (b) PCA based on patterns observed in DTT across 45 cell populations subjected to various patterns of exposure to doxycycline and CHIR-99021 (15 programs, executed in triplicate). Circled subsets correspond to serial and layered programs in either order, or to pulse programs. The top three PCs are plotted for 26-value (left), 63-value (middle) or 243-value (right) encoding. (c) Barplot showing the accuracy of applying a random forest classifier to assess which of the 15 signal programs an unseen set of sequenced tapes derived from. We randomly split the tape ensembles from 45 samples (15 programs x 3 integration replicates) into 5 groups and conducted 5-fold cross-validation, i.e. using each group once as a test set, while training on all other groups. The model achieved a mean accuracy of 0.91. The center and error bars correspond to mean and standard deviations, from n = 5 cross-validation folds.
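The sizes of the three encodings can be checked by enumeration. The sketch below assumes, as the phrase "ordered recording" suggests, that edits fill the 5-unit tape strictly from the first position, so a valid state is a run of k symbols followed by unedited units. The single-character symbol names are hypothetical.

```python
from itertools import product

SYMBOLS = ("T", "W")   # Tet-On and Wnt symbols (labels are illustrative)
UNEDITED = "-"
N_POSITIONS = 5


def ordered_states():
    """States consistent with sequential editing: k symbols, then unedited units."""
    states = []
    for k in range(N_POSITIONS + 1):
        for prefix in product(SYMBOLS, repeat=k):
            states.append("".join(prefix) + UNEDITED * (N_POSITIONS - k))
    return states


def all_states():
    """Every sequence of three states (unedited or either symbol) over 5 positions."""
    return ["".join(s) for s in product((UNEDITED,) + SYMBOLS, repeat=N_POSITIONS)]


n_unigram_values = len(SYMBOLS) * N_POSITIONS            # 2 * 5 = 10
n_bigram_values = len(SYMBOLS) ** 2 * (N_POSITIONS - 1)  # 4 * 4 = 16
n_ordered = len(ordered_states())                        # sum of 2**k for k = 0..5
n_all = len(all_states())                                # 3**5
```

The ordered encoding is a strict subset of the 243-state encoding, which is one way to see why the larger encoding spreads the same data over many sparsely populated dimensions.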
Extended Data Fig. 9. Multiplex recording of CRE activity in embryonic stem cells and differentiating gastruloids.
(a) Log-scaled insertion proportions for 5-mer barcodes linked to the 98 synthetic CRE-driven ENGRAM recorders were highly reproducible across integration replicates for cultured mESC, as read out by amplification and sequencing of synthetic DNA Tape. (b) Log-scaled proportions for 5-mer barcodes linked to the 98 synthetic CRE-driven ENGRAM recorders are well correlated between the original plasmid pool and genomic integrations in the polyclonal mESC line. (c-e) Log-scaled barcode proportion ratios, as calculated from one pair of replicates vs. as calculated from another pair of replicates, for mESC vs. HEK293T cells (c), mESC vs. K562 cells (d), or K562 vs. HEK293T cells (e). Note that we corrected the mESC data for differences in relative abundances of recorders in the polyclonal mESC line vs. the plasmid pool used to transiently transfect K562 and HEK293T cells (as shown in Extended Data Fig. 9b), prior to performing these comparisons. As the same number of cells were sampled for recording, the lower reproducibility for comparisons involving K562 cells is likely secondary to lower transfection/editing efficiency, as shown in Extended Data Fig. 5a. (f) Stacked bar plot showing the proportion of 5-mer barcodes associated with each of the 98 synthetic CRE-driven ENGRAM recorders in cell lines and gastruloids. Recorder activities are presented in the order of their maximal proportion across all samples. Error bars correspond to standard deviations across 3 transfection replicates (in K562 and HEK293T cells), or 3 integration replicates in mESCs and 2 integration replicates for gastruloid time-windows. Note that these labels are representative of TF(s) thought to bind each motif, and it remains uncertain which TF(s) are driving the activity of each synthetic CRE recorder. See Supplementary Table 3 for the corresponding consensus motifs, and the full list of TFs associated with each motif by the JASPAR database.
Extended Data Fig. 10. Multiplex recording of CRE activity in embryonic stem cells and differentiating gastruloids.
(a) Representative images of gastruloids induced from polyclonal ENGRAM mESCs, illustrating that the components of the ENGRAM recording system do not substantially impact the morphological development of gastruloids. Scale bar: 100 μm. (b) Overall ENGRAM recording efficiency at synthetic DNA Tape for cultured mESCs or differentiating mouse gastruloids in which PEmax was induced for a particular 24 hr window. The center and error bars correspond to mean and standard deviations, from n = 2-3 integration replicates. (c) Log-scaled insertion proportions for 5-mer barcodes linked to the 98 synthetic CRE-driven ENGRAM recorders were highly reproducible across integration replicates for differentiating gastruloids, as read out by amplification and sequencing of synthetic DNA Tape, for integration replicates in which doxycycline was used to induce PEmax during particular 24 hr windows. (d) Plot showing -log10 adjusted p-values of ENGRAM recorders (y-axis) across each 24 hr recording window (x-axis) during gastruloid differentiation. A total of 17 ENGRAM recorders exhibited significant and substantial differences (Wald-test with Benjamini-Hochberg correction P < 0.1 for a fold-difference >2) for a particular window, in comparison to recordings made from the same polyclonal ENGRAM mESCs under normal culture conditions. The 12 recorders with increased activity are labeled in red, and the 5 recorders with decreased activity are labeled in blue. (e) Heatmap presenting recorded activities across each 24 hr interval (rows) for 48 of the 98 ENGRAM recorders with substantial activity in any of the five windows (columns). Columns are hierarchically clustered for presentation purposes. Values are log-scaled barcode proportion ratios for gastruloids with windowed recording in a particular 24 hr interval vs. cultured mESCs. The 17 recorders exhibiting significantly and substantially increased (*) or decreased (^) activity in one or more of the 24 hr windows, relative to cultured mESCs, are bolded (Wald-test with Benjamini-Hochberg correction P < 0.1 for a fold-difference >2).

References

    1. Choi, J. et al. A time-resolved, multi-symbol molecular recorder via sequential genome editing. Nature 608, 98–107 (2022).
    2. Golic, K. G. & Lindquist, S. The FLP recombinase of yeast catalyzes site-specific recombination in the Drosophila genome. Cell 59, 499–509 (1989).
    3. Sauer, B. Functional expression of the cre-lox site-specific recombination system in the yeast Saccharomyces cerevisiae. Mol. Cell. Biol. 7, 2087–2096 (1987).
    4. Kretzschmar, K. & Watt, F. M. Lineage tracing. Cell 148, 33–45 (2012).
    5. Livet, J. et al. Transgenic strategies for combinatorial expression of fluorescent proteins in the nervous system. Nature 450, 56–62 (2007).