Nature. 2017 Jan 5;541(7635):107-111.
doi: 10.1038/nature20777. Epub 2016 Nov 21.

Synthetic recording and in situ readout of lineage information in single cells

Kirsten L Frieda et al. Nature. 2017.

Abstract

Reconstructing the lineage relationships and dynamic event histories of individual cells within their native spatial context is a long-standing challenge in biology. Many biological processes of interest occur in optically opaque or physically inaccessible contexts, necessitating approaches other than direct imaging. Here we describe a synthetic system that enables cells to record lineage information and event histories in the genome in a format that can subsequently be read out of single cells in situ. This system, termed memory by engineered mutagenesis with optical in situ readout (MEMOIR), is based on a set of barcoded recording elements termed scratchpads. The state of a given scratchpad can be irreversibly altered by CRISPR/Cas9-based targeted mutagenesis, and later read out in single cells through multiplexed single-molecule RNA fluorescence in situ hybridization (smFISH). As a proof of principle of MEMOIR, we engineered mouse embryonic stem cells to contain multiple scratchpads and other recording components. In these cells, scratchpads were altered in a progressive and stochastic fashion as the cells proliferated. Analysis of the final states of scratchpads in single cells in situ enabled reconstruction of lineage information from cell colonies. Combining analysis of endogenous gene expression with lineage reconstruction in the same cells further allowed inference of the dynamic rates at which embryonic stem cells switch between two gene expression states. Finally, using simulations, we show how parallel MEMOIR systems operating in the same cell could enable recording and readout of dynamic cellular event histories. MEMOIR thus provides a versatile platform for information recording and in situ, single-cell readout across diverse biological systems.


Conflict of interest statement

The authors declare competing financial interests: details are available in the online version of the paper.

Figures

Extended Data Figure 1 | MEM-01 consistently expresses short-lived transcripts from multiple integrated barcoded scratchpads.
a, The barcoded scratchpad transposon is composed of the following elements (left to right): the PiggyBac 5′ terminal repeat (triangle), the chicken HS4 insulator, a PGK promoter driving expression of the hygromycin resistance coding sequence, a 5′ FRT site, the PP7 scratchpad array consisting of 10 repeats, a 3′ FRT site, a barcode sequence (Supplementary Table 1), a priming region for sequencing and PCR, the BGH polyA, and the PiggyBac 3′ terminal repeat (triangle). b, Unique genomic integrations for the MEM-01 cell line were detected by qPCR. Bars show mean ± s.d. of four biological repeats with individual data points marked. c, The relative RNA expression levels of barcode integrations were quantified by RT–qPCR. Bars show mean ± s.d. of three biological repeats with individual data points marked. d, Scratchpad expression profiles remain constant over 1.3 months of passaging. Low- and high-passage cultures of MEM-01 cells (light and dark bars, respectively) were assayed for RNA expression levels by RT–qPCR. The unchanged expression levels indicate that most barcoded scratchpads express at a consistent level and are not routinely silenced over time. Bars show values from single biological samples, with error bars calculated by combining in quadrature the technical replicate variation in the barcode and normalizer quantitation cycle (Cq) values. e–g, RNA half-lives were assessed by RT–qPCR analysis of transcript levels after blocking transcription with actinomycin D (10 μg ml−1). e, Barcoded scratchpad transcripts were assayed with two different sets of qPCR primers (left and right panels). These data indicate a half-life of approximately 2 h. f, g, Myc and Sdha are known to have short and long mRNA half-lives, respectively, and were assessed as controls for comparison. The Myc half-life (f) of 1 h was shorter than the other measured half-lives, while Sdha (g) was longer lived. 
For Sdha, the measured half-life value (indicated with an asterisk) is expected to overestimate the true value, as Sdha levels were determined relative to those of the similarly long-lived gene Atp5e, whose transcript levels were also decaying over the time course. A previous estimate of Sdha half-life in mESCs was 8–13 h (ref. 37). All sample transcript levels were assessed relative to those of Atp5e. Transcript abundances were normalized to 1 at time zero. Decay curves were fit assuming one-phase exponential decay using weighted nonlinear least squares regression (e, f) or assuming a linear approximation to exponential decay (g). Half-lives were determined on the basis of the best fit decay constants and a range reported based on the 95% confidence interval (shown in parentheses). Data represent two biological replicates with multiple technical replicates; error bars show standard deviations.
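The half-lives in e–g follow from first-order decay: if transcript levels fall as N(t) = N0·exp(−kt), the half-life is ln 2/k. The Python sketch below illustrates this with a deliberate simplification (ordinary least squares on log-transformed levels rather than the weighted nonlinear least squares used for e and f), and also shows how independent technical errors in barcode and normalizer Cq values combine in quadrature, as described for d. The function names are hypothetical.

```python
import math


def fit_half_life(times_h, rel_levels):
    """Estimate a half-life (h) from a one-phase exponential decay time course.

    Simplification: ordinary least squares on ln(level), not the weighted
    nonlinear least squares used in the figure.
    """
    ys = [math.log(y) for y in rel_levels]  # ln N(t) = ln N0 - k t
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys)) / sum(
        (t - mean_t) ** 2 for t in times_h
    )
    k = -slope  # decay constant (per hour)
    return math.log(2) / k


def delta_cq_sd(sd_barcode_cq, sd_normalizer_cq):
    """Combine independent technical s.d. of barcode and normalizer Cq in quadrature."""
    return math.hypot(sd_barcode_cq, sd_normalizer_cq)


# Synthetic levels normalized to 1 at time zero, with a true half-life of 2 h.
times = [0, 1, 2, 4]
levels = [2 ** (-t / 2) for t in times]
print(round(fit_half_life(times, levels), 3))  # 2.0
print(delta_cq_sd(0.3, 0.4))
```

A log-linear fit weights every time point equally on the log scale, which is one reason a weighted nonlinear fit is usually preferred for noisy late time points.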
Extended Data Figure 2 | Barcoded scratchpads collapse to truncated products in activated cells and are stable in full-length and collapsed forms.
a, Agarose gel electrophoresis of PCR amplified scratchpads reveals scratchpad collapse after gRNA induction. Full-length scratchpads were amplified from plasmid DNA (lane 1), as well as from cells without gRNA constructs (lane 3), or with uninduced gRNAs (lane 4). By contrast, cells expressing gRNA showed shorter products (lane 5). Cells with no scratchpads are also shown as a negative control (lane 2). Bands corresponding to the full-length scratchpad and the collapsed scratchpad are indicated (arrows). Note that the laddering effect seen in all lanes and gels is due in part to PCR amplification artefacts with the repetitive arrays. For gel source data, see Supplementary Fig. 1. b, The lowest molecular weight band from scratchpad collapse, as shown in lane 5 in a, was extracted and subcloned into a vector. Nine of the colonies were sequenced. They aligned to a single repeat unit with 5′ and 3′ flanking regions, suggesting complete collapse of the repeats owing to Cas9 activity. Six of the nine sequencing reads resulted in collapse to a perfect single repeat (with a possible point mutation in the scratchpad sequence associated with barcode 2), and the remaining three sequencing reads had additional small deletions in the scratchpad. c, Scratchpad collapse requires induction of both Cas9 and gRNA. The gel shows scratchpad states for MEM-01 cells treated with no ligand, with Shield1 (to stabilize Cas9 protein), with Wnt3a (to induce gRNA expression), and with both Wnt3a (100 ng ml−1) and Shield1 (100 nM), all after 48 h. d, Scratchpad collapse increased with increasing gRNA activation, as assessed using smFISH to detect scratchpad co-localization with four highly expressed barcodes. Cells were analysed either without gRNA activation or 48 h after gRNA activation by addition of Wnt3a and Shield1 (same concentrations as in c). gRNA expression was measured by the intensity of co-expressed nuclear mTurquoise signal. 
Box plots show median (red bar), first and third quartiles (box), and extrema of distributions; n = 1,826, 1,081, 345, 191 cells, left to right. Related to Fig. 2c. e–g, Scratchpad states remain stable over extended periods. e, Unactivated MEM-01 cells maintained uncollapsed scratchpads over timescales of months. f, To check the stability of individual barcoded scratchpad variants over time, multiple subclones of MEM-01 were isolated after no activation (control; top panels) and after a pulse of activation for 24 h (Wnt3a 100 ng ml−1, Shield1 100 nM; bottom panels). Subclones were assessed for the states of different barcoded scratchpad types after initial isolation (0 month relative age, left) and after one month of maintenance (right). The apparent collapse states (from uncollapsed to fully collapsed) of the barcoded scratchpad types were distinct in different subclones and remained stable over a month, indicating that scratchpad states are stable over these timescales. g, Barcoded scratchpads are also stable over long periods as assessed by smFISH readout. The fraction per cell of barcode transcripts (from four distinct barcode types) that co-localized with scratchpad signal was essentially unchanged between an unactivated low-passage cell culture and one maintained for over a month. The imperfect co-localization fraction is largely the result of errors in smFISH detection and not gradual scratchpad collapse. Box plots as in d; n = 1,826, 983 cells, left to right.
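Panels d and g summarize a per-cell quantity, the fraction of barcode transcripts that co-localize with scratchpad signal, as box plots. The Python sketch below computes that fraction from hypothetical per-spot co-localization calls and then the box-plot statistics (median, quartiles, extrema). The "inclusive" quartile method is an assumption, since the interpolation rule used for the figure is not stated.

```python
import statistics


def colocalized_fraction(spots):
    """Fraction of a cell's barcode spots that co-localize with scratchpad signal.

    spots: list of booleans, one per detected barcode transcript.
    """
    return sum(spots) / len(spots)


def box_summary(values):
    """Median, first/third quartiles and extrema, as drawn in the box plots."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Hypothetical cells: True marks a barcode spot with a co-localized scratchpad spot.
cells = [
    [True, True, True, False],
    [True, False],
    [True, True, True, True],
    [False, False, True, True],
    [False, False, False, True],
]
fractions = [colocalized_fraction(c) for c in cells]
print(fractions)  # [0.75, 0.5, 1.0, 0.5, 0.25]
print(box_summary(fractions))
```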
Extended Data Figure 3 | Scratchpad collapse works with an alternative gRNA and in multiple cell types.
a–d, A Cre-recombinase-activated gRNA is effective at inducing collapse events. a, Schematic of the Cre-activated gRNA system. The construct contains a constitutive PGK promoter driving expression of a histone 2B (H2B)–mTurquoise fusion protein (the H2B provides nuclear localization). This is followed by a U6 TATA-lox promoter driving expression of an shRNA against mTurquoise, followed in turn by a polyT (T6) transcriptional stop, and then a gRNA directed against scratchpad regions. Prior to Cre expression, expression of the shRNA keeps mTurquoise levels low (brown dashed line) and prevents expression of the gRNA. After the introduction of Cre, the shRNA-stop cassette is removed, allowing mTurquoise and gRNA expression. Thus, mTurquoise provides a visual marker of gRNA expression. This type of gRNA architecture could allow MEMOIR activation in specific tissues expressing Cre. b, PCR analysis shows that Cre can induce scratchpad collapse. Gel shows genomic DNA from a clonal cell line harbouring the construct in a. Scratchpads appear uncollapsed in untransfected cells (left lane), but show significant collapse after transfection with mRNA encoding Cre protein (right lane, approximately 52 h after transfection). Note that the laddering effect seen in all lanes and gels is due in part to PCR amplification artefacts with the repetitive arrays. c, smFISH analysis reveals Cre-activated scratchpad collapse. Quantification of barcode–scratchpad co-localization fractions as measured by smFISH. Cre transfection reduced scratchpad and barcode co-localization levels in cells that showed evidence of Cre activity, as assessed by mTurquoise expression (right). Transfected cells that were mTurquoise-negative or low and untransfected cells retained high co-localization levels (middle and left). Co-localization levels per cell were assessed based on the co-localization of four expressed barcodes with scratchpad transcripts. 
Box plots show median (red bar), first and third quartiles (box), and extrema of distributions; n = 995, 643, 649 cells, left to right. d, Example smFISH images of scratchpad and barcode co-localization detected in single cells containing the Cre-activated gRNA. Some activated cells (top panels, mTurquoise expression ‘on’) show loss of co-localized signal for a specific barcode (top panels, lower cell). Unactivated cells, as assessed by low mTurquoise expression, typically show no loss of co-localization (bottom panels). Scale bars, 10 μm. e, f, Scratchpads in CHO-K1 cells and yeast also undergo Cas9/gRNA-dependent collapse. e, Cas9- and gRNA-expressing plasmids were transiently transfected into Chinese Hamster Ovary (CHO-K1) cells containing stably integrated scratchpads. Gel analysis reveals Cas9- and gRNA-dependent scratchpad collapse (middle lane), while transfection with a Cas9-expressing plasmid alone or control plasmids resulted in no collapse (left and right lanes, respectively). f, Scratchpad collapse was tested in a yeast strain with doxycycline-inducible Cas9 and gRNA and integrated scratchpads. Before inducing Cas9-gRNA expression (lanes 1 and 3), the scratchpads were intact. After Cas9-gRNA induction with 2 μg ml−1 doxycycline for 11 h, scratchpads appeared collapsed (lanes 2 and 4). The left two lanes (lanes 1 and 2) and right two lanes (lanes 3 and 4) correspond to two biological replicates. Note that the scratchpads in CHO-K1 and yeast cells have a scratchpad PP7 array similar to that used elsewhere but different flanking sequences, so their absolute PCR product lengths differ. For gel source data, see Supplementary Fig. 1.
Extended Data Figure 4 | Examples of lineage reconstruction for ten colonies.
Data are shown for ten colonies whose reconstructions correctly identified > 70% of pairwise relationships. The bubble chart shows the number of barcode transcripts detected (bubble size) and the uncollapsed fraction (colour scale). Matrices of cell-to-cell barcode distance (dissimilarity) scores were computed from the data. Low (blue) values indicate more similar barcoded scratchpad collapse patterns. Note that sisters and first cousins tend to have lower distance scores than second cousins, creating a block diagonal pattern in the distance matrix. Lineage trees were reconstructed from the distance matrix using an agglomerative hierarchical clustering algorithm (see Methods). Cluster distances from the reconstruction algorithm are shown as branch heights in the reconstructed linkage trees. Percentages on the linkage trees represent frequencies of clade occurrence from a barcode resampling bootstrap. The percentage of relationships correctly identified by each reconstruction is indicated, and the actual tree is reported as [(x y)(x y)][(x y)(x y)], where sister pairs are denoted as (x y) and cousins are grouped in brackets ([...]).
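The caption names an agglomerative hierarchical clustering of the cell-to-cell distance matrix; the exact distance score and linkage rule are given in Methods and are not reproduced here. As an illustration only, the Python sketch below assumes that the distance is the mean absolute difference in uncollapsed fraction over barcodes detected in both cells, and merges clusters by average linkage (UPGMA-style). All names are hypothetical.

```python
from itertools import combinations


def barcode_distance(u, v):
    """Assumed dissimilarity: mean |difference| in uncollapsed fraction over shared barcodes.

    u, v: dicts mapping barcode -> uncollapsed fraction for one cell.
    """
    shared = set(u) & set(v)
    if not shared:
        return 1.0  # assumption: maximal dissimilarity if no barcode is seen in both cells
    return sum(abs(u[b] - v[b]) for b in shared) / len(shared)


def leaves(tree):
    """Flatten a nested-tuple tree into its leaf labels."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def average_linkage_tree(cells):
    """Merge the two closest clusters until one remains; return nested tuples."""

    def cluster_distance(a, b):
        la, lb = leaves(a), leaves(b)
        total = sum(barcode_distance(cells[x], cells[y]) for x in la for y in lb)
        return total / (len(la) * len(lb))

    clusters = list(cells)  # each cell label starts as its own cluster
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Hypothetical uncollapsed fractions for two barcodes in four cells.
cells = {
    "A": {"bc1": 1.0, "bc2": 0.0},
    "B": {"bc1": 1.0, "bc2": 0.0},
    "C": {"bc1": 0.0, "bc2": 1.0},
    "D": {"bc1": 0.0, "bc2": 0.9},
}
print(average_linkage_tree(cells))  # (('A', 'B'), ('C', 'D'))
```

Cell labels must not themselves be tuples, because tuples are used to represent merged clusters.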
Extended Data Figure 5 | Analysis of reconstruction failure modes.
These ten colonies showed reconstruction accuracies similar to those of random data. Bubble charts, distance matrices and linkage trees are shown as in Extended Data Fig. 4. Note the relative lack of block diagonal structures in the distance matrices; in better reconstructed colonies, such structures typically reflect close sister or cousin relationships and less similar second cousins. Poor reconstructions result from insufficiently informative or inconsistent collapse patterns. These can occur in several ways. First, colonies may have too many collapsed scratchpads (for example, row 2, column 2), leading to degeneracy and eliminating differences between clades. Second, and more often, colonies have too few collapsed scratchpads (for example, row 3, column 2) to reconstruct the full tree accurately. Third, colonies can provide inconsistent or incomplete lineage information such that the data do not point to one consistent lineage hypothesis (for example, row 5, column 1). Inconsistent information can arise from convergent collapse events, in which the same scratchpad randomly collapses in separate branches of the lineage; such noise is inherent to this method of lineage tracking but can be significantly reduced by increasing the number of barcoded scratchpads. Additionally, variability in scratchpad expression (resulting from stochastic expression of individual barcoded scratchpads), as well as apparent inconsistencies due to expression from multiple incorporations of the same barcoded scratchpad, can generate conflicting information. Despite these issues, colonies can in many cases provide information about some lineage relationships. For example, for the colony in row 5, column 1, all the sister pairs are correctly identified, but they are not definitively placed in the lineage tree owing to conflicting readouts at the cousin level (for example, collapse events in barcodes 9 and 14). 
Similarly, for the colony in row 5, column 2, cells 3 and 4 are readily identified as sisters because of a common collapse event in barcode 9. However, there is little additional information, such as a collapse event at the two-cell stage, that would allow the cousins to be correctly identified. These and other sources of noise affecting colony reconstruction are analysed in more detail in Extended Data Fig. 7 and the Supplementary Information, and can be addressed in future implementations of MEMOIR.
Extended Data Figure 6 | Bootstrap reconstruction score enriches for colonies that exhibit more accurate lineage reconstruction.
a, A bootstrap procedure (Methods) was used to determine the robustness of clade reconstruction to resampling of barcode data for each colony. The frequency of lineage reconstruction at the first cousin clade level was then used to rank all 108 colonies. Colonies with higher reconstruction robustness were enriched for more accurate lineage reconstructions, although no information about accuracy was used to identify these colonies. The top 20% of colonies based on bootstrap score were termed subset 1 (left of blue line; n = 22). This group correctly identified an average of 72% of relationships. The top 40% of colonies were termed subset 2 (left of green line; n = 43) and correctly identified 67% of relationships. Grey region indicates the range of correct relationships expected from random guessing of trees (mean ± s.d. indicated by line and shading). The bootstrap metric effectively filters out colonies that have insufficient or inconsistent scratchpad collapse information and thus do not robustly generate the same reconstruction. Noise sources that affect the data include convergent scratchpad collapse, imperfect collapse rates that may not result in collapse events every generation, and variable scratchpad expression that limits readout signal or introduces ambiguities due to expression from multiple incorporations of the same barcode type (see Extended Data Fig. 7 and Supplementary Information). b, Cumulative distributions show the fraction of pairwise sister, first cousin, and second cousin relationships correctly identified in each colony. Reconstruction accuracies of all these types of lineage relationships are similar to predictions based on the simulated model with eight scratchpads (no noise included). This shows that reconstruction is accurate across all levels of relationships. Related to Fig. 3g.
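The bootstrap score in a resamples barcodes with replacement, repeats the reconstruction, and records how often each clade reappears. A minimal Python sketch follows. The reconstruction step is passed in as a function (any tree builder returning nested tuples), and the number of resamples and the seed are arbitrary choices rather than values taken from Methods; the function names are hypothetical.

```python
import random
from collections import Counter


def clades(tree):
    """Return every internal-node clade of a nested-tuple tree as a frozenset of leaves."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def bootstrap_clade_support(cells, reconstruct, n_boot=100, seed=0):
    """Fraction of barcode-resampled reconstructions that contain each clade.

    cells: dict cell -> {barcode: uncollapsed fraction}
    reconstruct: function mapping such a dict to a nested-tuple tree
    """
    rng = random.Random(seed)
    barcodes = sorted({b for profile in cells.values() for b in profile})
    counts = Counter()
    for _ in range(n_boot):
        sample = [rng.choice(barcodes) for _ in barcodes]  # resample with replacement
        # Key by draw index so that a barcode drawn twice counts twice.
        resampled = {
            cell: {(k, b): profile[b] for k, b in enumerate(sample) if b in profile}
            for cell, profile in cells.items()
        }
        counts.update(clades(reconstruct(resampled)))
    return {clade: n / n_boot for clade, n in counts.items()}


# Usage with a placeholder reconstruction that always returns the same tree.
support = bootstrap_clade_support(
    {"A": {"bc1": 1.0}, "B": {"bc1": 1.0}, "C": {"bc1": 0.0}},
    reconstruct=lambda data: (("A", "B"), "C"),
)
print(support[frozenset({"A", "B"})])  # 1.0
```

Ranking colonies as in a would then amount to reading off the support of the first-cousin-level clades of each colony's reconstruction.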
Extended Data Figure 7 | Comprehensive error analysis identifies scratchpad expression variability as the key source of noise in MEMOIR experiments.
a, Overall reconstruction errors result from three types of noise: the inherent stochasticity of recording lineage information through random scratchpad collapse events, recording noise (due to fluctuations in the expression levels of Cas9 and gRNA), and readout noise (due to fluctuations in the expression levels of the barcoded scratchpads, variable expression from multiple integrations of the same barcoded scratchpad species (BC), and the fidelity of smFISH imaging readout). b, Cell–cell variability can be decomposed into intra-colony and inter-colony components, as shown schematically. For each hypothetical colony, the relative amounts of each type of variability are plotted (also schematic). c, Plots show experimentally measured intra- and inter-colony noise from gRNA activity (from the fluorescent signal of the Wnt reporter, left), Cas9 expression (from the transcript counts by smFISH, middle), and scratchpad expression (from transcript counts by smFISH, right). These plots represent data from individual cells of all 108 MEM-01 colonies (see Supplementary Information for details). d, Recording noise results in a small decrease in reconstruction accuracy. The plot on the left shows the cumulative distribution of reconstruction accuracies of 500 simulated colonies composed of three-generation trees, with an average scratchpad collapse rate of 0.1 and 13 scratchpads. The heat map on the right shows the average reconstruction accuracy for 500 simulated colonies for a range of average collapse rates and numbers of scratchpads. e, Fluctuations in scratchpad (SP) expression levels substantially reduce reconstruction accuracy. Simulation results are plotted as in d, but with the addition of readout noise, rather than recording noise, to the idealized simulations. 
The readout noise is added as two separate components: scratchpad expression level fluctuations, which significantly increase error, and noise due to smFISH imaging fidelity, which contributes minimally to reconstruction error. The curves are for two integration sites per barcode. f, Cumulative distribution of reconstruction accuracy of 500 simulated colonies with all three components of noise included for different numbers of integration sites per barcode. The thick blue line is the experimental distribution obtained from the 108 MEM-01 colonies. The simulated distribution is consistent with the experimentally observed distribution, especially for two effective integrations per barcode. No fitting parameters were used.
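Panel b splits cell-to-cell variability into intra- and inter-colony parts. One standard way to perform such a split, assumed here rather than taken from the Supplementary Information, is the law of total variance: the total variance equals the mean within-colony variance plus the variance of the colony means. The Python sketch below uses population variances weighted by colony size; normalizing each term (for example by the squared mean) is left out, and the function name is hypothetical.

```python
def decompose_variance(colonies):
    """Split total variance into intra- and inter-colony components.

    colonies: dict colony -> list of per-cell measurements.
    Returns (intra, inter, total) population variances; intra + inter == total.
    """
    values = [x for xs in colonies.values() for x in xs]
    n = len(values)
    grand_mean = sum(values) / n
    intra = inter = 0.0
    for xs in colonies.values():
        m = sum(xs) / len(xs)
        intra += sum((x - m) ** 2 for x in xs)  # spread around the colony's own mean
        inter += len(xs) * (m - grand_mean) ** 2  # spread of colony means
    total = sum((x - grand_mean) ** 2 for x in values)
    return intra / n, inter / n, total / n


print(decompose_variance({"colony1": [1, 3], "colony2": [5, 7]}))  # (1.0, 4.0, 5.0)
```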
Extended Data Figure 8 | Performance analysis on deeper trees and trees with missing cells.
a, Simulations of reconstruction accuracy of full binary trees for varying numbers of unique barcoded scratchpads, varying collapse rates, and varying numbers of generations (N). The colour of the heat maps corresponds to the fraction of all pairwise lineage relationships correctly identified in the reconstructed tree, averaged over many simulated trees (Fig. 3h in the main text, also see Methods). Even at greater depth (for example, N = 10), trees can be reconstructed accurately with approximately 50 scratchpads. b, The collapse rate that maximizes reconstruction accuracy depends on the number of generations to be tracked, but is only weakly dependent on the number of scratchpads. This is because maximal lineage information is recorded when each scratchpad has a probability of 0.5 of having collapsed by the final time point, regardless of the total number of scratchpads. The plot shows the optimal collapse rate as a function of tree depth, as determined from the simulations (dots) as well as the theoretical expectation of a cumulative collapse probability of 0.5 per scratchpad (dashed line). The theory curve contains no fitting parameters. c, Simulations of reconstruction accuracy for binary trees of three generations as a function of the number of scratchpads and the scratchpad collapse rate for trees with one (left), two (middle), or three (right) randomly chosen endpoint cells missing. Compare with reconstruction accuracy for trees with no missing cells in Fig. 3h. The schematic above each panel shows the topology and branch lengths of trees with the given number of missing cells. A modified neighbour joining algorithm was used to exhaustively score all 315 possible reconstructions. 
To distinguish between reconstructions where tree topology is the same but the branch lengths are different (two such trees are shown bracketed in the schematic of the middle panel), we modified the reconstruction algorithm to estimate the branch lengths connecting a pair of cells based on the Hamming distance of their barcoded scratchpad collapse patterns (see Supplementary Information). For example, two cells whose collapse patterns differ substantially would be estimated to have a longer lineage distance between them than would cells with more similar patterns. In general, trees with missing leaves can be reconstructed with accuracy similar to that of full binary trees (Fig. 3h). As the number of missing cells increases, the reconstruction accuracy decreases because there are fewer cells in the tree to provide lineage information.
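The theory curve in b can be written down directly. If each scratchpad collapses with probability r per generation, it is still intact after N generations with probability (1 − r)^N, so requiring a cumulative collapse probability of 0.5 gives r* = 1 − 2^(−1/N). The Python sketch below evaluates this, together with the Hamming distance used in c to compare binary collapse patterns. Treating the collapse rate as a per-scratchpad, per-generation probability is an assumption about how it is defined here, and the function names are hypothetical.

```python
def cumulative_collapse_probability(rate, generations):
    """P(a scratchpad has collapsed after N generations) with per-generation rate r."""
    return 1 - (1 - rate) ** generations


def optimal_collapse_rate(generations):
    """Rate giving a cumulative collapse probability of 0.5: r* = 1 - 2**(-1/N)."""
    return 1 - 2 ** (-1 / generations)


def hamming(pattern_a, pattern_b):
    """Number of scratchpads whose collapse state differs (patterns of equal length)."""
    return sum(a != b for a, b in zip(pattern_a, pattern_b))


for n in (3, 6, 10):
    print(n, round(optimal_collapse_rate(n), 3))
print(hamming([0, 1, 1, 0], [1, 1, 0, 0]))  # 2
```

The optimum does not depend on the number of scratchpads in this expression, consistent with the weak dependence described in b.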
Extended Data Figure 9 | Simulations show that MEMOIR can operate at low collapse rates to reconstruct sparse trees.
We simulated MEMOIR in the sparse recording regime, in which collapse events for any given lineage occur, on average, once every few generations. Trees were generated using simulations and reconstructed using a maximum parsimony approach (see Supplementary Information). Experimentally, sparse tree regimes in which collapse events occur infrequently could be achieved with low Cas9 and/or gRNA expression levels or rare expression events (for example, by using weak promoters, occasionally activated promoters or protein degradation domains), or with decreased Cas9-mediated affinity for target scratchpads (for example, by decreasing the complementarity between the gRNA and target). a, Cartoon of sparse collapse events on a full binary tree. Each collapse changes the state of each scratchpad (arrays of red or black boxes, shown only at nodes where new collapse events occur). At the final generation, there are five populations of cells with distinct collapse patterns, each shown in a different colour. In the sparse representation of the tree (right) each collapse event corresponds to a new branch, and the five leaves correspond to the five subpopulations of cells with distinct collapse patterns. b, Possible source of reconstruction errors. Unrelated clades can converge independently to the same collapse pattern and thus become indistinguishable, resulting in reconstruction errors (tree on the left), but the probability of such coincidences decreases with increasing number of scratchpads (all clades are distinguishable for the tree on the right). c, A simulated sparse tree with 30 leaves and an average depth of 2.4 ± 1.3. The depth of the tree is defined as the cumulative number of collapse events experienced by each leaf, averaged over all the leaves of the tree. The statistics of this tree shape are approximately equivalent to those of a sparse tree generated by a collapse rate of 0.33 per cell per generation on a full tree of six generations. 
The heat map shows the status of the scratchpad sites for all the leaves. Each column corresponds to a particular barcoded scratchpad, and each row to a leaf. d, Same as in c, but for a simulated sparse tree with 100 leaves and a depth of 3.1 ± 1.6, approximately equivalent to a collapse rate of 0.275 per cell per generation on a full tree of eight generations. e, The fraction of correctly identified tree partitions (defined using the Robinson–Foulds metric) is shown as a function of the number of scratchpads, normalized by its value in the limit of an infinite number of distinct scratchpads (where a unique collapse pattern is generated for every collapse event). Sparse trees of three different sizes (that is, different numbers of leaves and depth) were generated. Each dot corresponds to one simulated tree. Tree size was held constant as the number of scratchpads was increased, requiring a fixed collapse rate per cell but a collapse rate per scratchpad that scaled inversely with scratchpad number. Trees with fewer leaves and lower depth required fewer scratchpads for accurate reconstruction. However, even for larger trees, close to the maximal lineage information could be recovered using only a modest number of scratchpads.
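Panel e scores reconstructions by the fraction of correctly identified tree partitions, using the Robinson–Foulds metric. The Python sketch below uses the rooted (cluster) form of that comparison on trees written as nested tuples, which also accommodates the multifurcating nodes of sparse trees. The normalization by the infinite-scratchpad limit described in the caption is not included, and the function names are hypothetical.

```python
def nontrivial_clusters(tree):
    """Leaf sets below each internal node, excluding the root and single leaves."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    root = walk(tree)
    return {c for c in found if 1 < len(c) < len(root)}


def robinson_foulds(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: clusters present in exactly one tree."""
    return len(nontrivial_clusters(tree_a) ^ nontrivial_clusters(tree_b))


def fraction_partitions_recovered(true_tree, reconstructed):
    """Fraction of the true tree's clusters that also appear in the reconstruction."""
    true_clusters = nontrivial_clusters(true_tree)
    if not true_clusters:
        return 1.0
    return len(true_clusters & nontrivial_clusters(reconstructed)) / len(true_clusters)


true_tree = ((1, 2), (3, 4), 5)  # multifurcating root, as in a sparse tree
guess = ((1, 2), (3, 5), 4)
print(robinson_foulds(true_tree, guess))  # 2
print(fraction_partitions_recovered(true_tree, guess))  # 0.5
```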
Extended Data Figure 10 | The Esrrb expression level distribution is stationary.
a, Distribution of the number of Esrrb transcripts in individual cells in populations of MEM-01 ES cells activated by the addition of Wnt3a and Shield1 (same conditions as the colonies analysed in Figs. 3 and 4) for different amounts of time (0, 24, and 48 h from top to bottom). The distribution of Esrrb transcript counts does not change significantly over 48 h of Wnt3a exposure as quantified by the P value of the Kolmogorov–Smirnov (KS) test. The Kolmogorov–Smirnov test was performed for the observed distributions at 24 and 48 h with respect to the reference distribution at 0 h. The cumulative distribution functions (bottom) similarly show that the fraction of cells in the low (or high) Esrrb expression state does not change significantly over 48 h of Wnt3a activation. A stationary Esrrb distribution implies that transitions between the low and high Esrrb expression states must be reversible. b, LIF removal changes the Esrrb distribution. Same as in panel a but with LIF removed from the media at t = 0. The distributions show a significant change during the 48 h period, with the fraction of cells in the low Esrrb expression state increasing over time, as expected.
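The comparison in a uses a two-sample Kolmogorov–Smirnov test. To show the mechanics, the Python sketch below computes the statistic D (the largest gap between the two empirical cumulative distribution functions) and an approximate P value from the asymptotic Kolmogorov distribution with a commonly used small-sample correction. This is a large-sample approximation; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used. The counts and function names are hypothetical.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:  # step both ECDFs past x (handles ties)
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Approximate P value from the asymptotic Kolmogorov distribution."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.05:
        return 1.0  # the series converges slowly here and its value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * total))


# Hypothetical Esrrb transcript counts at 0 h and 48 h.
counts_0h = [3, 5, 8, 40, 52, 61, 70, 75]
counts_48h = [4, 6, 9, 38, 50, 63, 72, 80]
d = ks_statistic(counts_0h, counts_48h)
print(d, ks_pvalue_asymptotic(d, len(counts_0h), len(counts_48h)))
```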
Figure 1 | The MEMOIR system for recording and in situ readout of cell lineage.
a, Barcoded scratchpads provide a general-purpose recording element whose state can be irreversibly altered by Cas9/gRNA-mediated cleavage. b, The MEMOIR recording system consists of three types of components, all stably integrated into the genome: (1) A Cas9 variant containing an inducible degron (DD) that is stabilized by the small molecule Shield1. (2) A Wnt-inducible gRNA targeting the scratchpad, co-expressed with a fluorescent protein (mTurquoise). Ribozyme sequences (HH, HDV) enable gRNA excision. (3) A set of barcoded scratchpads (two-colour elements) integrated throughout the genome. Inverted triangles in a and b denote PiggyBac terminal repeats, used for genome integration. c, The MEMOIR recording and readout process. During recording, scratchpads collapse stochastically as cells proliferate, producing distinct scratchpad states in each cell. During readout, individual mRNA molecules are detected with a single scratchpad-specific probe set (orange, inset), and multiple barcode-specific probe sets (blue, green, inset) through sequential rounds of hybridization and imaging. Uncollapsed scratchpads produce co-localized barcode and scratchpad signals (overlapping dots), while collapsed scratchpads produce only a barcode-specific signal (single dots).
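Panel c describes recording as independent, irreversible collapse of scratchpads as cells divide. The Python sketch below simulates that idealized process on a full binary tree. The per-generation collapse probability, the assumption that each daughter collapses independently after division, and the absence of expression or readout noise are simplifications chosen for illustration; the function name is hypothetical.

```python
import random


def simulate_colony(n_scratchpads, rate, generations, seed=0):
    """Collapse patterns (1 = collapsed) of the leaves of a full binary tree.

    In every generation each daughter cell independently collapses each of its
    still-intact scratchpads with probability `rate`; collapse is irreversible.
    Leaves 2k and 2k + 1 are sisters.
    """
    rng = random.Random(seed)
    cells = [[0] * n_scratchpads]  # founder cell: all scratchpads intact
    for _ in range(generations):
        daughters = []
        for state in cells:
            for _ in range(2):
                daughters.append([1 if s or rng.random() < rate else 0 for s in state])
        cells = daughters
    return cells


for pattern in simulate_colony(n_scratchpads=8, rate=0.2, generations=3, seed=1):
    print(pattern)
```

Because collapse is irreversible, sisters share every collapse inherited from their common ancestors, which is what makes the patterns informative about lineage.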
Figure 2 | In situ readout of scratchpad state.
a, smFISH readout of scratchpad state in two cells (white outlines). The scratchpad associated with barcode 2 has collapsed in the lower cell, but remains uncollapsed in the upper cell. Overlaid images are slightly offset for visual clarity. b, Histograms of scratchpad smFISH signal intensities, identified as collapsed (blue) or uncollapsed (orange) based on scratchpad–barcode co-localization. The fraction of collapsed scratchpads increased after 48 h of activation (top versus bottom panel). Far right bars indicate smFISH signal exceeding the maximum displayed intensity. c, Scratchpad collapse accumulates over time post activation. Box plots show median (red bar), first and third quartiles (box) and extrema for four highly expressed barcodes; n = 1,826, 418, 610, 545 cells, left to right. Activated samples in b and c only include gRNA-expressing cells, as measured by co-expression of mTurquoise. d, Multiplexed readout of barcoded scratchpads (scratchpad, SP; barcode, BC) by sequential rounds of hybridization with distinct probe sets (colours) provides information about the collapse status of multiple barcoded scratchpads in each cell (right). e, Example of seqFISH analysis. Scratchpads (red) and three pairs of barcodes (middle images) are shown (pseudocoloured). Solid and dashed circles at barcode positions indicate uncollapsed and collapsed scratchpads, respectively. Barcode data are superimposed on the scratchpad image in the final panel. For clarity, additional hybridizations and barcodes are not shown. Scale bars (a, e), 10 μm (left images) and 2 μm (magnified panels).
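Panels a, b and e classify each barcode transcript as collapsed or uncollapsed according to whether a scratchpad signal co-localizes with it. A minimal Python sketch of that rule follows, using a distance threshold between detected spots. The radius (in pixels) is a hypothetical parameter, and, unlike a careful implementation, the sketch does not enforce one-to-one pairing of spots.

```python
import math


def classify_barcode_spots(barcode_spots, scratchpad_spots, radius_px=2.0):
    """Label each barcode spot by whether any scratchpad spot lies within radius_px.

    Spots are (x, y) coordinates from the same cell; radius_px is hypothetical.
    """
    labels = []
    for spot in barcode_spots:
        near = any(math.dist(spot, sp) <= radius_px for sp in scratchpad_spots)
        labels.append("uncollapsed" if near else "collapsed")
    return labels


def uncollapsed_fraction(labels):
    """Per-barcode summary: share of transcripts with a co-localized scratchpad signal."""
    return labels.count("uncollapsed") / len(labels)


labels = classify_barcode_spots([(0, 0), (10, 10), (5, 5)], [(1, 1), (5.5, 4.5)])
print(labels)  # ['uncollapsed', 'collapsed', 'uncollapsed']
print(uncollapsed_fraction(labels))
```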
Figure 3 | MEMOIR enables lineage reconstruction in ES cell colonies.
a, Time-lapse videos of colony growth were acquired to provide lineage ‘ground truth’ (dashed lines) for later validation of reconstructed lineages, but not for reconstruction itself. b, At the end of the movie, seqFISH was performed, as in Fig. 2. Scale bar, 20 μm. c, Examples of how barcoded scratchpad collapse patterns reflect cell lineage. d, MEMOIR readout for the colony in a–c, showing the number of barcode transcripts detected (bubble size) and the uncollapsed fraction (colour scale). e, Data from d were used to compute a matrix of cell-to-cell barcode ‘distance’ (dissimilarity) scores. f, Reconstructed lineage tree for the same colony (Methods). Percentages on the tree represent the frequencies of clade occurrence from a barcode resampling bootstrap procedure. In this case, the reconstructed tree matches that obtained from the video. g, Cumulative distributions show the fraction of all pairwise relationships correctly identified in each colony, for all colonies, and for the top 20% (subset 1) or 40% (subset 2) ranked by bootstrap score. h, Idealized simulations of three-generation binary trees show how reconstruction accuracy (fraction of relationships correctly identified, colour) depends on collapse rate and number of scratchpads. i, Cumulative distributions from simulations of MEMOIR show how empirically measured noise sources affect reconstruction accuracy in simulated trees, assuming 13 scratchpads. gRNA and Cas9 expression noise adds some reconstruction error (dotted line), which is strongly increased by additional noise from scratchpad expression variability, assuming two expressed integrations per barcode (dashed line), and increased slightly more by addition of smFISH readout noise (solid line).
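Panel g scores each colony by the fraction of all pairwise relationships correctly identified. The Python sketch below assumes one way to define a relationship: the depth of the two cells' most recent common ancestor below the root (in a three-generation tree, sisters, first cousins and second cousins differ in that depth). Trees are nested tuples with cell labels as leaves, child order within a node does not affect the score, and the function names are hypothetical.

```python
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf label to its root-to-leaf path of child indices."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (index,)))
    return paths


def ancestor_depth(path_a, path_b):
    """Depth of the most recent common ancestor = length of the shared path prefix."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


def fraction_relationships_correct(true_tree, reconstructed):
    """Share of cell pairs whose common-ancestor depth matches between the two trees."""
    true_paths, recon_paths = leaf_paths(true_tree), leaf_paths(reconstructed)
    pairs = list(combinations(sorted(true_paths), 2))
    correct = sum(
        ancestor_depth(true_paths[a], true_paths[b])
        == ancestor_depth(recon_paths[a], recon_paths[b])
        for a, b in pairs
    )
    return correct / len(pairs)


truth = (((1, 2), (3, 4)), ((5, 6), (7, 8)))
print(fraction_relationships_correct(truth, (((4, 3), (2, 1)), ((8, 7), (6, 5)))))  # 1.0
print(fraction_relationships_correct(((1, 2), (3, 4)), ((1, 3), (2, 4))))
```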
Figure 4 | MEMOIR enables inference of gene expression dynamics and the recording of cellular events.
a–d, Gene expression dynamics inference (see Supplementary Information). a, The rates of switching between two gene expression states can be inferred by combining reconstructed lineage information with endpoint gene expression measurements (schematic). Inference works because switching rates affect the degree to which cell states cluster on the endpoints of lineage trees. This analysis can be performed for multiple genes (red, green, blue), which could exhibit different dynamics, as shown schematically. b, Fits to the bimodal distribution of single-cell Esrrb transcript counts enable probabilistic assignment of cells to either the low (E−) or high (E+) Esrrb expression state. c, Esrrb expression states mapped onto endpoints of lineage trees reconstructed by MEMOIR suggest that these states are stable for multiple generations. Two example colonies are shown, with numbers indicating single endpoint cells. Scale bars, 20 μm. d, Frequency of occurrence in the same state (E−, top; E+, bottom) of pairs of sisters, first cousins and second cousins, from MEMOIR reconstructions of the 30 colonies with the highest reconstruction confidence scores among the 85 colonies in which Esrrb was measured (blue, red) and from the actual lineages of the same colonies (grey). Transition rates inferred from MEMOIR are shown at right. e–g, Cellular event recording (schematic). e, gRNA1 (orange) is constitutively expressed for lineage reconstruction, while the orthogonal gRNA2 (purple) and gRNA3 (green) are expressed in response to specific signals and target independent sets of scratchpads. f, Schematic showing recording of possible signalling histories (purple and green shading indicate periods when signals 1 and 2, respectively, are present). g, Reconstruction of simulated event histories in a six-generation tree. The signals recorded along two branches (yellow) are shown (bottom panels), including the actual simulated signals (thick lines), examples of individual reconstructed signals (dashed lines) and the average reconstructed signals (solid lines; mean ± s.d., n = 500 trees) (Methods).
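The logic of panels a and d can be sketched with a discrete-generation, two-state Markov model. The paper's own inference is described in its Supplementary Information and may differ. This sketch makes several assumptions, none of them taken from the paper. Each cell switches from E− to E+ with probability a per generation, and from E+ to E− with probability b. The two daughter lineages switch independently. The common ancestor is drawn from the stationary distribution π = (b, a)/(a + b). Finally, a + b < 1. Under this model, consider two cells whose most recent common ancestor lies k generations back (k = 1 for sisters, 2 for first cousins, 3 for second cousins). The probability that both are in state x is Σ_i π_i [P^k]_{ix}². This probability falls towards π_x² faster when switching is fast, so the observed concordances at k = 1, 2, 3 constrain a and b. The functions `k_step`, `both_in_state` and `fit_rates` are hypothetical.

```python
import itertools


def k_step(a, b, k):
    """k-generation transition matrix of a two-state chain (state 0 = E-,
    state 1 = E+) with per-generation switching probabilities a (E- to E+)
    and b (E+ to E-), using the closed form of the matrix power."""
    s = a + b
    lam = (1 - s) ** k
    return [[(b + a * lam) / s, (a - a * lam) / s],
            [(b - b * lam) / s, (a + b * lam) / s]]


def both_in_state(a, b, k, state):
    """Probability that two cells whose most recent common ancestor lies k
    generations back are both in `state`, with the ancestor drawn from the
    stationary distribution and the two branches switching independently."""
    s = a + b
    stationary = (b / s, a / s)
    pk = k_step(a, b, k)
    return sum(stationary[i] * pk[i][state] ** 2 for i in (0, 1))


def fit_rates(observed, n=100):
    """Grid search over a, b in {1/n, ..., (n-1)/n} minimizing the squared error
    to `observed`, a dict mapping (k, state) to a measured pair frequency."""
    grid = [i / n for i in range(1, n)]
    best, best_err = None, float("inf")
    for a, b in itertools.product(grid, repeat=2):
        if a + b >= 1:
            # Keep 1 - a - b > 0: the pair probabilities depend on 1 - a - b
            # only through its even powers, so its sign is not identifiable.
            continue
        err = sum((both_in_state(a, b, k, st) - v) ** 2
                  for (k, st), v in observed.items())
        if err < best_err:
            best, best_err = (a, b), err
    return best


if __name__ == "__main__":
    # Self-consistency check: concordances generated by the model itself
    # for sisters (k=1), first cousins (k=2) and second cousins (k=3).
    data = {(k, st): both_in_state(0.1, 0.2, k, st) for k in (1, 2, 3) for st in (0, 1)}
    print(fit_rates(data))  # prints (0.1, 0.2)
```

Fitting the model to concordances it generated itself recovers the input rates. Real colony data would also require handling the sampling noise that comes from a finite number of pairs, which this plain grid search ignores.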

References

1. Frumkin D, Wasserstrom A, Kaplan S, Feige U & Shapiro E. Genomic variability within an organism exposes its cell lineage tree. PLoS Comput. Biol. 1, e50 (2005).
2. Salipante SJ & Horwitz MS. Phylogenetic fate mapping. Proc. Natl Acad. Sci. USA 103, 5448–5453 (2006).
3. Behjati S et al. Genome sequencing of normal cells reveals developmental lineages and mutational processes. Nature 513, 422–425 (2014).
4. Wasserstrom A et al. Reconstruction of cell lineage trees in mice. PLoS One 3, e1939 (2008).
5. Lodato MA et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).