Nature. 2017 Aug 24;548(7668):456-460.
doi: 10.1038/nature23653. Epub 2017 Aug 16.

Polylox barcoding reveals haematopoietic stem cell fates realized in vivo

Weike Pei et al. Nature.

Abstract

Developmental deconvolution of complex organs and tissues at the level of individual cells remains challenging. Non-invasive genetic fate mapping has been widely used, but the low number of distinct fluorescent marker proteins limits its resolution. Much higher numbers of cell markers have been generated using viral integration sites, viral barcodes, and strategies based on transposons and CRISPR-Cas9 genome editing; however, temporal and tissue-specific induction of barcodes in situ has not been achieved. Here we report the development of an artificial DNA recombination locus (termed Polylox) that enables broadly applicable endogenous barcoding based on the Cre-loxP recombination system. Polylox recombination in situ reaches a practical diversity of several hundred thousand barcodes, allowing tagging of single cells. We have used this experimental system, combined with fate mapping, to assess haematopoietic stem cell (HSC) fates in vivo. Classical models of haematopoietic lineage specification assume a tree with few major branches. More recently, driven in part by the development of more efficient single-cell assays and improved transplantation efficiencies, different models have been proposed, in which unilineage priming may occur in mice and humans at the level of HSCs. We have introduced barcodes into HSC progenitors in embryonic mice, and found that the adult HSC compartment is a mosaic of embryo-derived HSC clones, some of which are unexpectedly large. Most HSC clones gave rise to multilineage or oligolineage fates, arguing against unilineage priming, and suggesting coherent usage of the potential of cells in a clone. The spreading of barcodes, both after induction in embryos and in adult mice, revealed a basic split between common myeloid-erythroid development and common lymphocyte development, supporting the long-held but contested view of a tree-like haematopoietic structure.

Conflict of interest statement

The authors declare no competing financial interest. The German Cancer Research Center has filed an international patent application entitled “Genetic random DNA barcode generator for in vivo cell tracing” (PCT/EP2016/065932). HR Rodewald, TB Feyerabend and W. Pei are listed as inventors.

Figures

Extended Data Fig. 1. Generation of the Rosa26Polylox locus and experimental procedures for barcode detection and analysis.
a, Gene targeting of Polylox DNA into the Rosa26 locus in ES cells; shown are the wild type Rosa26 locus (top), targeting vector (middle) and targeted Rosa26Polylox locus (bottom). Southern blot (insert) (Supplementary Fig. 1b) from genomic tail DNA of control Rosa26+/+ and Rosa26RFP/+ (ref. 31) mice, and from three Rosa26Polylox/+ ES cell clones shows restriction fragments corresponding to wild-type (5.8 kb) or targeted (4.8 kb) loci. b, Kinetics of Polylox locus recombination after treatment of Rosa26Polylox MerCreMer ES cells with 4-OH tamoxifen (4-OHT) at 0 h (Supplementary Fig. 1c). c, Rosa26Polylox MerCreMer ES cells were left untreated and followed for 27 days (left panel), or pulsed with 4-OHT for 3 hours and chased over 34 days (right panel) (Supplementary Fig. 1d). d, Workflow from cell isolation to Polylox barcode detection. Cell populations of interest were isolated by cell sorting, genomic DNA was purified and the Polylox cassette was amplified by PCR, and the fragments (see Methods) were sequenced by single molecule real time (SMRT) sequencing using Pacific Biosciences instruments. Raw data was processed with the accompanying software package to obtain the circular consensus sequences (CCS). Subsequently, CCS were filtered for reads containing complete Polylox sequences. Next, we aligned the barcode segments to the CCS reads and determined the order and orientation of the segments to retrieve the recombined Polylox barcodes (see Methods). Finally, CCS with incomplete segment alignment (X), or illegitimate segment orders (e.g. segment duplications) were filtered out and removed from the further analysis.
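The final filtering step in panel d can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes barcodes have already been decoded into strings over the segment alphabet described in Figure 1 ('1'-'9' for original segments, 'A'-'I' for their inversions), with 'X' marking an incompletely aligned segment. The function names are hypothetical.

```python
# Sketch of the barcode legitimacy filter (assumed string encoding of barcodes).
FORWARD = "123456789"    # original segments
INVERTED = "ABCDEFGHI"   # the same segments in inverted orientation

# Map every letter to the index of the physical segment it represents,
# so that '1' and 'A' both refer to segment 0.
SEGMENT_ID = {c: i for i, c in enumerate(FORWARD)}
SEGMENT_ID.update({c: i for i, c in enumerate(INVERTED)})


def is_legitimate(barcode: str) -> bool:
    """Return True if the barcode is fully aligned and has no duplicated segment."""
    if not barcode:
        return False
    # Any character outside the alphabet (including 'X') means incomplete alignment.
    if any(c not in SEGMENT_ID for c in barcode):
        return False
    # A segment may appear at most once, in either orientation.
    ids = [SEGMENT_ID[c] for c in barcode]
    return len(ids) == len(set(ids))


def filter_barcodes(barcodes):
    """Keep only legitimate barcodes, preserving input order."""
    return [b for b in barcodes if is_legitimate(b)]
```

For instance, `filter_barcodes(["123456789", "12X456789", "1A3"])` keeps only the first entry: the second contains an unaligned segment and the third repeats segment 1 in both orientations.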
Extended Data Figure 2. Example of a complex CCS DNA sequence and its corresponding Polylox barcode.
a, Schematic drawing of the unrecombined Polylox cassette, and an experimentally found recombined barcode. b, Full nucleotide sequence of one CCS read. In 5’ to 3’ orientation, the DNA sequence is organized into intervening loxP sites and the annotated barcode blocks (‘barcode alphabet’). Numbers and letters refer to the segments shown in a. c, Proportions of unrecombined (blue) and recombined (red) sequence reads in granulocytes (Gr), B cells (B2), CD4 T cells (CD4), and CD8 T cells (CD8) from adult Rosa26PolyloxTie2MCM mice without tamoxifen (TAM) treatment (top row), adult Rosa26Polylox (middle row) and adult Rosa26PolyloxTie2MCM (bottom row) mice, each treated as embryos with tamoxifen.
Extended Data Figure 3. Barcode generation probabilities and number of Polylox recombination events in acutely labeled B cells.
a, Barcode generation probabilities were computed for a set of barcodes found experimentally (mouse #3, Extended Data Table 1, n=506 barcodes) with and without length dependence of recombination rate, as described in Supplementary Methods. b, To compare the frequencies of individual barcode segments (‘letters’) generated by the model with experimental data, we focused on data from a Rosa26Polylox/CreERT2 mouse treated with tamoxifen (see Figure 3a, mouse #1), from which about 15,000 acutely barcoded B cells were analyzed. To simulate barcode generation in 15,000 cells, 15,000 barcodes were drawn (with the frequencies of recombination events shown in e below). This procedure was repeated 500 times to obtain standard deviations. c, Measured and computed distributions of fragment lengths are shown (experimental data and simulations as in b). d, The observed and measured distributions of the 180 possible pairs of adjacent segments are shown (experimental data and simulations as in b); the unrecombined pairs are particularly abundant. The PacBio instrument loads longer fragments less efficiently than shorter ones. Because of this bias, we restricted the analysis in b-d to fragments with 1, 3 and 5 segments. e, For all barcodes found in B cells from the mouse in b we computed the minimal number of recombination events (excisions or/and inversions) needed to generate the barcode. All barcodes can be generated with six or fewer recombination events. The cumulative distribution of event frequencies is shown. Similar distributions were obtained in the reported barcoding experiments with Rosa26PolyloxTie2MCM mice. f, Barcodes generated once or multiple times in a simulated sample of 15,000 cells (as described in b). On average, 2,920 ± 35 (mean ± s.d.) different barcodes were generated with 15,000 draws. g, Measured barcode frequencies versus computed generation probabilities. For all barcodes retrieved in adult mice after barcode induction in embryonic HSC progenitors (Fig. 4, Exp. 1 and 2), we binned total barcode frequencies according to generation probabilities and calculated boxplot statistics of observed barcode frequencies for each bin (red line, median; box ends, 25% and 75% percentiles; bars, most extreme data points not considered outliers; red dots, outliers, n = 4, 95, 175, 97, 73, 28, 11 barcodes (left) and n = 4, 102, 180, 75, 42, 31, 14 barcodes (right)). From generation probabilities of 0.1 to 10⁻⁴, observed median frequency and probability of generation were overall correlated, showing that barcodes generated with higher probability are recovered more frequently. By contrast, for barcodes with generation probabilities ≤ 10⁻⁴, their median frequency of observation was independent of the probability of generation, indicating that these barcodes have each been generated in the smallest possible unit, a single embryonic HSC progenitor.
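The repeated-draw procedure in panels b and f (draw one barcode per simulated cell, count the distinct barcodes, repeat to obtain a mean and standard deviation) can be sketched as below. This is only an illustration under stated assumptions: the generation probabilities are taken as a given dictionary, whereas the paper derives them from a recombination model described in its Supplementary Methods. The function name and the toy probabilities are hypothetical.

```python
import random
import statistics


def distinct_barcodes_per_draw(probabilities, n_cells, n_repeats, seed=0):
    """Simulate barcode generation in n_cells cells, n_repeats times.

    probabilities: dict mapping barcode -> generation probability (assumed given).
    Returns (mean, s.d.) of the number of distinct barcodes per simulated sample.
    """
    rng = random.Random(seed)            # seeded for reproducibility
    barcodes = list(probabilities)
    weights = [probabilities[b] for b in barcodes]
    counts = []
    for _ in range(n_repeats):
        # One weighted draw per simulated cell.
        sample = rng.choices(barcodes, weights=weights, k=n_cells)
        counts.append(len(set(sample)))
    return statistics.mean(counts), statistics.stdev(counts)


# Hypothetical toy distribution: a few likely barcodes and several rare ones.
TOY_PROBABILITIES = {"9": 0.4, "A23456789": 0.3, "1256789": 0.2,
                     "123FG45": 0.05, "1F92G45H3": 0.05}
```

With the settings described in the caption, the call would use `n_cells=15000` and `n_repeats=500`; the toy distribution above is far too small to reproduce the reported counts and serves only to show the mechanics.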
Extended Data Figure 4. Histogram of apparent HSC clone sizes.
For the single HSC sequencing data (Fig. 4, Exp. 1 and 2) we show separately the histogram of apparent clone sizes. An apparent clone is defined by all HSC that contain the same barcode. These apparent clones are unlikely to all be biological clones, due to the inclusion of abundant barcodes that may have been generated in more than one embryonic HSC progenitor. For the analysis of rare barcodes that are highly likely to define true clones generated from single HSC progenitors see Figure 4b and e.
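The definition of an apparent clone (all HSC carrying the same barcode) translates directly into a counting step. The sketch below assumes one barcode string per sequenced HSC; the function names are hypothetical, and the example barcodes are illustrative rather than measured data.

```python
from collections import Counter


def apparent_clone_sizes(hsc_barcodes):
    """Map each barcode to the number of single HSC in which it was found."""
    return Counter(hsc_barcodes)


def clone_size_histogram(hsc_barcodes):
    """Map each apparent clone size to the number of clones of that size."""
    sizes = apparent_clone_sizes(hsc_barcodes).values()
    return dict(sorted(Counter(sizes).items()))
```

As the caption notes, a large apparent clone built on an abundant barcode need not be a biological clone, so a histogram of this kind is a starting point that still requires filtering by generation probability.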
Extended Data Figure 5. Overview of FACS gating strategies.
a-k, Distinctive surface marker combinations applied for the isolation of specific cell populations are depicted. Pre-gated lineage markers are indicated above the first plot of each panel. Not shown is the additional gating of all populations for size (FSC, SSC) and viability (Sytox blue). For the complete listing of antibodies and marker phenotypes, see the Methods section. a, Isolation of Kit+Sca1+ stem cells (HSC and ST-HSC) and multipotent progenitors (MPP), upper right, and Kit+Sca1− myeloid progenitors (CMP and GMP), lower right, from bone marrow. b, Characterization of bone marrow CLP. c, Definition of pre-B cells (Fr. B and Fr. C) in bone marrow. d, Thymic pre T cells (DN2 and DN3). e, Gating of nucleated erythrocyte progenitors in the bone marrow (EryP II-IV, upper right, and EryP I, lower right). f, CD4 or CD8 single-positive T cells from spleen. g, Classical CD19+ splenic B cells. h, Neutrophilic granulocytes from the spleen. i, Splenic monocytes. j, Non-classical B cells (B1a and B1b) from the peritoneal cavity. k, Classical CD19+ B2 cells from peritoneal cavity.
Extended Data Figure 6. Adult barcode distributions in embryonically induced mice (Fig. 4a, Exp. 2).
a, Heatmap of all barcodes found in HSC (first lane) and the indicated erythroid, myeloid and lymphoid lineages in Exp. 2. b, Heatmap of peripheral barcodes (Pgen < 10⁻⁴, and detected in two independent samples of the same population) sorted according to lineage output in Exp. 2. Frequencies of barcodes are represented by color-coded scales on the right.
Extended Data Figure 7. Clustering of cell types according to all mutual correlations reveals robust dichotomy between common myelo-erythroid and common lymphoid development (all data from adult mice with embryonically induced barcodes) (Fig. 4a, Exp. 2).
a-c, Barcode frequencies in CD8 T cells versus B lymphocytes (B2) (a), in granulocytes (Gr) versus B lymphocytes (B2) (b) and granulocytes (Gr) versus CD4 T cells (c). Data in a-c are from Exp. 1, and each dot is an individual rare barcode with n=48 (a), n=49 (b) and n=53 (c). d, Hierarchical clustering (with distance 1 – Spearman rank correlation coefficient, as described in Fig. 5d) applied to rare and reliably sampled barcodes found in indicated populations in Exp. 2 (n=50). e, Hierarchical clustering as described in d but applied to all barcodes found in peripheral cells in Exp. 1 (n=506) and Exp. 2 (n=496). The inclusion of redundant barcodes reduces differences in correlations, yet the split between common myelo-erythroid and common lymphoid is evident. f, Clustering as described in d but applied to rare multilineage barcodes (found in at least one erythroid, granulocyte and lymphocyte population, analogous to Fig. 4f; Exp. 1, n=16 and Exp. 2, n=25). g, Clustering as described in d but applied only to barcodes found in adult HSC, including redundant ones (shown in Fig. 4d; Exp. 1, n=54 and Exp. 2, n=56). h, Summary of Spearman rank correlations (mean and 95% confidence bounds computed by non-parametric bootstrap) of GMP versus the indicated lineages (for CMP, see Fig. 5h); rare barcodes are from Exp. 2, n=30-44.
Extended Data Figure 8. Polylox barcoding of hematopoiesis in adult mice (all data from adult mice with barcodes induced as adults).
a, Barcodes were induced by tamoxifen treatment of adult Rosa26PolyloxTie2MCM mice, and the indicated cell populations were analyzed at 11 to 13 months of age. b, Tamoxifen treatment of adult Rosa26PolyloxTie2MCM mice (Extended Data Table 1) induced Polylox recombination in HSC and, to a lesser extent, also in downstream stem and progenitor cells, ST-HSC, MPP and CMP (Supplementary Fig. 1e). c, Heatmaps of barcodes satisfying single-cell induction criteria (at the time of labeling) recovered in the indicated stem and progenitor cells, and mature cells in Exp. 3 (left panel), and Exp. 4 (right panel). Frequencies of barcodes are represented by color code (scale on the left). d, Heatmaps for individual HSC, satisfying adult single-cell barcode induction criteria, and their lineage output in Exp. 3 (top panel) and Exp. 4 (bottom panel). Pgen for the multilineage barcodes were as follows: ‘1F92G45H3’, 1.3 × 10⁻⁹; ‘123FG45’, 2 × 10⁻⁵. e, The barcode overlap between two samples of the same cell population (granulocytes and CD4 T cells isolated from the peripheral blood; 30,000 cells per sample) was smaller than for embryonically labeled mice (cf. Fig. 4c). f, Hierarchical clustering of rank correlations of barcodes from the indicated populations (Exp. 3, n = 129). The color scale (not shown) for rank correlations is identical to the scale bar shown in Fig. 5i.
Extended Data Figure 9. Induction of Polylox recombination in tissues of all three germ layers.
a, To induce Polylox recombination in different tissues in vivo, we crossed the Rosa26Polylox allele into mice bearing the Rosa26CreERT2 allele, which encodes ubiquitously expressed, tamoxifen-regulated Cre, yielding Rosa26Polylox/CreERT2. b, Adult Rosa26Polylox/CreERT2 mice were injected with tamoxifen, or with oil only (vehicle control) according to the depicted schedule, and were analyzed on day 5. c, Genomic DNA was prepared from indicated organs that represent developmental derivatives of all three germ layers: brain (ectoderm), muscle, spleen, and thymus (mesoderm), and liver and lung (endoderm). The Polylox cassette was amplified by PCR, and recombination in each tissue and for all time points was visualized by separating DNA fragments by gel electrophoresis (Supplementary Fig. 1f). The first lane is the PCR water control, the second lane is from Rosa26+/+ DNA template, and the third lane is from Rosa26Polylox (no Cre) template; all other lanes show data from Rosa26Polylox/CreERT2 mice for the indicated organs and conditions. The DNA sample and PCR result from the muscle oil control were not available.
Figure 1. Polylox: A Cre recombinase-driven artificial DNA recombination substrate.
a, Structure of the Polylox cassette with loxP sites (triangles; black and white split symbolizes recombination site). Colored linkers represent DNA segments ‘1’-‘9’. Examples for recombination products resulting from one Cre-mediated excision, and one Cre-mediated inversion are shown. The original code segments (‘letters’) are abbreviated ‘1’-‘9’, and their inversions ‘A’-‘I’. b, In vitro digestion of Polylox DNA insert in pWP-AG vector by Cre recombinase, and size resolution of recombination products by gel electrophoresis (Supplementary Fig. 1a).
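The two recombination outcomes in panel a can be expressed as string operations on the barcode alphabet ('1'-'9' for original segments, 'A'-'I' for their inversions). The sketch below is a toy model, not the authors' code. It assumes, beyond what the caption states, that neighbouring loxP sites alternate in orientation, so that an inversion spans an odd number of segments and an excision removes an even number. Function names are hypothetical.

```python
# Toy model of Cre-mediated recombination on a Polylox barcode string.
FORWARD = "123456789"
INVERTED = "ABCDEFGHI"
# Flipping a segment swaps its forward letter with its inverted letter.
FLIP = {**dict(zip(FORWARD, INVERTED)), **dict(zip(INVERTED, FORWARD))}


def invert(barcode: str, start: int, length: int) -> str:
    """Invert barcode[start:start+length]: reverse the order and flip each segment."""
    # Assumption of this sketch: inversions span an odd number of segments.
    if length <= 0 or length % 2 == 0:
        raise ValueError("inversion must span an odd, positive number of segments")
    if start < 0 or start + length > len(barcode):
        raise ValueError("run lies outside the barcode")
    run = barcode[start:start + length]
    flipped = "".join(FLIP[s] for s in reversed(run))
    return barcode[:start] + flipped + barcode[start + length:]


def excise(barcode: str, start: int, length: int) -> str:
    """Excise barcode[start:start+length] from the barcode."""
    # Assumption of this sketch: excisions remove an even number of segments.
    if length <= 0 or length % 2 != 0:
        raise ValueError("excision must remove an even, positive number of segments")
    if start < 0 or start + length > len(barcode):
        raise ValueError("run lies outside the barcode")
    return barcode[:start] + barcode[start + length:]
```

Under these assumptions, inverting segments 2-4 of the unrecombined cassette gives `invert("123456789", 1, 3) == "1DCB56789"`, and excising segments 3-4 gives `excise("123456789", 2, 2) == "1256789"`. Applying the same inversion twice restores the original barcode.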
Figure 2. Combinatorics of Polylox barcoding.
a, Illustration of stepwise recombinations, considering only the DNA segments and loxP sites in the red box. The decreasing green shades indicate an increase in the minimum number of recombination events required to generate a given barcode. b, Calculation of theoretical numbers of barcodes reached with increasing recombination events in the Polylox locus, with a maximum barcode number of > 1.8 million.
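The idea of a minimum number of recombination events per barcode can be illustrated with a breadth-first search over barcodes. This is a hedged sketch rather than the authors' calculation: it assumes alternating loxP orientation (an assumption not stated in the caption), so that any contiguous run of odd length can be inverted and any contiguous run of even length can be excised. The names `neighbours` and `min_events` are hypothetical.

```python
from collections import deque

FORWARD = "123456789"
INVERTED = "ABCDEFGHI"
FLIP = {**dict(zip(FORWARD, INVERTED)), **dict(zip(INVERTED, FORWARD))}


def neighbours(barcode: str) -> set:
    """All barcodes reachable from `barcode` by one recombination event."""
    n = len(barcode)
    out = set()
    for start in range(n):
        for length in range(1, n - start + 1):
            run = barcode[start:start + length]
            if length % 2 == 1:
                # Odd run: inversion (reverse order, flip orientation).
                middle = "".join(FLIP[s] for s in reversed(run))
            else:
                # Even run: excision (the run is lost).
                middle = ""
            out.add(barcode[:start] + middle + barcode[start + length:])
    return out


def min_events(start: str = FORWARD, max_events: int = 2) -> dict:
    """Breadth-first search: barcode -> minimum number of events, up to max_events."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        barcode = queue.popleft()
        if dist[barcode] == max_events:
            continue  # do not expand beyond the requested depth
        for nb in neighbours(barcode):
            if nb not in dist:
                dist[nb] = dist[barcode] + 1
                queue.append(nb)
    return dist
```

Because breadth-first search visits barcodes in order of increasing depth, the first time a barcode is reached gives its minimum event count. Raising `max_events` grows the state space quickly, so deeper searches are slow in pure Python.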
Figure 3. Polylox barcoding in vivo.
a, Barcodes in splenic B cells isolated 18 hours after induction of recombination by tamoxifen in Rosa26Polylox/CreERT2 mice. Barcodes were ranked according to their generation probability, considering inversions (Pinv) equally likely as excisions (Pexc) (black line), or half as likely (blue line), or twice as likely (red line). Barcode examples are shown with inversions (blue numbers) and deletions (red numbers) underlying barcode generation. The decadic logarithm is used here and throughout the paper. b, Venn diagrams indicating unique and shared barcodes in three independent samples (mice # 1, 2, 3; Extended Data Table 1). c, Generation probabilities of barcodes shared between all mice (n=177 barcodes) (‘All’), between two mice (n=255) (‘2’), and unique barcodes occurring only in one mouse (n=1042) (‘unique’) in the Venn diagrams shown in b (red line, median; box ends, 25% and 75% percentiles).
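The grouping behind panels b and c (barcodes shared by all mice, by two mice, or unique to one mouse) is a set-membership count. The following sketch assumes each mouse is represented by a set of barcode strings; the function name and the example sets are hypothetical and do not reproduce the measured data.

```python
from collections import Counter


def venn_partition(samples: dict) -> dict:
    """Group barcodes by the number of samples in which they occur.

    samples: dict mapping a sample name (e.g. a mouse) to an iterable of barcodes.
    Returns a dict mapping k -> set of barcodes found in exactly k samples.
    """
    seen = Counter()
    for barcodes in samples.values():
        seen.update(set(barcodes))   # count each barcode once per sample
    groups = {}
    for barcode, k in seen.items():
        groups.setdefault(k, set()).add(barcode)
    return groups
```

With three mice, `groups[3]` corresponds to the 'All' category, `groups[2]` to barcodes shared between two mice, and `groups[1]` to the 'unique' category. Use `groups.get(k, set())` when a category may be empty.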
Figure 4. Polylox barcoding in embryonic mice and HSC fate mapping.
a, Barcode induction in emerging HSC in embryonic mice, and analysis of barcodes in single HSC and the indicated populations in adult mice at nine (Exp. 1) and eleven (Exp. 2) months of age. HSC sites (yolk sac, aorta-gonad-mesonephros (AGM), fetal liver and bone marrow) are depicted. b, Frequency distributions of barcodes found in 382 recombined HSC (Exp. 1), and in 427 recombined HSC (Exp. 2) (for numbers of unique barcodes see Extended Data Fig. 4). Redundant length ‘1’ barcodes are not displayed. Light blue, barcodes with Pgen > 10⁻⁴; other colors mark clones with Pgen < 10⁻⁴; there were five such clones in Exp. 1, and nine in Exp. 2. c, Barcode overlay of independent samples from Exp. 1. Venn diagrams indicate numbers of shared and non-shared barcodes between repeat samples of indicated populations. d, Heatmap of barcodes in HSCs (first lane) and the indicated erythroid, myeloid and lymphoid lineages in Exp. 1 (Gr, granulocytes; BM, bone marrow; Sp, spleen; PEC, peritoneal exudate cells; numbers in parentheses indicate independent samples from the same lineage). e, Heatmap of individual HSC clones satisfying embryonic single cell barcode induction criteria, and their lineage output in Exp. 1 and Exp. 2. CMP and GMP were isolated and analyzed only in Exp. 2 (N.D., not done). f, Heatmap of peripheral barcodes (Pgen < 10⁻⁴, and detected in two independent samples of the same population) sorted according to lineage output (Exp. 1). Frequencies of barcodes are represented by color-coded scales on the right for d and e, or for f.
Figure 5. Hierarchical clustering of barcode frequencies in mice induced at embryonic or adult stages.
a, b, Barcode frequencies in erythrocyte progenitors (EryP) versus granulocytes (a), and in EryP versus B lymphocytes (B2) (b) (further population comparisons in Extended Data Fig. 7a-c); data from Exp. 1; each dot represents an individual barcode with n=33 (a) and n=60 (b). c, Spearman rank correlation coefficients ρ, for the comparisons shown in a, b and Extended Data Fig. 7a-c; error bars indicate 95% confidence bounds computed by non-parametric bootstrap (n=33-60 barcodes). d, Hierarchical clustering of rank correlations for the indicated populations analyzed in Exp. 1 (n=60 barcodes, rare and reliably sampled). e, Hierarchical clustering of rank correlations reveals distinct T/B2 and B1 branches (Exp. 1). f, g, Barcode frequency correlations comparing CMP versus EryP (f), and CMP versus granulocytes (g), n=41 barcodes. h, Summary of Spearman rank correlations (mean and 95% confidence bounds) of CMP versus the indicated lineages; data are from Exp. 2, n=34-44 barcodes (for GMP, see Extended Data Fig. 7h). i-k, Barcode induction in adult mice (see Extended Data Fig. 8a). i, Hierarchical clustering of rank correlations of barcodes from the indicated populations (Exp. 5, n=355 barcodes). j, Summary of barcode frequency rank correlations comparing CMP + GMP versus the indicated populations (Exp. 3, n=29-106 barcodes). k, Heatmap of rank correlations comparing barcodes in the indicated lineage-restricted progenitors and mature cells (Exp. 3, n=164 barcodes). All analyses done with rare barcodes (Pgen < 10⁻⁴).
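The statistics named in this caption (Spearman rank correlation ρ, a distance of 1 − ρ for clustering, and confidence bounds from a non-parametric bootstrap) can be sketched with the standard library alone. This is an illustrative implementation under simple assumptions (percentile bootstrap, average ranks for ties), not the authors' analysis code. The clustering step itself is omitted, and all function names are hypothetical.

```python
import random


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks (NaN if either input is constant)."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def rank_distance(x, y):
    """Distance used for clustering in the caption: 1 - Spearman rho."""
    return 1.0 - spearman(x, y)


def bootstrap_bounds(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap bounds for rho, resampling barcodes with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rho == rho:  # skip NaN from degenerate resamples
            stats.append(rho)
    stats.sort()
    lower = stats[int((1 - level) / 2 * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 + level) / 2 * len(stats)))]
    return lower, upper
```

Here `x` and `y` would be the frequencies of the same barcodes in two cell populations, so each resample draws barcodes, not cells.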
