Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2016 Jul 14;535(7611):289-293.
doi: 10.1038/nature18633. Epub 2016 Jul 6.

Resolving early mesoderm diversification through single-cell expression profiling

Affiliations

Resolving early mesoderm diversification through single-cell expression profiling

Antonio Scialdone et al. Nature. .

Abstract

In mammals, specification of the three major germ layers occurs during gastrulation, when cells ingressing through the primitive streak differentiate into the precursor cells of major organ systems. However, the molecular mechanisms underlying this process remain unclear, as numbers of gastrulating cells are very limited. In the mouse embryo at embryonic day 6.5, cells located at the junction between the extra-embryonic region and the epiblast on the posterior side of the embryo undergo an epithelial-to-mesenchymal transition and ingress through the primitive streak. Subsequently, cells migrate, either surrounding the prospective ectoderm contributing to the embryo proper, or into the extra-embryonic region to form the yolk sac, umbilical cord and placenta. Fate mapping has shown that mature tissues such as blood and heart originate from specific regions of the pre-gastrula epiblast, but the plasticity of cells within the embryo and the function of key cell-type-specific transcription factors remain unclear. Here we analyse 1,205 cells from the epiblast and nascent Flk1(+) mesoderm of gastrulating mouse embryos using single-cell RNA sequencing, representing the first transcriptome-wide in vivo view of early mesoderm formation during mammalian gastrulation. Additionally, using knockout mice, we study the function of Tal1, a key haematopoietic transcription factor, and demonstrate, contrary to previous studies performed using retrospective assays, that Tal1 knockout does not immediately bias precursor cells towards a cardiac fate.

PubMed Disclaimer

Conflict of interest statement

statement The authors declare no competing financial interests.

Figures

Extended Data Figure 1
Extended Data Figure 1. Fluorescence activated cell sorting of single cells.
a) Flk1+ cells were sorted from 3 embryos at S and HF stages and 4 embryos at NP stage. Labels such at ‘S4’ refer to the embryo number in the metadata available online at http://gastrulation.stemcells.cam.ac.uk/scialdone2016. b) CD41+Flk1- cells (red gate) and CD41+Flk1+ (green gate) cells were sorted from 8 embryos each at NP and HF stages. Each stage was sorted on two occasions. Labels above FACS plots refer to the sort and embryo number in the metadata available online, as above. In all plots, pink text indicates the percentage of cells in that gate.
Extended Data Figure 2
Extended Data Figure 2. Quality control of single cell RNA-seq data.
a) Table showing numbers and estimates of numbers of cells of different phenotypes present in embryos between E6.5 and E7.75 (HF stage) and numbers sorted for this study. (*) Total cell numbers for E6.5 are from Beddington and Robertson (1999) and (§) total numbers and numbers of Flk1+ cells are from Moignard et al., (2015). Percentages of cells expressing Flk1 and/or CD41 at NP and HF stages are the average values from the embryos used in this study and were used to calculate the estimated numbers present in embryos from the total cell numbers. ND, not done. b) t-SNE representation of the five metrics used to assess the quality of the samples. Only cells that passed all criteria (blue circles) were used for downstream analysis. c) Squared coefficient of variation (CV2) as a function of the mean normalised counts (µ) for genes across all cells. The green line shows the fit CV2 = a1/μ + α0. All highly variable genes (with an adjusted p-value < 0.1) are marked by red circles. d) Number of genes detected (i.e., with more than 10 normalised read counts) in cells across the different clusters in the WT (left panel) and the Tal1-/- (right panel) mice.
Extended Data Figure 3
Extended Data Figure 3. Identifying cell clusters.
The Dynamic Hybrid Cut algorithm was used with all the possible values of the “deepSplit” parameter on 100 bootstrapped subsamples. To assess the quality of the clustering, the Pearson gamma (a) and the average silhouette width (b) were calculated. Higher values of these parameters correspond to better clustering. The Pearson gamma represents the correlation between the dissimilarity of samples and a binary variable that equals 1 for pairs of samples in the same cluster and 0 for samples in different clusters. The average silhouette width measures the average separation between neighboring clusters,. At “deepSplit”=2 the Pearson gamma is highest whereas the average silhouette width begins to decrease. This suggests that at such a value of the “deepSplit” parameter a good compromise between robustness and sensitivity is achieved. The Pearson gamma and the average silhouette width were computed with the R function “cluster.stats” in the “fpc” package (version 2.1-9). (c-f) Examples of marker genes for four clusters: Mesp1 for cluster 4 (top-ranked marker) (c), Sox18 for cluster 6 (second-ranked) (d), Hbb-bh1 for cluster 8 (fourth-ranked) (e) and Alx1 for cluster 10 (top-ranked) (f).The y-axis shows the log10 normalised expression of the genes. g) Dendrogram showing the clustering of the cells in the first experiment. The colors at the bottom indicate the cluster each cell was assigned to by the dynamic hybrid cut algorithm. Cluster assignment was used to sort cells in Figure 1b. h) Identities were assigned to the 10 clusters in Figure 1c based on the expression of key genes associated with various mesodermal lineages or spatial locations within the embryo.
Extended Data Figure 4
Extended Data Figure 4. Expression of key marker genes in E7.0-7.75 embryos.
Schematic representations of expression patterns were generated from published in situ hybridization data (see citations) for key markers of clusters 1 (magenta, visceral endoderm59), 2 (pink, extra-embryonic ectoderm61), 6 (red, YS endothelium62), 9 (black, allantois63,64) and 10 (green, second heart field65,66). Anterior is shown on the left and posterior on the right. Also shown is the t-SNE for all cells or cells from E7.0 onwards (S, NP and HF stages) indicating expression of each gene (white, low; purple, high).
Extended Data Figure 5
Extended Data Figure 5. Expression of key genes used for sorting single cells.
a)t-SNE as in Figure 1 showing the sorting strategy for each of the 1,205 cells. b)Expression of Flk1 (Kdr), CD41 (Itga2b), Scl (Tal1), Gata2 and T (Brachyury) superimposed onto the t-SNE. c) t-SNE showing only the cells from S, NP and HF stages, colored according to cluster as in Figure 1c and E. d) t-SNE for the 481 E6.5 cells in cluster 3, as in Figure 2a. Each point is colored by expression of T and Foxa2.
Extended Data Figure 6
Extended Data Figure 6. Pseudospace analysis of cluster 4 correlates with anterior-posterior position along the primitive streak.
a) Diffusion plot of cells in cluster 4. Different colors correspond to different plates and different lanes of flow cells. b) Table showing the number of cells in each stage analysed on the different lanes of flow cells (S, Primitive streak; NP, Neural plate; HF, Head fold). c) A direction in the diffusion space can be identified by the angle α that it forms with the first diffusion component (left panel). For each value of α the right panel shows the percentage of variance explained by the batch effect associated to plates SLX-8408 and SLX-8409 (orange line) and plates SLX-8410 and SLX-8411. α1 and α2 are the angles corresponding to directions that correlate the least with the batch effect (i.e., variance explained by the batch effect is minimum). d) The use of alternative dimensionality reduction techniques results in the identification of highly correlated pseudo-space coordinates. A t-SNE projection of the dissimilarity matrix was carried out (perplexity set to 50), and the direction corresponding to the pseudo-space coordinate was estimated by minimizing the correlation with the batch effect (left panel; Spearman correlation between the two pseudo-space coordinates 0.79, p-value < 2.2x10-16). Independent Component Analysis was performed on the dissimilarity matrix with the “fastICA” R function, and 3 independent components (corresponding to the 2 batch effects and the biological effect) were estimated. The presumptive pseudo-space coordinate is the component having the smallest correlation with the batch effects (right panel; Spearman correlation coefficient is 0.97, p-value < 2.2x10-16). e) Plots showing the average expression of genes in clusters 1-3 of Figure 3c along the pseudospace axis. Gene expression levels are normalised between 0 and 1. Black lines indicate the normalised mean expression levels of genes in each cluster as obtained from the fitting procedure and red shaded area indicates standard deviation. f) Expression of T as function of the pseudospace coordinate. g) Gene expression levels for example genes showing high-low-high expression pattern across the blue cluster. In B and C, putative anterior cells are to the left and posterior to the right as in C. Each dot represents a cell and red lines indicate fits based on local polynomial functions (see Methods). h) We performed Principal Component Analysis (PCA) on the cells in cluster 4 by using markers of presomitic mesoderm as anterior mesoderm markers and genes expressed in haemato-vascular and allantoic mesoderm as posterior markers–, as well as Podxl which was shown to separate distinct Flk1+ mesodermal lineages. The first component explained 36% of the total variance and was highly correlated with the pseudo-space coordinate (left panel; Spearman rank correlation 0.84, p-value < 2.2x10-16). All the anterior markers were negatively correlated with the pseudo-space coordinate, whereas all posterior markers had a positive correlation (right panel).
Extended Data Figure 7
Extended Data Figure 7. Expression of key genes along the anterior-posterior axis of the primitive streak in E7.0-7.75 embryos.
Schematic representations of gene expression were generated from published in situ hybridization data (see citations) for key markers of clusters 4 (blue, mesoderm) and 7 (yellow, posterior mesoderm/blood progenitors). Expression of T (Brachyury) and Flk1 (Kdr – from in house data) are shown to illustrate the extent of the primitive streak at E7.5. Lefty2 and Tbx6 are expressed in the putative anterior portion of cluster 4 and in more anterior regions of the primitive streak in in situ analysis. Tbx3 and Bmp4 are expressed in the more posterior portion of cluster 4 and in the embryo are expressed in the more posterior region of the primitive streak around the amnion and into the extra-embryonic mesoderm. Tek and Fli1 (from in-house data) are expressed in cluster 7 and in the embryo are found exclusively in the extra-embryonic portion. Also shown is the t-SNE for the cells from E7.0 onwards (S, NP and HF stages) indicating expression of each gene (white, low; purple, high).
Extended Data Figure 8
Extended Data Figure 8. Pseudotime analysis of primitive erythroid development.
a) Diffusion plot of cells in cluster 7 and 8. Different colors correspond to different plates and lanes of flow cells. b) Table showing the number of cells in each stage collected on the different plates (S, primitive streak; NP, Neural plate; HF, Head fold). c) Analogously to Extended Data Fig. 6, the angle α identifies a direction in the diffusion space (left panel). The percentage of variance explained by the batch effect associated to plates SLX-8344 and SLX-8345 is plotted as function of α in the right panel. d) The pseudo-time coordinate is robust to the use of different dimensionality reduction techniques, as shown in the left panel with t-SNE (Spearman correlation 0.92, p-value < 2.2x10-16) and in the right panel with Independent Component Analysis (Spearman correlation 0.97, p-value < 2.2x10-16; same procedure described in Extended Data Fig. 6d). e) Plots showing the average expression of genes in clusters 1-3 of Figure 4c along the pseudotime axis. Gene expression levels are normalised between 0 and 1. Black lines are the average expression levels of genes in each cluster as obtained from the fitting procedure, after normalization. Red shaded areas indicate standard deviation. f) Principal Component Analysis was carried out on the expression pattern of genes known from previous studies to be up-regulated or down-regulated along the blood developmental trajectory,–. The first principal component (explaining 44% of total variance) showed a very strong correlation with the pseudo-time coordinate (left panel; Spearman correlation coefficient 0.91, p-value < 2.2x10-16). All up-regulated (down-regulated) genes positively (negatively) correlate with the pseudotime coordinate (right panel).
Extended Data Figure 9
Extended Data Figure 9. ChIP-seq for Gata1 in ESC-derived haematopoietic cells.
a) Flow cytometry for Gata1-mCherry and Runx1-IRES-GFP knock-in reporter genes in embryoid body cells after 5 days of haematopoietic differentiation. Cells were sorted for the expression of both Runx1-IRES-GFP and Gata1-mCherry knock-in reporter genes to provide in vitro equivalents of the developing primitive erythrocytes assayed by RNA-seq. The gate used for sorting is shown in red. b) Numbers of reads and peaks identified for Gata1 and an input sample after mapping and peak calling. 4,135 Gata1 peaks were identified. c) Distribution of Gata1 peaks between promoter, intragenic and intergenic sequences. d) UCSC genome browser tracks for Gata1 and input sample at the Zfpm1 (Fog1) locus known to be a target of Gata1, indicating the quality of the ChIP-seq data. e) Expression of Gata1 target Zfpm1 during the pseudotimecourse for erythroid development, as in Figure 4.
Extended Data Figure 10
Extended Data Figure 10. Collection of embryos from Tal1 LacZ/+ crosses.
a) Genotyping PCR for embryos from Tal1 LacZ/+ crosses. Lower band is the wild type allele and upper band is the mutant allele carrying a neomycin knock in. Presence of both bands indicates heterozygosity. Embryos from which sequencing data were obtained are indicated with a red star and the number given corresponds to embryo identity in the metadata available online with the sequencing data. b) t-SNE as in Figure 5d showing Tal1 data (triangles) and original wild type data (grey circles). Tal1 data are colored according to the embryo stage from which they were collected: green, NP; red, HF; orange, 4SP. c) As in Figure 5d, showing the complete list of genes. d) Gene Set Control Analysis (GSCA) was used to identify statistically significant overlaps between genes significantly down-regulated in Tal1-/- compared with WT cells in the endothelial cluster (see Figure 5) and Tal1 targets identified by ChIP-seq. GSCA identified an enrichment of our gene set with Tal1 ChIP-seq in ESC-derived haemangioblasts and haemogenic endothelium, but not ESC-derived haematopoietic progenitors or a haematopoietic progenitor cell line.
Figure 1
Figure 1. Single cell transcriptomics identifies 10 populations relevant to early mesodermal development.
a) Whole mount images and schematics of E6.5 to E7.75 embryo sections. Colors indicate approximate locations of sorted cells. Anterior, left; posterior, right. Scale bars: 200µm. b) Heatmap showing key genes distinguishing 10 clusters. Colored bars indicate assigned cluster (top), stage (middle: turquoise, E6.5; purple, primitive streak (S, E7.0); green, neural plate (NP, E7.5); red, headfold (HF, E7.75)) and the sorted population (bottom: green, E6.5 epiblast; blue, Flk1+; turquoise, Flk1+CD41+; red, Flk1-CD41+). c) t-SNE of all 1205 cells colored by embryonic stage, and d) according to clusters in (b).
Figure 2
Figure 2. Transcriptional program associated with T induction in E6.5 epiblast cells.
a) t-SNE of the 481 E6.5 cells in cluster 3. Points are colored by expression of T (Brachyury) and Mixl1, Mesp1 and Frzb. b) Heatmap showing the 10 genes most highly positively and negatively correlated with T (Supplementary Information Table 1). c) Forward scatter (FSC) for the 481 E6.5 epiblast cells in cluster 3, with cells grouped according to T/Mesp1 expression. Boxplots indicate the median and interquartile range. P-values were calculated using a two-sided Welch’s t-test for samples with unequal variance, with FDR correction for multiple testing.
Figure 3
Figure 3. Dimensionality reduction reveals transcriptional profiles associated with cell location in the embryo.
a) Schematic of tissue emergence along the anterior-posterior primitive streak, derived from . Mesodermally- and endodermally-derived tissues are marked by a red and green line, respectively. b) Diffusion map of 216 cells in cluster 4 with pseudospace axis in red. Projections onto this axis represent pseudospace coordinates. c) Heatmap for differentially expressed genes along the pseudospace axis, showing genes more highly expressed in the anterior (dark blue) and posterior region (light blue), or highly expressed at either end (aquamarine). d) Expression profiles for example genes.
Figure 4
Figure 4. Inferring the transcriptional program underlying primitive erythropoiesis.
a) Diffusion map of 271 cells in clusters 7 and 8 displaying the inferred pseudotime axis (blue). b) Expression of Hbb-bh1 ordered by pseudotime (local polynomial fit). c) Heatmap ordered along the pseudo-time axis. Horizontal bars indicate cluster and developmental stage. Genes shown were repressed (grey), activated (green) or transiently expressed (blue). d) Examples of activated and repressed genes and e) Gata1 as in (b). f) UCSC Browser tracks for Gata1 ChIP-seq and input in Runx1+Gata1+ cells; the Nfe2 locus is shown. g) Percentage of genes in each group identified in C overlapping Gata1 targets.
Figure 5
Figure 5. Analysis of Tal1-/- embryos suggests independent fate acquisition.
a) Schematic of two cell fate diversification models. b) Tal1 in situ hybridization at HF stage. Scale bar: 200µm. c) Flow cytometry of WT and Tal1-/- mice at HF and E8.25. d) Blood program genes are differentially expressed between nascent mesoderm (blue) and endothelial (red) and blood cells (brown). Differential expression between 45 Tal1-/- and 59 WT endothelial cells (lower left t-SNE) identified 50 down-regulated genes. Gene set overlap (central panel) indicates failure to induce the blood program in Tal1-/- endothelium (p < 2.2 x 10-16, Fisher's test). On the right are expression distributions for selected genes in wild type (black) or Tal1-/- (grey) endothelial cells. e) For genes previously reported to be bound and activated (left panel) or bound and repressed (right panel) by Tal1, fold change between Tal1-/- and WT endothelium (defined in D) is plotted against average expression. Red circles: genes with a fold change > 1.5 and an FDR < 0.05.

References

    1. Lawson KA, Meneses JJ, Pedersen RA. Clonal analysis of epiblast fate during germ layer formation in the mouse embryo. Development. 1991;113:891–911. - PubMed
    1. Van Handel B, et al. Scl represses cardiomyogenesis in prospective hemogenic endothelium and endocardium. Cell. 2012;150:590–605. - PMC - PubMed
    1. Org T, et al. Scl binds to primed enhancers in mesoderm to regulate hematopoietic and cardiac fate divergence. EMBO J. 2015;34:759–77. - PMC - PubMed
    1. Ema M, et al. Primitive erythropoiesis from mesodermal precursors expressing VE-cadherin, PECAM-1, Tie2, endoglin, and CD34 in the mouse embryo. Blood. 2006;108:4018–24. - PubMed
    1. Mikkola HKA, Fujiwara Y, Schlaeger TM, Traver D, Orkin SH. Expression of CD41 marks the initiation of definitive hematopoiesis in the mouse embryo. Blood. 2003;101:508–16. - PubMed

Publication types

MeSH terms

Substances