Cell. 2016 May 5;165(4):1012-26.
doi: 10.1016/j.cell.2016.03.023. Epub 2016 Apr 7.

Single-Cell RNA-Seq Reveals Lineage and X Chromosome Dynamics in Human Preimplantation Embryos

Sophie Petropoulos et al. Cell.

Erratum in

Abstract

Mouse studies have been instrumental in forming our current understanding of early cell-lineage decisions; however, similar insights into early human development are severely limited. Here, we present a comprehensive transcriptional map of human embryo development, including the sequenced transcriptomes of 1,529 individual cells from 88 human preimplantation embryos. These data show that cells undergo an intermediate state of co-expression of lineage-specific genes, followed by a concurrent establishment of the trophectoderm, epiblast, and primitive endoderm lineages, which coincides with blastocyst formation. Female cells of all three lineages achieve dosage compensation of X chromosome RNA levels prior to implantation. However, in contrast to the mouse, XIST is transcribed from both alleles throughout the progression of this expression dampening, and X chromosome genes maintain biallelic expression while dosage compensation proceeds. We envision broad utility of this transcriptional atlas in future studies on human development as well as in stem cell research.

Figures

Graphical abstract
Figure 1
Single-Cell RNA-Seq Transcriptome Profiling of Human Preimplantation Embryos (A) Left: quality of single-cell RNA-seq experiments assessed as nearest-neighbor similarities between cells (maximum Spearman correlation per cell, using all cell-pairs and all genes). Right: histogram of the number of expressed genes per cell. Genes with RPKM ≥1 were considered expressed. The histograms were smoothed using a Gaussian kernel. (B) Number of embryos and cells per embryonic day (E3–E7) retained after quality filtering. (C) Expression-level boxplots for ubiquitously expressed Y chromosome genes in male cells, normalized to the median in stage E4–E7. p value, two-sided MWW. (D) Boxplots showing the fraction of transcribed SNPs detected as biallelically expressed in male cells, shown for chromosomes X and 1. p value, two-sided MWW. (E) Two-dimensional t-SNE representation of 1,529 single-cell preimplantation transcriptomes using the 500 most variable genes across all cells (according to Figures S2A and S2B). E3–E7 indicate the embryonic day and E4.late and E5.early indicate cells picked 4–6 hr later and earlier, respectively, than the other cells from that embryonic day. (F) A pseudo-time was assigned to each cell by fitting a principal curve to the cells in the two-dimensional t-SNE subspace (Figure 1E). ICM cells were excluded from the fit to let the principal curve better reflect time and minimize lineage-effects (Supplemental Experimental Procedures). See also Figures S1 and S2.
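The expression criterion in panel (A), where genes with RPKM ≥1 count as expressed, can be illustrated with a minimal sketch. The function name, the gene labels, and the RPKM values below are hypothetical and are not taken from the study's data or code.

```python
def count_expressed_genes(rpkm_by_gene, threshold=1.0):
    """Count genes whose RPKM is at or above the threshold (RPKM >= 1 in the caption)."""
    return sum(1 for rpkm in rpkm_by_gene.values() if rpkm >= threshold)


# Hypothetical RPKM values for one cell; labels are placeholders, not real genes.
cell = {"GENE_A": 0.5, "GENE_B": 1.0, "GENE_C": 12.3}
n_expressed = count_expressed_genes(cell)
```

Applying this per cell yields the values that a histogram such as the one in panel (A, right) would summarize.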
Figure 2
Lineage Segregation of Cells into Inner Cell Mass, Trophectoderm, Epiblast, and Primitive Endoderm (A) t-SNE plot of all cells, as in Figure 1E, showing ICM and TE assignment of cells. Cells from E5 are highlighted in the lower left insert. The ICM-TE cell classification was done using PAM clustering in a PCA dimensionality-reduced sub-space (Figure 2B and Supplemental Experimental Procedures). (B) PCA biplot showing ICM and TE classification of cells from E5. Cells were classified as ICM or TE using PAM clustering in the PCA dimensionality-reduced space with the 250 most variable genes across all non-pre-lineage E5 cells as input (Supplemental Experimental Procedures). Cells in embryos with a pseudo-time <12.5 were assigned as pre-lineage. Genes with high PC loadings are shown. Colors indicate the weighted mean of the expression of previously known lineage markers using weights −1 and 1 for ICM and TE genes, respectively. (C) Heatmap of E5 cells and the top 500 differentially expressed genes between ICM and TE E5 cells (top 250 genes from each lineage). Upper colored bar indicates embryo membership, lower bar indicates lineage. Right-hand-side bars indicate the log2 fold-change of the TE divided by ICM mean-expression level for each gene and embryonic day (E5–E7). (D) PCA biplot showing EPI and PE classification of ICM cells from E5. Cells were classified as EPI or PE using PAM clustering in the PCA dimensionality-reduced space with the 250 most variable genes across all ICM cells that belonged to the right-most hierarchical cell-cluster in Figure 2C (Supplemental Experimental Procedures). Genes with high PC loadings are shown. Colors indicate the weighted mean of the expression of known lineage markers using weights −1 and 1 for EPI and PE genes, respectively. (E) Heatmap of E5 cells and the top 200 differentially expressed genes between EPI and PE E5 cells (top 100 genes from each lineage). 
Upper colored bar indicates embryo membership, lower bar indicates lineage. Right-hand-side bars indicate the log2 fold-change of the PE divided by EPI mean-expression level for each gene and embryonic day (E5–E7). (F) Number of cells (upper table) and lineage-specific genes (lower table) per embryonic day (E5–E7) and lineage. TE, trophectoderm; EPI, epiblast; PE, primitive endoderm. See also Figure S3 and Tables S1 and S2.
Figure 3
Lineage-Specific Genes Relate to Sub-population Cell Fate (A) RPKM expression heatmap of top 25 maintained (E5–E7) lineage-specific genes, from each lineage, across all cells. (B) Boxplot of mean expression level with respect to top 25 maintained lineage-specific genes, from each lineage, stratified by embryo and lineage. The mean expression across genes was calculated after Z score normalization to account for genes being expressed at different scales. (C) Normalized RPKM mean-expression levels and Gene Ontology gene set enrichment results of top 100 lineage-specific genes from each lineage. The mean expression of each gene was calculated per embryonic day and lineage and Z score normalized across those strata. (D) Heatmap of top variable genes within TE cells, stratified by embryonic day. Cells were clustered by PAM-clustering in the PC1 and PC2 subspace. Genes were ordered by hierarchical clustering. (E) Boxplot of TE cells with respect to their mean expression level using 129 polar TE genes that were significant in both E6 and E7, stratified by embryo and polar-mural classification. The mean expression across polar-specific TE genes was calculated after Z score normalization. (F) Boxplot of polar versus mural expression fold-changes within each embryo. (G) CCR7-stained embryo by immunohistochemistry (IHC) (left). Boxplot of CCR7 IHC fluorescence intensity of polar and mural cells (right; p: MWW p value). See also Tables S3 and S4.
Figure 4
Developmental Progression from E3 to E7 Showing the Formation of Blastocyst Lineages (A) Three-dimensional diffusion map representation of all cells, showing lineage assignment and embryonic day, respectively. A total of 94 lineage-specific genes at E5 were used as input (Supplemental Experimental Procedures). DC, diffusion component. (B) Lineage segregation of all 1,529 cells with respect to ICM versus TE. Left: the expression of every cell with respect to lineage-specific genes (axes represent diffusion-components [DC], analogous to principal components). The black line depicts a lineage-separating border that optimally separates the two classes of cells, determined by a support vector machine (Supplemental Experimental Procedures). Right: the y axis indicates the distance from the lineage decision boundary (black line in the left sub-figure). The x axis indicates pseudo-time, as determined in Figure 1F. Each embryo was assigned a time using the mean of the cellular pseudo-times of the cells in that embryo. Each dot below the x axis indicates an embryo, colored by the embryonic day of sampling. (C) As (B) but with respect to EPI versus PE. (D) Gene-gene Pearson’s correlation matrix using the top 100 lineage-specific genes from each lineage. Gene-modules were determined based on hierarchical clustering of the correlation matrix and labeled with representative genes being part of the cluster. (E) Heatmap of expression levels (RPKM) for E3–E5 cells using the top 100 lineage-specific genes from each lineage. Cell groups were ordered according to their pre-determined groups, indicated by the colored dendrogram, and clustered within their respective group (E3, E4, E5.early, E5.mid, and E5.late). E5.mid cells were classified into three sub-groups based on the observed hierarchical clusters (EPI, PE, and TE). Genes were grouped according to observed hierarchical clusters and named based on which type of cells, and at which time point, the genes were expressed. 
(F) RPKM mean expression levels of lineage-specific gene sub-clusters as identified in Figure 4D. Vertical lines indicate 95% non-parametric bootstrap confidence interval across cells (B = 1,000). (G) RPKM expression levels of representative genes from each gene sub-cluster. Vertical lines indicate 95% non-parametric bootstrap confidence interval across cells (B = 1,000). See also Figure S4, Tables S5 and S6, and Movie S1.
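The 95% non-parametric bootstrap confidence intervals with B = 1,000 mentioned for panels (F) and (G) can be sketched as follows. This is a minimal illustration that assumes a percentile bootstrap of the mean across cells; the caption does not state which bootstrap variant was used, and the function name, seed, and RPKM values are hypothetical.

```python
import random
import statistics


def bootstrap_ci_mean(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (assumed variant)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample cells with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical RPKM values of one gene across eight cells.
rpkm = [3.1, 4.7, 5.0, 6.2, 7.9, 8.4, 9.0, 10.5]
ci_low, ci_high = bootstrap_ci_mean(rpkm)
```

A fixed seed keeps the sketch reproducible; the interval bounds are the values that a vertical line in such a plot would span.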
Figure 5
Dosage Compensation of the X Chromosome during Preimplantation Development (A) Distribution of Spearman correlations between gene-expression levels and embryonic day (E4–E7) in female and male cells, for genes located on the X chromosome or autosomes. p values, two-sided MWW. (B–E) Boxplots of female-to-male expression-level ratios of transcribed X chromosome genes, shown for all cells (B) or specific for the TE (C), EPI (D), and PE (E) lineages. Lines intersecting the medians indicate the trend for X chromosome genes, and the green dotted lines around the 1.0-ratio similarly illustrate the medians for autosomal genes. Values above the boxplots denote p values (two-sided MWW), either indicating a significant difference between male and female cells from the same embryonic day (green p values; deviation from one at E3 or E4), or a significant reduction between E4 and a later embryonic day (blue p values). (F) Boxplots showing the distribution of cellular X chromosome RPKM sums for each sex and embryonic day, using a fixed gene set. p value, two-sided MWW. (G) Female-to-male moving expression average along the X chromosome using a 25-nearest-genes window, shown for the stages beyond ZGA completion (E4–E7), and the same for two autosomal chromosomes included for comparison. The ticks below the moving-average lines show the locations of expressed genes included in the estimates, colored according to embryonic day. (H) XIST expression-level boxplots per sex, day and lineage. p values indicate significant differences between male and female expression distributions (two-sided MWW; “ns” denotes not significant). (I) The fraction of cells with XIST RNA expression above indicated thresholds, stratified by sex and stage. See also Figure S5 and Table S7.
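The moving average over a 25-nearest-genes window in panel (G) can be illustrated with a short sketch. It assumes that each gene's window consists of the k genes closest to it in genomic position, including the gene itself; the caption does not specify how the window was constructed, and the function name and inputs are hypothetical.

```python
def moving_average_nearest(positions, values, k=25):
    """Average `values` over the k genes nearest in position to each gene."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    pos = [positions[i] for i in order]
    val = [values[i] for i in order]
    n = len(pos)
    k = min(k, n)
    averages = []
    left = 0  # start index of the current window [left, left + k)
    for i in range(n):
        # Shift the window right while the next gene on the right is closer
        # to gene i than the leftmost gene currently in the window.
        while left + k < n and pos[left + k] - pos[i] < pos[i] - pos[left]:
            left += 1
        averages.append(sum(val[left:left + k]) / k)
    return pos, averages


# Hypothetical gene positions and female-to-male expression ratios.
sorted_pos, smoothed = moving_average_nearest([0, 1, 2, 10, 11], [1, 2, 3, 4, 5], k=3)
```

Because genes are sorted by position, the k nearest genes always form a contiguous block, so a single sliding window suffices.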
Figure 6
Biallelic Expression of XIST and X-linked Genes (A) Scatterplots showing allelic expression levels with the number of reads aligned to the reference and alternative allele on the y and x axis, respectively (shown for 30 random cells from E7 or E4). SNVs with monoallelic expression lie along the axes. Histograms summarize the observed allelic expression ratios of all X chromosome SNVs over all cells, grouped by sex and embryonic day. Chromosome 1 histograms are included for comparison. (B) Allele-specific expression barplots per cell, grouped by embryo, showing the number of reads aligned to the reference and alternative allele, using all female embryos carrying the indicated SNP. Data for a SNP within XIST, as well as SNPs located within six other X-linked genes are shown. Cells without any bar lacked reads spanning the SNP position. Biallelic expression in E7 cells was confirmed for these genes by Sanger sequencing (Figures S6A–S6D). (C) Boxplots showing the proportion of biallelic expression from the X chromosome (chrX) relative to that of autosomes (fraction biallelic chrX SNVs / fraction biallelic autosomal SNVs), shown for female and male E4–E7. Human primary pancreatic alpha cells and in vitro female fibroblasts are included as a control reference, representing somatic cells with conventional XCI. Green dots indicate medians when performing the same analysis on individual autosomal chromosomes (shown for chr1-3). Cells with at least 25 detected chrX SNPs were considered. The panel above the boxplots, “Expression-dose equivalent,” indicates the female-to-male total X chromosome-wide expression dose (median ratio of total expression in Figures 5F and S6G) for stages and cell types for which both female and male data were available (E4 to E7 and pancreatic cells), and the same for chr1-3. See also Figure S6.
Figure 7
Single-Molecule RNA-FISH Confirmed Biallelic Expression of XIST and ATRX (A) Single-molecule RNA-FISH of XIST shown for a female and male E7 embryo. Zoomed-in regions (right) highlight that two XIST clouds (red) were observed in female nuclei (white, Hoechst-stained), but not in male. (B) XIST clouds were localized at the X chromosomes (sex chromosomes were identified via DNA-FISH, staining chrX:p11.1–q11.1). (C) Barplot with RNA-FISH XIST count statistics from 898 female cells (five embryos) and 721 male cells (five embryos), categorized by the XIST localization pattern observed in the nucleus. (D) Left: single-molecule RNA-FISH of ATRX and XIST in a female E7 embryo. Two stronger ATRX speckles were typically observed within the nuclei, positioned at the XIST clouds. Right: DNA-FISH of chromosome X, indicating that the two stronger nuclear ATRX dots localized to the X chromosomes. (E) Boxplots of E7 RNA-seq and RNA-FISH ATRX expression levels. RNA-FISH counts confirmed that the expression levels of ATRX in female and male were on par (mean 8.9 and 8.0; median 8 and 7, respectively), indicating dosage compensation at E7.
Figure S1
Sex Determination of Human Preimplantation Embryos, Related to Figure 1 (A) Histogram showing Y chromosome RPKM sum per cell on the x axis and cell frequency on the y axis. Based on the modality of this distribution, we classified cells with a Y chromosome RPKM sum below 50 as female, and above 100 as male (Supplemental Experimental Procedures). (B) Histogram of X chromosome RPKM sum per cell. (C) Histogram of autosomal RPKM sum per cell. (D–F) Barplots of chromosomal RPKM sums for sex-classified cells. Color indicates embryo. The expression from all genes located on each respective chromosome was used. (G) Moving expression average using a 25-nearest-genes window along the X chromosome for a female embryo with suspected X0 karyotype (E5.early.31) relative to the female E5 (red line) or male E5 (blue line) expression of other embryos. Based on its suspected X0 karyotype, embryo E5.early.31 was excluded from all further dosage compensation analyses. (H) Expression-level boxplots for ubiquitously expressed Y chromosome genes per cell in cryo-preserved male E3 and cryo-preserved male E4 embryos, normalized to the median in stage E4-E7 (as in main Figure 1C). p value: two-sided Wilcoxon test. (I) Boxplots showing the fraction of X-linked SNPs detected as biallelically expressed per cell in cryo-preserved male E3 and cryo-preserved male E4 embryos (as in main Figure 1D). p value: two-sided Wilcoxon test.
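The classification rule in panel (A) can be written as a small sketch. The thresholds of 50 and 100 come from the caption; the "unclassified" label for cells falling between them is an assumption, since the caption does not say how such cells were handled, and the function name is hypothetical.

```python
def classify_sex(y_rpkm_sum, female_max=50.0, male_min=100.0):
    """Classify a cell by its Y chromosome RPKM sum (thresholds from the caption)."""
    if y_rpkm_sum < female_max:
        return "female"
    if y_rpkm_sum > male_min:
        return "male"
    # Assumption: cells between the two thresholds are left unclassified.
    return "unclassified"


labels = [classify_sex(s) for s in (10.0, 75.0, 250.0)]
```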
Figure S2
Identification of the Most Variable Genes and Temporal Separation of Preimplantation Single-Cell Transcriptomes, Related to Figure 1 (A) Gene mean expression versus squared coefficient of variation (CV²) of all RefSeq genes. Red dots indicate genes ranked as being among the 500 most variable genes. Black dots indicate ERCC spike-in transcripts. Red line represents a fit against the ERCC transcripts, indicating the technical variability, and dotted line represents a biological variability of CV = 0.5, added to the technical one. (B) Variability test-statistic of every expressed RefSeq gene, derived from the mean-variance relationship in Figure S2A (Supplemental Experimental Procedures), versus the gene-rank; where rank was obtained by ordering by the variability test-statistic. Red dots indicate genes ranked as being among the 500 most variable genes. (C) Principal component analysis of all 1,529 cells using the 500 most variable genes. (D) Diffusion map of all 1,529 cells using the 500 most variable genes.
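The squared coefficient of variation plotted in panel (A) is the variance of a gene's expression divided by the square of its mean. The sketch below computes only this quantity, assuming the sample variance (n − 1 denominator); it does not reproduce the ERCC-based technical-noise fit or the variability test statistic, and the function name and values are hypothetical.

```python
import statistics


def squared_cv(expression):
    """Squared coefficient of variation: sample variance divided by mean squared."""
    mean = statistics.fmean(expression)
    if mean == 0:
        return float("nan")  # undefined for genes with zero mean expression
    return statistics.variance(expression) / mean ** 2


# Hypothetical expression values of one gene across three cells.
cv2 = squared_cv([2.0, 4.0, 6.0])
```

For reference, a CV of 0.5, the biological variability level drawn as the dotted line, corresponds to a squared CV of 0.25.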
Figure S3
Lineage Segregation of Cells into Inner Cell Mass, Trophectoderm, Epiblast, and Primitive Endoderm, Related to Figure 2 (A) PCA biplot showing ICM and TE classification of cells from E6. Cells were classified as ICM or TE using PAM clustering in the PCA dimensionality-reduced space with the 250 most variable genes across all E6 cells as input (Supplemental Experimental Procedures). Genes with high PC loadings are shown. Colors indicate the weighted mean of the expression of known lineage markers, listed above the color bar, using weights −1 and 1 for ICM and TE genes, respectively. (B) Heatmap of E6 cells and the top 200 differentially expressed genes between ICM and TE E6 cells (top 100 genes from each lineage). The upper colored bar indicates lineage-classification of each cell, as determined in (A). Right-hand-side bars indicate the log2 fold-change of the TE divided by ICM mean-expression level for each gene and embryonic day (E5-E7). (C and D) As in (A) and (B) but with respect to E7 cells. (E and F) As in (A) and (B) but with respect to E6 ICM cells, contrasting EPI and PE cells. (G and H) As in (E) and (F) but with respect to E7 ICM cells, contrasting EPI and PE cells. (I) Expression for a selection of top-ranked lineage-specific marker genes stratified by embryo and lineage. Each dot represents the expression in a single cell and vertical lines segregate embryos. Horizontal lines indicate mean expression level per lineage within each embryo.
Figure S4
Preimplantation Developmental Progression of Lineage-Specific and Sex-Specific Genes, Related to Figure 4 (A) The number of significantly differentially expressed genes between embryonic time-points. From E5 to E7 the differential expression analysis was done within lineages. (B) Gene expression variability within each embryo versus developmental time (Supplemental Experimental Procedures). Each dot represents an embryo. (C) Gene-gene Pearson correlations among the top 300 maintained lineage genes (100 from each lineage). Titles indicate that genes specifically expressed in each of the two listed lineages were correlated with each other. (D) Pearson correlation within ICM cells against developmental stage for gene-pairs selected among lineage-specific genes with the strongest anti-correlation. (E) The number of significantly differentially expressed genes between females and males at each embryonic day, stratified by the genes’ chromosomal location: autosome (chrA), chromosome X (chrX) and chromosome Y (chrY). (F) The number and percentage of genes with fold-change (FC; female versus male cells) ≥ 2. Error-bars indicate standard deviation obtained by bootstrap resampling of cells (n = 100). Red and blue lines represent genes with higher expression in female and male, respectively. (G) RPKM expression levels of the testis-determining factor SRY. (H) RPKM stage-wise mean expression levels of X- and Y-linked paralogous gene pairs that were significantly differentially expressed. Error-bars indicate 95% confidence interval. (I) Pearson correlation between male stage-wise mean expression levels of X- and Y-linked paralogous gene pairs. Error-bars indicate 95% confidence interval.
Figure S5
Detection of XACT and XIST RNA in Human Preimplantation Cells, Related to Figure 5 (A) XACT lncRNA expression-level boxplots per sex and lineage. p values were derived from comparing the expression distributions (two-sided Wilcoxon test). (B) Barplots showing the average expression level (RPKM) within 1 kb bins along a segment of the X chromosome, for female and male E4 cells. A broad peak of mapped reads appears at the XACT-gene sequence (chrX:112,983,323-113,235,148). (C) Mapped sequence reads, from a female E7 cell, aligned to the genomic region where XIST (minus strand) and TSIX (plus strand) overlap. This shows the lack of TSIX-mapping reads (no reads in TSIX-unique segments) as well as the biallelic expression of an XIST SNP (marked as a red-blue bar). (D) Expression-level (RPKM) boxplots for XIST and TSIX in male and female cells (including all cells and embryonic days), calculated from two non-overlapping XIST and TSIX sequences corresponding in length (11 kb 5′ sequence of each gene). XIST had 431-fold higher expression than TSIX at stage E7, indicating that the biallelic detection of the XIST SNP was not TSIX-derived. (E) Exon-intron structure of human XIST and TSIX, with the 11 kb 5′ sequences used in (D) indicated.
Figure S6
Control Experiments for Allelic Detections, Related to Figure 6 (A) To validate the accuracy of the RNA-seq data SNP calling, and to confirm the biallelic X chromosome expression with an alternative detection method, we performed Sanger sequencing on amplicons from female E7 cDNA libraries. This analysis was performed for each of the SNPs and genes presented in Figure 6B, using 4 separate single-cell libraries per gene. All of the cDNA libraries tested by the Sanger sequencing were confirmed to be biallelic for the evaluated SNP (“Conf. rate”), and the chromatograms for two example cells are shown for each SNP. The code above each chromatogram denotes the cell ID (embryonic day, embryo ID, cell ID). (B) We further evaluated two SNPs located within the same gene (PDHA1) and PCR amplicon, for which one embryo had heterozygous expression at both SNPs and another embryo had heterozygous expression at only one of the two SNPs according to our single-cell RNA-seq data (allelic RNA-seq data for these embryos is shown in D). The Sanger sequencing confirmed this pattern. (C) For one gene, TSPAN6, we additionally generated two separate amplicons for Sanger sequencing. Cells that had heterozygous expression for the two different SNPs located within these disjoint amplicons (according to the single-cell RNA-seq data, shown in D) were also confirmed to have heterozygous expression at both SNPs by Sanger sequencing. (D) Allele-expression barplots (as in main Figure 6B) shown for embryos from which single-cells were used for the double Sanger validations presented in (B) and (C). (E) Allele-level expression boxplots of mouse preimplantation cells from different stages, showing the paternal (C57BL/6J) / maternal (CAST/EiJ) expression ratio per cell on the y axis. This indicates that paternal X chromosome inactivation reached ∼60% completion at the mouse early blastocyst stage. The plots in (E) represent a re-analysis of our previously reported data (Deng et al., 2014a), but using the same threshold for calling monoallelic expression as used in the current study (Supplemental Experimental Procedures). (F) Boxplots of allele-resolved mouse expression data, showing the ratio of biallelic expression of chromosome X relative to that of autosomes (fraction biallelic chrX SNPs / fraction biallelic autosomal SNPs), at different stages following the zygotic genome activation. (G) Boxplots showing the distribution of cellular X chromosome RPKM sums for female and male primary pancreatic alpha cells, used as positive control for conventional XCI. This indicates that the X chromosome dose is balanced in these somatic cells, as expected due to XCI. (H) Expression-level boxplots of XIST in female and male primary pancreatic alpha cells, used as positive control for conventional XCI.

Comment in

References

    Blakeley P., Fogarty N.M., Del Valle I., Wamaitha S.E., Hu T.X., Elder K., Snell P., Christie L., Robson P., Niakan K.K. Defining the three cell lineages of the human blastocyst by single-cell RNA-seq. Development. 2015;142:3151–3165.
    Brennecke P., Anders S., Kim J.K., Kołodziejczyk A.A., Zhang X., Proserpio V., Baying B., Benes V., Teichmann S.A., Marioni J.C., Heisler M.G. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods. 2013;10:1093–1095.
    Clemson C.M., Chow J.C., Brown C.J., Lawrence J.B. Stabilization and localization of Xist RNA are controlled by separate mechanisms and are not sufficient for X inactivation. J. Cell Biol. 1998;142:13–23.
    Cockburn K., Rossant J. Making the blastocyst: lessons from the mouse. J. Clin. Invest. 2010;120:995–1003.
    Deng Q., Ramsköld D., Reinius B., Sandberg R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science. 2014;343:193–196.

Publication types