Bioinformatics. 2012 Sep 15;28(18):i626-i632.
doi: 10.1093/bioinformatics/bts385.

A novel approach for resolving differences in single-cell gene expression patterns from zygote to blastocyst

Florian Buettner et al. Bioinformatics.

Abstract

Motivation: Single-cell experiments on cells from the early mouse embryo yield gene expression data for developmental stages from zygote to blastocyst. To better understand cell fate decisions during differentiation, it is desirable to analyse these high-dimensional data and assess differences in gene expression patterns both between and within developmental stages. Conventional methods include univariate analyses of the distributions of individual genes at different stages and multivariate linear methods such as principal component analysis (PCA). However, these approaches often fail to resolve important differences: each lineage has a unique gene expression pattern that changes gradually over time, so expression differs between developmental stages and is heterogeneously distributed within a given stage. Furthermore, to date, no approach that takes the temporal structure of the data into account has been presented.
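
As a point of reference for the linear baseline mentioned above, the following is a minimal sketch of PCA computed via a singular value decomposition. The function name `pca_embed` and the synthetic 100 x 48 matrix are illustrative assumptions and are not taken from the paper or its data.

```python
import numpy as np

def pca_embed(expr, n_components=2):
    """Project a cells x genes matrix onto its leading principal components."""
    centred = expr - expr.mean(axis=0)            # centre each gene (column)
    # SVD of the centred matrix; the rows of vt are the principal axes
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T        # low-dimensional coordinates
    explained = (s ** 2) / np.sum(s ** 2)         # variance fraction per component
    return scores, explained[:n_components]

# Synthetic stand-in for an expression matrix (assumption: 100 cells, 48 genes)
rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 48))
scores, explained = pca_embed(expr)
```

Because the projection is linear, each embedding axis is a fixed weighted sum of genes, which is what makes PCA easy to interpret but limits what it can separate.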

Results: We present a novel framework based on Gaussian process latent variable models (GPLVMs) to analyse single-cell qPCR expression data of 48 genes from mouse zygote to blastocyst, as presented by Guo et al. (2010). We extend GPLVMs by introducing gene relevance maps and gradient plots to provide interpretability as in the linear case. Furthermore, we take the temporal group structure of the data into account and introduce a new factor in the GPLVM likelihood which ensures that small distances are preserved for cells from the same developmental stage. Using our novel framework, it is possible to resolve differences in gene expression for all developmental stages. Furthermore, a new subpopulation of cells within the 16-cell stage is identified which is significantly more trophectoderm-like than the rest of the population. The trophectoderm-like subpopulation was characterized by considerable differences in the expression of Id2, Gata4 and, to a smaller extent, Klf4 and Hand1. The relevance of Id2 as an early marker for TE cells is consistent with previously published results.
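
The idea behind a gradient-based gene relevance measure can be illustrated with a small sketch: given a Gaussian process mapping from latent coordinates to expression values, differentiate the posterior mean with respect to the latent position and take the norm of the gradient for each gene. This is not the authors' implementation; the RBF kernel, the fixed hyperparameters, the function names and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rbf(a, b, length_scale):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / length_scale ** 2)

def gp_mean(x_star, X, Y, length_scale=1.0, noise=1e-2):
    """Zero-mean GP posterior mean of every gene at one latent point x_star."""
    K = rbf(X, X, length_scale) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, Y)                        # shape (N, D)
    return rbf(x_star[None, :], X, length_scale)[0] @ alpha

def gene_gradient_norms(x_star, X, Y, length_scale=1.0, noise=1e-2):
    """Norm of d(mean)/d(x_star) for every gene (one value per column of Y)."""
    K = rbf(X, X, length_scale) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, Y)                        # shape (N, D)
    k_star = rbf(x_star[None, :], X, length_scale)[0]    # shape (N,)
    # Analytic derivative of the RBF kernel with respect to x_star
    dk = -(x_star[None, :] - X) / length_scale ** 2 * k_star[:, None]
    jac = dk.T @ alpha                                   # shape (q, D)
    return np.linalg.norm(jac, axis=0)

# Synthetic example (assumption): 2D latent space, three hypothetical genes
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(40, 2))
Y = np.column_stack([np.sin(X[:, 0]), X[:, 1] ** 2, np.zeros(40)])
x0 = np.array([0.3, -0.5])
norms = gene_gradient_norms(x0, X, Y)
```

Evaluating such norms over a grid of latent positions would give a map of where each gene changes most strongly, which is the intuition behind a relevance map.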

Availability: The mappings were implemented based on Prof. Neil Lawrence's FGPLVM toolbox(1); extensions for relevance analysis and for including the structure of the data can be obtained from the homepage of one of the authors.(2)

Contact: f.buettner@helmholtz-muenchen.de.


Figures

Fig. 1.
The totipotent blastomere differentiates first into inner and outer cells. Next, after approximately 3.5 days, the ICM differentiates into PE cells and EPI cells (A). The data-driven illustration is shown on the right-hand side. For PCA (panel B), differentiation into ICM and TE can be seen, followed by differentiation from ICM into PE and EPI. ICM and PE/EPI as well as early cell stages could not be resolved. For our novel approach (bottom right), all developmental stages could be resolved and a new TE-like subpopulation at the 16-cell stage was discovered. The dashed arrows reflect that the lower subpopulation at the 16-cell stage is significantly more TE-like than the other
Fig. 2.
Standard PCA (a) and ICA (b) for all cells from 1 to 64 cell stage
Fig. 3.
(a) GPLVM for all cells from 1 to 64 cell stage. The uncertainty corresponding to the probabilistic mapping from latent space to data space is colour-coded (high SD dark, low SD light); (b) nearest neighbour errors for the original high-dimensional space and three embeddings in 2D of all cells from 1 to 64 cell stage
Fig. 4.
GPLVM for all cells from 2 to 64 cell stage. (a) Standard GPLVM. The nearest-neighbour error was 11. (b) Structure-preserving GPLVM for all cells from 2 to 64 cell stage with locality parameter γ = 10^4 for all cell stages. The nearest-neighbour error was 11
Fig. 5.
Structure-preserving GPLVM for all cells from 2 to 64 cell stage with different values of γ. (a) γ = 100 for cell stages 2 to 8, γ = 15000 for the 16-cell stage and γ = 20000 for the 32- and 64-cell stages. Cells assigned to the TE-like subcluster are within the purple triangle. The nearest-neighbour error was 6. (b) γ = 100 for cell stages 2 to 8, γ = 20000 for the 16-cell stage and γ = 30000 for the 32- and 64-cell stages. The nearest-neighbour error was 5
Fig. 6.
Difference in gene expression between the two subclusters at the 16-cell stage for different mappings. The error bars show the variation of gene expression within the smaller subcluster (1 SD in each direction). For convenience, genes with the strongest differences are labelled in the plots. The order of all genes from top to bottom is Actb, Ahcy, Aqp3, Atp12a, Bmp4, Cdx2, Creb3l2, Cebpa, Dab2, Dppa1, Eomes, Esrrb, Fgf4, Fgfr2, Fn1, Gapdh, Gata3, Gata4, Gata6, Grhl1, Grhl2, Hand1, Hnf4a, Id2, Klf2, Klf4, Klf5, Krt8, Lcp1, Mbnl3, Msc, Msx2, Nanog, Pdgfa, Pdgfra, Pecam1, Pou5f1, Runx1, Sox2, Sall4, Sox17, Snail, Sox13, Tcfap2a, Tcfap2c, Tcf23, Utf1 and Tspan8
Fig. 7.
Relevance map showing the greatest norm of the gradient across the entire map (left) and norm of the gradient for all genes at the centre of the ICM cluster (right). (a) Gene relevance map corresponding to the mapping in Figure 5b. The region of the map corresponding to early cell stages, including the 16-cell stage, is shown in more detail (middle). Here, the gradient of Gata4 with respect to x is shown: the colour illustrates the norm of the gradient, the arrows illustrate the direction. It can be seen that considerably greater changes in Gata4 occur between the 8-cell stage and the TE-like subcluster at the 16-cell stage than between the 8-cell stage and the non-TE-like subcluster. For convenience, the corresponding part of the embedding in Figure 5b is also shown (middle, top). (b) Gradient at the centre of the ICM cluster; the error bars reflect the uncertainty of the mapping (1 SD in each direction)
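
The nearest-neighbour errors quoted in the captions of Figs 3 to 5 are not defined on this page. A common convention for evaluating embeddings, assumed here, is to count the cells whose nearest neighbour in the embedding (excluding the cell itself) belongs to a different developmental stage. The sketch below follows that assumption; the function name and the toy coordinates are hypothetical.

```python
import numpy as np

def nearest_neighbour_error(coords, labels):
    """Count points whose nearest other point carries a different label."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances between all embedded points
    d = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)      # a point must not be its own neighbour
    nn = d.argmin(axis=1)            # index of each point's nearest neighbour
    return int((labels[nn] != labels).sum())

# Toy embedding: the last 16-cell point lies next to the 8-cell points
coords = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.3, 0.0]])
labels = np.array(["8-cell", "8-cell", "16-cell", "16-cell", "16-cell"])
errors = nearest_neighbour_error(coords, labels)   # 1 misplaced point
```

Under this convention a lower count means that cells of the same stage sit closer together in the embedding.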

References

    1. Bishop C.M. Pattern Recognition and Machine Learning (Information Science and Statistics). New York: Springer; 2006.
    2. Coucouvanis E., Martin G.R. Bmp signaling plays a role in visceral endoderm differentiation and cavitation in the early mouse embryo. Development. 1999;126:535–546.
    3. Cross J.C., et al. Hxt encodes a basic helix-loop-helix transcription factor that regulates trophoblast cell development. Development. 1995;121:2513–2523.
    4. Fujikura J., et al. Differentiation of embryonic stem cells is induced by GATA factors. Genes Dev. 2002;16:784–789.
    5. Guo G., et al. Resolution of cell fate decisions revealed by single-cell gene expression analysis from zygote to blastocyst. Dev. Cell. 2010;18:675–685.
