PLoS Comput Biol. 2015 Nov 24;11(11):e1004575.
doi: 10.1371/journal.pcbi.1004575. eCollection 2015 Nov.

SINCERA: A Pipeline for Single-Cell RNA-Seq Profiling Analysis

Minzhe Guo et al. PLoS Comput Biol. 2015.

Abstract

A major challenge in developmental biology is to understand the genetic and cellular processes and programs that drive organ formation and the differentiation of the diverse cell types that comprise the embryo. Recent studies using single-cell transcriptome analysis illustrate the power to measure and understand cellular heterogeneity in complex biological systems, but processing large amounts of RNA-seq data from heterogeneous cell populations creates a need for readily accessible tools for the analysis of single-cell RNA-seq (scRNA-seq) profiles. This study presents a generally applicable analytic pipeline (SINCERA: a computational pipeline for SINgle CEll RNA-seq profiling Analysis) for processing scRNA-seq data from a whole organ or from sorted cells. The pipeline supports: 1) the distinction and identification of major cell types; 2) the identification of cell type specific gene signatures; and 3) the determination of the driving forces of given cell types. We applied this pipeline to the RNA-seq analysis of single cells isolated from embryonic mouse lung at E16.5. Through the pipeline analysis, we distinguished major cell types of fetal mouse lung, including epithelial, endothelial, smooth muscle, pericyte, and fibroblast-like cell types, and identified cell type specific gene signatures, bioprocesses, and key regulators. SINCERA is implemented in R, licensed under the GNU General Public License v3, and freely available from the CCHMC PBGE website, https://research.cchmc.org/pbge/sincera.html.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Schematic Workflow.
The analytic pipeline consists of three main components: pre-processing, cell type identification, and cell type specific gene signature and driving force identification.
Fig 2. Identification of Major Lung Cell Types.
Cells (n = 148) from two sample preparations of fetal mouse lung at E16.5 were assigned to 9 clusters via hierarchical clustering using average linkage and centered Pearson’s correlation. Each color represents a distinct cell cluster, labeled C1-C9. The rectangles represent single lung cells from the first preparation and the ellipses represent single cells from a second, independent preparation. Connecting lines indicate that the z-score correlation between two cells is ≥ 0.05. The blue lines connect cells within the same preparation, while the red lines connect cells across preparations.
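
The clustering named in the Fig 2 caption can be sketched as follows. This is a minimal Python illustration, not SINCERA's R implementation. It assumes that "centered Pearson's correlation" means the standard (mean-centered) Pearson coefficient r and that 1 − r serves as the distance; the function names and the toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Mean-centered Pearson correlation of two equal-length profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return num / den


def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Distance between two cells is 1 - Pearson r (an assumption of this sketch).
    Returns a list of clusters, each a list of cell indices.
    """
    dist = [[1.0 - pearson(p, q) for q in profiles] for p in profiles]
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy data: four cells measured on four genes.
toy_cells = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 1.5],
]
toy_clusters = average_linkage(toy_cells, n_clusters=2)
```

The naive loop recomputes linkage distances at every merge, which is acceptable for a few hundred cells but would be replaced by a library routine in practice.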
Fig 3. Validation of Cell Type Assignments using Known Biomarkers.
(A) Expression patterns of representative known cell type markers were used to validate the correct assignment of major lung cell types at E16.5. Expression levels were normalized by per-sample z-score transformation. (B) ROC curves of the rank-aggregation-based validation showed a high consistency (AUC>0.8) between the cell type assignments and the expression patterns of known cell type specific markers (S2 Table).
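
The per-sample z-score transformation mentioned in the Fig 3 caption can be illustrated with a short sketch. It assumes that "per-sample" means each gene is standardized separately within the cells of each sample preparation, and it uses the sample standard deviation; both are assumptions of this example, as are the function name and the toy values.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_per_sample(expression, sample_ids):
    """Z-score one gene's expression values within each sample preparation.

    expression: expression of a single gene, one value per cell.
    sample_ids: the preparation each cell came from (same length).
    Groups with zero variance or a single cell are mapped to 0.0.
    """
    cells_by_sample = defaultdict(list)
    for idx, sid in enumerate(sample_ids):
        cells_by_sample[sid].append(idx)

    z = [0.0] * len(expression)
    for idxs in cells_by_sample.values():
        values = [expression[i] for i in idxs]
        mu = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0
        for i in idxs:
            z[i] = (expression[i] - mu) / sd if sd > 0 else 0.0
    return z


# Hypothetical toy data: one gene in six cells from two preparations.
toy_z = zscore_per_sample(
    [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    ["prep1", "prep1", "prep1", "prep2", "prep2", "prep2"],
)
```

Standardizing within each preparation puts cells from both preparations on a common scale, so a gene that is uniformly higher in one preparation no longer separates the two.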
Fig 4. Prediction of Cell Types for Each Cluster using Cell Type Enrichment Analysis.
Information on gene expression in certain cell types was downloaded from the EBI Expression Atlas (http://www.ebi.ac.uk/gxa). Results were obtained using differentially expressed genes as the input gene lists. The lengths of the bars represent the transformed p-values (−log10(p)) of highly enriched cell types for each cell cluster, where p is the p-value calculated by a one-tailed Fisher’s exact test and represents the degree of enrichment of a cell type in a given cell cluster.
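
The enrichment score in the Fig 4 caption can be sketched as a hypergeometric upper tail, which is what a one-tailed Fisher's exact test for over-representation computes. The function and parameter names below are hypothetical, and the choice of background gene set is left to the caller; this is an illustration, not the pipeline's own code.

```python
import math


def fisher_one_tailed(overlap, list_size, annotated, background):
    """One-tailed Fisher's exact test for over-representation.

    overlap:    genes in the input list that are annotated to the cell type
    list_size:  size of the input gene list (e.g. a cluster's DE genes)
    annotated:  genes in the background annotated to the cell type
    background: total number of genes in the background
    Returns P(X >= overlap) under the hypergeometric distribution.
    """
    upper = min(annotated, list_size)
    favorable = sum(
        math.comb(annotated, i) * math.comb(background - annotated, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / math.comb(background, list_size)


def neg_log10(p):
    """Bar length as used in the caption: -log10(p)."""
    return -math.log10(p)


# Hypothetical toy numbers: 3 of 4 listed genes hit a 4-gene annotation
# drawn from a background of 8 genes.
toy_p = fisher_one_tailed(overlap=3, list_size=4, annotated=4, background=8)
toy_bar = neg_log10(toy_p)
```

Because the transformation is −log10(p), longer bars correspond to smaller p-values and therefore stronger enrichment.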
Fig 5. Predicted Signature Genes for Major Lung Cell Types.
(A) The heatmap shows that the predicted cell type specific signature genes are selectively expressed in the defined cell types. Gene expression was normalized by per-sample z-score transformation. (B) The top 20 signature genes for each lung cell type, ranked by score, are listed. Genes in red are the known markers that were used to train the signature prediction models.
Fig 6. Mouse Lung Epithelial Specific Transcriptional Regulatory Network.
(A) Rank importance of transcription factors (TFs) in the main connected component of the epithelial specific transcriptional regulatory network (TRN). The sizes of the TF nodes are proportional to their average-ranked node importance. The main connected component of the epithelial TRN comprises 348 nodes and 432 edges. The nodes in red are TFs; the nodes in grey are non-TF genes that are differentially expressed (p-value < 0.01) in epithelial cells. The edges were established using the first-order conditional dependence approach described in the Methods section with a cutoff at 0.05. (B) The Hopx local network (the first hop is shown). Hopx was the top-ranked TF identified by driving force analysis (Table 1).
