Review
Genomics Proteomics Bioinformatics. 2016 Feb;14(1):21-30.
doi: 10.1016/j.gpb.2016.01.005. Epub 2016 Feb 11.

Single-cell Transcriptome Study as Big Data

Pingjian Yu et al. Genomics Proteomics Bioinformatics. 2016 Feb.

Abstract

The rapid growth of single-cell RNA sequencing (scRNA-seq) studies demands efficient data storage, processing, and analysis. Big-data technology provides a framework that facilitates the comprehensive discovery of biological signals from inter-institutional scRNA-seq datasets. This article discusses strategies for handling the stochastic and heterogeneous nature of the single-cell transcriptome signal. After extensively reviewing the available big-data applications in next-generation sequencing (NGS)-based studies, we propose a workflow that accounts for the unique characteristics of scRNA-seq data and the primary objectives of single-cell studies.

Keywords: Big data; RNA-seq; Signal normalization; Single cell; Transcriptional heterogeneity.

Figures

Figure 1
Number of papers/datasets addressing single-cell data and big data. Searches were performed on January 4, 2016, on http://www.ncbi.nlm.nih.gov/gds for datasets and http://www.ncbi.nlm.nih.gov/pubmed for papers. Results were filtered by year using the following search criteria: (1) for scRNA-seq datasets on GEO: “single cell”[All Fields] AND “Expression profiling by high throughput sequencing”[Filter]; (2) for scRNA-seq papers on PubMed: “single cell”[All Fields] AND (“rna-seq”[All Fields] OR “rna sequencing”[All Fields] OR (“sequencing”[All Fields] AND “transcriptome”[All Fields])); and (3) for big-data papers on PubMed: “big data”[All Fields] OR “hadoop”[All Fields].
Figure 2
MYH2 is a marker of mature myotubes. The increase in bulk MYH2 expression is primarily driven by the growing proportion of “on”-component cells (upper cluster) over time (0, 24, 48, and 72 h after myoblast differentiation is induced). Figures were derived from the dataset in Trapnell et al. A. Growth of MYH2 expression in bulk-cell replicate samples (n = 3 over time). B. Beeswarm plots of the bimodal MYH2 expression from scRNA-seq, showing a growing “on” proportion over time. C-F. RNA-FISH signals at 0, 24, 48, and 72 h, respectively. MYH2 transcripts and nuclei are shown in red and blue (DAPI staining), respectively. Scale bar: 25 nm. G. MYH2 RNA molecule counts per cell over time, based on RNA-FISH analyses. RNA-FISH, RNA fluorescence in situ hybridization.
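The caption's point that bulk expression can rise mainly because more cells switch “on” follows from a simple mixture identity: bulk mean = on-fraction × mean level of “on” cells + (1 − on-fraction) × mean level of “off” cells. The Python sketch below checks that identity on hypothetical per-cell counts (invented for illustration, not data from Trapnell et al.); the threshold of 10 that separates “on” from “off” cells is likewise an assumption.

```python
from statistics import fmean
from typing import Sequence, Tuple


def bulk_mean(on_fraction: float, on_level: float, off_level: float = 0.0) -> float:
    """Population-average expression of a gene whose cells are either 'on' or 'off'."""
    if not 0.0 <= on_fraction <= 1.0:
        raise ValueError("on_fraction must lie in [0, 1]")
    return on_fraction * on_level + (1.0 - on_fraction) * off_level


def decompose(cells: Sequence[float], threshold: float) -> Tuple[float, float, float]:
    """Split per-cell counts into (on-fraction, mean 'on' level, mean 'off' level).

    `threshold` is a hypothetical cut-off: cells with counts above it are called 'on'.
    """
    on = [x for x in cells if x > threshold]
    off = [x for x in cells if x <= threshold]
    on_fraction = len(on) / len(cells)
    on_level = fmean(on) if on else 0.0
    off_level = fmean(off) if off else 0.0
    return on_fraction, on_level, off_level


# Hypothetical per-cell counts at an early and a late time point (illustration only).
early = [0, 0, 0, 1, 0, 95, 0, 2]
late = [0, 90, 105, 0, 98, 102, 1, 0]
for label, cells in (("early", early), ("late", late)):
    frac, on_lvl, off_lvl = decompose(cells, threshold=10)
    # The mixture identity reproduces the plain per-cell average exactly.
    print(label, f"on-fraction={frac:.3f}", f"on-level={on_lvl:.1f}",
          f"bulk={bulk_mean(frac, on_lvl, off_lvl):.2f}", f"mean={fmean(cells):.2f}")
```

With these toy numbers the bulk average rises roughly fourfold while the mean level of the “on” cells barely changes, which is the pattern the caption attributes to MYH2.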
Figure 3
Workflow of inter-institutional scRNA-seq data integration. Inter-institutional scRNA-seq datasets are aligned against their reference genomes at the Hadoop layer. Read counts are resolved into gene “on” or “off” status at the normalization layer. Differential expression, co-expression, and other applications are built on gene “on”/“off” status rather than on expression levels. The biology of the resulting gene lists is verified by GSEA, GO-term enrichment analysis, DAVID functional analysis, or other tools. GSEA, gene set enrichment analysis; GO, Gene Ontology; DAVID, Database for Annotation, Visualization and Integrated Discovery.
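The on/off layer of this workflow can be made concrete with a small Python sketch. It assumes a fixed count threshold for calling a gene “on” in a cell (the caption does not say how the normalization layer makes that call), tests for a difference in the proportion of “on” cells between two groups with Fisher's exact test, and scores co-expression of two genes with the phi coefficient of their binary vectors. These statistical choices are illustrative substitutes, not the method described in the review, and the counts in the usage lines are hypothetical.

```python
from math import comb, sqrt
from typing import List, Sequence, Tuple


def binarize(counts: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Call a gene 'on' (1) in each cell whose count exceeds `threshold`, else 'off' (0).

    The fixed threshold is an illustrative assumption, not the paper's normalization rule.
    """
    return [1 if c > threshold else 0 for c in counts]


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so ties with the observed table are not lost to rounding.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def differential_on_off(group1, group2, threshold=0.0) -> Tuple[float, float, float]:
    """Compare the fraction of 'on' cells for one gene between two groups of cells."""
    on1, on2 = sum(binarize(group1, threshold)), sum(binarize(group2, threshold))
    # Rows are groups, columns are 'on' / 'off' cell counts.
    p = fisher_exact_two_sided(on1, len(group1) - on1, on2, len(group2) - on2)
    return on1 / len(group1), on2 / len(group2), p


def phi_coexpression(x: Sequence[int], y: Sequence[int]) -> float:
    """Phi coefficient between two genes' on/off vectors measured in the same cells."""
    n11 = sum(1 for i, j in zip(x, y) if i and j)
    n10 = sum(1 for i, j in zip(x, y) if i and not j)
    n01 = sum(1 for i, j in zip(x, y) if not i and j)
    n00 = sum(1 for i, j in zip(x, y) if not i and not j)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Undefined (NaN) when either gene is always on or always off.
    return (n11 * n00 - n10 * n01) / denom if denom else float("nan")


# Hypothetical normalized counts for one gene in two groups of cells.
early = [0, 0, 3, 0, 0, 0, 0, 0, 0, 0]
late = [0, 12, 30, 0, 25, 18, 0, 40, 22, 0]
frac_early, frac_late, p_value = differential_on_off(early, late, threshold=5)
print(f"on-fraction {frac_early:.2f} -> {frac_late:.2f}, Fisher p = {p_value:.4f}")
print("phi:", phi_coexpression(binarize(late, 5), binarize(late, 20)))
```

A small p-value here indicates that the proportion of “on” cells differs between the groups; it says nothing about expression levels within “on” cells, which is the information the on/off representation deliberately discards.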
