Review

Genomics Proteomics Bioinformatics. 2016 Feb;14(1):21-30.

doi: 10.1016/j.gpb.2016.01.005. Epub 2016 Feb 11.

Single-cell Transcriptome Study as Big Data

Pingjian Yu¹, Wei Lin²

Affiliations

¹ Genomics and Bioinformatics Lab, Baylor Institute for Immunology Research, Dallas, TX 75204, USA.
² Genomics and Bioinformatics Lab, Baylor Institute for Immunology Research, Dallas, TX 75204, USA. Electronic address: Wei.Lin@BaylorHealth.edu.

PMID: 26876720
PMCID: PMC4792842
DOI: 10.1016/j.gpb.2016.01.005

Pingjian Yu et al. Genomics Proteomics Bioinformatics. 2016 Feb.


Abstract

The rapid growth of single-cell RNA-seq (scRNA-seq) studies demands efficient data storage, processing, and analysis. Big-data technology provides a framework that facilitates the comprehensive discovery of biological signals from inter-institutional scRNA-seq datasets. This article discusses strategies for resolving the stochastic and heterogeneous single-cell transcriptome signal. After extensively reviewing the available big-data applications for next-generation sequencing (NGS)-based studies, we propose a workflow that accounts for the unique characteristics of scRNA-seq data and the primary objectives of single-cell studies.

Keywords: Big data; RNA-seq; Signal normalization; Single cell; Transcriptional heterogeneity.

Figures

**Figure 1**
**Number of papers/datasets addressing single-cell data and big data.** Searches were performed on January 04, 2016 on http://www.ncbi.nlm.nih.gov/gds for datasets and http://www.ncbi.nlm.nih.gov/pubmed for papers. Results were filtered by year and obtained with the following search criteria: (1) for scRNA-seq datasets on GEO: “single cell”[All Fields] AND “Expression profiling by high throughput sequencing”[Filter]; (2) for scRNA-seq papers on PubMed: “single cell”[All Fields] AND (“rna-seq”[All Fields] OR “rna sequencing”[All Fields] OR (“sequencing”[All Fields] AND “transcriptome”[All Fields])); and (3) for big-data papers on PubMed: “big data”[All Fields] OR “hadoop”[All Fields].
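The per-year counts in Figure 1 could in principle be reproduced programmatically. The following is a minimal sketch that only builds the request URLs, assuming the NCBI E-utilities `esearch` endpoint with its `rettype=count` and publication-date parameters. The caption describes searches on the web interface, so using E-utilities here is an assumption, and `count_url` and `QUERIES` are hypothetical names introduced for illustration.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an assumption; the caption used the web sites).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search terms copied from the Figure 1 caption, keyed by a hypothetical label.
# "gds" is the Entrez database name for GEO DataSets.
QUERIES = {
    "scrna_datasets": (
        "gds",
        '"single cell"[All Fields] AND '
        '"Expression profiling by high throughput sequencing"[Filter]',
    ),
    "scrna_papers": (
        "pubmed",
        '"single cell"[All Fields] AND ("rna-seq"[All Fields] OR '
        '"rna sequencing"[All Fields] OR ("sequencing"[All Fields] AND '
        '"transcriptome"[All Fields]))',
    ),
    "bigdata_papers": (
        "pubmed",
        '"big data"[All Fields] OR "hadoop"[All Fields]',
    ),
}


def count_url(db, term, year):
    """Build an esearch URL that asks only for the hit count in one year."""
    params = {
        "db": db,
        "term": term,
        "rettype": "count",   # return the number of hits, not the ID list
        "datetype": "pdat",   # restrict by publication date
        "mindate": str(year),
        "maxdate": str(year),
    }
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    for label, (db, term) in QUERIES.items():
        print(label, count_url(db, term, 2015))
```

Fetching each URL and reading the returned count is left out on purpose, since counts change as the databases grow and will not match the January 2016 snapshot.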

**Figure 2**
***MYH2* gene is the marker of mature myotubes.** The increased bulk expression of *MYH2* is primarily driven by the growing proportion of “on-” component cells (upper cluster) over time (0, 24, 48, and 72 h after myoblast differentiation is induced). Figures were derived from the dataset in Trapnell et al. A. The growth of *MYH2* expression in bulk cell replicate samples over time (n = 3). B. Beeswarm plots of the growing bimodal proportion of *MYH2* from scRNA-seq over time. C–F. RNA-FISH signals at 0, 24, 48, and 72 h, respectively. *MYH2* and nucleus are shown in red and blue (DAPI staining), respectively. Scale bar: 25 nm. G. *MYH2* RNA molecule counts per cell over time, based on RNA-FISH analyses. RNA-FISH, RNA-fluorescence *in situ* hybridization.
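The claim in Figure 2, that bulk expression rises because more cells switch "on" rather than because each cell expresses more, can be illustrated with a two-component mixture. This is a minimal sketch, not the authors' model: `bulk_mean` is a hypothetical helper, and the "on" fractions and expression levels below are made-up values, not numbers from the Trapnell et al. dataset.

```python
def bulk_mean(p_on, mean_on, mean_off=0.0):
    """Population-average expression of an "on"/"off" mixture.

    p_on     -- fraction of cells in the "on" component (0..1)
    mean_on  -- average expression among "on" cells
    mean_off -- average expression among "off" cells
    """
    return p_on * mean_on + (1.0 - p_on) * mean_off


# Hypothetical "on" fractions at each time point (hours after induction).
on_fraction = {0: 0.05, 24: 0.30, 48: 0.60, 72: 0.80}

if __name__ == "__main__":
    for hours, p in on_fraction.items():
        # Per-cell "on" level is held fixed; only the proportion changes.
        print(f"{hours:>2} h: bulk mean = {bulk_mean(p, mean_on=100.0):.1f}")
```

With the per-cell "on" level held constant, the bulk average still grows over time, which is the pattern the bulk and single-cell panels show side by side.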

**Figure 3**
**Workflow of inter-institutional scRNA-seq data integration.** Inter-institutional single-cell RNA-seq datasets are aligned against their genomes at the Hadoop layer. Read counts are resolved into gene “on” or “off” status at the normalization layer. Differential expression, co-expression, and other applications are developed based on gene “on” or “off” status instead of gene expression. The biological relevance of the resulting gene list is verified by GSEA, GO-term enrichment analysis, DAVID functional analysis, or other tools. GSEA, gene set enrichment analysis; GO, gene ontology; DAVID, database for annotation, visualization and integrated discovery.

Cited by

Redefining Tumor-Associated Macrophage Subpopulations and Functions in the Tumor Microenvironment.
Wu K, Lin K, Li X, Yuan X, Xu P, Ni P, Xu D. Front Immunol. 2020 Aug 4;11:1731. doi: 10.3389/fimmu.2020.01731. eCollection 2020. PMID: 32849616 Free PMC article. Review.

scRNA-seq for Microcephaly Research [III]: Computational Analysis of scRNA-seq Data.
Babcock B, Malawsky D. Methods Mol Biol. 2023;2583:105-121. doi: 10.1007/978-1-0716-2752-5_10. PMID: 36418729

The basic and translational science year in review: Confucius in the era of Big Data.
Pisetsky DS. Semin Arthritis Rheum. 2020 Jun;50(3):373-379. doi: 10.1016/j.semarthrit.2020.02.010. Epub 2020 Mar 5. PMID: 32238274 Free PMC article. Review.

Single-cell transcriptome provides novel insights into antler stem cells, a cell type capable of mammalian organ regeneration.
Ba H, Wang D, Wu W, Sun H, Li C. Funct Integr Genomics. 2019 Jul;19(4):555-564. doi: 10.1007/s10142-019-00659-2. Epub 2019 Jan 23. PMID: 30673893

Single-cell genome-wide studies give new insight into nongenetic cell-to-cell variability in animals.
Golov AK, Razin SV, Gavrilov AA. Histochem Cell Biol. 2016 Sep;146(3):239-54. doi: 10.1007/s00418-016-1466-z. Epub 2016 Jul 13. PMID: 27412014 Review.
