Nat Methods. 2017 Jun;14(6):565-571.
doi: 10.1038/nmeth.4292. Epub 2017 May 15.

Normalizing single-cell RNA sequencing data: challenges and opportunities

Catalina A Vallejos et al. Nat Methods. 2017 Jun.

Abstract

Single-cell transcriptomics is becoming an important component of the molecular biologist's toolkit. A critical step when analyzing data generated using this technology is normalization. However, normalization is typically performed using methods developed for bulk RNA sequencing or even microarray data, and the suitability of these methods for single-cell transcriptomics has not been assessed. We here discuss commonly used normalization approaches and illustrate how these can produce misleading results. Finally, we present alternative approaches and provide recommendations for single-cell RNA sequencing users.

Conflict of interest statement

Competing Financial Interests

The authors declare no competing financial interests.

Figures

Figure 1. Cell- and gene-specific effects in RNA-seq experiments.
(a) Schematic representation of cell-specific effects. The top panel shows a pair of cells expressing two genes at the same levels. When RNA-seq is performed, cell-specific effects introduce a bias in the estimated log-fold-change (LFC) computed on raw read counts (bottom left panel). (b) Schematic representation of gene-specific effects. The two cells and true gene levels are the same as in (a), but now gene-specific effects are shown to bias the estimation of relative gene expression (bottom right panel). In real situations, both cell-specific and gene-specific effects are present. (c) List of main cell- and/or gene-specific effects and whether these are removed by unique molecular identifiers (UMIs).
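
The cell-specific bias described in panel (a) can be illustrated with a minimal sketch. The counts, the gene and cell names, and the cell-specific factors below are hypothetical values chosen for illustration; they are not taken from the paper, and the code assumes the factors are known exactly, which is not the case in practice.

```python
import math

# Hypothetical true expression: both cells express each gene at the same level.
true_counts = {
    "cell_1": {"gene_A": 100, "gene_B": 50},
    "cell_2": {"gene_A": 100, "gene_B": 50},
}

# Assumed cell-specific factors (e.g. capture efficiency or sequencing depth).
cell_factor = {"cell_1": 1.0, "cell_2": 0.5}


def observed(cell, gene):
    """Raw count after the cell-specific effect has scaled the true level."""
    return true_counts[cell][gene] * cell_factor[cell]


def lfc(gene, factors=None):
    """log2 fold change of cell_2 over cell_1, optionally after scaling."""
    f = factors or {"cell_1": 1.0, "cell_2": 1.0}
    a = observed("cell_1", gene) / f["cell_1"]
    b = observed("cell_2", gene) / f["cell_2"]
    return math.log2(b / a)


raw_lfc = lfc("gene_A")                       # biased by the cell-specific effect
normalized_lfc = lfc("gene_A", cell_factor)   # bias removed by the scaling factors
print(raw_lfc, normalized_lfc)
```

With these assumed values the raw log-fold-change is nonzero even though the true expression is identical, and dividing by the cell-specific factors recovers a log-fold-change of zero.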
Figure 2. Comparison of bulk-based normalization methods in real and simulated datasets.
(a) Mean-difference plot comparing the estimated scaling factors (upper-triangular panels) and CV² of normalized counts (lower-triangular panels) for the dataset published in [28]. (b) Ratio of estimated scaling factors vs. proportion of zero counts per cell for dataset [28]. (c) Top 10% most variable genes identified after normalizing dataset [28] with three different methods. Additional datasets are analyzed in Supplementary Data 1. (d) Ratio between the estimated and the true scaling factors for the most widely used bulk-based normalization methods and a method specifically designed for scRNA-seq (“scran”) [14] in a simulated dataset consisting of two groups of cells. See Supplementary Data 2 for the simulation strategy and additional simulations.
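
As a rough illustration of the two quantities compared in panel (a), the sketch below estimates per-cell scaling factors and computes the squared coefficient of variation (CV²) of a gene before and after scaling. Total-count (library-size) scaling is used here only as a simple stand-in; the caption does not name the methods compared, and the count matrix and function names are hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical count matrix: rows are genes, columns are cells.
counts = [
    [10, 20, 40],
    [30, 60, 120],
    [60, 120, 240],
]


def library_size_factors(matrix):
    """One scaling factor per cell: total count relative to the mean total."""
    totals = [sum(col) for col in zip(*matrix)]
    avg = mean(totals)
    return [t / avg for t in totals]


def normalize(matrix, factors):
    """Divide each cell's counts by that cell's scaling factor."""
    return [[c / f for c, f in zip(row, factors)] for row in matrix]


def cv2(values):
    """Squared coefficient of variation: population variance / mean^2."""
    m = mean(values)
    return pvariance(values) / (m * m)


factors = library_size_factors(counts)
normalized = normalize(counts, factors)
raw_cv2 = cv2(counts[0])
normalized_cv2 = cv2(normalized[0])
print(factors, raw_cv2, normalized_cv2)
```

In this toy matrix the cells differ only by depth, so scaling removes essentially all of the variation in each gene. Real single-cell data contain many zero counts and composition differences, which is where estimates from different methods can diverge.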
Figure 3. ERCC spike-ins can be used to estimate mRNA content.
(a) Ratio between the number of reads mapped to intrinsic genes and the number of reads mapped to ERCC spike-ins in datasets from [33, 42, 43] (left, central and right panel respectively). (b) Distributions of GC-content (left panel) and length (right panel) for mouse genes with at least one count in one cell in the dataset published in [41]. The purple areas show the interquartile ranges of GC-content and length for ERCC spike-ins, with the medians marked by vertical purple continuous lines.

References

    1. Tang F, et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nature Methods. 2009;6:377–382.
    2. Shapiro E, Biezuner T, Linnarsson S. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nature Reviews Genetics. 2013;14:618–630.
    3. Stegle O, Teichmann SA, Marioni JC. Computational and analytical challenges in single-cell transcriptomics. Nature Reviews Genetics. 2015;16:133–145.
    4. Saliba A-E, Westermann AJ, Gorski SA, Vogel J. Single-cell RNA-seq: advances and future challenges. Nucleic Acids Research. 2014;42:8845–8860.
    5. Gawad C, Koh W, Quake SR. Single-cell genome sequencing: current state of the science. Nature Reviews Genetics. 2016;17:175–188.
