Nat Methods. 2017 Apr;14(4):381-387.
doi: 10.1038/nmeth.4220. Epub 2017 Mar 6.

Power analysis of single-cell RNA-sequencing experiments

Valentine Svensson et al. Nat Methods. 2017 Apr.

Abstract

Single-cell RNA sequencing (scRNA-seq) has become an established and powerful method to investigate transcriptomic cell-to-cell variation, thereby revealing new cell types and providing insights into developmental processes and transcriptional stochasticity. A key question is how the variety of available protocols compare in terms of their ability to detect and accurately quantify gene expression. Here, we assessed the protocol sensitivity and accuracy of many published data sets, on the basis of spike-in standards and uniform data processing. For our workflow, we developed a flexible tool for counting the number of unique molecular identifiers (https://github.com/vals/umis/). We compared 15 protocols computationally and 4 protocols experimentally for batch-matched cell populations, in addition to investigating the effects of spike-in molecular degradation. Our analysis provides an integrated framework for comparing scRNA-seq protocols.
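The core idea behind UMI counting is that reads sharing the same cell barcode, gene and UMI are PCR duplicates of one original molecule, so the molecule count for a gene in a cell is the number of distinct UMIs. The sketch below illustrates only that idea; it is not the API of the umis tool linked above. The input format (one `(cell_barcode, umi, gene)` tuple per aligned read) and the function name `count_umis` are hypothetical, and real pipelines typically also correct sequencing errors in barcodes and UMIs, which this sketch does not do.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct molecules per (cell, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per
    aligned read (hypothetical input format). Reads with identical cell,
    gene and UMI are collapsed as duplicates of a single molecule.
    No UMI/barcode error correction is performed.
    """
    seen = defaultdict(set)
    for cell, umi, gene in reads:
        seen[(cell, gene)].add(umi)
    # The molecule count is the number of distinct UMIs observed.
    return {key: len(umis) for key, umis in seen.items()}


reads = [
    ("AAAC", "GGTT", "ERCC-00002"),
    ("AAAC", "GGTT", "ERCC-00002"),  # PCR duplicate of the read above
    ("AAAC", "CCAA", "ERCC-00002"),
    ("TTTG", "GGTT", "ERCC-00002"),
]
counts = count_umis(reads)
```

Here cell `AAAC` has two molecules of the spike-in (three reads, two distinct UMIs) and cell `TTTG` has one.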

Conflict of interest statement

Competing financial interests

The authors declare no competing financial interests.

Figures

Figure 1. Illustration of protocol comparison strategy.
(A) Schematic illustration highlighting our spike-in comparison strategy of public data sets. (B) The different protocols are compared based on the same standard spike-in molecules and not the endogenous mRNA from the diverse cell types used in these studies. We define two global technical performance metrics based on spike-ins: (C) Sensitivity: the number of input spike-in molecules at the point where the probability of detection reaches 50%. (D) Accuracy: the Pearson product-moment correlation (R) between estimated expression levels and actual input RNA molecule concentration (ground truth).
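The two metrics defined in this caption can be sketched per sample as follows. This is an illustration under stated assumptions, not the authors' code: detection is modeled with a plain logistic regression on log10 input molecules (fitted here by Newton-Raphson), the detection limit is the input at which the fitted probability equals 0.5, and accuracy is computed as Pearson R on log10-transformed values with a pseudocount. The log transforms, the pseudocount, and the function names `detection_limit` and `accuracy` are assumptions. The logistic fit needs non-separable data (some overlap between detected and undetected spike-ins); otherwise the coefficients diverge.

```python
import numpy as np


def detection_limit(molecules, detected, n_iter=50):
    """Input molecules at which P(detection) = 0.5 (assumed log10 scale).

    Fits P(detected) = sigmoid(b0 + b1 * log10(molecules)) by
    Newton-Raphson, then solves b0 + b1 * x = 0 for x.
    """
    x = np.log10(np.asarray(molecules, dtype=float))
    y = np.asarray(detected, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta += np.linalg.solve(hess, grad)
    return 10 ** (-beta[0] / beta[1])


def accuracy(expression, molecules, pseudocount=1.0):
    """Pearson R between log10 expression and log10 input molecules."""
    x = np.log10(np.asarray(molecules, dtype=float))
    y = np.log10(np.asarray(expression, dtype=float) + pseudocount)
    return np.corrcoef(x, y)[0, 1]


# Symmetric toy data: detection switches around 10**0 = 1 molecule.
mols = 10.0 ** np.array([-2, -1, -0.5, 0.5, 1, 2])
hits = np.array([0, 0, 1, 0, 1, 1])
limit = detection_limit(mols, hits)
```

Because the toy detection pattern is symmetric around one molecule, the fitted 50% point lands at exactly one input molecule.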
Figure 2. Comparison of performance metrics for different protocols.
(A) Accuracy. Distributions of Pearson correlations (R) for all samples, stratified by protocol (without accounting for sequencing depth). (B) Sensitivity. Distributions of molecule detection limits for all samples, stratified by protocol (without accounting for sequencing depth). The number of samples (n) is shown above each protocol; the implementation platforms and quantification strategies are indicated below. (C) UMI efficiency. Distributions of UMI counting efficiencies in samples quantified by UMI-tag counting, stratified by protocol. For the analysed UMI-based data, UMI efficiency recapitulates the molecule detection limit estimated by logistic regression.
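One way to see why UMI efficiency and the detection limit track each other is a simple capture model, sketched below under explicit assumptions: efficiency is taken as the fraction of input spike-in molecules recovered as UMIs (total UMIs over total input molecules), and each molecule is assumed to be captured independently with that probability. Then P(detected) = 1 - (1 - e)^m, and the 50% detection limit is m = ln(0.5) / ln(1 - e), which is close to ln(2)/e for small e. Both the estimator and the capture model are illustrative assumptions, not necessarily the paper's definitions.

```python
import math


def umi_efficiency(umi_counts, input_molecules):
    """Assumed definition: fraction of input molecules recovered as UMIs."""
    return sum(umi_counts) / sum(input_molecules)


def detection_limit_from_efficiency(efficiency):
    """Input molecules giving P(detect) = 0.5 if each molecule is
    captured independently with probability `efficiency`."""
    # Solve 1 - (1 - e)**m = 0.5 for m.
    return math.log(0.5) / math.log(1.0 - efficiency)


e = umi_efficiency([5, 50], [10, 100])
m50 = detection_limit_from_efficiency(0.1)
```

Under this model, a 10% efficiency implies a detection limit of roughly 6.6 input molecules, and higher efficiency always gives a lower (better) limit.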
Figure 3. Performance metrics after accounting for sequencing depth.
Both accuracy and sensitivity are modeled with a global dependence on sequencing depth that accounts for diminishing returns, together with a distinct corrected performance parameter for each protocol. Each model (one for accuracy and one for sensitivity) has 26 parameters and is fitted to n = 20,717 observations (the number of samples). Bulk data (pink triangles) are shown for context only. Solid curves show the predicted dependence on sequencing depth. (A) Accuracy depends only marginally on sequencing depth. In the model, saturation occurs at 270,000 reads per cell (dashed red line). Protocols are ordered by predicted correlation R at 1 million reads. (B) Sensitivity depends critically on sequencing depth, so accounting for read depth is key to a fair comparison of protocols. The model identifies saturation at 4.6 million reads per cell (dashed red line). The gain from 1 to 4 million reads per sample is marginal, whereas moving from 100,000 to 1 million reads corresponds to an order-of-magnitude gain in sensitivity (dashed black lines). Protocols are ordered by predicted detection limit (#M, number of molecules at 1 million reads).
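A minimal way to combine a global depth effect with diminishing returns and one offset per protocol is a hinge (broken-stick) model in log depth: y = a[protocol] + b * min(log10(reads), s), where s is the saturation point. This functional form is an assumption chosen for illustration; the caption does not specify the paper's exact model. The hypothetical `fit_hinge_model` below scans candidate saturation points and solves ordinary least squares for the rest.

```python
import numpy as np


def fit_hinge_model(log_depth, protocol, y, grid):
    """Fit y = a[protocol] + b * min(log_depth, s) for each s in `grid`.

    Returns (s, a, b) with the smallest residual sum of squares, where
    `a` maps each protocol label to its offset. Illustrative only.
    """
    log_depth = np.asarray(log_depth, dtype=float)
    y = np.asarray(y, dtype=float)
    labels, idx = np.unique(protocol, return_inverse=True)
    onehot = np.eye(len(labels))[idx]  # one intercept per protocol
    best = None
    for s in grid:
        # Depth beyond s adds nothing: diminishing returns as a hard cap.
        X = np.column_stack([onehot, np.minimum(log_depth, s)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, s, dict(zip(labels, coef[:-1])), coef[-1])
    return best[1:]


# Synthetic data: two protocols, saturation at log10(reads) = 5.5.
ld = np.tile(np.linspace(4, 7, 31), 2)
prot = np.array(["A"] * 31 + ["B"] * 31)
y = np.where(prot == "A", 0.2, -0.5) + 0.8 * np.minimum(ld, 5.5)
s, a, b = fit_hinge_model(ld, prot, y, np.linspace(4.5, 6.5, 21))
```

The fitted saturation converts back to reads as `10 ** s`; for example, 4.6 million reads corresponds to log10(4.6e6) ≈ 6.66 on this scale.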
Figure 4. Investigation of factors with potential impact on performance metrics.
(A) Batch effects and RNA degradation. Performance distributions for three protocols implemented as a single batch on the Fluidigm C1 platform (left) and on the 10x Chromium (far left; different batch). Performance distributions of spike-ins measured after freeze-thaw cycles, ranging from normal handling (2-3 cycles) to critical degradation (6 cycles, then left overnight at room temperature). (B) Accuracy estimates are similar for ERCC and SIRV spike-ins. Accuracy (Pearson correlation R) of both spike-in sets (ERCCs and SIRVs), inferred across two replicates using multiple protocols. (C) The amount of endogenous mRNA does not affect performance metrics. Comparison of performance metrics between empty (lacking endogenous mRNA) and non-empty samples from 3 published data sets shows similar performance and no bias due to the presence of endogenous mRNA. The red dot shows the median, and the red bar shows the 95% confidence interval of the median, estimated by bootstrapping. The empty-sample accuracy CI is 100% contained in the non-empty CI, and the empty-sample sensitivity CI is 84% contained in the non-empty CI. (D) Model of relative spike-in abundance degradation during normal handling. Posterior predictions from a Bayesian exponential decay model for both ERCCs and SIRVs. The decay parameter was 19% and 18.5%, respectively. Confidence bands correspond to the 95% confidence interval from the posterior parameter distribution.
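The statistics in panels (C) and (D) can be approximated with the sketch below. Assumptions, stated plainly: the bootstrap CI is a percentile interval of resampled medians; "percent contained" is read as the share of the empty-sample CI's width that lies inside the non-empty CI; and the decay parameter is interpreted as a fractional loss per freeze-thaw cycle, relative abundance = (1 - d)^cycles. The decay fit swaps in ordinary least squares on log abundance (a line through the origin) in place of the paper's Bayesian model, so it gives a point estimate without posterior bands. All function names are hypothetical.

```python
import numpy as np


def bootstrap_median_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    samples = rng.choice(values, size=(n_boot, values.size), replace=True)
    medians = np.median(samples, axis=1)
    return np.quantile(medians, [alpha / 2, 1 - alpha / 2])


def fraction_contained(inner, outer):
    """Share of interval `inner` that lies inside interval `outer`."""
    lo = max(inner[0], outer[0])
    hi = min(inner[1], outer[1])
    return max(0.0, hi - lo) / (inner[1] - inner[0])


def fit_decay_rate(cycles, relative_abundance):
    """Least-squares fit of abundance = (1 - d) ** cycles; returns d."""
    c = np.asarray(cycles, dtype=float)
    log_a = np.log(np.asarray(relative_abundance, dtype=float))
    # log(abundance) = c * log(1 - d): regression through the origin.
    slope = np.sum(c * log_a) / np.sum(c * c)
    return 1.0 - np.exp(slope)


ci = bootstrap_median_ci(np.arange(1, 101))
d = fit_decay_rate([1, 2, 3], 0.81 ** np.array([1, 2, 3]))
```

With noise-free synthetic abundances that lose 19% per cycle, the fit recovers d = 0.19.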
