Nat Commun. 2020 Oct 30;11(1):5504.
doi: 10.1038/s41467-020-19365-w.

Optimized design of single-cell RNA sequencing experiments for cell-type-specific eQTL analysis


Igor Mandric et al. Nat Commun. 2020.

Abstract

Single-cell RNA-sequencing (scRNA-Seq) is a compelling approach to directly and simultaneously measure cellular composition and state, which can otherwise only be estimated by applying deconvolution methods to bulk RNA-Seq estimates. However, it has not yet become a widely used tool in population-scale analyses because of its prohibitively high cost. Here we show that, given the same budget, the statistical power of cell-type-specific expression quantitative trait loci (eQTL) mapping can be increased through low-coverage per-cell sequencing of more samples rather than high-coverage sequencing of fewer samples. Using simulations that start from one of the largest available real single-cell RNA-Seq datasets, comprising 120 individuals, we also show that multiple experimental designs with different numbers of samples, cells per sample, and reads per cell can have similar statistical power, and that choosing an appropriate design can yield large cost savings, especially when multiplexed workflows are considered. Finally, we provide a practical approach for selecting cost-effective designs that maximize cell-type-specific eQTL power, available in the form of a web tool.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Average R² between low-coverage and high-coverage gene expression estimates (Smart-Seq2 dataset, alpha cells).
a Distribution of Pearson R² computed across all the genes at different levels of read coverage, Smart-Seq2 dataset. b Distribution of Pearson R² at 75,000 reads per cell stratified by the expression level, Smart-Seq2 dataset. c Distribution of Pearson R² computed across all the genes at different levels of read coverage, 10× dataset. d Distribution of Pearson R² at 4000 reads per cell stratified by the expression level, 10× dataset. The center line, bounds of box, and whiskers represent mean, 25th to 75th percentile range, and minimum to maximum range in all boxplots.
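The comparison in this caption can be illustrated with a minimal sketch, assuming that low coverage is emulated by keeping each read independently with a fixed probability (binomial thinning). This page does not say how the authors downsampled, so the thinning step, the simulated counts, and the names `pearson_r2` and `thin_counts` are illustrative assumptions only.

```python
import random

def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)

def thin_counts(counts, fraction, rng):
    """Keep each read independently with probability `fraction`.

    Assumption: binomial thinning stands in for real read downsampling.
    """
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]

rng = random.Random(0)
# Hypothetical high-coverage read counts for one gene across 200 cells.
high = [rng.randint(0, 50) for _ in range(200)]
# Emulate a tenfold reduction in coverage.
low = thin_counts(high, 0.1, rng)
print(f"R^2 at 10% of the reads: {pearson_r2(high, low):.3f}")
```

Repeating this per gene and per coverage level would give a distribution of R² values of the kind the panels summarize.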
Fig. 2. Effective sample size across a grid of experimental designs.
Sample size N ranges from 40 to 120 individuals in steps of 8, and the number of cells per individual M ranges from 500 to 2750 in steps of 250 (CD4 T cells). a Library preparation is assumed to cost $0 per reaction; the level of multiplexing is fixed at 8. b Library preparation is set to $2000 per reaction; the level of multiplexing is fixed at 8. c Library preparation is set to $2000 per reaction, with greedy multiplexing. d Library preparation is set to $2000 per reaction, with greedy multiplexing and with demultiplexing inaccuracy and cell-type misclassification taken into account.
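As a rough sketch of how such a grid trades samples and cells against coverage, the snippet below enumerates the N and M values from the caption and computes the reads per cell a fixed budget could buy. The cost model is a simplified assumption and not the authors' model: one library-preparation reaction per pool of 8 individuals, a hypothetical price of $5 per million reads, and a $35,000 budget borrowed from the captions of Fig. 5 and Fig. 6.

```python
import math

# Grid taken from the Fig. 2 caption.
SAMPLE_SIZES = list(range(40, 121, 8))          # N: 40, 48, ..., 120
CELLS_PER_SAMPLE = list(range(500, 2751, 250))  # M: 500, 750, ..., 2750

def reads_per_cell(budget, n, m, prep_cost, multiplexing=8,
                   usd_per_million_reads=5.0):
    """Coverage affordable under a simplified, hypothetical cost model."""
    # Assumption: one library-preparation reaction per pool of individuals.
    reactions = math.ceil(n / multiplexing)
    sequencing_budget = budget - reactions * prep_cost
    if sequencing_budget <= 0:
        return 0.0  # nothing left for sequencing
    total_reads = sequencing_budget / usd_per_million_reads * 1_000_000
    return total_reads / (n * m)

grid = [(n, m) for n in SAMPLE_SIZES for m in CELLS_PER_SAMPLE]
for n, m in grid[:3]:
    r = reads_per_cell(35_000, n, m, prep_cost=2_000)
    print(f"N={n}, M={m}: about {r:,.0f} reads per cell")
```

Under this toy model, adding individuals or cells at a fixed budget lowers the per-cell coverage, which is the trade-off the heatmaps explore.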
Fig. 3. Cell-type misclassification error rate (%) and coverage (thousands of reads per cell).
a Cell-type misclassification error; b coverage. Color scales correspond to the magnitude of the values in each cell of the heatmap.
Fig. 4. Experimental designs for CD4 T cells ct-eQTL with effective sample size N_eff = 40.
a Comparison of different experimental designs. The experimental design N = 88, M = 2250, r = 4500 yields a 2-fold reduction in cost relative to the standard design. b For a fixed sample size and number of cells per individual, increasing coverage increases the effective sample size (i.e., power) only up to a point. There is little gain in power at coverages greater than 12,500 reads per cell. The red solid line corresponds to budget and the blue dashed line corresponds to effective sample size.
Fig. 5. Effective sample size as a function of cell-type prevalence.
Shown here is the effective sample size across the grid of experimental designs when the cell-type abundance is set to different values: 5, 10, 15, 20, 25, and 30% (CD4 T cells at a fixed budget of $35,000). Color scales correspond to the magnitude of the values in each cell of the heatmap.
Fig. 6. Performance of ct-eQTL analysis.
Shown here is recall (power estimate) as a function of coverage in the ct-eQTL analysis of CD4 T cells at fixed budget $35,000. a Mean ct-eQTL; b variance ct-eQTL.
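Recall, used here as the power estimate, is the fraction of true eQTLs that an analysis detects. A minimal sketch of that calculation follows; the gene-SNP pairs are made up for illustration and the function name `recall` is an assumption, not part of the authors' tooling.

```python
def recall(true_eqtls, detected_eqtls):
    """Fraction of true eQTLs that were detected: TP / (TP + FN)."""
    truth = set(true_eqtls)
    if not truth:
        raise ValueError("recall is undefined without any true eQTLs")
    return len(truth & set(detected_eqtls)) / len(truth)

# Hypothetical gene-SNP pairs, used only to illustrate the calculation.
truth = {("GENE_A", "snp1"), ("GENE_B", "snp2"), ("GENE_C", "snp3")}
detected = {("GENE_A", "snp1"), ("GENE_B", "snp2"), ("GENE_D", "snp9")}
print(f"recall = {recall(truth, detected):.2f}")
```

False positives such as the extra detected pair do not affect recall; they would show up in precision instead.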

