Genome Biol. 2023 Apr 21;24(1):86.
doi: 10.1186/s13059-023-02933-w.

Consequences and opportunities arising due to sparser single-cell RNA-seq datasets

Gerard A Bouland et al. Genome Biol. 2023.

Abstract

The number of cells measured in single-cell RNA sequencing (scRNA-seq) datasets is increasing exponentially, and sparsity is increasing with it because more zero counts are measured for many genes. We demonstrate here that downstream analyses on binary-based gene expression give results similar to those of count-based analyses. Moreover, with the same computational resources, a binary representation scales to ~50-fold more cells. We also highlight the possibilities provided by binarized scRNA-seq data. Development of specialized tools for bit-aware implementations of downstream analytical tasks will enable a more fine-grained resolution of biological heterogeneity.
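To make the storage argument concrete, the following is a minimal sketch of binarizing a row of counts and packing it at one bit per value. It is not the authors' implementation: the helper names (`binarize`, `pack_bits`, `unpack_bits`) and the toy counts are hypothetical, and the sketch does not reproduce the ~50-fold figure, which is the authors' result.

```python
def binarize(counts):
    """Zero stays zero; every non-zero count becomes one."""
    return [1 if c != 0 else 0 for c in counts]


def pack_bits(bits):
    """Pack 0/1 values into bytes, eight values per byte (most significant bit first)."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def unpack_bits(packed, n):
    """Recover the first n 0/1 values from a packed byte string."""
    return [(packed[i // 8] >> (7 - i % 8)) & 1 for i in range(n)]


# Hypothetical raw counts for ten genes in one cell.
counts = [0, 3, 0, 0, 1, 12, 0, 0, 0, 2]
bits = binarize(counts)
packed = pack_bits(bits)

# Ten binary values fit in two bytes; the round trip is lossless for the 0/1 data.
restored = unpack_bits(packed, len(bits))
```

Packing discards count magnitudes by design, so it only suits analyses that work on presence or absence of expression.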

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
More cells, more zeros. Binarized scRNA-seq datasets were generated by binarizing the raw count matrix: zero remains zero and every non-zero value is assigned a one. A Association between year of publication and total number of cells: scatterplot of the number of cells (log scale) against the date of publication. B Scatterplot of the detection rate (y-axis) against the number of cells (log scale, x-axis). C For the PaulHSC dataset, the x-axis shows, for every cell, the Pearson’s correlation coefficient (p) between its binarized and normalized expression; the y-axis shows the product of the detection rate and the variance of the non-zero values (q). α is the Pearson’s correlation coefficient between p and q across all cells. D Boxplots of the α-values for all 56 datasets, grouped by technology. One dataset (LawlorPancreasData) was excluded because its α-value (α = 0.42) was a clear outlier.
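The quantities in panel C can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the detection rate is taken to be the fraction of genes with a non-zero value in a cell, the variance is the population variance, and the function names and toy expression values are hypothetical.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences.

    Undefined (division by zero) if either sequence is constant, e.g. a cell
    with all-zero or all-non-zero values; the toy data below avoids that case.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cell_summary(expr):
    """Return (p, q) for one cell's normalized expression across genes."""
    binary = [1 if v != 0 else 0 for v in expr]
    nonzero = [v for v in expr if v != 0]
    detection_rate = len(nonzero) / len(expr)  # assumed definition
    p = pearson(binary, expr)                  # binarized vs normalized
    q = detection_rate * pvariance(nonzero)    # population variance assumed
    return p, q


# Hypothetical normalized expression: three cells, four genes each.
cells = [
    [0.0, 0.0, 2.0, 2.0],
    [0.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 2.0, 9.0],
]
ps, qs = zip(*(cell_summary(c) for c in cells))

# alpha: correlation between p and q across all cells of one dataset.
alpha = pearson(ps, qs)
```

In the first toy cell all non-zero values are equal, so the binarized and normalized vectors are perfectly correlated and q is zero; the wider the spread of non-zero values, the more information binarization discards.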
Fig. 2
A, B Cells of the AD dataset [20] plotted against the first two principal components: A PCA based on the binary representation and B PCA based on the count representation. C, D UMAPs generated from C the binary-based PCs and D the count-based PCs. Colors indicate annotated cell type. E, H UMAP based on the count-based PCs, with cells colored according to the binary representation of the marker genes AQP4 (E) and TNR (H), known markers for astrocytes and OPCs, respectively [21]. F, G Same as E and H, but showing the normalized expression of the marker gene. I Performance (median F1-score) of cell type identification by SingleR [22] and scPred [23] when applied to binary (binarized data), normalized (normalized expression), and shuffled (shuffled normalized expression) inputs for 22 datasets.
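For readers unfamiliar with the metric in panel I, the sketch below computes a per-class F1-score and takes the median. It assumes the median is taken over cell types, which the caption does not state; the labels, predictions, and function name are hypothetical and are not output from SingleR or scPred.

```python
from statistics import median


def f1_per_class(true_labels, predicted_labels):
    """Return {label: F1} with F1 = 2*TP / (2*TP + FP + FN) for each true label."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for label in sorted(set(true_labels)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical annotations and classifier output for six cells.
true_labels = ["astro", "astro", "OPC", "OPC", "neuron", "neuron"]
predicted = ["astro", "OPC", "OPC", "OPC", "neuron", "astro"]

scores = f1_per_class(true_labels, predicted)
median_f1 = median(scores.values())
```

Running the same computation on predictions from binarized, normalized, and shuffled inputs would give the three values compared per dataset.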

References

    1. Mathys H, Davila-Velderrain J, Peng Z, Gao F, Mohammadi S, Young JZ, et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature. 2019;570:332–7.
    1. Van Der Wijst MGP, Brugge H, De Vries DH, Deelen P, Swertz MA, Franke L. Single-cell RNA sequencing identifies celltype-specific cis-eQTLs and co-expression QTLs. Nat Genet. 2018;50:493–7.
    1. La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, et al. RNA velocity of single cells. Nature. 2018;560(7719):494–8.
    1. Lotfollahi M, Wolf FA, Theis FJ. scGen predicts single-cell perturbation responses. Nat Methods. 2019;16(8):715–21.
    1. Choi K, Chen Y, Skelly DA, Churchill GA. Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics. Genome Biol. 2020;21:183.