Bioinformatics. 2021 Apr 5;36(24):5701-5702.
doi: 10.1093/bioinformatics/btaa1009.

glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data

Constantin Ahlmann-Eltze et al. Bioinformatics. 2021.

Abstract

Motivation: The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts and an essential building block for analysis approaches including differential expression analysis, principal component analysis and factor analysis. Existing implementations for inferring its parameters from data often struggle with the size of single cell datasets, which can comprise millions of cells; at the same time, they do not take full advantage of the fact that zero and other small numbers are frequent in the data. These limitations have hampered uptake of the model, leaving room for statistically inferior approaches such as logarithm(-like) transformation.
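To make the model concrete, the following is a minimal sketch in Python (standard library only), not the glmGamPoi implementation. It assumes the common parameterization with mean mu and overdispersion alpha, so that the variance is mu + alpha * mu^2; the abstract itself does not state a parameterization. All function names are hypothetical. The log-likelihood is evaluated once per distinct count value and weighted by its frequency, which is one plausible way to benefit from the many zeros and small counts; whether the package does exactly this is not stated here.

```python
import math
import random
from collections import Counter


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_gamma_poisson(rng, mu, alpha, n):
    """Draw n counts: rate ~ Gamma(shape=1/alpha, scale=alpha*mu), count ~ Poisson(rate).

    Assumed parameterization: E[count] = mu, Var[count] = mu + alpha * mu**2.
    """
    shape, scale = 1.0 / alpha, alpha * mu
    return [sample_poisson(rng, rng.gammavariate(shape, scale)) for _ in range(n)]


def gamma_poisson_loglik(counts, mu, alpha):
    """Gamma-Poisson log-likelihood, computed per distinct count value.

    With sparse data most counts are 0, 1, 2, ..., so the number of
    distinct values is far smaller than the number of observations.
    """
    r = 1.0 / alpha  # "size" parameter of the equivalent negative binomial
    total = 0.0
    for y, freq in Counter(counts).items():
        log_pmf = (
            math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu))
        )
        total += freq * log_pmf
    return total


def moment_estimate(counts):
    """Method-of-moments estimates of (mu, alpha) from the sample mean and variance."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((y - mean) ** 2 for y in counts) / (n - 1)
    alpha = max((var - mean) / mean ** 2, 0.0)  # clip: variance below mean implies no overdispersion
    return mean, alpha


# Illustrative parameters only; they are not taken from the paper.
rng = random.Random(0)
counts = sample_gamma_poisson(rng, mu=1.0, alpha=1.0, n=20000)
mu_hat, alpha_hat = moment_estimate(counts)
print("fraction of zeros:", counts.count(0) / len(counts))
print("moment estimates (mu, alpha):", mu_hat, alpha_hat)
print("log-likelihood at the estimates:", gamma_poisson_loglik(counts, mu_hat, max(alpha_hat, 1e-8)))
```

The method-of-moments estimator is used here only because it is short; a real analysis would fit the parameters by maximum likelihood within a generalized linear model, as the package name suggests.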

Results: We present a new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods. The software can work with data on disk without having to load them into RAM simultaneously.

Availability and implementation: The package glmGamPoi is available from Bioconductor for Windows, macOS and Linux, and source code is available on github.com/const-ae/glmGamPoi under a GPL-3 license. The scripts to reproduce the results of this paper are available on github.com/const-ae/glmGamPoi-Paper.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Bar plot comparing the runtime of glmGamPoi (in-memory, on-disk and without overdispersion estimation), edgeR and DESeq2 (with its own implementation, or calling glmGamPoi) on the Mouse Gastrulation dataset. The time measurements were repeated five times each as a single process without parallelization on a different node of a multi-node computing cluster with minor amounts of competing tasks. The points show individual measurements, the bars their median. To reproduce the results, see Supplementary Appendix S2.
