Front Genet. 2021 Feb 5;12:616686.
doi: 10.3389/fgene.2021.616686. eCollection 2021.

Identifying Differentially Expressed Genes of Zero Inflated Single Cell RNA Sequencing Data Using Mixed Model Score Tests

Zhiqiang He et al. Front Genet. 2021.

Abstract

Single cell RNA sequencing (scRNA-seq) allows quantitative measurement and comparison of gene expression at the resolution of single cells. Because they ignore batch effects and zero inflation in scRNA-seq data, many proposed differential expression (DE) methods may produce biased results. We propose a method, single cell mixed model score tests (scMMSTs), that uses the generalized linear mixed model (GLMM) to efficiently identify DE genes in scRNA-seq data with batch effects. scMMSTs treat the batch effect as a random effect. To handle zero inflation, scMMSTs use a weighting strategy that calculates observational weights for counts independently under zero-inflated and zero-truncated distributions. The counts and their calculated weights were then analyzed using weighted GLMMs. The theoretical null distributions of the score statistics were constructed from mixed chi-square distributions. Extensive simulations and two real datasets were used to compare edgeR-zinbwave, DESeq2-zinbwave, and scMMSTs. Our study demonstrates that scMMSTs, as a supplement to standard methods, are advantageous for identifying DE genes in zero-inflated scRNA-seq data with batch effects.
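The abstract does not give the weighting formula. A common choice, used by zinbwave (whose weights the edgeR-zinbwave and DESeq2-zinbwave comparators rely on), is the posterior probability that a count arises from the negative binomial (NB) component rather than from excess zeros. The Python sketch below computes that weight under a zero-inflated NB model. It assumes the fitted means `mu`, a common NB size parameter `size`, and a common zero-inflation probability `pi` are already available, and it does not reproduce the zero-truncated part of the scMMSTs procedure. All function names are hypothetical.

```python
def nb_zero_prob(mu, size):
    """P(Y = 0) for a negative binomial with mean `mu` and size `size`
    (variance mu + mu**2 / size)."""
    return (size / (size + mu)) ** size


def zinb_observation_weights(counts, mu, size, pi):
    """Posterior probability that each count comes from the NB component.

    counts, mu: equal-length sequences of observed counts and fitted NB means.
    size: NB size parameter, assumed common to all cells in this sketch.
    pi: zero-inflation probability, assumed common to all cells in this sketch.
    """
    weights = []
    for y, m in zip(counts, mu):
        if y > 0:
            # A positive count cannot be an excess (structural) zero.
            weights.append(1.0)
        else:
            p0 = nb_zero_prob(m, size)
            # Bayes' rule: NB zero versus zero from the inflation component.
            weights.append((1.0 - pi) * p0 / (pi + (1.0 - pi) * p0))
    return weights
```

Positive counts always get weight 1. A zero gets a larger weight when the NB model itself makes a zero likely (small `mu`) and a smaller weight when the zero is better explained by zero inflation; the weights then enter a weighted GLMM fit.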

Keywords: differential expression analyses; generalized linear mixed model; observational weights; score test; single cell RNA sequencing; zero inflation.
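The abstract says the null distributions of the score statistics are built from mixed chi-square distributions but does not give their form. Assuming this means a weighted sum $\sum_j \lambda_j \chi^2_{1,j}$ of independent one-degree-of-freedom chi-square variables, as in variance-component score tests, a tail probability can be approximated by simulation. The Python sketch below is a Monte Carlo stand-in, not the authors' procedure: the weights `lambdas` (typically eigenvalues derived from the score covariance) are assumed given, and exact methods such as Davies' algorithm are usually preferred in practice, especially for small p-values. The function name is hypothetical.

```python
import random


def mixed_chisq_pvalue(q, lambdas, n_sim=100_000, seed=1):
    """Monte Carlo estimate of P(sum_j lambda_j * X_j > q), X_j ~ chi2(1) i.i.d.

    q: observed score statistic.
    lambdas: nonnegative mixing weights of the chi-square components.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        # Each chi2(1) draw is the square of a standard normal draw.
        stat = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lambdas)
        if stat > q:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_sim + 1)
```

With a single weight of 1 this reduces to an ordinary $\chi^2_1$ test, and with weights (0.5, 0.5) the statistic is exponential with mean 1, which gives two easy sanity checks.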

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
False positive rate control on simulated null Usoskin datasets and Tung datasets. (A) Boxplot of PCER for 30 simulated null Usoskin datasets generated by splatter for each of 12 DE methods. scMMSTs are marked in blue. (B) Histogram of uncorrected p-values for one dataset in panel A. (C) Boxplot of PCER for 30 simulated null Tung datasets generated by splatter for each of 12 DE methods. scMMSTs are marked in blue. (D) Histogram of uncorrected p-values for one dataset in panel C. PCER, per-comparison error rate; DE, differential expression; scMMST, single cell mixed model score test.
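On a null dataset every gene is non-DE, so the per-comparison error rate (PCER) is the fraction of genes whose uncorrected p-value falls below the nominal level; a well-calibrated method gives a value close to that level and a roughly flat p-value histogram. The Python sketch below assumes a nominal level of 0.05 and 20 histogram bins; these choices and the function names are illustrative, not taken from the paper.

```python
def pcer(p_values, alpha=0.05):
    """Per-comparison error rate on a null dataset.

    Every gene is non-DE by construction, so each rejection at the
    uncorrected level `alpha` counts as a false positive.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    return sum(p < alpha for p in p_values) / len(p_values)


def pvalue_histogram(p_values, n_bins=20):
    """Counts of p-values in equal-width bins on [0, 1]; roughly flat under the null."""
    counts = [0] * n_bins
    for p in p_values:
        idx = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the last bin
        counts[idx] += 1
    return counts
```

Feeding in uniform random p-values should give a PCER near 0.05 and bins of similar height; a histogram piled up near 0 indicates an inflated false positive rate.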
FIGURE 2
FDP-TPR curves of DE methods on simulated Usoskin datasets and Tung datasets. (A) Line plot of the FDP-TPR curves for simulated Usoskin datasets generated by splatter for each of 12 DE methods. (B) Line plot of the FDP-TPR curves for simulated Tung datasets generated by splatter for each of 12 DE methods. Circles represent values at a 0.05 nominal FDR threshold and are filled in if the FDP (i.e., empirical FDR) is less than 0.05. DE, differential expression; TPR, true positive rate; FDP, false discovery proportion; FDR, false discovery rate.
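Each circle marks the false discovery proportion (FDP) and true positive rate (TPR) obtained when genes are called DE at a nominal FDR of 0.05. A minimal Python sketch, assuming Benjamini-Hochberg adjusted p-values (the caption does not say which adjustment each method uses) and true DE labels known from the simulation; function names are hypothetical:

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fdp_tpr(adjusted, is_de, threshold=0.05):
    """FDP and TPR when genes with adjusted p-value <= threshold are called DE."""
    called = [q <= threshold for q in adjusted]
    n_called = sum(called)
    tp = sum(c and d for c, d in zip(called, is_de))
    n_true = sum(is_de)
    fdp = (n_called - tp) / n_called if n_called else 0.0
    tpr = tp / n_true if n_true else 0.0
    return fdp, tpr
```

Sweeping `threshold` over a grid and plotting TPR against FDP traces a full curve; the filled-circle rule corresponds to checking `fdp < 0.05` at `threshold=0.05`.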
FIGURE 3
FDP-TPR curves of DE methods on simulated datasets generated by GLMMs with $\mu_\pi = 0$. (A) Line plot of the FDP-TPR curves for simulated datasets based on negative binomial (NB) GLMMs for each of 12 DE methods with the dispersion parameter $\theta = 0.5$. (B) Line plot of the FDP-TPR curves for simulated datasets based on NB GLMMs for each of 12 DE methods with $\theta = 1$. (C) Line plot of the FDP-TPR curves for simulated datasets based on NB GLMMs for each of 12 DE methods with $\theta = 2$. (D) Line plot of the FDP-TPR curves for simulated datasets based on Poisson GLMMs for each of 12 DE methods with $\beta_0 = \sigma_\beta^2 = 0.01$. Circles represent values at a 0.05 nominal FDR threshold and are filled in if the FDP (i.e., empirical FDR) is less than 0.05. DE, differential expression; GLMM, generalized linear mixed model; NB, negative binomial; TPR, true positive rate; FDP, false discovery proportion; FDR, false discovery rate.
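The caption does not spell out the simulation model. The Python sketch below assumes one plausible form of an NB GLMM with a batch random effect: a log link with a fixed intercept `beta0`, a fixed condition effect `beta1`, and a batch-level random intercept drawn from $N(0, \sigma_b^2)$, with NB counts generated as a gamma-Poisson mixture (mean $\mu$, variance $\mu + \mu^2/\text{size}$). Whether `size` corresponds to the paper's $\theta$ is an assumption, zero inflation (the $\mu_\pi$ parameter) is omitted, and all names are hypothetical.

```python
import math
import random


def poisson_knuth(lam, rng):
    """Poisson draw by Knuth's multiplication method.

    Adequate for the small means used here; exp(-lam) underflows
    for lam above roughly 700.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_nb_glmm(n_cells_per_batch, n_batches, beta0, beta1, sigma_b, size, seed=0):
    """Simulate one gene's counts from a hypothetical NB GLMM with a batch random intercept."""
    rng = random.Random(seed)
    counts, groups, batches = [], [], []
    for b in range(n_batches):
        batch_effect = rng.gauss(0.0, sigma_b)  # random intercept shared by the batch
        for i in range(n_cells_per_batch):
            group = i % 2  # alternate cells between two conditions
            mu = math.exp(beta0 + beta1 * group + batch_effect)
            # Gamma(shape=size, scale=mu/size) mixed with Poisson gives NB(mean mu, size).
            lam = rng.gammavariate(size, mu / size)
            counts.append(poisson_knuth(lam, rng))
            groups.append(group)
            batches.append(b)
    return counts, groups, batches
```

Setting `beta1 = 0` produces a null gene; a nonzero `beta1` produces a DE gene whose signal is partly confounded with the batch noise controlled by `sigma_b`.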
FIGURE 4
AUCs of DE methods for simulated datasets generated by GLMMs with $\mu_\pi = 0$. Adjusted p-values are used as predictors. (A) Bar plot of AUCs for simulated datasets generated by negative binomial (NB) GLMMs for each of 12 DE methods with the dispersion parameter $\theta = 0.5$. (B) Bar plot of AUCs for simulated datasets generated by NB GLMMs for each of 12 DE methods with $\theta = 1$. (C) Bar plot of AUCs for simulated datasets generated by NB GLMMs for each of 12 DE methods with $\theta = 2$. (D) Bar plot of AUCs for simulated datasets generated by Poisson GLMMs for each of 12 DE methods. AUC, area under curve; DE, differential expression; GLMM, generalized linear mixed model; NB, negative binomial.
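When adjusted p-values are used as predictors of true DE status, the ROC AUC equals the probability that a randomly chosen DE gene has a smaller adjusted p-value than a randomly chosen non-DE gene, with ties counted as one half (the Mann-Whitney form). A minimal Python sketch, assuming true labels are known from the simulation; the function name is hypothetical:

```python
def auc_from_adjusted_pvalues(adjusted, is_de):
    """ROC AUC for ranking genes by adjusted p-value (smaller means more likely DE)."""
    pos = [p for p, de in zip(adjusted, is_de) if de]
    neg = [p for p, de in zip(adjusted, is_de) if not de]
    if not pos or not neg:
        raise ValueError("need both DE and non-DE genes")
    score = 0.0
    for p in pos:
        for q in neg:
            if p < q:
                score += 1.0  # DE gene correctly ranked ahead
            elif p == q:
                score += 0.5  # tie counts as half
    return score / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of genes; a rank-based computation gives the same value in O(n log n) for genome-scale inputs.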
FIGURE 5
Computational times of the differential expression methods on the simulated null Usoskin and Tung datasets generated by splatter. Runs used either 1 or 8 cores on a cluster with 24 Intel Xeon processors (Skylake, IBRS) at 2.60 GHz (2593 MHz) and 128 GB of RAM.

