[Preprint]. 2023 Feb 22:2023.02.21.529455.
doi: 10.1101/2023.02.21.529455.

Differentially Expressed Heterogeneous Overdispersion Genes Testing for Count Data

Yubai Yuan et al. bioRxiv.

Abstract

mRNA-seq data analysis is a powerful tool for inferring information about biological systems of interest. Specifically, the sequenced RNA fragments are aligned with genomic reference sequences, and the number of sequence fragments corresponding to each gene is counted for each condition. A gene is identified as differentially expressed (DE) if the difference in its counts between conditions is statistically significant. Several statistical methods have been developed to detect DE genes from RNA-seq data. However, existing methods can lose power to identify DE genes because of overdispersion and limited sample size. We propose a new differential expression analysis procedure, differentially expressed heterogeneous overdispersion genes testing (DEHOGT), based on heterogeneous overdispersion modeling and a post-hoc inference procedure. DEHOGT integrates sample information from all conditions and provides more flexible and adaptive overdispersion modeling for RNA-seq read counts. DEHOGT adopts a gene-wise estimation scheme to enhance the power to detect differentially expressed genes. On synthetic RNA-seq read count data, DEHOGT outperforms two popular existing methods, DESeq and EdgeR, in detecting DE genes. We apply the proposed method to a test dataset of RNA-seq data from microglial cells. DEHOGT tends to detect more differentially expressed genes potentially related to microglial cells under different stress hormone treatments.
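
To make the notion of overdispersion concrete, the following is a minimal sketch of gene-wise, moment-based overdispersion estimates for the two count models named in the figure captions (quasi-Poisson and negative binomial). It is not the DEHOGT estimator, which is not specified in the abstract. The function name and the parameterizations (quasi-Poisson variance = theta * mean; negative binomial variance = mean + phi * mean^2) are assumptions chosen for illustration and may differ from the paper's.

```python
from statistics import mean, variance


def overdispersion_estimates(counts):
    """Moment-based overdispersion estimates for one gene's read counts.

    Assumed (illustrative) parameterizations:
      quasi-Poisson:      Var(Y) = theta_qp * mu
      negative binomial:  Var(Y) = mu + phi_nb * mu**2
    A plain Poisson model corresponds to theta_qp = 1 and phi_nb = 0.
    """
    m = mean(counts)
    v = variance(counts)  # unbiased sample variance; needs >= 2 samples
    if m == 0:
        raise ValueError("mean count is zero; dispersion is undefined")
    theta_qp = v / m
    # Truncate at zero: the sample variance can fall below the mean by chance.
    phi_nb = max(v - m, 0.0) / m ** 2
    return {"mean": m, "variance": v, "theta_qp": theta_qp, "phi_nb": phi_nb}


if __name__ == "__main__":
    # Hypothetical read counts for one gene across four samples.
    print(overdispersion_estimates([2, 4, 6, 8]))
```

Estimating these quantities separately for each gene, rather than pooling across genes, mirrors the gene-wise scheme the abstract describes, although with few samples such moment estimates are noisy.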

Keywords: Differential expression; Gene expression; Generalized linear modeling; RNA-Seq data.

Figures

Figure 1: Overdispersion in the real RNA-seq count data.
Figure 2: The false negative rate from different methods when the read counts follow the quasi-Poisson distribution with different overdispersion levels θ_g^QP in simulation setting 1. The bars represent the standard deviation of the false negative rate over repeated experiments.
Figure 3: The false negative rate from different methods when the read counts follow the negative binomial distribution with different overdispersion levels θ_g^NB in simulation setting 1. The bars represent the standard deviation of the false negative rate over repeated experiments.
Figure 4: The area under the ROC curve from different methods when the read counts follow the quasi-Poisson distribution with different overdispersion levels θ_g^QP in simulation setting 1.
Figure 5: The area under the ROC curve from different methods when the read counts follow the negative binomial distribution with different overdispersion levels θ_g^NB in simulation setting 1.
Figure 6: The false negative rate from different methods when the read counts follow the quasi-Poisson distribution with different overdispersion levels θ^QP. The bars show the variance of the FNR over repeated experiments.
Figure 7: The false negative rate from different methods when the read counts follow the negative binomial distribution with different overdispersion levels θ^NB.
Figure 8: The AUC from different methods when the read counts follow the quasi-Poisson distribution with different overdispersion levels θ^QP. The bars show the variance of the FNR over repeated experiments.
Figure 9: The AUC from different methods when the read counts follow the negative binomial distribution with different overdispersion levels θ^NB.
Figure 10: The ROC curve from different methods when the read counts follow the negative binomial distribution with θ_g^NB (1,2) and the quasi-Poisson distribution with θ_g^QP (50,100).
Figure 11: The dispersion of gene-wise RNA read counts. Each dot corresponds to a sample count from a specific gene.
Figure 12: The selected DE genes from DEHOGT, DESeq, and EdgeR under treatment comparison dexh3 versus control.
Figure 13: The selected DE genes from DEHOGT, DESeq, and EdgeR under treatment comparison dexh6 versus control.
Figure 14: The selected DE genes from DEHOGT, DESeq, and EdgeR under treatment comparison corth3 versus control.
Figure 15: The selected DE genes from DEHOGT, DESeq, and EdgeR under treatment comparison corth6 versus control.
Figure 16: The selected DE genes from DEHOGT, DESeq, and EdgeR under treatment comparison dexvh3 versus dexh3.
Figure 17: The selected DE genes from DEHOGT, DESeq, and EdgeR under treatment comparison dexvl3 versus dexl3.
Figure 18: The selected DE genes from DEHOGT, DESeq, and EdgeR under treatment comparison cortvh3 versus corth3.
Figure 19: The rank of the p-values of selected genes under treatment comparison dexh3 versus control. A shorter bar indicates a smaller p-value (more significant differential expression).
Figure 20: The rank of the p-values of selected genes under treatment comparison dexh6 versus control. A shorter bar indicates a smaller p-value.
Figure 21: The rank of the p-values of selected genes under treatment comparison corth3 versus control. A shorter bar indicates a smaller p-value.
Figure 22: The rank of the p-values of selected genes under treatment comparison dexvh versus dexh. A shorter bar indicates a smaller p-value.
Figure 23: The rank of the p-values of selected genes under treatment comparison dexvl versus dexl. A shorter bar indicates a smaller p-value.
Figure 24: The rank of the p-values of selected genes under treatment comparison cortvh versus corth. A shorter bar indicates a smaller p-value.
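
The simulation figures report two evaluation metrics, the false negative rate (FNR) and the area under the ROC curve (AUC). As a rough illustration of how such metrics can be computed from per-gene p-values when the true DE status is known (as in synthetic data), here is a minimal sketch. The function names, the significance level `alpha`, and the use of raw rather than multiplicity-adjusted p-values are assumptions, not details taken from the paper.

```python
def false_negative_rate(p_values, is_de, alpha=0.05):
    """Fraction of truly DE genes that are not called significant.

    `alpha` is a hypothetical threshold applied to the p-values as given;
    any multiple-testing adjustment is assumed to have been done upstream.
    """
    positives = sum(1 for truth in is_de if truth)
    if positives == 0:
        raise ValueError("no truly DE genes; FNR is undefined")
    misses = sum(1 for p, truth in zip(p_values, is_de) if truth and p >= alpha)
    return misses / positives


def auc_from_p_values(p_values, is_de):
    """AUC via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen DE gene has a smaller
    p-value than a randomly chosen non-DE gene, counting ties as one half.
    """
    pos = [p for p, truth in zip(p_values, is_de) if truth]
    neg = [p for p, truth in zip(p_values, is_de) if not truth]
    if not pos or not neg:
        raise ValueError("need at least one DE and one non-DE gene")
    wins = sum(
        1.0 if p < q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical p-values and ground-truth labels for four genes.
    p = [0.001, 0.2, 0.03, 0.6]
    truth = [True, True, False, False]
    print(false_negative_rate(p, truth), auc_from_p_values(p, truth))
```

The pairwise AUC computation is quadratic in the number of genes, which is fine for a small illustration; a rank-based formula would be preferable at genome scale.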
