PLoS One. 2024 Jul 17;19(7):e0300565.
doi: 10.1371/journal.pone.0300565. eCollection 2024.

Differentially expressed heterogeneous overdispersion genes testing for count data

Yubai Yuan et al. PLoS One.

Abstract

mRNA-seq data analysis is a powerful tool for inferring information about biological systems of interest. Specifically, the sequenced RNA fragments are aligned to genomic reference sequences, and the number of fragments corresponding to each gene is counted for each condition. A gene is identified as differentially expressed (DE) if the difference in its read counts between conditions is statistically significant. Several statistical methods have been developed to detect DE genes from RNA-seq data. However, existing methods can lose power to identify DE genes because of overdispersion and limited sample size, where overdispersion refers to the empirical phenomenon that the variance of the read counts is larger than their mean. We propose a new differential expression analysis procedure, differentially expressed heterogeneous overdispersion genes testing (DEHOGT), based on heterogeneous overdispersion modeling and a post-hoc inference procedure. DEHOGT integrates sample information from all conditions and provides more flexible and adaptive overdispersion modeling for RNA-seq read counts. DEHOGT adopts a gene-wise estimation scheme that enhances the power to detect DE genes when the number of replicates is limited, provided the number of conditions is large. On synthetic RNA-seq read count data, DEHOGT outperforms two popular existing methods, DESeq2 and edgeR, in detecting DE genes. We also apply the proposed method to a test dataset of RNA-seq data from microglial cells. DEHOGT tends to detect more differentially expressed genes potentially related to microglial cells under different stress hormone treatments.
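The notion of overdispersion described in the abstract (variance of read counts larger than their mean) can be illustrated with a small simulation. The following is a minimal sketch, not the DEHOGT procedure itself. The function name `overdispersion_summary`, the simulation settings, and the two variance forms (quasi-Poisson: variance = θ × mean; negative binomial: variance = mean + φ × mean²) are assumptions made for this illustration and may differ from the parameterization used in the paper.

```python
import numpy as np


def overdispersion_summary(counts):
    """Per-gene mean, variance, and moment-based overdispersion estimates.

    counts: array of shape (n_genes, n_samples) holding read counts.
    Genes with zero mean get NaN for both overdispersion estimates.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)  # unbiased sample variance
    nan = np.full_like(mean, np.nan)
    # Assumed quasi-Poisson form: Var = theta_qp * mean
    theta_qp = np.divide(var, mean, out=nan.copy(), where=mean > 0)
    # Assumed negative binomial form: Var = mean + phi_nb * mean**2
    phi_nb = np.divide(var - mean, mean**2, out=nan.copy(), where=mean > 0)
    return mean, var, theta_qp, phi_nb


# Hypothetical simulation settings chosen only for illustration.
rng = np.random.default_rng(0)
n_genes, n_samples, mu, phi = 200, 50, 100.0, 0.5

# Poisson counts: variance equals the mean, so no overdispersion.
poisson_counts = rng.poisson(lam=mu, size=(n_genes, n_samples))

# NumPy's negative_binomial(n, p) has mean n * (1 - p) / p. Setting
# n = 1 / phi and p = n / (n + mu) gives mean mu and variance mu + phi * mu**2.
size = 1.0 / phi
nb_counts = rng.negative_binomial(size, size / (size + mu), size=(n_genes, n_samples))

_, _, theta_pois, _ = overdispersion_summary(poisson_counts)
_, _, theta_nb, phi_hat = overdispersion_summary(nb_counts)

print("median variance/mean, Poisson counts:", np.median(theta_pois))
print("median variance/mean, negative binomial counts:", np.median(theta_nb))
print("median moment estimate of phi:", np.median(phi_hat))
```

For the Poisson counts the variance-to-mean ratio should sit near 1, while for the negative binomial counts it should be far larger. With only a few replicates per gene, these gene-wise moment estimates become noisy, which is the small-sample difficulty the abstract refers to.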

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The overdispersion in the real RNA-seq count data.
Fig 2. The false negative rate from different methods when the read counts follow the quasi-Poisson distribution with different overdispersion levels $\theta_g^{\mathrm{QP}}$ in simulation setting 1. The bars represent the standard deviation of the false negative rate over repeated experiments.
Fig 3. The false negative rate from different methods when the read counts follow the negative binomial distribution with different overdispersion levels $\theta_g^{\mathrm{NB}}$ in simulation setting 1. The bars represent the standard deviation of the false negative rate over repeated experiments.
Fig 4. The area under the ROC curve from different methods when the read counts follow the quasi-Poisson distribution with different overdispersion levels $\theta_g^{\mathrm{QP}}$ in simulation setting 1.
Fig 5. The area under the ROC curve from different methods when the read counts follow the negative binomial distribution with different overdispersion levels $\theta_g^{\mathrm{NB}}$ in simulation setting 1.
Fig 6. The false negative rate from different methods when the read counts follow the quasi-Poisson distribution with different overdispersion levels $\theta^{\mathrm{QP}}$. The bars show the variance of the FNR over repeated experiments.
Fig 7. The false negative rate from different methods when the read counts follow the negative binomial distribution with different overdispersion levels $\theta^{\mathrm{NB}}$.
Fig 8. The AUC from different methods when the read counts follow the quasi-Poisson distribution with different overdispersion levels $\theta^{\mathrm{QP}}$. The bars show the variance of the FNR over repeated experiments.
Fig 9. The AUC from different methods when the read counts follow the negative binomial distribution with different overdispersion levels $\theta^{\mathrm{NB}}$.
Fig 10. The ROC curve from different methods when the read counts follow the negative binomial distribution with $\theta_g^{\mathrm{NB}}$ (1,2) and the quasi-Poisson distribution with $\theta_g^{\mathrm{QP}}$ (50,100).
Fig 11. The dispersion of gene-wise RNA read counts. Each dot corresponds to a sample count from a specific gene.
Fig 12. The selected DE genes from DEHOGT, DESeq2, and edgeR under treatment comparison dexh3 versus control.
Fig 13. The selected DE genes from DEHOGT, DESeq2, and edgeR under treatment comparison dexh6 versus control.
Fig 14. The selected DE genes from DEHOGT, DESeq2, and edgeR under treatment comparison cortvh3 versus corth3.
Fig 15. The selected DE genes from DEHOGT, DESeq2, and edgeR under treatment comparison corth6 versus control.
Fig 16. The selected DE genes from DEHOGT, DESeq2, and edgeR under treatment comparison dexvh3 versus dexh3.
Fig 17. The selected DE genes from DEHOGT, DESeq2, and edgeR under treatment comparison dexvl3 versus dexl3.
Fig 18. The selected DE genes from DEHOGT, DESeq2, and edgeR under treatment comparison corth3 versus control.
Fig 19. The rank of the p-values of selected genes under treatment comparison dexh3 versus control. A shorter bar indicates a smaller p-value (more significantly differentially expressed).
Fig 20. The rank of the p-values of selected genes under treatment comparison dexh6 versus control. A shorter bar indicates a smaller p-value.
Fig 21. The rank of the p-values of selected genes under treatment comparison corth3 versus control. A shorter bar indicates a smaller p-value.
Fig 22. The rank of the p-values of selected genes under treatment comparison dexvh versus dexh. A shorter bar indicates a smaller p-value.
Fig 23. The rank of the p-values of selected genes under treatment comparison dexvl versus dexl. A shorter bar indicates a smaller p-value.
Fig 24. The rank of the p-values of selected genes under treatment comparison cortvh versus corth. A shorter bar indicates a smaller p-value.
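
The simulation figures above report the false negative rate (FNR) and the area under the ROC curve (AUC). The sketch below shows one common way to compute both from per-gene p-values when the truly DE genes are known, as in a simulation. The helper names, the 0.05 threshold, and the toy p-values are hypothetical and are not taken from the paper. The AUC is computed here as the fraction of (DE, non-DE) gene pairs in which the DE gene receives the higher score, with ties counted as one half.

```python
import numpy as np


def false_negative_rate(is_de, called_de):
    """Fraction of truly DE genes that were not called DE."""
    is_de = np.asarray(is_de, dtype=bool)
    called_de = np.asarray(called_de, dtype=bool)
    n_true = is_de.sum()
    if n_true == 0:
        raise ValueError("FNR is undefined without any truly DE genes")
    missed = np.sum(is_de & ~called_de)
    return missed / n_true


def auc(is_de, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Higher scores should indicate stronger evidence for DE,
    for example the negated p-value.
    """
    is_de = np.asarray(is_de, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[is_de], scores[~is_de]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one DE and one non-DE gene")
    diff = pos[:, None] - neg[None, :]  # every DE gene against every non-DE gene
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (pos.size * neg.size)


# Toy example: three truly DE genes followed by three non-DE genes.
truth = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
p_values = np.array([0.001, 0.2, 0.03, 0.5, 0.04, 0.9])

fnr = false_negative_rate(truth, p_values < 0.05)  # hypothetical 0.05 cutoff
area = auc(truth, -p_values)  # smaller p-value means higher score
print("FNR:", fnr, "AUC:", area)
```

The pairwise formulation uses memory proportional to the number of DE genes times the number of non-DE genes, which is fine for a sketch but would call for a rank-based computation on genome-scale gene lists.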
