A two-step integrated approach to detect differentially expressed genes in RNA-Seq data
- PMID: 27774870
- DOI: 10.1142/S0219720016500347
Abstract
One of the primary objectives of a ribonucleic acid (RNA) sequencing, or RNA-Seq, experiment is to identify differentially expressed (DE) genes in two or more treatment conditions. It is common practice to assume that all read counts from RNA-Seq data follow an overdispersed (OD) Poisson or negative binomial (NB) distribution. This assumption can be misleading because, within each condition, some genes may have unvarying transcription levels with no overdispersion. In such a case, it is more appropriate to consider two sets of genes: OD and non-overdispersed (NOD). We propose a new two-step integrated approach to detect DE genes in RNA-Seq data, using the standard Poisson model for NOD genes and the NB model for OD genes. The approach is integrated in the sense that it can be merged with any other NB-based method for detecting DE genes. We design a simulation study and analyze two real RNA-Seq data sets to evaluate the proposed strategy. We compare the performance of the new method, combined with each of three R software packages (edgeR, DESeq2, and DSS), against those packages run with their default settings. For both the simulated and real data sets, the integrated approaches perform better than, or at least as well as, the regular methods embedded in these R packages.
Keywords: Next generation sequencing; RNA-Seq; differential expression; gene expression.
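The routing idea behind the two-step approach can be illustrated with a minimal sketch. The abstract does not state how OD and NOD genes are told apart, so the criterion below (a within-condition variance-to-mean ratio compared against a threshold) is an assumption chosen for illustration, not the authors' procedure. The function names, the threshold, and the toy counts are all hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of one gene's counts within one condition.

    Under a Poisson model this ratio is expected to be close to 1;
    values well above 1 suggest overdispersion.
    """
    m = mean(counts)
    if m == 0:
        return 0.0  # an all-zero gene shows no evidence of overdispersion
    return variance(counts) / m


def classify_gene(counts_by_condition, threshold=1.0):
    """Step 1 (assumed criterion): label a gene 'OD' if any condition has
    a variance-to-mean ratio above `threshold`, otherwise 'NOD'."""
    for counts in counts_by_condition.values():
        if dispersion_index(counts) > threshold:
            return "OD"
    return "NOD"


def choose_model(label):
    """Step 2: pick the count model used for the DE test of this gene."""
    return "negative binomial" if label == "OD" else "Poisson"


# Hypothetical toy counts: four replicates per condition.
genes = {
    "geneA": {"control": [10, 11, 9, 10], "treated": [20, 19, 21, 20]},
    "geneB": {"control": [5, 40, 12, 80], "treated": [100, 3, 60, 7]},
}

routing = {name: choose_model(classify_gene(c)) for name, c in genes.items()}
for name, model in routing.items():
    print(name, "->", model)
```

In this toy input, geneA varies little around its mean in both conditions and is routed to the Poisson model, while geneB varies widely and is routed to the NB model. In practice the NB branch would be handed to an NB-based tool such as edgeR, DESeq2, or DSS, and a formal test for overdispersion would likely replace the simple threshold.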