Review
PLoS One. 2017 Dec 21;12(12):e0190152.
doi: 10.1371/journal.pone.0190152. eCollection 2017.

RNA-Seq differential expression analysis: An extended review and a software tool

Juliana Costa-Silva et al. PLoS One. 2017.

Abstract

The correct identification of differentially expressed genes (DEGs) between specific conditions is key to understanding phenotypic variation. High-throughput transcriptome sequencing (RNA-Seq) has become the main option for these studies, and the number of methods and software tools for differential expression analysis of RNA-Seq data has increased rapidly as a result. However, there is no consensus about the most appropriate pipeline or protocol for identifying differentially expressed genes from RNA-Seq data. This work presents an extended review of the topic that includes the evaluation of six read-mapping methods, including pseudo-alignment and quasi-mapping, and nine methods of differential expression analysis of RNA-Seq data. The adopted methods were evaluated on real RNA-Seq data, using qRT-PCR data as the reference (gold standard). As part of the results, we developed software that performs all the analyses presented in this work, which is freely available at https://github.com/costasilvati/consexpression. The results indicated that the mapping methods have minimal impact on the final DEG analysis, given that the adopted data have an annotated reference genome. For the adopted experimental model, the DEG identification methods with the most consistent results were limma+voom, NOIseq and DESeq2. Additionally, the consensus among five DEG identification methods guarantees a list of DEGs with high accuracy, indicating that combining different methods can produce more suitable results. The consensus option is also included in the available software.
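
The consensus idea described in the abstract can be pictured as a simple vote across methods. The following is a minimal sketch, assuming each method's output has already been reduced to a set of gene identifiers called as DEGs. The function name `consensus_degs`, the `min_methods` parameter and the toy gene identifiers are hypothetical illustrations and do not reflect how the consexpression software is implemented.

```python
from collections import Counter


def consensus_degs(calls_by_method, min_methods):
    """Return the genes called as DEGs by at least `min_methods` methods.

    `calls_by_method` maps a method name to the set of gene IDs that the
    method reported as differentially expressed (hypothetical input format).
    """
    votes = Counter()
    for genes in calls_by_method.values():
        # set() guards against a method listing the same gene twice
        votes.update(set(genes))
    return {gene for gene, n in votes.items() if n >= min_methods}


# Toy calls, invented for illustration only (not data from the paper)
calls = {
    "limma+voom": {"geneA", "geneB", "geneC"},
    "NOIseq": {"geneA", "geneB"},
    "DESeq2": {"geneA", "geneC", "geneD"},
}

consensus = consensus_degs(calls, min_methods=2)
print(sorted(consensus))
```

Raising `min_methods` shrinks the list to genes that more methods agree on, which is the trade-off the abstract refers to when it reports results for a consensus of five methods.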

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of the pipeline presented in this work.
The adopted biological samples to generate the qRT-PCR data were the same as those used to generate the RNA-Seq data.
Fig 2. Comparison of DEGs identified by different expression analysis tools, combined with distinct RNA-Seq mapping methods, against qRT-PCR.
(A) Venn diagram comparing DEGs identified by baySeq using the BWA, TopHat and Bowtie mappers with the qRT-PCR results. (B) Venn diagram comparing DEGs identified by edgeR using the BWA, TopHat and Bowtie mappers with the qRT-PCR results. (C) Venn diagram comparing DEGs identified by NOIseq using the BWA, TopHat and Bowtie mappers with the qRT-PCR results. (D) Venn diagram comparing DEGs identified by DESeq using the BWA, TopHat and Bowtie mappers with the qRT-PCR results.
Fig 3. Histogram of the integration of DEG identification methods.
The red bars indicate DEGs identified as differentially expressed (true positives). The blue bars indicate non-differentially expressed transcripts identified as DEGs by the methods (false positives). The Y axis indicates the number of tools that correctly identified the transcripts as differentially expressed or not. The first row (bars at 0 on the Y axis) shows the DEGs and non-differentially expressed genes from qRT-PCR (gold standard): 413 DEGs and 584 non-differentially expressed transcripts, totaling 997 genes analyzed. There are no performance values for nine tools, since no transcript was indicated as a DEG by all nine methods.
Fig 4. ROC curve from the integration of DEG identification methods.
Each point indicates the performance of the best subset relative to the adopted qRT-PCR reference.
Fig 5. Projection curves of TPR and SPC.
Projection curves of the true positive rate (TPR) and specificity (SPC) when combining DEG identification methods. The X axis is the number of combined DEG identification methods. The Y axis is the evolution of the TPR and SPC values relative to the adopted qRT-PCR reference.
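
To make the TPR and SPC quantities concrete, the sketch below computes both for increasing consensus thresholds, using the standard definitions TPR = TP / (TP + FN) and SPC = TN / (TN + FP). It assumes a gold standard split into DEG and non-DEG sets and a per-gene count of how many methods called it a DEG. The function names and the toy counts are hypothetical and are not the values plotted in the figure.

```python
def tpr_spc(predicted, truth_pos, truth_neg):
    """True positive rate and specificity of a predicted DEG set."""
    tp = len(predicted & truth_pos)   # DEGs correctly called
    fn = len(truth_pos - predicted)   # DEGs missed
    fp = len(predicted & truth_neg)   # non-DEGs wrongly called
    tn = len(truth_neg - predicted)   # non-DEGs correctly left out
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    spc = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, spc


def curve_by_threshold(votes, truth_pos, truth_neg, max_methods):
    """Map each threshold k to (TPR, SPC) for genes with at least k votes."""
    result = {}
    for k in range(1, max_methods + 1):
        predicted = {gene for gene, n in votes.items() if n >= k}
        result[k] = tpr_spc(predicted, truth_pos, truth_neg)
    return result


# Toy gold standard and vote counts, invented for illustration only
truth_pos = {"g1", "g2", "g3", "g4"}
truth_neg = {"g5", "g6", "g7", "g8"}
votes = {"g1": 3, "g2": 2, "g3": 1, "g4": 0,
         "g5": 2, "g6": 1, "g7": 0, "g8": 0}

curve = curve_by_threshold(votes, truth_pos, truth_neg, max_methods=3)
for k, (tpr, spc) in curve.items():
    print(k, tpr, spc)
```

In this toy example TPR falls and SPC rises as the threshold grows, which is the general shape one would expect when requiring agreement among more methods.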


Publication types