BMC Genomics. 2012 Sep 17;13:484.
doi: 10.1186/1471-2164-13-484.

Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing

José A Robles et al. BMC Genomics. 2012.

Abstract

Background: RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression, with both high-throughput and high-resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the number of samples or replicates profiled, at the cost of decreased sequencing depth per sample. These strategies affect the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios, including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods.

Results: Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices.

Conclusions: This work quantitatively compares contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to the testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates than through library (technical) replicates or greater sequencing depth. Strikingly, sequencing depth could be reduced to as low as 15% without substantial impacts on false positive or true positive rates.
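
The simulation idea described in the Results can be illustrated with a minimal sketch. It is not the authors' pipeline: it draws negative binomial counts as a gamma-Poisson mixture and scales the expected counts by a depth fraction to mimic reduced sequencing depth. The function names, the mean values and the dispersion `phi` are hypothetical choices made for illustration, and the exponential component mentioned in the Results is not modelled here.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's method.

    The rate is split into chunks of at most 50 so that exp(-chunk)
    never underflows; a sum of independent Poisson draws is Poisson.
    """
    total = 0
    remaining = lam
    while remaining > 0:
        chunk = min(remaining, 50.0)
        threshold = math.exp(-chunk)
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        total += k
        remaining -= chunk
    return total


def sample_nb(mu, phi, rng):
    """Negative binomial draw with mean mu and variance mu + phi * mu**2.

    Implemented as a Poisson draw whose rate is gamma distributed
    (shape 1/phi, scale mu*phi), an assumed parameterisation.
    """
    rate = rng.gammavariate(1.0 / phi, mu * phi)
    return sample_poisson(rate, rng)


def simulate_counts(means, phi, n_reps, depth, seed=0):
    """Simulate one condition: a row of n_reps counts per transcript.

    depth is the fraction of full sequencing depth (1.0 = 100%);
    it simply rescales each transcript's expected count.
    """
    rng = random.Random(seed)
    return [[sample_nb(mu * depth, phi, rng) for _ in range(n_reps)]
            for mu in means]


# Hypothetical example: three transcripts, three replicates, 15% depth.
counts = simulate_counts([50.0, 400.0, 2000.0], phi=0.1, n_reps=3, depth=0.15)
for row in counts:
    print(row)
```

Calling `simulate_counts` twice with the same means gives a null (no DE) pair of conditions; multiplying some means by a fold change in the second call would give a differential expression dataset under the same assumptions.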


Figures

Figure 1
The percentage of transcripts reported as differentially expressed (the FPR defined by Eq. 4) by three software packages for synthetic data generated under the null hypothesis of no DE between two conditions. In the lower two panels the set of transcripts has been divided into those with greater than 100 counts (DE-high) and those with less than or equal to 100 counts (DE-low), averaged over biological replicates. The number of biological replicates in each condition was varied over the range n = 2, 3, …, 12. The experiment was repeated for 100 independently generated datasets. The top of each bar is the median value obtained, shown with its 90% confidence interval.
Figure 2
Histograms of p-values calculated by three software packages for one particular example of synthetic data generated under the null hypothesis for the case n = 3. In the two right-hand columns the set of transcripts has been divided into high-count transcripts (> 100 counts) and low-count transcripts (≤ 100 counts), respectively. ‘Percentage of total’ is the percentage of p-values falling within each of 100 bins in each histogram.
Figure 3
TPR and FPR detected by DESeq as a function of sequencing depth and replication. Different symbols represent the number n of control vs. treatment samples (n = 2, 3, 4, 6, 8, and 12) across sequence depths [100%→1%]. A: TPR (Eq. 6 at α = 1%), padj ≤ 0.01. B: FPR (Eq. 5 at α = 1%), padj ≤ 0.01. The solid grey line (“multiplex line”) connecting the TPR values of n biological replicates at (1/n) × 100% sequencing depth shows the increase of TPR as more biological replicates n are used, despite the loss of power due to the sequencing depth reduction required by the multiplexing of lanes. This trend remains true even for the n = 32 and n = 96 cases.
Figure 4
Same as Figure 3 but using 2-fold changes as the criterion for FPR and TPR instead of padj ≤ 0.01. A: TPR, fold-change ≥ 2. B: FPR, fold-change ≥ 2. The “multiplex line” connects the TPR and FPR values of n biological replicates at (1/n) × 100% sequencing depth.
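
The “multiplex line” in Figures 3 and 4 pairs each replicate number n with a per-sample depth of (1/n) × 100%. The short sketch below tabulates that arithmetic. It assumes a fixed sequencing budget shared equally among the n samples of a condition; the function name is hypothetical.

```python
def multiplex_depth_percent(n_replicates: int) -> float:
    """Per-sample depth, as a percentage of full depth, when one
    fixed sequencing budget is split equally among n_replicates."""
    if n_replicates < 1:
        raise ValueError("n_replicates must be at least 1")
    return 100.0 / n_replicates


# Replicate numbers named in the Figure 3 caption.
for n in (2, 3, 4, 6, 8, 12, 32, 96):
    print(f"n = {n:2d}: {multiplex_depth_percent(n):6.2f}% depth per sample")
```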
