PLoS One. 2016 Aug 17;11(8):e0159182.
doi: 10.1371/journal.pone.0159182. eCollection 2016.

LPEseq: Local-Pooled-Error Test for RNA Sequencing Experiments with a Small Number of Replicates


Jungsoo Gim et al. PLoS One. 2016.

Abstract

RNA-Sequencing (RNA-Seq) provides valuable information for characterizing the molecular nature of cells, in particular for identifying differentially expressed transcripts on a genome-wide scale. Unfortunately, cost and limited specimen availability often lead to studies with small sample sizes, and hypothesis testing of differential expression between classes with a small number of samples is generally limited. The problem is especially challenging when only one sample exists per class. In this case, only a few of the many methods that have been developed are applicable for identifying differentially expressed transcripts. The aim of this study was therefore to develop a method that can accurately test differential expression with a limited number of samples, in particular non-replicated samples. We propose a local-pooled-error method for RNA-Seq data (LPEseq) to account for non-replicated samples in the analysis of differential expression. LPEseq extends the existing LPE method, which was proposed for microarray data, to allow examination of non-replicated RNA-Seq experiments. We demonstrated the validity of LPEseq using both real and simulated datasets. Comparing its results with those of other methods, we found that LPEseq outperformed the others on non-replicated datasets and performed similarly on replicated samples; LPEseq consistently showed a high true discovery rate without increasing the rate of false positives, regardless of the number of samples. LPEseq can be used effectively for differential expression analysis as a preliminary design step or for investigating a rare specimen for which only a limited number of samples is available.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Schematic representation of the local-pooled-error method for RNA-Seq data (LPEseq).
(A) Flow chart of the proposed algorithm. The method first determines intensity bins (percentiles by default) and then evaluates the LPE distribution according to whether each class has replicates: with replicates, LPE is evaluated within each class; with non-replicated experiments, LPE is evaluated between classes. For non-replicated cases, an additional step smooths the LPE distribution by removing outliers. Detailed examples are shown for replicated (B) and non-replicated (C) experiments. Blue and green represent the different classes (i.e., X and Y). The red dotted line and the orange line represent the LPE curve with and without outliers, respectively. DE transcripts are colored red.
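
The binning-and-pooling idea in panel (A) can be illustrated with a minimal Python sketch for the replicated case. This is not the authors' LPEseq implementation: the function name local_pooled_variance, the input layout, and the choice to pool by averaging per-transcript variances within each bin are all assumptions made for illustration, and the outlier-removal step used for non-replicated data is omitted.

```python
from statistics import mean, variance


def local_pooled_variance(replicates, n_bins=100):
    """Pool per-transcript variances within intensity bins (illustrative only).

    replicates: one list per transcript, holding that transcript's
                log-expression values across replicates of a single class
                (at least two values each). Hypothetical input layout.
    n_bins:     number of equal-count intensity bins (100 = percentiles).

    Returns a list of (bin_mean_intensity, pooled_variance) tuples,
    ordered from the lowest to the highest intensity bin.
    """
    # Per-transcript mean intensity and sample variance, sorted by intensity.
    stats = sorted((mean(r), variance(r)) for r in replicates)
    n = len(stats)
    n_bins = min(n_bins, n)  # never ask for more bins than transcripts

    pooled = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = stats[lo:hi]
        # Pool by averaging the variances of transcripts with similar intensity.
        pooled.append((mean(a for a, _ in chunk), mean(v for _, v in chunk)))
    return pooled


# Toy data: two low-intensity and two high-intensity transcripts.
toy = [[1.0, 1.2], [1.1, 0.9], [5.0, 5.4], [5.2, 4.8]]
curve = local_pooled_variance(toy, n_bins=2)
```

Pooling within a bin lets a transcript borrow variance information from other transcripts of similar intensity, which is what makes error estimation feasible with very few replicates.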
Fig 2. Venn diagrams of differentially expressed (DE) transcripts.
Two RNA-Seq datasets with one replicate (A) and two replicates (B) in each class were analyzed with five methods: LPEseq (brown), edgeR (sky blue), DESeq (green), DESeq2 (violet), and NBPSeq (red). For each method, a density plot shows the mean difference between classes for the DE transcripts found only by that method; the X- and Y-axes represent group mean difference and density, respectively. Numbers in parentheses indicate the total number of DE transcripts found. For all methods, a transcript was called DE when its Benjamini-Hochberg corrected p-value was less than 0.05. Gene set analysis of enriched terms was performed with the DAVID web tool.
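
The DE-calling criterion in this caption, a Benjamini-Hochberg corrected p-value below 0.05, can be sketched in plain Python as follows. The function name benjamini_hochberg and the toy p-values are illustrative assumptions, not data or code from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy raw p-values for four hypothetical transcripts.
raw = [0.001, 0.2, 0.03, 0.5]
adj = benjamini_hochberg(raw)
# Call a transcript DE when its adjusted p-value is below 0.05.
de_calls = [p < 0.05 for p in adj]
```

In this toy example the third transcript has a raw p-value of 0.03 but an adjusted value of 0.06, so it is not called DE once multiple testing is accounted for.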
