Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments
- PMID: 20167110
- PMCID: PMC2838869
- DOI: 10.1186/1471-2105-11-94
Abstract
Background: High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generated by the sequencers. We provide a detailed evaluation of statistical methods for normalization and differential expression (DE) analysis of Illumina transcriptome sequencing (mRNA-Seq) data.
Results: We compare statistical methods for detecting genes that are significantly DE between two types of biological samples and find that there are substantial differences in how the test statistics handle low-count genes. We evaluate how DE results are affected by features of the sequencing platform, such as varying gene lengths, the base-calling calibration method (with and without a phi X control lane), and flow-cell/library-preparation effects. We investigate the impact of the read count normalization method on DE results and show that the standard approach of scaling by total lane counts (e.g., RPKM) can bias estimates of DE. We propose more general quantile-based normalization procedures and demonstrate an improvement in DE detection.
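The following is a minimal Python sketch of the normalization issue described above. It assumes NumPy and uses hypothetical toy counts. It contrasts total-count scaling, which RPKM applies alongside division by gene length, with upper-quartile scaling, taken here as one example of a quantile-based procedure. The function names are illustrative and do not come from the paper. In the toy data, a single highly expressed, truly DE gene inflates the total count of lane B, so total-count scaling makes the unchanged genes appear down-regulated.

```python
import numpy as np


def total_count_normalize(counts):
    """Scale each lane (column) by its total read count."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True)


def upper_quartile_normalize(counts):
    """Scale each lane by the 75th percentile of its non-zero gene counts."""
    counts = np.asarray(counts, dtype=float)
    uq = np.array([np.percentile(lane[lane > 0], 75) for lane in counts.T])
    return counts / uq  # broadcasts one factor per lane (column)


def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million reads in the lane."""
    counts = np.asarray(counts, dtype=float)
    millions = counts.sum(axis=0, keepdims=True) / 1e6
    kilobases = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    return counts / millions / kilobases


# Hypothetical toy data: rows are genes, columns are lanes A and B.
# Genes 0-3 are unchanged; gene 4 is strongly up-regulated in lane B.
counts = np.array([[100, 100],
                   [200, 200],
                   [300, 300],
                   [400, 400],
                   [1000, 9000]])
lengths = np.array([1000, 2000, 1500, 3000, 500])  # hypothetical lengths (bp)

tc = total_count_normalize(counts)
uq = upper_quartile_normalize(counts)
rp = rpkm(counts, lengths)

# Ratio of normalized expression, lane B over lane A, for each gene.
print("total-count   :", np.round(tc[:, 1] / tc[:, 0], 2))
print("RPKM          :", np.round(rp[:, 1] / rp[:, 0], 2))
print("upper-quartile:", np.round(uq[:, 1] / uq[:, 0], 2))
```

Under total-count scaling, the four unchanged genes get a B/A ratio of 0.2 and the up-regulated gene gets 1.8. Under upper-quartile scaling, the corresponding ratios are 1 and 9. RPKM gives the same between-lane ratios as total-count scaling, because a gene's length cancels when that gene is compared across lanes.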
Conclusions: Our results have significant practical and methodological implications for the design and analysis of mRNA-Seq experiments. They highlight the importance of appropriate statistical methods for normalization and DE inference, to account for features of the sequencing platform that could impact the accuracy of results. They also reveal the need for further research in the development of statistical and computational methods for mRNA-Seq.
Similar articles
- Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster. BMC Genomics. 2016 Jan 5;17:28. doi: 10.1186/s12864-015-2353-z. PMID: 26732976.
- deGPS is a powerful tool for detecting differential expression in RNA-sequencing studies. BMC Genomics. 2015 Jun 13;16(1):455. doi: 10.1186/s12864-015-1676-0. PMID: 26070955.
- Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions. Brief Bioinform. 2018 Sep 28;19(5):776-792. doi: 10.1093/bib/bbx008. PMID: 28334202.
- Statistical detection of differentially expressed genes based on RNA-seq: from biological to phylogenetic replicates. Brief Bioinform. 2016 Mar;17(2):243-8. doi: 10.1093/bib/bbv035. Epub 2015 Jun 24. PMID: 26108230. Review.
- Characterizing and annotating the genome using RNA-seq data. Sci China Life Sci. 2017 Feb;60(2):116-125. doi: 10.1007/s11427-015-0349-4. Epub 2016 Jun 13. PMID: 27294835. Review.
Cited by
- Estimation of Gene Expression at Isoform Level from mRNA-Seq Data by Bayesian Hierarchical Modeling. Front Genet. 2012 Nov 27;3:239. doi: 10.3389/fgene.2012.00239. eCollection 2012. PMID: 23293650.
- Bayesian Hierarchical Model for Differential Gene Expression Using RNA-seq Data. Stat Biosci. 2015 May 1;7(1):48-67. doi: 10.1007/s12561-013-9096-7. PMID: 26191087.
- TCC: an R package for comparing tag count data with robust normalization strategies. BMC Bioinformatics. 2013 Jul 9;14:219. doi: 10.1186/1471-2105-14-219. PMID: 23837715.
- An optimized protocol for generation and analysis of Ion Proton sequencing reads for RNA-Seq. BMC Genomics. 2016 May 26;17:403. doi: 10.1186/s12864-016-2745-8. PMID: 27229683.
- Global transcriptional and phenotypic analyses of Escherichia coli O157:H7 strain Xuzhou21 and its pO157_Sal cured mutant. PLoS One. 2013 May 30;8(5):e65466. doi: 10.1371/journal.pone.0065466. Print 2013. PMID: 23738017.