Variance component score test for time-course gene set analysis of longitudinal RNA-seq data
- PMID: 28334305
- PMCID: PMC5862256
- DOI: 10.1093/biostatistics/kxx005
Abstract
As gene expression measurement technology is shifting from microarrays to sequencing, the statistical tools available for their analysis must be adapted since RNA-seq data are measured as counts. It has been proposed to model RNA-seq counts as continuous variables using nonparametric regression to account for their inherent heteroscedasticity. In this vein, we propose tcgsaseq, a principled, model-free, and efficient method for detecting longitudinal changes in RNA-seq gene sets defined a priori. The method identifies those gene sets whose expression varies over time, based on an original variance component score test accounting for both covariates and heteroscedasticity without assuming any specific parametric distribution for the (transformed) counts. We demonstrate that despite the presence of a nonparametric component, our test statistic has a simple form and limiting distribution, and both may be computed quickly. A permutation version of the test is additionally proposed for very small sample sizes. Applied to both simulated data and two real datasets, tcgsaseq is shown to exhibit very good statistical properties, with an increase in stability and power when compared to state-of-the-art methods ROAST (rotation gene set testing), edgeR, and DESeq2, which can fail to control the type I error under certain realistic settings. We have made the method available for the community in the R package tcgsaseq.
Keywords: Gene Set Analysis; Heteroscedasticity; Longitudinal data; RNA-seq data; Variance component testing.
© The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
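As a rough illustration of the idea summarized in the abstract, the following Python sketch computes a simplified variance component score statistic for one gene set and a permutation p-value for it. It is not the tcgsaseq implementation or its API. The function names (vc_score_stat, permutation_pvalue), the form of the statistic, the use of precomputed precision weights w (set to 1 in the demo instead of being estimated by nonparametric regression), and the simulated data are all assumptions made for illustration. Samples are also treated as exchangeable, which ignores the within-subject correlation a real longitudinal analysis would need to respect.

```python
import numpy as np


def vc_score_stat(y, x, phi, w):
    """Simplified score statistic for one gene set (illustrative only).

    y   : (n, p) transformed expression, n samples by p genes in the set
    x   : (n, q) covariates to adjust for (include an intercept column)
    phi : (n, k) time basis whose association with expression is tested
    w   : (n, p) precision weights (assumed given; inverse variances)
    """
    # Adjust every gene for the covariates by ordinary least squares.
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    # Project the weighted residuals on the time basis and sum the squares
    # over basis functions and genes.
    n = y.shape[0]
    q = phi.T @ (w * resid) / np.sqrt(n)  # shape (k, p)
    return float(np.sum(q ** 2))


def permutation_pvalue(y, x, phi, w, n_perm=500, seed=0):
    """Permutation p-value obtained by shuffling the rows of the time basis."""
    rng = np.random.default_rng(seed)
    observed = vc_score_stat(y, x, phi, w)
    n = y.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        if vc_score_stat(y, x, phi[perm], w) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (1 + exceed) / (1 + n_perm)


# Simulated demo (hypothetical data): 40 samples, 4 time points, 5 genes.
rng = np.random.default_rng(1)
n, p = 40, 5
time = np.tile(np.arange(4.0), 10)
x = np.ones((n, 1))                      # intercept only
phi = (time - time.mean())[:, None]      # centered linear time basis
w = np.ones((n, p))                      # homoscedastic weights for the demo

y_signal = 0.8 * time[:, None] + rng.normal(size=(n, p))  # set varying over time
y_null = rng.normal(size=(n, p))                          # set with no time effect

stat_signal, p_signal = permutation_pvalue(y_signal, x, phi, w)
stat_null, p_null = permutation_pvalue(y_null, x, phi, w)
print(f"signal set: Q = {stat_signal:.2f}, p = {p_signal:.4f}")
print(f"null set:   Q = {stat_null:.2f}, p = {p_null:.4f}")
```

The permutation route is shown because the abstract proposes it for very small sample sizes; the asymptotic limiting distribution mentioned in the abstract is not reproduced in this sketch.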