Biostatistics. 2017 Oct 1;18(4):589-604.
doi: 10.1093/biostatistics/kxx005.

Variance component score test for time-course gene set analysis of longitudinal RNA-seq data

Denis Agniel et al. Biostatistics.

Abstract

As gene expression measurement technology is shifting from microarrays to sequencing, the statistical tools available for their analysis must be adapted since RNA-seq data are measured as counts. It has been proposed to model RNA-seq counts as continuous variables using nonparametric regression to account for their inherent heteroscedasticity. In this vein, we propose tcgsaseq, a principled, model-free, and efficient method for detecting longitudinal changes in RNA-seq gene sets defined a priori. The method identifies those gene sets whose expression varies over time, based on an original variance component score test accounting for both covariates and heteroscedasticity without assuming any specific parametric distribution for the (transformed) counts. We demonstrate that despite the presence of a nonparametric component, our test statistic has a simple form and limiting distribution, and both may be computed quickly. A permutation version of the test is additionally proposed for very small sample sizes. Applied to both simulated data and two real datasets, tcgsaseq is shown to exhibit very good statistical properties, with an increase in stability and power when compared to state-of-the-art methods ROAST (rotation gene set testing), edgeR, and DESeq2, which can fail to control the type I error under certain realistic settings. We have made the method available for the community in the R package tcgsaseq.

Keywords: Gene Set Analysis; Heteroscedasticity; Longitudinal data; RNA-seq data; Variance component testing.
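
To make the idea of a permutation-based score test for a gene set more concrete, here is a minimal Python sketch. It is not the tcgsaseq implementation and does not reproduce the authors' statistic: the function name `permutation_score_test`, the intercept-only null model, the linear time effect, and the optional `weights` argument (standing in for precision weights that would come from a separate mean-variance estimate) are all assumptions made for illustration. The statistic is the sum over genes of squared per-gene scores for a time slope, which is the general quadratic form that variance component score statistics take, and its null distribution is approximated by permuting the time labels.

```python
import numpy as np


def permutation_score_test(y, time, weights=None, n_perm=999, seed=0):
    """Illustrative gene-set score test for a time effect (hypothetical helper).

    y       : (n_samples, n_genes) array of transformed expression values
    time    : (n_samples,) array of measurement times
    weights : optional (n_samples, n_genes) precision weights (assumed given)
    Returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    t = np.asarray(time, dtype=float)
    n = y.shape[0]
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)

    # Null model: one intercept per gene, estimated as a weighted mean.
    mu = (w * y).sum(axis=0) / w.sum(axis=0)
    # Precision-weighted residuals under the null; each column sums to zero.
    resid = w * (y - mu)

    def statistic(t_vec):
        # Per-gene score for a linear time slope, evaluated at slope = 0.
        scores = t_vec @ resid
        # Quadratic form: squared scores summed over the gene set.
        return float(np.sum(scores ** 2)) / n

    observed = statistic(t)
    # Approximate the null distribution by shuffling time across samples.
    permuted = np.array([statistic(rng.permutation(t)) for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(permuted >= observed)) / (n_perm + 1)
    return observed, p_value


# Usage on synthetic data: 12 samples, 5 genes, all trending with time.
demo_rng = np.random.default_rng(1)
times = np.arange(12.0)
expr = 2.0 * times[:, None] + demo_rng.normal(scale=0.1, size=(12, 5))
stat, pval = permutation_score_test(expr, times, n_perm=199)
print(f"statistic = {stat:.1f}, permutation p-value = {pval:.3f}")
```

Summing squared scores across genes, rather than testing each gene separately, is what lets a single statistic pick up heterogeneous changes within the set. Note that permuting time labels assumes exchangeable samples, which is a simplification; covariate adjustment and repeated measures per subject are left out of this sketch.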

Figures

Fig. 1. Power evaluation in synthetic data according to how heteroscedasticity is accounted for, based on 1000 simulations.
Fig. 2. Power evaluation in synthetic data comparing tcgsaseq, ROAST, edgeR-ROAST, and DESeq2-min test, based on 1000 simulations.
Fig. 3. Power evaluation in negative binomial data comparing tcgsaseq, ROAST, edgeR-ROAST, and DESeq2-min test, based on 1000 simulations.
Fig. 4. Power evaluation in realistically simulated data with a small sample size, based on 500 simulations.
Fig. 5. p-values from testing the 9 kidney-oriented gene sets investigated.
