Brief Bioinform. 2021 Jul 20;22(4):bbaa264.
doi: 10.1093/bib/bbaa264.

Negative Binomial mixed models estimated with the maximum likelihood method can be used for longitudinal RNAseq data


Roula Tsonaka et al. Brief Bioinform.

Abstract

Time-course RNAseq experiments, in which tissues are repeatedly collected from the same subjects (e.g. humans or animals) over time or under several different experimental conditions, are becoming more popular as sequencing costs decrease. Such designs offer great potential to identify genes that change over time or that progress differently over time across experimental groups. Modelling longitudinal gene expression in such time-course RNAseq data is complicated by serial correlation, missing values due to subject dropout or sequencing errors, long follow-up with potentially non-linear progression in time, and a low number of subjects. Negative Binomial mixed models can address all these issues. However, such models fitted with the maximum likelihood (ML) approach are less popular for RNAseq data because of convergence issues (see, e.g. [1]). We argue in this paper that it is the use of an inaccurate numerical integration method, combined with the typically small sample sizes, that causes such mixed models to fail for a large proportion of the tested genes. We show that when the accurate adaptive Gaussian quadrature approach is used to approximate the integrals over the random-effects terms, the model parameters can be successfully estimated with the maximum likelihood method. Moreover, we show that the bootstrap method can be used to preserve the type I error rate in small-sample settings. We empirically evaluate the small-sample properties of the test statistics and compare them with state-of-the-art approaches. The method is applied to a longitudinal mouse experiment to study the dynamics of Duchenne Muscular Dystrophy.

Contact: s.tsonaka@lumc.nl

Roula Tsonaka is an assistant professor in Medical Statistics, Department of Biomedical Data Sciences, Leiden University Medical Center. Her research focuses on statistical methods for longitudinal omics data. Pietro Spitali is an assistant professor at the Department of Human Genetics, Leiden University Medical Center. His research focuses on the identification of biomarkers for neuromuscular disorders.
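The abstract's central technical point is the integral over the random effects in the marginal likelihood. The pure-Python sketch below approximates that integral for one subject with adaptive Gauss-Hermite quadrature: it finds the mode of the integrand, uses the curvature there to rescale the standard Gauss-Hermite nodes, and sums on the log scale. It assumes a single random intercept per subject, a log link and the mean-dispersion parametrization Var(y) = mu + phi * mu^2; these modelling choices, the function names `gauss_hermite`, `nb_logpmf` and `subject_loglik_agq`, and the example counts are hypothetical illustrations, not the authors' implementation. The node computation follows the Newton iteration used in Numerical Recipes' `gauher` routine.

```python
import math

# Assumed model (illustrative only): y_ij ~ NB(mu_ij, phi), log mu_ij = eta_ij + b_i,
# b_i ~ N(0, sigma^2), with Var(y_ij | b_i) = mu_ij + phi * mu_ij^2.


def gauss_hermite(n):
    """Nodes/weights so that  ∫ e^{-z^2} h(z) dz ≈ Σ_k w_k h(z_k)."""
    pim4 = math.pi ** -0.25
    nodes, weights = [0.0] * n, [0.0] * n
    z = 0.0
    for i in range((n + 1) // 2):
        # Initial guesses for the i-th largest root (Numerical Recipes, gauher).
        if i == 0:
            z = math.sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** -0.16667
        elif i == 1:
            z -= 1.14 * n ** 0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * nodes[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * nodes[1]
        else:
            z = 2.0 * z - nodes[i - 2]
        for _ in range(100):
            # Orthonormal Hermite polynomials by recurrence; p1 = H_n(z), p2 = H_{n-1}(z).
            p1, p2 = pim4, 0.0
            for j in range(1, n + 1):
                p3, p2 = p2, p1
                p1 = z * math.sqrt(2.0 / j) * p2 - math.sqrt((j - 1) / j) * p3
            dp = math.sqrt(2.0 * n) * p2  # derivative of H_n at z
            z_old, z = z, z - p1 / dp     # Newton step towards the root
            if abs(z - z_old) < 1e-14:
                break
        nodes[i], nodes[n - 1 - i] = z, -z
        weights[i] = weights[n - 1 - i] = 2.0 / (dp * dp)
    return nodes, weights


def nb_logpmf(y, mu, phi):
    """log NB(y | mean mu, dispersion phi) with Var(y) = mu + phi * mu^2."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def subject_loglik_agq(y, eta, phi, sigma, n_nodes=7):
    """log ∫ Π_j NB(y_j | exp(eta_j + b), phi) N(b; 0, sigma^2) db via adaptive GH."""
    r = 1.0 / phi

    def g(b):  # log of the integrand
        ll = sum(nb_logpmf(yj, math.exp(ej + b), phi) for yj, ej in zip(y, eta))
        return ll - 0.5 * (b / sigma) ** 2 - 0.5 * math.log(2 * math.pi * sigma ** 2)

    def derivs(b):  # first and second derivatives of g; g'' < 0, so g is concave
        d1, d2 = -b / sigma ** 2, -1.0 / sigma ** 2
        for yj, ej in zip(y, eta):
            mu = math.exp(ej + b)
            d1 += yj - (yj + r) * mu / (r + mu)
            d2 -= (yj + r) * r * mu / (r + mu) ** 2
        return d1, d2

    # Step 1: locate the mode of the integrand with step-halving Newton-Raphson.
    b = 0.0
    for _ in range(100):
        d1, d2 = derivs(b)
        step, g_b = -d1 / d2, g(b)
        while g(b + step) < g_b and abs(step) > 1e-12:
            step /= 2.0  # shrink the step until the log-integrand does not decrease
        b += step
        if abs(step) < 1e-10:
            break

    # Step 2: the curvature at the mode sets the scale of the rescaled nodes.
    s = math.sqrt(-1.0 / derivs(b)[1])

    # Step 3: GH rule centred at the mode, b = b_hat + sqrt(2) * s * z, on the log scale.
    z, w = gauss_hermite(n_nodes)
    terms = [math.log(wk) + zk * zk + g(b + math.sqrt(2.0) * s * zk)
             for zk, wk in zip(z, w)]
    top = max(terms)
    return math.log(math.sqrt(2.0) * s) + top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical counts and linear predictors for one subject (not data from the study).
y, eta = [3, 0, 5, 2], [1.0, 1.0, 1.2, 1.2]
for n in (1, 3, 7, 15):
    print(n, subject_loglik_agq(y, eta, phi=0.3, sigma=0.8, n_nodes=n))
```

With `n_nodes=1` the rule has a single node at the mode and the result reduces to the Laplace approximation; increasing the number of nodes lets one check how stable the approximated log-likelihood is for a given gene. A full ML fit would sum such subject contributions and maximize over the fixed effects, `phi` and `sigma`.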

Keywords: Adaptive Gaussian quadrature integration; Bootstrap; Negative Binomial mixed effects model; Random effects models.


Figures

Figure 1
Type I error rate of the LRT and Wald statistics when using the asymptotic χ² distribution (black lines) and the corresponding bootstrap-based null distribution (blue lines) for testing the null hypothesis versus either of two alternative hypotheses. Each panel corresponds to a different size of the serial correlation, captured by the random-effects standard deviation.
Figure 2
Mean proportion of false discoveries for the LRT and Wald statistics when using the asymptotic χ² distribution (black) and the corresponding bootstrap-based null distribution (blue) for testing the null hypothesis versus either of two alternative hypotheses. We also compare with the F-test statistic in the limma-voom approach (green) and the LRT and F-test statistics in edgeR (red). The panels show the proportion of false discoveries (top left), the proportion of discoveries (top right), the proportion of true discoveries (bottom left) and the proportion of true negative discoveries (bottom right).
Figure 3
DMD mice experiment analysis: Histogram of the estimated dispersion parameters per gene.
Figure 4
DMD mice experiment analysis: Histogram of the estimated standard deviation for the random effects per gene.
Figure 5
DMD mice experiment analysis: Histogram of the bootstrap-based p-values for testing the null hypothesis for each gene.
Figure 6
DMD mice experiment analysis: Venn diagram of the differentially expressed genes based on the Negative Binomial mixed model (“LRT-boot”) and limma-voom (“F-voom”).
Figure 7
DMD mice experiment analysis: Histogram of the estimated random effects standard deviations for the genes detected by the Negative Binomial mixed model and not by the limma-voom.
Figure 8
DMD mice experiment analysis: Histogram of the estimated random effects standard deviations for the genes detected by limma-voom and not by the Negative Binomial mixed model.
Figure 9
DMD mice experiment analysis: Fitted mean cpm profiles for four randomly selected genes with differential profiles between WT and mdx mice.
Figure 10
DMD mice experiment analysis: Spaghetti plots for four randomly selected genes with differential profiles between WT and mdx mice.
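Figures 1 and 2 contrast p-values taken from the asymptotic reference distribution with p-values taken from a bootstrap-based null distribution. The mechanism of a parametric bootstrap test can be sketched in pure Python: compute the observed statistic, fit the model under the null hypothesis, simulate new datasets of the same shape from that fitted null, recompute the statistic on each, and report the share of simulated statistics at least as large as the observed one. To keep the sketch short and self-contained, it swaps the Negative Binomial mixed model for a two-group Poisson likelihood ratio test; the model, the function names `poisson_draw`, `poisson_lrt` and `bootstrap_pvalue`, and the toy counts are hypothetical illustrations, not the authors' procedure.

```python
import math
import random


def poisson_draw(lam, rng):
    """One Poisson(lam) variate via Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def poisson_lrt(group1, group2):
    """LRT statistic for H0: one common Poisson rate vs H1: a separate rate per group."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = sum(group1), sum(group2)
    lam0 = (s1 + s2) / (n1 + n2)  # ML estimate of the common rate under H0
    if lam0 == 0:
        return 0.0

    def term(s, n):  # s * log(lam_hat / lam0), using the convention 0 * log 0 = 0
        return 0.0 if s == 0 else s * math.log(s / (n * lam0))

    return 2.0 * (term(s1, n1) + term(s2, n2))


def bootstrap_pvalue(t_obs, simulate_null, statistic, n_boot=999, seed=2021):
    """Parametric bootstrap p-value: share of null replicates with statistic >= t_obs."""
    rng = random.Random(seed)
    exceed = sum(statistic(*simulate_null(rng)) >= t_obs for _ in range(n_boot))
    return (exceed + 1) / (n_boot + 1)


# Hypothetical toy counts for two small groups (not data from the study).
g1, g2 = [0, 2, 1, 0], [3, 4, 1, 5]
t_obs = poisson_lrt(g1, g2)
lam0 = (sum(g1) + sum(g2)) / (len(g1) + len(g2))  # fitted null model


def simulate_null(rng):
    # New datasets with the same group sizes, drawn from the fitted null model.
    return ([poisson_draw(lam0, rng) for _ in g1],
            [poisson_draw(lam0, rng) for _ in g2])


p_asym = math.erfc(math.sqrt(t_obs / 2.0))  # upper tail of chi-square with 1 df
p_boot = bootstrap_pvalue(t_obs, simulate_null, poisson_lrt)
print(f"LRT = {t_obs:.3f}, asymptotic p = {p_asym:.4f}, bootstrap p = {p_boot:.4f}")
```

The `(exceed + 1) / (n_boot + 1)` form keeps the bootstrap p-value strictly positive. For a mixed model, each bootstrap replicate would require refitting the model under both hypotheses, which is where most of the computational cost of this calibration lies.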

References

    1. Cui S, Ji T, Li J, et al. What if we ignore the random effects when analyzing RNA-seq data in a multifactor experiment. Stat Appl Genet Mol Biol 2016;15:87–105.
    2. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. Hoboken, NJ: Wiley-Interscience, 2004.
    3. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010;26:139–40.
    4. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 2014;15:550.
    5. Law CW, Chen Y, Shi W, et al. Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 2014;15:R29.
