Brief Bioinform. 2021 Jul 20;22(4):bbaa264.

doi: 10.1093/bib/bbaa264.

Negative Binomial mixed models estimated with the maximum likelihood method can be used for longitudinal RNAseq data

Roula Tsonaka¹, Pietro Spitali²

Affiliations

¹ Medical Statistics section, Department of Biomedical Data Sciences, Leiden University Medical Center.
² Department of Human Genetics, Leiden University Medical Center.

PMID: 33152752
PMCID: PMC8293834
DOI: 10.1093/bib/bbaa264

Roula Tsonaka et al. Brief Bioinform. 2021.
Abstract

Time-course RNAseq experiments, in which tissues are repeatedly collected from the same subjects (e.g. humans or animals) over time or under several different experimental conditions, are becoming more popular as sequencing costs fall. Such designs offer great potential to identify genes that change over time or progress differently in time across experimental groups. Modelling longitudinal gene expression in such time-course RNAseq data is complicated by serial correlations, missing values due to subject dropout or sequencing errors, long follow-up with potentially non-linear progression in time, and a low number of subjects. Negative Binomial mixed models can address all of these issues. However, such models under the maximum likelihood (ML) approach are less popular for RNAseq data because of convergence issues (see, e.g. [1]). We argue in this paper that it is the use of an inaccurate numerical integration method, in combination with the typically small sample sizes, that causes such mixed models to fail for a large proportion of tested genes. We show that when the accurate adaptive Gaussian quadrature approach is used to approximate the integrals over the random-effects terms, the model parameters can be estimated successfully with the maximum likelihood method. Moreover, we show that the bootstrap method can be used to preserve the type I error rate in small sample settings. We evaluate empirically the small sample properties of the test statistics and compare them with state-of-the-art approaches. The method is applied to a longitudinal mouse experiment to study the dynamics in Duchenne Muscular Dystrophy.

Contact: s.tsonaka@lumc.nl

Roula Tsonaka is an assistant professor at the Medical Statistics section, Department of Biomedical Data Sciences, Leiden University Medical Center. Her research focuses on statistical methods for longitudinal omics data. Pietro Spitali is an assistant professor at the Department of Human Genetics, Leiden University Medical Center. His research focuses on the identification of biomarkers for neuromuscular disorders.

Keywords: Adaptive Gaussian quadrature integration; Bootstrap; Negative Binomial mixed effects model; Random effects models.
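
The integration step the abstract refers to can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a random-intercept model for a single subject, a Negative Binomial parameterised by a mean and a size parameter `phi` (variance `mu + mu**2 / phi`), and a normal random intercept with standard deviation `sigma`. The function names, the toy counts and the parameter values are all hypothetical. The adaptive rule re-centres the Gauss-Hermite nodes at the mode of the integrand and rescales them by its curvature, and a dense grid sum is included only as a brute-force check.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp
from scipy.stats import nbinom, norm


def log_integrand(b, y, eta, phi, sigma):
    """log{ prod_j NB(y_j | mean=exp(eta_j + b), size=phi) * N(b | 0, sigma^2) }.

    Accepts a scalar or 1-D array of random-intercept values b and
    returns a 1-D array with one value per entry of b.
    """
    b = np.atleast_1d(np.asarray(b, dtype=float))
    mu = np.exp(eta[None, :] + b[:, None])            # shape (len(b), n_obs)
    loglik = nbinom.logpmf(y[None, :], phi, phi / (phi + mu)).sum(axis=1)
    return loglik + norm.logpdf(b, scale=sigma)


def marginal_loglik_agq(y, eta, phi, sigma, n_nodes=11):
    """Adaptive Gauss-Hermite approximation of one subject's marginal log-likelihood."""
    # 1. Centre: mode of the log-integrand in b.
    res = minimize_scalar(lambda b: -log_integrand(b, y, eta, phi, sigma)[0])
    b_hat = res.x
    # 2. Scale: inverse square root of the negative second derivative at the mode
    #    (closed form for the NB log-link model plus the normal prior term).
    mu = np.exp(eta + b_hat)
    neg_hess = np.sum((y + phi) * mu * phi / (mu + phi) ** 2) + 1.0 / sigma**2
    scale = 1.0 / np.sqrt(neg_hess)
    # 3. Shift and rescale the standard Gauss-Hermite nodes (weight exp(-x^2)).
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    b_nodes = b_hat + np.sqrt(2.0) * scale * x
    log_terms = np.log(w) + x**2 + log_integrand(b_nodes, y, eta, phi, sigma)
    return np.log(np.sqrt(2.0) * scale) + logsumexp(log_terms)


def marginal_loglik_grid(y, eta, phi, sigma, n_grid=4001, width=8.0):
    """Brute-force reference: Riemann sum on a dense grid over +/- width * sigma."""
    grid = np.linspace(-width * sigma, width * sigma, n_grid)
    db = grid[1] - grid[0]
    return logsumexp(log_integrand(grid, y, eta, phi, sigma)) + np.log(db)


# Hypothetical toy data: four repeated counts from one subject.
y = np.array([12, 15, 9, 20])
eta = np.log(10.0) + 0.1 * np.arange(4)   # assumed fixed-effects linear predictor
phi, sigma = 5.0, 0.5                     # assumed size parameter and random-effect SD

agq = marginal_loglik_agq(y, eta, phi, sigma)
ref = marginal_loglik_grid(y, eta, phi, sigma)
print("adaptive quadrature:", agq)
print("dense-grid reference:", ref)
```

In a full analysis this per-subject quantity would be summed over subjects and maximised over the fixed effects, `phi` and `sigma` for each gene; the sketch only evaluates it at fixed parameter values.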

Figures

**Figure 1**
Type I error rate of LRT and Wald statistics when using the asymptotic (black lines) and their corresponding bootstrap-based null distribution (blue lines) for testing versus or . Each panel corresponds to a different size of the serial correlation captured by the random effects standard deviation .

**Figure 2**
Mean proportion of false discoveries for the LRT and Wald statistics when using the asymptotic (black) and their corresponding bootstrap-based null distribution (blue) for testing versus or . We have also compared with the F-test statistic in the limma-voom approach (green) and the LRT and -test statistic in edgeR (red). The panels show the proportion of false discoveries (top left), proportion of discoveries (top right), proportion of true discoveries (bottom left) and proportion of true negative discoveries (bottom right).

**Figure 3**
DMD mice experiment analysis: Histogram of the estimated dispersion parameters per gene.

**Figure 4**
DMD mice experiment analysis: Histogram of the estimated standard deviation for the random effects per gene.

**Figure 5**
DMD mice experiment analysis: Histogram of the bootstrap-based p-values to test the hypothesis for each gene .

**Figure 6**
DMD mice experiment analysis: Venn diagram for the differentially expressed genes based on the Negative Binomial mixed model ("LRT-boot") and limma-voom ("F-voom").

**Figure 7**
DMD mice experiment analysis: Histogram of the estimated random effects standard deviations for the genes detected by the Negative Binomial mixed model and not by the limma-voom.

**Figure 8**
DMD mice experiment analysis: Histogram of the estimated random effects standard deviations for the genes detected by limma-voom and not by the Negative Binomial mixed model.

**Figure 9**
DMD mice experiment analysis: Fitted mean cpm profiles for four randomly selected genes with differential profiles between WT and mdx mice.

**Figure 10**
DMD mice experiment analysis: Spaghetti plots for four randomly selected genes with differential profiles between WT and mdx mice.

References

1. Cui S, Ji T, Li J, et al. What if we ignore the random effects when analyzing RNA-seq data in a multifactor experiment. Stat Appl Genet Mol Biol 2016; 15:87–105.
2. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. Hoboken, NJ: Wiley-Interscience, 2004.
3. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010; 26:139–40.
4. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 2014; 15:550.
5. Law CW, Chen Y, Shi W, et al. Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 2014; 15:R29.
