Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2018 Nov 16;46(20):e119.
doi: 10.1093/nar/gky675.

Impulse model-based differential expression analysis of time course sequencing data

Affiliations

Impulse model-based differential expression analysis of time course sequencing data

David S Fischer et al. Nucleic Acids Res. .

Abstract

Temporal changes to the concentration of molecular species such as mRNA, which take place in response to various environmental cues, can often be modeled as simple continuous functions such as a single pulse (impulse) model. The simplicity of such functional representations can provide an improved performance on fundamental tasks such as noise reduction, imputation and differential expression analysis. However, temporal gene expression profiles are often studied with models that treat time as a categorical variable, neglecting the dependence between time points. Here, we present ImpulseDE2, a framework for differential expression analysis that combines the power of the impulse model as a continuous representation of temporal responses along with a noise model tailored specifically to sequencing data. We compare the simple categorical models to ImpulseDE2 and to other continuous models based on natural cubic splines and demonstrate the utility of the continuous approach for studying differential expression in time course sequencing experiments. A unique feature of ImpulseDE2 is the ability to distinguish permanently from transiently up- or down-regulated genes. Using an in vitro differentiation dataset, we demonstrate that this gene classification scheme can be used to highlight distinct transcriptional programs that are associated with different phases of the differentiation process.

PubMed Disclaimer

Figures

Figure 1.
Figure 1.
The impulse model is descriptive of global transcriptome and chromatin dynamics during the cellular response to stimuli. (A) The four classes of expression trajectories that can be modeled with the impulse model. (B) Case-only analysis: shown are an impulse fit (alternative model) and a constant fit (null model) with vertically superimposed inferred negative binomial likelihood functions. The likelihood functions are scaled and shifted so that the density is zero at the time coordinate of the time point of sampling. (C) Case–control analysis: shown are a separate case and control impulse fit (alternative model) and a single impulse fit to all samples (‘combined’, null model). (DH) Heat maps of z-scores of library depth normalized mean counts per time point of differentially expressed genes selected with DESeq2. Green stars indicate clusters that can be modeled with the valley model. (D) RNA-seq of Drosophila melanogoster development (‘Drosophila (Graveley)’ dataset (17)). (E) Chromatin immunoprecipitation (ChIP) of the H3K4me1 histone mark in the erythroid lineage in hematopoesis (‘erythroid chromatin (Lara-Astiaso)’ dataset (14)). The x-coordinates are seven cell states in developmental order (‘developmental times’ one to seven) within the erythroid lineage. (F) RNA-seq of dendritic cell activation through LPS (‘LPS (Jovanovic)’ dataset (15)). (G) RNA-seq of myeloid differentiation (‘myeloid (Sykes)’ dataset (18)). (H) RNA-seq of differentiation of human embryonal stem cells to definite endoderm (‘hESC (Chu)’ dataset (19)). Heat map of two further datasets (‘estrogen (Baran-Gale)’ (16) and ‘Plasmodium (Broadbent)’ dataset (20)) are supplied in Supplementary Figure S2.
Figure 2.
Figure 2.
ImpulseDE2 performance on simulated data (ROC and FDR data). AUC: area under the curve, df: degrees of freedom, FDR: false-discovery rate, ROC: receiver-operator characteristic. (A) AUC of ROC curve for case-only differential expression analysis with a varying number of time points (5–12 time points with three replicates each). (B) FDR for case-only analysis based on random deviation of a constant trend. The strength of the random variation is quantified by the standard deviation of the normal distribution from which the deviation from the constant trend is drawn for each sample. ImpulseDE was not included in this simulation study due to its slow run time and as this panel was included to compare the discovery rates of categorical and continuous models on random trends.
Figure 3.
Figure 3.
Overlaps to annotated gene sets indicate that ImpulseDE2 identifies more relevant genes than DESeq2, edge and limma on LPS (Jovanovic) dataset. Background: all genes analyzed. ImpulseDE2: genes only called differentially expressed by ImpulseDE2 and not by the reference method at a Benjamini–Hochberg corrected P-value threshold of 0.01. DESeq2 (A and E), DESeq2splines (B and F), limma (C and G), edge (D and H): genes only called differentially expressed by the reference method and not ImpulseDE2 at a Benjamini–Hochberg corrected P-value threshold of 0.01. n: gene set size. ECDF: empirical cumulative density function, KS: log10 P-value of one sided Kolmogorov–Smirnov test for the ECDFs of the set of the given method to lie below the ECDF of the other method considered in the plot. The ECDF are based on the number of overlapping ImmuneSigDB (21) target sets with each gene in the individual gene sets: the target set was all ImmuneSigDB sets that contain any the following strings in their names: ‘‘DC’’, ‘‘DENDRITIC’’ (empty, all listed under DC), ‘‘LPS’’ or ‘‘TLR’’. (A–D) Case-only differential expression analysis. (E–H) Case–control differential expression analysis.
Figure 4.
Figure 4.
Comparison of ImpulseDE2, DESeq2, limma and edge on myeloid (Sykes) dataset. (A) Fraction of significantly differentially expressed genes as a function of the significance threshold by method. (B–E) Correlation plot of the inferred differential expression Benjamini–Hochberg (BH) corrected P-values for all genes between ImpulseDE2 and DESeq2 (B), DESeq2splines (C), edge (D) and limma (E). Orange points correspond to genes for which ImpulseDE2 disabled DESeq2 dispersion outlier handling. (F) Kernel density estimate of density of the distribution of expression means of genes called differentially expressed at a q-value threshold of 1e − 2 of ImpulseDE2, limma and edge. The mean expression distribution across all genes is shown as all genes. The UpSetR plot for this dataset is supplied in Supplementary Figure S10.
Figure 5.
Figure 5.
ImpulseDE2 can distinguish between transiently and permanently changing expression trajectories in hESC (Chu) dataset. Shown are z-scores of size-factor normalized mean expression values per time point. Selected top enrichments with GO biological process, GO molecular function and MSigDB hallmark gene sets are supplied to the right of each group.

References

    1. Anders S., Huber W., Anders S., Huber W.. Differential expression analysis for sequence count data. Genome Biol. 2010; 11:R106. - PMC - PubMed
    1. Michael I.L., Huber W., Anders S.. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15:550. - PMC - PubMed
    1. Mark D., Robinson D.J., McCarthy Smyth G.K.. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2009; 26:139–140. - PMC - PubMed
    1. Matthew E., Ritchie B.P., Wu D., Hu Y., Law C.W., Shi W., Smyth G.K.. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015; 43:e47. - PMC - PubMed
    1. Storey J.D., Xiao W., Leek J.T., Tompkins R.G., Davis R.W.. Significance analysis of time course microarray experiments. Proc. Natl. Acad. Sci. U.S.A. 2005; 102:12837–12842. - PMC - PubMed

Publication types