PLoS One. 2015 Aug 27;10(8):e0134540.
doi: 10.1371/journal.pone.0134540. eCollection 2015.

A Linear Mixed Model Spline Framework for Analysing Time Course 'Omics' Data

Jasmin Straube et al. PLoS One.

Abstract

Time course 'omics' experiments are becoming increasingly important to study system-wide dynamic regulation. Despite their high information content, analysis remains challenging. 'Omics' technologies capture quantitative measurements on tens of thousands of molecules. Therefore, in a time course 'omics' experiment molecules are measured for multiple subjects over multiple time points. This results in a large, high-dimensional dataset, which requires computationally efficient approaches for statistical analysis. Moreover, methods need to be able to handle missing values and various levels of noise. We present a novel, robust and powerful framework to analyze time course 'omics' data that consists of three stages: quality assessment and filtering, profile modelling, and analysis. The first step consists of removing molecules for which expression or abundance is highly variable over time. The second step models each molecular expression profile in a linear mixed model framework which takes into account subject-specific variability. The best model is selected through a serial model selection approach and results in dimension reduction of the time course data. The final step includes two types of analysis of the modelled trajectories, namely, clustering analysis to identify groups of correlated profiles over time, and differential expression analysis to identify profiles which differ over time and/or between treatment groups. Through simulation studies we demonstrate the high sensitivity and specificity of our approach for differential expression analysis. We then illustrate how our framework can bring novel insights on two time course 'omics' studies in breast cancer and kidney rejection. The methods are publicly available, implemented in the R CRAN package lmms.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of the analysis framework.
The proposed framework consists of three stages: quality control and filtering; serial modelling of profiles; and analysis with clustering to identify similarities between profiles or with hypothesis testing to identify differences over time, between groups, and/or in group and time interactions.
Fig 2. Examples of ‘noisy’ and differentially expressed profiles.
Profiles changing over time (blue) have a mean of the standard deviations per time point (s_T) smaller than the mean of the standard deviations per molecule (s_M), while these means have similar values for noisy molecules (brown). In both cases the mean of the standard deviations per subject (s_I) is similar to s_M.
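The quantities in the Fig 2 caption can be sketched in a few lines of Python. This is a minimal illustration, not the lmms implementation: it assumes s_M is the standard deviation of all measurements of one molecule, and that the filter ratios are formed as R_T = s_T / s_M and R_I = s_I / s_M, which the captions do not state explicitly. The function name `filter_ratios` and the example data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def filter_ratios(measurements):
    """Return (R_T, R_I) for ONE molecule.

    measurements: iterable of (subject, time, value) tuples.
    Assumption: R_T = s_T / s_M and R_I = s_I / s_M, with s_M taken as
    the standard deviation of all values of the molecule.
    """
    by_time = defaultdict(list)
    by_subject = defaultdict(list)
    values = []
    for subject, time, value in measurements:
        by_time[time].append(value)
        by_subject[subject].append(value)
        values.append(value)

    s_m = stdev(values)  # spread of the whole profile
    # Mean spread across subjects within each time point.
    s_t = mean(stdev(v) for v in by_time.values() if len(v) > 1)
    # Mean spread across time points within each subject.
    s_i = mean(stdev(v) for v in by_subject.values() if len(v) > 1)
    return s_t / s_m, s_i / s_m


# Hypothetical data: 3 subjects measured at 4 time points.
# A profile with a clear time trend and small subject offsets.
trend = [(s, t, 2.0 * t + 0.1 * s) for s in range(3) for t in range(4)]

# A profile with no time structure.
noise_values = [1.0, 3.0, 2.0, 3.5, 2.5, 0.5, 3.2, 1.2, 2.8, 1.5, 0.8, 2.2]
noisy = [(i // 4, i % 4, v) for i, v in enumerate(noise_values)]

r_t_trend, r_i_trend = filter_ratios(trend)
r_t_noisy, r_i_noisy = filter_ratios(noisy)
```

Under these assumptions the trending profile yields a small R_T, because subjects agree at each time point, while the unstructured profile yields an R_T close to 1.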
Fig 3. Workflow for the profile cluster analysis.
Trajectories derived from Linear Mixed Model Spline (LMMS) and Derivative Linear Mixed Model Spline (DLMMS) were compared to trajectories derived either from the mean or from Smoothing Splines Mixed Effects (SME) models. Five clustering algorithms were then applied to the modelled trajectories using a range of two to nine clusters: hierarchical clustering (HC), k-means (KM), Self-Organizing Maps (SOM), model-based clustering (model) and Partitioning Around Medoids (PAM). The performance of each algorithm was assessed using the Dunn index. Gene Ontology (GO) term enrichment analysis was performed on each of the obtained clusters.
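As a rough illustration of how a clustering of trajectories can be scored, the following Python sketch computes the Dunn index in its common form: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. It assumes Euclidean distance on trajectories sampled at shared time points; the caption does not say which distance or Dunn variant the authors used, and the function name `dunn_index` and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Dunn index: min between-cluster distance / max within-cluster diameter.

    points: sequence of equal-length numeric tuples (e.g. fitted trajectories).
    labels: cluster label for each point.
    Higher values indicate compact, well-separated clusters.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn index needs at least two clusters")

    # Largest distance between two members of the same cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between members of two different clusters.
    min_separation = min(
        dist(a, b)
        for first, second in combinations(clusters.values(), 2)
        for a in first
        for b in second
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point
    return min_separation / max_diameter


# Hypothetical toy trajectories observed at two time points.
trajectories = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = dunn_index(trajectories, [0, 0, 1, 1])  # groups nearby trajectories
bad = dunn_index(trajectories, [0, 1, 0, 1])   # mixes distant trajectories
```

To mirror the workflow in the caption, one would compute this score for each algorithm and each cluster count from two to nine, then keep the combination with the highest value.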
Fig 4. Clustering of filter ratios on proteomic datasets.
Scatterplots of filter ratios R_T on the x-axis against R_I on the y-axis for A) the iTraq breast cancer dataset and B) and C) the iTraq kidney rejection dataset for the groups Allograft Rejection (AR) and Non-Rejection (NR), respectively. Colors indicate clusters from a 2-cluster model-based clustering, with red squares indicating molecules that cluster as ‘informative’ and will remain in the analysis and blue circles indicating ‘non-informative’ molecules that will be removed prior to analysis.
Fig 5. Filtering ratios of the Mus musculus data.
The filter ratios R_T and R_I were calculated for every molecule. Colors in A) indicate the -log10(p-values) for differential expression over time and in B) the proportion of missing values. C) shows the data after discarding profiles with > 50% missing values, with colors as in A).
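The missing-value step mentioned for panel C) can be sketched as below. This is an illustrative Python snippet, not the authors' code: it assumes missing measurements are encoded as `None` or NaN, and the function names, the molecule names and the `max_missing` parameter are hypothetical.

```python
import math


def missing_fraction(profile):
    """Fraction of missing entries (None or NaN) in one profile."""
    if not profile:
        return 1.0  # treat an empty profile as entirely missing
    missing = sum(
        1 for v in profile
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(profile)


def drop_sparse_profiles(profiles, max_missing=0.5):
    """Keep profiles whose missing fraction does not exceed max_missing.

    With max_missing=0.5 this discards profiles with > 50% missing values.
    """
    return {
        name: values
        for name, values in profiles.items()
        if missing_fraction(values) <= max_missing
    }


# Hypothetical profiles: one value per (subject, time point) measurement.
profiles = {
    "molecule_a": [1.2, 1.4, None, 1.9],               # 25% missing, kept
    "molecule_b": [None, 0.7, float("nan"), 0.9],      # 50% missing, kept
    "molecule_c": [None, None, float("nan"), 2.1],     # 75% missing, dropped
}
kept = drop_sparse_profiles(profiles)
```

A profile with exactly 50% missing values is retained here, since the caption describes discarding only those with more than 50% missing.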
Fig 6. Clustering of the iTraq breast cancer dataset.
Clustering was performed on the summarized profiles obtained from A) Linear Mixed Model Spline (LMMS), B) Derivative Linear Mixed Model Spline (DLMMS), C) mean and D) Smoothing Splines Mixed Effects (SME). The best clustering algorithm and the best number of clusters were chosen according to the Dunn index. In A), B) and D) we used hierarchical clustering and in C) Partitioning Around Medoids (PAM) clustering. The x-axis represents time (in hours) and the y-axis intensity in terms of log2-transformed protein abundance.
