PLoS One. 2015 Aug 27;10(8):e0134540.

doi: 10.1371/journal.pone.0134540. eCollection 2015.

A Linear Mixed Model Spline Framework for Analysing Time Course 'Omics' Data

Jasmin Straube¹, Alain-Dominique Gorse²; PROOF Centre of Excellence Team; Bevan Emma Huang³, Kim-Anh Lê Cao⁴

Affiliations

¹ QFAB Bioinformatics, Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia; The University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, QLD, Australia.
² QFAB Bioinformatics, Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia.
³ CSIRO Digital Productivity Flagship, Brisbane, QLD, Australia.
⁴ The University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, QLD, Australia.

PMID: 26313144
PMCID: PMC4551847
DOI: 10.1371/journal.pone.0134540

Jasmin Straube et al. PLoS One. 2015.

Abstract

Time course 'omics' experiments are becoming increasingly important to study system-wide dynamic regulation. Despite their high information content, analysis remains challenging. 'Omics' technologies capture quantitative measurements on tens of thousands of molecules. Therefore, in a time course 'omics' experiment molecules are measured for multiple subjects over multiple time points. This results in a large, high-dimensional dataset, which requires computationally efficient approaches for statistical analysis. Moreover, methods need to be able to handle missing values and various levels of noise. We present a novel, robust and powerful framework to analyze time course 'omics' data that consists of three stages: quality assessment and filtering, profile modelling, and analysis. The first step consists of removing molecules for which expression or abundance is highly variable over time. The second step models each molecular expression profile in a linear mixed model framework which takes into account subject-specific variability. The best model is selected through a serial model selection approach and results in dimension reduction of the time course data. The final step includes two types of analysis of the modelled trajectories, namely, clustering analysis to identify groups of correlated profiles over time, and differential expression analysis to identify profiles which differ over time and/or between treatment groups. Through simulation studies we demonstrate the high sensitivity and specificity of our approach for differential expression analysis. We then illustrate how our framework can bring novel insights on two time course 'omics' studies in breast cancer and kidney rejection. The methods are publicly available, implemented in the R CRAN package lmms.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Fig 1. Overview of the analysis framework.**
The proposed framework consists of three stages: quality control and filtering; serial modelling of profiles; and analysis with clustering to identify similarities between profiles or with hypothesis testing to identify differences over time, between groups, and/or in group and time interactions.

**Fig 2. Examples of ‘noisy’ and differentially expressed profiles.**
Profiles changing over time (blue) have a mean of the standard deviations per time point (s_T) smaller than the mean of the standard deviations per molecule (s_M), while these means have similar values for noisy molecules (brown). In both cases the mean of the standard deviations per subject (s_I) is similar to s_M.
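The three summaries in this caption can be sketched for a single molecule as follows. This is a minimal illustration and not the lmms implementation: it assumes complete data arranged as one list of time-point values per subject, reads s_M as the standard deviation of all pooled measurements of the molecule, and uses hypothetical toy profiles. The ratios s_T / s_M and s_I / s_M printed at the end are an assumed definition of the filter ratios R_T and R_I, which the captions name but do not define.

```python
from statistics import mean, stdev

def sd_summaries(profile):
    """Return (s_T, s_I, s_M) for one molecule.

    profile: list of per-subject lists, one value per time point,
    with no missing values (an assumption of this sketch).
    """
    n_time = len(profile[0])
    # s_T: SD across subjects at each time point, averaged over time points
    s_t = mean(stdev([subj[t] for subj in profile]) for t in range(n_time))
    # s_I: SD across time points within each subject, averaged over subjects
    s_i = mean(stdev(subj) for subj in profile)
    # s_M: SD of all measurements of the molecule pooled together (assumed reading)
    s_m = stdev([v for subj in profile for v in subj])
    return s_t, s_i, s_m

# Hypothetical toy data: 3 subjects x 4 time points
changing = [[1.0, 2.0, 3.0, 4.0],
            [1.1, 2.1, 2.9, 4.2],
            [0.9, 1.9, 3.1, 3.9]]
noisy = [[2.0, 3.5, 1.0, 2.5],
         [3.0, 1.0, 2.5, 3.5],
         [1.0, 2.5, 3.5, 2.0]]

for name, prof in (("changing", changing), ("noisy", noisy)):
    s_t, s_i, s_m = sd_summaries(prof)
    # Assumed ratio definitions: R_T = s_T / s_M, R_I = s_I / s_M
    print(name, round(s_t / s_m, 2), round(s_i / s_m, 2))
```

Under these assumptions the profile with a shared time trend gives a small s_T relative to s_M, while the noisy profile gives values of similar size, which matches the pattern described in the caption.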

**Fig 3. Workflow for the profile cluster analysis.**
Trajectories derived from Linear Mixed Model Spline (LMMS) and Derivative Linear Mixed Model Spline (DLMMS) models were compared with trajectories derived from either the mean or Smoothing Splines Mixed Effects (SME) models. Five clustering algorithms were then applied to the modelled trajectories, using a range of two to nine clusters: hierarchical clustering (HC), k-means (KM), Self-Organizing Maps (SOM), model-based clustering (model) and Partitioning Around Medoids (PAM). The performance of each algorithm was assessed using the Dunn index. Gene Ontology (GO) term enrichment analysis was performed on each of the obtained clusters.
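For readers unfamiliar with the Dunn index, a minimal sketch of the standard definition follows: the smallest distance between trajectories in different clusters divided by the largest distance between trajectories in the same cluster, so that higher values indicate more compact, better separated clusters. The use of Euclidean distance and the toy trajectories are assumptions of this sketch; the caption does not state which distance the authors used.

```python
from itertools import combinations
from math import dist

def dunn_index(clusters):
    """Dunn index of a partition.

    clusters: list of at least two clusters, each a list of
    equal-length numeric trajectories.
    """
    # Smallest distance between two trajectories in different clusters
    min_between = min(
        dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    # Largest distance between two trajectories in the same cluster (diameter)
    max_within = max(
        (dist(a, b) for c in clusters for a, b in combinations(c, 2)),
        default=0.0,
    )
    if max_within == 0.0:
        raise ValueError("index undefined: every cluster has zero diameter")
    return min_between / max_within

# Hypothetical trajectories over three time points
low_1, low_2 = (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
high_1, high_2 = (5.0, 5.0, 5.0), (5.0, 6.0, 5.0)

good_partition = [[low_1, low_2], [high_1, high_2]]
bad_partition = [[low_1, high_1], [low_2, high_2]]

print(dunn_index(good_partition) > dunn_index(bad_partition))
```

Comparing the index across algorithms and across two to nine clusters, then keeping the configuration with the highest value, is one way to realise the selection step the caption describes.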

**Fig 4. Clustering of filter ratios on proteomic datasets.**
Scatterplots of filter ratios R_T (x-axis) against R_I (y-axis) for A) the iTraq breast cancer dataset and B), C) the iTraq kidney rejection dataset for the Allograft Rejection (AR) and Non-Rejection (NR) groups, respectively. Colors indicate clusters from a 2-cluster model-based clustering: red squares mark molecules that cluster as ‘informative’ and remain in the analysis, and blue circles mark ‘non-informative’ molecules that are removed prior to analysis.

**Fig 5. Filtering ratios of the *Mus musculus* data.**
The filter ratios R_T and R_I were calculated for every molecule. Colors in A) indicate the -log10(p-values) for differential expression over time and in B) the proportion of missing values. C) shows the data after discarding profiles with more than 50% missing values, with colors as in A).
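The missing-value step mentioned for panel C) can be sketched as a simple filter. This is an illustrative sketch only: the data layout (a dictionary of per-molecule profiles, each a list of per-subject lists), the encoding of missing values as `None` or NaN, and the function names are all hypothetical.

```python
import math

def missing_fraction(profile):
    """Proportion of missing measurements in one molecule's profile."""
    values = [v for subj in profile for v in subj]
    n_missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return n_missing / len(values)

def keep_profiles(profiles, max_missing=0.5):
    """Discard profiles whose missing fraction exceeds max_missing."""
    return {
        name: prof
        for name, prof in profiles.items()
        if missing_fraction(prof) <= max_missing
    }

# Hypothetical toy data: 2 subjects x 2 time points per molecule
profiles = {
    "mol_a": [[1.0, None], [2.0, 3.0]],      # 25% missing
    "mol_b": [[None, None], [None, 1.0]],    # 75% missing
    "mol_c": [[1.0, None], [float("nan"), 2.0]],  # exactly 50% missing
}

kept = keep_profiles(profiles)
print(sorted(kept))
```

Because the caption says profiles with more than 50% missing values are discarded, a profile at exactly 50% is kept in this sketch.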

**Fig 6. Clustering of the iTraq breast cancer dataset.**
Clustering was performed on the summarized profiles obtained from A) Linear Mixed Model Spline (LMMS), B) Derivative Linear Mixed Model Spline (DLMMS), C) mean and D) Smoothing Splines Mixed Effects (SME). The best clustering algorithm and the best number of clusters were chosen according to the Dunn index. In A), B) and D) we used hierarchical clustering and in C) Partitioning Around Medoids (PAM) clustering. The x-axis represents time (in hours) and the y-axis represents intensity as log₂-transformed protein abundance.
