Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements
- PMID: 21995452
- PMCID: PMC3228548
- DOI: 10.1186/1471-2105-12-399
Abstract
Background: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques.
Results: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles.
Conclusions: By modelling outlier measurements and replicate values explicitly, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies. Timeseries BHC is distributed as part of the R package 'BHC' (version 1.5), which can be downloaded from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.
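The core idea in the Results, scoring a candidate cluster merge by a Gaussian process marginal likelihood with a noise level informed by replicates, can be illustrated with a minimal sketch. This is not the BHC package's implementation or API: the function names, the squared-exponential covariance, the fixed hyperparameters and the synthetic data are all assumptions made for illustration. The replicate-based variance is plugged in directly rather than used to inform a prior, the merge score is a plain log Bayes factor without the Dirichlet process weighting used in Bayesian hierarchical clustering, and the outlier mixture component is omitted.

```python
import numpy as np

def se_kernel(t1, t2, signal_var, length_scale):
    """Squared-exponential covariance between two sets of time points (assumed kernel)."""
    d = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_log_marginal(times, Y, signal_var, length_scale, noise_var):
    """Log marginal likelihood of the rows of Y (n_genes x n_times),
    assuming all genes share one latent GP function plus i.i.d. Gaussian noise."""
    n_genes, _ = Y.shape
    t = np.tile(times, n_genes)   # time stamp of every observation, gene by gene
    y = Y.ravel()                 # observations in the same (row-major) order
    K = se_kernel(t, t, signal_var, length_scale) + noise_var * np.eye(t.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * y.size * np.log(2.0 * np.pi))

def replicate_noise_variance(reps):
    """Pooled within-replicate variance; reps has shape (n_genes, n_times, n_reps)."""
    return float(np.mean(np.var(reps, axis=2, ddof=1)))

def log_bayes_factor(times, Ya, Yb, **hyp):
    """Log evidence for 'one shared profile' minus 'two independent profiles'."""
    merged = gp_log_marginal(times, np.vstack([Ya, Yb]), **hyp)
    separate = gp_log_marginal(times, Ya, **hyp) + gp_log_marginal(times, Yb, **hyp)
    return merged - separate

# Synthetic example with non-uniformly sampled time points (illustrative data only).
rng = np.random.default_rng(0)
times = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 6.0])
base = np.sin(times)

# Two genes with the same underlying profile, three replicates each.
reps = base[None, :, None] + 0.1 * rng.standard_normal((2, times.size, 3))
noise_var = replicate_noise_variance(reps)
hyp = dict(signal_var=1.0, length_scale=1.5, noise_var=noise_var)

gene_a = reps[0, :, 0][None, :]
gene_b = reps[1, :, 0][None, :]
gene_c = (-base + 0.1 * rng.standard_normal(times.size))[None, :]  # opposite profile

bf_same = log_bayes_factor(times, gene_a, gene_b, **hyp)
bf_diff = log_bayes_factor(times, gene_a, gene_c, **hyp)
print(f"noise variance from replicates: {noise_var:.4f}")
print(f"log Bayes factor, similar genes:    {bf_same:.2f}")
print(f"log Bayes factor, dissimilar genes: {bf_diff:.2f}")
```

In this sketch a larger log Bayes factor favours merging, so the pair with a shared profile scores higher than the pair with opposite profiles. Because the kernel is evaluated at the actual time stamps, unevenly spaced sampling needs no special handling.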
Similar articles
- Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters. BMC Bioinformatics. 2013 Aug 20;14:252. doi: 10.1186/1471-2105-14-252. PMID: 23962281. Free PMC article.
- Accelerating Bayesian hierarchical clustering of time series data with a randomised algorithm. PLoS One. 2013;8(4):e59795. doi: 10.1371/journal.pone.0059795. Epub 2013 Apr 2. PMID: 23565168. Free PMC article.
- Bayesian mixture model based clustering of replicated microarray data. Bioinformatics. 2004 May 22;20(8):1222-32. doi: 10.1093/bioinformatics/bth068. Epub 2004 Feb 10. PMID: 14871871.
- R/BHC: fast Bayesian hierarchical clustering for microarray data. BMC Bioinformatics. 2009 Aug 6;10:242. doi: 10.1186/1471-2105-10-242. PMID: 19660130. Free PMC article.
- Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics. 2002 Sep;18(9):1194-206. doi: 10.1093/bioinformatics/18.9.1194. PMID: 12217911.
Cited by
- GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution. Bioinformatics. 2020 Mar 1;36(5):1484-1491. doi: 10.1093/bioinformatics/btz778. PMID: 31608923. Free PMC article.
- GeTeSEPdb: A comprehensive database and online tool for the identification and analysis of gene profiles with temporal-specific expression patterns. Comput Struct Biotechnol J. 2024 Jun 5;23:2488-2496. doi: 10.1016/j.csbj.2024.06.003. eCollection 2024 Dec. PMID: 38939556. Free PMC article.
- Semi-Supervised Non-Parametric Bayesian Modelling of Spatial Proteomics. Ann Appl Stat. 2022 Dec 1;16(4):22-aoas1603. doi: 10.1214/22-AOAS1603. eCollection 2022 Dec 1. PMID: 36507469. Free PMC article.
- Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters. BMC Bioinformatics. 2013 Aug 20;14:252. doi: 10.1186/1471-2105-14-252. PMID: 23962281. Free PMC article.
- Gaussian process test for high-throughput sequencing time series: application to experimental evolution. Bioinformatics. 2015 Jun 1;31(11):1762-70. doi: 10.1093/bioinformatics/btv014. Epub 2015 Jan 21. PMID: 25614471. Free PMC article.