Mass distributed clustering: a new algorithm for repeated measurements in gene expression data
- PMID: 16901101
Abstract
The availability of whole-genome sequence data and of high-throughput techniques such as DNA microarrays enables researchers to monitor changes in gene expression in a given organ or tissue comprehensively. A single measurement can yield expression data for more than 30,000 genes, which makes clustering methods essential for analysis. Biologists usually design experimental protocols so that statistical significance can be evaluated; often, they run experiments in triplicate to obtain a mean and standard deviation. Existing clustering methods usually use these mean or median values rather than the original data, and account for significance by omitting data with large standard deviations, which discards potentially useful information. We propose a clustering method that treats each set of triplicate measurements as a probability distribution function instead of pooling the data points into a median or mean. This method permits truly unsupervised clustering of DNA microarray data.
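To make the contrast between mean-based and distribution-based comparison concrete, the sketch below models each gene's triplicate at each condition as a normal distribution and compares genes with the Bhattacharyya distance between those distributions. This is not the authors' mass distributed clustering algorithm: the normal approximation, the choice of Bhattacharyya distance, the example data, and every function name (`to_normal`, `bhattacharyya_normal`, `gene_distance`, `mean_distance`) are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, stdev

# Illustrative sketch only (not the paper's algorithm): each gene is a list of
# conditions, and each condition holds that gene's replicate measurements.


def to_normal(replicates, min_sd=1e-6):
    """Summarize replicates as a normal distribution (sample mean, sample SD).

    min_sd keeps the spread positive so the log term below stays defined
    when all replicates are identical.
    """
    return mean(replicates), max(stdev(replicates), min_sd)


def bhattacharyya_normal(p, q):
    """Bhattacharyya distance between two univariate normals given as (mean, sd)."""
    (m1, s1), (m2, s2) = p, q
    v = s1 ** 2 + s2 ** 2
    return 0.25 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / (2 * s1 * s2))


def gene_distance(gene_a, gene_b):
    """Sum the per-condition distribution distances between two genes."""
    return sum(
        bhattacharyya_normal(to_normal(a), to_normal(b))
        for a, b in zip(gene_a, gene_b)
    )


def mean_distance(gene_a, gene_b):
    """Baseline for comparison: Euclidean distance between per-condition means."""
    return math.sqrt(sum((mean(a) - mean(b)) ** 2 for a, b in zip(gene_a, gene_b)))


# Hypothetical triplicate data for three genes measured under two conditions.
tight_a = [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9]]
tight_b = [[3.0, 3.1, 2.9], [4.0, 4.1, 3.9]]
noisy_b = [[1.0, 5.0, 3.0], [2.0, 6.0, 4.0]]

print(mean_distance(tight_a, tight_b), mean_distance(tight_a, noisy_b))
print(gene_distance(tight_a, tight_b), gene_distance(tight_a, noisy_b))
```

With these inputs the two mean-based distances are equal, while the distribution-based distance from `tight_a` to `noisy_b` is much smaller than to `tight_b`: the spread of the replicates reduces the weight of the mean difference instead of forcing the noisy gene to be discarded. A clustering routine that accepts a precomputed distance matrix could then be run on `gene_distance`; note that the Bhattacharyya distance does not satisfy the triangle inequality.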
Similar articles
- Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm. Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27. PMID: 16257984.
- A hierarchical clustering algorithm for MIMD architecture. Comput Biol Chem. 2004 Dec;28(5-6):417-9. doi: 10.1016/j.compbiolchem.2004.09.002. PMID: 15556483.
- Towards clustering of incomplete microarray data without the use of imputation. Bioinformatics. 2007 Jan 1;23(1):107-13. doi: 10.1093/bioinformatics/btl555. Epub 2006 Oct 31. PMID: 17077099.
- Techniques for clustering gene expression data. Comput Biol Med. 2008 Mar;38(3):283-93. doi: 10.1016/j.compbiomed.2007.11.001. Epub 2007 Dec 3. PMID: 18061589. Review.
- Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006 Jan;7(1):55-65. doi: 10.1038/nrg1749. PMID: 16369572. Review.
Cited by
- Importance of replication in analyzing time-series gene expression data: corticosteroid dynamics and circadian patterns in rat liver. BMC Bioinformatics. 2010 May 26;11:279. doi: 10.1186/1471-2105-11-279. PMID: 20500897.