Bagging to improve the accuracy of a clustering procedure
- PMID: 12801869
- DOI: 10.1093/bioinformatics/btg038
Abstract
Motivation: Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical question associated with tumor classification is the identification of new tumor classes using gene expression profiles. Essential aspects of this clustering problem include identifying accurate partitions of the tumor samples into clusters and assessing the confidence of cluster assignments for individual samples.
Results: Two new resampling methods, inspired by bagging in prediction, are proposed to improve and assess the accuracy of a given clustering procedure. In these ensemble methods, a partitioning clustering procedure is applied to bootstrap learning sets, and the resulting multiple partitions are combined either by voting or by creating a new dissimilarity matrix. As in prediction, the motivation behind bagging is to reduce variability in the partitioning results via averaging. The performance of the new and existing methods was compared using simulated data and gene expression data from two recently published cancer microarray studies. The bagged clustering procedures were in general at least as accurate as, and often substantially more accurate than, a single application of the partitioning clustering procedure. A valuable by-product of bagged clustering is the set of cluster votes, which can be used to assess the confidence of cluster assignments for individual observations.
Supplementary information: For supplementary information on datasets, analyses, and software, consult http://www.stat.berkeley.edu/~sandrine and http://www.bioconductor.org.
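The two ways of combining bootstrap partitions described in the Results (voting, and building a new dissimilarity matrix) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the partitioning procedure, so plain k-means is assumed here, and the function names, the brute-force label alignment against a reference clustering of the full data, and the co-clustering definition of dissimilarity are all illustrative assumptions.

```python
import itertools
import numpy as np


def assign(X, centers):
    """Label each row of X with the index of its nearest centre."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans_centers(X, k, rng, n_iter=50):
    """Lloyd's k-means (assumed stand-in for the partitioning procedure)."""
    pool = np.unique(X, axis=0)  # bootstrap samples contain repeated rows
    centers = pool[rng.choice(len(pool), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's centre in place
                centers[j] = X[labels == j].mean(axis=0)
    return centers


def align(labels, reference, k):
    """Relabel clusters to maximise agreement with a reference partition.

    Tries every permutation of the k labels, so only suitable for small k.
    """
    best = max(itertools.permutations(range(k)),
               key=lambda p: np.sum(np.array(p)[labels] == reference))
    return np.array(best)[labels]


def bagged_clustering(X, k, n_boot=50, seed=0):
    """Return voted labels, per-observation confidence, and a dissimilarity matrix."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Assumption: a clustering of the full data fixes the label names.
    reference = assign(X, kmeans_centers(X, k, rng))
    votes = np.zeros((n, k))
    together = np.zeros((n, n))   # times a pair fell in the same cluster
    both_seen = np.zeros((n, n))  # times a pair shared a bootstrap sample
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap learning set
        members = np.unique(idx)          # observations present in it
        centers = kmeans_centers(X[idx], k, rng)
        boot = align(assign(X[members], centers), reference[members], k)
        votes[members, boot] += 1
        pair = np.ix_(members, members)
        together[pair] += boot[:, None] == boot[None, :]
        both_seen[pair] += 1
    labels = votes.argmax(axis=1)
    # Share of votes won by the chosen cluster, as a confidence measure.
    confidence = votes.max(axis=1) / np.maximum(votes.sum(axis=1), 1)
    # Pairs never sampled together default to dissimilarity 1.
    dissimilarity = 1.0 - together / np.maximum(both_seen, 1)
    return labels, confidence, dissimilarity
```

Calling `bagged_clustering(X, k=2)` on an observations-by-features array returns the majority-vote labels, the vote share behind each label, and a matrix that could be passed to any dissimilarity-based clustering procedure. Observations with a low vote share are the ones whose cluster assignment is least stable across bootstrap samples.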