Attribute clustering for grouping, selection, and classification of gene expression data
- PMID: 17044174
- DOI: 10.1109/TCBB.2005.17
Erratum in
- IEEE/ACM Trans Comput Biol Bioinform. 2007 Jan-Mar;4(1):157
Abstract
This paper presents an attribute clustering method that groups genes based on their interdependence in order to mine meaningful patterns from gene expression data. It can be used for gene grouping, selection, and classification. Partitioning a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis. Clustering attributes reduces the search dimension of a data mining algorithm. This reduction is especially important for mining gene expression data, because such data typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are developed and optimized to scale with the number of tuples rather than the number of attributes. The situation becomes even worse when the number of attributes far exceeds the number of tuples; in that case, the likelihood of reporting patterns that arise by chance and are actually irrelevant becomes high. For these reasons, gene grouping and selection are important preprocessing steps if many data mining algorithms are to be effective on gene expression data. This paper defines the problem of attribute clustering and introduces a methodology for solving it. Our proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. Applying our algorithm to gene expression data yields meaningful clusters of genes. Grouping genes by their interdependence within each group helps capture different aspects of gene association patterns in each group. Significant genes selected from each group then contain useful information for gene expression classification and identification.
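The abstract describes the method only at a high level: interdependent attributes are grouped by optimizing a criterion derived from an information measure. As an illustration, the following Python sketch assumes that expression values have already been discretized, uses mutual information normalized by joint entropy as the interdependence measure, and groups genes with a k-modes-style loop in which each cluster is represented by one "mode" gene. These choices, the deterministic initialization, and the names `entropy`, `interdependence`, and `cluster_attributes` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (natural log) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())


def interdependence(x, y):
    """Assumed measure: mutual information I(X;Y) normalized by joint entropy H(X,Y).

    Returns a value in [0, 1]; pairs of constant attributes score 0 (our choice).
    """
    h_xy = entropy(list(zip(x, y)))
    if h_xy == 0:
        return 0.0
    mutual_info = entropy(x) + entropy(y) - h_xy
    return max(0.0, mutual_info / h_xy)


def cluster_attributes(columns, k, max_iter=20):
    """k-modes-style attribute clustering (illustrative sketch, not the paper's exact algorithm).

    columns[i] holds the discretized expression levels of gene i across all samples.
    Each cluster is represented by a 'mode' attribute; attributes join the mode they
    depend on most, and each mode is then replaced by the member with the highest
    summed interdependence with the rest of its cluster.
    """
    m = len(columns)
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of attributes")
    r = [[1.0 if i == j else interdependence(columns[i], columns[j]) for j in range(m)]
         for i in range(m)]

    # Deterministic farthest-first initialization (an assumption, for reproducibility).
    modes = [0]
    while len(modes) < k:
        rest = [a for a in range(m) if a not in modes]
        modes.append(min(rest, key=lambda a: max(r[a][mo] for mo in modes)))

    for _ in range(max_iter):
        clusters = {mo: [] for mo in modes}
        for a in range(m):
            if a in clusters:  # a mode always stays in its own cluster
                clusters[a].append(a)
            else:
                clusters[max(modes, key=lambda mo: r[a][mo])].append(a)
        # New mode: the member most interdependent with the rest of its cluster.
        new_modes = [max(members, key=lambda a: sum(r[a][b] for b in members if b != a))
                     for members in clusters.values()]
        if sorted(new_modes) == sorted(modes):
            break
        modes = new_modes
    return [sorted(members) for members in clusters.values()]
```

For instance, on six discretized genes forming two internally dependent triples, `cluster_attributes(columns, 2)` should return the two triples as index lists. Representing each cluster by an actual gene rather than an averaged centroid keeps the clusters interpretable in terms of measured genes, which suits discrete, information-based similarity.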
To evaluate the performance of the proposed approach, we applied it to two well-known gene expression data sets and compared our results with those obtained by other methods. Our experiments show that the proposed method finds meaningful clusters of genes. Selecting a subset of genes that have high multiple interdependence with the other genes in their clusters yields significant classification information. Thus, a small pool of selected genes can be used to build classifiers with a very high classification rate. From this pool, the gene expression profiles of different categories can be identified.
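Selecting a small gene pool from existing clusters can likewise be sketched. The self-contained code below assumes discretized expression levels, uses normalized mutual information as a stand-in for the paper's information measure, and reads "multiple interdependence" as the sum of a gene's pairwise interdependence with the other members of its cluster. `select_genes` and its `per_cluster` parameter are hypothetical names, and the cluster assignments are taken as given input.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (natural log) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())


def interdependence(x, y):
    """Assumed measure: mutual information normalized by joint entropy, in [0, 1]."""
    h_xy = entropy(list(zip(x, y)))
    if h_xy == 0:
        return 0.0
    return max(0.0, (entropy(x) + entropy(y) - h_xy) / h_xy)


def select_genes(columns, clusters, per_cluster=1):
    """Pick the per_cluster genes with the highest multiple interdependence in each cluster.

    Multiple interdependence is taken here (an assumption) as the sum of a gene's
    pairwise interdependence with every other member of its cluster.
    """
    pool = []
    for members in clusters:
        score = {a: sum(interdependence(columns[a], columns[b]) for b in members if b != a)
                 for a in members}
        # sorted() is stable, so ties keep their original order within the cluster
        ranked = sorted(members, key=lambda a: score[a], reverse=True)
        pool.extend(ranked[:per_cluster])
    return pool
```

The returned pool indexes genes that could then be passed as features to a classifier; the abstract does not name the classifier used, so none is implied here.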
Similar articles
- Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes. BMC Bioinformatics. 2006 Aug 31;7:397. doi: 10.1186/1471-2105-7-397. PMID: 16945146.
- Clustering of change patterns using Fourier coefficients. Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19. PMID: 18025003.
- An iterative data mining approach for mining overlapping coexpression patterns in noisy gene expression data. IEEE Trans Nanobioscience. 2009 Sep;8(3):252-8. doi: 10.1109/TNB.2009.2026747. Epub 2009 Jul 14. PMID: 19605326.
- Recent patents on biclustering algorithms for gene expression data analysis. Recent Pat DNA Gene Seq. 2011 Aug;5(2):117-25. doi: 10.2174/187221511796392097. PMID: 21529337. Review.
- Exploring expression data: identification and analysis of coexpressed genes. Genome Res. 1999 Nov;9(11):1106-15. doi: 10.1101/gr.9.11.1106. PMID: 10568750. Review.
Cited by
- Sensing the squeeze: nuclear mechanotransduction in health and disease. Nucleus. 2024 Dec;15(1):2374854. doi: 10.1080/19491034.2024.2374854. Epub 2024 Jul 1. PMID: 38951951. Review.
- Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring. EURASIP J Bioinform Syst Biol. 2012 Jul 13;2012(1):8. doi: 10.1186/1687-4153-2012-8. PMID: 22793672.
- Review of feature selection approaches based on grouping of features. PeerJ. 2023 Jul 17;11:e15666. doi: 10.7717/peerj.15666. eCollection 2023. PMID: 37483989. Review.
- A classification framework applied to cancer gene expression profiles. J Healthc Eng. 2013;4(2):255-83. doi: 10.1260/2040-2295.4.2.255. PMID: 23778014.
- Gene selection for cancer classification with the help of bees. BMC Med Genomics. 2016 Aug 10;9 Suppl 2(Suppl 2):47. doi: 10.1186/s12920-016-0204-7. PMID: 27510562.