Multi-class clustering of cancer subtypes through SVM based ensemble of pareto-optimal solutions for gene marker identification
- PMID: 21103052
- PMCID: PMC2980474
- DOI: 10.1371/journal.pone.0013803
Abstract
With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes simultaneously across different experimental conditions or tissue samples. Microarray cancer datasets, organized as samples versus genes, are used to classify tissue samples as benign or malignant, or into their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which aids the diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. A real-coded encoding of the cluster centers is used, and cluster compactness and separation are optimized simultaneously. The resulting set of near-Pareto-optimal solutions contains a number of non-dominated solutions. We propose a novel approach that combines the clustering information held by these non-dominated solutions through a Support Vector Machine (SVM) classifier. The final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method has been compared with that of several other microarray clustering algorithms on three publicly available benchmark cancer datasets, and statistical significance tests have been conducted to establish its statistical superiority. Furthermore, relevant gene markers have been identified from the clustering result produced by the proposed method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results are promising and may have an important impact on unsupervised cancer classification as well as on gene marker identification for multiple cancer subtypes.
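To make the two-objective setup and the final consensus step more concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes Euclidean distance, hard nearest-center assignment, sum of squared distances as the compactness objective and minimum inter-center distance as the separation objective, which are illustrative stand-ins for the paper's actual objective functions. The genetic operators, the alignment of cluster labels between solutions, and the SVM training itself are omitted, and all function names (`assign`, `compactness`, `separation`, `dominates`, `non_dominated`, `majority_vote`) are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def assign(samples: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Hard-assign each sample to its nearest center (Euclidean distance)."""
    return [min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
            for x in samples]


def compactness(samples: Sequence[Point], centers: Sequence[Point]) -> float:
    """Assumed objective: sum of squared distances to the nearest center (minimize)."""
    return sum(min(math.dist(x, c) for c in centers) ** 2 for x in samples)


def separation(centers: Sequence[Point]) -> float:
    """Assumed objective: smallest distance between two cluster centers (maximize)."""
    return min(math.dist(centers[i], centers[j])
               for i in range(len(centers)) for j in range(i + 1, len(centers)))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(objectives: Sequence[Sequence[float]]) -> List[int]:
    """Indices of solutions that no other solution dominates."""
    return [i for i, a in enumerate(objectives)
            if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)]


def majority_vote(label_sets: Sequence[Sequence[int]]) -> List[int]:
    """Per-sample majority label across several labelings (e.g. one per SVM kernel).
    Assumes all labelings already share the same label space."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*label_sets)]


if __name__ == "__main__":
    samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    candidates = [
        [(0.0, 0.5), (10.0, 0.5)],   # tight clusters, well separated
        [(0.0, 0.0), (10.0, 1.0)],   # slightly looser, slightly wider separation
        [(5.0, 0.0), (5.0, 1.0)],    # poor on both objectives
    ]
    # Separation is negated so that both objectives are minimized.
    objs = [(compactness(samples, c), -separation(c)) for c in candidates]
    front = non_dominated(objs)
    print(front)  # [0, 1]
    print(assign(samples, candidates[front[0]]))  # [0, 0, 1, 1]
    print(majority_vote([[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]]))  # [0, 0, 1, 1]
```

In this toy run the first two candidate center sets trade compactness against separation, so neither dominates the other and both survive as the non-dominated set, while the third is dominated and discarded. The majority vote stands in for the consensus among kernel-specific clusterings described above.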