2010 Nov 12;5(11):e13803.
doi: 10.1371/journal.pone.0013803.

Multi-class clustering of cancer subtypes through SVM based ensemble of pareto-optimal solutions for gene marker identification

Anirban Mukhopadhyay et al. PLoS One.

Abstract

With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes simultaneously across different experimental conditions or tissue samples. Microarray cancer datasets, organized as samples-by-genes matrices, are used to classify tissue samples as benign or malignant, or into cancer subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which helps in the diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. A real-coded encoding of the cluster centers is used, and cluster compactness and separation are optimized simultaneously. The resulting near-Pareto-optimal set contains a number of non-dominated solutions. We propose a novel approach that combines the clustering information in these non-dominated solutions through a Support Vector Machine (SVM) classifier. The final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method is compared with that of several other microarray clustering algorithms on three publicly available benchmark cancer datasets. Statistical significance tests are conducted to establish the superiority of the proposed method. Relevant gene markers are then identified from the clustering result produced by the proposed method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results are promising and may have an important impact on unsupervised cancer classification as well as on gene marker identification for multiple cancer subtypes.
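The notion of a non-dominated solution can be illustrated with a minimal sketch. This is not the authors' implementation: the two objective functions below (mean distance to the nearest center for compactness, smallest distance between centers for separation), the toy 2-D points, and all function names are assumptions chosen only to show how a set of candidate clusterings is reduced to its non-dominated subset when one objective is minimized and the other maximized.

```python
from itertools import combinations
from math import dist

def compactness(points, centers):
    """Mean distance from each point to its nearest center (lower is better).
    Assumed objective for illustration; the paper may define it differently."""
    return sum(min(dist(p, c) for c in centers) for p in points) / len(points)

def separation(centers):
    """Smallest distance between any two centers (higher is better).
    Assumed objective for illustration."""
    return min(dist(a, b) for a, b in combinations(centers, 2))

def dominates(a, b):
    """a and b are (compactness, separation) pairs.
    a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def non_dominated(scores):
    """Indices of the solutions that no other solution dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]

# Toy data: two tight groups of samples in a 2-D feature space (hypothetical).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]

# Four candidate solutions, each encoded as a list of real-valued cluster centers.
candidates = [
    [(0, 0.5), (10, 0.5)],
    [(0, 0), (10, 1)],
    [(4, 0.5), (6, 0.5)],
    [(-1, 0.5), (11, 0.5)],
]

scores = [(compactness(points, c), separation(c)) for c in candidates]
front = non_dominated(scores)
print(front)  # [1, 3]
```

In this toy case the second candidate matches the first on compactness but has better separation, so the first is dropped. The fourth trades compactness for separation and therefore stays on the front alongside the second. Combining such a front into one clustering, as the abstract describes with SVM classifiers, is a separate step not shown here.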

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Boxplots of the index scores produced by the different algorithms over 50 consecutive runs for the SRBCT dataset.
Figure 2. Boxplots of the index scores produced by the different algorithms over 50 consecutive runs for the Adult malignancy dataset.
Figure 3. Boxplots of the index scores produced by the different algorithms over 50 consecutive runs for the Brain tumor dataset.
Figure 4. Heatmap of the expression levels of the 10 most frequently selected gene markers for each tumor subtype in the SRBCT data. Red/green represents up/down regulation relative to black. Each subgroup is enclosed in a yellow box to identify its samples and the distinguishing gene markers. The image clone IDs of the marker genes are shown to the right of the genes.
