Accurate molecular classification of cancer using simple rules

Xiaosheng Wang¹, Osamu Gotoh

PMID: 19874631
PMCID: PMC2777919
DOI: 10.1186/1755-8794-2-64

Xiaosheng Wang et al. BMC Med Genomics. 2009 Oct 30:2:64. doi: 10.1186/1755-8794-2-64.

Affiliation

¹ Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan. david@genome.ist.i.kyoto-u.ac.jp

Abstract

Background: One difficult problem in microarray-based cancer classification is how to reduce extremely high-dimensional gene expression data so as to remove the effects of noise. Feature selection is often used to address this problem by selecting informative genes from among thousands or tens of thousands of genes. However, most existing methods for microarray-based cancer classification rely on many genes to achieve accurate classification, which often hampers the interpretability of the models. For a better understanding of the classification results, it is desirable to develop simpler rule-based models with as few marker genes as possible.

Methods: We screened a small number of informative single genes and gene pairs on the basis of their depended degree, a measure from rough set theory. We then constructed cancer classifiers by applying the decision rules induced by the selected genes or gene pairs. We tested the classifiers by leave-one-out cross-validation (LOOCV) on the training sets and by classification of independent test sets.
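As a rough illustration of the Methods, the sketch below computes a depended degree for a single gene or a gene pair, induces one decision rule per observed value, and scores the rules by LOOCV. It is not the authors' implementation. The abstract does not say how expression values are discretized, so the sketch assumes they have already been binned into "high" and "low". The function names and the toy data are hypothetical, and the default-class fallback for unseen values is likewise an assumption.

```python
from collections import defaultdict


def depended_degree(values, labels):
    """Fraction of samples whose attribute value occurs in only one class.

    This is the rough-set positive region size divided by the sample count.
    `values` may hold single discretized levels or tuples (for gene pairs).
    """
    classes_by_value = defaultdict(set)
    for v, y in zip(values, labels):
        classes_by_value[v].add(y)
    consistent = sum(1 for v in values if len(classes_by_value[v]) == 1)
    return consistent / len(values)


def induce_rules(values, labels):
    """Map each observed attribute value to its majority class."""
    counts = defaultdict(lambda: defaultdict(int))
    for v, y in zip(values, labels):
        counts[v][y] += 1
    return {v: max(c, key=c.get) for v, c in counts.items()}


def loocv_accuracy(values, labels):
    """Leave-one-out cross-validation accuracy of the induced rules."""
    correct = 0
    for i in range(len(values)):
        train_v = values[:i] + values[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        rules = induce_rules(train_v, train_y)
        # Assumption: unseen values fall back to the most common training class.
        default = max(sorted(set(train_y)), key=train_y.count)
        if rules.get(values[i], default) == labels[i]:
            correct += 1
    return correct / len(values)


# Hypothetical toy data (not from the paper): six samples, three per class.
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
gene_a = ["high", "high", "high", "low", "low", "low"]
gene_b = ["high", "low", "high", "low", "high", "low"]
gene_c = ["high", "low", "high", "high", "low", "high"]
pair_bc = list(zip(gene_b, gene_c))  # a gene pair is a tuple-valued attribute

print(depended_degree(gene_a, labels))   # 1.0: gene_a alone separates the classes
print(depended_degree(gene_b, labels))   # 0.0: every level of gene_b mixes classes
print(depended_degree(pair_bc, labels))  # 1.0: the pair is consistent jointly
print(loocv_accuracy(gene_a, labels))    # 1.0
```

In the toy data, neither gene_b nor gene_c is informative on its own, yet the pair reaches a depended degree of 1.0, which is the kind of case that motivates screening gene pairs as well as single genes.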

Results: We applied our methods to five cancer gene expression datasets: leukemia (acute lymphoblastic leukemia [ALL] vs. acute myeloid leukemia [AML]), lung cancer, prostate cancer, breast cancer, and leukemia (ALL vs. mixed-lineage leukemia [MLL] vs. AML). Accurate classification was obtained using just one or two genes. We identified some genes that correlate closely with the pathogenesis of the relevant cancers. In terms of both classification performance and algorithmic simplicity, our approach outperformed or at least matched existing methods.

Conclusion: In cancer gene expression datasets, a small number of genes, even one or two if selected correctly, can achieve highly accurate cancer classification. This finding also implies that very simple rules may perform well for cancer class prediction.
