Tumor classification ranking from microarray data
- PMID: 18831787
- PMCID: PMC2559886
- DOI: 10.1186/1471-2164-9-S2-S21
Abstract
Background: Gene expression profiles based on microarray data are recognized as potential diagnostic indices of cancer. Molecular tumor classifications derived from these data by learning algorithms have advanced our understanding of the genetic changes associated with cancer etiology and development. However, classifications are not always perfect, and in such cases the classification rankings (likelihoods of correct class predictions) can be useful for directing further research (e.g., by deriving inferences about predictive indicators or prioritizing future experiments). Classification ranking is a challenging problem, particularly for microarray data, where there is a huge number of possibly regulated genes and no known rating function. This study investigates whether tumor classification can be made more informative by using a classification ranking method that requires no additional ranking analysis and maintains relatively good classification accuracy.
Results: Microarray data of 11 different types and subtypes of cancer were analyzed using MDR (Multi-Dimensional Ranker), a recently developed boosting-based ranking algorithm. Every resulting classification model used at most nine predictor genes, a huge reduction from the more than 12,000 genes in the majority of the expression samples. Compared to several other learning algorithms, MDR gives the greatest AUC (area under the ROC curve) for the classifications of prostate cancer, acute lymphoblastic leukemia (ALL) and four ALL subtypes: BCR-ABL, E2A-PBX1, MALL and TALL. SVM (Support Vector Machine) gives the highest AUC for the classifications of lung, lymphoma, and breast cancers, and two ALL subtypes: Hyperdiploid > 50 and TEL-AML1. MDR gives highly competitive results, producing the highest average AUC, 91.01%, and an average overall accuracy of 90.01% for cancer expression analysis.
Conclusion: Using the classification rankings from MDR is a simple technique for obtaining effective and informative tumor classifications from cancer gene expression data. Further interpretation of the results obtained from MDR is required. MDR can also be used directly as a simple feature selection mechanism to identify genes relevant to tumor classification. MDR may be applicable to many other classification problems for microarray data.
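The abstract evaluates classification rankings by AUC alongside overall accuracy. As a rough illustration of how the two measures differ, the sketch below computes AUC directly from ranking scores as the fraction of (positive, negative) sample pairs that are ordered correctly, and accuracy at a fixed cutoff. This is not the MDR implementation; the function names, the 0.5 threshold, and the toy scores and labels are all assumptions made up for the example.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a randomly chosen positive sample
    is ranked above a randomly chosen negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Overall accuracy after turning scores into hard class calls."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical ranking scores for six samples (1 = tumor class, 0 = other).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

print(f"AUC      = {auc_from_scores(scores, labels):.4f}")
print(f"accuracy = {accuracy_at_threshold(scores, labels):.4f}")
```

AUC depends only on the ordering of the scores, so it rewards a classifier whose rankings are informative even when a particular cutoff misclassifies some samples; accuracy depends on the chosen cutoff.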