Identifying genes that contribute most to good classification in microarrays
- PMID: 16959042
- PMCID: PMC1574352
- DOI: 10.1186/1471-2105-7-407
Abstract
Background: The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene interactions; the disadvantage of the latter is that it often does not provide a sufficient focus for further investigation because many genes may be included by chance. Our strategy is to search for classification rules that perform well with few genes and, if they are found, identify genes that occur relatively frequently under multiple random validation (random splits into training and test samples).
Results: We analyzed data from four published studies related to cancer. For classification we used a filter with a nearest centroid rule that is easy to implement and has been previously shown to perform well. To comprehensively measure classification performance we used receiver operating characteristic curves. In the three data sets with good classification performance, the classification rules for 5 genes were only slightly worse than for 20 or 50 genes and somewhat better than for 1 gene. In two of these data sets, one or two genes had relatively high frequencies not noticeable with rules involving 20 or 50 genes: desmin for classifying colon cancer versus normal tissue; and zyxin and secretory granule proteoglycan genes for classifying two types of leukemia.
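A minimal sketch of the kind of rule described in the Results, assuming NumPy and synthetic data: a filter keeps the k highest-scoring genes, a nearest centroid rule scores test samples, and the area under the ROC curve summarizes performance. The filter score (absolute difference in class means divided by the sum of class standard deviations), the function names, and the simulated data are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def select_genes(X, y, k):
    """Filter step (assumed score): |difference in class means| / (sum of class SDs)."""
    a, b = X[y == 0], X[y == 1]
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)
    return np.argsort(score)[::-1][:k]  # indices of the k highest-scoring genes

def centroid_scores(X_train, y_train, X_test, genes):
    """Nearest centroid rule: larger score means closer to the class-1 centroid."""
    c0 = X_train[y_train == 0][:, genes].mean(axis=0)
    c1 = X_train[y_train == 1][:, genes].mean(axis=0)
    Z = X_test[:, genes]
    d0 = ((Z - c0) ** 2).sum(axis=1)  # squared distance to class-0 centroid
    d1 = ((Z - c1) ** 2).sum(axis=1)  # squared distance to class-1 centroid
    return d0 - d1

def auc(scores, labels):
    """Area under the ROC curve, computed as the Mann-Whitney statistic."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

# Synthetic data (hypothetical): 60 samples, 200 genes, 3 informative genes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 200))
X[y == 1, :3] += 2.0

# One random split into training and test samples.
perm = rng.permutation(60)
train, test = perm[:40], perm[40:]

# Genes are selected on the training sample only, then evaluated on the test sample.
genes5 = select_genes(X[train], y[train], 5)
auc5 = auc(centroid_scores(X[train], y[train], X[test], genes5), y[test])
print("selected genes:", sorted(genes5.tolist()), "test AUC:", round(float(auc5), 3))
```

Selecting genes inside the training sample, rather than on all samples, keeps the test-sample AUC from being optimistically biased by the filter.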
Conclusion: Using multiple random validation, investigators should look for classification rules that perform well with few genes and select, for further study, genes with relatively high frequencies of occurrence in these classification rules.
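The recommendation above can be sketched as follows, assuming NumPy and synthetic data: repeat the random split many times, apply a gene filter to each training sample, and report how often each gene is selected. The filter score, the stratified splitting (used here only so that both classes appear in every training sample), the function names, and the simulated data are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from collections import Counter

def top_genes(X, y, k):
    """Assumed filter: rank genes by |difference in class means| / (sum of class SDs)."""
    a, b = X[y == 0], X[y == 1]
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)
    return np.argsort(score)[::-1][:k]

def gene_frequencies(X, y, k, n_splits, test_fraction, rng):
    """Fraction of random training/test splits in which each gene is selected."""
    counts = Counter()
    for _ in range(n_splits):
        train_parts = []
        for c in (0, 1):
            idx = rng.permutation(np.flatnonzero(y == c))
            n_train = int(round((1 - test_fraction) * len(idx)))
            train_parts.append(idx[:n_train])  # remaining samples form the test sample
        train = np.concatenate(train_parts)
        counts.update(top_genes(X[train], y[train], k).tolist())
    return {gene: n / n_splits for gene, n in counts.items()}

# Synthetic data (hypothetical): 60 samples, 200 genes, genes 0 and 1 informative.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 200))
X[y == 1, :2] += 2.0

freqs = gene_frequencies(X, y, k=5, n_splits=200, test_fraction=1 / 3, rng=rng)

# Genes with relatively high frequency are the candidates for further study.
for gene, f in sorted(freqs.items(), key=lambda item: -item[1])[:5]:
    print(f"gene {gene}: selected in {f:.0%} of splits")
```

With a small k, a truly informative gene tends to be selected in nearly every split, while genes included by chance are spread thinly across many low frequencies.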