Algorithms Mol Biol. 2010 Jan 6;5:14.
doi: 10.1186/1748-7188-5-14.

ANMM4CBR: a case-based reasoning method for gene expression data classification


Bangpeng Yao et al. Algorithms Mol Biol.

Abstract

Background: Accurate classification of microarray data is critical for successful clinical diagnosis and treatment. However, the "curse of dimensionality" and noise in the data undermine the performance of many algorithms.

Method: To obtain a robust classifier, this article proposes a novel Additive Nonparametric Margin Maximum for Case-Based Reasoning (ANMM4CBR) method. ANMM4CBR uses case-based reasoning (CBR) for classification. CBR is a suitable paradigm for microarray analysis, where the rules that define the domain knowledge are difficult to obtain because usually only a small number of training samples are available. To select the most informative genes, we perform feature selection by additively optimizing a nonparametric margin maximum criterion, which is defined on the basis of gene pre-selection and sample clustering. This feature selection method is robust to noise in the data.
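The two stages described above (additive, margin-driven gene selection followed by case-based classification) can be illustrated with a minimal sketch. It rests on loudly stated assumptions: the margin used here is a Relief-style nearest-hit/nearest-miss margin chosen for illustration, not the paper's nonparametric margin maximum criterion; the gene pre-selection and sample-clustering steps are omitted; and the CBR stage is reduced to "retrieve the k most similar cases, reuse their majority label". The function names `margin`, `additive_select`, and `cbr_classify` and the parameter `k=3` are hypothetical. The boolean `flag` array mirrors the flag_m bookkeeping described in the Figure 2 caption.

```python
import numpy as np


def margin(X, y, features):
    """Hypothetical nonparametric margin of the samples in the subspace `features`.

    Assumption (not the paper's criterion): for every sample, take the distance to
    its nearest neighbor of another class (nearest miss) minus the distance to its
    nearest neighbor of the same class (nearest hit), and sum over all samples.
    Assumes each class has at least two samples.
    """
    Z = X[:, features]
    # Pairwise Euclidean distances between samples in the selected subspace.
    d = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbor
    same = y[:, None] == y[None, :]
    nearest_hit = np.where(same, d, np.inf).min(axis=1)
    nearest_miss = np.where(~same, d, np.inf).min(axis=1)
    return float((nearest_miss - nearest_hit).sum())


def additive_select(X, y, n_select):
    """Greedy forward selection: each round adds the gene that maximizes the margin."""
    n_features = X.shape[1]
    flag = np.zeros(n_features, dtype=bool)  # flag[m] is True once gene m is selected
    selected = []
    for _ in range(min(n_select, n_features)):
        # Only genes that have not been selected yet are candidates.
        candidates = [m for m in range(n_features) if not flag[m]]
        scores = [margin(X, y, selected + [m]) for m in candidates]
        best_m = candidates[int(np.argmax(scores))]
        flag[best_m] = True
        selected.append(best_m)
    return selected


def cbr_classify(case_X, case_y, query, features, k=3):
    """Hypothetical CBR step: retrieve the k most similar stored cases, reuse their majority label."""
    d = np.sqrt(((case_X[:, features] - query[features]) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(case_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 20)          # 20 synthetic samples per class
    X = rng.normal(size=(40, 50))      # 50 pure-noise "genes"
    X[:, 7] += 4.0 * y                 # only gene 7 carries class signal
    genes = additive_select(X, y, n_select=3)
    print("selected genes:", genes)
    print("predicted class of sample 25:", cbr_classify(X, y, X[25], genes))
```

The greedy loop is the simplest form of additive optimization: it never revisits a gene once its flag is set, which keeps the cost at one margin evaluation per remaining candidate per round. The synthetic data in the usage block is illustrative only and is not the simulated data set used in the article.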

Results: We demonstrate the effectiveness of our method on both simulated and real data sets. ANMM4CBR performs better than several state-of-the-art methods, such as support vector machines (SVM) and k-nearest neighbors (kNN), especially when the data contain a high level of noise.

Availability: The source code is attached as an additional file of this paper.


Figures

Figure 1
Framework of ANMM4CBR for microarray classification. ANMM4CBR contains two modules: ANMM for feature selection and CBR for classification. Both modules are well suited to microarray data, which are usually noisy and provide only a small number of training samples.
Figure 2
Additive optimization of the NMM criterion. $\mathrm{flag}_m$ indicates whether $\phi_m$ has been selected: it is true if $\phi_m$ has been selected and false otherwise.
Figure 3
Visualization of training samples using the top 3 features selected by different feature selection methods: (a) BW, (b) ANMM without feature pre-selection and sample clustering, (c) ANMM. Results for MRMR are not shown due to space limitations; Figure 4 shows that MRMR did not perform better than BW on these data. Different marker types represent samples in different classes, and misclassified samples are drawn with red edges. In (c), samples in different clusters are filled with different colors.
Figure 4
Boxplots of accuracy on simulated data. "Values" denotes accuracy. Each column corresponds to a different algorithm: 1 - BW+kNN; 2 - MRMR+kNN; 3 - BW+SVM; 4 - MRMR+SVM; 5 - LogitBoost; 6 - ANMM4CBR without feature pre-selection and sample clustering; 7 - ANMM4CBR.
