Comparative Study

BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S7.

doi: 10.1186/1471-2105-12-S1-S7.

Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery

Henry Han¹, Xiao-Li Li

PMID: 21342590
PMCID: PMC3044315
DOI: 10.1186/1471-2105-12-S1-S7

Henry Han et al. BMC Bioinformatics. 2011.

Affiliation

¹ Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.

Abstract

Background: Although high-throughput, microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they remain far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. High-dimensional, heterogeneous tumor profiles challenge current machine learning methodologies because they combine a small number of samples with a large, or even huge, number of variables (genes). This naturally calls for effective feature selection in microarray data classification.

Methods: We propose a novel feature selection method, multi-resolution independent component analysis (MICA), for large-scale gene expression data. By avoiding the global feature-selection mechanism of widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF), this method overcomes their weak points. In addition to demonstrating the effectiveness of MICA in meaningful biomarker discovery, we present MICA-based support vector machine (MICA-SVM) and linear discriminant analysis (MICA-LDA) classifiers that attain high-performance classification in low-dimensional spaces.
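The multi-resolution idea behind MICA can be illustrated with a small sketch. This is not the authors' implementation: it assumes an orthonormal Haar wavelet (the figures use '*db8*'), a profile whose length is divisible by 2^levels, and a hypothetical suppression rule in which the coarsest approximation coefficients (the global trend) are zeroed and only detail levels 1 through τ (`tau`) are kept before reconstruction. The names `haar_dwt`, `haar_idwt`, and `suppress` are illustrative only. In a MICA-style pipeline, the filtered profiles would then be decomposed by ICA and the resulting low-dimensional representation passed to a classifier such as an SVM or LDA.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT.

    Returns (approx, details); details[0] holds the finest level (j = 1)
    and details[-1] the coarsest (j = levels).
    """
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("levels must be >= 1 and len(signal) divisible by 2**levels")
    approx = [float(v) for v in signal]
    details = []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        # Detail = scaled pairwise difference; approximation = scaled pairwise sum.
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding from the coarsest level to the finest."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(signal, detail):
            rebuilt.append((a + d) / SQRT2)
            rebuilt.append((a - d) / SQRT2)
        signal = rebuilt
    return signal


def suppress(signal, levels, tau):
    """Hypothetical MICA-style filter (an assumption, not the published rule):
    zero the coarsest approximation and every detail level coarser than tau,
    keep detail levels 1..tau, then reconstruct."""
    if not 1 <= tau <= levels:
        raise ValueError("tau must lie in 1..levels")
    approx, details = haar_dwt(signal, levels)
    approx = [0.0] * len(approx)  # suppress the global (coarsest) trend
    kept = [d if j <= tau else [0.0] * len(d)
            for j, d in enumerate(details, start=1)]
    return haar_idwt(approx, kept)


profile = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
print(suppress(profile, levels=3, tau=3))
print(suppress(profile, levels=3, tau=1))
```

With `tau` equal to `levels` and a profile of length exactly `2**levels`, only the single approximation coefficient is removed, so the output is the profile minus its mean; smaller `tau` values discard progressively more of the coarse structure and keep only the finest local variation.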

Results: We have demonstrated the superiority and stability of our algorithms through comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional, heterogeneous profiles under cross-validation. Our classification algorithms, especially MICA-SVM, not only achieve clinical or near-clinical sensitivities and specificities but also show stronger performance stability than their peers. Software implementing the major algorithm, and the data sets on which this paper focuses, are freely available at https://sites.google.com/site/heyaumapbc2011/.
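The evaluation protocol behind these comparisons (the figure captions cite 100 trials of 50% holdout cross-validation) and the sensitivity and specificity measures can be sketched as below. This is a generic illustration under stated assumptions: a nearest-centroid classifier stands in for MICA-SVM, each class is split in half (stratified) on every trial, label `1` is treated as the positive (cancer) class, and the names `sens_spec`, `nearest_centroid`, and `holdout_trials` are hypothetical.

```python
import random
from statistics import mean


def sens_spec(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def nearest_centroid(train_x, train_y):
    """Stand-in classifier: predict the class whose mean vector is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict


def holdout_trials(xs, ys, fit, trials=100, seed=0):
    """Repeated stratified 50% holdout; returns mean [accuracy, sensitivity, specificity]."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        train_idx, test_idx = [], []
        for label in sorted(set(ys)):
            idx = [i for i, y in enumerate(ys) if y == label]
            rng.shuffle(idx)
            half = len(idx) // 2  # half of each class trains, the rest tests
            train_idx += idx[:half]
            test_idx += idx[half:]
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        y_true = [ys[i] for i in test_idx]
        y_pred = [predict(xs[i]) for i in test_idx]
        acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
        results.append((acc, *sens_spec(y_true, y_pred)))
    return [mean(col) for col in zip(*results)]


# Toy two-class data (0 = control, 1 = cancer); purely illustrative.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0],
      [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.2]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(holdout_trials(xs, ys, nearest_centroid))
```

Any function that takes training data and returns a predictor can replace `nearest_centroid`; because the random generator is used only for shuffling, a fixed `seed` reproduces the same partitions, so different classifiers are compared on identical splits.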

Conclusions: This work suggests a new direction for bringing microarray technologies into clinical routine: building a high-performance classifier that attains clinical-level sensitivities and specificities by treating an input profile as a 'profile-biomarker'. The multi-resolution data analysis underlying MICA, which suppresses redundant global features and extracts effective local features, also has a positive impact on large-scale 'omics' data mining.

Figures

**Figure 1**
**Meta-samples constructed from MICA for six original samples ('*breast_1 data*')**. Meta-samples constructed from multi-resolution independent component analysis for six original samples (three controls and three cancers) in the breast_1 data at three level thresholds, *τ* = 3, 4, 6, with the '*db8*' wavelet. The low-dimensional meta-samples clearly separate the two types of samples in the visualization.

**Figure 2**
**Distributions of the classification rates of four algorithms on five profiles.** Distributions of the classification rates of four algorithms (MICA-SVM, ICA-SVM, PCA-SVM, and SVM) on five profiles over 100 trials of 50% holdout cross-validation (HOCV).

**Figure 3**
**Comparisons of the performance of five algorithms on the six datasets under k-fold cross-validation**. Comparisons of the classification performance of five algorithms on the six datasets under k-fold cross-validation: 'S' (*stroma*), 'B1' (*breast_1*), 'P' (*prostate*), 'G' (*glioma*), 'H' (*HCC*), and 'B2' (*breast_2*). The MICA-SVM algorithm outperformed the other four algorithms.

**Figure 4**
**Comparisons of the distributions of algorithm classification rates**. Comparisons of the distributions of classification rates of three algorithms on four profiles over 100 trials of 50% HOCV. LDA classification rates below 50% on the glioma data are not shown in the visualization.

**Figure 5**
**Optimal threshold selection**. Average classification rates and corresponding condition numbers at 11 level thresholds on four profiles over 100 trials of 50% HOCV.

**Figure 6**
**Biomarker visualization in the stroma data**. Visualization of the 47 samples in the stroma data using three biomarkers.
