Comparative Study
BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S7.
doi: 10.1186/1471-2105-12-S1-S7.

Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery


Henry Han et al. BMC Bioinformatics. 2011.

Abstract

Background: Although high-throughput microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they are still far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. High-dimensional, heterogeneous tumor profiles challenge current machine learning methods because they contain a small number of samples and a large, sometimes huge, number of variables (genes). This calls for effective feature selection in microarray data classification.

Methods: We propose a novel feature selection method for large-scale gene expression data: multi-resolution independent component analysis (MICA). The method overcomes the weaknesses of widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of MICA in meaningful biomarker discovery, we present MICA-based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA) to attain high-performance classification in low-dimensional spaces.
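The abstract does not spell out the MICA transform itself. As a rough illustration of the multi-resolution idea only, the sketch below decomposes each expression profile with a multi-level wavelet transform, zeroes the coarse-level coefficients it treats as "global" features, and reconstructs the profile from the remaining fine-level ("local") detail. Several choices are our assumptions, not the paper's: a Haar wavelet stands in for the ‘db8’ wavelet mentioned in Figure 1, the rule for which levels to suppress is hypothetical (the parameter `tau` only loosely mirrors the level threshold τ), and the helper names `haar_dwt`, `haar_idwt`, `suppress_coarse_levels`, and `filter_profiles` are illustrative. In the method as described, an ICA step on the filtered data would follow to build meta-samples; that step and the SVM/LDA classifiers are not shown.

```python
import numpy as np


def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT.

    Returns (approx, details); details[0] is the finest level (level 1) and
    details[-1] the coarsest. len(x) must be divisible by 2**levels.
    """
    approx = np.asarray(x, dtype=float)
    if approx.size % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details


def haar_idwt(approx, details):
    """Inverse of haar_dwt: rebuild the signal from the coarsest level down."""
    x = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * x.size)
        out[0::2] = (x + d) / np.sqrt(2.0)
        out[1::2] = (x - d) / np.sqrt(2.0)
        x = out
    return x


def suppress_coarse_levels(x, levels, tau):
    """Hypothetical filter: keep detail levels 1..tau, zero everything coarser.

    ASSUMPTION: the approximation coefficients and the detail levels above
    tau are treated as 'global' features and set to zero before rebuilding.
    """
    approx, details = haar_dwt(x, levels)
    kept = [d if j < tau else np.zeros_like(d) for j, d in enumerate(details)]
    return haar_idwt(np.zeros_like(approx), kept)


def filter_profiles(X, levels, tau):
    """Apply the filter to every sample (row) of a samples-by-genes matrix."""
    return np.vstack([suppress_coarse_levels(row, levels, tau) for row in X])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 64))  # toy data: 6 samples x 64 genes
    X_local = filter_profiles(X, levels=4, tau=2)
    # X_local would next be passed to an ICA routine to form meta-samples.
    print(X_local.shape)
```

Because the Haar transform is orthonormal, zeroing the approximation at the coarsest level removes each block's mean, so a constant (purely "global") profile is filtered to all zeros.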

Results: We have demonstrated the superiority and stability of our algorithms through comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross validation. Our classification algorithms, especially MICA-SVM, not only achieve clinical or near-clinical level sensitivities and specificities but also show stronger performance stability than their peers. Software implementing the main algorithm, together with the data sets on which this paper focuses, is freely available at https://sites.google.com/site/heyaumapbc2011/.
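Sensitivity and specificity, the two figures of merit cited above, come from a binary confusion matrix: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal plain-Python sketch, assuming cancer samples are labeled 1 and controls 0 (the labeling convention and the function name are hypothetical):

```python
import math


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    A ratio whose denominator is zero (e.g. no positive samples) is NaN.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if tp + fn else math.nan
    spec = tn / (tn + fp) if tn + fp else math.nan
    return sens, spec


# Four cancers (1) and four controls (0); one cancer is missed.
print(sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0],
                              [1, 1, 1, 0, 0, 0, 0, 0]))  # (0.75, 1.0)
```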

Conclusions: This work suggests a new direction for moving microarray technologies into clinical routine: building a high-performance classifier that attains clinical-level sensitivities and specificities by treating an input profile as a 'profile-biomarker'. Suppressing redundant global features and extracting effective local features through multi-resolution data analysis also has a positive impact on large-scale 'omics' data mining.


Figures

Figure 1
Meta-samples constructed from MICA for six original samples (breast_1 data). Meta-samples were constructed by multi-resolution independent component analysis for six original samples (three controls and three cancers) in the breast_1 data at three level thresholds, τ = 3, 4, and 6, with the wavelet ‘db8’. The low-dimensional meta-samples clearly separate the two sample types in visualization.
Figure 2
Distributions of the classification rates of four algorithms on five profiles. Distributions of the classification rates of four algorithms (MICA-SVM, ICA-SVM, PCA-SVM, and SVM) on five profiles over 100 trials of 50% holdout cross validation (HOCV).
Figure 3
Comparison of the performance of five algorithms on the six datasets under k-fold cross validation. Comparison of the classification performance of five algorithms on the six datasets under k-fold cross validation: ‘S’ (stroma), ‘B1’ (breast_1), ‘P’ (prostate), ‘G’ (glioma), ‘H’ (HCC), and ‘B2’ (breast_2). The MICA-SVM algorithm showed a clear performance lead over the others.
Figure 4
Comparison of the distributions of algorithm classification rates. Comparison of the distributions of classification rates of three algorithms on four profiles over 100 trials of 50% HOCV. LDA classification rates below 50% on the glioma data are not shown.
Figure 5
Optimal threshold selection. Average classification rates and corresponding condition numbers at 11 level thresholds on four profiles over 100 trials of 50% HOCV.
Figure 6
Biomarker visualization in the stroma data. Visualization of 47 samples in the stroma data using three biomarkers.
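Figures 2, 4, and 5 report results over 100 trials of 50% holdout cross validation. One way to sketch that protocol, assuming a random (non-stratified) half/half split in each trial, is shown below. `nearest_centroid_fit_predict` is a hypothetical stand-in classifier, not the SVM or LDA models used in the paper; any `fit_predict(X_train, y_train, X_test)` callable can be swapped in.

```python
import numpy as np


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier (NOT an SVM): assign each test sample to the nearest class mean."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test sample to every centroid.
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


def repeated_holdout(X, y, fit_predict, trials=100, test_fraction=0.5, seed=0):
    """Classification rate of each trial under repeated random holdout splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(n * test_fraction))
    rates = np.empty(trials)
    for t in range(trials):
        perm = rng.permutation(n)  # fresh random split per trial (not stratified)
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        y_hat = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        rates[t] = np.mean(y_hat == y[test_idx])
    return rates


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(10, 1, (20, 5))])  # toy data
    y = np.array([0] * 20 + [1] * 20)
    rates = repeated_holdout(X, y, nearest_centroid_fit_predict)
    print(rates.mean(), rates.std())
```

Returning the full vector of per-trial rates, rather than only their mean, is what makes distribution plots like those in Figures 2 and 4 possible.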
