Exploring the within- and between-class correlation distributions for tumor classification
- PMID: 20339085
- PMCID: PMC2872377
- DOI: 10.1073/pnas.0910140107
Abstract
To many biomedical researchers, effective tumor classification methods such as the support vector machine often appear to be a black box, not only because the procedures are complex but also because the required specifications, such as the choice of a kernel function, lack clear guidance, either mathematical or biological. As commonly observed, samples within the same tumor class tend to be more similar in gene expression than samples from different tumor classes. But can this widely accepted observation lead to a useful procedure for classification and prediction? To address this question, we first conceived a statistical framework and derived general conditions that serve as the theoretical foundation for this empirical observation. We then constructed a classification procedure that fully utilizes the information obtained by comparing the distribution of within-class correlations with that of between-class correlations via the Kullback-Leibler divergence. We compared our approach with many machine-learning techniques by applying them to 22 binary and multiclass gene-expression datasets involving human cancers. The results showed that our method performed as efficiently as the support vector machine and naïve Bayes and outperformed other learning methods (decision trees, linear discriminant analysis, and k-nearest neighbor). In addition, we conducted a simulation study and showed that our method would be more effective when newly arriving samples are subject to the often-encountered problems of baseline shift or increased noise level. Our method can be extended to general classification problems in which only the similarity scores between samples are available.
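The core idea in the abstract, comparing within-class and between-class correlation distributions via the Kullback-Leibler divergence, can be illustrated with a minimal Python sketch. This is not the authors' published procedure. The histogram binning, the add-one smoothing, the rule of summing two divergences, and all function names are assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def histogram_probs(values, bins):
    """Histogram probabilities with add-one smoothing (assumption: avoids log(0))."""
    counts, _ = np.histogram(np.clip(values, -1.0, 1.0), bins=bins)
    probs = counts + 1.0
    return probs / probs.sum()

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def within_between(X, y, bins):
    """Distributions of sample-sample Pearson correlations, split by class agreement."""
    R = np.corrcoef(X)                      # rows of X are samples
    iu = np.triu_indices(len(y), k=1)       # each unordered pair once
    same = (y[:, None] == y[None, :])[iu]
    r = R[iu]
    return histogram_probs(r[same], bins), histogram_probs(r[~same], bins)

def classify(x_new, X, y, bins):
    """Assign x_new to the class whose hypothesis best matches the training distributions."""
    p_within, p_between = within_between(X, y, bins)
    r_new = np.array([np.corrcoef(x_new, row)[0, 1] for row in X])
    scores = {}
    for label in np.unique(y):
        in_class = y == label
        # If x_new belongs to `label`, its correlations with that class should
        # resemble the within-class distribution and the rest the between-class one.
        d_within = kl_divergence(histogram_probs(r_new[in_class], bins), p_within)
        d_between = kl_divergence(histogram_probs(r_new[~in_class], bins), p_between)
        scores[int(label)] = d_within + d_between
    return min(scores, key=scores.get), scores

# Synthetic example: two classes, each with its own mean expression profile.
rng = np.random.default_rng(0)
n_genes, n_per_class = 200, 20
profiles = rng.normal(size=(2, n_genes))
y = np.repeat([0, 1], n_per_class)
X = profiles[y] + 0.5 * rng.normal(size=(2 * n_per_class, n_genes))
bins = np.linspace(-1.0, 1.0, 21)

x_new = profiles[1] + 0.5 * rng.normal(size=n_genes)  # drawn from class 1
predicted, scores = classify(x_new, X, y, bins)
print(predicted, scores)
```

Because Pearson correlation is unchanged by adding a constant to a sample or multiplying it by a positive factor, a rule built on correlations is unaffected by a uniform baseline shift in a new sample, which is in line with the robustness the abstract reports.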
Conflict of interest statement
The authors declare no conflict of interest.
Similar articles
- Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data. BMC Bioinformatics. 2007 Feb 28;8:67. doi: 10.1186/1471-2105-8-67. PMID: 17328811. Free PMC article.
- Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics. 2005 Oct 15;21(20):3896-904. doi: 10.1093/bioinformatics/bti631. Epub 2005 Aug 16. PMID: 16105897. Free PMC article.
- Multiclass molecular cancer classification by kernel subspace methods with effective kernel parameter selection. J Bioinform Comput Biol. 2005 Oct;3(5):1071-88. doi: 10.1142/s0219720005001491. PMID: 16278948.
- A regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data. BMC Bioinformatics. 2006 Mar 20;7 Suppl 1(Suppl 1):S11. doi: 10.1186/1471-2105-7-S1-S11. PMID: 16723004. Free PMC article.
- A primer on learning in Bayesian networks for computational biology. PLoS Comput Biol. 2007 Aug;3(8):e129. doi: 10.1371/journal.pcbi.0030129. PMID: 17784779. Free PMC article. Review.
Cited by
- Comparing biological information contained in mRNA and non-coding RNAs for classification of lung cancer patients. BMC Cancer. 2019 Dec 3;19(1):1176. doi: 10.1186/s12885-019-6338-1. PMID: 31796020. Free PMC article.
- Knowledge discovery by accuracy maximization. Proc Natl Acad Sci U S A. 2014 Apr 8;111(14):5117-22. doi: 10.1073/pnas.1220873111. Epub 2014 Mar 24. PMID: 24706821. Free PMC article.
- Differentiating the learning styles of college students in different disciplines in a college English blended learning setting. PLoS One. 2021 May 20;16(5):e0251545. doi: 10.1371/journal.pone.0251545. eCollection 2021. PMID: 34014963. Free PMC article.
- 100% classification accuracy considered harmful: the normalized information transfer factor explains the accuracy paradox. PLoS One. 2014 Jan 10;9(1):e84217. doi: 10.1371/journal.pone.0084217. eCollection 2014. PMID: 24427282. Free PMC article.
- Robust classification using average correlations as features (ACF). BMC Bioinformatics. 2023 Mar 20;24(1):101. doi: 10.1186/s12859-023-05224-0. PMID: 36941542. Free PMC article.