A feature selection method for classification within functional genomics experiments based on the proportional overlapping score
- PMID: 25113817
- PMCID: PMC4141116
- DOI: 10.1186/1471-2105-15-274
Abstract
Background: Microarray technology, like other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier could be improved by performing the classification using only a selected subset of discriminative genes. We propose a statistical method for selecting genes based on an analysis of how expression data overlap across classes. The method yields a novel measure of a feature's relevance to a classification task, called the proportional overlapping score (POS).
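The idea of analyzing expression overlap across classes can be illustrated with a minimal two-class sketch. It is not the published POS formula. The per-class "core interval" (Tukey fences at 1.5 IQR, clipped to the observed values), the helper names `core_interval` and `overlap_mask`, and the use of the fraction of samples inside the overlap region as a relevance measure are all assumptions made for illustration.

```python
import statistics

def core_interval(values, k=1.5):
    """Robust range of one class (assumption): Tukey fences clipped to the observed data."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo = max(min(values), q1 - k * iqr)
    hi = min(max(values), q3 + k * iqr)
    return lo, hi

def overlap_mask(x, y):
    """Return (mask, overlap_fraction) for one gene (hypothetical helper).

    x: expression values of one gene across all samples.
    y: class labels, 0 or 1.
    mask[i] == 1 when sample i lies outside the overlap of the two class core intervals.
    """
    lo0, hi0 = core_interval([v for v, c in zip(x, y) if c == 0])
    lo1, hi1 = core_interval([v for v, c in zip(x, y) if c == 1])
    lo, hi = max(lo0, lo1), min(hi0, hi1)  # overlap region; empty when lo > hi
    mask = [0 if lo <= v <= hi else 1 for v in x]
    overlap_fraction = mask.count(0) / len(x)
    return mask, overlap_fraction

labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(overlap_mask([1, 2, 3, 4, 10, 11, 12, 13], labels))  # separated classes
print(overlap_mask([1, 2, 3, 4, 3, 4, 5, 6], labels))      # partially overlapping classes
```

Under this sketch, a gene whose class ranges do not overlap gets an overlap fraction of 0 (the most discriminative case), while a gene whose values fall mostly inside the shared range scores close to 1.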
Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. Classification error rates computed with the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves better performance than the other methods.
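Error rates for a selected gene subset can be estimated with a classifier such as k Nearest Neighbor. The sketch below assumes leave-one-out evaluation, Euclidean distance and k = 3 by default; the abstract does not state the validation protocol used, and `knn_loo_error` is a hypothetical helper rather than the authors' code.

```python
import math
from collections import Counter

def knn_loo_error(X, y, genes, k=3):
    """Leave-one-out k-NN error rate using only the selected gene columns.

    X: list of samples, each a list of expression values.
    y: class labels, one per sample.
    genes: column indices of the selected genes.
    """
    errors = 0
    for i, xi in enumerate(X):
        # Euclidean distance from sample i to every other sample, restricted to `genes`
        dists = []
        for j, xj in enumerate(X):
            if j != i:
                d = math.sqrt(sum((xi[g] - xj[g]) ** 2 for g in genes))
                dists.append((d, y[j]))
        dists.sort(key=lambda t: t[0])
        # majority vote among the k nearest neighbours
        votes = Counter(label for _, label in dists[:k])
        predicted = votes.most_common(1)[0][0]
        errors += predicted != y[i]
    return errors / len(X)

X = [[0, 5], [0.1, -3], [0.2, 8], [5, 0], [5.1, 9], [5.2, -7]]
y = [0, 0, 0, 1, 1, 1]
print(knn_loo_error(X, y, genes=[0], k=1))  # informative gene
print(knn_loo_error(X, y, genes=[1], k=1))  # noisy gene
```

Comparing error rates obtained on the subsets chosen by different selection methods, with the same classifier and protocol, is what makes such a comparison meaningful.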
Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the overlap of expression values across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that minimizes the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.
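One way that per-gene masks and a gene score might be combined into a selected subset is a greedy coverage step followed by ranking. This is an illustrative assumption, not the paper's procedure: masks are taken as 0/1 lists where 1 marks a sample lying outside the class-overlap region, scores are assumed to be lower-is-better, and `select_genes` is a hypothetical helper.

```python
def select_genes(masks, scores, r):
    """Pick r genes: greedy mask coverage first, then the best remaining scores.

    masks: dict gene -> list of 0/1 (1 = sample lies in a non-overlapping region).
    scores: dict gene -> relevance score, lower is better (assumption).
    """
    n = len(next(iter(masks.values())))
    covered = [False] * n
    selected = []
    while len(selected) < r:
        best, best_gain = None, 0
        for g, m in masks.items():
            if g in selected:
                continue
            # number of not-yet-covered samples this gene's mask would add
            gain = sum(1 for i in range(n) if m[i] and not covered[i])
            # prefer larger gain; break ties by lower (better) score
            if gain > best_gain or (gain == best_gain and gain > 0 and scores[g] < scores[best]):
                best, best_gain = g, gain
        if best is None:
            break  # no remaining gene covers any new sample
        selected.append(best)
        covered = [c or bool(masks[best][i]) for i, c in enumerate(covered)]
    # fill any remaining slots in order of score
    rest = sorted((g for g in masks if g not in selected), key=lambda g: scores[g])
    selected.extend(rest[: r - len(selected)])
    return selected

masks = {"g1": [1, 1, 0, 0], "g2": [0, 0, 1, 1], "g3": [1, 1, 1, 0]}
scores = {"g1": 0.5, "g2": 0.5, "g3": 0.25}
print(select_genes(masks, scores, r=3))
```

The coverage step favours genes that separate samples the already-selected genes cannot, which limits redundancy; the score-based fill then completes the subset when coverage stops improving.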