BMC Genomics. 2011 Dec 23;12 Suppl 5(Suppl 5):S1.
doi: 10.1186/1471-2164-12-S5-S1. Epub 2011 Dec 23.

Gene selection and classification for cancer microarray data based on machine learning and similarity measures

Qingzhong Liu et al. BMC Genomics.

Abstract

Background: Microarray data have a high number of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes that provide reliable prediction of disease status, and how to determine the final gene set that is best for classification. Because genetic markers are associated with one another, the resulting information redundancy can be exploited to reduce the cost of classification in both time and money.

Results: To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others.

Conclusions: On average, using popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed the other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to a random strategy, and that Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.
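
The abstract says only that Recursive Feature Addition combines supervised learning with statistical similarity measures; it does not give the exact procedure. The following is a minimal sketch of one way such a greedy addition loop could look, not the authors' implementation. It assumes that each candidate gene is scored by the training accuracy of a simple nearest-class-mean classifier (a stand-in for the learning machines named above), and that ties in accuracy are broken by choosing the gene with the lowest mean absolute Pearson correlation to the genes already selected. The function names `pearson`, `nearest_mean_accuracy` and `recursive_feature_addition` and the `tol` parameter are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def nearest_mean_accuracy(X, y, genes):
    """Training accuracy of a nearest-class-mean classifier restricted to `genes`.

    Assumption: this simple classifier stands in for whichever learning
    machine is used to score candidate gene sets.
    """
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [mean(r[g] for r in rows) for g in genes]
    correct = 0
    for row, label in zip(X, y):
        point = [row[g] for g in genes]
        pred = min(
            classes,
            key=lambda c: sum((p - m) ** 2 for p, m in zip(point, centroids[c])),
        )
        correct += pred == label
    return correct / len(y)


def recursive_feature_addition(X, y, n_genes, tol=1e-9):
    """Greedily add genes: best accuracy first, least redundant among ties.

    Assumption: redundancy is measured as the mean absolute Pearson
    correlation with the genes already selected.
    """
    n_features = len(X[0])
    columns = [[row[g] for row in X] for g in range(n_features)]
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < n_genes:
        # Supervised step: score every candidate added to the current set.
        scores = {g: nearest_mean_accuracy(X, y, selected + [g]) for g in remaining}
        best = max(scores.values())
        tied = [g for g in remaining if scores[g] >= best - tol]
        if selected:
            # Similarity step: prefer the candidate least correlated with the set.
            chosen = min(
                tied,
                key=lambda g: mean(abs(pearson(columns[g], columns[s])) for s in selected),
            )
        else:
            chosen = tied[0]
        selected.append(chosen)
        remaining.remove(chosen)
    return selected


# Toy data: 6 samples x 3 genes. Gene 1 duplicates gene 0; gene 2 is small noise.
X = [
    [0, 0, 0.1],
    [1, 1, -0.1],
    [2, 2, 0.0],
    [10, 10, -0.1],
    [11, 11, 0.1],
    [12, 12, 0.0],
]
y = [0, 0, 0, 1, 1, 1]
selected_genes = recursive_feature_addition(X, y, n_genes=2)
```

In this toy example, gene 0 is picked first because it separates the two classes on its own. In the second round both remaining genes yield the same training accuracy, so the similarity step decides: gene 1 is an exact copy of gene 0 and is passed over in favour of the uncorrelated gene 2.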

Figures

Figure 1
The average testing accuracies of different gene selection methods on six benchmark data sets, using the classifiers NBC, NMSC, SVM and RF. The x-axis gives the feature dimension and the y-axis gives the testing accuracy.
Figure 2
Boxplots of testing accuracies of LPPO with four gene selection methods using two different classifiers (NBC, NMSC), compared to varSelRF, for six data sets. RF is the final classifier. On all six data sets, varSelRF achieves lower accuracies than our proposed feature selection and optimization algorithm with the same RF classifier.
Figure 3
A sketch of Lagging Prediction Peephole Optimization on the Prostate data set.
