Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Comparative Study
. 2008 Jul 22:9:319.
doi: 10.1186/1471-2105-9-319.

A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification

Affiliations
Comparative Study

A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification

Alexander Statnikov et al. BMC Bioinformatics. .

Abstract

Background: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain.

Results: In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underlines the importance of sound research design in benchmarking and comparison of bioinformatics algorithms.

Conclusion: We found that both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines both in the settings when no gene selection is performed and when several popular gene selection methods are used.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Classification performance of SVMs and RFs without gene selection. The performance is estimated using area under ROC curve (AUC) for binary classification tasks and relative classifier information (RCI) for multicategory tasks.
Figure 2
Figure 2
Classification performance of SVMs and RFs with gene selection. The performance is estimated using area under ROC curve (AUC) for binary classification tasks and relative classifier information (RCI) for multicategory tasks.

References

    1. Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S. A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics. 2005;21:631–643. - PubMed
    1. Breiman L. Random forests. Machine Learning. 2001;45:5–32.
    1. Wu B, Abbott T, Fishman D, McMurray W, Mor G, Stone K, Ward D, Williams K, Zhao H. Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data. Bioinformatics. 2003;19:1636–1643. - PubMed
    1. Lee JW, Lee JB, Park M, Song SH. An extensive comparison of recent classification tools applied to microarray data. Computational Statistics & Data Analysis. 2005;48:869–885.
    1. Diaz-Uriarte R, Alvarez de Andres S. Gene selection and classification of microarray data using random forest. BMC Bioinformatics. 2006;7:3. - PMC - PubMed

Publication types

Substances