Comparative Study
. 2014 Apr 24;9(4):e94137.
doi: 10.1371/journal.pone.0094137. eCollection 2014.

A systematic comparison of supervised classifiers


Diego Raphael Amancio et al. PLoS One.

Abstract

Pattern recognition has been employed in a myriad of industrial, commercial and academic applications, and many techniques have been devised to tackle this diversity. Despite the long tradition of pattern recognition research, no single technique yields the best classification in all scenarios; therefore, as many techniques as possible should be considered in high-accuracy applications. Typical related works either focus on the performance of a given algorithm or compare various classification methods. On many occasions, however, researchers who are not experts in machine learning must deal with practical classification tasks without in-depth knowledge of the underlying parameters. The appropriate choice of classifiers and parameters in such practical circumstances is a long-standing problem and is one of the subjects of the current paper. We carried out a performance study of nine well-known classifiers implemented in the Weka framework and compared the influence of parameter configurations on accuracy. The default parameter configuration in Weka was found to provide near-optimal performance in most cases, excluding methods such as the support vector machine (SVM). In addition, the k-nearest neighbor (kNN) method frequently achieved the best accuracy. Under certain conditions, the accuracy of the SVM could be improved by more than 20% relative to its default parameter configuration.
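The paper's experiments use Weka's Java implementations; the general workflow of scoring several classifiers under fixed default settings on the same data can nevertheless be sketched in a small self-contained Python script. The toy dataset and the two classifiers below (a from-scratch kNN and a nearest-centroid rule) are illustrative stand-ins, not the paper's actual setup:

```python
import math
import random

random.seed(42)

def make_dataset(n_per_class=50, sep=3.0):
    """Two Gaussian classes in 2 features, separated along the first axis."""
    data = []
    for label in (0, 1):
        cx = label * sep
        for _ in range(n_per_class):
            data.append(((random.gauss(cx, 1.0), random.gauss(0.0, 1.0)), label))
    random.shuffle(data)
    return data

def knn_predict(train, x, k=1):
    """Plain k-nearest-neighbour majority vote with Euclidean distance."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def centroid_predict(train, x):
    """Nearest-centroid classifier: assign to the closest class mean."""
    by_class = {}
    for point, label in train:
        by_class.setdefault(label, []).append(point)
    means = {lab: tuple(sum(c) / len(pts) for c in zip(*pts))
             for lab, pts in by_class.items()}
    return min(means, key=lambda lab: math.dist(means[lab], x))

def accuracy(predict, train, test):
    """Fraction of test points each classifier labels correctly."""
    return sum(predict(train, x) == y for x, y in test) / len(test)

data = make_dataset()
train, test = data[:70], data[70:]
print("kNN (k=1):        %.2f" % accuracy(knn_predict, train, test))
print("nearest centroid: %.2f" % accuracy(centroid_predict, train, test))
```

Running every candidate classifier through the same `accuracy` helper, as here, is the basic shape of the comparison the paper performs at much larger scale.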


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Example of artificial dataset for 10 classes and 2 features (DB2F).
It is possible to note that different classes have different correlations between the features. The separation between the classes is (a) formula image, (b) formula image and (c) formula image.
Figure 2. Behavior of the accuracy rate as the number of features increases.
As more attributes are taken into account, the kNN becomes significantly better than the other pattern recognition techniques.
Figure 3. One-dimensional analysis performed with the parameter formula image of the kNN classifier.
Panel (a) illustrates the default value of the parameter (formula image) with a red vertical dashed line. The accuracy rate associated with the default values of the parameters is denoted by formula image, and the best accuracy rate observed in the neighborhood of the default value of formula image is represented as formula image. The difference between these two quantities is represented by formula image. Panel (b) shows how the accuracy rate varies with formula image in DB2F (each line represents the behavior of a particular dataset in DB2F). Finally, panel (c) displays the distribution of formula image in DB2F.
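The one-dimensional analysis described for Figure 3 amounts to sweeping a single parameter around its default, recording the default accuracy, the best accuracy in the neighborhood, and their difference. A minimal Python sketch with a from-scratch kNN on an invented dataset follows; the default k = 1 and the range of k explored are assumptions for illustration, not the paper's exact protocol:

```python
import math
import random

random.seed(1)

# Invented dataset: two noisy Gaussian classes in 2 features.
def sample(label, n=60):
    cx = 1.5 * label
    return [((random.gauss(cx, 1.0), random.gauss(0.0, 1.0)), label)
            for _ in range(n)]

data = sample(0) + sample(1)
random.shuffle(data)
train, test = data[:80], data[80:]

def knn_accuracy(k):
    """Accuracy of a plain majority-vote kNN on the held-out split."""
    hits = 0
    for x, y in test:
        nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
        votes = [lab for _, lab in nearest]
        hits += (max(set(votes), key=votes.count) == y)
    return hits / len(test)

K_DEFAULT = 1                  # assumed default value of the parameter k
NEIGHBOURHOOD = range(1, 16)   # values of k explored around the default

p_def = knn_accuracy(K_DEFAULT)                       # accuracy at the default
p_best = max(knn_accuracy(k) for k in NEIGHBOURHOOD)  # best accuracy nearby
delta = p_best - p_def   # the default-vs-best gap the figure's panels track
print(f"default={p_def:.2f}  best={p_best:.2f}  delta={delta:.2f}")
```

Because the default value is itself inside the explored neighborhood, the gap is never negative; the distribution of this gap across datasets is what panel (c) summarizes.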
Figure 4. Example of the random parameters analysis.
We use one of the artificial datasets and the kNN classifier. (a) By randomly drawing 1,000 different parameter combinations of kNN we construct a histogram of accuracy rates. The red dashed line indicates the performance achieved with default parameters. (b) The accuracy rate for the default parameters is subtracted from the values obtained for the random drawing. The normalized area of the histogram above zero indicates how easy it is to improve the performance with a random tuning of parameters.
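The random-parameters analysis of Figure 4 boils down to drawing many parameter combinations, scoring each, and measuring the fraction of draws that beat the default configuration. A minimal sketch, assuming a from-scratch kNN with two invented "parameters" (k and inverse-distance weighting) on a toy dataset:

```python
import math
import random

random.seed(7)

# Toy dataset: two Gaussian classes in 2 features.
def sample(label, n=60):
    return [((random.gauss(1.2 * label, 1.0), random.gauss(0.0, 1.0)), label)
            for _ in range(n)]

data = sample(0) + sample(1)
random.shuffle(data)
train, test = data[:80], data[80:]

def knn_accuracy(k, weighted):
    """kNN with two tunable 'parameters': k and inverse-distance weighting."""
    hits = 0
    for x, y in test:
        nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
        score = {}
        for point, lab in nearest:
            w = 1.0 / (math.dist(point, x) + 1e-9) if weighted else 1.0
            score[lab] = score.get(lab, 0.0) + w
        hits += (max(score, key=score.get) == y)
    return hits / len(test)

DEFAULT = (1, False)            # stand-in for the default configuration
p_default = knn_accuracy(*DEFAULT)

# Draw random parameter combinations and record accuracy differences.
diffs = []
for _ in range(200):
    k = random.randint(1, 25)
    weighted = random.random() < 0.5
    diffs.append(knn_accuracy(k, weighted) - p_default)

# Fraction of random draws beating the default: the normalized area above zero.
frac_better = sum(d > 0 for d in diffs) / len(diffs)
print(f"default accuracy={p_default:.2f}  fraction above default={frac_better:.2f}")
```

A value of `frac_better` close to 1 corresponds to the situation the paper reports for kNN and SVM, where most random configurations outperform the defaults.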
Figure 5. Distribution of the difference of accuracy rates observed between the random and default configuration of parameters.
(a) kNN; (b) C4.5; (c) Multilayer Perceptron; (d) Logistic; (e) Random Forest; (f) Simple CART; (g) SVM. Note that, in the case of kNN and SVM classifiers, most of the random configurations yield better results than the default case.

