2014 Jan 10;9(1):e84217.
doi: 10.1371/journal.pone.0084217. eCollection 2014.

100% classification accuracy considered harmful: the normalized information transfer factor explains the accuracy paradox

Francisco J Valverde-Albacete et al. PLoS One.

Abstract

Accuracy, the most widely used measure of classifier performance, suffers from a paradox: predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. Despite optimizing classification error rate, high-accuracy models may fail to capture crucial information transfer in the classification task. We present evidence of this behavior by means of a combinatorial analysis in which every possible contingency matrix for 2-, 3- and 4-class classifiers is depicted on the entropy triangle, a more reliable information-theoretic tool for classification assessment. Motivated by this, we develop from first principles a measure of classification performance that takes into consideration the information learned by classifiers. We thus obtain the entropy-modulated accuracy (EMA), a pessimistic estimate of the expected accuracy with the influence of the input distribution factored out, and the normalized information transfer (NIT) factor, a measure of how efficiently information is transmitted from the input to the output set of classes. The EMA is a more natural measure of classification performance than accuracy when the heuristic to maximize is the transfer of information through the classifier rather than the classification error count. The NIT factor measures the effectiveness of the learning process in classifiers and makes it harder for them to "cheat" using techniques like specialization, while also promoting the interpretability of results. Their use is demonstrated in a mind-reading task competition that aims at decoding the identity of a video stimulus from magnetoencephalography recordings. We show how the EMA and the NIT factor reject rankings based on accuracy, selecting more meaningful and interpretable classifiers.
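The entropy-triangle coordinates used throughout the paper can be sketched directly from a confusion matrix. The snippet below is a minimal illustration in standard information-theoretic terms, not the authors' code; the function name and the normalization follow the usual balance equation H(U_X) + H(U_Y) = ΔH + 2·I(X;Y) + H(X|Y) + H(Y|X), which is assumed here.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def entropy_triangle_coords(cm):
    """Barycentric coordinates (Delta H, 2*MI, VI) of a confusion matrix,
    normalized so the three components sum to 1."""
    cm = np.asarray(cm, dtype=float)
    pxy = cm / cm.sum()                     # joint distribution P(X, Y)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    hx, hy = entropy(px), entropy(py)
    hxy = entropy(pxy.ravel())
    mi = hx + hy - hxy                      # mutual information I(X; Y)
    vi = hxy - mi                           # H(X|Y) + H(Y|X), variation of information
    hu = np.log2(pxy.shape[0]) + np.log2(pxy.shape[1])  # H(U_X) + H(U_Y)
    dh = hu - hx - hy                       # divergence from uniform marginals
    return dh / hu, 2 * mi / hu, vi / hu
```

For example, a perfect classifier on balanced classes (`np.eye(k)`) lands at the mutual-information apex (0, 1, 0), while a classifier whose output is independent of its input lands on the bottom edge.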

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Heatmap of the best classifiers of the MEG mind reading competition according to accuracy (left) and the EMA and the NIT factor (right) criteria.
Rows correspond to the stimulus and columns to the decision, or response. Darker hues correlate with higher joint probability. The heat map on the left reveals that the best classifier according to accuracy does not capture the fact that three of the stimuli belong to one category whilst the other two belong to another.
Figure 2
Figure 2. (Color online) Entropy decomposition for square confusion matrices of (A) 2, (B) 3, and (C) 4 classes (decimated), representing confusion matrices for a classification task at different accuracy levels as described by the right color bar.
The interspersing of the plots representing matrices with different accuracies but similar entropies is evident at all levels for the smaller matrices but only at lower levels of accuracy for the largest. This entails that accuracy is not a good criterion for judging the flow of information from the input labels to the output labels of a classifier (see text).
Figure 3
Figure 3. (Color online) Entropy (above) and perplexity (below) decomposition chains for a joint distribution.
Left, perplexity reduction in the input (learning) chain; right, perplexity increase in the output chain, related to classifier specialization. The colors refer to those of Fig. 5(B). The ordering of the boxes is a convention that reveals the prior and posterior natures of the perplexities of the class distributions.
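Concretely (written in standard notation, which may differ from the paper's own symbols), perplexity is the exponential of entropy, so the input-side chain follows from the ordering of uniform, marginal, and conditional entropies:

```latex
\mathrm{PP}(X) = 2^{H(X)}, \qquad
H(X \mid Y) \le H(X) \le H(U_X) = \log_2 k
\;\Longrightarrow\;
\mathrm{PP}(X \mid Y) \le \mathrm{PP}(X) \le k
```

where k is the number of input classes: observing the output Y reduces the effective number of input classes that remain to be told apart.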
Figure 4
Figure 4. (Color online) Entropy triangle for the MEG mind reading data ordered by accuracy (A), and a detail of the participants with higher accuracy (B).
The ranking by accuracy is at odds with the EMA and NIT factor ranking based on mutual information (height; right scale of the triangle). The detail in (B) shows that a different participant, closely followed by another, should have been ranked first under this criterion.
Figure 5
Figure 5. (Color online) Extended information diagrams of entropies related to a bivariate distribution: (A) conventional diagram, and (B) split diagram.
The bounding rectangle is the joint entropy of two uniform (hence independent) distributions U_X and U_Y with the same cardinalities as X and Y. The expected mutual information I(X;Y) appears twice in (A), and this makes the diagram split symmetrically for each variable in (B).
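Written out, the split diagram reflects the entropy balance equation that underlies the entropy triangle. The following is a reconstruction in standard notation (U_X and U_Y denote uniform distributions with the cardinalities of X and Y):

```latex
H(U_X) + H(U_Y) = \Delta H + 2\,I(X;Y) + H(X \mid Y) + H(Y \mid X),
\qquad
\Delta H = \big(H(U_X) - H(X)\big) + \big(H(U_Y) - H(Y)\big)
```

Normalizing each of the three summands by H(U_X) + H(U_Y) yields the barycentric coordinates plotted on the triangle.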
Figure 6
Figure 6. Schematic Entropy Triangle showing interpretable zones and extreme cases of classifiers.
The annotations on the center of each side are meant to hold for that whole side.
