2014 Jan 10;9(1):e84217.
doi: 10.1371/journal.pone.0084217. eCollection 2014.

100% classification accuracy considered harmful: the normalized information transfer factor explains the accuracy paradox

Francisco J Valverde-Albacete et al. PLoS One.

Abstract

Accuracy, the most widely used measure of classifier performance, suffers from a paradox: predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. Despite optimizing classification error rate, high-accuracy models may fail to capture crucial information transfer in the classification task. We present evidence of this behavior by means of a combinatorial analysis in which every possible contingency matrix for 2-, 3- and 4-class classifiers is depicted on the entropy triangle, a more reliable information-theoretic tool for classification assessment. Motivated by this, we develop from first principles a measure of classification performance that takes into consideration the information learned by classifiers. We thus obtain the entropy-modulated accuracy (EMA), a pessimistic estimate of the expected accuracy with the influence of the input distribution factored out, and the normalized information transfer (NIT) factor, a measure of how efficiently information is transmitted from the input to the output set of classes. The EMA is a more natural measure of classification performance than accuracy when the heuristic to maximize is the transfer of information through the classifier rather than the classification error count. The NIT factor measures the effectiveness of the learning process in classifiers and makes it harder for them to "cheat" using techniques like specialization, while also promoting the interpretability of results. Their use is demonstrated in a mind-reading task competition that aims at decoding the identity of a video stimulus from magnetoencephalography recordings. We show how the EMA and the NIT factor reject rankings based on accuracy, selecting more meaningful and interpretable classifiers.
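The entropy-triangle coordinates used throughout the paper can be sketched directly from a confusion matrix. The snippet below is a minimal illustration in standard information-theoretic terms, not the authors' code; the function name and the normalization follow the usual balance equation H(U_X) + H(U_Y) = ΔH + 2·I(X;Y) + H(X|Y) + H(Y|X), which is assumed here.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def entropy_triangle_coords(cm):
    """Barycentric coordinates (Delta H, 2*MI, VI) of a confusion matrix,
    normalized so the three components sum to 1."""
    cm = np.asarray(cm, dtype=float)
    pxy = cm / cm.sum()                     # joint distribution P(X, Y)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    hx, hy = entropy(px), entropy(py)
    hxy = entropy(pxy.ravel())
    mi = hx + hy - hxy                      # mutual information I(X; Y)
    vi = hxy - mi                           # H(X|Y) + H(Y|X), variation of information
    hu = np.log2(pxy.shape[0]) + np.log2(pxy.shape[1])  # H(U_X) + H(U_Y)
    dh = hu - hx - hy                       # divergence from uniform marginals
    return dh / hu, 2 * mi / hu, vi / hu
```

For example, a perfect classifier on balanced classes (`np.eye(k)`) lands at the mutual-information apex (0, 1, 0), while a classifier whose output is independent of its input lands on the bottom edge.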

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Heatmap of the best classifiers of the MEG mind reading competition according to accuracy (left) and the EMA and the NIT factor (right) criteria.
Rows correspond to the stimulus and columns to the decision, or response. Darker hues correlate with higher joint probability. The heat map on the left reveals that the best classifier according to accuracy does not capture the fact that three of the stimuli belong to one category whilst the other two belong to another.
Figure 2
Figure 2. (Color online) Entropy decomposition for square confusion matrices of (A) 2, (B) 3, and (C) 4 classes (decimated), representing confusion matrices for a classification task at different accuracy levels as described by the right color bar.
The interspersing of the plots representing matrices with different accuracies but similar entropies is evident at all levels for the smaller matrices but only at lower levels of accuracy for the largest. This entails that accuracy is not a good criterion for judging the flow of information from the input labels to the output labels of a classifier (see text).
Figure 3
Figure 3. (Color online) Entropy (above) and perplexity (below) decomposition chains for a joint distribution.
Left, perplexity reduction in the input (learning) chain; right, perplexity increase in the output chain, related to classifier specialization. The colors refer to those of Fig. 5(B). The ordering of the boxes is a convention that reveals the prior and posterior natures of the perplexities of the class distributions.
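Concretely (written in standard notation, which may differ from the paper's own symbols), perplexity is the exponential of entropy, so the input-side chain follows from the ordering of uniform, marginal, and conditional entropies:

```latex
\mathrm{PP}(X) = 2^{H(X)}, \qquad
H(X \mid Y) \le H(X) \le H(U_X) = \log_2 k
\;\Longrightarrow\;
\mathrm{PP}(X \mid Y) \le \mathrm{PP}(X) \le k
```

where k is the number of input classes: observing the output Y reduces the effective number of input classes that remain to be told apart.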
Figure 4
Figure 4. (Color online) Entropy triangle for the MEG mind reading data ordered by accuracy (A), and a detail of the participants with higher accuracy (B).
The ranking by accuracy is at odds with the EMA and NIT factor ranking based on mutual information (height; right scale of the triangle). The detail in (B) shows that a different participant, closely followed by another, should have been ranked first under this criterion.
Figure 5
Figure 5. (Color online) Extended information diagrams of entropies related to a bivariate distribution: (A) conventional diagram, and (B) split diagram.
The bounding rectangle is the joint entropy of two uniform (hence independent) distributions U_X and U_Y with the same cardinalities as X and Y. The expected mutual information I(X;Y) appears twice in (A), and this makes the diagram split symmetrically for each variable in (B).
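Written out, the split diagram reflects the entropy balance equation that underlies the entropy triangle. The following is a reconstruction in standard notation (U_X and U_Y denote uniform distributions with the cardinalities of X and Y):

```latex
H(U_X) + H(U_Y) = \Delta H + 2\,I(X;Y) + H(X \mid Y) + H(Y \mid X),
\qquad
\Delta H = \big(H(U_X) - H(X)\big) + \big(H(U_Y) - H(Y)\big)
```

Normalizing each of the three summands by H(U_X) + H(U_Y) yields the barycentric coordinates plotted on the triangle.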
Figure 6
Figure 6. Schematic Entropy Triangle showing interpretable zones and extreme cases of classifiers.
The annotations on the center of each side are meant to hold for that whole side.
