The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation
- PMID: 33541410
- PMCID: PMC7863449
- DOI: 10.1186/s13040-021-00244-z
Abstract
Evaluating binary classifications is a pivotal task in statistics and machine learning, because it can influence decisions in multiple areas, including, for example, the prognosis or therapy of patients in critical conditions. The scientific community has not yet agreed on a general-purpose statistical indicator for evaluating two-class confusion matrices (having true positives, true negatives, false positives, and false negatives), even though the advantages of the Matthews correlation coefficient (MCC) over accuracy and F1 score have already been shown.

In this manuscript, we reaffirm that MCC is a robust metric that summarizes classifier performance in a single value, provided that positive and negative cases are of equal importance. We compare MCC to other metrics that value positive and negative cases equally: balanced accuracy (BA), bookmaker informedness (BM), and markedness (MK). We explain the mathematical relationships between MCC and these indicators, then show some use cases and a bioinformatics scenario where these metrics disagree and where MCC generates a more informative response.

Additionally, we describe three exceptions where BM can be more appropriate: analyzing classifications where dataset prevalence is unrepresentative, comparing classifiers on different datasets, and assessing the random guessing level of a classifier. Except in these cases, we believe that MCC is the most informative among the single metrics discussed, and we suggest it as the standard measure for scientists of all fields. In fact, a Matthews correlation coefficient close to +1 means having high values for all the other confusion matrix metrics. The same cannot be said for balanced accuracy, markedness, bookmaker informedness, accuracy, and F1 score.
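As a minimal sketch (not the authors' code), the four metrics compared in the abstract can be computed from the cells of a 2×2 confusion matrix using their standard definitions: BA is the mean of sensitivity and specificity, BM = sensitivity + specificity − 1, MK = precision + negative predictive value − 1. The sketch also checks two identities that tie the metrics together: BA = (BM + 1)/2, and MCC² = BM · MK, so MCC is the signed geometric mean of informedness and markedness. The function name `confusion_metrics`, the `Metrics` tuple, and the example counts are hypothetical, chosen only for illustration, and the sketch assumes no row or column of the matrix sums to zero.

```python
import math
from typing import NamedTuple


class Metrics(NamedTuple):
    ba: float   # balanced accuracy
    bm: float   # bookmaker informedness
    mk: float   # markedness
    mcc: float  # Matthews correlation coefficient


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> Metrics:
    """Return BA, BM, MK and MCC for one two-class confusion matrix.

    Assumption of this sketch: all row and column totals are non-zero,
    so no division-by-zero handling is included.
    """
    tpr = tp / (tp + fn)  # true positive rate (sensitivity, recall)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    ppv = tp / (tp + fp)  # positive predictive value (precision)
    npv = tn / (tn + fn)  # negative predictive value

    ba = (tpr + tnr) / 2   # balanced accuracy
    bm = tpr + tnr - 1     # bookmaker informedness
    mk = ppv + npv - 1     # markedness
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return Metrics(ba, bm, mk, mcc)


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 10 positives, 990 negatives.
    m = confusion_metrics(tp=9, fn=1, tn=891, fp=99)
    print(f"BA={m.ba:.2f} BM={m.bm:.2f} MK={m.mk:.2f} MCC={m.mcc:.2f}")
    # → BA=0.90 BM=0.80 MK=0.08 MCC=0.26

    # Identities linking the metrics (hold for any valid matrix):
    assert math.isclose(m.ba, (m.bm + 1) / 2)     # BA = (BM + 1) / 2
    assert math.isclose(m.mcc ** 2, m.bm * m.mk)  # MCC^2 = BM * MK
```

In this made-up example, sensitivity and specificity are both 0.90, so BA and BM look strong, yet only 9 of the 108 positive predictions are correct (precision of about 0.08). Markedness captures that weakness, and because MCC combines BM and MK, it drops to about 0.26, which illustrates the abstract's point that a high MCC requires all confusion matrix rates to be high.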
Keywords: Balanced accuracy; Binary classification; Bookmaker informedness; Confusion matrix; Machine learning; Markedness; Matthews correlation coefficient.
Conflict of interest statement
The authors declare they have no competing interests.
Similar articles
- The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 2020 Jan 2;21(1):6. doi: 10.1186/s12864-019-6413-7. PMID: 31898477. Free PMC article.
- A statistical comparison between Matthews correlation coefficient (MCC), prevalence threshold, and Fowlkes-Mallows index. J Biomed Inform. 2023 Aug;144:104426. doi: 10.1016/j.jbi.2023.104426. Epub 2023 Jun 21. PMID: 37352899.
- Mind your prevalence! J Cheminform. 2024 Apr 15;16(1):43. doi: 10.1186/s13321-024-00837-w. PMID: 38622648. Free PMC article.
- How to evaluate an agent's behavior to infrequent events? Reliable performance estimation insensitive to class distribution. Front Comput Neurosci. 2014 Apr 10;8:43. doi: 10.3389/fncom.2014.00043. eCollection 2014. PMID: 24782751. Free PMC article. Review.
- Biphasic majority voting-based comparative COVID-19 diagnosis using chest X-ray images. Expert Syst Appl. 2023 Apr 15;216:119430. doi: 10.1016/j.eswa.2022.119430. Epub 2022 Dec 21. PMID: 36570382. Free PMC article. Review.
Cited by
- Benchmarking MicrobIEM - a user-friendly tool for decontamination of microbiome sequencing data. BMC Biol. 2023 Nov 23;21(1):269. doi: 10.1186/s12915-023-01737-5. PMID: 37996810. Free PMC article.
- Signature literature review reveals AHCY, DPYSL3, and NME1 as the most recurrent prognostic genes for neuroblastoma. BioData Min. 2023 Mar 4;16(1):7. doi: 10.1186/s13040-023-00325-1. PMID: 36870971. Free PMC article.
- Predictive Potential of Cmax Bioequivalence in Pilot Bioavailability/Bioequivalence Studies, through the Alternative ƒ2 Similarity Factor Method. Pharmaceutics. 2023 Oct 20;15(10):2498. doi: 10.3390/pharmaceutics15102498. PMID: 37896259. Free PMC article.
- Implementation of IFPTML Computational Models in Drug Discovery Against Flaviviridae Family. J Chem Inf Model. 2024 Mar 25;64(6):1841-1852. doi: 10.1021/acs.jcim.3c01796. Epub 2024 Mar 11. PMID: 38466369. Free PMC article.
- Predicting Satisfaction With Chat-Counseling at a 24/7 Chat Hotline for the Youth: Natural Language Processing Study. JMIR AI. 2025 Feb 18;4:e63701. doi: 10.2196/63701. PMID: 39965198. Free PMC article.