The receiver operating characteristic curve accurately assesses imbalanced datasets
- PMID: 39005487
- PMCID: PMC11240176
- DOI: 10.1016/j.patter.2024.100994
Abstract
Many problems in biology require looking for a "needle in a haystack," corresponding to a binary classification in which a few positives sit within a much larger set of negatives, a situation referred to as class imbalance. The receiver operating characteristic (ROC) curve and the associated area under the curve (AUC) have been reported as ill-suited to evaluating prediction performance on imbalanced problems where performance on the positive minority class is of most interest, with the precision-recall (PR) curve reported as preferable. We show via simulation and a real case study that this is a misinterpretation of the difference between the ROC and PR spaces: the ROC curve is robust to class imbalance, while the PR curve is highly sensitive to it. Furthermore, we show that class imbalance cannot be easily disentangled from classifier performance measured via PR-AUC.
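The contrast between the two metrics can be illustrated with a small simulation. The sketch below is not the authors' simulation or case study. It assumes a hypothetical scorer whose outputs are Gaussian with unit variance, with positives shifted by an arbitrary `separation` of 1.5, and it holds that scorer fixed while only the fraction of positives changes. The function names (`roc_auc`, `average_precision`, `simulate`) are illustrative, and average precision is used here as a common step-wise estimate of PR-AUC.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC-AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)  # rank 1 = lowest score
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    u = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def average_precision(y_true, scores):
    """Average precision: mean of precision at the rank of each positive."""
    order = np.argsort(-scores)  # highest score first
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits == 1].mean()


def simulate(prevalence, n=200_000, separation=1.5, seed=0):
    """Score n samples with a fixed hypothetical scorer at a given prevalence.

    Assumption: negatives ~ N(0, 1), positives ~ N(separation, 1).
    Only the class ratio varies between calls; the scorer does not.
    """
    rng = np.random.default_rng(seed)
    n_pos = int(round(n * prevalence))
    n_neg = n - n_pos
    y = np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)])
    scores = np.concatenate([
        rng.normal(separation, 1.0, n_pos),
        rng.normal(0.0, 1.0, n_neg),
    ])
    return roc_auc(y, scores), average_precision(y, scores)


# Same scorer, three class ratios: balanced, 10% positives, 1% positives.
results = {p: simulate(p) for p in (0.5, 0.1, 0.01)}
for p, (roc, pr) in results.items():
    print(f"prevalence={p:<5} ROC-AUC={roc:.3f} PR-AUC(AP)={pr:.3f}")
```

Under these assumptions, ROC-AUC should stay close to the same value at every prevalence, because the true and false positive rates are each computed within a single class. Average precision should fall as positives become rarer, because precision mixes the two classes: for prevalence π, precision = π·TPR / (π·TPR + (1 − π)·FPR).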
Keywords: ROC curve; binary classification; imbalanced data; machine learning; performance metric; precision-recall.
© 2024 The Authors.
Conflict of interest statement
The authors declare no competing interests.