Uncertainty-aware deep learning in healthcare: A scoping review

Tyler J Loftus¹ ², Benjamin Shickel³, Matthew M Ruppert² ⁴, Jeremy A Balch¹, Tezcan Ozrazgat-Baslanti² ⁴, Patrick J Tighe⁵, Philip A Efron¹ ², William R Hogan⁶, Parisa Rashidi² ⁷, Gilbert R Upchurch Jr¹, Azra Bihorac² ⁴

Affiliations

¹ Department of Surgery, University of Florida Health, Gainesville, Florida, United States of America.
² Intelligent Critical Care Center, University of Florida, Gainesville, Florida, United States of America.
³ Department of Biomedical Engineering, University of Florida, Gainesville, Florida, United States of America.
⁴ Department of Medicine, University of Florida Health, Gainesville, Florida, United States of America.
⁵ Departments of Anesthesiology, Orthopedics, and Information Systems/Operations Management, University of Florida Health, Gainesville, Florida, United States of America.
⁶ Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, United States of America.
⁷ Departments of Biomedical Engineering, Computer and Information Science and Engineering, and Electrical and Computer Engineering, University of Florida, Gainesville, Florida, United States of America.

PMID: 36590140
PMCID: PMC9802673
DOI: 10.1371/journal.pdig.0000085

Tyler J Loftus et al. PLOS Digit Health. 2022;1(8):e0000085. doi: 10.1371/journal.pdig.0000085. Epub 2022 Aug 10.

Abstract

Mistrust is a major barrier to implementing deep learning in healthcare settings. Entrustment could be earned by conveying model certainty, or the probability that a given model output is accurate, but the use of uncertainty estimation for deep learning entrustment is largely unexplored, and there is no consensus regarding optimal methods for quantifying uncertainty. Our purpose is to critically evaluate methods for quantifying uncertainty in deep learning for healthcare applications and propose a conceptual framework for specifying certainty of deep learning predictions. We searched Embase, MEDLINE, and PubMed databases for articles relevant to study objectives, complying with PRISMA guidelines, rated study quality using validated tools, and extracted data according to modified CHARMS criteria. Among 30 included studies, 24 described medical imaging applications. All imaging model architectures used convolutional neural networks or a variation thereof. The predominant method for quantifying uncertainty was Monte Carlo dropout, which produces predictions from multiple networks in which different neurons have been dropped out and measures variance across the distribution of resulting predictions. Conformal prediction offered similarly strong performance in estimating uncertainty, along with ease of interpretation and applicability not only to deep learning but also to other machine learning approaches. Among the six articles describing non-imaging applications, model architectures and uncertainty estimation methods were heterogeneous, but predictive performance was generally strong, and uncertainty estimation was effective in comparing modeling methods. Overall, the use of model learning curves to quantify epistemic uncertainty (attributable to model parameters) was sparse. Heterogeneity in reporting methods precluded the performance of a meta-analysis.
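To make the Monte Carlo dropout description concrete, the following is a minimal sketch and not a method taken from any of the reviewed studies. It assumes a toy two-input network with three hidden units; the weights `W_HIDDEN` and `W_OUT`, the function names, and the dropout rate are all hypothetical and chosen only for illustration. Dropout stays active at prediction time, the same input is passed through the network many times, and the variance of the resulting probabilities serves as the uncertainty estimate.

```python
import math
import random
import statistics

def forward_with_dropout(x, w_hidden, w_out, drop_prob, rng):
    """One stochastic forward pass; each hidden unit is dropped with drop_prob."""
    keep = 1.0 - drop_prob
    hidden = []
    for weights in w_hidden:
        act = max(0.0, sum(w * xi for w, xi in zip(weights, x)))  # ReLU
        # Inverted dropout: keep the unit (rescaled by 1/keep) or zero it.
        hidden.append(act / keep if rng.random() < keep else 0.0)
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid probability

def mc_dropout_predict(x, w_hidden, w_out, drop_prob=0.5, n_passes=200, seed=0):
    """Return the mean and variance of predictions over repeated dropout passes."""
    rng = random.Random(seed)
    probs = [forward_with_dropout(x, w_hidden, w_out, drop_prob, rng)
             for _ in range(n_passes)]
    return statistics.fmean(probs), statistics.pvariance(probs)

# Hypothetical toy weights, not derived from any study in the review.
W_HIDDEN = [[0.8, -0.4], [-0.3, 0.9], [0.5, 0.5]]
W_OUT = [1.2, -0.7, 0.6]

mean_p, var_p = mc_dropout_predict([1.0, 0.5], W_HIDDEN, W_OUT)
print(f"mean probability = {mean_p:.3f}, variance = {var_p:.4f}")
```

A larger variance indicates that the prediction depends heavily on which neurons were dropped, which is one signal that the output may warrant closer review.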
Uncertainty estimation methods have the potential to identify rare but important misclassifications made by deep learning models and compare modeling methods, which could build patient and clinician trust in deep learning applications in healthcare. Efficient maturation of this field will require standardized guidelines for reporting performance and uncertainty metrics.
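Conformal prediction, mentioned in the abstract as applicable beyond deep learning, can be sketched in its common split (inductive) form. This is an illustrative assumption about the variant, not necessarily the one used in the reviewed articles. The calibration probabilities and labels below are made up, and the function names are hypothetical. The nonconformity score is one minus the predicted probability of the true class; a threshold is taken from the calibration scores, and each new prediction becomes a set of all classes whose score falls within that threshold.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Finite-sample-corrected quantile of calibration nonconformity scores."""
    scores = sorted(1.0 - probs[label]
                    for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All classes whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

# Hypothetical calibration data: per-class probabilities and true labels.
CAL_PROBS = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6], [0.5, 0.5],
             [0.55, 0.45], [0.4, 0.6], [0.65, 0.35], [0.3, 0.7]]
CAL_LABELS = [0, 1, 0, 1, 0, 1, 0, 1, 0]

threshold = conformal_threshold(CAL_PROBS, CAL_LABELS, alpha=0.2)
confident = prediction_set([0.97, 0.03], threshold)
ambiguous = prediction_set([0.55, 0.45], threshold)
print(confident, ambiguous)
```

A single-class set reads as a confident prediction, while a set containing several classes flags a case the model cannot resolve at the chosen error level `alpha`, which is part of what makes the output easy to interpret.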

Conflict of interest statement

Competing interests: The authors declare no conflicts of interest.

Figures

**Fig 1. PRISMA flow diagram for article inclusion.**

**Fig 2. A conceptual framework for optimizing certainty in deep learning predictions by quantifying and minimizing aleatoric and epistemic uncertainty.**
