Nat Biomed Eng. 2025 Mar;9(3):294-306.

doi: 10.1038/s41551-023-01160-9. Epub 2023 Dec 28.

Auditing the inference processes of medical-image classifiers by leveraging generative AI and the expertise of physicians

Alex J DeGrave¹,², Zhuo Ran Cai³, Joseph D Janizek¹,², Roxana Daneshjou^#⁴,⁵, Su-In Lee^#⁶

Affiliations

¹ Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA.
² Medical Scientist Training Program, University of Washington, Seattle, WA, USA.
³ Program for Clinical Research and Technology, Department of Dermatology, Stanford University School of Medicine, Stanford, CA, USA.
⁴ Department of Dermatology, Stanford University School of Medicine, Stanford, CA, USA. roxanad@stanford.edu.
⁵ Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA. roxanad@stanford.edu.
⁶ Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA. suinlee@cs.washington.edu.

^# Contributed equally.

PMID: 38155295
DOI: 10.1038/s41551-023-01160-9

Alex J DeGrave et al. Nat Biomed Eng. 2025 Mar.

Abstract

The inferences of most machine-learning models powering medical artificial intelligence are difficult to interpret. Here we report a general framework for model auditing that combines insights from medical experts with a highly expressive form of explainable artificial intelligence. Specifically, we leveraged the expertise of dermatologists for the clinical task of differentiating melanomas from melanoma 'lookalikes' on the basis of dermoscopic and clinical images of the skin, and the power of generative models to render 'counterfactual' images to understand the 'reasoning' processes of five medical-image classifiers. By altering image attributes to produce analogous images that elicit a different prediction by the classifiers, and by asking physicians to identify medically meaningful features in the images, the counterfactual images revealed that the classifiers rely both on features used by human dermatologists, such as lesional pigmentation patterns, and on undesirable features, such as background skin texture and colour balance. The framework can be applied to any specialized medical domain to make the powerful inference processes of machine-learning models medically understandable.
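
The idea of altering image attributes until a classifier changes its prediction can be illustrated with a minimal sketch. This is not the authors' method: the linear "generator", the logistic "classifier", and every name and number below are hypothetical toy stand-ins, chosen only to show the general shape of a counterfactual search in a generator's latent space.

```python
import math

# Hypothetical toy stand-ins (assumptions, not from the paper):
# a linear "generator" mapping a 2-D latent code to a 4-pixel "image",
# and a logistic-regression "classifier" over those pixels.
GENERATOR = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
    [1.0, -1.0],
]
CLF_WEIGHTS = [0.8, -0.4, 0.3, 0.6]
CLF_BIAS = -0.2


def generate(z):
    """Render a toy image from the latent code z."""
    return [sum(a * zi for a, zi in zip(row, z)) for row in GENERATOR]


def classify(image):
    """Return the classifier's probability of the positive class."""
    logit = sum(w * x for w, x in zip(CLF_WEIGHTS, image)) + CLF_BIAS
    return 1.0 / (1.0 + math.exp(-logit))


def counterfactual(z, step=0.05, margin=0.1, max_iters=1000):
    """Move z along the logit gradient until the predicted class flips."""
    # For a linear generator, d(logit)/dz = GENERATOR^T @ CLF_WEIGHTS.
    grad = [
        sum(GENERATOR[i][j] * CLF_WEIGHTS[i] for i in range(len(GENERATOR)))
        for j in range(len(z))
    ]
    start_positive = classify(generate(z)) >= 0.5
    direction = -1.0 if start_positive else 1.0
    z = list(z)
    for _ in range(max_iters):
        p = classify(generate(z))
        flipped = (p < 0.5 - margin) if start_positive else (p > 0.5 + margin)
        if flipped:
            return z
        z = [zi + direction * step * g for zi, g in zip(z, grad)]
    raise RuntimeError("no counterfactual found within max_iters")


if __name__ == "__main__":
    z0 = [1.0, 0.0]
    z_cf = counterfactual(z0)
    original, altered = generate(z0), generate(z_cf)
    print("original prediction:", classify(original))
    print("counterfactual prediction:", classify(altered))
    # The per-pixel difference is what a human reviewer would inspect.
    print("pixel changes:", [b - a for a, b in zip(original, altered)])
```

In this toy setting, the per-pixel difference between the original and the counterfactual image plays the role of the attribute changes that a physician would be asked to interpret.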

Conflict of interest statement

Competing interests: R.D. reports fees from L’Oreal, Frazier Healthcare Partners, Pfizer, DWA and VisualDx for consulting; stock options from MDAcne and Revea for advisory board; and research funding from UCB. The other authors declare no competing interests.

Cited by

Explainable AI for computational pathology identifies model limitations and tissue biomarkers.
Kaczmarzyk JR, Saltz JH, Koo PK. ArXiv [Preprint]. 2024 Nov 18:arXiv:2409.03080v2. PMID: 39279830 Free PMC article. Preprint.
Visual interpretability of image-based classification models by generative latent space disentanglement applied to in vitro fertilization.
Rotem O, Schwartz T, Maor R, Tauber Y, Shapiro MT, Meseguer M, Gilboa D, Seidman DS, Zaritsky A. Nat Commun. 2024 Aug 27;15(1):7390. doi: 10.1038/s41467-024-51136-9. PMID: 39191720 Free PMC article.
DREAM: A framework for discovering mechanisms underlying AI prediction of protected attributes.
Gadgil SU, DeGrave AJ, Janizek JD, Xu S, Nwandu L, Fonjungo F, Lee SI, Daneshjou R. medRxiv [Preprint]. 2025 Jul 21:2024.04.09.24305289. doi: 10.1101/2024.04.09.24305289. PMID: 40778150 Free PMC article. Preprint.
Digital twins as global learning health and disease models for preventive and personalized medicine.
Li X, Loscalzo J, Mahmud AKMF, Aly DM, Rzhetsky A, Zitnik M, Benson M. Genome Med. 2025 Feb 7;17(1):11. doi: 10.1186/s13073-025-01435-7. PMID: 39920778 Free PMC article. Review.
Machine learning methods for histopathological image analysis: Updates in 2024.
Komura D, Ochi M, Ishikawa S. Comput Struct Biotechnol J. 2024 Dec 30;27:383-400. doi: 10.1016/j.csbj.2024.12.033. eCollection 2025. PMID: 39897057 Free PMC article. Review.
