Nat Biomed Eng. 2025 Mar;9(3):294-306.
doi: 10.1038/s41551-023-01160-9. Epub 2023 Dec 28.

Auditing the inference processes of medical-image classifiers by leveraging generative AI and the expertise of physicians

Alex J DeGrave et al. Nat Biomed Eng. 2025 Mar.

Abstract

The inferences of most machine-learning models powering medical artificial intelligence are difficult to interpret. Here we report a general framework for model auditing that combines insights from medical experts with a highly expressive form of explainable artificial intelligence. Specifically, we leveraged the expertise of dermatologists for the clinical task of differentiating melanomas from melanoma 'lookalikes' on the basis of dermoscopic and clinical images of the skin, and the power of generative models to render 'counterfactual' images to understand the 'reasoning' processes of five medical-image classifiers. By altering image attributes to produce analogous images that elicit a different prediction by the classifiers, and by asking physicians to identify medically meaningful features in the images, the counterfactual images revealed that the classifiers rely both on features used by human dermatologists, such as lesional pigmentation patterns, and on undesirable features, such as background skin texture and colour balance. The framework can be applied to any specialized medical domain to make the powerful inference processes of machine-learning models medically understandable.
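The counterfactual idea described above can be illustrated with a toy sketch. This is not the authors' method: the abstract describes generative models that render counterfactual images, whereas the sketch below assumes a hypothetical logistic classifier over three hand-named attributes and finds the smallest change that flips its prediction. The attribute names, weights, and the `counterfactual` helper are all assumptions made for illustration.

```python
import math

# Hypothetical attribute names, loosely echoing features named in the abstract.
# The real classifiers operate on images, not on a three-value vector.
ATTRIBUTES = ["lesion_pigmentation", "background_skin_texture", "colour_balance"]


def logit(weights, bias, x):
    """Raw score of the toy linear classifier."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def predict_proba(weights, bias, x):
    """Probability of the positive class under the toy logistic classifier."""
    return 1.0 / (1.0 + math.exp(-logit(weights, bias, x)))


def counterfactual(weights, bias, x, margin=0.1):
    """Smallest L2 change to x that moves the score across the decision
    boundary, landing `margin` away from it on the other side."""
    z = logit(weights, bias, x)
    target = -margin if z >= 0 else margin
    step = (target - z) / sum(w * w for w in weights)
    return [v + step * w for v, w in zip(x, weights)]


def attribute_changes(x, x_cf):
    """Per-attribute difference between the counterfactual and the original."""
    return {name: cf - orig for name, orig, cf in zip(ATTRIBUTES, x, x_cf)}


# Made-up weights and input, chosen only so the example runs.
weights = [2.0, 1.5, 0.5]
bias = -1.0
x = [0.8, 0.6, 0.2]

x_cf = counterfactual(weights, bias, x)
changes = attribute_changes(x, x_cf)

print(f"original probability:       {predict_proba(weights, bias, x):.3f}")
print(f"counterfactual probability: {predict_proba(weights, bias, x_cf):.3f}")
for name, change in changes.items():
    print(f"{name}: {change:+.3f}")
```

In this toy setting, a large change in an attribute such as `background_skin_texture` would signal that the classifier leans on a feature a clinician would consider undesirable, which is the kind of question the framework asks physicians to judge from the counterfactual images.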

Conflict of interest statement

Competing interests: R.D. reports fees from L’Oreal, Frazier Healthcare Partners, Pfizer, DWA and VisualDx for consulting; stock options from MDAcne and Revea for advisory board; and research funding from UCB. The other authors declare no competing interests.
