Nat Commun. 2024 Jan 15;15(1):524. doi: 10.1038/s41467-023-43095-4.

Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma



Tirtha Chanda et al. Nat Commun. 2024.

Abstract

Artificial intelligence (AI) systems have been shown to help dermatologists diagnose melanoma more accurately; however, they lack transparency, hindering user acceptance. Explainable AI (XAI) methods can help to increase transparency, yet they often lack precise, domain-specific explanations. Moreover, the impact of XAI methods on dermatologists' decisions has not yet been evaluated. Building upon previous research, we introduce an XAI system that provides precise and domain-specific explanations alongside its differential diagnoses of melanomas and nevi. Through a three-phase study, we assess its impact on dermatologists' diagnostic accuracy, diagnostic confidence, and trust in the XAI support. Our results show strong alignment between XAI and dermatologist explanations. We also show that dermatologists' confidence in their diagnoses and their trust in the support system increase significantly with XAI compared to conventional AI. This study highlights dermatologists' willingness to adopt such XAI systems, promoting their future use in the clinic.


Conflict of interest statement

PT reports grants from Lilly, consulting fees from Silverchair, lecture honoraria from Lilly, FotoFinder and Novartis, outside of the present publication. TJB owns a company that develops mobile apps (Smart Health Heidelberg GmbH, Heidelberg, Germany), outside of the scope of the submitted work. WS received travel support for participation in congresses and/or (speaker) honoraria as well as research grants from medi GmbH Bayreuth, Abbvie, Almirall, Amgen, Bristol-Myers Squibb, Celgene, GSK, Janssen, LEO Pharma, Lilly, MSD, Novartis, Pfizer, Roche, Sanofi Genzyme, and UCB outside of the present publication. MLV received travel support for participation in congresses and/or (speaker) honoraria as well as research grants from Abbvie, Almirall, Amgen, Bristol-Myers Squibb, Celgene, Janssen, Kyowa Kirin, LEO Pharma, Lilly, MSD, Novartis, Pfizer, Roche, Sanofi Genzyme, and UCB outside of the present publication. BS is on the advisory board or has received honoraria from Immunocore, Almirall, Pfizer, Sanofi, Novartis, Roche, BMS and MSD, research funding from Novartis and Pierre Fabre Pharmaceuticals, and travel support from Novartis, Roche, Bristol-Myers Squibb and Pierre Fabre Pharma, outside the submitted work. SH is on the advisory board or has received honoraria from Novartis, Pierre Fabre, BMS and MSD outside the submitted work. KD has received honoraria from Novartis, Pierre Fabre and Roche outside the submitted work. SF reports consulting or advisory board membership: Bayer, Illumina, Roche; honoraria: Amgen, Eli Lilly, PharmaMar, Roche; research funding: AstraZeneca, Pfizer, PharmaMar, Roche; travel or accommodation expenses: Amgen, Eli Lilly, Illumina, PharmaMar, Roche.
JSU is on the advisory board or has received honoraria and travel support from Amgen, Bristol Myers Squibb, GSK, Immunocore, LeoPharma, Merck Sharp and Dohme, Novartis, Pierre Fabre, Roche, Sanofi outside the submitted work. ME has received honoraria and travel expenses from Novartis and Immunocore. SHo received travel support for participation in congresses, (speaker) honoraria and research grants from Almirall, UCB, Janssen, Novartis, LEO Pharma and Lilly outside of the present publication. SP has received travel support for participation in congresses and/or speaker honoraria from Abbvie, Lilly, MSD, Novartis, Pfizer and Sanofi outside of the present publication. SPo is on the advisory board or has received honoraria from Galenicum Derma, ISDIN, Cantabria Labs and Mesoestetic. RLB has received support from Castle Bioscience for the International Melanoma Pathology Study Group Symposium and Workshop. MG served as consultant to argenx (honoraria paid to institution) and Almirall and received honoraria for participation in advisory boards / travel support from Biotest, GSK, Janssen, Leo Pharma, Lilly, Novartis and UCB, all outside the scope of the submitted work. MVH received honoraria from MSD, BMS, Roche, Novartis, Sun Pharma, Sanofi, Almirall, Biofrontera, Galderma. The other authors declare no competing interests.

Figures

Fig. 1. Overview of the XAI and reader study.
a Schematic overview of our multimodal XAI. The AI system makes a prediction for each characteristic and then infers a melanoma diagnosis if it detects at least two melanoma characteristics. The diagnosis and corresponding explanations are then displayed to the clinician. b Schematic overview of our work. We first collected ground-truth annotations and corresponding ontology-based explanations for 3611 dermoscopic images from 14 international board-certified dermatologists and trained an explanatory AI on this dataset (top row). We then employed this classifier in a three-phase study (bottom row) involving 116 clinicians tasked with diagnosing dermoscopic images of melanomas and nevi. In phase 1 of the study, the clinicians received no AI assistance. In phase 2, they received the XAI’s predicted diagnoses but not its explanations. In phase 3, they received the predicted diagnoses along with the explanations. Figures created with BioRender.com.
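The caption describes a simple aggregation rule: the system predicts individual melanoma characteristics and diagnoses melanoma when at least two are detected. A minimal sketch of such a rule, assuming a set-of-strings interface; the characteristic names and the `diagnose` function are illustrative, not the paper's actual implementation:

```python
# Illustrative dermoscopic melanoma characteristics (not the paper's exact ontology).
MELANOMA_CHARACTERISTICS = {
    "atypical pigment network",
    "blue-white veil",
    "irregular streaks",
    "irregular dots and globules",
}

def diagnose(detected_characteristics, threshold=2):
    """Diagnose 'melanoma' when at least `threshold` melanoma
    characteristics were detected in the lesion, else 'nevus'."""
    n_detected = sum(
        1 for c in detected_characteristics if c in MELANOMA_CHARACTERISTICS
    )
    return "melanoma" if n_detected >= threshold else "nevus"
```

The detected characteristics themselves would come from the trained classifier's per-characteristic predictions; this sketch only shows the final inference step.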
Fig. 2. Example multimodal XAI explanation.
An example multimodal explanation from our XAI used in phase 3, showing a textual explanation (a) and the corresponding localised visual explanations (b). The XAI identified this lesion as a melanoma with the characteristics stated in the textual explanation. The white polygons represent the most important regions where the XAI detected the corresponding characteristics.
Fig. 3. Overview of our XAI’s performance.
a Ratio of mean Grad-CAM pixel activation value inside the lesion to that outside the lesion (P < 0.0001, two-sided Wilcoxon signed-rank test, n = 196 images). Higher values are better, as they indicate greater attention on regions within the lesion than on regions outside the lesion. Four data points for the baseline and 19 data points for the XAI have values above 300 and have been omitted to more clearly visualise the data. b We calculated the difference in output scores before and after obscuring the important pixels of the images (n = 200 images per threshold). Since we used a threshold on the Grad-CAM heatmaps, we calculated faithfulness values for each threshold ranging from 5 to 95. The stars represent the threshold used in our study and the values of faithfulness at this threshold. The transparent bands represent the 95% bootstrap confidence intervals. c Overlap in ontological explanations between clinician pairs for the same image compared to the overlap in ontological explanations between clinicians and our XAI. The whiskers are positioned close to zero and one, and the median lines are positioned close to zero, making them difficult to see. Each value is shifted by a random number between −0.02 and 0.02 on the y-axis so that the points can be seen more clearly. The between-clinician category consists of n = 5165 clinician-pairs, whereas the clinician-XAI category comprises n = 1089 images. d Region of interest (ROI) overlap between clinicians and our XAI compared to that of the baseline (P < 0.0001, two-sided paired t test, n = 1120 images). For all boxplots, the horizontal line on each box denotes the median value and the white dot denotes the mean. The upper and lower box limits denote the 1st and 3rd quartiles, respectively, and the whiskers extend from the box to 1.5 times the interquartile range. Source data are provided as a Source Data file.
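The two quantities in panels a and b can be sketched with NumPy. `activation_ratio` and `faithfulness` are hypothetical helper names; the study applies these ideas to Grad-CAM heatmaps from its trained classifier, whereas here `model` is any callable mapping an image to a score:

```python
import numpy as np

def activation_ratio(heatmap, lesion_mask):
    """Ratio of the mean Grad-CAM activation inside the lesion to the
    mean activation outside it; higher values indicate that attention
    is concentrated on the lesion."""
    return float(heatmap[lesion_mask].mean() / heatmap[~lesion_mask].mean())

def faithfulness(model, image, heatmap, threshold):
    """Drop in the model's output score after zeroing the pixels whose
    Grad-CAM activation lies at or above the given percentile threshold.
    A larger drop means the highlighted pixels mattered more."""
    cutoff = np.percentile(heatmap, threshold)
    occluded = image.copy()
    occluded[heatmap >= cutoff] = 0.0  # obscure the "important" pixels
    return model(image) - model(occluded)
```

The paper sweeps the percentile threshold from 5 to 95; the sketch above computes one point of that curve.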
Fig. 4. Impact of our XAI on clinicians’ diagnostic accuracy, confidence, and trust.
a Distributions of clinicians’ balanced accuracy in each phase of our study (P < 0.0001, two-sided paired t test, n = 109 participants, No AI vs. AI Support; P = 0.34, two-sided paired t test, n = 116 participants, AI Support vs. XAI Support). b Balanced diagnostic accuracy with AI and XAI support, grouped by level of experience with dermoscopy (n = 116 participants). Distributions of clinicians’ mean diagnostic confidence (c) and mean trust in the support system (d) in each phase of our study (n = 116 participants each). In a, c, and d, the grey lines connect the same participant between phases, and the black lines connecting the boxes indicate the means across all participants. For all figures, the horizontal line on each box denotes the median value and the white dot denotes the mean. The upper and lower box limits denote the 1st and 3rd quartiles, respectively, and the whiskers extend from the box to 1.5 times the interquartile range. Source data are provided as a Source Data file.
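Balanced accuracy, the metric in panels a and b, is the mean of sensitivity and specificity, which makes it robust to class imbalance between melanomas and nevi. A plain-Python sketch for the binary task (the function name and label strings are illustrative):

```python
def balanced_accuracy(y_true, y_pred, positive="melanoma"):
    """Mean of sensitivity (recall on the positive class) and
    specificity (recall on the negative class) for a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

With an equal split of melanomas and nevi this coincides with plain accuracy; it diverges whenever one class dominates the test set.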
Fig. 5. Relationship between clinicians’ trust in AI and overlap in ontological explanations.
a–c Correlation between overlap in reasoning (measured by the Sørensen-Dice similarity coefficient) and trust in XAI for cases where the clinicians’ diagnoses matched those of the XAI (P = 0.01, Spearman’s rank correlation, n = 871 images). The left column depicts the relationship between overlap in reasoning and trust in XAI for both classes (a), the middle column depicts cases where both the clinicians and the XAI diagnosed melanoma (P < 0.0001, Spearman’s rank correlation, n = 567 images) (b), and the right column represents cases where they both diagnosed nevus (P = 0.01, Spearman’s rank correlation, n = 505 images) (c). Trust is measured on a Likert scale (1–10, with 1 meaning no trust and 10 meaning complete trust in the AI). Each data point is shifted by a random number between −0.02 and 0.02 on the y-axis and −0.1 and 0.1 on the x-axis so that the points can be seen more clearly. The light-coloured triangles connected by lines represent the means (calculated on non-shifted values) of each trust value and the transparent bands represent the 95% bootstrap confidence intervals. Source data are provided as a Source Data file.
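The Sørensen-Dice similarity used to score overlap in ontological explanations can be sketched as a comparison of two sets of ontology terms. The function name and the convention for two empty sets are assumptions, not taken from the paper:

```python
def dice_similarity(terms_a, terms_b):
    """Sørensen-Dice similarity between two sets of ontology terms,
    e.g. the dermoscopic characteristics cited by a clinician and by
    the XAI for the same image: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # assumption: two empty explanations count as full agreement
    return 2 * len(a & b) / (len(a) + len(b))
```

The same coefficient applied to binary pixel masks instead of term sets gives the region-of-interest overlap reported in Fig. 3d.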
