J Eur Acad Dermatol Venereol. 2025 Aug;39(8):1489-1499.
doi: 10.1111/jdv.20479. Epub 2024 Dec 8.

Effect of patient-contextual skin images in human- and artificial intelligence-based diagnosis of melanoma: Results from the 2020 SIIM-ISIC melanoma classification challenge

Nicholas R Kurtansky et al. J Eur Acad Dermatol Venereol. 2025 Aug.

Abstract

Background: While the high accuracy of reported AI tools for melanoma detection is promising, the lack of holistic consideration of the patient is often criticized. Along with medical history, a dermatologist would also consider intra-patient nevi patterns, such that nevi that are different from others on a given patient are treated with suspicion.

Objective: To evaluate whether patient-contextual lesion-images improve diagnostic accuracy for melanoma in a dermoscopic image-based AI competition and a human reader study.

Methods: An international online AI competition was held in 2020. The task was to classify dermoscopy images as melanoma or benign lesions. A multi-source dataset of dermoscopy images grouped by patient was provided, and additional use of public datasets was permitted. Competitors were judged on area under the receiver operating characteristic curve (AUROC) on a private leaderboard. Concurrently, a human reader study was hosted using a subset of the test data. Participants gave their initial diagnosis of an index case (melanoma vs. benign) and were then presented with seven additional lesion-images of that patient before giving a second prediction of the index case. Outcome measures were sensitivity and specificity.
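As a minimal sketch of the competition metric, AUROC can be read as the probability that a randomly chosen melanoma image receives a higher score than a randomly chosen benign image, with ties counted as one half. The function name `auroc` and the toy labels and scores below are illustrative assumptions; they are not challenge data or the organisers' scoring code.

```python
def auroc(labels, scores):
    """AUROC as P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # melanoma ranked above benign
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = melanoma, 0 = benign.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.5, 0.1]
print(round(auroc(labels, scores), 3))
```

The pairwise form is quadratic in the number of images, so it is only meant to make the definition concrete; a rank-based implementation would be preferable at the scale of a full test set.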

Results: The top 50 of 3308 AI competition entries achieved AUROC scores ranging from 0.943 to 0.949. Few algorithms considered intra-patient lesion patterns; most evaluated images independently. The median sensitivity and specificity of human readers before receiving contextual images were 60.0% and 86.7%, and after were 60.0% and 85.7%. Human and AI algorithm performance varied by image source.
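The reader-study summary above reports medians across readers. A minimal sketch of that aggregation, assuming each reader's sensitivity and specificity are computed first and the median is then taken across readers, might look as follows. The reader names, the shared ground truth and the calls are hypothetical, not study data.

```python
from statistics import median


def sensitivity_specificity(truth, calls):
    """Return (sensitivity, specificity) for binary calls; 1 = melanoma."""
    tp = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 1)
    fn = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 0)
    tn = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 0)
    fp = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ground truth and reader calls (assumed: all readers saw the same cases).
truth = [1, 1, 0, 0, 0]
readers = {
    "reader_a": [1, 0, 0, 0, 1],
    "reader_b": [1, 1, 0, 1, 0],
    "reader_c": [0, 0, 0, 0, 0],
}

per_reader = [sensitivity_specificity(truth, calls) for calls in readers.values()]
median_sens = median(s for s, _ in per_reader)
median_spec = median(sp for _, sp in per_reader)
print(median_sens, median_spec)
```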

Conclusions: This study provided an open-source state-of-the-art algorithm for melanoma detection that has been evaluated at multiple centres. Patient-contextual images did not positively impact performance of AI algorithms or human readers. Providing seven contextual images and no total body image may have been insufficient to test the applicability of the intra-patient lesion patterns.

Conflict of interest statement

HPS is a shareholder of MoleMap NZ Limited and e-derm consult GmbH and undertakes regular teledermatological reporting for both companies. HPS is a Medical Consultant for Canfield Scientific Inc. and a Medical Advisor for First Derm. HPS received a NHMRC Synergy Grant (2009923) and receives research funding from the Australian Cancer Research Foundation and NHMRC Centre of Research Excellence (2006551). VR is funded by the Melanoma Research Alliance, has a contract with Lutris Pharma, receives research support from Kaggle and AWS, receives consulting fees and stock options from Inhabit Brands Inc. and sits on both the AAD AUI committee and SIIM Board. AH receives consulting fees from Canfield Scientific Inc. and SciBase, participates on an advisory board for Janssen, participates in the Organizing Committee of the International Skin Imaging Society, is Vice President of the Skin Cancer Foundation and is co-founder of SpotDoc. NRK, AH, JW and VR receive research funding through the MSKCC Cancer Center Support Grant P30 CA008748. BBS anticipates employment with Canfield Scientific, Inc. PG has received honoraria from MetaOptima and TPY. JK was awarded a grant from the Melanoma Research Alliance, received consulting fees from The Skin Diary Ltd. and SharkNinja LLC, received personal payment from Beiersdorf LTD and Leo Pharmaceuticals LTD, and received travel support from Almirall Pharmaceuticals LTD. HK received royalties from CASIO, Heine and MetaOptima; received consulting fees from FotoFinder and AI Medical Technology; received payment or honoraria from FotoFinder, Pelpharma, La Roche Posay, Eli Lilly, Novartis and MSD; holds a leadership position at the International Dermoscopy Society; and received equipment from FotoFinder, Heine, and CASIO. JP is co-founder of Athena Tech, a scientific advisor of Dermavision and chairman of the EADV Task Force of Artificial Intelligence. PT received a grant/contract from Lilly; payment/honoraria for lectures/presentations from FotoFinder, Novartis, Lilly and AbbVie; and holds a leadership position at the International Dermoscopy Society. CAP, MC, KL and CR have no COI to report.

Figures

FIGURE 1
Two still frames from the user-interface of the reader study platform. Panel (a) shows that participants were given one image to analyse and predict whether it was a melanoma (single image assessment). Panel (b) shows the addition of the seven contextual lesion-images from the same patient, at which time the participants were again asked to predict whether the index lesion was a melanoma (enhanced-contextual assessment). Each contextual image could be magnified to the size of the case in question.
FIGURE 2
Distribution of median image risk scores for the top-50 placing teams in the 2020 ISIC Challenge. The median predicted risk score (in terms of ranked percentile among the entire test set) was derived for diagnostic class-specific subgroups independently for each AI algorithm. Points represent an individual challenge submission. Blue points represent the median among benign images and red points represent the median among melanoma images. The centre horizontal line within each boxplot represents the median. The lower and upper hinges of each box represent the first and third quartiles (Q1 and Q3). The upper end of each whisker represents the more extreme value between the largest observed value and Q3 + 1.5*IQR and the lower end of each whisker represents the more extreme value between the smallest observed value and Q1 − 1.5*IQR, where IQR is the interquartile range.
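The quantity plotted in Figure 2 can be sketched as follows, under two stated assumptions: a score's ranked percentile is taken as 100 × rank / N within the whole test set (tied scores share the average rank), and the median is then computed separately for benign and melanoma images. The caption does not specify the exact percentile formula, so `percentile_ranks` and the toy scores are illustrative only.

```python
from statistics import median


def percentile_ranks(scores):
    """Percentile rank (0-100] of each score within the full set; ties share the average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the group of scores tied with scores[order[i]].
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [100.0 * r / n for r in ranks]


# Hypothetical output of one algorithm on a four-image test set: 1 = melanoma, 0 = benign.
labels = [0, 1, 0, 1]
scores = [0.1, 0.8, 0.3, 0.9]

pct = percentile_ranks(scores)
median_benign = median(p for p, y in zip(pct, labels) if y == 0)
median_melanoma = median(p for p, y in zip(pct, labels) if y == 1)
print(median_benign, median_melanoma)
```

Repeating this for each submission would give one benign point and one melanoma point per algorithm, which is the pair of distributions the boxplots summarise.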
FIGURE 3
Frequency of human reader shift in both specificity and sensitivity from single image assessment to enhanced contextual assessment. The size of each circle corresponds to the frequency presented in the label. The bottom left represents readers who experienced more than a 5% decrease in both sensitivity and specificity. The upper right represents readers who experienced at least a 5% increase in both metrics. The centre circle represents readers whose two metrics changed within the range of −5% to 5%.
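The grid in Figure 3 amounts to binning each reader's change in sensitivity and in specificity into three bands. A minimal sketch is given below. It assumes changes are measured in percentage points and that a change of exactly ±5 falls in the middle band; the caption does not pin down the boundary handling, and the function names and reader values are hypothetical.

```python
from collections import Counter


def bin_change(delta, threshold=5.0):
    """Classify a change in percentage points as 'decrease', 'stable' or 'increase'."""
    if delta > threshold:
        return "increase"
    if delta < -threshold:
        return "decrease"
    return "stable"


def shift_cell(sens_before, sens_after, spec_before, spec_after):
    """Return the (sensitivity band, specificity band) cell for one reader."""
    return (
        bin_change(sens_after - sens_before),
        bin_change(spec_after - spec_before),
    )


# Hypothetical readers: (sens_before, sens_after, spec_before, spec_after), in percent.
readers = [
    (60.0, 70.0, 86.7, 80.0),
    (60.0, 60.0, 86.7, 85.7),
    (50.0, 40.0, 90.0, 80.0),
]

counts = Counter(shift_cell(*r) for r in readers)
for cell, n in counts.items():
    print(cell, n)
```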
FIGURE 4
Human false-positive (1-specificity) versus sensitivity rates plotted in relation to the receiver operating characteristic curve achieved by the winning algorithm across the 388 index cases of the reader study. Large, filled circles represent the median sensitivity and median specificity of human readers. Large hollow circles represent the mean sensitivity and mean specificity of human readers. Small dots represent individuals. Blue points correspond to accuracy when considering the single image and orange points correspond to accuracy after viewing the seven patient-contextual lesion-images during the enhanced-contextual assessment. Error bars extending from the mean reader estimates represent the 95% confidence intervals for sensitivity and false-positive rates. Points falling below the ROC curve indicate worse performance than the AI algorithm. Similarly, points falling above the ROC curve indicate better performance than the AI algorithm.
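The above/below comparison in Figure 4 can be sketched by building an algorithm's ROC curve and checking whether a reader's (false-positive rate, sensitivity) point lies above or below it. The sketch assumes linear interpolation between ROC points; `roc_points`, `tpr_at`, `compare_reader` and the toy scores are illustrative assumptions, not the study's analysis code.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) points from (0, 0) to (1, 1), one per distinct score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every image scored at this threshold before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def tpr_at(points, fpr):
    """Linearly interpolate the curve's sensitivity at a given false-positive rate."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                best = max(best, y1)  # vertical segment: take its upper end
            else:
                best = max(best, y0 + (y1 - y0) * (fpr - x0) / (x1 - x0))
    return best


def compare_reader(points, reader_fpr, reader_sens):
    """Say whether a reader's operating point is above, below or on the ROC curve."""
    curve = tpr_at(points, reader_fpr)
    if reader_sens > curve:
        return "above"
    if reader_sens < curve:
        return "below"
    return "on"


# Hypothetical algorithm scores: 1 = melanoma, 0 = benign.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)

print(compare_reader(points, reader_fpr=0.2, reader_sens=0.8))
print(compare_reader(points, reader_fpr=0.5, reader_sens=0.9))
```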
