NPJ Digit Med. 2023 Jul 12;6(1):127.
doi: 10.1038/s41746-023-00872-1.

Prospective validation of dermoscopy-based open-source artificial intelligence for melanoma diagnosis (PROVE-AI study)

Michael A Marchetti et al. NPJ Digit Med.

Abstract

The use of artificial intelligence (AI) has the potential to improve the assessment of lesions suspicious of melanoma, but few clinical studies have been conducted. We validated the accuracy of an open-source, non-commercial AI algorithm for melanoma diagnosis and assessed its potential impact on dermatologist decision-making. We conducted a prospective, observational clinical study to assess the diagnostic accuracy of the AI algorithm (ADAE) in predicting melanoma from dermoscopy skin lesion images. The primary aim was to assess the reliability of ADAE's sensitivity at a predefined threshold of 95%. Patients who had consented for a skin biopsy to exclude melanoma were eligible. Dermatologists also estimated the probability of melanoma and indicated management choices before and after real-time exposure to ADAE scores. All lesions underwent biopsy. Four hundred thirty-five participants were enrolled and contributed 603 lesions (95 melanomas). Participants had a mean age of 59 years, 54% were female, and 96% were White individuals. At the predetermined 95% sensitivity threshold, ADAE had a sensitivity of 96.8% (95% CI: 91.1-98.9%) and specificity of 37.4% (95% CI: 33.3-41.7%). The dermatologists' ability to assess melanoma risk significantly improved after ADAE exposure (AUC 0.7798 vs. 0.8161, p = 0.042). Post-ADAE dermatologist decisions also had equivalent or higher net benefit compared to biopsying all lesions. We validated the accuracy of an open-source melanoma AI algorithm and showed its theoretical potential for improving dermatology experts' ability to evaluate lesions suspicious of melanoma. Larger randomized trials are needed to fully evaluate the potential of adopting this AI algorithm into clinical workflows.
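The reported confidence intervals can be checked with a short calculation. The sketch below is illustrative only: the abstract does not state which interval method was used, so the Wilson score interval is an assumption, and the counts 92/95 (melanomas flagged) and 190/508 (non-melanomas not flagged) are back-calculated from the reported percentages and lesion totals rather than taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed counts, inferred from the abstract (not stated in it):
# 95 melanomas at 96.8% sensitivity -> 92 true positives;
# 603 - 95 = 508 non-melanomas at 37.4% specificity -> 190 true negatives.
sens_lo, sens_hi = wilson_interval(92, 95)
spec_lo, spec_hi = wilson_interval(190, 508)

print(f"sensitivity CI: {sens_lo:.1%} to {sens_hi:.1%}")
print(f"specificity CI: {spec_lo:.1%} to {spec_hi:.1%}")
```

Under these assumed counts the intervals round to 91.1-98.9% and 33.3-41.7%, which is consistent with the reported values but does not confirm that the authors used this method.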

Conflict of interest statement

V.R. is an expert advisor for Inhabit Brands. A.H. consults for Canfield Scientific Inc. and Janssen Research and Development, has ownership/equity interests in HCW Health LLC, SKIP Derm LLC, and has a fiduciary role/position and intellectual rights in SKIP Derm LLC. A.M. reports receiving honorarium from Canfield Scientific Inc. All other authors declare no competing interests.

Figures

Fig. 1. Concordance of expected sensitivity with observed sensitivity.
Observed (vertical axis) sensitivity (blue) and specificity (red) at various expected sensitivity thresholds (horizontal axis). A perfect concordance between observed and expected sensitivity is represented by the solid black line. Deviations of the blue line below the black line (observed sensitivity less than expected sensitivity) and above the black line (observed sensitivity greater than expected sensitivity) suggest loss of concordance. At the predetermined 95% sensitivity threshold (the study’s prespecified primary endpoint), ADAE had a sensitivity of 96.8% (95% CI: 91.1–98.9%) and specificity of 37.4% (95% CI: 33.3–41.7%).
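The idea of comparing expected with observed sensitivity can be sketched as follows: a score cutoff is fixed in advance on a reference set so that a target fraction of reference melanomas is flagged, and that same cutoff is then applied to new lesions. This is a minimal illustration, not the study's procedure; the function names and all scores are hypothetical.

```python
import math


def cutoff_for_sensitivity(reference_melanoma_scores, target):
    """Highest cutoff that still flags at least `target` of the reference melanomas."""
    ordered = sorted(reference_melanoma_scores, reverse=True)
    # Number of reference melanomas that must score at or above the cutoff.
    k = math.ceil(round(target * len(ordered), 9))
    return ordered[k - 1]


def observed_sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when lesions scoring >= cutoff are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reference melanoma scores (higher = more likely melanoma).
reference = list(range(1, 21))
cutoff = cutoff_for_sensitivity(reference, 0.95)  # expected sensitivity: 95%

# Hypothetical prospective lesions: 1 = melanoma, 0 = non-melanoma.
study_scores = [5, 1, 3, 0, 2, 7]
study_labels = [1, 1, 0, 0, 0, 1]
sens, spec = observed_sens_spec(study_scores, study_labels, cutoff)
```

Repeating this for several expected sensitivities and plotting observed against expected values gives a concordance curve of the kind the caption describes.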
Fig. 2. ADAE score distribution by treating dermatologist and pathologic diagnosis.
ADAE scores (y-axis) for individual study lesions (yellow dots and blue triangles) are shown for melanomas (left panel) and non-melanomas (right panel), stratified by study dermatologist (x-axis). Higher ADAE scores (more likely melanoma) are closer to 0 and lower ADAE scores (less likely melanoma) are increasingly negative. The horizontal line is the prespecified 95% sensitivity threshold. Yellow dots represent lesions with an unequivocal benign or malignant diagnosis. Blue triangles represent lesions with a borderline pathology that was adjudicated to be melanoma or non-melanoma. The center line within each boxplot represents the median. The lower and upper hinges of the box represent the first and third quartiles (Q1 and Q3). The upper end of each whisker represents the more extreme value between the largest observed value and Q3 + 1.5 * IQR and the lower end of each whisker represents the more extreme value between the smallest observed value and Q1 − 1.5 * IQR, where IQR is the interquartile range.
Fig. 3. Decision curve plotting decrease in avoidable interventions against threshold probability.
The binary management decision is the dermatologist biopsy decision (yes/no) after exposure to AI results. The risk probability estimate is the dermatologist’s predicted melanoma probability (0–100%) after exposure to AI results. The default strategy (biopsy all lesions) is represented by the x-axis (net avoidable interventions of 0). For example, at a threshold probability of 5% (meaning that the harm of missing 1 melanoma is equivalent to the harm of 19 unnecessary benign skin biopsies), exposing dermatologists to ADAE results would theoretically be the equivalent of a strategy that reduced the number of unnecessary biopsies by about 15–20 per 100 without missing a biopsy for any patient with melanoma. All 22 histopathologic keratinocyte carcinomas were excluded from decision curve analyses because they are not viewed as equivalent to benign skin lesions.
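The arithmetic behind the 5% example can be made concrete. The sketch below uses the standard decision-curve definitions of net benefit and net reduction in interventions; it is not the authors' code, and the lesion counts are hypothetical values chosen only to show how a figure in the reported range could arise.

```python
def harm_ratio(threshold):
    """Unnecessary biopsies judged equal in harm to one missed melanoma."""
    return (1 - threshold) / threshold


def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit: true positives minus false positives weighted by the threshold odds."""
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_reduction_per_100(true_pos, false_pos, n_melanoma, n_benign, threshold):
    """Net avoidable biopsies per 100 lesions, relative to biopsying every lesion."""
    n = n_melanoma + n_benign
    nb_strategy = net_benefit(true_pos, false_pos, n, threshold)
    nb_biopsy_all = net_benefit(n_melanoma, n_benign, n, threshold)
    return (nb_strategy - nb_biopsy_all) / (threshold / (1 - threshold)) * 100


# A 5% threshold corresponds to trading 1 missed melanoma for 19 benign biopsies.
ratio = harm_ratio(0.05)

# Hypothetical cohort: 10 melanomas and 90 benign lesions per 100.
# The strategy biopsies all 10 melanomas but only 72 of the 90 benign lesions.
reduction = net_reduction_per_100(
    true_pos=10, false_pos=72, n_melanoma=10, n_benign=90, threshold=0.05
)
```

When no melanoma is missed, the net reduction equals the number of benign biopsies avoided per 100 lesions (18 in this hypothetical case). Each missed melanoma would subtract 19 from that figure at a 5% threshold.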
Fig. 4. ADAE web-app interface.
Example of image upload results. (a) Dermoscopy image chosen by the dermatologist. (b) The ADAE score is shown as the vertical black line, calculated as the log-average of the 18 models (90 folds). Red (melanoma) and blue (benign lesions) dots were from the 2020 SIIM-ISIC Melanoma Challenge, with scores calculated as the average of the 18 models (90 folds). Higher scores (more likely melanoma) were to the right and lower scores (more likely benign) were to the left. The horizontal red (melanoma) and blue (benign lesions) lines show the distributions of ADAE scores for these diagnostic classes. Scores were spline transformed so the 95% sensitivity of the raw average of the 18 models was displayed in the center (gray vertical line). Users could adjust the interface to visualize different sensitivity thresholds (95%, 90%, 85%, and 80%) and different data test sets (red and blue dots) from the SIIM-ISIC Melanoma Challenge (MSK only vs. all 6 sites). (c) Saliency map showing the spatial support for melanoma prediction with yellow color indicating more likely melanoma and blue color indicating more likely not melanoma.
