J Invest Dermatol. 2023 Aug;143(8):1423-1429.e1.
doi: 10.1016/j.jid.2022.08.058. Epub 2023 Feb 18.

Improving Artificial Intelligence-Based Diagnosis on Pediatric Skin Lesions

Paras P Mehta et al. J Invest Dermatol. 2023 Aug.

Abstract

Artificial intelligence (AI) algorithms that classify melanoma depend on their training data, which limits their generalizability. The objective of this study was to compare the performance of an AI model trained on a standard adult-predominant dermoscopic dataset before and after the addition of pediatric training images. Performance was compared on held-out adult and pediatric test sets. We trained two models: one (model A) on an adult-predominant dataset (37,662 images from the International Skin Imaging Collaboration [ISIC]) and the other (model A+P) on the same dataset plus an additional 1,536 pediatric images. We compared the two models' performance on the adult and pediatric held-out test images separately using the area under the receiver operating characteristic curve (AUC). We then used Gradient-weighted Class Activation Maps (Grad-CAM) and background skin masking to understand the relative contributions of the lesion and the background skin to the algorithm's decision making. Adding images from a pediatric population with different epidemiological and visual patterns to current reference standard datasets improved algorithm performance on pediatric images without diminishing performance on adult images. This suggests one way in which dermatologic AI models can be made more generalizable. The presence of background skin was important to the pediatric-specific improvement seen between models. Our study highlights the importance of carefully curated and labeled data from diverse inputs for improving the generalizability of AI models in dermatology, here applied to dermoscopic images of adult and pediatric lesions to improve melanoma detection.
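
The comparison above rests on the AUC, computed separately for the adult and the pediatric held-out sets. The sketch below is an illustration, not the authors' code: it computes AUC with the rank (Mann-Whitney) formulation, and the names `roc_auc` and `compare_models`, as well as the nested-dictionary layout of the scores, are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive (label 1,
    e.g. melanoma) is scored above a randomly chosen negative (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def compare_models(
    test_sets: Dict[str, Sequence[int]],
    model_scores: Dict[str, Dict[str, Sequence[float]]],
) -> Dict[Tuple[str, str], float]:
    """Hypothetical helper: AUC for every (model, test set) pair, so that
    adult and pediatric performance are reported separately."""
    results = {}
    for model_name, scores_by_set in model_scores.items():
        for set_name, labels in test_sets.items():
            results[(model_name, set_name)] = roc_auc(labels, scores_by_set[set_name])
    return results


if __name__ == "__main__":
    # Toy labels and scores for illustration only; these are not study data.
    tests = {"adult": [0, 0, 1, 1], "pediatric": [0, 1, 0, 1]}
    scores = {
        "A": {"adult": [0.1, 0.4, 0.35, 0.8], "pediatric": [0.6, 0.5, 0.3, 0.7]},
        "A+P": {"adult": [0.1, 0.3, 0.35, 0.8], "pediatric": [0.2, 0.6, 0.3, 0.7]},
    }
    for key, auc in compare_models(tests, scores).items():
        print(key, round(auc, 3))
```

The pairwise loop costs O(n_pos × n_neg), which is acceptable for a sketch; a sort-based rank computation gives the same value faster on large test sets.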

Conflict of interest statement

Conflicts of Interest:

AH is a shareholder of Skip Derm, LLC. AH has provided services for Janssen, SciBase, Canfield Scientific, and HealthCert International. HPS is a shareholder of MoleMap NZ Limited and e-derm consult GmbH, and undertakes regular teledermatological reporting for both companies. HPS is a Medical Consultant for Canfield Scientific Inc, Blaze Bioscience Inc, MoleMap Australia Pty Ltd, and Revenio Research Oy, and a Medical Advisor for First Derm. VR is an expert consultant and expert advisor for Inhabit Brands, Inc.

Figures

Figure 1:
Comparison between models trained on Adult (A) and Adult+Pediatric (A+P) images. This figure shows the Receiver Operating Characteristic (ROC) curves for all 5 versions of Model A and Model A+P on pediatric and adult images. Figures 1a, b: ROC curves showing the performance of Models A and A+P on adult images. Figures 1c, d: ROC curves showing the performance of Models A and A+P on pediatric images.
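
Each panel in Figure 1 overlays one ROC curve per trained version. A minimal sketch of how such curves and a per-model summary could be computed, assuming each version outputs one melanoma score per image. The functions `roc_points`, `trapezoid_auc`, and `summarize_versions` are hypothetical, and summarizing the 5 versions by mean and standard deviation of AUC is an assumption rather than something the caption states.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the decision threshold through
    every distinct score, starting from (0, 0) and ending at (1, 1)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative examples")
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Examples tied at this threshold cross it together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def summarize_versions(labels: Sequence[int], scores_per_version: Sequence[Sequence[float]]):
    """Mean and standard deviation of AUC across independently trained
    versions of one model (hypothetical summary)."""
    aucs = [trapezoid_auc(roc_points(labels, s)) for s in scores_per_version]
    return mean(aucs), (stdev(aucs) if len(aucs) > 1 else 0.0)
```

Treating tied scores as a single threshold step gives the diagonal segment that makes the trapezoidal area agree with the rank-based AUC.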
Figure 2:
Comparison of image weighting between differently trained models. Gradient-weighted Class Activation Maps of Model A (trained on adult images from the ISIC archive) and Model A+P (trained on adult images from the ISIC archive plus our original pediatric dataset), where increasing intensity of yellow indicates how heavily that area was weighted in the algorithm’s decision making. 2a) An image classified by Model A mostly on the basis of the background skin (low LAR), compared with 2e) Model A+P classifying it mostly on the basis of lesion characteristics (high LAR). 2b) An image classified with very little overall activation by Model A, compared with 2f) activation mostly by background skin (low LAR) in Model A+P. 2c/2g) An image classified mainly on the basis of its lesion by both models (high LAR). 2d/2h) An image whose background skin remained the most informative contributor to classification, although the two models weighted different parts of the background skin (low LAR). 2i/2j) Parallel plot representations of 200 random pediatric and adult images, showing each image's change in LAR between Model A and Model A+P (blue) and the average change in LAR (orange). Although the average change in LAR across all lesions is small, many lesions showed a large change in LAR between the two models, as shown by the varying slopes of the blue lines. This indicates that Model A+P often “looks at” different regions than Model A across many images.
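
The caption ranks images by LAR but does not define it. A minimal sketch, assuming LAR means the share of Grad-CAM activation that falls inside a lesion segmentation mask, and that the class activation map has already been resized to the mask's resolution. The Grad-CAM step follows the standard formulation (channel weights are spatially averaged gradients of the class score, followed by a ReLU of the weighted sum of activation maps); the function names and the LAR definition are assumptions, not the authors' code.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """activations, gradients: arrays of shape (K, H, W) for one image, taken
    from the chosen convolutional layer; gradients are d(class score)/d(activations)."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k: global-average-pooled gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A^k -> shape (H, W)
    return np.maximum(cam, 0.0)                       # ReLU keeps only positive evidence


def lesion_activation_ratio(cam: np.ndarray, lesion_mask: np.ndarray) -> float:
    """Hypothetical LAR: fraction of total activation lying inside the lesion.
    Near 1 means the decision used mostly the lesion; near 0, mostly background."""
    total = cam.sum()
    if total == 0:
        return float("nan")  # no activation at all (compare panel 2b)
    return float(cam[lesion_mask.astype(bool)].sum() / total)


def lar_changes(lars_model_a, lars_model_ap):
    """Per-image change in LAR between models (the slopes in a parallel plot)
    and its average across images."""
    deltas = np.asarray(lars_model_ap, dtype=float) - np.asarray(lars_model_a, dtype=float)
    return deltas, float(deltas.mean())
```

This also illustrates how a small average change can hide large per-image changes: opposite shifts of equal size cancel in the mean.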
Figure 3:
The effect of background skin on AI accuracy. ROC curves of Models A and A+P on images with and without background skin. Figures 3a, b: performance on adult images with the background masked. Figures 3c, d: performance on similarly masked pediatric images. Removing the background skin from the images decreased performance on both adult and pediatric images.
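
Figure 3 re-scores the test images after removing background skin. A minimal sketch of that masking step, assuming a binary lesion segmentation mask is available for each image and that background pixels are replaced with a constant (black by default). The caption does not say how the masks were produced or which fill value was used, so `mask_background` and its `fill_value` parameter are hypothetical.

```python
import numpy as np


def mask_background(image: np.ndarray, lesion_mask: np.ndarray, fill_value=0) -> np.ndarray:
    """image: (H, W, C) array; lesion_mask: (H, W) array, nonzero inside the lesion.
    Returns a new array in which every background (non-lesion) pixel is set to
    fill_value; the input image is left unchanged."""
    if image.shape[:2] != lesion_mask.shape:
        raise ValueError("lesion_mask must match the image's spatial size")
    keep = lesion_mask.astype(bool)[..., np.newaxis]  # (H, W, 1) broadcasts over channels
    return np.where(keep, image, np.asarray(fill_value, dtype=image.dtype))
```

The masked images would then be passed through Models A and A+P and scored with the same ROC procedure as the unmasked images, so that any drop in AUC can be attributed to the missing background.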
Figure 4:
Two-dimensional visualization of the embedding spaces of Model A, trained only on adult images, and Model A+P, trained on adult and pediatric images. Model A+P separates adult and pediatric images into more distinct clusters than Model A does.
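
The caption does not state which dimensionality-reduction method produced the 2D view. A minimal sketch that swaps in principal component analysis (via NumPy's SVD) because it is deterministic and needs no extra libraries, together with a hypothetical `separation_score` (centroid distance divided by mean within-group spread) as one crude way to quantify "more distinct clusters". Both function names and the score are assumptions for illustration, not the authors' analysis.

```python
import numpy as np


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project (N, D) embeddings (N >= 2, D >= 2) onto their top two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def separation_score(points_2d: np.ndarray, is_pediatric) -> float:
    """Hypothetical measure: distance between the adult and pediatric centroids
    divided by the mean distance of points to their own group's centroid.
    Larger values mean the two groups form more distinct clusters."""
    is_ped = np.asarray(is_pediatric, dtype=bool)
    c_ped = points_2d[is_ped].mean(axis=0)
    c_adult = points_2d[~is_ped].mean(axis=0)
    spread = np.concatenate([
        np.linalg.norm(points_2d[is_ped] - c_ped, axis=1),
        np.linalg.norm(points_2d[~is_ped] - c_adult, axis=1),
    ]).mean()
    return float(np.linalg.norm(c_ped - c_adult) / spread)
```

Computing the score on the penultimate-layer embeddings of each model would turn the visual impression in Figure 4 into a single number per model, with the caveat that a 2D projection can distort distances present in the full space.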
