JAMA Dermatol. 2023 Nov 1;159(11):1223-1231. doi: 10.1001/jamadermatol.2023.3521.

Generation of a Melanoma and Nevus Data Set From Unstandardized Clinical Photographs on the Internet

Soo Ick Cho et al.

Abstract

Importance: Artificial intelligence (AI) training for diagnosing dermatologic images requires large amounts of clean data. Dermatologic images vary widely in composition, and many are inaccessible owing to privacy concerns; both factors hinder the development of AI.

Objective: To build a training data set for discriminative and generative AI from unstandardized internet images of melanoma and nevus.

Design, setting, and participants: In this diagnostic study, a total of 5619 (CAN5600 data set) and 2006 (CAN2000 data set; a manually revised subset of CAN5600) cropped lesion images of either melanoma or nevus were semiautomatically annotated from approximately 500 000 photographs on the internet using convolutional neural networks (CNNs), region-based CNNs, and large mask inpainting. For unsupervised pretraining, 132 673 possible lesion images (LESION130k data set) were also collected from 18 482 websites in approximately 80 countries to ensure diversity. A total of 5000 synthetic images (GAN5000 data set) were generated using a generative adversarial network (StyleGAN2-ADA; training, CAN2000 data set; pretraining, LESION130k data set).
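The semiautomatic annotation described above is a cascade of filters: detect candidate lesions, keep only adequate-quality crops, then keep only crops classified as melanoma or nevus. The sketch below illustrates that cascade only; the three model calls are hypothetical stubs standing in for the study's CNNs (ModelDerm's blob detector, fine image selector, and disease classifier), which are not reproduced here, and the quality threshold is an arbitrary placeholder.

```python
def detect_lesions(photo):
    # Stand-in for the region-based CNN (blob detector): returns candidate crops.
    return photo["candidate_crops"]

def is_fine_image(crop):
    # Stand-in for the fine image selector: rejects low-quality crops.
    return crop["quality"] >= 0.5  # placeholder threshold

def classify(crop):
    # Stand-in for the disease classifier.
    return crop["label"]

def annotate(photos):
    # Cascade: detection -> quality filter -> diagnosis filter.
    kept = []
    for photo in photos:
        for crop in detect_lesions(photo):
            if is_fine_image(crop) and classify(crop) in {"melanoma", "nevus"}:
                kept.append(crop)
    return kept

photos = [{"candidate_crops": [
    {"quality": 0.9, "label": "melanoma"},
    {"quality": 0.2, "label": "nevus"},                 # rejected: low quality
    {"quality": 0.8, "label": "seborrheic keratosis"},  # rejected: other diagnosis
]}]
```

Of three candidate crops in this toy input, only the high-quality melanoma crop survives both filters.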

Main outcomes and measures: The area under the receiver operating characteristic curve (AUROC) for determining malignant neoplasms was analyzed. In each test, 1 of the 7 preexisting public data sets (2312 images in total; Edinburgh, an SNU subset, Asan test, Waterloo, 7-point criteria evaluation, PAD-UFES-20, and MED-NODE) served as the test data set. The performance of an EfficientNet Lite0 CNN trained on the proposed data sets was then compared with that of the same network trained on the remaining 6 preexisting data sets.
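The study's primary metric, AUROC, can be illustrated by its rank interpretation: the probability that a randomly chosen malignant case receives a higher model score than a randomly chosen benign one. The labels and scores below are hypothetical, not taken from the study.

```python
def auroc(y_true, y_score):
    # AUROC = fraction of (positive, negative) pairs ranked correctly,
    # counting ties as half-correct.
    pos = [s for y, s in zip(y_true, y_score) if y == 1]  # melanoma scores
    neg = [s for y, s in zip(y_true, y_score) if y == 0]  # nevus scores
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 1 = melanoma, 0 = nevus; scores are a model's predicted malignancy risk.
y_true  = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.3, 0.6, 0.4, 0.2, 0.1]
value = auroc(y_true, y_score)  # 8 of 9 melanoma/nevus pairs ranked correctly
```

An AUROC of 1.0 means every melanoma outscores every nevus; 0.5 is chance-level ranking.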

Results: The EfficientNet Lite0 CNN trained on the annotated or synthetic images achieved mean (SD) AUROCs higher than or equivalent to those of the same network trained on the pathologically confirmed preexisting data sets combined (0.809 [0.063]): 0.874 (0.042) for CAN5600 (P = .02), 0.848 (0.027) for CAN2000 (P = .08), and 0.838 (0.040) for GAN5000 (P = .31; Wilcoxon signed rank test), reflecting the benefit of the larger training data sets.
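With only 7 paired AUROC values (one per test data set), the Wilcoxon signed rank test used above admits an exact small-sample computation, sketched below in pure Python. The paired AUROCs are hypothetical stand-ins for two models' per-test-set results, not the study's numbers, and the helper assumes no zero differences or tied absolute differences.

```python
from itertools import product

def wilcoxon_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples
    (assumes no zero differences and no tied absolute differences)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    # Rank the absolute differences from smallest (1) to largest (n).
    order = sorted(range(n), key=lambda i: abs(d[i]))
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    # Test statistic: sum of ranks of the positive differences.
    w_plus = sum(rank[i] for i in range(n) if d[i] > 0)
    # Under the null, each rank carries a + or - sign with probability 1/2;
    # enumerate all 2^n sign assignments for the exact distribution of W+.
    dist = [sum((i + 1) * s for i, s in enumerate(signs))
            for signs in product((0, 1), repeat=n)]
    p_le = sum(w <= w_plus for w in dist) / len(dist)
    p_ge = sum(w >= w_plus for w in dist) / len(dist)
    return w_plus, min(1.0, 2 * min(p_le, p_ge))

# Hypothetical per-test-set AUROCs for two models over 7 test sets.
auroc_a = [0.90, 0.85, 0.88, 0.80, 0.92, 0.87, 0.84]
auroc_b = [0.85, 0.81, 0.86, 0.77, 0.91, 0.81, 0.855]
w, p = wilcoxon_exact(auroc_a, auroc_b)
```

Because the test is paired across the 7 shared test sets, it compares models on identical data splits rather than pooling AUROCs.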

Conclusions and relevance: The synthetic data set in this diagnostic study was created using various AI technologies from internet images. A neural network trained on the created data set (CAN5600) performed better than the same network trained on preexisting data sets combined. Both the annotated (CAN5600 and LESION130k) and synthetic (GAN5000) data sets could be shared for AI training and consensus between physicians.


Conflict of interest statement

Conflict of Interest Disclosures: Dr S. I. Cho reported being employed by Lunit Inc and owning stock and stock options in Lunit Inc. Dr Daneshjou reported receiving personal fees from DWA, Pfizer Inc, L'Oreal SA, and VisualDx and stock options from Revea and MDalgorithms outside the submitted work. Dr Kim reported receiving nonfinancial support from the Basic Science Research Program through the National Research Foundation of Korea during the conduct of the study. Dr Han reported being the founder, chief executive officer, and chief technology officer for IDerma Inc during the conduct of the study. No other disclosures were reported.

Figures

Figure 1. Illustration of the Use of Various Convolutional Neural Networks (CNNs) for Creating the Synthetic and Morphed Images
Possible lesions were detected using a region-based CNN (R-CNN) (blob detector; Model Dermatology [ModelDerm]). Lesional images with adequate quality were selected using a CNN (fine image selector; ModelDerm). Lesions that were either melanoma (MEL) or nevus (N) were selected using a CNN (disease classifier; ModelDerm). Artifacts were removed using large mask inpainting (LaMa; yellow star). For generating the synthetic and morphed images, a generative adversarial network (GAN; StyleGAN2-ADA) was used. CAN2000 and CAN5600 indicate clinical photographs annotated by neural networks with 2006 and 5619 images, respectively. The inset images were obtained from Wikipedia, the National Cancer Institute, and Pix4Free.org, and the original sources were as follows: https://commons.wikimedia.org/wiki/File:Photograph_of_lentigo_maligna_melanoma.jpg; https://commons.wikimedia.org/wiki/File:519_Melanoma.jpg; https://commons.wikimedia.org/wiki/File:Sebaceous_cyst01.jpg; https://commons.wikimedia.org/wiki/File:Malignant_Melanoma_in_situ_Left_Forearm.jpg; https://en.wikipedia.org/wiki/Melanoma#/media/File:Malignant_Melanoma_right_medial_thigh.jpg; https://en.wikipedia.org/wiki/Acne#/media/File:Backacne.JPG; https://pix4free.org/photo/15352/melanoma.html; https://en.wikipedia.org/wiki/Nevus#/media/File:Nevus_NCI.jpg; https://commons.wikimedia.org/wiki/File:Melanoma_Growth_over_14_Months.jpg; https://commons.wikimedia.org/wiki/File:Congenital_melanocytic_nevus_01.jpg; https://en.wikipedia.org/wiki/Melanoma#/media/File:Photography_of_a_large_acral_lentiginous_melanoma.jpg; https://en.wikipedia.org/wiki/Melanoma#/media/File:Photography_of_nodular_melanoma.jpg; and https://en.wikipedia.org/wiki/Nevus#/media/File:Normal_mole_(1).jpg.
Figure 2. Examples of Synthetic Morphed Images
The top row shows morphing from seed0513.jpg to seed1119.jpg; the middle row, from seed0513.jpg to seed1185.jpg; and the bottom row, from seed1197.jpg to seed1464.jpg. All synthetic images are accessible on figshare.
Figure 3. Examples of Synthetic Images of Melanoma and Nevus
