Generation of a Melanoma and Nevus Data Set From Unstandardized Clinical Photographs on the Internet
- PMID: 37792351
- PMCID: PMC10551819
- DOI: 10.1001/jamadermatol.2023.3521
Abstract
Importance: Artificial intelligence (AI) training for diagnosing dermatologic images requires large amounts of clean data. Dermatologic images vary in composition, and many are inaccessible because of privacy concerns, which hinders the development of AI.
Objective: To build a training data set for discriminative and generative AI from unstandardized internet images of melanoma and nevus.
Design, setting, and participants: In this diagnostic study, a total of 5619 (CAN5600 data set) and 2006 (CAN2000 data set; a manually revised subset of CAN5600) cropped lesion images of either melanoma or nevus were semiautomatically annotated from approximately 500 000 photographs on the internet using convolutional neural networks (CNNs), region-based CNNs, and large mask inpainting. For unsupervised pretraining, a data set of 132 673 possible lesions (LESION130k data set) was also created, with diversity obtained by collecting images from 18 482 websites in approximately 80 countries. A total of 5000 synthetic images (GAN5000 data set) were generated using a generative adversarial network (StyleGAN2-ADA) that was pretrained on the LESION130k data set and trained on the CAN2000 data set.
Main outcomes and measures: The area under the receiver operating characteristic curve (AUROC) for determining malignant neoplasms was analyzed. In each test, 1 of the 7 preexisting public data sets (total of 2312 images; including Edinburgh, an SNU subset, Asan test, Waterloo, 7-point criteria evaluation, PAD-UFES-20, and MED-NODE) was used as the test data set. The performance of the EfficientNet Lite0 CNN trained on the proposed data set was then compared with that of the same network trained on the remaining 6 preexisting data sets.
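The evaluation protocol described above (hold out 1 public data set for testing, train on the other 6, and score with AUROC) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `auroc` and `leave_one_out_splits` are hypothetical, the labels and scores are made-up toy values, and the AUROC is computed with the standard pairwise (Mann-Whitney) definition rather than any particular library.

```python
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    (label 1, e.g. melanoma) scores higher than a randomly chosen
    negative case (label 0, e.g. nevus). Tied scores count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# The 7 preexisting public data sets named in the abstract.
PUBLIC_SETS = [
    "Edinburgh",
    "SNU subset",
    "Asan test",
    "Waterloo",
    "7-point criteria evaluation",
    "PAD-UFES-20",
    "MED-NODE",
]


def leave_one_out_splits(names: Sequence[str]) -> List[Tuple[str, List[str]]]:
    """For each data set, return (held-out test set, remaining training sets)."""
    return [(held_out, [n for n in names if n != held_out]) for held_out in names]


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    toy_labels = [1, 1, 0, 0, 1, 0]
    toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(f"toy AUROC = {auroc(toy_labels, toy_scores):.3f}")
    for test_set, train_sets in leave_one_out_splits(PUBLIC_SETS):
        print(f"test on {test_set!r}, baseline trained on {len(train_sets)} other sets")
```

In this sketch each of the 7 splits would yield one AUROC for the network trained on the proposed data set and one for the baseline trained on the remaining 6 public data sets, giving 7 paired values per comparison.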
Results: The EfficientNet Lite0 CNN trained on the annotated or synthetic images achieved mean (SD) AUROCs that were higher than or equivalent to that of the same network trained on the pathologically confirmed preexisting public data sets combined (0.809 [0.063]): CAN5600, 0.874 (0.042) (P = .02); CAN2000, 0.848 (0.027) (P = .08); and GAN5000, 0.838 (0.040) (P = .31; Wilcoxon signed rank test). The improvement was attributed to the increased size of the training data set.
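The Wilcoxon signed rank test named above compares paired values, here one AUROC pair per held-out test data set. The following is a minimal sketch of an exact two-sided version in pure Python, assuming a small number of pairs so that all sign assignments can be enumerated. The function name `wilcoxon_signed_rank_exact` and the example AUROC values are hypothetical; the abstract does not state which variant or software the authors used, and the per-data-set AUROCs are not given there.

```python
from itertools import product
from typing import Sequence, Tuple


def wilcoxon_signed_rank_exact(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed rank test for paired samples.

    Returns (W+, p), where W+ is the sum of ranks of the positive
    differences. Zero differences are dropped and tied absolute
    differences receive average ranks. Differences are rounded to
    6 decimals (an assumption of this sketch) so that floating-point
    noise does not hide ties. Feasible only for small n (2**n cases).
    """
    diffs = [round(a - b, 6) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)

    # Rank absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    total = sum(ranks)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_obs = min(w_plus, total - w_plus)

    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for s, r in zip(signs, ranks) if s)
        if min(w, total - w) <= w_obs:
            extreme += 1
    return w_plus, extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical paired AUROCs for 7 test sets, invented for illustration.
    proposed = [0.88, 0.91, 0.84, 0.86, 0.90, 0.83, 0.89]
    baseline = [0.85, 0.80, 0.86, 0.79, 0.82, 0.84, 0.75]
    w_plus, p = wilcoxon_signed_rank_exact(proposed, baseline)
    print(f"W+ = {w_plus}, exact two-sided P = {p:.4f}")
```

With 7 pairs there are only 128 sign patterns, so the smallest two-sided P value this exact test can produce is 2/128, or about .016, which occurs when one training set wins on every test data set.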
Conclusions and relevance: The synthetic data set in this diagnostic study was created using various AI technologies from internet images. A neural network trained on the created data set (CAN5600) performed better than the same network trained on preexisting data sets combined. Both the annotated (CAN5600 and LESION130k) and synthetic (GAN5000) data sets could be shared for AI training and consensus between physicians.
Comment in
- Advances in Melanoma-Nevus Classification Using Artificially Generated Image Data Sets. JAMA Dermatol. 2023 Nov 1;159(11):1175-1176. doi: 10.1001/jamadermatol.2023.3518. PMID: 37792340