JAMA Dermatol. 2020 Jan 1;156(1):29-37. doi: 10.1001/jamadermatol.2019.3807.

Keratinocytic Skin Cancer Detection on the Face Using Region-Based Convolutional Neural Network

Seung Seog Han et al.

Abstract

Importance: Detection of cutaneous cancer on the face using deep-learning algorithms has been challenging because various anatomic structures create curves and shades that confuse the algorithm and can potentially lead to false-positive results.

Objective: To evaluate whether an algorithm can automatically locate suspected areas and predict the probability of a lesion being malignant.

Design, setting, and participants: Region-based convolutional neural network technology was used to create 924 538 possible lesions by extracting nodular benign lesions from 182 348 clinical photographs. After these possible lesions were annotated manually or automatically based on image findings, convolutional neural networks were trained with 1 106 886 image crops to locate and diagnose cancer. Validation data sets (2844 images from 673 patients; mean [SD] age, 58.2 [19.9] years; 308 men [45.8%]; 185 patients with malignant tumors, 305 with benign tumors, and 183 free of tumor) were obtained from 3 hospitals between January 1, 2010, and September 30, 2018.

Main outcomes and measures: The area under the receiver operating characteristic curve, the F1 score (the harmonic mean of precision and recall; range, 0.000-1.000), and the Youden index score (sensitivity + specificity - 1; range, 0%-100%) were used to compare the performance of the algorithm with that of the participants.
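
Both comparison metrics are simple functions of the standard confusion-matrix counts. The following is a minimal sketch (not code from the study), assuming hypothetical binary counts of true/false positives and negatives:

```python
# Minimal sketch of the two comparison metrics defined above.
# tp/fp/tn/fn are hypothetical confusion-matrix counts, not study data.

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall (range, 0.000-1.000)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def youden_index(tp: int, fp: int, tn: int, fn: int) -> float:
    """Youden index: sensitivity + specificity - 1 (reported as 0%-100%)."""
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return sensitivity + specificity - 1

# Hypothetical example: 77 of 100 malignant cases flagged, 90 of 100 benign cleared.
print(round(f1_score(tp=77, fp=10, fn=23), 3))             # 0.824
print(round(youden_index(tp=77, fp=10, tn=90, fn=23), 3))  # 0.67
```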

Results: The algorithm analyzed a mean (SD) of 4.2 (2.4) photographs per patient and reported the malignancy score according to the highest malignancy output. The area under the receiver operating characteristic curve for the validation data set (673 patients) was 0.910. At a high-sensitivity cutoff threshold, the sensitivity and specificity of the model for the 673 patients were 76.8% and 90.6%, respectively. With the test partition (325 images; 80 patients), the performance of the algorithm was compared with that of 13 board-certified dermatologists, 34 dermatology residents, 20 nondermatologic physicians, and 52 members of the general public with no medical background. When disease screening performance was evaluated in the high-sensitivity range using the F1 score and Youden index score, the algorithm achieved a higher F1 score (0.831 vs 0.653 [0.126], P < .001) and Youden index score (0.675 vs 0.417 [0.124], P < .001) than the nondermatologic physicians. The accuracy of the algorithm was comparable with that of the dermatologists (F1 score, 0.831 vs 0.835 [0.040]; Youden index score, 0.675 vs 0.671 [0.100]).

Conclusions and relevance: The results of the study suggest that the algorithm could localize and diagnose skin cancer without preselection of suspicious lesions by dermatologists.

Conflict of interest statement

Conflict of Interest Disclosures: Dr Lim is employed by LG Sciencepark. However, the company did not have any role in the study design, data collection and analysis, the decision to publish, or the preparation of this manuscript. No other disclosures were reported.

Figures

Figure 1. Representative Example to Calculate the Highest Malignancy Output With Multiple Photographs of 1 Patient
A, The blob detector detects numerous blobs from an unprocessed clinical image. B, The fine image selector determines whether each detected blob is a skin lesion and excludes crops of inadequate quality or with a nonspecific diagnosis. C, The disease classifier analyzes each lesion and produces 178 outputs. For example, the output for the perinasal nodular lesion was (Top1: basal cell carcinoma, 0.67; Top2: seborrheic keratosis, 0.19; Top3: wart, 0.05; Top178: infantile eczema, 0.00). D, Calculation of malignancy_output uses the 178 outputs of each lesion with the following formula: malignancy output = (basal cell carcinoma output + squamous cell carcinoma output + squamous cell carcinoma in situ output + keratoacanthoma output + malignant melanoma output) + 0.2 × (actinic keratosis output + ulcer output). The number shown is the malignancy output multiplied by 100. The perinasal lesion was marked with a red box because its final malignancy_output (67) exceeded both fixed thresholds, T90 (25.45) and T80 (46.87), the thresholds at which the sensitivity of the algorithm was 90% and 80%, respectively. Of all the malignancy_outputs for 1 patient, the highest was used to draw the receiver operating characteristic curve. The image was generated by a style-based generative adversarial network and is not of an actual person.
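
The scoring rule in panel D and the per-patient aggregation are straightforward to express in code. The sketch below is a minimal illustration, not the authors' implementation; the class names, thresholds, and example probabilities come from the caption, while the data layout and function names are our own assumptions:

```python
# Minimal sketch of the malignancy_output rule from Figure 1, panel D.
# Each lesion crop is assumed to yield a dict mapping the 178 class names
# to probabilities; classes absent from the dict are treated as 0.

MALIGNANT = ("basal cell carcinoma", "squamous cell carcinoma",
             "squamous cell carcinoma in situ", "keratoacanthoma",
             "malignant melanoma")
DOWNWEIGHTED = ("actinic keratosis", "ulcer")  # weighted by 0.2
T90, T80 = 25.45, 46.87  # fixed thresholds at 90% / 80% sensitivity

def malignancy_output(probs: dict) -> float:
    """Combine one lesion's 178 class outputs into a single score x 100."""
    score = sum(probs.get(c, 0.0) for c in MALIGNANT)
    score += 0.2 * sum(probs.get(c, 0.0) for c in DOWNWEIGHTED)
    return 100.0 * score

def patient_score(lesions: list) -> float:
    """Highest malignancy_output over all lesions in a patient's photographs."""
    return max(malignancy_output(p) for p in lesions)

# The perinasal lesion from panel C: BCC 0.67, seborrheic keratosis 0.19, wart 0.05.
lesion = {"basal cell carcinoma": 0.67, "seborrheic keratosis": 0.19, "wart": 0.05}
score = malignancy_output(lesion)   # 67.0
print(score > T90, score > T80)     # True True -> marked with a red box

other = {"seborrheic keratosis": 0.9}   # a benign lesion elsewhere on the face
print(patient_score([lesion, other]))   # 67.0 -- the highest output wins
```
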
Figure 2. Example of the Detection of Basal Cell Carcinoma of the Periocular Area of a 79-Year-Old Man From the Dermatology Data Set
The rectangles were colored when the malignancy output was higher than the threshold (T80, red; T90, orange). The lesional blobs of basal cell carcinoma were strongly detected in the periocular area. There were several seborrheic keratoses on the cheek and brow, which the algorithm correctly diagnosed as benign. The final malignancy output for this patient was 94, the highest across the series of 3 photographs.
Figure 3. Receiver Operating Characteristic (ROC) Curves With the Dermatology (DER) and Plastic Surgery (PS) Validation Data Sets and the Comparison With Experts and the General Public
A, DER + PS data set (325 images from 80 patients; area under the ROC curve [AUC] of the algorithm = 0.919). B, DER + PS data set (2844 images from 673 patients; AUC = 0.910). C, DER data set (170 images from 40 patients; AUC = 0.868). D, DER data set (1570 images from 386 patients; AUC = 0.896). E, PS data set (155 images from 40 patients; AUC = 0.983). F, PS data set (1274 images from 287 patients; AUC = 0.954). Addition symbols indicate the test participants' mean sensitivity and specificity for malignant vs nonmalignant; multiplication symbols indicate their mean sensitivity and specificity for whether a biopsy is required. In A, C, and E, both lie near, or slightly above and to the left of, the algorithm's curve. Compared with the dermatologists' individual sensitivity and specificity, the algorithm performed relatively better in C than in A or E; far fewer dermatologists and dermatology residents lie above and to the left of the algorithm's curve in C. The black star/diamond points are the sensitivity/specificity of the algorithm at the 90% (T90) and 80% (T80) thresholds. The general public was asked whether there was a possibility of skin cancer necessitating a visit to a dermatologist, and sensitivity and specificity were calculated from their answers. The sensitivity of the general public was 50.1%, indicating that half of the malignant lesions could be overlooked.
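
Given one highest malignancy output per patient and the biopsy-confirmed labels, ROC curves and AUC values like those above can be computed with standard tooling. A minimal sketch using scikit-learn (our choice of library, with invented scores and labels) follows:

```python
# Minimal sketch: ROC curve and AUC from per-patient malignancy outputs.
# The labels and scores below are invented for illustration only.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

labels = np.array([1, 1, 0, 0, 1, 0])        # 1 = biopsy-proven malignant
scores = np.array([94, 67, 55, 3, 51, 30])   # highest malignancy output per patient

print("AUC:", roc_auc_score(labels, scores))  # ~0.889 for this toy data

# A T90-style operating point: the largest threshold with sensitivity >= 0.9.
fpr, tpr, thresholds = roc_curve(labels, scores)
mask = tpr >= 0.9
print("threshold:", thresholds[mask][0], "specificity:", 1 - fpr[mask][0])
```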

