Sci Rep. 2021 Apr 8;11(1):7769.
doi: 10.1038/s41598-021-87064-7.

Predicting the clinical management of skin lesions using deep learning

Kumar Abhishek et al. Sci Rep. 2021.

Abstract

Automated machine learning approaches to skin lesion diagnosis from images are approaching dermatologist-level performance. However, current machine learning approaches that suggest management decisions rely on predicting the underlying skin condition to infer a management decision without considering the variability of management decisions that may exist within a single condition. We present the first work to explore image-based prediction of clinical management decisions directly without explicitly predicting the diagnosis. In particular, we use clinical and dermoscopic images of skin lesions along with patient metadata from the Interactive Atlas of Dermoscopy dataset (1011 cases; 20 disease labels; 3 management decisions) and demonstrate that predicting management labels directly is more accurate than predicting the diagnosis and then inferring the management decision ([Formula: see text] and [Formula: see text] improvement in overall accuracy and AUROC respectively), statistically significant at [Formula: see text]. Directly predicting management decisions also considerably reduces the over-excision rate as compared to management decisions inferred from diagnosis predictions (24.56% fewer cases wrongly predicted to be excised). Furthermore, we show that training a model to also simultaneously predict the seven-point criteria and the diagnosis of skin lesions yields an even higher accuracy (improvements of [Formula: see text] and [Formula: see text] in overall accuracy and AUROC respectively) of management predictions. Finally, we demonstrate our model's generalizability by evaluating on the publicly available MClass-D dataset and show that our model agrees with the clinical management recommendations of 157 dermatologists as much as they agree amongst each other.

Conflict of interest statement

G.H. serves as a Scientific Advisor to Triage Technologies Inc., Toronto, Canada, where J.K. and G.H. are minor shareholders (<5%). Triage Technologies Inc. offers a tool to detect skin conditions from images that was not a part of the presented experiments. K.A. has no competing interest to declare.

Figures

Figure 1
An overview of the three prediction models. All the models take the clinical and the dermoscopic images of the skin lesion and the patient metadata as input. Note that we also perform an input ablation study (see the "A multi-task prediction model" section; Table 4). (a) The first model predicts the lesion diagnosis probabilities, DIAGpred. (b) The second model predicts the management decision probabilities, MGMTpred. (c) The third is a multi-task model and predicts the seven-point criteria (Criterion{1,2,…,7}pred,multi) in addition to DIAGpred,multi and MGMTpred,multi. The argmax operation assigns 1 to the most likely label and 0 to all others. For (a), the DIAGpred diagnosis is used to arrive at a management decision using either (a1) binary labeling, MGMTinfr,binary, or (a2) prior-based inference, MGMTinfr,all. Similarly, the outputs of (b) can be used to directly predict a management decision using either (b1) binary labeling, MGMTpred,binary, or (b2) all the labels, MGMTpred,all. As explained in the text, the diagnosis labels are basal cell carcinoma (BCC), nevus (NEV), melanoma (MEL), seborrheic keratosis (SK), and others (MISC), and the management decision labels are ‘clinical follow up’ (CLNC), ‘excision’ (EXC), and ‘no further examination’ (NONE). In the case of binary management decisions, we predict whether a lesion should be excised (EXC) or not (NOEXC).
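The two inference routes in Figure 1 can be illustrated with a small sketch. It assumes that prior-based inference means weighting per-diagnosis management priors by the predicted diagnosis probabilities, which is one plausible reading of the caption and not a confirmed description of the authors' implementation. The prior values, the function names, and the argmax-based binary rule are all hypothetical.

```python
from typing import Dict, List

DIAG_LABELS = ["BCC", "NEV", "MEL", "SK", "MISC"]
MGMT_LABELS = ["CLNC", "EXC", "NONE"]

# Hypothetical priors P(management | diagnosis), one row per diagnosis.
# The numbers are illustrative only and are not taken from the paper.
PRIORS: Dict[str, List[float]] = {
    "BCC":  [0.0, 1.0, 0.0],
    "NEV":  [0.6, 0.3, 0.1],
    "MEL":  [0.0, 1.0, 0.0],
    "SK":   [0.3, 0.1, 0.6],
    "MISC": [0.4, 0.3, 0.3],
}


def argmax_one_hot(probs: List[float]) -> List[int]:
    """Assign 1 to the most likely label and 0 to all others."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [1 if i == best else 0 for i in range(len(probs))]


def infer_management(diag_probs: List[float]) -> List[float]:
    """Assumed prior-based inference: P(m) = sum over d of P(d) * P(m | d)."""
    mgmt = [0.0] * len(MGMT_LABELS)
    for diag, p_diag in zip(DIAG_LABELS, diag_probs):
        for j, p_mgmt in enumerate(PRIORS[diag]):
            mgmt[j] += p_diag * p_mgmt
    return mgmt


def to_binary(mgmt_probs: List[float]) -> str:
    """Assumed binary rule: EXC if excision is the top label, else NOEXC."""
    top = MGMT_LABELS[argmax_one_hot(mgmt_probs).index(1)]
    return "EXC" if top == "EXC" else "NOEXC"


# Route (a): diagnosis probabilities first, management inferred from priors.
diag_pred = [0.10, 0.60, 0.20, 0.05, 0.05]
mgmt_infr = infer_management(diag_pred)

# Route (b): a model that outputs management probabilities directly.
mgmt_pred = [0.55, 0.35, 0.10]
```

With these made-up numbers the inferred route selects excision while the direct route selects clinical follow-up, which shows how the two routes can disagree on the same lesion.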
Figure 2
Quantitative evaluation of the MGMTinfr,all and MGMTpred,all predictions. (a) Violin plots of the distance measures of the probabilistic predictions show that the MGMTpred,all predictions are closer to the target labels for test data, and that the difference is statistically significant. (b, c) ROC curves and (d, e) confusion matrices of MGMTinfr,all and MGMTpred,all respectively, along with a cell-wise diagnosis breakdown. Note that MGMTinfr,all has a tendency to over-excise lesions.
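The over-excision tendency noted in Figure 2 can be read off a confusion matrix. The sketch below assumes that the over-excision rate is the fraction of lesions whose target label is not EXC but which are predicted as EXC. The paper may normalize differently, and the function names and toy labels are hypothetical.

```python
from typing import List, Sequence

MGMT_LABELS = ["CLNC", "EXC", "NONE"]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str] = MGMT_LABELS) -> List[List[int]]:
    """Rows index the target label, columns index the predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for target, pred in zip(y_true, y_pred):
        matrix[index[target]][index[pred]] += 1
    return matrix


def over_excision_rate(matrix: List[List[int]],
                       labels: Sequence[str] = MGMT_LABELS) -> float:
    """Assumed definition: non-EXC cases predicted as EXC / all non-EXC cases."""
    exc = list(labels).index("EXC")
    non_exc_rows = [row for i, row in enumerate(matrix) if i != exc]
    total = sum(sum(row) for row in non_exc_rows)
    wrongly_excised = sum(row[exc] for row in non_exc_rows)
    return wrongly_excised / total if total else 0.0


# Toy labels, not data from the study.
y_true = ["CLNC", "CLNC", "NONE", "EXC", "NONE"]
y_pred = ["EXC", "CLNC", "NONE", "EXC", "EXC"]
cm = confusion_matrix(y_true, y_pred)
rate = over_excision_rate(cm)
```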
Figure 3
Evaluating the multi-modal multi-task model. (a) ROC curve and (b) precision-recall curve for the management prediction task. Confusion matrices for (c) the management prediction task and (d) the diagnosis prediction task along with the diagnosis-wise breakdown for the management labels.
Figure 4
Evaluating the statistical significance of each input data modality’s contribution to improving the management decision prediction MGMTpred,multi. ‘C’, ‘D’, and ‘M’ refer to clinical image, dermoscopic image, and patient metadata respectively, and the row and column names refer to the experiments in the ablation study presented in Table 4. For each pair of experiments (i) and (j), the cell (i, j) contains the p-value of McNemar’s test performed on the corresponding pair of predictions.
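For readers unfamiliar with McNemar's test, the following is a minimal sketch of its exact (binomial) two-sided form applied to two sets of predictions on the same cases. The caption does not say whether the exact or the chi-squared variant was used, so the choice of the exact form is an assumption, and the function name and toy data are hypothetical.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(y_true: Sequence[int], pred_a: Sequence[int],
                  pred_b: Sequence[int]) -> float:
    """Two-sided exact McNemar p-value for paired predictions.

    Only discordant cases matter: those where exactly one of the two
    models is correct. Under the null hypothesis each discordant case
    is equally likely to favour either model.
    """
    a_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b)
                 if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b)
                 if a != t and b == t)
    n = a_only + b_only
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Toy data: model A is right on all 10 cases, model B on only 2.
y_true = [0] * 10
pred_a = [0] * 10
pred_b = [1] * 8 + [0] * 2
p_value = mcnemar_exact(y_true, pred_a, pred_b)
```

In a grid like Figure 4, a function of this kind would be called once for every pair of ablation experiments (i, j).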
Figure 5
Evaluating the multi-task model on the MClass-D dataset. (a) Confusion matrices and (b) ROC curves for MGMTpred and MGMTinfr predictions with both MGMTGT,agg and MGMTGT,true as target clinical management labels.
Figure 6
A breakdown of the inputs, outputs, loss functions, and architecture of the three prediction models. Global average pooled feature responses from the clinical and the dermoscopic images are extracted and concatenated (denoted by the plus symbol) with one-hot encoded patient metadata, and the three models are trained with LDIAG, LMGMT, and Lmulti respectively. The first model predicts the diagnosis labels (DIAGpred), which are then used along with the management priors to obtain inferred management decisions (MGMTinfr), whereas the second model predicts the management decisions directly (MGMTpred). Finally, the last model is a multi-task one and is trained to predict the seven-point criteria, the diagnosis, and the management (outputs enclosed in the dashed box).
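The input fusion described in Figure 6 can be sketched in plain Python: each image branch is reduced to one value per channel by global average pooling, and the pooled vectors are concatenated with one-hot encoded metadata. The feature maps, metadata field names, and category lists below are hypothetical placeholders, and a real model would operate on tensors produced by a convolutional backbone.

```python
from typing import Dict, List

FeatureMap = List[List[List[float]]]  # channels x height x width


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Average each channel over its spatial dimensions."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_map]


def one_hot(value: str, categories: List[str]) -> List[float]:
    """Encode a categorical value as a 0/1 vector over its categories."""
    return [1.0 if value == category else 0.0 for category in categories]


def fuse_inputs(clinical: FeatureMap, dermoscopic: FeatureMap,
                metadata: Dict[str, str],
                schema: Dict[str, List[str]]) -> List[float]:
    """Concatenate pooled image features with one-hot encoded metadata."""
    fused = global_average_pool(clinical) + global_average_pool(dermoscopic)
    for field, categories in schema.items():
        fused += one_hot(metadata[field], categories)
    return fused


# Hypothetical metadata schema and tiny feature maps for illustration.
schema = {"sex": ["female", "male"],
          "elevation": ["flat", "palpable", "nodular"]}
clinical_map = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
dermoscopic_map = [[[2.0, 2.0], [2.0, 2.0]]]
patient = {"sex": "male", "elevation": "flat"}
features = fuse_inputs(clinical_map, dermoscopic_map, patient, schema)
```

The fused vector would then feed the diagnosis head, the management head, or all multi-task heads, depending on which of the three models is being trained.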
