Sci Rep. 2021 Apr 8;11(1):7769. doi: 10.1038/s41598-021-87064-7.

Predicting the clinical management of skin lesions using deep learning

Kumar Abhishek¹, Jeremy Kawahara², Ghassan Hamarneh²

Affiliations

¹ School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada. kabhishe@sfu.ca.
² School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.

PMID: 33833293
PMCID: PMC8032721
DOI: 10.1038/s41598-021-87064-7

Kumar Abhishek et al. Sci Rep. 2021.


Abstract

Automated machine learning approaches to skin lesion diagnosis from images are approaching dermatologist-level performance. However, current machine learning approaches that suggest management decisions rely on predicting the underlying skin condition to infer a management decision without considering the variability of management decisions that may exist within a single condition. We present the first work to explore image-based prediction of clinical management decisions directly without explicitly predicting the diagnosis. In particular, we use clinical and dermoscopic images of skin lesions along with patient metadata from the Interactive Atlas of Dermoscopy dataset (1011 cases; 20 disease labels; 3 management decisions) and demonstrate that predicting management labels directly is more accurate than predicting the diagnosis and then inferring the management decision ([Formula: see text] and [Formula: see text] improvement in overall accuracy and AUROC respectively), statistically significant at [Formula: see text]. Directly predicting management decisions also considerably reduces the over-excision rate as compared to management decisions inferred from diagnosis predictions (24.56% fewer cases wrongly predicted to be excised). Furthermore, we show that training a model to also simultaneously predict the seven-point criteria and the diagnosis of skin lesions yields an even higher accuracy (improvements of [Formula: see text] and [Formula: see text] in overall accuracy and AUROC respectively) of management predictions. Finally, we demonstrate our model's generalizability by evaluating on the publicly available MClass-D dataset and show that our model agrees with the clinical management recommendations of 157 dermatologists as much as they agree amongst each other.
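The over-excision comparison can be illustrated with a small sketch. It assumes that "wrongly predicted to be excised" means a predicted EXC label on a lesion whose target label is not EXC, and that the reduction is reported relative to the diagnosis-inferred baseline. The abstract does not spell out either definition, so both are assumptions, and the toy labels below are invented for illustration only.

```python
def wrongly_excised(y_true, y_pred, excise="EXC"):
    """Count cases predicted as excision whose target label is not excision."""
    return sum(1 for t, p in zip(y_true, y_pred) if p == excise and t != excise)


def relative_reduction(baseline, improved):
    """Percentage drop of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Toy labels (invented, not from the paper).
y_true = ["EXC", "CLNC", "NONE", "CLNC", "EXC", "NONE", "CLNC", "CLNC"]
mgmt_inferred = ["EXC", "EXC", "NONE", "EXC", "EXC", "EXC", "CLNC", "EXC"]
mgmt_direct = ["EXC", "CLNC", "NONE", "EXC", "EXC", "NONE", "CLNC", "CLNC"]

over_inferred = wrongly_excised(y_true, mgmt_inferred)
over_direct = wrongly_excised(y_true, mgmt_direct)
reduction = relative_reduction(over_inferred, over_direct)
```

With these toy labels the inferred predictions over-excise four lesions and the direct predictions one, a 75% relative reduction; the figure reported in the abstract comes from the authors' own test data, not from this sketch.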

Conflict of interest statement

G.H. serves as a Scientific Advisor to Triage Technologies Inc., Toronto, Canada, where J.K. and G.H. are minor shareholders (<5%). Triage Technologies Inc. offers a tool to detect skin conditions from images that was not a part of the presented experiments. K.A. has no competing interest to declare.

Figures

**Figure 1**
An overview of the three prediction models. All the models take the clinical and the dermoscopic images of the skin lesion and the patient metadata as input. Note that we also perform an input ablation study (A multi-task prediction model section; Table 4). (a) The first model predicts the lesion diagnosis probabilities, ${DIAG}_{pred}$. (b) The second model predicts the management decision probabilities, ${MGMT}_{pred}$. (c) The third is a multi-task model and predicts the seven-point criteria (${Criterion}\ \{1, 2, \dots, 7\}_{pred, multi}$) in addition to ${DIAG}_{pred, multi}$ and ${MGMT}_{pred, multi}$. The argmax operation assigns 1 to the most likely label and 0 to all others. For (a), the ${DIAG}_{pred}$ diagnosis is used to arrive at a management decision using either (a1) binary labeling, ${MGMT}_{infr, binary}$, or (a2) prior-based inference, ${MGMT}_{infr, all}$. Similarly, the outputs of (b) can be used to directly predict a management decision using either (b1) binary labeling, ${MGMT}_{pred, binary}$, or (b2) all the labels, ${MGMT}_{pred, all}$. As explained in the text, the diagnosis labels are basal cell carcinoma (BCC), nevus (NEV), melanoma (MEL), seborrheic keratosis (SK), and others (MISC), and the management decision labels are ‘clinical follow up’ (CLNC), ‘excision’ (EXC), and ‘no further examination’ (NONE). In the case of binary management decisions, we predict whether a lesion should be excised (EXC) or not (NOEXC).
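The argmax step and the prior-based inference in (a2) can be sketched as follows. This is a minimal illustration, assuming that prior-based inference marginalises a conditional table $P(\text{management} \mid \text{diagnosis})$ over the predicted diagnosis probabilities; the caption does not give the exact rule, and the `PRIORS` values are hypothetical placeholders rather than numbers from the dataset.

```python
# Label orders follow the caption.
DIAG_LABELS = ["BCC", "NEV", "MEL", "SK", "MISC"]
MGMT_LABELS = ["CLNC", "EXC", "NONE"]

# HYPOTHETICAL priors P(management | diagnosis); placeholder values only.
PRIORS = {
    "BCC": [0.0, 1.0, 0.0],
    "NEV": [0.6, 0.3, 0.1],
    "MEL": [0.0, 1.0, 0.0],
    "SK": [0.2, 0.1, 0.7],
    "MISC": [0.3, 0.3, 0.4],
}


def one_hot_argmax(probs):
    """Assign 1 to the most likely label and 0 to all others."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [1 if i == best else 0 for i in range(len(probs))]


def infer_management(diag_probs):
    """Assumed rule: P(m) = sum over d of P(m | d) * P(d)."""
    mgmt = [0.0] * len(MGMT_LABELS)
    for label, p_diag in zip(DIAG_LABELS, diag_probs):
        for j, p_mgmt in enumerate(PRIORS[label]):
            mgmt[j] += p_diag * p_mgmt
    return mgmt


diag_pred = [0.10, 0.60, 0.20, 0.05, 0.05]  # toy DIAG_pred, nevus most likely
mgmt_infr = infer_management(diag_pred)     # toy MGMT_infr over CLNC, EXC, NONE
decision = MGMT_LABELS[one_hot_argmax(mgmt_infr).index(1)]
```

In this toy case the most likely diagnosis is NEV, yet the inferred management decision is EXC, because the excision mass from BCC and MEL adds to the share contributed by nevi under the placeholder priors.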

**Figure 2**
Quantitative evaluation of the ${MGMT}_{infr, all}$ and ${MGMT}_{pred, all}$ predictions. (a) Violin plots of the distance measures of the probabilistic predictions show that the ${MGMT}_{pred, all}$ predictions are closer (statistically significant) to the target labels for test data. (b, c) ROC curves and (d, e) confusion matrices of ${MGMT}_{infr, all}$ and ${MGMT}_{pred, all}$ respectively along with cell-wise diagnosis breakdown. Note that ${MGMT}_{infr, all}$ has a tendency to over-excise lesions.

**Figure 3**
Evaluating the multi-modal multi-task model. (a) ROC curve and (b) precision-recall curve for the management prediction task. Confusion matrices for (c) the management prediction task and (d) the diagnosis prediction task along with the diagnosis-wise breakdown for the management labels.

**Figure 4**
Evaluating the statistical significance of each input data modality’s contribution to improving the management decision prediction ${MGMT}_{pred, multi}$. ‘C’, ‘D’, and ‘M’ refer to clinical image, dermoscopic image, and patient metadata respectively, and the row and column names refer to the experiments in the ablation study presented in Table 4. For each pair of experiments (i) and (j), the cell (i, j) contains the p-value of McNemar’s test performed on the corresponding pair of predictions.
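As a rough illustration of how one cell (i, j) could be computed, the sketch below runs an exact (binomial) McNemar's test on per-case correctness of two experiments. The caption does not say whether the exact or the chi-square variant was used, so the exact form is an assumption, and the correctness vectors are toy data.

```python
from math import comb


def mcnemar_exact(correct_i, correct_j):
    """Two-sided exact McNemar p-value from paired per-case correctness flags."""
    # b: experiment i correct, experiment j wrong; c: the reverse.
    b = sum(1 for x, y in zip(correct_i, correct_j) if x and not y)
    c = sum(1 for x, y in zip(correct_i, correct_j) if not x and y)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) tail probability of k or fewer discordant cases.
    tail = sum(comb(n, m) for m in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy correctness flags for the same eight test cases (invented).
correct_i = [1, 1, 1, 1, 0, 1, 1, 0]
correct_j = [1, 0, 0, 1, 0, 0, 1, 1]
p_value = mcnemar_exact(correct_i, correct_j)
```

Only the discordant cases (three where i alone is correct, one where j alone is correct) enter the test; cases on which both experiments agree do not affect the p-value.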

**Figure 5**
Evaluating the multi-task model on the MClass-D dataset. (a) Confusion matrices and (b) ROC curves for ${MGMT}_{pred}$ and ${MGMT}_{infr}$ predictions with both ${MGMT}_{GT, agg}$ and ${MGMT}_{GT, true}$ as target clinical management labels.

**Figure 6**
A breakdown of the inputs, outputs, loss functions, and architecture of the three prediction models. Global average pooled feature responses from the clinical and the dermoscopic images are extracted and concatenated (denoted by the plus symbol) with one-hot encoded patient metadata, and the three models are trained with $L_{DIAG}$, $L_{MGMT}$, and $L_{multi}$ respectively. The first model predicts the diagnosis labels (${DIAG}_{pred}$), which are then used along with the management priors to obtain inferred management decisions (${MGMT}_{infr}$), whereas the second model predicts the management decisions directly (${MGMT}_{pred}$). Finally, the last model is a multi-task one and is trained to predict the seven-point criteria, the diagnosis, and the management (outputs enclosed in the dashed box).
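The input fusion described in this caption can be sketched in plain Python. This is only an illustration of global average pooling followed by concatenation with one-hot metadata; the feature maps are tiny toy arrays, and the metadata fields and categories in `schema` are hypothetical, not the dataset's actual fields.

```python
def global_average_pool(feature_maps):
    """Reduce each channel (a 2D list, H x W) to its mean response."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over its categories."""
    return [1.0 if value == c else 0.0 for c in categories]


def fuse(clinical_maps, dermoscopic_maps, metadata, schema):
    """Concatenate pooled image features with one-hot encoded metadata."""
    vec = global_average_pool(clinical_maps) + global_average_pool(dermoscopic_maps)
    for field, categories in schema.items():
        vec += one_hot(metadata[field], categories)
    return vec


# Toy feature maps: two clinical channels and one dermoscopic channel, each 2 x 2.
clinical = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 4.0]]]
dermoscopic = [[[2.0, 2.0], [2.0, 2.0]]]

# HYPOTHETICAL metadata schema, for illustration only.
schema = {"sex": ["female", "male"], "elevation": ["flat", "palpable", "nodular"]}
metadata = {"sex": "male", "elevation": "flat"}

fused = fuse(clinical, dermoscopic, metadata, schema)
```

The fused vector is what a shared classification head would consume in each of the three models; only the output heads and the loss differ between them.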
