Nat Med. 2023 Aug;29(8):1941-1946. doi: 10.1038/s41591-023-02475-5. Epub 2023 Jul 27.

A reinforcement learning model for AI-based decision support in skin cancer

Catarina Barata et al. Nat Med. 2023 Aug.

Abstract

We investigated whether human preferences hold the potential to improve diagnostic artificial intelligence (AI)-based decision support using skin cancer diagnosis as a use case. We utilized nonuniform rewards and penalties based on expert-generated tables, balancing the benefits and harms of various diagnostic errors, which were applied using reinforcement learning. Compared with supervised learning, the reinforcement learning model improved the sensitivity for melanoma from 61.4% to 79.5% (95% confidence interval (CI): 73.5-85.6%) and for basal cell carcinoma from 79.4% to 87.1% (95% CI: 80.3-93.9%). AI overconfidence was also reduced while simultaneously maintaining accuracy. Reinforcement learning increased the rate of correct diagnoses made by dermatologists by 12.0% (95% CI: 8.8-15.1%) and improved the rate of optimal management decisions from 57.4% to 65.3% (95% CI: 61.7-68.9%). We further demonstrated that the reward-adjusted reinforcement learning model and a threshold-based model outperformed naïve supervised learning in various clinical scenarios. Our findings suggest the potential for incorporating human preferences into image-based diagnostic algorithms.
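The abstract describes applying nonuniform rewards and penalties from an expert-generated table via reinforcement learning. As an illustration only (not the authors' published code), the following minimal sketch shows how such a reward table could drive a REINFORCE-style policy-gradient update of a diagnosis classifier; the class order, reward values, feature dimension and linear head are all assumptions.

```python
# Minimal sketch: fine-tuning a classifier with an expert reward table via a
# REINFORCE-style policy gradient. All values below are illustrative assumptions.
import torch
import torch.nn as nn

NUM_CLASSES = 7  # MEL, BCC, AKIEC, BKL, NV, DF, VASC (order assumed)

# Hypothetical reward table R[true, predicted]: correct diagnoses are rewarded,
# and missing a melanoma (e.g. calling it a nevus) is penalized most heavily.
reward_table = torch.full((NUM_CLASSES, NUM_CLASSES), -1.0)
reward_table.fill_diagonal_(1.0)
reward_table[0, 4] = -10.0  # melanoma (row 0) predicted as nevus (col 4)

model = nn.Sequential(nn.Linear(128, NUM_CLASSES))  # stand-in for a CNN head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def rl_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    """One policy-gradient update: sample a diagnosis from the model's softmax
    policy and weight its log-probability by the expert reward."""
    logits = model(features)
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()                  # sampled diagnoses
    rewards = reward_table[labels, actions]  # look up expert reward
    loss = -(rewards * dist.log_prob(actions)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random features standing in for image embeddings.
feats = torch.randn(32, 128)
labs = torch.randint(0, NUM_CLASSES, (32,))
rl_step(feats, labs)
```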

Conflict of interest statement

The authors declare the following competing interests: P.T. has received fees from Silverchair, speaker honoraria from FotoFinder, Lilly and Novartis, and an unrestricted one-year postdoc grant from MetaOptima Technology Inc. N.C. is a Microsoft employee and owns diverse investments across technology and healthcare companies. A.H. is a consultant to Canfield Scientific Inc. and advisory board member of Scibase AB. H.P.S. is a shareholder of MoleMap NZ Limited and e-derm consult GmbH and undertakes regular teledermatological reporting for both companies. H.P.S. is also a medical consultant for Canfield Scientific Inc., MoleMap Australia Pty Ltd, Blaze Bioscience Inc. and a medical adviser for First Derm. V.R. is a medical adviser for Inhabit Brands, Inc. H.K. received nonfinancial support from Derma Medical Systems, Fotofinder and Heine, and speaker fees from Fotofinder. The remaining authors declare no competing interests.

Figures

Fig. 1. Comparison of models and reader study results.
a, Expert-generated reward table used to train the RL model; rows, ground truth; columns, predictions. b,c, Confusion matrix of the SL model (b) and the RL model (c) using the same test set (n = 1,511). Rows, ground truth; columns, predictions. The proportions are normalized by the row-sums (MEL: n = 171; BCC: n = 93; AKIEC: n = 43; BKL: n = 217; NV: n = 908; DF: n = 44; VASC: n = 35). d, Boxplot of difference in entropy of paired test set predictions (n = 1,511) of the SL model and the RL model. Black line, median; boxes, 25th–75th percentiles; whiskers, minimum and maximum values, P < 0.0001 (Wilcoxon test). e,f, Results of the reader study comparing sensitivities (e) and frequencies of optimal management decisions (f) of 89 dermatologists by diagnosis without AI support (−AI), with support by the SL model (+SL) and with support by the RL model (+RL). Optimal managements: ‘excision’ for melanomas and basal cell carcinomas; ‘local therapy’ for actinic keratoses/intraepidermal carcinoma; and ‘dismiss’ for nevi, benign keratinocytic lesions, dermatofibroma and vascular lesions. Bars, means; whiskers, standard error. Sample sizes: MEL(−AI): n = 89; MEL(+SL): n = 78; MEL(+RL): n = 81; BCC(−AI): n = 89; BCC(+SL): n = 63; BCC(+RL): n = 68; AKIEC(−AI): n = 89; AKIEC(+SL): n = 60; AKIEC(+RL): n = 72; NV(−AI): n = 89; NV(+SL): n = 88; NV(+RL): n = 85; BKL(−AI): n = 89; BKL(+SL): n = 65; BKL(+RL): n = 76; DF(−AI): n = 89; DF(+SL): n = 71; DF(+RL): n = 61; VASC(−AI): n = 89; VASC(+SL): n = 67; VASC(+RL): n = 65. Abbreviations: MEL, melanoma; BCC, basal cell carcinoma; AKIEC, actinic keratosis/intraepidermal carcinoma; BKL, benign keratinocytic lesion; NV, melanocytic nevus; DF, dermatofibroma; VASC, vascular lesion.
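For readers who want to reproduce the kinds of summaries shown in panels b–d, here is a small sketch, under assumed inputs, of a row-normalized confusion matrix and the per-prediction entropy used to compare model confidence; the class order and toy arrays are illustrative assumptions.

```python
# Sketch of two quantities summarized in Fig. 1b-d: a row-normalized confusion
# matrix and the Shannon entropy of each predicted probability vector.
import numpy as np

CLASSES = ["MEL", "BCC", "AKIEC", "BKL", "NV", "DF", "VASC"]  # assumed order

def row_normalized_confusion(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Counts of (ground truth, prediction) pairs, normalized by row sums so each
    row shows the per-class prediction breakdown; empty rows are left at zero."""
    k = len(CLASSES)
    cm = np.zeros((k, k))
    np.add.at(cm, (y_true, y_pred), 1)
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.maximum(row_sums, 1)

def prediction_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Entropy per prediction; lower values indicate more confident
    (and potentially overconfident) outputs."""
    return -(probs * np.log(probs + eps)).sum(axis=1)

# Toy usage with random data standing in for model outputs on a test set.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(len(CLASSES)), size=200)
y_true = rng.integers(0, len(CLASSES), size=200)
cm = row_normalized_confusion(y_true, probs.argmax(axis=1))
ent = prediction_entropy(probs)
```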
Fig. 2. Comparison of models in three different scenarios.
Top level (binary scenario: benign versus malignant): a, Experts’ malignancy probability thresholds for decision to excise (n = 10). Lines, median; boxes, 25th–75th percentiles; whiskers, values within 1.5 times interquartile range. b, Receiver operating characteristic curve derived from the SL model and operating points of ten experts using either thresholds (SL model) or rewards (RL model). Possible management decisions were ‘dismiss’ or ‘excise’. True and false positive rates refer to proportions of malignant and benign lesions that were excised. Black triangle, naïve approach (excision if malignant probability > 0.5). c, Boxplot comparing TPRs for melanomas applying thresholds (SL model) and rewards (RL model) provided by ten experts. Bars, means; whiskers, standard deviations (P = 0.11, paired t-test); dashed line, naïve approach. Middle level (multiclass scenario, additional therapeutic option): d, Thresholds of ten experts for probabilities of actinic keratosis/intraepidermal carcinoma for decision to treat locally. Line, median; boxes, 25th–75th percentiles; whiskers, values within 1.5 times interquartile range. e, Median rewards per action and diagnosis. f–h, Confusion matrices of actions by diagnosis: naïve approach (f), threshold-adjusted SL model (g) and RL model (h). Lower level (patient-centered approach, 7,375 lesions, 524 patients): i, Thresholds of ten experts for malignancy probabilities for decision to dismiss, monitor or excise. j, Median rewards per action and diagnosis. k, Number of excisions of benign lesions by patient according to model. l, Number of monitored benign lesions by patient according to model. m, Management strategies for 55 melanomas according to model.
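The binary scenario in panels a–c contrasts threshold-based use of the SL model with reward-based action selection. Below is a hedged sketch of the two decision rules; the 0.5 threshold mirrors the naïve approach described in the caption, while the reward values are illustrative assumptions rather than the experts' table.

```python
# Sketch of the two binary decision rules (dismiss vs excise): a probability
# threshold versus maximizing expected reward. Reward values are assumed.
import numpy as np

def decide_by_threshold(p_malignant: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """'excise' if the predicted malignancy probability exceeds the threshold
    (threshold = 0.5 corresponds to the naive approach), else 'dismiss'."""
    return np.where(p_malignant > threshold, "excise", "dismiss")

def decide_by_expected_reward(p_malignant: np.ndarray) -> np.ndarray:
    """Pick the action maximizing expected reward under an illustrative reward
    table: rows = (benign, malignant), columns = (dismiss, excise)."""
    rewards = np.array([[ 1.0, -1.0],   # benign: dismissing is rewarded
                        [-8.0,  4.0]])  # malignant: missing it is heavily penalized
    p = np.stack([1.0 - p_malignant, p_malignant], axis=1)  # (n, 2) class probabilities
    expected = p @ rewards                                   # (n, 2) per-action values
    return np.where(expected[:, 1] > expected[:, 0], "excise", "dismiss")

# Toy usage: with these assumed rewards the effective excision cut-off falls
# near p = 0.14, i.e. well below the naive 0.5 threshold.
p = np.array([0.05, 0.2, 0.6])
print(decide_by_threshold(p), decide_by_expected_reward(p))
```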
Extended Data Fig. 1. Comparison of baseline SL model with RL model.
a: Alluvial plot of test set (n = 1,511); the left block shows the ground truth, the middle block shows the results of supervised learning (SL), and the right block shows the results of reinforcement learning (RL) based on a reward table created by experts; only alluvials with n > 5 are shown. MEL = melanoma (n = 171), BCC = basal cell carcinoma (n = 93), AKIEC = actinic keratosis and intraepidermal carcinoma (n = 43), BKL = benign keratinocytic lesion (n = 217), NV = melanocytic nevus (n = 908), DF = dermatofibroma (n = 44), VASC = vascular lesion (n = 35). b: Boxplots of entropy of correct and incorrect predictions for melanoma (n = 171) and melanocytic nevi (n = 908) according to applied model. Black line = median, boxes = 25th–75th percentiles, whiskers = values within 1.5 times interquartile range. Abbreviations: SL = supervised learning, RL = reinforcement learning, dx = ground truth.
Extended Data Fig. 2. Scenario with 7 diagnoses and ‘local therapy’ as an additional treatment option.
a: Graphical abstract of scenario adding the treatment option ‘local therapy’ (for example, cryotherapy) for actinic keratosis/intraepidermal carcinomas. While excision is the optimal management for melanoma and most basal cell carcinomas, local therapy is optimal for actinic keratosis/intraepidermal carcinoma. We judged local therapy to be a harmful treatment for melanomas and suboptimal for basal cell carcinomas suitable for surgery (all basal cell carcinomas in the dataset). b: Proportion of cases per diagnosis and model that received optimal management (excision for melanoma and basal cell carcinoma, local therapy for actinic keratoses/intraepidermal carcinoma, and no treatment (‘dismiss’) for all benign diagnoses). c: Proportion of cases per diagnosis and model that were mismanaged. Mismanagement included all procedures except excision for melanoma and basal cell carcinoma, all procedures except excision or local therapy for actinic keratoses/intraepidermal carcinoma, and all procedures except ‘dismiss’ for all benign conditions (nevus, benign keratinocytic lesions, dermatofibroma, and vascular lesions). Abbreviations and sample size: mel = melanoma (n = 171), bcc = basal cell carcinoma (n = 93), akiec = actinic keratosis/intraepidermal carcinoma (n = 43), bkl = benign keratinocytic lesion (n = 217), nv = nevus (n = 908), df = dermatofibroma (n = 44), vasc = vascular lesion (n = 35).
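Panels b and c tally optimal management and mismanagement per diagnosis. As a hypothetical illustration of that bookkeeping (not the authors' evaluation code), the sketch below maps each diagnosis to its optimal action and counts both rates; the label strings and example inputs are assumptions.

```python
# Sketch of per-diagnosis optimal-management and mismanagement rates, following
# the definitions in the caption; excision of AKIEC is acceptable but not optimal.
from collections import Counter

OPTIMAL = {
    "mel": "excise", "bcc": "excise", "akiec": "local therapy",
    "bkl": "dismiss", "nv": "dismiss", "df": "dismiss", "vasc": "dismiss",
}
ACCEPTABLE = {dx: {act} for dx, act in OPTIMAL.items()}
ACCEPTABLE["akiec"].add("excise")  # not counted as mismanagement

def rates_by_diagnosis(diagnoses, actions):
    """Return {diagnosis: (optimal-management rate, mismanagement rate)}."""
    totals, optimal, mismanaged = Counter(), Counter(), Counter()
    for dx, act in zip(diagnoses, actions):
        totals[dx] += 1
        optimal[dx] += act == OPTIMAL[dx]
        mismanaged[dx] += act not in ACCEPTABLE[dx]
    return {dx: (optimal[dx] / n, mismanaged[dx] / n) for dx, n in totals.items()}

# Toy usage with made-up cases.
print(rates_by_diagnosis(["mel", "mel", "akiec", "nv"],
                         ["excise", "dismiss", "excise", "dismiss"]))
```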
Extended Data Fig. 3. Scenario of high-risk patients with multiple nevi.
a: Graphical abstract of scenario of monitoring of high-risk individuals with multiple nevi. Due to the large number of lesions per patient, this scenario requires a more patient-centered and less lesion-centered approach. Most melanomas detected during monitoring are noninvasive, slow-growing lesions. Short-term monitoring of these melanomas, while not optimal, is considered acceptable. b: Malignancy probability predictions of the baseline SL model according to management predictions of the RL model for benign lesions (n = 7,320) and melanomas (n = 55). The red dashed horizontal line indicates the median value of the melanoma probability selected by ten experts as the threshold for excision. The black dashed horizontal line indicates the minimum value. Black line = median, boxes = 25th–75th percentiles, whiskers = values within 1.5 times the interquartile range.
