Review

A systematic review and meta-analysis of artificial intelligence versus clinicians for skin cancer diagnosis

Maria Paz Salinas et al. NPJ Digit Med. 2024 May 14;7(1):125. doi: 10.1038/s41746-024-01103-x.
Abstract

Scientific research on artificial intelligence (AI) in dermatology has increased exponentially. The objective of this study was to perform a systematic review and meta-analysis evaluating the performance of AI algorithms for skin cancer classification in comparison to clinicians with different levels of expertise. Following PRISMA guidelines, three electronic databases (PubMed, Embase, and Cochrane Library) were screened for relevant articles up to August 2022. The quality of the studies was assessed using QUADAS-2. A meta-analysis of sensitivity and specificity was performed for the accuracy of AI and clinicians. Fifty-three studies were included in the systematic review, and 19 met the inclusion criteria for the meta-analysis. Considering all studies and all subgroups of clinicians, we found a sensitivity (Sn) of 87.0% and specificity (Sp) of 77.1% for AI algorithms, and a Sn of 79.78% and Sp of 73.6% for all clinicians (overall); the differences were statistically significant for both Sn and Sp. The difference between AI algorithms (Sn 92.5%, Sp 66.5%) and generalists (Sn 64.6%, Sp 72.8%) was greater than the difference between AI algorithms and expert clinicians. Performance of AI algorithms (Sn 86.3%, Sp 78.4%) and expert dermatologists (Sn 84.2%, Sp 74.4%) was clinically comparable. The limitations of AI algorithms in clinical practice should be considered, and future studies should focus on real-world settings and on AI assistance.
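The pooled Sn and Sp estimates above come from combining per-study accuracy data across the 19 included studies. As a rough, simplified illustration of how such pooling works (the paper itself fits a hierarchical bivariate model; the counts below are entirely hypothetical), here is a minimal Python sketch that pools per-study sensitivities and specificities on the logit scale using a DerSimonian-Laird random-effects estimator:

import math

# Hypothetical per-study confusion-matrix counts (TP, FN, TN, FP);
# illustrative only, not data from the reviewed studies.
studies = [
    (90, 10, 70, 30),
    (45, 15, 80, 20),
    (60, 5, 55, 40),
]

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def pool_proportion(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the
    logit scale; a simplified univariate stand-in for the hierarchical
    bivariate model typically used in diagnostic meta-analyses."""
    ys, vs = [], []
    for e, n in zip(events, totals):
        e_adj, n_adj = e + 0.5, n + 1.0   # 0.5 continuity correction
        p = e_adj / n_adj
        ys.append(logit(p))
        # Approximate variance of a logit-transformed proportion.
        vs.append(1.0 / e_adj + 1.0 / (n_adj - e_adj))
    w = [1.0 / v for v in vs]             # fixed-effect weights
    y_fe = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    # Between-study heterogeneity (tau^2) via the DL moment estimator.
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return inv_logit(y_re)

sens = pool_proportion([tp for tp, fn, tn, fp in studies],
                       [tp + fn for tp, fn, tn, fp in studies])
spec = pool_proportion([tn for tp, fn, tn, fp in studies],
                       [tn + fp for tp, fn, tn, fp in studies])
print(f"pooled sensitivity: {sens:.3f}, pooled specificity: {spec:.3f}")

Proportions are transformed to the logit scale before pooling because the weighting there is approximately normal; the hierarchical model used in the paper additionally accounts for the correlation between sensitivity and specificity across studies.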

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. PRISMA flow diagram of included studies.

Fig. 2. QUADAS-2 results of the assessment of risk of bias in the included studies.
The QUADAS-2 tool was used to assess risk of bias across four domains (participants, index test, reference standard, and analysis). Low risk (cyan) indicates the number of studies with a low risk of bias in the respective domain; unclear (gray), the number with an unclear risk of bias due to a lack of reported information; high risk (purple), the number with a high risk of bias. a Risk of bias assessment. b Applicability concerns.

Fig. 3. Forest plot detailing the sensitivity and specificity for all groups of clinicians (‘overall’) and artificial intelligence algorithms from each study included in the meta-analysis, according to type of test set (external vs internal).
a Sensitivity for artificial intelligence (left) and all clinicians (‘overall’) (right). b Specificity for artificial intelligence (left) and all clinicians (‘overall’) (right).

Fig. 4. Hierarchical ROC curves of studies comparing performance between artificial intelligence algorithms (left) and all groups of clinicians (right).
ROC, receiver operating characteristic. Circle size represents the individual study's sample size (circle size is inversely related to study variance).

Fig. 5. Hierarchical ROC curves of studies comparing performance between artificial intelligence algorithms (left) and generalists (right).
ROC, receiver operating characteristic. Circle size represents the individual study's sample size (circle size is inversely related to study variance).

Fig. 6. Forest plots of studies showing artificial intelligence vs generalists sensitivity and specificity.
a Sensitivity for artificial intelligence (left) and for generalists (right). b Specificity for artificial intelligence (left) and for generalists (right).

Fig. 7. Hierarchical ROC curves of studies comparing performance between artificial intelligence algorithms (left) and non-expert dermatologists (right).
ROC, receiver operating characteristic. Circle size represents the individual study's sample size (circle size is inversely related to study variance).

Fig. 8. Forest plots of studies showing artificial intelligence vs non-expert dermatologists sensitivity and specificity according to type of test set (external vs internal).
a Sensitivity for artificial intelligence (left) and for non-expert dermatologists (right). b Specificity for artificial intelligence (left) and for non-expert dermatologists (right).

Fig. 9. Hierarchical ROC curves of studies comparing performance between artificial intelligence algorithms (left) and expert dermatologists (right).
ROC, receiver operating characteristic. Circle size represents the individual study's sample size (circle size is inversely related to study variance).

Fig. 10. Forest plots of studies showing artificial intelligence vs expert dermatologists sensitivity and specificity according to type of test set (external vs internal).
a Sensitivity for artificial intelligence (left) and for expert dermatologists (right). b Specificity for artificial intelligence (left) and for expert dermatologists (right).
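
The hierarchical ROC figures above place each study as a circle in ROC space (sensitivity vs 1 - specificity), with circle size proportional to sample size and therefore inversely related to study variance. As an illustrative sketch of that plotting convention only (hypothetical study points, not the paper's plotting code):

import matplotlib.pyplot as plt

# Hypothetical per-study operating points: (sensitivity,
# 1 - specificity, sample size). Illustrative values only.
points = [
    (0.92, 0.35, 120),
    (0.86, 0.22, 450),
    (0.79, 0.18, 900),
    (0.88, 0.30, 250),
]

fig, ax = plt.subplots(figsize=(5, 5))
for sn, fpr, n in points:
    # Marker area scales with sample size, so larger (lower-variance)
    # studies appear as bigger circles, as in the hierarchical ROC plots.
    ax.scatter(fpr, sn, s=n, alpha=0.5, edgecolors="k")
ax.plot([0, 1], [0, 1], linestyle="--", color="gray")  # chance line
ax.set_xlabel("1 - specificity (false positive rate)")
ax.set_ylabel("Sensitivity")
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_title("Study operating points in ROC space")
plt.tight_layout()
plt.show()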
