Review

A systematic review and meta-analysis of artificial intelligence versus clinicians for skin cancer diagnosis

Maria Paz Salinas et al. NPJ Digit Med. 2024 May 14;7(1):125. doi: 10.1038/s41746-024-01103-x.
Abstract

Scientific research on artificial intelligence (AI) in dermatology has increased exponentially. The objective of this study was to perform a systematic review and meta-analysis evaluating the performance of AI algorithms for skin cancer classification in comparison to clinicians with different levels of expertise. Following PRISMA guidelines, three electronic databases (PubMed, Embase, and Cochrane Library) were screened for relevant articles up to August 2022. The quality of the studies was assessed using QUADAS-2. A meta-analysis of sensitivity and specificity was performed for the accuracy of AI and clinicians. Fifty-three studies were included in the systematic review, and 19 met the inclusion criteria for the meta-analysis. Considering all studies and all subgroups of clinicians, we found a sensitivity (Sn) of 87.0% and specificity (Sp) of 77.1% for AI algorithms, and a Sn of 79.78% and Sp of 73.6% for all clinicians (overall); the differences were statistically significant for both Sn and Sp. The difference between AI algorithms (Sn 92.5%, Sp 66.5%) and generalists (Sn 64.6%, Sp 72.8%) was greater than the difference between AI algorithms and expert clinicians. Performance of AI algorithms (Sn 86.3%, Sp 78.4%) and expert dermatologists (Sn 84.2%, Sp 74.4%) was clinically comparable. The limitations of AI algorithms in clinical practice should be considered, and future studies should focus on real-world settings and on AI assistance.
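The pooled Sn and Sp estimates above come from combining per-study accuracy data across the 19 included studies. As a rough, simplified illustration of how such pooling works (the paper itself fits a hierarchical bivariate model; the counts below are entirely hypothetical), here is a minimal Python sketch that pools per-study sensitivities and specificities on the logit scale using a DerSimonian-Laird random-effects estimator:

import math

# Hypothetical per-study confusion-matrix counts (TP, FN, TN, FP);
# illustrative only, not data from the reviewed studies.
studies = [
    (90, 10, 70, 30),
    (45, 15, 80, 20),
    (60, 5, 55, 40),
]

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def pool_proportion(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the
    logit scale; a simplified univariate stand-in for the hierarchical
    bivariate model typically used in diagnostic meta-analyses."""
    ys, vs = [], []
    for e, n in zip(events, totals):
        e_adj, n_adj = e + 0.5, n + 1.0   # 0.5 continuity correction
        p = e_adj / n_adj
        ys.append(logit(p))
        # Approximate variance of a logit-transformed proportion.
        vs.append(1.0 / e_adj + 1.0 / (n_adj - e_adj))
    w = [1.0 / v for v in vs]             # fixed-effect weights
    y_fe = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    # Between-study heterogeneity (tau^2) via the DL moment estimator.
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return inv_logit(y_re)

sens = pool_proportion([tp for tp, fn, tn, fp in studies],
                       [tp + fn for tp, fn, tn, fp in studies])
spec = pool_proportion([tn for tp, fn, tn, fp in studies],
                       [tn + fp for tp, fn, tn, fp in studies])
print(f"pooled sensitivity: {sens:.3f}, pooled specificity: {spec:.3f}")

Proportions are transformed to the logit scale before pooling because the weighting there is approximately normal; the hierarchical model used in the paper additionally accounts for the correlation between sensitivity and specificity across studies.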

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. PRISMA flow diagram of included studies.

Fig. 2. QUADAS-2 results of the assessment of risk of bias in the included studies.
The QUADAS-2 tool was used to assess risk of bias across four domains (participants, index test, reference standard, and analysis). Low risk (cyan) indicates the number of studies with a low risk of bias in the respective domain; unclear (gray), the number with an unclear risk of bias due to a lack of reported information; high risk (purple), the number with a high risk of bias. a Risk of bias assessment. b Applicability concerns.

Fig. 3. Forest plot detailing the sensitivity and specificity for all groups of clinicians (‘overall’) and artificial intelligence algorithms from each study included in the meta-analysis, according to type of test set (external vs internal).
a Sensitivity for artificial intelligence (left) and all clinicians (‘overall’) (right). b Specificity for artificial intelligence (left) and all clinicians (‘overall’) (right).

Fig. 4. Hierarchical ROC curves of studies comparing performance between artificial intelligence algorithms (left) and all groups of clinicians (right).
ROC, receiver operating characteristic. Circle size represents the individual study's sample size (circle size is inversely related to study variance).

Fig. 5. Hierarchical ROC curves of studies comparing performance between artificial intelligence algorithms (left) and generalists (right).
ROC, receiver operating characteristic. Circle size represents the individual study's sample size (circle size is inversely related to study variance).

Fig. 6. Forest plots of studies showing artificial intelligence vs generalists sensitivity and specificity.
a Sensitivity for artificial intelligence (left) and for generalists (right). b Specificity for artificial intelligence (left) and for generalists (right).

Fig. 7. Hierarchical ROC curves of studies comparing performance between artificial intelligence algorithms (left) and non-expert dermatologists (right).
ROC, receiver operating characteristic. Circle size represents the individual study's sample size (circle size is inversely related to study variance).

Fig. 8. Forest plots of studies showing artificial intelligence vs non-expert dermatologists sensitivity and specificity according to type of test set (external vs internal).
a Sensitivity for artificial intelligence (left) and for non-expert dermatologists (right). b Specificity for artificial intelligence (left) and for non-expert dermatologists (right).

Fig. 9. Hierarchical ROC curves of studies comparing performance between artificial intelligence algorithms (left) and expert dermatologists (right).
ROC, receiver operating characteristic. Circle size represents the individual study's sample size (circle size is inversely related to study variance).

Fig. 10. Forest plots of studies showing artificial intelligence vs expert dermatologists sensitivity and specificity according to type of test set (external vs internal).
a Sensitivity for artificial intelligence (left) and for expert dermatologists (right). b Specificity for artificial intelligence (left) and for expert dermatologists (right).
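
The hierarchical ROC figures above place each study as a circle in ROC space (sensitivity vs 1 - specificity), with circle size proportional to sample size and therefore inversely related to study variance. As an illustrative sketch of that plotting convention only (hypothetical study points, not the paper's plotting code):

import matplotlib.pyplot as plt

# Hypothetical per-study operating points: (sensitivity,
# 1 - specificity, sample size). Illustrative values only.
points = [
    (0.92, 0.35, 120),
    (0.86, 0.22, 450),
    (0.79, 0.18, 900),
    (0.88, 0.30, 250),
]

fig, ax = plt.subplots(figsize=(5, 5))
for sn, fpr, n in points:
    # Marker area scales with sample size, so larger (lower-variance)
    # studies appear as bigger circles, as in the hierarchical ROC plots.
    ax.scatter(fpr, sn, s=n, alpha=0.5, edgecolors="k")
ax.plot([0, 1], [0, 1], linestyle="--", color="gray")  # chance line
ax.set_xlabel("1 - specificity (false positive rate)")
ax.set_ylabel("Sensitivity")
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_title("Study operating points in ROC space")
plt.tight_layout()
plt.show()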
