External Validation of Deep Learning Algorithms for Radiologic Diagnosis: A Systematic Review

Alice C Yu¹, Bahram Mohajer¹, John Eng¹

PMID: 35652114
PMCID: PMC9152694
DOI: 10.1148/ryai.210064

Alice C Yu et al. Radiol Artif Intell. 2022 May 4;4(3):e210064.

doi: 10.1148/ryai.210064. eCollection 2022 May.

Affiliation

¹ Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, 1800 Orleans St, Baltimore, MD 21287.

Abstract

Purpose: To assess generalizability of published deep learning (DL) algorithms for radiologic diagnosis.

Materials and methods: In this systematic review, the PubMed database was searched for peer-reviewed studies of DL algorithms for image-based radiologic diagnosis that included external validation, published from January 1, 2015, through April 1, 2021. Studies using nonimaging features or incorporating non-DL methods for feature extraction or classification were excluded. Two reviewers independently evaluated studies for inclusion, and any discrepancies were resolved by consensus. Internal and external performance measures and pertinent study characteristics were extracted, and relationships among these data were examined using nonparametric statistics.

Results: Eighty-three studies reporting 86 algorithms were included. The vast majority (70 of 86, 81%) reported at least some decrease in external performance compared with internal performance, with nearly half (42 of 86, 49%) reporting at least a modest decrease (≥0.05 on the unit scale) and nearly a quarter (21 of 86, 24%) reporting a substantial decrease (≥0.10 on the unit scale). No study characteristics were found to be associated with the difference between internal and external performance.
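The thresholds and proportions reported above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name `classify_drop` and the example performance values are hypothetical, and the function returns only the most severe applicable label, whereas the reported categories are nested (a substantial decrease also counts as a modest one).

```python
def classify_drop(internal: float, external: float) -> str:
    """Label the internal-to-external change using the thresholds in the abstract.

    Both inputs are performance measures on the unit scale (e.g., AUC).
    """
    # Round to avoid floating-point artifacts such as 0.95 - 0.85 != 0.10.
    drop = round(internal - external, 6)
    if drop >= 0.10:
        return "substantial decrease"
    if drop >= 0.05:
        return "modest decrease"
    if drop > 0:
        return "some decrease"
    return "no decrease"


# Counts taken from the Results paragraph; 86 algorithms in total.
TOTAL = 86
counts = {
    "any decrease": 70,
    "modest or greater (>=0.05)": 42,
    "substantial (>=0.10)": 21,
}
percentages = {label: round(100 * n / TOTAL) for label, n in counts.items()}

if __name__ == "__main__":
    for label, pct in percentages.items():
        print(f"{label}: {counts[label]} of {TOTAL} ({pct}%)")
    # Hypothetical example values, not drawn from any included study.
    print(classify_drop(0.95, 0.85))
```

The rounded percentages reproduce the 81%, 49%, and 24% figures stated in the Results.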

Conclusion: Among published external validation studies of DL algorithms for image-based radiologic diagnosis, the vast majority demonstrated diminished algorithm performance on the external dataset, with some reporting a substantial performance decrease.

Supplemental material is available for this article. © RSNA, 2022.

Keywords: Computer Applications–Detection/Diagnosis; Computer Applications–General (Informatics); Diagnosis; Epidemiology; Informatics; Meta-Analysis; Neural Networks; Technology Assessment.

Conflict of interest statement

Disclosures of conflicts of interest: A.C.Y. No relevant relationships. B.M. No relevant relationships. J.E. No relevant relationships.

Figures

**Figure 1:**
Diagram summarizing literature search and article selection.

**Figure 2:**
Plot of representative diagnostic performance difference between external and development datasets. The three most common imaging modalities and body parts are indicated. AUC = area under the receiver operating characteristic curve, BO = bone, BR = brain, CH = chest, XR = radiography.
