External Validation of Deep Learning Algorithms for Radiologic Diagnosis: A Systematic Review
- PMID: 35652114
- PMCID: PMC9152694
- DOI: 10.1148/ryai.210064
Abstract
Purpose: To assess generalizability of published deep learning (DL) algorithms for radiologic diagnosis.
Materials and methods: In this systematic review, the PubMed database was searched for peer-reviewed studies of DL algorithms for image-based radiologic diagnosis that included external validation, published from January 1, 2015, through April 1, 2021. Studies using nonimaging features or incorporating non-DL methods for feature extraction or classification were excluded. Two reviewers independently evaluated studies for inclusion, and any discrepancies were resolved by consensus. Internal and external performance measures and pertinent study characteristics were extracted, and relationships among these data were examined using nonparametric statistics.
Results: Eighty-three studies reporting 86 algorithms were included. The vast majority (70 of 86, 81%) reported at least some decrease in external performance compared with internal performance, with nearly half (42 of 86, 49%) reporting at least a modest decrease (≥0.05 on the unit scale) and nearly a quarter (21 of 86, 24%) reporting a substantial decrease (≥0.10 on the unit scale). No study characteristics were found to be associated with the difference between internal and external performance.
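The thresholds and proportions reported above can be checked with a short sketch. This is an illustration only, not the authors' analysis code: the function names, the "slight decrease" label, and the example performance values are assumptions introduced here. Only the 0.05 and 0.10 cutoffs and the counts out of 86 come from the abstract. The review's categories are cumulative ("at least a modest decrease" includes substantial decreases), whereas the sketch returns the single highest category reached.

```python
def classify_drop(internal: float, external: float) -> str:
    """Label the internal-to-external change on the unit scale.

    Cutoffs (0.05, 0.10) follow the abstract; the label names for the
    lower bins are hypothetical.
    """
    # Round to suppress floating-point noise near the cutoffs.
    drop = round(internal - external, 4)
    if drop >= 0.10:
        return "substantial decrease"
    if drop >= 0.05:
        return "modest decrease"
    if drop > 0:
        return "slight decrease"
    return "no decrease"


def percent(count: int, total: int) -> int:
    """Whole-number percentage, as reported in the abstract."""
    return round(100 * count / total)


if __name__ == "__main__":
    # Counts reported in the abstract, out of 86 algorithms.
    for label, count in [("any decrease", 70),
                         ("decrease >= 0.05", 42),
                         ("decrease >= 0.10", 21)]:
        print(f"{label}: {count} of 86 ({percent(count, 86)}%)")

    # Made-up performance values, for illustration only.
    print(classify_drop(0.95, 0.85))
    print(classify_drop(0.90, 0.87))
```

Keeping the cutoffs in one function makes it easy to see how a change in either threshold would shift an algorithm between categories.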
Conclusion: Among published external validation studies of DL algorithms for image-based radiologic diagnosis, the vast majority demonstrated diminished algorithm performance on the external dataset, with some reporting a substantial performance decrease.
Supplemental material is available for this article.
Keywords: Computer Applications–Detection/Diagnosis; Computer Applications–General (Informatics); Diagnosis; Epidemiology; Informatics; Meta-Analysis; Neural Networks; Technology Assessment.
© 2022 by the Radiological Society of North America, Inc.
Conflict of interest statement
Disclosures of conflicts of interest: A.C.Y. No relevant relationships. B.M. No relevant relationships. J.E. No relevant relationships.