Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists
- PMID: 30457988
- PMCID: PMC6245676
- DOI: 10.1371/journal.pmed.1002686
Abstract
Background: Chest radiograph interpretation is critical for the detection of thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year. This time-consuming task typically requires expert radiologists to read the images, leading to fatigue-based diagnostic error and lack of diagnostic expertise in areas of the world where radiologists are not available. Recently, deep learning approaches have been able to achieve expert-level performance in medical image interpretation tasks, powered by large network architectures and fueled by the emergence of large labeled datasets. The purpose of this study is to investigate the performance of a deep learning algorithm on the detection of pathologies in chest radiographs compared with practicing radiologists.
Methods and findings: We developed CheXNeXt, a convolutional neural network to concurrently detect the presence of 14 different pathologies, including pneumonia, pleural effusion, pulmonary masses, and nodules in frontal-view chest radiographs. CheXNeXt was trained and internally validated on the ChestX-ray8 dataset, with a held-out validation set consisting of 420 images, sampled to contain at least 50 cases of each of the original pathology labels. On this validation set, the majority vote of a panel of 3 board-certified cardiothoracic specialist radiologists served as reference standard. We compared CheXNeXt's discriminative performance on the validation set to the performance of 9 radiologists using the area under the receiver operating characteristic curve (AUC). The radiologists included 6 board-certified radiologists (average experience 12 years, range 4-28 years) and 3 senior radiology residents, from 3 academic institutions. We found that CheXNeXt achieved radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on 3 pathologies. The radiologists achieved statistically significantly higher AUC performance on cardiomegaly, emphysema, and hiatal hernia, with AUCs of 0.888 (95% confidence interval [CI] 0.863-0.910), 0.911 (95% CI 0.866-0.947), and 0.985 (95% CI 0.974-0.991), respectively, whereas CheXNeXt's AUCs were 0.831 (95% CI 0.790-0.870), 0.704 (95% CI 0.567-0.833), and 0.851 (95% CI 0.785-0.909), respectively. CheXNeXt performed better than radiologists in detecting atelectasis, with an AUC of 0.862 (95% CI 0.825-0.895), statistically significantly higher than radiologists' AUC of 0.808 (95% CI 0.777-0.838); there were no statistically significant differences in AUCs for the other 10 pathologies. The average time to interpret the 420 images in the validation set was substantially longer for the radiologists (240 minutes) than for CheXNeXt (1.5 minutes). 
The main limitations of our study are that neither CheXNeXt nor the radiologists were permitted to use patient history or review prior examinations and that evaluation was limited to a dataset from a single institution.
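The evaluation protocol described above, a majority vote of three specialist reads serving as the reference standard, with discriminative performance measured by AUC, can be sketched as follows. This is an illustrative reconstruction with toy data, not the study's code; the function names and example values are assumptions.

```python
# Hypothetical sketch of the evaluation protocol described above:
# (1) reference labels from a majority vote of 3 specialist reads,
# (2) AUC computed from model probability scores against those labels.
# All names and data are illustrative, not taken from the study.

def majority_vote(reads):
    """Binary reference label per image: positive if >= 2 of 3 readers agree."""
    return [int(sum(calls) >= 2) for calls in zip(*reads)]

def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the probability that a randomly
    chosen positive case scores higher than a negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 readers' binary calls on 6 radiographs for one pathology.
reads = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
]
labels = majority_vote(reads)             # [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.2, 0.75, 0.8, 0.1]  # hypothetical model probabilities
print(round(auc(scores, labels), 3))      # prints 0.889
```

In the study, this per-pathology AUC comparison was repeated for all 14 labels, with confidence intervals around each estimate; the sketch shows only the point estimate for a single pathology.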
Conclusions: In this study, we developed and validated a deep learning algorithm that classified clinically important abnormalities in chest radiographs at a performance level comparable to practicing radiologists. Once tested prospectively in clinical settings, the algorithm could have the potential to expand patient access to chest radiograph diagnostics.
Conflict of interest statement
I have read the journal's policy and the authors of this manuscript have the following competing interests: CPL holds shares in whiterabbit.ai and Nines.ai, is on the Advisory Board of Nuance Communications and on the Board of Directors for the Radiological Society of North America, and has other research support from GE Healthcare and Philips Healthcare. MPL holds shares in and serves on the Advisory Board for Nines.ai. None of these organizations have a financial interest in the results of this study.