BMJ Open. 2021 Dec 7;11(12):e053024.
doi: 10.1136/bmjopen-2021-053024.

Do comprehensive deep learning algorithms suffer from hidden stratification? A retrospective study on pneumothorax detection in chest radiography

Jarrel Seah et al. BMJ Open.

Abstract

Objectives: To evaluate the ability of a commercially available comprehensive chest radiography deep convolutional neural network (DCNN) to detect simple and tension pneumothorax, as stratified by the following subgroups: the presence of an intercostal drain; rib, clavicular, scapular or humeral fractures or rib resections; subcutaneous emphysema; and erect versus non-erect positioning. The hypothesis was that performance in each of these subgroups would not differ significantly from performance on the overall test dataset.

Design: A retrospective case-control study was undertaken.

Setting: Community radiology clinics and hospitals in Australia and the USA.

Participants: A test dataset of 2557 chest radiography studies was ground-truthed by three subspecialty thoracic radiologists for the presence of simple or tension pneumothorax as well as each subgroup other than positioning. Radiograph positioning was derived from radiographer annotations on the images.

Outcome measures: DCNN performance for detecting simple and tension pneumothorax was evaluated over the entire test set, as well as within each subgroup, using the area under the receiver operating characteristic curve (AUC). A difference in AUC of more than 0.05 was considered clinically significant.
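The outcome measure can be illustrated with a minimal sketch, assuming binary ground-truth labels, a continuous model score per study and a boolean subgroup flag. The function names and the toy data are hypothetical and are not taken from the study; the AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_auc_difference(
    labels: Sequence[int],
    scores: Sequence[float],
    in_subgroup: Sequence[bool],
) -> float:
    """Subgroup AUC minus overall AUC (negative means the subgroup is worse)."""
    overall = auc(labels, scores)
    sub_labels = [y for y, m in zip(labels, in_subgroup) if m]
    sub_scores = [s for s, m in zip(scores, in_subgroup) if m]
    return auc(sub_labels, sub_scores) - overall


# Toy data (hypothetical): 1 = pneumothorax present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1, 0.7, 0.3]
# Hypothetical subgroup flag, e.g. "intercostal drain present".
in_subgroup = [True, True, False, True, True, False, False, False]

overall_auc = auc(labels, scores)                              # 0.9375
diff = subgroup_auc_difference(labels, scores, in_subgroup)    # 0.0625
```

In this toy example the subgroup AUC is higher than the overall AUC; under the criterion above, it is a difference of more than 0.05 against the subgroup that would be of clinical concern.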

Results: When compared with the overall test set, performance of the DCNN for detecting simple and tension pneumothorax was statistically non-inferior in all subgroups. The DCNN had an AUC of 0.981 (0.976-0.986) for detecting simple pneumothorax and 0.997 (0.995-0.999) for detecting tension pneumothorax.
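One way a non-inferiority conclusion of this kind could be reached is sketched below, assuming a percentile bootstrap for the confidence interval of the AUC difference and the 0.05 margin given under Outcome measures. The abstract does not state how the adjusted intervals were computed, so the resampling scheme, the function names and the toy data are assumptions, and no adjustment for multiple subgroups is attempted.

```python
import random
from typing import List, Optional, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """Pairwise AUC (ties count 0.5); None if either class is missing."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    in_subgroup: Sequence[bool],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for (subgroup AUC - overall AUC)."""
    rng = random.Random(seed)
    n = len(labels)
    diffs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample studies with replacement
        ys = [labels[i] for i in idx]
        ss = [scores[i] for i in idx]
        ms = [in_subgroup[i] for i in idx]
        overall = auc(ys, ss)
        sub = auc([y for y, m in zip(ys, ms) if m], [s for s, m in zip(ss, ms) if m])
        if overall is None or sub is None:
            continue  # skip replicates that lack a positive or a negative case
        diffs.append(sub - overall)
    if not diffs:
        raise ValueError("no usable bootstrap replicates")
    diffs.sort()
    lower = diffs[int((alpha / 2) * (len(diffs) - 1))]
    upper = diffs[int((1 - alpha / 2) * (len(diffs) - 1))]
    return lower, upper


def is_non_inferior(ci_lower: float, margin: float = 0.05) -> bool:
    """Non-inferior if the whole CI lies above -margin."""
    return ci_lower > -margin


# Toy data (hypothetical): scores separate the two classes perfectly.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.6, 0.4]
in_subgroup = [True, True, False, True, True, False, False, False]

lower, upper = bootstrap_auc_difference_ci(labels, scores, in_subgroup, n_boot=200)
verdict = is_non_inferior(lower)
```

With real data, the subgroup would be declared non-inferior only if the lower bound of the interval for the difference stays above -0.05; an interval crossing that margin would leave the question open rather than demonstrate inferiority.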

Conclusions: Hidden stratification has significant implications for potential failures of deep learning when applied in clinical practice. This study demonstrated that a comprehensively trained DCNN can be resilient to hidden stratification in several clinically meaningful subgroups in detecting pneumothorax.

Keywords: accident & emergency medicine; chest imaging; health informatics.

Conflict of interest statement

Competing interests: All authors have reviewed and approved this manuscript. Authors JS, CT, QDB, MRM, XH, HA, JL, PB and CMJ are employees of, or are seconded to, Annalise.ai. NE and LO-R have no interests to declare.

Figures

Figure 1. Difference in AUC for detecting simple pneumothorax in the test dataset versus each specific subgroup with adjusted 95% CI. AUC, area under the receiver operating characteristic curve.

Figure 2. Difference in AUC for detecting tension pneumothorax in the test dataset versus each specific subgroup with adjusted 95% CI. AUC, area under the receiver operating characteristic curve.
