Machine-learning-based multiple abnormality prediction with large-scale chest computed tomography volumes

Rachel Lea Draelos¹, David Dov², Maciej A Mazurowski³, Joseph Y Lo⁴, Ricardo Henao⁵, Geoffrey D Rubin⁶, Lawrence Carin⁷

Affiliations

¹ Computer Science Department, Duke University, LSRC Building D101, 308 Research Drive, Duke Box 90129, Durham, North Carolina 27708-0129, United States of America; School of Medicine, Duke University, DUMC 3710, Durham, North Carolina 27710, United States of America. Electronic address: rlb61@duke.edu.
² Electrical and Computer Engineering Department, Edmund T. Pratt Jr. School of Engineering, Duke University, Box 90291, Durham, North Carolina 27708, United States of America.
³ Electrical and Computer Engineering Department, Edmund T. Pratt Jr. School of Engineering, Duke University, Box 90291, Durham, North Carolina 27708, United States of America; Radiology Department, Duke University, Box 3808 DUMC, Durham, North Carolina 27710, United States of America; Biostatistics and Bioinformatics Department, Duke University, DUMC 2424 Erwin Road, Suite 1102 Hock Plaza, Box 2721 Durham, North Carolina 27710, United States of America.
⁴ Electrical and Computer Engineering Department, Edmund T. Pratt Jr. School of Engineering, Duke University, Box 90291, Durham, North Carolina 27708, United States of America; Radiology Department, Duke University, Box 3808 DUMC, Durham, North Carolina 27710, United States of America; Biomedical Engineering Department, Edmund T. Pratt Jr. School of Engineering, Duke University, Room 1427, Fitzpatrick Center (FCIEMAS), 101 Science Drive, Campus Box 90281, Durham, North Carolina 27708-0281, United States of America.
⁵ Electrical and Computer Engineering Department, Edmund T. Pratt Jr. School of Engineering, Duke University, Box 90291, Durham, North Carolina 27708, United States of America; Biostatistics and Bioinformatics Department, Duke University, DUMC 2424 Erwin Road, Suite 1102 Hock Plaza, Box 2721 Durham, North Carolina 27710, United States of America.
⁶ Radiology Department, Duke University, Box 3808 DUMC, Durham, North Carolina 27710, United States of America.
⁷ Computer Science Department, Duke University, LSRC Building D101, 308 Research Drive, Duke Box 90129, Durham, North Carolina 27708-0129, United States of America; Electrical and Computer Engineering Department, Edmund T. Pratt Jr. School of Engineering, Duke University, Box 90291, Durham, North Carolina 27708, United States of America; Statistical Science Department, Duke University, Box 90251, Durham, North Carolina 27708-0251, United States of America.

PMID: 33129142
PMCID: PMC7726032
DOI: 10.1016/j.media.2020.101857

Rachel Lea Draelos et al. Med Image Anal. 2021 Jan;67:101857. doi: 10.1016/j.media.2020.101857. Epub 2020 Oct 9.


Abstract

Machine learning models for radiology benefit from large-scale data sets with high quality labels for abnormalities. We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients. This is the largest multiply-annotated volumetric medical imaging data set reported. To annotate this data set, we developed a rule-based method for automatically extracting abnormality labels from free-text radiology reports with an average F-score of 0.976 (min 0.941, max 1.0). We also developed a model for multi-organ, multi-disease classification of chest CT volumes that uses a deep convolutional neural network (CNN). This model reached a classification performance of AUROC >0.90 for 18 abnormalities, with an average AUROC of 0.773 for all 83 abnormalities, demonstrating the feasibility of learning from unfiltered whole volume CT data. We show that training on more labels improves performance significantly: for a subset of 9 labels - nodule, opacity, atelectasis, pleural effusion, consolidation, mass, pericardial effusion, cardiomegaly, and pneumothorax - the model's average AUROC increased by 10% when the number of training labels was increased from 9 to all 83. All code for volume preprocessing, automated label extraction, and the volume abnormality prediction model is publicly available. The 36,316 CT volumes and labels will also be made publicly available pending institutional approval.

Keywords: chest computed tomography; convolutional neural network; deep learning; machine learning; multilabel classification.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

**Figure 1. Study Overview.**
(a) Reports from chest CT scans performed without intravenous contrast material were acquired from the Duke Enterprise Data Unified Content Explorer (DEDUCE) search tool as well as the Epic electronic health record (EHR). Report accession numbers were used to download CT slices as DICOMs from the Duke Image Archive (DIA), which were processed into a final data set of 36,316 CT volumes. (b) We develop an approach for extracting binary labels for 83 different abnormalities from the free-text chest CT reports. (c) We train and evaluate a deep convolutional neural network model (shown here and detailed further in Figure 2) that takes as input a whole CT volume and predicts all 83 abnormality labels simultaneously.
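Panel (b) describes turning free-text reports into binary abnormality labels with rules. The following is a minimal sketch of that general idea, not the authors' published method: the vocabulary `ABNORMALITY_TERMS`, the negation cues, and the rule "ignore everything after the first negation cue in a sentence" are all illustrative assumptions.

```python
import re

# Hypothetical vocabulary (assumption): label name -> trigger phrases.
ABNORMALITY_TERMS = {
    "nodule": ["nodule", "nodules"],
    "pleural_effusion": ["pleural effusion", "pleural effusions"],
    "pneumothorax": ["pneumothorax"],
}

# Hypothetical negation cues (assumption), matched as whole words.
NEGATION = re.compile(r"\b(?:no|without|negative for)\b")


def extract_labels(report):
    """Return a dict of binary labels for one free-text report."""
    labels = {name: 0 for name in ABNORMALITY_TERMS}
    # Split into rough sentences on periods, semicolons, and newlines.
    for sentence in re.split(r"[.;\n]+", report.lower()):
        cue = NEGATION.search(sentence)
        # Only the text before the first negation cue counts as affirmed.
        affirmed = sentence[:cue.start()] if cue else sentence
        for name, terms in ABNORMALITY_TERMS.items():
            for term in terms:
                if re.search(r"\b" + re.escape(term) + r"\b", affirmed):
                    labels[name] = 1
    return labels


report = ("There is a 4 mm nodule in the right upper lobe. "
          "No pleural effusion or pneumothorax.")
print(extract_labels(report))
```

A real rule set would need a much larger vocabulary and more careful scoping of negation; this sketch only shows the shape of the mapping from sentences to one binary label per abnormality.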

**Figure 2. CT-Net volume classification architecture.**
The CT volume is treated as a stack of three-channel images to enable use of a ResNet-18 (He et al., 2015) feature extractor pretrained on ImageNet (Deng et al., 2009). The ResNet-18 features for the stack of 134 three-channel images are concatenated and processed with several 3D convolutional layers to aggregate features across the craniocaudal extent of the scan and reduce the size of the representation. Then the representation is flattened and passed through three fully connected layers to produce predicted probabilities for the 83 abnormalities of interest.
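The caption's shape bookkeeping can be checked with a few lines of arithmetic. The sketch below groups consecutive slices into three-channel images and tracks the spatial size through the downsampling stages of a standard ResNet-18. The slice count of 402 follows from 134 × 3; the 420 × 420 in-plane size is an assumption made for illustration and is not stated in the caption, and the helper names are hypothetical.

```python
def group_slices(volume, channels=3):
    """Group consecutive slices into multi-channel images."""
    if len(volume) % channels != 0:
        raise ValueError("slice count must be divisible by channel count")
    return [volume[i:i + channels] for i in range(0, len(volume), channels)]


def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet18_spatial_size(size):
    """Spatial size after the five stride-2 stages of a standard ResNet-18."""
    # (kernel, stride, padding): 7x7 stem conv, 3x3 max pool, and the first
    # 3x3 conv of residual stages 2, 3, and 4. Other layers keep the size.
    stages = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]
    for kernel, stride, padding in stages:
        size = conv_out(size, kernel, stride, padding)
    return size


slices = list(range(402))            # stand-ins for 402 axial slices
images = group_slices(slices)        # 134 three-channel images
side = resnet18_spatial_size(420)    # assumed 420 x 420 input per slice
print(len(images), side)
```

Under these assumptions, each of the 134 images yields a feature map with 512 channels (the final ResNet-18 width) at the computed spatial size, which is the stack the 3D convolutional layers then reduce.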

**Figure 3. Architecture Comparison and Ablation Study on Training/Validation Data Subset.**
The AUROCs for each abnormality in this experiment were calculated on a random sample of 1,000 validation set scans, for models trained on a random subset of 2,000 training scans. CT-Net-83 is the proposed model. BodyConv and 3DConv are alternative architectures. CT-Net-83 (Pool) and CT-Net-83 (Rand) are ablated versions of the CT-Net-83 model.
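For readers unfamiliar with per-abnormality AUROC, here is a minimal sketch of how such values can be computed for a multilabel model. It uses the pairwise (Mann-Whitney) definition rather than any particular library, and the function names and toy data are assumptions for illustration, not the authors' evaluation code.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_label_auroc(label_rows, score_rows):
    """One AUROC per abnormality; rows are scans, columns are abnormalities."""
    n_labels = len(label_rows[0])
    return [
        auroc([row[j] for row in label_rows], [row[j] for row in score_rows])
        for j in range(n_labels)
    ]


# Toy example: 4 scans, 2 abnormalities.
y_true = [[0, 1], [0, 0], [1, 1], [1, 0]]
y_score = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.6], [0.8, 0.3]]
values = per_label_auroc(y_true, y_score)
print(values, sum(values) / len(values))
```

The pairwise loop is quadratic in the number of scans, which is fine for samples of around 1,000 scans; a rank-based implementation would be preferable for much larger sets.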

References

1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray DG, Steiner B, Tucker P, Vasudevan V, Warden P, Wicke M, Yu Y, Zheng X, 2016. TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.
2. Annarumma M, Withey SJ, Bakewell RJ, Pesce E, Goh V, Montana G, 2019. Automated Triaging of Adult Chest Radiographs with Deep Artificial Neural Networks. Radiology 291, 196–202. doi:10.1148/radiol.2018180921
3. Anthimopoulos M, Christodoulidis S, Ebner L, Christe A, Mougiakakou S, 2016. Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network. IEEE Trans. Med. Imaging 35, 1207–1216. doi:10.1109/TMI.2016.2535865
4. Ardila D, Kiraly AP, Bharadwaj S, Choi B, Reicher JJ, Peng L, Tse D, Etemadi M, Ye W, Corrado G, Naidich DP, Shetty S, 2019. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. doi:10.1038/s41591-019-0447-x
5. Armato SG, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, Zhao B, Aberle DR, Henschke CI, Hoffman EA, Kazerooni EA, MacMahon H, Van Beek EJR, Yankelevitz D, Biancardi AM, Bland PH, Brown MS, Engelmann RM, Laderach GE, Max D, Pais RC, Qing DPY, Roberts RY, Smith AR, Starkey A, Batra P, Caligiuri P, Farooqi A, Gladish GW, Jude CM, Munden RF, Petkovska I, Quint LE, Schwartz LH, Sundaram B, Dodd LE, Fenimore C, Gur D, Petrick N, Freymann J, Kirby J, Hughes B, Vande Casteele A, Gupte S, Sallam M, Heath MD, Kuhn MH, Dharaiya E, Burns R, Fryd DS, Salganicoff M, Anand V, Shreter U, Vastagh S, Croft BY, Clarke LP, 2011. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans. Med. Phys. 38, 915–931. doi:10.1118/1.3528204
