BMC Med Inform Decis Mak. 2022 Apr 15;22(1):102.
doi: 10.1186/s12911-022-01843-4.

Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning


Vincent M D'Anniballe et al. BMC Med Inform Decis Mak. 2022.

Abstract

Background: There is progress to be made in building artificially intelligent abnormality-detection systems that are not only accurate but can also handle the true breadth of findings that radiologists encounter in body (chest, abdomen, and pelvis) computed tomography (CT). Currently, the major bottleneck for developing multi-disease classifiers is the lack of manually annotated data. The purpose of this work was to develop high-throughput multi-label annotators for body CT reports that can be applied across a variety of abnormalities, organs, and disease states, thereby mitigating the need for human annotation.

Methods: We used a dictionary approach to develop rule-based algorithms (RBA) for extracting disease labels from radiology text reports. We targeted three organ systems (lungs/pleura, liver/gallbladder, kidneys/ureters) with four diseases per system, selected based on their prevalence in our dataset. To expand the algorithms beyond pre-defined keywords, attention-guided recurrent neural networks (RNN) were trained on the RBA-extracted labels to classify reports as positive for one or more diseases or as normal for each organ system. The effects of random initialization versus pre-trained embeddings, as well as of different training dataset sizes, on disease classification performance were evaluated. The RBA was tested on a subset of 2158 manually labeled reports, with performance reported as accuracy and F-score. The RNN was tested against a test set of 48,758 RBA-labeled reports, with performance reported as area under the receiver operating characteristic curve (AUC) and 95% CIs calculated using the DeLong method.
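To make the classification setup concrete, below is a minimal PyTorch sketch of an attention-guided RNN report classifier of the kind described above; the vocabulary size, embedding dimension, hidden size, and five-label output are illustrative assumptions, not the study's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn

class AttentionRNNClassifier(nn.Module):
    # Hypothetical sizes for illustration only.
    def __init__(self, vocab_size=20000, embed_dim=100, hidden_dim=128, num_labels=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attention = nn.Linear(2 * hidden_dim, 1)      # one score per token
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        embedded = self.embedding(token_ids)                # (batch, seq, embed)
        hidden, _ = self.rnn(embedded)                      # (batch, seq, 2*hidden)
        scores = self.attention(hidden).squeeze(-1)         # (batch, seq)
        weights = torch.softmax(scores, dim=1)              # attention over tokens
        context = (weights.unsqueeze(-1) * hidden).sum(dim=1)  # weighted sum of states
        logits = self.classifier(context)                   # (batch, num_labels)
        return logits, weights

model = AttentionRNNClassifier()
dummy_batch = torch.randint(1, 20000, (2, 50))              # two reports of 50 tokens
logits, attn = model(dummy_batch)
probs = torch.sigmoid(logits)                               # one probability per label
```

The sigmoid output treats the task as multi-label (a report can be positive for several diseases at once), and the per-token attention weights returned alongside the logits are what can be projected back onto the report text, as in Fig. 6.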

Results: Manual validation of the RBA confirmed 91-99% accuracy across the 15 different labels. Our models extracted disease labels from 261,229 radiology reports of 112,501 unique subjects. Pre-trained models outperformed random initialization across all diseases. As the training dataset size was reduced, performance was robust except for a few diseases with a relatively small number of cases. Pre-trained classification AUCs reached > 0.95 for all four disease outcomes and normality across all three organ systems.

Conclusions: Our label-extraction pipeline was able to encompass a wide variety of cases and diseases in body CT reports, generalizing beyond strict rules with exceptional accuracy. The method described can be easily adapted to enable automated labeling of hospital-scale medical datasets for training image-based disease classifiers.

Keywords: Attention RNN; Computed tomography; Natural language processing; Report labeling; Rule-based algorithm; Weak supervision.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Complete workflow of this study. Radiology reports extracted from our health system were deidentified and the findings sections were isolated. The reports were analyzed by an RBA and an attention-guided RNN to classify each report for 5 different outcomes (one or more of four disease states, or normal) per organ system (lungs/pleura, liver/gallbladder, kidneys/ureters). A separate RBA and RNN were used for each organ system.
Fig. 2
Representative example of a body CT radiology report within our dataset. The report consists of protocol, indication, technique, findings, and impression sections composed in a semi-structured form.
Fig. 3
Distribution of CT protocols within our dataset. CAP = chest, abdomen, and pelvis; C = chest; AP = abdomen-pelvis; A = abdomen; P = pelvis; CA = chest-abdomen; CP = chest-pelvis.
Fig. 4
Overview of the RBAs. (Top) The findings section of each report was extracted, the text was converted to lowercase, and each sentence was tokenized. The RBA was deployed on each sentence, and diseases were counted using the multi-organ descriptor logic first and then the single-organ descriptor logic. If no disease labels were detected, the normal descriptor logic was applied. This process was repeated for each disease, allowing a report to be positive for one or more diseases or normal for each organ system. (Bottom) The normal, multi-organ, and single-organ descriptor logics.
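For illustration, the following is a minimal sketch of a dictionary-style rule-based labeler in the spirit of the logic above, using a small hypothetical lexicon for the lungs/pleura and a simple negation guard; the study's actual dictionaries, multi-organ/single-organ descriptor ordering, and exception handling are considerably more elaborate.

```python
import re

# Hypothetical keyword lexicon for one organ system (lungs/pleura).
DISEASE_TERMS = {
    "nodule": ["nodule", "nodular opacity"],
    "opacity": ["ground-glass opacity", "consolidation"],
    "atelectasis": ["atelectasis", "atelectatic"],
    "effusion": ["pleural effusion"],
}
NEGATION_CUES = ["no ", "without ", "negative for "]

def rba_label(findings_text: str) -> dict:
    """Label one findings section for each disease, or normal if nothing fires."""
    sentences = re.split(r"[.\n]+", findings_text.lower())
    labels = {disease: 0 for disease in DISEASE_TERMS}
    for sentence in sentences:
        for disease, terms in DISEASE_TERMS.items():
            hit = any(term in sentence for term in terms)
            negated = any(cue in sentence for cue in NEGATION_CUES)
            if hit and not negated:
                labels[disease] = 1
    labels["normal"] = int(sum(labels.values()) == 0)  # normal only if no disease fired
    return labels

print(rba_label("Lungs are clear. No pleural effusion. A 5 mm nodule in the right lower lobe."))
# {'nodule': 1, 'opacity': 0, 'atelectasis': 0, 'effusion': 0, 'normal': 0}
```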
Fig. 5
Frequency of reports for each disease within our dataset.
Fig. 6
Examples of attention vectors projected onto the findings sections of radiology reports. (Top panel) A report positive for nodule in the lungs/pleura. (Middle panel) A normal report for the liver/gallbladder. (Bottom panel) A report positive for stone in the kidneys/ureters. As part of standard pre-processing, all numbers and punctuation were removed and the text was converted to lowercase.
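A minimal sketch of the pre-processing mentioned in this caption (lowercasing, stripping numbers and punctuation, then splitting into tokens) might look like the following; the exact tokenization used in the study may differ.

```python
import re

def preprocess(findings_text: str) -> list[str]:
    """Lowercase, strip numbers and punctuation, then split into tokens."""
    text = findings_text.lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    return text.split()

print(preprocess("A 5 mm nodule in the right lower lobe."))
# ['a', 'mm', 'nodule', 'in', 'the', 'right', 'lower', 'lobe']
```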
Fig. 7
Effect of training dataset size on classification performance for the pre-trained embedding models. a Number of reports in random splits of 20%, 40%, 60%, 80%, and 100% of the total training dataset for each disease, by organ system. b Test-set performance (AUC) of models trained with the 20%, 40%, 60%, 80%, and 100% splits for each disease, by organ system. Error bars represent 95% confidence intervals.
