J Biomed Inform. 2024 Jan;149:104576.
doi: 10.1016/j.jbi.2023.104576. Epub 2023 Dec 13.

Deep learning uncertainty quantification for clinical text classification

Alina Peluso et al. J Biomed Inform. 2024 Jan.

Abstract

Introduction: Machine learning algorithms are expected to work side-by-side with humans in decision-making pipelines, so the ability of classifiers to make reliable decisions is of paramount importance. Deep neural networks (DNNs) are the state-of-the-art models for real-world classification. Although the strength of activation in a DNN is often correlated with the network's confidence, in-depth analyses are needed to establish whether these confidences are well calibrated.
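
For context on what "well calibrated" means in practice, the following is a minimal sketch (ours, not the authors') of one standard diagnostic, the expected calibration error (ECE), computed from softmax outputs. The array names `probs` and `labels` are illustrative assumptions.

```python
# Minimal sketch (not from the paper): expected calibration error (ECE)
# for softmax outputs. `probs` is an (N, C) array of class probabilities
# and `labels` an (N,) array of ground-truth class indices; both names
# are illustrative placeholders.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    confidences = probs.max(axis=1)        # model confidence per sample
    predictions = probs.argmax(axis=1)     # predicted class per sample
    accuracies = (predictions == labels).astype(float)

    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap     # weight the gap by bin frequency
    return ece
```

A well-calibrated model yields a small ECE because its average confidence in each bin matches its empirical accuracy there.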

Method: In this paper, we demonstrate the use of DNN-based classification tools to benefit cancer registries by automating the extraction of disease information at diagnosis and at surgery from electronic text pathology reports from the US National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) population-based cancer registries. In particular, we introduce multiple methods for selective classification that achieve a target level of accuracy on multiple classification tasks while minimizing the rejection amount, that is, the number of electronic pathology reports for which the model's predictions are unreliable. We evaluate the proposed methods by comparing our approach with the current in-house deep learning-based abstaining classifier.
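
As a concrete illustration of the selective classification setup, the hedged sketch below (our own, not the authors' implementation) tunes a softmax-confidence threshold on validation data so that the retained predictions reach a target accuracy, then abstains on low-confidence samples. The names `val_probs`, `val_labels`, and the 97% target are assumptions for illustration.

```python
# Hedged sketch of a posteriori selective classification by confidence
# thresholding; variable names and the 97% target are assumptions.
import numpy as np

def tune_threshold(val_probs, val_labels, target_acc=0.97):
    """Pick the smallest confidence threshold whose retained validation
    subset meets the target accuracy, thereby limiting the rejection rate."""
    conf = val_probs.max(axis=1)
    pred = val_probs.argmax(axis=1)
    correct = (pred == val_labels)
    for tau in np.unique(conf):            # candidate thresholds, ascending
        keep = conf >= tau
        if keep.any() and correct[keep].mean() >= target_acc:
            return tau, 1.0 - keep.mean()  # threshold, rejection rate
    return 1.0, 1.0                        # reject everything if unreachable

def selective_predict(probs, tau):
    """Return predicted classes, with -1 marking abstentions that are
    routed to manual review."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    return np.where(conf >= tau, pred, -1)
```

Because the threshold is fit on held-out softmax scores only, this kind of procedure requires no retraining of the underlying classifier, which is the computational advantage highlighted in the Conclusions.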

Results: Overall, all the proposed selective classification methods achieve the targeted level of accuracy or higher in a trade-off analysis aimed at minimizing the rejection rate. On in-distribution validation and holdout test data, all the proposed methods reach the required target level of accuracy on every task with a lower rejection rate than the deep abstaining classifier (DAC). Interpreting the results for the out-of-distribution test data is more complex; nevertheless, in this case as well, the rejection rate of the best proposed method achieving 97% accuracy or higher is lower than the rejection rate of the DAC.

Conclusions: We show that although both approaches can flag samples that should be manually reviewed and labeled by human annotators, the newly proposed methods retain a larger fraction of samples and do so without retraining, thus offering a reduced computational cost compared with the in-house deep learning-based abstaining classifier.

Keywords: Abstaining classifier; Accuracy; CNN; DNN; Deep learning; HiSAN; NCI SEER; Pathology reports; Selective classification; Text classification; Uncertainty quantification.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1. Subfigure (a) shows the architecture of the MTCNN for multi-task classification. The model has three parallel filters, each with a different window size. The output from these filters is fed into a max-pooling layer and then concatenated before a final softmax or sigmoid function is applied for each classification task. Subfigure (b) presents the MTHiSAN architecture, showing how the layers of word embeddings create a word hierarchy connected to self-attention and target-attention mechanisms, respectively. The output of these attention mechanisms is directly connected to similar hierarchical attention mechanisms, creating a hierarchy over the lines in a pathology report. These features form the document embedding, which serves as the extracted feature vector used in the final classification layer.
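
For a code-level picture of the MTCNN in subfigure (a), the following is an illustrative PyTorch approximation under stated assumptions (embedding size, filter counts, window sizes, and per-task class counts are ours, not the authors'): three parallel 1-D convolutions with different window sizes, max-pooling, concatenation into a document embedding, and one classification head per task.

```python
# Illustrative PyTorch approximation of a multi-task CNN (MTCNN) text
# classifier in the spirit of Fig. 1(a); hyperparameters are assumptions.
import torch
import torch.nn as nn

class MTCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, n_filters=100,
                 window_sizes=(3, 4, 5), task_classes=(25, 7, 4)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Three parallel filters, each with a different window size.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, n_filters, kernel_size=w) for w in window_sizes]
        )
        feat_dim = n_filters * len(window_sizes)
        # One classification head per task; softmax is applied in the loss.
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, c) for c in task_classes]
        )

    def forward(self, tokens):                     # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)     # (batch, embed_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        doc = torch.cat(pooled, dim=1)             # concatenated document features
        return [head(doc) for head in self.heads]  # per-task logits
```
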
Fig. 2. Baseline accuracy of our models on validation and test data, for both the in-distribution, more recent holdout data (UTNJKYLASA) and the OOD data (CA and NM). In all instances, the accuracy is lower than the required 97% level.
Fig. 3. Experimental study 1: validation data. The accuracy level of about 97% is achieved with the displayed rejection rate. (*) and (x) represent the lowest and highest rejection rate by task.
Fig. 4. Experimental study 1: more recent, holdout test set. The tuning on the validation set resulted in higher accuracy on the test set than the target level of 97%, corresponding to the displayed rejection rate. (*) and (x) represent the lowest and highest rejection rate by task.
Fig. 5. Experimental study 2: validation set (top) and more recent, holdout test set (bottom). Tuning on the validation set for the same self-tuning accuracy selected by the DAC resulted in a higher accuracy than the target level of 97%. (*) and (x) represent the lowest and highest rejection rate by task. In addition, all the proposed a posteriori methods retain a rate of classes (i.e., the number of retained predicted classes relative to the ground-truth CTC classes) larger than or equal to that of the DAC.
Fig. 6. Experimental study 2: OOD test sets, CA (top) and NM (bottom). Tuning on the validation set led to a higher accuracy on the test set than the target level of 97%. (*) and (x) represent the lowest and highest rejection rate by task. In addition, all the proposed a posteriori methods retain a rate of classes (i.e., the number of retained predicted classes relative to the ground-truth CTC classes) larger than or equal to that of the DAC.
