JAMIA Open. 2022 Sep 13;5(3):ooac075. doi: 10.1093/jamiaopen/ooac075. eCollection 2022 Oct.

Using ensembles and distillation to optimize the deployment of deep learning models for the classification of electronic cancer pathology reports


Kevin De Angeli et al. JAMIA Open.

Abstract

Objective: We aim to reduce overfitting and model overconfidence by distilling the knowledge of an ensemble of deep learning models into a single model for the classification of cancer pathology reports.

Materials and methods: We consider a text classification problem involving 5 individual tasks. The baseline model is a multitask convolutional neural network (MtCNN), and the implemented ensemble (teacher) consists of 1000 MtCNNs. We performed knowledge transfer by training a single model (student) with soft labels derived by aggregating the ensemble's predictions. We evaluated performance in terms of accuracy and abstention rates using softmax thresholding.
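
To make the mechanics concrete, the following is a minimal sketch of the distillation step described above, assuming a PyTorch implementation; the function names and the use of averaged softmax outputs as soft labels are illustrative assumptions, not taken from the paper's code.

    import torch
    import torch.nn.functional as F

    def build_soft_labels(teachers, x):
        """Average the softmax outputs of the teacher ensemble for a batch x."""
        with torch.no_grad():
            probs = torch.stack([F.softmax(t(x), dim=-1) for t in teachers])
        return probs.mean(dim=0)  # (batch, num_classes) soft labels

    def distillation_loss(student_logits, soft_labels):
        """Cross-entropy between ensemble-derived soft labels and the student."""
        log_probs = F.log_softmax(student_logits, dim=-1)
        return -(soft_labels * log_probs).sum(dim=-1).mean()

In practice the soft labels can be precomputed once for the whole training set, so the 1000-model ensemble is only needed during label construction, not during student training or inference.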

Results: The student model outperforms the baseline MtCNN in both abstention rate and accuracy, allowing the model to be used with a larger volume of documents when deployed. The largest gains were observed for subsite and histology, for which the student model classified an additional 1.81% and 3.33% of reports, respectively.
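
Softmax thresholding for abstention can be expressed in a few lines; this is a hedged sketch in PyTorch, and the 0.97 threshold is only illustrative (it echoes the confidence cutoff shown in Figure 4), not the paper's operating point.

    import torch
    import torch.nn.functional as F

    def classify_with_abstention(logits, threshold=0.97):
        """Predict a class, or abstain (-1) when confidence is below threshold."""
        probs = F.softmax(logits, dim=-1)
        confidence, predictions = probs.max(dim=-1)
        predictions = torch.where(confidence >= threshold,
                                  predictions,
                                  torch.full_like(predictions, -1))
        return predictions, confidence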

Discussion: Ensemble predictions provide a useful strategy for quantifying the uncertainty inherent in labeled data, enabling the construction of soft labels with estimated probabilities over multiple classes for a given document. Training with these soft labels reduces model confidence on difficult-to-classify documents, leading to fewer highly confident wrong predictions.
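
As a concrete illustration of how ensemble votes become soft labels, consider the hypothetical three-teacher example from Figure 1. The NumPy sketch below turns hard votes into an empirical class distribution; the class ids and vote-counting scheme are assumptions for illustration, not necessarily the paper's aggregation method.

    import numpy as np

    def soft_labels_from_votes(votes, num_classes):
        """Convert ensemble hard votes into an empirical class distribution."""
        counts = np.bincount(votes, minlength=num_classes)
        return counts / counts.sum()

    # Figure 1's hypothetical example: three teachers vote stomach, esophagus,
    # colon (class ids 0, 1, 2 are arbitrary), yielding a uniform soft label.
    print(soft_labels_from_votes(np.array([0, 1, 2]), num_classes=3))
    # [0.33333333 0.33333333 0.33333333]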

Conclusions: Ensemble model distillation is a simple tool for reducing model overconfidence in problems with extreme class imbalance and noisy datasets. These methods can facilitate the deployment of deep learning models in high-risk domains where computational resources are limited and inference time must be minimized.

Keywords: CNN; NLP; deep learning; ensemble distillation; selective classification.


Figures

Figure 1. Overview of our training pipeline with a hypothetical example in which 3 different models classify a pathology report as stomach, esophagus, and colon. Our actual implementation consists of 1000 teacher models.

Figure 2. Histology task: distribution of softmax scores for wrong predictions.

Figure 3. Subsite task: distribution of softmax scores for wrong predictions.

Figure 4. Wrong histology predictions made with confidence >0.97.

Figure 5. Incorrectly annotated pathology report that was corrected during the distillation process. Some sentences were removed to preserve privacy.

Figure 6. Pathology report in which the ensemble prediction votes were split into 3 equal groups. This report includes results for 3 analyzed specimens related to the stomach, esophagus, and colon. Some sentences were removed to preserve privacy.

