Explainable automated coding of clinical notes using hierarchical label-wise attention networks and label embedding initialisation
- PMID: 33711543
- DOI: 10.1016/j.jbi.2021.103728
Abstract
Background: Diagnostic or procedural coding of clinical notes aims to derive a coded summary of disease-related information about patients. Such coding is usually done manually in hospitals but could potentially be automated to improve the efficiency and accuracy of medical coding. Recent studies on deep learning for automated medical coding have achieved promising performance. However, the explainability of these models is usually poor, which prevents them from being used confidently to support clinical practice. Another limitation is that these models mostly assume independence among labels, ignoring the complex correlations among medical codes that could be exploited to improve performance.
Methods: To address the issues of model explainability and label correlations, we propose a Hierarchical Label-wise Attention Network (HLAN), which aims to interpret the model by quantifying the importance (as attention weights) of words and sentences related to each label. We also propose to enhance the major deep learning models with a label embedding (LE) initialisation approach, which learns a dense, continuous vector representation of the labels and then injects this representation into the final layers and the label-wise attention layers of the models. We evaluated the methods in three settings on the MIMIC-III discharge summaries: full codes, top-50 codes, and the UK NHS (National Health Service) COVID-19 (Coronavirus disease 2019) shielding codes. Experiments were conducted to compare the HLAN model and label embedding initialisation with state-of-the-art neural network based methods, including variants of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
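The label-wise attention idea described in the Methods can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see their repository for that): the function name `label_wise_attention`, the array shapes, the random inputs, and the omission of the learned projection layer are all simplifying assumptions made here for illustration. Each label has its own context vector, so each label gets its own attention distribution over tokens; initialising those context vectors from pre-trained label embeddings is one plausible reading of the LE initialisation step.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def label_wise_attention(hidden, label_context):
    """Simplified label-wise attention (illustrative, not the paper's exact layer).

    hidden:        (n_tokens, d) token representations, e.g. from an RNN encoder
    label_context: (n_labels, d) one context vector per label
    Returns attention weights (n_labels, n_tokens) and
    label-specific document representations (n_labels, d).
    """
    scores = label_context @ np.tanh(hidden).T   # (n_labels, n_tokens)
    weights = softmax(scores, axis=1)            # one distribution over tokens per label
    label_repr = weights @ hidden                # weighted sum of tokens per label
    return weights, label_repr

rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 4))      # 6 tokens, hidden size 4 (toy values)
label_emb = rng.normal(size=(3, 4))   # hypothetical pre-trained label embeddings
# Assumption: the label context vectors are initialised from the label embeddings.
weights, label_repr = label_wise_attention(hidden, label_emb)
```

Each row of `weights` can be read as the per-token importance for one label, which is the quantity highlighted when explaining a prediction. A hierarchical model would presumably apply the same step a second time over sentence vectors to obtain per-label sentence weights.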
Results: HLAN achieved the best Micro-level AUC and F1 on the top-50 code prediction, 91.9% and 64.1%, respectively; and comparable results on the NHS COVID-19 shielding code prediction to other models: around 97% Micro-level AUC. More importantly, in the analysis of model explanations, by highlighting the most salient words and sentences for each label, HLAN showed more meaningful and comprehensive model interpretation compared to the CNN-based models and its downgraded baselines, HAN and HA-GRU. Label embedding (LE) initialisation significantly boosted the previous state-of-the-art model, CNN with attention mechanisms, on the full code prediction to 52.5% Micro-level F1. The analysis of the layers initialised with label embeddings further explains the effect of this initialisation approach. The source code of the implementation and the results are openly available at https://github.com/acadTags/Explainable-Automated-Medical-Coding.
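For readers unfamiliar with the Micro-level F1 reported above, the following short sketch shows how the metric is typically computed for multi-label predictions: true positives, false positives, and false negatives are pooled over all document-label pairs before computing F1. The function name `micro_f1` and the toy label matrices are illustrative assumptions, not data or code from the study.

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for binary multi-label matrices of shape (n_docs, n_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)     # code assigned and predicted
    fp = np.sum(~y_true & y_pred)    # code predicted but not assigned
    fn = np.sum(y_true & ~y_pred)    # code assigned but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy example: 2 documents, 3 candidate codes.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
score = micro_f1(y_true, y_pred)   # 2 TP, 1 FP, 1 FN -> 4 / 6
```

Because counts are pooled across labels, frequent codes dominate the micro-averaged score, which is worth keeping in mind when comparing the full-code and top-50 settings.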
Conclusion: We draw three conclusions from the evaluation results and analyses. First, with hierarchical label-wise attention mechanisms, HLAN can provide better or comparable results for automated coding relative to the state-of-the-art CNN-based models. Second, HLAN can provide more comprehensive explanations for each label by highlighting key words and sentences in the discharge summaries, compared to the n-grams in the CNN-based models and the downgraded baselines, HAN and HA-GRU. Third, the performance of deep learning based multi-label classification for automated coding can be consistently boosted by initialising label embeddings that capture the correlations among labels. We further discuss the advantages and drawbacks of the overall method regarding its potential to be deployed in a hospital and suggest areas for future studies.
Keywords: Attention Mechanisms; Automated medical coding; Deep learning; Explainability; Label correlation; Multi-label classification; Natural Language Processing.
Copyright © 2021 Elsevier Inc. All rights reserved.
Similar articles
- Hierarchical label-wise attention transformer model for explainable ICD coding. J Biomed Inform. 2022 Sep;133:104161. doi: 10.1016/j.jbi.2022.104161. Epub 2022 Aug 20. PMID: 35995108
- Towards automated clinical coding. Int J Med Inform. 2018 Dec;120:50-61. doi: 10.1016/j.ijmedinf.2018.09.021. Epub 2018 Oct 2. PMID: 30409346
- Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing. J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7. PMID: 35007754
- An explainable CNN approach for medical codes prediction from clinical text. BMC Med Inform Decis Mak. 2021 Nov 16;21(Suppl 9):256. doi: 10.1186/s12911-021-01615-6. PMID: 34789241 Free PMC article.
- Automated clinical coding: what, why, and where we are? NPJ Digit Med. 2022 Oct 22;5(1):159. doi: 10.1038/s41746-022-00705-7. PMID: 36273236 Free PMC article. Review.
Cited by
- ICDXML: enhancing ICD coding with probabilistic label trees and dynamic semantic representations. Sci Rep. 2024 Aug 7;14(1):18319. doi: 10.1038/s41598-024-69214-9. PMID: 39112791 Free PMC article.
- Optimising the paradigms of human AI collaborative clinical coding. NPJ Digit Med. 2024 Dec 19;7(1):368. doi: 10.1038/s41746-024-01363-7. PMID: 39702575 Free PMC article.
- Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. Database (Oxford). 2022 Aug 31;2022:baac069. doi: 10.1093/database/baac069. PMID: 36043400 Free PMC article.
- Year 2021: COVID-19, Information Extraction and BERTization among the Hottest Topics in Medical Natural Language Processing. Yearb Med Inform. 2022 Aug;31(1):254-260. doi: 10.1055/s-0042-1742547. Epub 2022 Dec 4. PMID: 36463883 Free PMC article.
- A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak. 2021 Jun 3;21(1):179. doi: 10.1186/s12911-021-01533-7. PMID: 34082729 Free PMC article.