Towards automated clinical coding
- PMID: 30409346
- DOI: 10.1016/j.ijmedinf.2018.09.021
Abstract
Background: Patients' encounters with healthcare services must undergo clinical coding. These codes are typically derived from free-text notes. Manual clinical coding is expensive, time-consuming and prone to error. Automated clinical coding systems have great potential to save resources, and real-time availability of codes would improve oversight of patient care and accelerate research. Automated coding is made challenging by the idiosyncrasies of clinical text, the large number of disease codes and their unbalanced distribution.
Methods: We explore methods for representing clinical text and the labels in hierarchical clinical coding ontologies. Text is represented first as term frequency-inverse document frequency counts and then as word embeddings, which we use as input to recurrent neural networks. Labels are represented first atomically, and then by learning a representation of each node in a coding ontology and composing a representation for each label from its node path. We consider different strategies for initialising the node representations. We evaluate our methods using the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) dataset: we extract the history of presenting illness section from each discharge summary in the dataset, then predict the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes associated with these sections.
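The idea of composing a label representation from its node path can be illustrated with a minimal sketch. The abstract does not state how the path is derived or how node vectors are combined, so everything here is an assumption made for illustration: the helper names (node_path, init_node_vectors, compose_label) are hypothetical, the path is read off the code string one digit at a time, node vectors are randomly initialised, and composition is a plain element-wise sum.

```python
import random


def node_path(code):
    """Return a simplified ancestor path for an ICD-9-CM-style code.

    Assumption: each digit after the decimal point adds one level, so
    '428.21' -> ['428', '428.2', '428.21']. Chapter and block levels
    of the real ontology are omitted for brevity.
    """
    if "." not in code:
        return [code]
    category, extension = code.split(".", 1)
    path = [category]
    for i in range(1, len(extension) + 1):
        path.append(category + "." + extension[:i])
    return path


def init_node_vectors(codes, dim, seed=0):
    """Give every node on every path a small random vector.

    Random initialisation is only one possible strategy; in a real
    model these vectors would be trainable parameters.
    """
    rng = random.Random(seed)
    vectors = {}
    for code in codes:
        for node in node_path(code):
            if node not in vectors:
                vectors[node] = [rng.gauss(0.0, 0.1) for _ in range(dim)]
    return vectors


def compose_label(code, vectors):
    """Compose a label vector by summing the vectors along its node path."""
    path = node_path(code)
    composed = [0.0] * len(vectors[path[0]])
    for node in path:
        for i, value in enumerate(vectors[node]):
            composed[i] += value
    return composed


codes = ["428.0", "428.21", "401.9"]
vectors = init_node_vectors(codes, dim=4)
label_vector = compose_label("428.21", vectors)
```

Under this scheme the labels 428.0 and 428.21 share the vector for node 428, so a rarely observed code can still borrow information from its more frequent relatives, whereas an atomic representation would give each of them an unrelated vector.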
Results: Composing the label representations from the clinical-coding-ontology nodes increased weighted F1 for prediction of the 17,561 disease labels to 0.264-0.281 from 0.232-0.249 for atomic representations. Recurrent neural network text representation improved weighted F1 for prediction of the 19 disease-category labels to 0.682-0.701 from 0.662-0.682 using term frequency-inverse document frequency. However, term frequency-inverse document frequency outperformed recurrent neural networks for prediction of the 17,561 disease labels.
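For readers unfamiliar with the metric, the following sketch shows one common way to compute a weighted F1 score for multi-label predictions. The abstract does not give the exact formula, so this assumes the usual support-weighted convention: F1 is computed per label, and the per-label scores are averaged with weights equal to the number of true occurrences of each label. The function name weighted_f1 and the toy data are hypothetical.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted F1 over labels for multi-label predictions.

    y_true and y_pred are lists of sets of label strings, one set per
    document. Labels that never occur in y_true have zero support and
    therefore contribute nothing to the weighted average.
    """
    support = {}
    for labels in y_true:
        for label in labels:
            support[label] = support.get(label, 0) + 1

    total_support = sum(support.values())
    if total_support == 0:
        return 0.0

    weighted_sum = 0.0
    for label, n_true in support.items():
        tp = fp = fn = 0
        for truth, pred in zip(y_true, y_pred):
            if label in truth and label in pred:
                tp += 1
            elif label in pred:
                fp += 1
            elif label in truth:
                fn += 1
        denominator = 2 * tp + fp + fn
        f1 = (2 * tp / denominator) if denominator else 0.0
        weighted_sum += n_true * f1
    return weighted_sum / total_support


# Toy example: three documents, three distinct codes.
y_true = [{"428.0", "401.9"}, {"401.9"}, {"250.00"}]
y_pred = [{"401.9"}, {"401.9", "428.0"}, set()]
score = weighted_f1(y_true, y_pred)
```

In the toy data the frequent code 401.9 is predicted perfectly while the two codes that occur once are both missed, so the score is dominated by the frequent label. This is why a weighted F1 over many thousands of labels can move noticeably when the representation of rarer diseases improves.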
Conclusions: This study demonstrates that hierarchically-structured medical knowledge can be incorporated into statistical models, and that doing so improves performance in automated clinical coding. This performance improvement results primarily from improved representation of rarer diseases. We also show that recurrent neural networks improve representation of medical text in some settings. Learning good representations of the very rare diseases in clinical coding ontologies from data alone remains challenging, and alternative means of representing these diseases will form a major focus of future work on automated clinical coding.
Keywords: Clinical coding; Hierarchical representation learning; Knowledge representation; Machine learning; Natural language processing; Recurrent neural networks.
Copyright © 2018 Elsevier B.V. All rights reserved.