Automated ICD coding via unsupervised knowledge integration (UNITE)
- PMID: 32361145
- PMCID: PMC9410729
- DOI: 10.1016/j.ijmedinf.2020.104135
Abstract
Objective: Accurate coding is critical for medical billing and electronic medical record (EMR)-based research. Recent research has focused on developing supervised methods to automatically assign International Classification of Diseases (ICD) codes from clinical notes. However, supervised approaches rely on ICD code data stored in the hospital EMR system and are subject to bias arising from local clinical practice and coding behavior. Consequently, supervised algorithms trained at one institution may not port well to external EMR systems.
Method: We developed an unsupervised knowledge integration (UNITE) algorithm to automatically assign ICD codes for a specific disease by analyzing clinical narrative notes via semantic relevance assessment. The algorithm was validated using coded ICD data for 6 diseases from Partners HealthCare (PHS) Biobank and Medical Information Mart for Intensive Care (MIMIC-III). We compared the performance of UNITE against penalized logistic regression (LR), topic modeling, and neural network models within each EMR system. We additionally evaluated the portability of UNITE by training at PHS Biobank and validating at MIMIC-III, and vice versa.
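The abstract does not specify how semantic relevance is computed. As a rough illustration only, the sketch below scores each note by the cosine similarity between the average embedding of the concepts it mentions and an embedding representing the target disease, so that notes can be ranked without any labeled ICD data. The concept embeddings, the disease vector, the averaging step, and all function names (`cosine`, `mean_vector`, `relevance_scores`) are hypothetical assumptions, not details taken from the UNITE paper.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Average several concept embeddings into one note-level vector."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def relevance_scores(
    notes: Dict[str, List[str]],
    embeddings: Dict[str, List[float]],
    disease_vec: List[float],
) -> Dict[str, float]:
    """Hypothetical unsupervised score: similarity of each note to the disease vector.

    Concepts without an embedding are ignored; a note with no known concepts scores 0.0.
    """
    scores = {}
    for note_id, concepts in notes.items():
        known = [embeddings[c] for c in concepts if c in embeddings]
        scores[note_id] = cosine(mean_vector(known), disease_vec) if known else 0.0
    return scores


if __name__ == "__main__":
    # Toy 2-D embeddings, invented purely for illustration.
    emb = {"chest pain": [1.0, 0.0], "troponin": [0.9, 0.1], "fracture": [0.0, 1.0]}
    notes = {"a": ["chest pain", "troponin"], "b": ["fracture"]}
    # Note "a" scores close to 1.0; note "b" scores 0.0.
    print(relevance_scores(notes, emb, disease_vec=[1.0, 0.0]))
```

A threshold on these scores (or a ranking of patients by score) would then stand in for the assigned code; how UNITE actually turns relevance into a code assignment is not described in the abstract.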
Results: Across the 6 diseases, UNITE achieved an average AUC of 0.91 at PHS and 0.92 at MIMIC-III, comparable to LR and the multilayer perceptron (MLP). It performed substantially better than the topic models. Regarding portability, the performance of UNITE was consistent across EMR systems and superior to that of LR, topic models, and neural network models.
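The AUC summarizes how well a continuous score separates patients who carry the ICD code from those who do not. The minimal sketch below uses the standard rank (Mann-Whitney) interpretation, where AUC is the probability that a randomly chosen positive patient scores higher than a randomly chosen negative one, with ties counting one half. It then averages per-disease AUCs, assuming a simple unweighted mean; the abstract does not state how the paper computed or averaged its AUCs, and the function names here are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_disease: Dict[str, Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Unweighted mean of per-disease AUCs, given {disease: (scores, labels)}."""
    values = [auc(s, y) for s, y in per_disease.values()]
    return sum(values) / len(values)
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; a rank-based or library implementation would be preferable for large cohorts.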
Conclusion: UNITE accurately assigns ICD codes in EMRs without requiring human labor and has major advantages over commonly used machine learning approaches. In addition, UNITE attained stable performance and high portability across EMRs from different institutions.
Keywords: Automated ICD assignment; Electronic medical records; Knowledge integration; Portability; Semantic embedding; Unsupervised learning.
Copyright © 2020 Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: All authors have declared that they have no financial or non-financial interests that may be relevant to the submitted work and no other relationships or activities that could appear to have influenced the submitted work.
Similar articles
- An empirical evaluation of deep learning for ICD-9 code assignment using MIMIC-III clinical notes. Comput Methods Programs Biomed. 2019 Aug;177:141-153. doi: 10.1016/j.cmpb.2019.05.024. Epub 2019 May 25. PMID: 31319942
- An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records. Artif Intell Med. 2015 Oct;65(2):155-66. doi: 10.1016/j.artmed.2015.04.007. Epub 2015 May 15. PMID: 26054428. Free PMC article.
- Rule-based and machine learning algorithms identify patients with systemic sclerosis accurately in the electronic health record. Arthritis Res Ther. 2019 Dec 30;21(1):305. doi: 10.1186/s13075-019-2092-7. PMID: 31888720. Free PMC article.
- Overview of ICD-11 architecture and structure. BMC Med Inform Decis Mak. 2022 May 16;21(Suppl 6):378. doi: 10.1186/s12911-021-01539-1. PMID: 35578335. Free PMC article. Review.
- [ICD-11-Adapting ICD to the 21st century]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2018 Jul;61(7):771-777. doi: 10.1007/s00103-018-2755-6. PMID: 29869704. Review. German.
Cited by
- Automated ICD coding for coronary heart diseases by a deep learning method. Heliyon. 2023 Feb 27;9(3):e14037. doi: 10.1016/j.heliyon.2023.e14037. eCollection 2023 Mar. PMID: 36938427. Free PMC article.
- Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks. NPJ Digit Med. 2021 Feb 26;4(1):37. doi: 10.1038/s41746-021-00404-9. PMID: 33637859. Free PMC article.
- Comparison of different feature extraction methods for applicable automated ICD coding. BMC Med Inform Decis Mak. 2022 Jan 12;22(1):11. doi: 10.1186/s12911-022-01753-5. PMID: 35022039. Free PMC article.