Data integration of structured and unstructured sources for assigning clinical codes to patient stays
- PMID: 26316458
- PMCID: PMC4954635
- DOI: 10.1093/jamia/ocv115
Abstract
Objective: Enormous amounts of healthcare data are becoming increasingly accessible through the large-scale adoption of electronic health records. In this work, structured and unstructured (textual) data are combined to assign clinical diagnostic and procedural codes (specifically ICD-9-CM) to patient stays. We investigate whether integrating these heterogeneous data types improves prediction strength compared to using the data types in isolation.
Methods: Two data integration approaches were evaluated. Early data integration combines the features of several sources within a single model, whereas late data integration learns a separate model per data source and combines their predictions with a meta-learner. Both approaches were evaluated on data sources and clinical codes from a broad set of medical specialties.
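The two strategies described in Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny logistic regression, the toy feature matrices (`structured`, `text`), the binary label `y` (whether one code applies to a stay), and the helper names `fit_early` and `fit_late` are all assumptions made for illustration. For brevity the meta-learner is trained on in-sample base predictions; a careful setup would use held-out (cross-validated) predictions instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyLogReg:
    """Binary logistic regression fit by batch gradient descent (illustrative only)."""

    def __init__(self, lr=0.5, epochs=2000):
        self.lr, self.epochs = lr, epochs
        self.w, self.b = [], 0.0

    def predict_proba(self, x):
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            errs = [self.predict_proba(xi) - yi for xi, yi in zip(X, y)]
            grad_w = [sum(e * xi[j] for e, xi in zip(errs, X)) / n for j in range(d)]
            grad_b = sum(errs) / n
            self.w = [wj - self.lr * g for wj, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b
        return self


def concat(rows):
    """Join one stay's feature vectors from all sources into a single vector."""
    return [v for row in rows for v in row]


def base_scores(models, rows):
    """Score one stay with each per-source model (one probability per source)."""
    return [m.predict_proba(r) for m, r in zip(models, rows)]


def fit_early(sources, y):
    """Early integration: one model over the concatenated features of all sources."""
    model = TinyLogReg().fit([concat(rows) for rows in zip(*sources)], y)
    return lambda rows: model.predict_proba(concat(rows))


def fit_late(sources, y):
    """Late integration: one model per source, combined by a meta-learner."""
    base = [TinyLogReg().fit(X, y) for X in sources]
    # Simplification (assumption): the meta-learner sees in-sample base scores.
    meta_X = [base_scores(base, rows) for rows in zip(*sources)]
    meta = TinyLogReg().fit(meta_X, y)
    return lambda rows: meta.predict_proba(base_scores(base, rows))


# Toy data (hypothetical): six stays, two sources, one binary code label.
structured = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
text = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
y = [1, 1, 0, 0, 1, 0]

early = fit_early([structured, text], y)
late = fit_late([structured, text], y)

# Score one stay, given as (structured features, text features).
stay = ([1.0, 0.0], [1.0, 0.0])
print(f"early: {early(stay):.3f}  late: {late(stay):.3f}")
```

In this sketch both functions return a scorer that takes one stay as a tuple of per-source feature vectors, so the two strategies can be compared on the same input. Assigning many codes per stay would repeat this once per code, which is again an assumption rather than something the abstract specifies.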
Results: When compared with the best individual prediction source, late data integration leads to improvements in predictive power (eg, overall F-measure increased from 30.6% to 38.3% for International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnostic codes), while early data integration is less consistent. The predictive strength strongly differs between medical specialties, both for ICD-9-CM diagnostic and procedural codes.
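To make the reported metric concrete, the sketch below computes a micro-averaged F-measure over per-stay code sets. The abstract does not state how the overall F-measure is averaged, so micro-averaging is an assumption here, as are the function name `micro_f_measure` and the toy code sets. The final lines only restate the reported change from 30.6% to 38.3% as absolute and relative gains.

```python
def micro_f_measure(true_codes, predicted_codes):
    """Micro-averaged F-measure over a list of per-stay code sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_codes, predicted_codes):
        tp += len(truth & pred)   # codes correctly assigned
        fp += len(pred - truth)   # codes assigned but not in the reference
        fn += len(truth - pred)   # reference codes that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical stays; the code strings are only illustrative).
truth = [{"486", "428.0"}, {"250.00"}]
predicted = [{"486"}, {"250.00", "401.9"}]
score = micro_f_measure(truth, predicted)

# Reported diagnostic-code result: F-measure 30.6% -> 38.3%.
absolute_gain = 38.3 - 30.6              # percentage points
relative_gain = absolute_gain / 30.6     # fraction of the baseline

print(f"toy F-measure: {score:.3f}")
print(f"absolute gain: {absolute_gain:.1f} points, relative gain: {relative_gain:.1%}")
```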
Discussion: Structured data provides complementary information to unstructured data (and vice versa) for predicting ICD-9-CM codes. This can be captured most effectively by the proposed late data integration approach.
Conclusions: We demonstrated that models using multiple electronic health record data sources systematically outperform models using data sources in isolation in the task of predicting ICD-9-CM codes over a broad range of medical specialties.
Keywords: clinical coding; data integration; data mining; electronic health records; international classification of diseases.
© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
