Comparative Study
J Am Med Inform Assoc. 2014 Sep-Oct;21(5):801-7.
doi: 10.1136/amiajnl-2013-001915. Epub 2014 Jan 2.

Combining structured and unstructured data to identify a cohort of ICU patients who received dialysis

Swapna Abhyankar et al. J Am Med Inform Assoc. 2014 Sep-Oct.

Abstract

Objective: To develop a generalizable method that uses simple information retrieval (IR) tools to identify patient cohorts from electronic health record (EHR) data, in this case patients who had dialysis.

Methods: We used the coded data and clinical notes from the 24,506 adult patients in the Multiparameter Intelligent Monitoring in Intensive Care database to identify patients who had dialysis. We used SQL queries to search the procedure, diagnosis, and coded nursing observations tables based on ICD-9 and local codes. We used a domain-specific search engine to find clinical notes containing terms related to dialysis. We manually validated the available records for a 10% random sample of patients who potentially had dialysis and a random sample of 200 patients who were not identified as having dialysis based on any of the sources.
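The structured-data queries and note search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the table and column names (`procedures`, `diagnoses`, `nursing_obs`, `notes`) are hypothetical and do not reproduce the actual database schema, the code lists are the starred codes from Figure 4 rather than the full set the study searched, and a naive keyword match stands in for the domain-specific search engine.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE procedures  (subject_id INTEGER, icd9_code TEXT);
CREATE TABLE diagnoses   (subject_id INTEGER, icd9_code TEXT);
CREATE TABLE nursing_obs (subject_id INTEGER, item_id INTEGER);
CREATE TABLE notes       (subject_id INTEGER, text TEXT);
"""

# Example code lists taken from the starred codes in Figure 4;
# the study itself searched a larger set of ICD-9 and local codes.
PROCEDURE_CODES = ["39.95", "54.98"]
DIAGNOSIS_CODES = ["V45.1", "V56.0"]
NURSING_ITEMS = [146, 147, 150, 152]
NOTE_TERMS = ["dialysis", "CVVH"]  # hypothetical term list


def ids_with_codes(conn, table, column, codes):
    """Return the subject_ids that have any of `codes` in `table.column`."""
    placeholders = ",".join("?" for _ in codes)
    sql = f"SELECT DISTINCT subject_id FROM {table} WHERE {column} IN ({placeholders})"
    return {row[0] for row in conn.execute(sql, codes)}


def ids_with_note_terms(conn, terms):
    """Naive case-insensitive substring search, standing in for a search engine."""
    found = set()
    for subject_id, text in conn.execute("SELECT subject_id, text FROM notes"):
        lowered = text.lower()
        if any(term.lower() in lowered for term in terms):
            found.add(subject_id)
    return found


def find_candidates(conn):
    """Search each source separately, then take the union as the candidate cohort."""
    sources = {
        "procedures": ids_with_codes(conn, "procedures", "icd9_code", PROCEDURE_CODES),
        "diagnoses": ids_with_codes(conn, "diagnoses", "icd9_code", DIAGNOSIS_CODES),
        "nursing": ids_with_codes(conn, "nursing_obs", "item_id", NURSING_ITEMS),
        "notes": ids_with_note_terms(conn, NOTE_TERMS),
    }
    return sources, set().union(*sources.values())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO procedures VALUES (?, ?)", [(1, "39.95"), (2, "99.04")])
    conn.executemany("INSERT INTO diagnoses VALUES (?, ?)", [(3, "V45.1")])
    conn.executemany("INSERT INTO nursing_obs VALUES (?, ?)", [(1, 150)])
    conn.executemany("INSERT INTO notes VALUES (?, ?)",
                     [(4, "Started on hemodialysis today."), (5, "No acute events.")])
    sources, cohort = find_candidates(conn)
    print(sorted(cohort))  # [1, 3, 4]
```

A plain substring search also matches negated mentions such as "no dialysis planned", which is one reason flagged patients still need manual review.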

Results: We identified 1844 patients who potentially had dialysis: 1481 from the three coded sources and 1624 from the clinical notes. Based on the available data, precision for identifying dialysis patients was estimated at 78.4% (95% CI 71.9% to 84.2%) and recall at 100% (95% CI 86% to 100%).
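The reported counts determine how the two kinds of source overlap: by inclusion-exclusion, 1481 + 1624 - 1844 = 1261 patients were flagged by both the coded data and the notes, leaving 220 found only in coded sources and 363 only in notes. The sketch below checks that arithmetic and shows one common way, the Wilson score interval, to attach a 95% CI to a proportion such as precision. The abstract does not state which interval method the authors used, and the 80-of-100 validation counts below are hypothetical.

```python
import math

# Counts reported in the Results.
coded = 1481   # patients found in any of the three coded sources
notes = 1624   # patients found in the clinical notes
either = 1844  # patients found in at least one source

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
both = coded + notes - either
coded_only = coded - both
notes_only = notes - both


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


print(both, coded_only, notes_only)  # 1261 220 363

# Hypothetical validation counts, NOT the study's: 80 true positives among 100 reviewed.
low, high = wilson_interval(80, 100)
print(f"precision 0.80, 95% CI {low:.3f} to {high:.3f}")
```

When no true cases are missed (observed recall of 100%), the Wilson upper bound is 1, so only the lower bound carries information, as in the recall interval reported above.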

Conclusions: Combining structured EHR data with information from clinical notes using simple queries increases the utility of both types of data for cohort identification. Patients identified by more than one source are more likely to meet the inclusion criteria; however, including patients found in any of the sources increases recall. This method is attractive because it is available to researchers with access to EHR data and off-the-shelf IR tools.
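The trade-off described in the Conclusions, where agreement across sources raises confidence while a union of sources maximizes recall, can be expressed by counting how many sources flag each patient. This is a minimal sketch with hypothetical patient IDs; the two-source threshold is an illustrative choice, not one taken from the study.

```python
from collections import Counter


def source_counts(sources):
    """Map each patient ID to the number of sources that flagged it."""
    counts = Counter()
    for ids in sources.values():
        counts.update(ids)  # ids is a set, so each source adds at most 1 per patient
    return counts


def cohorts(sources, min_sources=2):
    """Return (broad, strict): any-source cohort and multi-source cohort."""
    counts = source_counts(sources)
    broad = set(counts)  # found in any source: favours recall
    strict = {pid for pid, k in counts.items() if k >= min_sources}  # favours precision
    return broad, strict


# Toy example with hypothetical patient IDs.
example = {
    "procedures": {1, 2},
    "diagnoses": {2, 3},
    "nursing": {2},
    "notes": {1, 2, 4},
}
broad, strict = cohorts(example)
print(sorted(broad), sorted(strict))  # [1, 2, 3, 4] [1, 2]
```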

Figures

Figure 1
Algorithms used for manual validation. (A) illustrates the manual process for determining whether patients identified as having dialysis by searching the structured data tables and/or unstructured clinical notes truly had dialysis (true positive) or not (false positive). (B) shows the process for determining which patients who potentially had dialysis were missed by our method. (C) represents the informal evaluation of patients with the non-starred code 996.1 to help determine the utility of the non-starred codes.
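Panels A and B start from the two review samples defined in the Methods: a 10% random sample of the flagged patients and a random sample of 200 unflagged patients. A minimal sketch of drawing them follows; the function name, the fixed seed, and the rounding of 10% to a whole number are assumptions for illustration.

```python
import random


def validation_samples(candidates, all_patients, fraction=0.10, n_negatives=200, seed=0):
    """Hypothetical helper: draw a random fraction of flagged patients (for panel A)
    and a fixed-size random sample of unflagged patients (for panel B)."""
    rng = random.Random(seed)  # fixed seed so the samples are reproducible
    flagged = sorted(candidates)
    unflagged = sorted(set(all_patients) - set(candidates))
    k = round(fraction * len(flagged))
    positives = rng.sample(flagged, k)
    negatives = rng.sample(unflagged, min(n_negatives, len(unflagged)))
    return positives, negatives
```

With 1844 flagged patients, this sketch's rounding gives a review sample of 184 flagged patients.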
Figure 2
Comparison of potential dialysis patients identified in the three structured data tables and the unstructured clinical notes (total N=1844). The underlined numbers represent patients found in both the clinical notes and one or more structured sources.
Figure 3
Comparison of potential dialysis patients across the three structured data tables: procedure codes, discharge diagnosis codes, and coded nursing observations. The underlined numbers represent patients found in more than one coded source. The total number of patients found in all three structured sources together is 1481.
Figure 4
Comparison of potential dialysis patients across the three structured sources of data using only the codes that we determined were the least ambiguous indicators of the actual dialysis procedure (starred subset of codes), which were procedure codes 39.95 Hemodialysis and 54.98 Peritoneal dialysis; discharge diagnosis codes V45.1 Postsurgical renal dialysis status and V56.0 Encounter for extracorporeal dialysis; and nursing observation codes 146 Dialysate flow ml/hr, 147 Dialysate infusing, 150 Dialysis machine, and 152 Dialysis type. The total number of patients having any of these codes was 1326. The underlined numbers represent patients found in more than one coded source.
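Because the starred codes are a subset of the codes searched for Figure 3, the difference between the Figure 3 and Figure 4 totals, 1481 - 1326 = 155, counts patients flagged by the coded sources only through non-starred codes. The sketch below separates those patients in the same way. The `(source, patient_id, code)` record format is hypothetical, and placing 996.1 among the diagnosis codes is an assumption.

```python
# Starred codes, copied from the Figure 4 caption.
STARRED = {
    "procedure": {"39.95", "54.98"},
    "diagnosis": {"V45.1", "V56.0"},
    "nursing": {"146", "147", "150", "152"},
}


def split_by_code_type(records):
    """records: (source, patient_id, code) tuples that already matched some
    dialysis-related code. Returns (all flagged, flagged by a starred code,
    flagged only by non-starred codes)."""
    flagged, starred = set(), set()
    for source, pid, code in records:
        flagged.add(pid)
        if code in STARRED.get(source, set()):
            starred.add(pid)
    return flagged, starred, flagged - starred


rows = [
    ("procedure", 1, "39.95"),
    ("diagnosis", 1, "996.1"),  # non-starred code discussed in Figure 1 (assumed diagnosis)
    ("diagnosis", 2, "996.1"),
    ("nursing", 3, "150"),
]
flagged, starred, nonstarred_only = split_by_code_type(rows)
print(sorted(nonstarred_only))  # [2]
```

Patients in the non-starred-only group are the ones an informal review such as panel C of Figure 1 would examine to judge whether the non-starred codes add true cases.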
