Improving case definition of Crohn's disease and ulcerative colitis in electronic medical records using natural language processing: a novel informatics approach
- PMID: 23567779
- PMCID: PMC3665760
- DOI: 10.1097/MIB.0b013e31828133fd
Abstract
Background: Previous studies identifying patients with inflammatory bowel disease using administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record-based model for classification of inflammatory bowel disease leveraging the combination of codified data and information from clinical text notes using natural language processing.
Methods: Using the electronic medical records of 2 large academic centers, we created data marts for Crohn's disease (CD) and ulcerative colitis (UC) comprising patients with ≥1 International Classification of Diseases, 9th edition, code for each disease. We used codified data (i.e., International Classification of Diseases, 9th edition codes and electronic prescriptions) and narrative data from clinical notes to develop our classification model. Model development and validation were performed in a training set of 600 randomly selected patients for each disease, with medical record review as the gold standard. Logistic regression with the adaptive LASSO penalty was used to select informative variables.
Results: We confirmed 399 CD cases (67%) in the CD training set and 378 UC cases (63%) in the UC training set. For both, a combined model including narrative and codified data had better accuracy (area under the curve for CD 0.95; UC 0.94) than models using only disease International Classification of Diseases, 9th edition codes (area under the curve 0.89 for CD; 0.86 for UC). Addition of natural language processing narrative terms to our final model resulted in classification of 6% to 12% more subjects with the same accuracy.
Conclusions: Inclusion of narrative concepts identified using natural language processing improves the accuracy of electronic medical records case definition for CD and UC while simultaneously identifying more subjects compared with models using codified data alone.
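The abstract compares models by area under the curve (AUC). As a minimal sketch of what that statistic measures, the snippet below computes AUC as the probability that a randomly chosen confirmed case receives a higher model score than a randomly chosen non-case. The function name `auc` and the toy scores and labels are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (case, non-case) pairs in which the case
    has the higher score; tied pairs count as half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical toy data: 1 = case confirmed on chart review, 0 = not a case.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical scores from a codes-only model and from a combined model.
codes_only = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
combined = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]

print(round(auc(codes_only, labels), 3))  # one case is outranked by one non-case
print(round(auc(combined, labels), 3))    # every case outranks every non-case
```

In this toy example the codes-only scores misorder one of the nine case/non-case pairs, so the AUC falls below 1, while the combined scores order every pair correctly.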
Grants and funding
- K08 AR060257/AR/NIAMS NIH HHS/United States
- U01 GM092691/GM/NIGMS NIH HHS/United States
- K23 DK097142/DK/NIDDK NIH HHS/United States
- R01 AR057108/AR/NIAMS NIH HHS/United States
- U54 LM008748/LM/NLM NIH HHS/United States
- R01 AR056768/AR/NIAMS NIH HHS/United States
- K08 NS079493/NS/NINDS NIH HHS/United States
- P30 DK043351/DK/NIDDK NIH HHS/United States
- R01 AR059648/AR/NIAMS NIH HHS/United States
- K24 AR052403/AR/NIAMS NIH HHS/United States