This is a preprint.
Development of a Claims-Based Computable Phenotype for Ulcerative Colitis Flares
- PMID: 39974105
- PMCID: PMC11838973
- DOI: 10.1101/2025.01.26.25321138
Abstract
Background and aims: Several conditions lack their own unique diagnosis code in widely used clinical terminologies, which makes them difficult to track and study. Acute severe ulcerative colitis (ASUC) is one such condition. ASUC has no specific billing or diagnosis code, and there is no automated method to identify patients admitted for ASUC from observational data. Accurate, automated, large-scale identification of hospital admissions for non-coded conditions like ASUC may enable further research into them.
Methods: We performed a retrospective cohort study of patients with a history of ulcerative colitis (UC) admitted to a single academic institution from 2014-2019. Clinicians at our institution reviewed the chart for each admission to determine whether it was due to a true episode of ASUC. Logistic regression, random forest (RF), and support vector machine (SVM) models were trained on administrative claims data for all admissions.
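The abstract does not say how claims were converted into model inputs. One common approach, sketched here purely as an assumption, is to represent each admission as numeric covariates plus a 0/1 indicator for each billed procedure category. The `Admission` class, the `to_feature_vector` function, and the procedure labels are all hypothetical; the labels are borrowed from the features the Results name as most important.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical procedure categories. The abstract names these among the model's
# most important features but does not describe how the study encoded them.
PROCEDURE_FEATURES = (
    "endoscopy_sigmoid_colon",
    "endoscopy_rectum",
    "abdominal_xray",
)


@dataclass
class Admission:
    """Hypothetical per-admission record assembled from administrative claims."""
    age: int             # patient age in years at admission
    length_of_stay: int  # days from admission to discharge
    procedures: set[str] = field(default_factory=set)  # procedure labels billed during the stay


def to_feature_vector(adm: Admission) -> list[float]:
    """Return [age, length_of_stay] followed by one 0/1 indicator per procedure category."""
    numeric = [float(adm.age), float(adm.length_of_stay)]
    indicators = [1.0 if name in adm.procedures else 0.0 for name in PROCEDURE_FEATURES]
    return numeric + indicators


if __name__ == "__main__":
    example = Admission(age=34, length_of_stay=6,
                        procedures={"endoscopy_sigmoid_colon", "abdominal_xray"})
    print(to_feature_vector(example))  # [34.0, 6.0, 1.0, 0.0, 1.0]
```

Fixed-length vectors like these, paired with the chart-review label for each admission, are the kind of input that logistic regression, RF, and SVM classifiers all accept.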
Results: In total, 268 ASUC admissions and 3,725 non-ASUC admissions among UC patients were included. Our RF model performed best, correctly classifying 95.5% of admissions as either ASUC or non-ASUC, with a validation AUROC of 0.96 (95% CI 0.94-0.98) and an AUPRC of 0.73. The model had a sensitivity of 81.5% and a specificity of 96.5%. The five most important features in the model were endoscopy of the sigmoid colon, length of stay, age, endoscopy of the rectum, and abdominal x-ray.
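As a consistency check on these figures, accuracy is the class-weighted average of sensitivity (on ASUC admissions) and specificity (on non-ASUC admissions). The sketch below assumes the evaluation set has the same ASUC/non-ASUC mix as the full cohort, since the abstract reports validation metrics without stating the validation set's composition; `implied_rates` is a hypothetical helper.

```python
# Counts and rates as reported in the abstract.
N_ASUC = 268
N_NON_ASUC = 3725
SENSITIVITY = 0.815
SPECIFICITY = 0.965


def implied_rates(n_pos: int, n_neg: int, sens: float, spec: float) -> dict:
    """Expected prevalence, accuracy, and PPV, assuming the evaluation set has this class mix."""
    total = n_pos + n_neg
    true_pos = n_pos * sens         # expected ASUC admissions flagged as ASUC
    false_pos = n_neg * (1 - spec)  # expected non-ASUC admissions flagged as ASUC
    true_neg = n_neg * spec         # expected non-ASUC admissions correctly rejected
    return {
        "prevalence": n_pos / total,  # also the expected AUPRC of a random classifier
        "accuracy": (true_pos + true_neg) / total,
        "ppv": true_pos / (true_pos + false_pos),
    }


rates = implied_rates(N_ASUC, N_NON_ASUC, SPECIFICITY and SENSITIVITY, SPECIFICITY)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
# prevalence: 0.067
# accuracy: 0.955
# ppv: 0.626
```

Under that assumption, the reported sensitivity and specificity reproduce the 95.5% accuracy. The low prevalence (about 6.7%) also explains why the AUPRC is much lower than the AUROC: a random classifier would have an AUPRC of roughly 0.07, so 0.73 is well above chance.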
Conclusions: Because ASUC lacks its own unique diagnosis code, there is currently no scalable way to identify it in claims databases for research or clinical purposes. We have developed a machine learning-based model that identifies admissions for clinically significant ASUC and reliably distinguishes them from non-ASUC admissions among UC patients. The ability to automatically curate large, accurate datasets of non-coded conditions such as ASUC episodes can support large-scale analyses of real-world data, enable future research, and improve our understanding of these diseases.
Keywords: diagnosis codes; machine learning; ulcerative colitis.
Conflict of interest statement
Conflicts of Interest and Financial Disclosures: None