[Preprint]. 2025 Jan 28:2025.01.26.25321138.
doi: 10.1101/2025.01.26.25321138.

Development of a Claims-Based Computable Phenotype for Ulcerative Colitis Flares

Daniel Copeland et al. medRxiv.

Abstract

Background and aims: Several conditions lack a unique diagnosis code in widely used clinical terminologies, which makes them difficult to track and study. Acute severe ulcerative colitis (ASUC) is one such condition: it has no specific billing or diagnosis code, and no automated method exists to identify patients admitted for ASUC from observational data. Accurate, automated, large-scale identification of hospital admissions for non-coded conditions like ASUC may enable further research into them.

Methods: We performed a retrospective cohort study of patients with a history of ulcerative colitis (UC) admitted to a single academic institution from 2014 to 2019. Clinicians at our institution reviewed the charts for these admissions to determine whether each was due to a true episode of ASUC. Logistic regression, random forest (RF), and support vector machine (SVM) models were trained on administrative claims data for all admissions.
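The model comparison described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` in place of claims features, and the 75/25 stratified split, hyperparameters, and feature count are all assumptions that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for claims features (NOT the study data): about 7%
# positives, roughly the class mix of 268 ASUC vs. 3,725 non-ASUC admissions.
X, y = make_classification(
    n_samples=4000, n_features=20, n_informative=8,
    weights=[0.93, 0.07], random_state=0,
)

# Hold out a stratified validation set (split ratio is an assumption).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three model families named in the Methods; settings are illustrative.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(random_state=0)),
}

validation_auroc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Use class probabilities when available, otherwise the decision margin;
    # AUROC only needs a ranking score.
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_val)[:, 1]
    else:
        scores = model.decision_function(X_val)
    validation_auroc[name] = roc_auc_score(y_val, scores)

for name, auc in validation_auroc.items():
    print(f"{name}: validation AUROC = {auc:.3f}")
```

Scaling is applied only to the logistic regression and SVM pipelines because tree ensembles are insensitive to monotone feature rescaling.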

Results: 268 ASUC admissions and 3,725 non-ASUC admissions among UC patients were included. Our RF model exhibited the best performance, correctly classifying 95.5% of admissions as either ASUC or non-ASUC, with a validation AUROC of 0.96 (95% CI 0.94-0.98) and an AUPRC of 0.73. The model had a sensitivity of 81.5% and a specificity of 96.5%. The five most important features in the model were endoscopy of sigmoid colon, length of stay, age, endoscopy of rectum, and abdominal x-ray.
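The reported figures can be cross-checked with simple arithmetic. The sketch below assumes that the reported sensitivity and specificity apply to a population with the same class mix as the full cohort (the abstract does not state the composition of the validation set). The positive and negative predictive values it derives are not reported in the abstract; they are implied values under that assumption only.

```python
# Cohort counts and operating point as reported in the Results.
n_asuc, n_non_asuc = 268, 3725
sensitivity, specificity = 0.815, 0.965

# Expected confusion-matrix cells (fractional, since these are expectations).
true_pos = sensitivity * n_asuc
false_neg = n_asuc - true_pos
true_neg = specificity * n_non_asuc
false_pos = n_non_asuc - true_neg

total = n_asuc + n_non_asuc
accuracy = (true_pos + true_neg) / total      # overall fraction correct
prevalence = n_asuc / total                   # share of admissions that are ASUC
ppv = true_pos / (true_pos + false_pos)       # implied positive predictive value
npv = true_neg / (true_neg + false_neg)       # implied negative predictive value

print(f"accuracy   = {accuracy:.1%}")    # 95.5%
print(f"prevalence = {prevalence:.1%}")  # 6.7%
print(f"PPV        = {ppv:.1%}")         # 62.6%
print(f"NPV        = {npv:.1%}")         # 98.6%
```

The implied accuracy matches the reported 95.5%, so the three headline numbers are mutually consistent under this assumption. Because ASUC accounts for only about 6.7% of admissions, a 3.5% false-positive rate still yields a meaningful number of false alarms relative to true positives.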

Conclusions: ASUC does not have its own unique diagnosis code, and there is currently no scalable way to identify it from claims databases for research or clinical purposes. We have developed a machine learning-based model that identifies clinically significant ASUC admissions and reliably distinguishes them from admissions for non-ASUC reasons among UC patients. The ability to automatically curate large, accurate datasets of non-coded conditions like ASUC can serve as the basis for large-scale analyses that maximize our ability to learn from real-world data, enable future research, and improve understanding of these diseases.

Keywords: diagnosis codes; machine learning; ulcerative colitis.

Conflict of interest statement

Conflicts of Interest and Financial Disclosures: None

Figures

Figure 1. Data flow diagram.
Figure 2. (a) Area under the receiver operating characteristic curve (AUROC) of our random forest (RF) classifier model. (b) Area under the precision-recall curve (AUPRC) of our RF classifier model.
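For readers less familiar with the two summary metrics in Figure 2, the following is a minimal pure-Python sketch of how each can be computed from labels and model scores. The labels and scores are invented toy values, and average precision is used here as one common estimator of AUPRC; the abstract does not say which estimator the authors used.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive.

    Assumes no tied scores; tied scores would need a grouping rule.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# Toy example (hypothetical values): 3 positives among 8 admissions.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]

toy_auroc = auroc(labels, scores)
toy_ap = average_precision(labels, scores)
print(f"AUROC = {toy_auroc:.3f}, average precision = {toy_ap:.3f}")
```

A classifier that ranks at random has an expected AUROC of 0.5, whereas its expected AUPRC is approximately the prevalence of the positive class, so the two metrics should be read against different baselines when classes are imbalanced.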
Figure 3. Multidimensional scaling (MDS) plot of all individual ASUC and non-ASUC admissions to visually represent similarity among ASUC admissions. Blue points represent non-ASUC admissions and red numbers represent ASUC admissions. The low pairwise distances between ASUC admissions suggest that claims data are sufficient to detect a high level of similarity amongst ASUC admissions and separate them from non-ASUC admissions.
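As a rough illustration of how an MDS plot like Figure 3 is produced, the sketch below implements classical (Torgerson) MDS with NumPy. The abstract does not state which MDS variant or which dissimilarity measure was used, so both the algorithm and the Euclidean distances on random toy features are assumptions made for illustration.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed points in n_components dimensions from a pairwise distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to recover a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy data (hypothetical): a tight cluster standing in for ASUC admissions
# and a diffuse cloud standing in for non-ASUC admissions.
rng = np.random.default_rng(0)
tight_cluster = rng.normal(loc=0.0, scale=0.3, size=(5, 4))
diffuse_cloud = rng.normal(loc=0.0, scale=2.0, size=(15, 4))
features = np.vstack([tight_cluster, diffuse_cloud])

# Pairwise Euclidean distances between all admissions.
distances = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
coords = classical_mds(distances)  # one 2-D point per admission, ready to plot
print(coords.shape)
```

Points that are close in the original feature space stay close in the 2-D embedding, which is what allows a tight cluster of similar admissions to be seen by eye.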
Figure 4. Relative importance of each feature included in our random forest (RF) model, measured as the mean decrease in Gini error upon removal of that feature from the model.
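To make the Gini-based measure in Figure 4 more concrete, the sketch below computes the decrease in Gini impurity produced by a single binary split. In a random forest, Gini-based importance is typically obtained by aggregating such decreases over every split that uses a feature, across all trees; the exact procedure used in the paper is not given in the abstract. The labels and the two candidate splits are hypothetical.

```python
def gini(labels):
    """Gini impurity of a set of binary labels: 2 * p * (1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def gini_decrease(labels, goes_left):
    """Parent impurity minus the size-weighted impurity of the two children."""
    left = [y for y, flag in zip(labels, goes_left) if flag]
    right = [y for y, flag in zip(labels, goes_left) if not flag]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Toy node (hypothetical): 3 ASUC (1) and 5 non-ASUC (0) admissions.
labels = [1, 1, 1, 0, 0, 0, 0, 0]

# An informative split, e.g. "had a sigmoid endoscopy claim" (illustrative only).
informative_split = [True, True, True, True, False, False, False, False]
# A split that is nearly unrelated to the label.
weak_split = [True, False, True, False, True, False, True, False]

informative_gain = gini_decrease(labels, informative_split)
weak_gain = gini_decrease(labels, weak_split)
print(informative_gain, weak_gain)  # 0.28125 0.03125
```

Features whose splits separate the classes well accumulate large impurity decreases and therefore rank high in a plot like Figure 4.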
