Using machine learning algorithms to optimize treatment with high-cost biologics in a national cohort of patients with inflammatory bowel disease

Jason K Hou^{1,2,3}, Tiffany M Tang^{4,5}, Shubhada Sansgiry^{2,3}, Tony Van⁶, Peter A Richardson^{2,3}, Codey Pham¹, Francesca Cunningham^{7,8}, Jessica A Baker⁶, Ji Zhu⁴, Akbar K Waljee^{6,9,10}

Affiliations

¹ Section of Gastroenterology and Hepatology, Baylor College of Medicine, Houston, TX 77030, United States.
² Center for Innovations in Quality, Effectiveness and Safety (IQuESt), Michael E. DeBakey Veterans Affairs Medical Center, Houston, TX 77030, United States.
³ Section of Health Services Research, Baylor College of Medicine, Houston, TX 77030, United States.
⁴ Department of Statistics, University of Michigan, Ann Arbor, MI, 48109, United States.
⁵ Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, United States.
⁶ Center for Clinical Management Research, VA Ann Arbor Healthcare System, Ann Arbor, MI 48109, United States.
⁷ Veterans Affairs Center for Medication Safety, Hines, IL, 60141, United States.
⁸ United States Department of Veterans Affairs VA Pharmacy Benefits Management Services, Washington, DC, 20422, United States.
⁹ Center for Global Health and Equity, University of Michigan, Ann Arbor, MI, 48109, United States.
¹⁰ Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, 48109, United States.

PMID: 41356407
PMCID: PMC12681052
DOI: 10.1093/jamiaopen/ooaf162

Jason K Hou et al. JAMIA Open. 2025 Dec 3;8(6):ooaf162. doi: 10.1093/jamiaopen/ooaf162. eCollection 2025 Dec.

Abstract

Objectives: Prediction models using statistical or machine learning (ML) approaches can enhance clinical decision support tools. Infliximab (IFX), a biologic with a newly introduced biosimilar for Crohn's disease (CD) and ulcerative colitis (UC), presents an opportunity to evaluate these tools for predicting disease flares at the time of biosimilar switch. This study sought to evaluate the real-world safety and effectiveness of a nonmedical IFX biosimilar switch in a national US cohort of CD and UC patients, and to develop and compare interpretable models for predicting adverse clinical events among patients on maintenance IFX.

Materials and methods: This retrospective cohort study used administrative and clinical data from the National Veterans Health Administration Corporate Data Warehouse. It included 2529 Veterans with CD or UC on maintenance IFX (2017-2020), either continuing originator IFX or switching to a biosimilar. The primary outcome was disease-related flare. Classification and survival models were developed using traditional and ML methods and assessed via receiver operating characteristic curve, precision-recall curve, and decision curve analysis.
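As a rough illustration of the first of these metrics, the AUROC can be read as the probability that a randomly chosen patient who flared receives a higher predicted risk than a randomly chosen patient who did not. The sketch below computes it by pairwise comparison in plain Python. The function name `auroc` and the toy labels and scores are illustrative assumptions, not the study's data or code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # flare case ranked above non-flare case
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example with made-up values: 1 = flare, 0 = no flare
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
toy_auroc = auroc(labels, scores)
print(round(toy_auroc, 3))
```

The pairwise form is quadratic in the number of patients, so it is suited to explanation rather than to a cohort of thousands, where a rank-based implementation would normally be used.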

Results: In 2529 Veterans with CD or UC, biosimilar switch had low predictive importance across survival models. Models that incorporated objective laboratory-related information yielded the highest validation AUROC. Random forest+ (RF+) outperformed all other statistical and ML models. Prior flares and total health-care encounters were the 2 most important predictors, while hemoglobin was the top laboratory predictor.

Conclusions: Prediction models, particularly RF+, may aid in optimizing biologic therapy for CD and UC by identifying patients at higher risk of flare following a biosimilar switch.

Keywords: artificial intelligence; biologic products; biosimilar pharmaceuticals; inflammatory bowel diseases; machine learning.

Published by Oxford University Press on behalf of the American Medical Informatics Association 2025.

Conflict of interest statement

J.K.H. has received research funding from Redhill Biosciences, Janssen, AbbVie, Celgene, Genentech, Eli Lilly, Lycera, and Pfizer Inc. Other authors report no competing interests.

Figures

**Figure 1.**
Validation AUROC prediction performance for classification models. (A) We summarize the validation AUROC prediction performance for predicting IBD-related flares using various prediction models (column) with different sets of predictor variables (color). Results are shown for different forecasting (x-axis) and accrual periods (row). Higher values indicate more accurate prediction performance. Lines represent the mean validation AUROC, averaged across 50 random training–validation–test splits. Standard errors are shown as shaded regions around the mean estimates. Across all forecasting and accrual periods, incorporating laboratory-related information through the mean, max, and most recent lab values (orange) generally yields the highest validation AUROC. (B) Moreover, we show the distribution of pairwise differences between the validation AUROC from RF+ including the mean, max, and most recent lab data and that from competitor methods (x-axis) across different forecasting (columns) and accrual periods (rows). Across the various forecasting and accrual periods, RF+ yields significantly higher validation prediction accuracy than the other competitor methods. Asterisks represent the strength of statistical significance from a permutation test that formally assesses whether RF+ yields higher prediction accuracy than the competitor method (ie, whether the differences are significantly greater than 0). Specifically, ***P < .001, **P < .01, and *P < .05. Results are shown using 50 random training–validation–test splits. Abbreviations: AUROC, area under the receiver operating characteristic curve; IBD, inflammatory bowel disease; RF, random forest.
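The caption does not spell out the exact permutation scheme, so the following is only a sketch of one common choice for paired differences: a one-sided sign-flip permutation test of whether the mean difference exceeds 0. The function name `sign_flip_pvalue`, the number of permutations, and the toy differences are assumptions made for illustration and are not taken from the study.

```python
import random


def sign_flip_pvalue(diffs, n_perm=10000, seed=0):
    """One-sided p-value for H1: mean(diffs) > 0, using random sign flips.

    Under the null hypothesis the paired differences are symmetric around 0,
    so flipping the sign of each difference at random gives a reference
    distribution for the mean.
    """
    rng = random.Random(seed)
    observed = sum(diffs) / len(diffs)
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if sum(flipped) / len(flipped) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Made-up AUROC differences (one model minus a competitor) over 10 splits
toy_diffs = [0.02, 0.03, 0.01, 0.04, 0.02, 0.03, 0.05, 0.01, 0.02, 0.03]
p_value = sign_flip_pvalue(toy_diffs)
print(p_value < 0.05)
```

Because every toy difference is positive, only a sign pattern that leaves all of them unchanged can match the observed mean, which is why the estimated p-value is small here.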

**Figure 2.**
RF+ calibration and decision curve analysis. (A) The number of observed flares closely matches the number of predicted flares from our best classification model, RF+ including the mean, max, and most recent lab data, across different forecasting (color) and accrual (shape) periods. The dashed line represents perfect calibration. (B) Using a decision curve analysis, we evaluated the net benefit of the RF+ model compared to that from the logistic regression model, treating all patients, or treating no patients. The net benefit (y-axis) is shown across different threshold probabilities (x-axis) for different forecasting (columns) and accrual periods (rows). The dotted line represents the threshold probability corresponding to the optimal sensitivity and specificity for the RF+ model. RF+ yields higher net benefit than the other strategies across a range of reasonable threshold probabilities. Abbreviation: RF, random forest.
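For readers unfamiliar with decision curve analysis, the standard net benefit at a threshold probability p_t is TP/n − (FP/n) · p_t/(1 − p_t), where treating no patients has a net benefit of 0. The sketch below evaluates this formula for a model and for the treat-all strategy. The function names and the toy labels and risks are illustrative assumptions, not the study's implementation or data.

```python
def net_benefit(labels, risks, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit when every patient is treated regardless of risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example with made-up values: 1 = flare, 0 = no flare
labels = [1, 0, 1, 0, 0, 0, 1, 0]
risks = [0.8, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.5]
nb_model = net_benefit(labels, risks, threshold=0.35)
nb_all = net_benefit_treat_all(labels, threshold=0.35)
nb_none = 0.0  # treating nobody yields no benefit and no harm
print(nb_model > nb_all > nb_none)
```

Repeating the calculation over a grid of thresholds and plotting net benefit against threshold gives a decision curve of the kind described in panel B.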

**Figure 3.**
Most important features from RF+ for predicting IBD-related flares. (A) For each forecasting (columns) and accrual (rows) period, we measured the mean decrease in impurity+ (MDI+) importance (black) of each feature in the RF+ including the mean, max, and most recent lab data. Features are ordered by their mean total MDI+ importance with higher values indicating greater importance. Additionally, for each feature, we quantified the importance of their linear (red) and nonlinear (blue) contributions in the RF+ model. Each point represents the mean importance of a feature across 50 random training–validation–test splits, and error bars represent the standard errors. The top 15 most important features are shown for each choice of forecasting and accrual period. (B) Partial dependence plots are shown for the 6 most important features from the RF+ model including the mean, max, and most recent labs for the 1-year accrual, 3-month forecasting model. For each of these important features, the partial dependence plot shows the average predicted probability (or risk) of flaring across different values of that feature while keeping all other features fixed. This reveals whether the predicted risk of flaring increases or decreases as values of the feature (eg, # of prior flares) increase. The partial dependence functions are summarized across 50 random training–validation–test splits with the black solid line representing the mean and the shaded region representing the inner 95% quantiles of this distribution. Abbreviations: IBD, inflammatory bowel disease; RF, random forest.
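The partial dependence calculation described in panel B can be sketched in a few lines: force one feature to each value on a grid for every patient, leave the other features as observed, and average the predictions. The `toy_risk` function below is a hypothetical stand-in for a fitted model (it is not RF+), and the feature names, coefficients, and patient rows are made up for illustration.

```python
import math


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite only the chosen feature; other features stay as observed.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_risk(row):
    """Hypothetical logistic risk model, used only to make the sketch run."""
    z = -2.0 + 0.8 * row["prior_flares"] - 0.1 * (row["hemoglobin"] - 13.0)
    return 1.0 / (1.0 + math.exp(-z))


# Made-up patient rows
rows = [
    {"prior_flares": 0, "hemoglobin": 14.0},
    {"prior_flares": 2, "hemoglobin": 11.5},
    {"prior_flares": 1, "hemoglobin": 13.0},
]
pd_curve = partial_dependence(toy_risk, rows, "prior_flares", [0, 1, 2, 3])
print([round(v, 3) for v in pd_curve])
```

In this toy model the curve rises with the number of prior flares because the assumed coefficient is positive. With a fitted model, the shape of the curve is what indicates whether predicted risk increases or decreases with the feature.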
