Int J Mol Sci. 2025 Jun 15;26(12):5741.
doi: 10.3390/ijms26125741.

Prediction of Extraintestinal Manifestations in Inflammatory Bowel Disease Using Clinical and Genetic Variables with Machine Learning in a Latin IBD Group

Tamara Pérez-Jeldres et al. Int J Mol Sci.

Abstract

Extraintestinal manifestations (EIMs) significantly increase morbidity in inflammatory bowel disease (IBD) patients. In this study, we examined clinical and genetic factors associated with EIMs in 414 Latin IBD patients, utilizing machine learning for predictive modeling. In our IBD group (314 ulcerative colitis (UC) and 100 Crohn's disease (CD) patients), EIM presence was assessed. Clinical differences between patients with and without EIMs were analyzed using Chi-square and Mann-Whitney U tests. Based on the genetic data of 232 patients, we identified variants linked to EIMs, and the polygenic risk score (PRS) was calculated. A machine learning approach based on logistic regression (LR), random forest (RF), and gradient boosting (GB) models was employed for predicting EIMs. EIMs were present in 29% (120/414) of patients. EIM patients were older (52 vs. 45 years, p = 0.01) and were more likely to have a family history of IBD (p = 0.02) or use anti-TNF therapy (p = 0.01). EIMs were more common in patients with CD than in those with UC without reaching statistical significance (p = 0.06). Four genetic variants were associated with EIM risk (rs9936833, rs4410871, rs3132680, and rs3823417). While the PRS showed limited predictive power (AUC = 0.69), the LR, GB, and RF models demonstrated good predictive capabilities. Approximately one-third of IBD patients experienced EIMs. Significant risk factors included genetic variants, family history, age, and anti-TNF therapy, with predictive models effectively identifying EIM risk.

Keywords: extraintestinal manifestation; genetic variants; inflammatory bowel disease.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Frequency of extraintestinal manifestations in the Chilean IBD group.
Figure 2
The polygenic risk score for extraintestinal manifestations based on the data reported in Khrom’s study. (a) The density plot illustrates the overlap in predicting extraintestinal manifestations (EIMs). (b) The box plot illustrates the Z-scores for each group (0 = EIM− and 1 = EIM+), showing no significant difference between their means. The Welch two-sample t-test results indicate a test statistic of t = 1.3925, degrees of freedom = 160.8, and p-value = 0.1657, confirming the absence of statistical significance. (c) The AUROC analysis is presented. EIM: extraintestinal manifestation.
Figure 3
The polygenic risk score for extraintestinal manifestations based on the data reported in Liu’s study. (a) The density plot illustrates the overlap in predicting extraintestinal manifestations (EIMs). (b) The box plot illustrates the Z-scores for each group (0 = EIM− and 1 = EIM+), showing no significant difference between their means. The Welch two-sample t-test results indicate a test statistic of t = 0.79421, degrees of freedom = 153.47, and p-value = 0.4283, confirming the absence of statistical significance. (c) The AUROC (area under the receiver operating characteristic curve) analysis is presented.
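As a rough consistency check on the Welch t-test results quoted in the Figure 2 and Figure 3 captions, the sketch below shows how a Welch statistic and its Welch-Satterthwaite degrees of freedom are computed from two samples, and approximates the two-sided p-value from a reported t statistic. It assumes a standard normal approximation to the t distribution, which is reasonable only because the reported degrees of freedom are large (about 160 and 153). The function names are illustrative and not taken from the study.

```python
from statistics import NormalDist, mean, variance


def welch_t_and_df(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def approx_two_sided_p(t):
    """Two-sided p-value under a normal approximation (assumes large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# t statistics as reported in the Figure 2 and Figure 3 captions.
p_khrom = approx_two_sided_p(1.3925)   # caption reports p = 0.1657
p_liu = approx_two_sided_p(0.79421)    # caption reports p = 0.4283
print(round(p_khrom, 3), round(p_liu, 3))
```

Because the t distribution has heavier tails than the normal distribution, the approximation should come out slightly below the reported p-values; in both cases it stays well above 0.05, in line with the captions.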
Figure 4
ROC curve of the logistic regression model.
Figure 5
Performance metrics and feature importance in random forest classification for predicting extraintestinal manifestations. (a) The confusion matrix of the training data shows perfect performance: 119 true negatives, 119 true positives, and no errors, with perfect metrics (1.00) for both classes. (b) The confusion matrix of the test data shows 18 true negatives, 23 true positives, 12 false positives, and 7 false negatives. The classification report provides the following metrics: Class 0 (EIM−): precision, 0.72; recall, 0.60; F1-score, 0.65. Class 1 (EIM+): precision, 0.66; recall, 0.77; F1-score, 0.71. The overall accuracy is 0.68. Precision = true positives/(true positives + false positives), recall = true positives/(true positives + false negatives), F1-score = 2 × (precision × recall)/(precision + recall), and overall accuracy = (true positives + true negatives)/total predictions. (c) The AUROC plot illustrates the model’s sensitivity and specificity for both the training (AUC = 1.00) and test datasets (AUC = 0.79). (d) The bar chart displays the top features influencing the random forest model’s decisions. chr16:86369512 = rs9936833, chr8:127802783 = rs4410871, chr6:31133092 = rs3823417, and chr6:30105418 = rs3132680. EIM: extraintestinal manifestation; ROC: receiver operating characteristic; AUC: area under curve.
Figure 6
Performance metrics and feature importance in the gradient-boosting model for predicting extraintestinal manifestations. (a) The confusion matrix of the training data shows perfect performance: 119 true negatives, 119 true positives, and no errors, with perfect metrics (1.00) for both classes. (b) The confusion matrix of the test data shows 18 true negatives, 23 true positives, 12 false positives, and 7 false negatives. The classification report provides the following metrics: Class 0 (EIM−): precision, 0.72; recall, 0.60; F1-score, 0.65. Class 1 (EIM+): precision, 0.66; recall, 0.77; F1-score, 0.71. The overall accuracy is 0.68. Precision = true positives/(true positives + false positives), recall = true positives/(true positives + false negatives), F1-score = 2 × (precision × recall)/(precision + recall), and overall accuracy = (true positives + true negatives)/total predictions. (c) The AUROC plot illustrates the model’s sensitivity and specificity for both the training (AUC = 1.00) and test datasets (AUC = 0.79). (d) Feature importance variables for EIM prediction. chr16:86369512 = rs9936833, chr8:127802783 = rs4410871, chr6:31133092 = rs3823417, and chr6:30105418 = rs3132680. EIM: extraintestinal manifestation; ROC: receiver operating characteristic; AUC: area under curve.
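The test-set metrics quoted in the Figure 5 and Figure 6 captions can be reproduced from the confusion-matrix counts alone. The following is a minimal sketch using the formulas given in the captions; the helper name `class_metrics` is illustrative, and treating EIM− as the "positive" class when computing Class 0 metrics is an assumption about how the classification report was generated (it is the usual per-class convention).

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1-score for one class, using the caption's formulas."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Test-set counts from the captions, with EIM+ (Class 1) as the positive class.
TN, TP, FP, FN = 18, 23, 12, 7

# Class 1 (EIM+): use the counts as given.
eim_pos = class_metrics(tp=TP, fp=FP, fn=FN)

# Class 0 (EIM-): swap roles, so true negatives act as this class's true
# positives, false negatives as its false positives, and vice versa.
eim_neg = class_metrics(tp=TN, fp=FN, fn=FP)

accuracy = (TP + TN) / (TP + TN + FP + FN)

print("Class 0:", [round(m, 2) for m in eim_neg])
print("Class 1:", [round(m, 2) for m in eim_pos])
print("Accuracy:", round(accuracy, 2))
```

Rounded to two decimals, these values agree with the figures in the captions (Class 0: 0.72, 0.60, 0.65; Class 1: 0.66, 0.77, 0.71; accuracy 0.68).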
