J Crohns Colitis. 2025 Jan 11;19(1):jjae197.
doi: 10.1093/ecco-jcc/jjae197.

Multi-omics data integration identifies novel biomarkers and patient subgroups in inflammatory bowel disease

António José Preto et al. J Crohns Colitis.

Abstract

Background: Inflammatory bowel disease (IBD), comprising Crohn's disease (CD) and ulcerative colitis (UC), is a complex condition with diverse manifestations; recent advances in multi-omics technologies are helping researchers unravel its molecular characteristics to develop targeted treatments.

Objectives: In this work, we explored one of the largest multi-omics cohorts in IBD, the Study of a Prospective Adult Research Cohort (SPARC IBD), with the goal of identifying predictive biomarkers for CD and UC and elucidating patient subtypes.

Design: We analyzed genomics, transcriptomics (gut biopsy samples), and proteomics (blood plasma) from hundreds of patients from SPARC IBD. We trained a machine learning model that classifies UC versus CD samples. In parallel, we integrated multi-omics data to unveil patient subgroups in each of the 2 indications independently and analyzed the molecular phenotypes of these patient subpopulations.

Results: The high performance of the model showed that multi-omics signatures are able to discriminate between the 2 indications. The most predictive features of the model, both known and novel omics signatures for IBD, can potentially be used as diagnostic biomarkers. Patient subgroup analysis in each indication uncovered omics features associated with disease severity in UC patients and with tissue inflammation in CD patients. This culminates with the observation of 2 CD subpopulations characterized by distinct inflammation profiles.

Conclusions: Our work unveiled potential biomarkers to discriminate between CD and UC and to stratify each population into well-defined subgroups, offering promising avenues for the application of precision medicine strategies.

Keywords: Crohn’s disease; inflammatory bowel disease; machine learning; multi-omics; precision medicine; ulcerative colitis.

Conflict of interest statement

All authors were employees of Enveda Inc. during the course of this work and have real or potential ownership interest in the company.

Figures

Graphical Abstract
Figure 1.
Outline of the methodology. The starting point of our work is 3 types of omics data (proteomics, genomics, and transcriptomics) generated from samples from inflammatory bowel disease (IBD) patients (ie, Crohn’s disease [CD] and ulcerative colitis [UC]). We processed these 3 omics modalities to generate a multi-omics dataset. In one avenue, we combined the omics data and trained an ML classifier that can accurately differentiate between samples derived from UC and CD patients. In parallel, we explored the characteristics of different subpopulations in each indication (CD and UC) using Multi-Omics Factor Analysis (MOFA). This figure was created by using BioRender.com.
Figure 2.
First row: PCA of the first 2 components of the transcriptomics samples, colored by batch (A), tissue (B), and diagnosis (C). Second row: PCA of the first 2 components of the transcriptomics samples after correcting for batch effect using batch variance and pyComBat, colored by batch (D), tissue (E), and diagnosis (F).
Figure 3.
(A) Confusion matrix of the predictions on the test set of 1 of the 5 cross-validations (CVs) of the model. (B) Bar plots with different performance metrics and the standard deviation of the classifier over the 5 CVs evaluated on the test set. Performance is close to 0.8 across all metrics evaluated. (C) Top 10 most predictive features in the three omics.
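The per-fold summary described in panel B can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the confusion-matrix counts are hypothetical, the choice of metrics is an assumption (the caption does not list them), and `binary_metrics` is a helper name introduced here.

```python
from statistics import mean, stdev

def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical (tp, fp, fn, tn) counts for 5 cross-validation folds.
# These are NOT the paper's results; they only illustrate the computation.
folds = [
    (40, 10, 10, 40),
    (42, 8, 12, 38),
    (38, 12, 8, 42),
    (41, 9, 11, 39),
    (39, 11, 9, 41),
]

per_fold = [binary_metrics(*counts) for counts in folds]

# For each metric, report the mean and sample standard deviation over folds,
# which is what the bar heights and error bars in such a plot would encode.
summary = {
    name: (mean(m[name] for m in per_fold), stdev(m[name] for m in per_fold))
    for name in per_fold[0]
}

for name, (avg, sd) in summary.items():
    print(f"{name}: {avg:.3f} +/- {sd:.3f}")
```

Reporting the standard deviation alongside the mean shows how stable each metric is across the 5 folds rather than relying on a single split.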
Figure 4.
(A) Distribution of the absolute factor 1 values stratified by disease severity based on endoscopy. The correlation between factor 1 and disease severity is 0.4 considering mild = 1, moderate = 2, severe = 3 to make disease severity a numeric variable. (B) Distribution of the expression values for IL17A (proteomics). (C) Top 10 enriched pathways using the significant proteomics features (orange). (D) Distribution of the expression values for TGFA (proteomics). (E) Top 10 enriched pathways using the significant transcriptomics features (green). (F) Distribution of the expression values for DLD (transcriptomics).
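The numeric encoding of disease severity used for the correlation in panel A can be illustrated with a short sketch. The factor values and severity labels below are hypothetical, and the use of Pearson correlation is an assumption, since the caption does not state which correlation coefficient was computed.

```python
from math import sqrt

# Encoding stated in the caption: mild = 1, moderate = 2, severe = 3.
SEVERITY_CODES = {"mild": 1, "moderate": 2, "severe": 3}

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical factor 1 values and endoscopic severity labels (not study data).
factor1 = [-0.2, 0.5, -0.9, 1.4, 0.3, -1.8]
severity = ["mild", "mild", "moderate", "moderate", "mild", "severe"]

# The caption refers to absolute factor values, so the sign is dropped first.
abs_factor1 = [abs(v) for v in factor1]
severity_numeric = [SEVERITY_CODES[s] for s in severity]

r = pearson(abs_factor1, severity_numeric)
print(f"correlation = {r:.2f}")
```

Because severity is ordinal, a rank-based coefficient such as Spearman's would be a reasonable alternative under the same encoding.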
Figure 5.
(A) Hierarchical clustering applied to the factors from Multi-Omics Factor Analysis (MOFA) for multi-omics Crohn’s disease (CD) samples. Transcriptomics and proteomics data are derived from colon and plasma, respectively. Clustering on all factors reveals 3 main clusters. (B) Hierarchical clustering applied exclusively to factor 3 and the reported macroscopic appearance for each sample (inflamed/normal). Samples with extreme values of factor 3 (dark-red and dark-blue) are likely to be inflamed, while values of factor 3 closer to zero (light-red, light-blue, and white) are less likely to be inflamed. (C) Distribution of the factor 3 values for the 2 identified subpopulations (A: clusters 1 and 3, and B: cluster 2) stratified by macroscopic appearance (inflamed/normal). (D–F) Distribution of the expression values for NOS2 (proteomics), IL12B (proteomics), and TSBP1-AS1 (transcriptomics) across the 2 subpopulations (A and B), stratified by macroscopic appearance. The 3 features are examples of an inflammation marker, a cluster A-specific marker, and a cluster B-specific marker, respectively. (G) Proportion of HLA genes among the top 150 transcriptomic features in factor 3. (H) Most common genes of the top 150 genomics features (SNPs). TSBP1 SNPs correspond to approximately 30% of the top genomics features. (I) Top 5 enriched pathways using the significant proteomics features for the inflamed cluster A (orange) and significant transcriptomics features for the inflamed cluster B (green). Statistical significance is measured with a 2-sided Mann–Whitney–Wilcoxon test (Legend: **** = P-value < .0001; *** = P-value < .001; ** = P-value < .01; * = P-value < .05; ns = nonsignificant).
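The significance legend and the test statistic behind it can be sketched as below. This is an illustrative sketch only: `mann_whitney_u` and `significance_stars` are names introduced here, the sample values are hypothetical, and the mapping assumes that a single star denotes P < .05. Only the U statistic is computed; obtaining the 2-sided P-value would require a statistics library or the exact null distribution, which is omitted.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x versus y: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def significance_stars(p_value):
    """Map a P-value to the star notation used in the figure legend."""
    if p_value < 0.0001:
        return "****"
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:  # assumption: one star corresponds to P < .05
        return "*"
    return "ns"

# Hypothetical expression values for inflamed vs normal samples (not study data).
inflamed = [2.1, 2.8, 3.0, 3.4]
normal = [1.0, 1.2, 1.9, 2.0]

u_inflamed = mann_whitney_u(inflamed, normal)
u_normal = mann_whitney_u(normal, inflamed)

# The two U statistics always sum to the number of sample pairs.
print(u_inflamed, u_normal, len(inflamed) * len(normal))
print(significance_stars(0.003))
```

The test is rank-based, so it compares the two groups without assuming normally distributed expression values.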
