Diagnostics (Basel). 2024 Jun 4;14(11):1182.
doi: 10.3390/diagnostics14111182.

Advances in Inflammatory Bowel Disease Diagnostics: Machine Learning and Genomic Profiling Reveal Key Biomarkers for Early Detection

Asif Hassan Syed et al. Diagnostics (Basel). 2024.

Abstract

Using high-throughput technologies and Machine Learning (ML), this study identified gene biomarkers and molecular signatures of Inflammatory Bowel Disease (IBD). By comparing gene expression levels in colonic specimens from 172 IBD patients and 22 healthy individuals in the GSE75214 microarray dataset, we identified genes that were significantly upregulated or downregulated in IBD patients. Applying ML techniques and feature selection methods, we found six Differentially Expressed Gene (DEG) biomarkers (VWF, IL1RL1, DENND2B, MMP14, NAAA, and PANK1) with strong diagnostic potential for IBD. The Random Forest (RF) model performed exceptionally well, with accuracy, F1-score, and AUC values all exceeding 0.98. We validated these findings on independent datasets (GSE36807 and GSE10616), where the model maintained favorable performance (accuracy: 0.841, F1-score: 0.734, AUC: 0.887). Functional annotation and pathway enrichment analysis provided insights into key pathways associated with these dysregulated genes. DENND2B and PANK1 were identified as novel IBD biomarkers, advancing our understanding of the disease. Validation in independent cohorts strengthens the reliability of these findings and underscores their potential for early detection and personalized treatment of IBD. Further investigation of these genes is needed to fully understand their roles in IBD pathogenesis and to develop improved diagnostic tools and therapies. This study offers valuable insights for IBD research, with the potential to improve patient care.
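The AUC values reported above summarize how well a classifier's scores rank IBD samples above healthy controls. A minimal sketch of that definition follows, assuming label 1 marks IBD and 0 marks controls; it uses the pairwise (Mann-Whitney) formulation rather than any specific library, and the toy scores are hypothetical, not data from the study.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC of `scores` against binary `labels`.

    AUC equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one; ties
    count as half (the Mann-Whitney U formulation).
    Assumption: label 1 marks IBD (positive class), 0 marks healthy controls.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy scores for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.1]
print(round(roc_auc(y_true, y_score), 3))  # 5 of 6 pairs ranked correctly -> 0.833
```

The double loop costs O(positives × negatives), which is negligible for cohorts of a few hundred samples.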

Keywords: differentially expressed genes (DEGs); feature selection (FS); gene ontology; high-throughput technologies; inflammatory bowel disease (IBD); machine learning (ML); pathway enrichment analysis.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
(A) Framework for selecting and identifying candidate DEGs from the GSE75214 gene expression dataset. (B) Framework used to screen for the supervised classification model that best differentiates IBD from healthy control samples. (C) RF model built using the six DEG biomarkers and applied to independent cohorts.
Figure 2
Differential gene expression patterns between IBD and Normal samples in the GSE75214 cohort. (a) Heatmap of the upregulated genes in IBD versus Normal subjects. (b) Heatmap of the downregulated genes in IBD versus Normal subjects. The color scale ranges from dark blue (low expression) to dark red (high expression), highlighting the contrasting expression patterns of IBD and Normal subjects.
Figure 3
Analysis of DEGs between IBD and healthy controls in the GSE75214 cohort. (a) Volcano plot of the DEGs between IBD and normal individuals. The x-axis shows the log2 fold change and the y-axis the negative base-10 logarithm of the p-value. Significant DEGs, with a p-value below 0.001 and a fold change exceeding the threshold of 1.06712, are highlighted. (b) Venn diagram showing the overlap of DEGs in the GSE75214 cohort, i.e., the common DEGs (upregulated and downregulated) between the two groups (IBD and Normal).
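The volcano-plot selection in Figure 3 combines a significance cutoff with an effect-size cutoff. The sketch below applies both, assuming the 1.06712 threshold is compared with the absolute log2 fold change (the caption does not state which scale is used); gene ids and statistics are hypothetical, and the p-values are assumed to come from an upstream differential expression test.

```python
import math


def volcano_point(log2_fc, p_value):
    """Coordinates of one gene on a volcano plot: (log2 fold change, -log10 p)."""
    return log2_fc, -math.log10(p_value)


def call_degs(stats, p_cutoff=0.001, fc_cutoff=1.06712):
    """Label each gene 'up', 'down', or None (not significant).

    stats: dict mapping gene id -> (log2_fold_change, p_value)
    Assumption: fc_cutoff is compared with |log2 fold change|; the caption
    does not say which scale the 1.06712 threshold uses.
    """
    calls = {}
    for gene, (log2_fc, p_value) in stats.items():
        if p_value < p_cutoff and abs(log2_fc) > fc_cutoff:
            calls[gene] = "up" if log2_fc > 0 else "down"
        else:
            calls[gene] = None
    return calls


# Hypothetical gene ids and statistics, for illustration only.
toy = {
    "GENE_A": (2.0, 1e-5),   # large increase, significant
    "GENE_B": (-1.5, 1e-4),  # large decrease, significant
    "GENE_C": (2.0, 0.01),   # large change, but p-value too high
    "GENE_D": (0.5, 1e-6),   # significant, but change too small
}
print(call_degs(toy))
print(volcano_point(*toy["GENE_A"]))
```

Genes labelled "up" or "down" correspond to the highlighted points on either side of the plot; everything else stays in the grey central region.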
Figure 4
Kernel density estimate (KDE) plots of the expression distributions of the six genes (VWF, IL1RL1, DENND2B, MMP14, NAAA, and PANK1) in the two groups (IBD patients and Normal controls).
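Each KDE curve in Figure 4 is a smoothed estimate of a gene's expression distribution within one group. A minimal Gaussian KDE is sketched below; the bandwidth rule (the normal-reference rule of thumb, h = 1.06 · sd · n^(-1/5)) is an assumption, since the bandwidth the authors used is not stated, and the expression values are hypothetical.

```python
import math
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth h = 1.06 * sd * n**(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    h = bandwidth if bandwidth is not None else rule_of_thumb_bandwidth(samples)
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each observed expression value.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


# Hypothetical log2 expression values for one gene in each group.
ibd = [7.9, 8.4, 8.1, 8.8, 8.6, 9.0]
normal = [6.8, 7.1, 6.9, 7.3, 7.0]
kde_ibd, kde_normal = gaussian_kde(ibd), gaussian_kde(normal)
for x in (7.0, 8.5):
    print(x, round(kde_ibd(x), 3), round(kde_normal(x), 3))
```

Well-separated curves for the two groups, as in this toy example, indicate a gene whose expression alone discriminates IBD from controls.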
Figure 5
Comparison of accuracy, F1-score, and AUC between ML models built on the ‘Six Gene Biomarkers’ and on the ‘Baseline (33,253 Genes)’, with and without SMOTE (Synthetic Minority Over-sampling Technique). Error bars represent the standard deviation of each performance metric.
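SMOTE addresses class imbalance (here 22 controls versus 172 IBD samples) by creating synthetic minority samples on line segments between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that idea in plain Python; it is not the authors' code, and the parameter names and toy vectors are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class
    n_synthetic: number of synthetic samples to generate
    k: number of nearest minority-class neighbours to interpolate towards
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # k nearest minority neighbours of each sample (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # random position on the segment from sample i to neighbour j
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


# Hypothetical 2-gene expression vectors for a small minority class.
controls = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.1, 2.3]]
# With 22 controls and 172 IBD samples, full balancing would need 172 - 22 = 150 synthetic controls.
new_controls = smote(controls, n_synthetic=6, k=2)
print(len(new_controls))
```

Oversampling is typically applied to training data only, so that synthetic points do not leak into the samples used for evaluation.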
Figure 6
Confusion matrix showing the performance of the optimized RF-based classification model.
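Accuracy and F1-score both follow directly from the four cells of a confusion matrix. The sketch below assumes IBD is the positive class and reports the binary F1 for that class; whether the study used binary or averaged F1 is not stated here, and the predictions are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy and F1-score (for the positive class) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical predictions (1 = IBD, 0 = Normal), not results from the study.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)                    # (3, 1, 1, 1)
print(accuracy_and_f1(*counts))  # accuracy 4/6, F1 0.75
```

Because F1 ignores true negatives, it can diverge from accuracy when classes are imbalanced, which is one reason to report both.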
Figure 7
Performance of the six-biomarker optimized RF models on independent IBD cohorts (GSE36807 and GSE10616).
Figure 8
Comparison of accuracy and AUC values between our model and related models [21,22,24,28,29,30,31,32,45].

References

    1. Alatab S., Sepanlou S.G., Ikuta K., Vahedi H., Bisignano C., Safiri S., Sadeghi A., Nixon M.R., Abdoli A., Abolhassani H., et al. The Global, Regional, and National Burden of Inflammatory Bowel Disease in 195 Countries and Territories, 1990–2017: A Systematic Analysis for the Global Burden of Disease Study 2017. Lancet Gastroenterol. Hepatol. 2020;5:17–30. doi: 10.1016/S2468-1253(19)30333-4.
    2. Wang R., Li Z., Liu S., Zhang D. Global, Regional and National Burden of Inflammatory Bowel Disease in 204 Countries and Territories from 1990 to 2019: A Systematic Analysis Based on the Global Burden of Disease Study 2019. BMJ Open. 2023;13:e065186. doi: 10.1136/bmjopen-2022-065186.
    3. Bourgonje A.R., Van Goor H., Faber K.N., Dijkstra G. Clinical Value of Multi-Omics-Based Biomarker Signatures in Inflammatory Bowel Diseases: Challenges and Opportunities. Clin. Transl. Gastroenterol. 2023;14:e00579. doi: 10.14309/ctg.0000000000000579.
    4. Seyed Tabib N.S., Madgwick M., Sudhakar P., Verstockt B., Korcsmaros T., Vermeire S. Big Data in IBD: Big Progress for Clinical Practice. Gut. 2020;69:1520–1532. doi: 10.1136/gutjnl-2019-320065.
    5. Dhyani M., Joshi N., Bemelman W.A., Gee M.S., Yajnik V., D’Hoore A., Traverso G., Donowitz M., Mostoslavsky G., Lu T.K., et al. Challenges in IBD Research: Novel Technologies. Inflamm. Bowel Dis. 2019;25:S24–S30. doi: 10.1093/ibd/izz077.
