AutoXAI4Omics: an automated explainable AI tool for omics and tabular data

Affiliations

¹ IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom.
² Earlham Institute, Norwich Research Park, Colney Lane, Norwich NR4 7UZ, United Kingdom.
³ IBM T.J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, United States.
⁴ IBM Research, Almaden, 650 Harry Rd, San Jose, CA 95120, United States.
⁵ STFC, The Hartree Centre, Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom.

PMID: 39576223
PMCID: PMC11583442
DOI: 10.1093/bib/bbae593

James Strudwick et al. Brief Bioinform. 2024 Nov 22;26(1):bbae593.

Abstract

Machine learning (ML) methods offer opportunities for gaining insight into the workings of complex biological systems, and they are increasingly prominent in the analysis of omics data, supporting tasks such as the identification of novel biomarkers and predictive modeling of phenotypes. For scientists and domain experts, user-friendly ML pipelines are valuable because they make it possible to run sophisticated, robust, and interpretable models without in-depth expertise in coding or algorithmic optimization. By streamlining model development and training, such pipelines let researchers devote their time to the critical tasks of biological interpretation and validation, thereby maximizing the scientific impact of ML-driven insights. Here, we present AutoXAI4Omics, an entirely automated, open-source explainable AI tool that performs classification and regression tasks on omics and tabular numerical data. AutoXAI4Omics accelerates scientific discovery by automating processes and decisions normally made by AI experts, e.g. selection of the best feature set, hyperparameter tuning of different ML algorithms, and selection of the best ML model for a specific task and dataset. Prior to ML analysis, AutoXAI4Omics offers feature filtering options tailored to specific omic data types. Moreover, the explainability analysis provided by the tool gives insight into the predictions, highlighting associations between omic feature values and the targets under investigation, e.g. predicted phenotypes, and thereby facilitating the identification of novel actionable insights. AutoXAI4Omics is available at: https://github.com/IBM/AutoXAI4Omics.
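
The kind of decisions the abstract describes (choosing a feature set, tuning hyperparameters, and picking the best model) can be illustrated with a minimal scikit-learn sketch. This is not the AutoXAI4Omics implementation or its API: the function name `select_best_model`, the two candidate models, the parameter grids, and the synthetic data are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def select_best_model(X_train, y_train, cv=5, seed=0):
    """Tune each candidate with cross-validation and return the best search.

    Hypothetical helper for illustration; not part of AutoXAI4Omics.
    """
    candidates = {
        "logistic_regression": (
            LogisticRegression(max_iter=1000),
            {"model__C": [0.1, 1.0, 10.0]},
        ),
        "random_forest": (
            RandomForestClassifier(random_state=seed),
            {"model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
        ),
    }
    best_name, best_search = None, None
    for name, (estimator, model_grid) in candidates.items():
        pipeline = Pipeline([
            ("scale", StandardScaler()),
            ("select", SelectKBest(score_func=f_classif)),
            ("model", estimator),
        ])
        # The number of retained features is tuned together with the model.
        grid = {"select__k": [5, 10, 20], **model_grid}
        search = GridSearchCV(pipeline, grid, scoring="f1", cv=cv)
        search.fit(X_train, y_train)
        # Keep the candidate with the highest mean cross-validated F1-score.
        if best_search is None or search.best_score_ > best_search.best_score_:
            best_name, best_search = name, search
    return best_name, best_search


# Synthetic stand-in for an omics matrix: 200 samples x 50 features.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

best_name, best_search = select_best_model(X_train, y_train)
# The held-out test set is used only once, after model selection.
test_f1 = f1_score(y_test, best_search.predict(X_test))
print(best_name, best_search.best_params_, round(test_f1, 3))
```

Placing feature selection inside the pipeline means it is refit on each training fold, so the cross-validated scores are not inflated by information from the validation folds.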

Keywords: automated; explainable; machine learning; omics.

Figures

**Figure 1**
Overview of the AutoXAI4Omics XAI workflow from data input to results and interpretation.

**Figure 2**
Binary classification using AutoXAI4Omics: a case study in plant genomics. The figure summarizes the predictive performance of AutoXAI4Omics in predicting either two-rowed (0) or six-rowed (1) barley. (a) Box plots displaying the F1-score during cross-validation, (b) confusion matrix for the best performing ML model (XGBoost), (c) ROC curve for the best performing model, and (d) feature selection accuracy curve.

**Figure 3**
Binary classification using AutoXAI4Omics: a case study in plant genomics. The figure summarizes the XAI output relating to the best performing model from AutoXAI4Omics (XGBoost) to predict either two-rowed (0) or six-rowed (1) barley. (a) Bar plot summarizing the top 15 feature values selected for the best model (XGBoost), (b) SHAP global view of explanations for six-rowed barley predictions, and (c) SHAP global view of explanations for two-rowed barley predictions.

**Figure 4**
Multi-class classification using AutoXAI4Omics: a case study with human RNA-seq data. (a) Bar charts displaying the F1-score on the held-out test dataset, (b) confusion matrix for the best performing ML model (random forest), (c) bar chart displaying feature rank from feature importance analysis with average abundance, and (d) ROC curve for the best performing ML model.

**Figure 5**
Multi-class classification using AutoXAI4Omics: a case study with human RNA-seq data. XAI output relating to the best ML model (random forest) to predict three classes: unstim, LPS, and dNS1. (a) Global explanation for class LPS, (b) global explanation for class dNS1, and (c) global explanation for class unstim.

**Figure 6**
Regression using AutoXAI4Omics: a case study with environmental microbiome data. Predictive performance of AutoXAI4Omics in predicting soil pH from the soil microbiome. (a) Bar charts displaying the MAE on the held-out test dataset across a range of ML models, (b) box plot displaying the MAE during cross-validation across a range of models, (c) correlation, and (d) joint plot comparing predicted values (y-axis) with true values (x-axis) for the best performing ML model (random forest). A diagonal dotted line is also shown.

**Figure 7**
Regression using AutoXAI4Omics: a case study with environmental microbiome data. The figure summarizes the XAI output from SHAP relating to the best ML model generated by AutoXAI4Omics (random forest) to predict soil pH.

Grants and funding

2578607/UKRI-BBSRC
