Brief Bioinform. 2024 Nov 22;26(1):bbae593.
doi: 10.1093/bib/bbae593.

AutoXAI4Omics: an automated explainable AI tool for omics and tabular data

James Strudwick et al. Brief Bioinform. 2024.

Abstract

Machine learning (ML) methods offer opportunities for gaining insights into the intricate workings of complex biological systems, and they are increasingly prominent in the analysis of omics data for tasks such as the identification of novel biomarkers and the predictive modeling of phenotypes. For scientists and domain experts, user-friendly ML pipelines can be very valuable, enabling them to run sophisticated, robust, and interpretable models without requiring in-depth expertise in coding or algorithmic optimization. By streamlining model development and training, such pipelines let researchers devote their time and energy to the critical tasks of biological interpretation and validation, thereby maximizing the scientific impact of ML-driven insights. Here, we present an entirely automated open-source explainable AI tool, AutoXAI4Omics, that performs classification and regression tasks on omics and tabular numerical data. AutoXAI4Omics accelerates scientific discovery by automating processes and decisions usually made by AI experts, e.g. selection of the best feature set, hyperparameter tuning of different ML algorithms, and selection of the best ML model for a specific task and dataset. Prior to ML analysis, AutoXAI4Omics offers feature filtering options that are tailored to specific omic data types. Moreover, the explainability analysis provided by the tool gives insights into the predictions by highlighting associations between omic feature values and the targets under investigation, e.g. predicted phenotypes, facilitating the identification of novel actionable insights. AutoXAI4Omics is available at: https://github.com/IBM/AutoXAI4Omics.

Keywords: automated; explainable; machine learning; omics.
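
The automated decisions named in the abstract (feature filtering, hyperparameter tuning, and selection of the best model by cross-validation) can be illustrated with a minimal, dependency-free Python sketch. This is not AutoXAI4Omics code and does not reflect its API: the functions `variance_filter`, `knn_predict`, `cv_accuracy`, and `select_best`, the choice of a variance filter and a k-nearest-neighbour classifier, and the synthetic data are all assumptions made purely for illustration. Explainability analysis is not covered.

```python
import statistics
from collections import Counter


def variance_filter(X, threshold):
    """Return indices of features whose population variance exceeds threshold."""
    columns = list(zip(*X))
    return [j for j, col in enumerate(columns) if statistics.pvariance(col) > threshold]


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, n_folds=5):
    """Mean accuracy of k-NN over interleaved cross-validation folds."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            correct += knn_predict(train_X, train_y, X[i], k) == y[i]
    return correct / n


def select_best(X, y, thresholds, ks):
    """Grid-search filter threshold and k; keep the first best-scoring combination."""
    best = None
    for t in thresholds:
        keep = variance_filter(X, t)
        if not keep:
            continue  # the filter removed every feature
        X_filtered = [[row[j] for j in keep] for row in X]
        for k in ks:
            score = cv_accuracy(X_filtered, y, k)
            if best is None or score > best["score"]:
                best = {"score": score, "threshold": t, "k": k, "features": keep}
    return best


# Hypothetical toy data: feature 0 separates the classes, feature 1 is
# uninformative, feature 2 is constant, feature 3 varies only slightly.
y = [i % 2 for i in range(40)]
X = [
    [3.0 * (i % 2) + 0.1 * (i % 5), ((i * 7) % 11) / 11.0, 1.0, 0.01 * (i % 3)]
    for i in range(40)
]

best = select_best(X, y, thresholds=[0.0, 0.05], ks=[1, 3, 5])
print(best)
```

Ties are resolved in favour of the first configuration tried, so the order of `thresholds` and `ks` matters in this sketch; a real pipeline would typically also hold out a separate test set for the final evaluation.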

Figures

Graphical Abstract
Figure 1
Overview of the AutoXAI4Omics XAI workflow from data input to results and interpretation.
Figure 2
Binary classification using AutoXAI4Omics: a case study in plant genomics. The figure summarizes the predictive performance of AutoXAI4Omics in predicting either two-rowed (0) or six-rowed (1) barley. (a) Box plots displaying the F1-score during cross-validation, (b) confusion matrix for the best-performing ML model (XGBoost), (c) ROC curve for the best-performing model, and (d) feature selection accuracy curve.
Figure 3
Binary classification using AutoXAI4Omics: a case study in plant genomics. The figure summarizes the XAI output relating to the best-performing model from AutoXAI4Omics (XGBoost) to predict either two-rowed (0) or six-rowed (1) barley. (a) Bar plot summarizing the top 15 feature values selected for the best model (XGBoost), (b) SHAP global view of explanations for six-rowed predictions, and (c) SHAP global view of explanations for two-rowed barley predictions.
Figure 4
Multi-class classification using AutoXAI4Omics: a case study with human RNA-seq data. (a) Bar charts displaying the F1-score on the held-out test dataset, (b) confusion matrix for the best-performing ML model (random forest), (c) bar chart displaying feature rank from feature importance analysis with their average abundance, and (d) ROC curve for the best-performing ML model.
Figure 5
Multi-class classification using AutoXAI4Omics: a case study with human RNA-seq data. XAI output relating to the best ML model (random forest) to predict the three classes unstim, LPS, and dNS1. (a) Global explanation for class LPS, (b) global explanation for class dNS1, and (c) global explanation for class unstim.
Figure 6
Regression using AutoXAI4Omics: a case study with environmental microbiome data. Predictive performance of AutoXAI4Omics in predicting soil pH from the soil microbiome. (a) Bar charts displaying the MAE on the held-out test dataset across a range of ML models, (b) box plot displaying the MAE during cross-validation across a range of models, (c) correlation, and (d) joint plot comparing predicted values (y-axis) with true values (x-axis) for the best-performing ML model (random forest). A diagonal dotted line is also shown.
Figure 7
Regression using AutoXAI4Omics: a case study with environmental microbiome data. The figure summarizes the XAI output from SHAP relating to the best ML model generated by AutoXAI4Omics (random forest) to predict soil pH.
