Automated machine learning for genome wide association studies
- PMID: 37672022
- PMCID: PMC10562960
- DOI: 10.1093/bioinformatics/btad545
Abstract
Motivation: Genome-wide association studies (GWAS) pose several computational and statistical challenges in data analysis, including knowledge discovery, interpretability, and translation to clinical practice.
Results: We develop, apply, and comparatively evaluate an automated machine learning (AutoML) approach, customized for genomic data, that delivers reliable predictive and diagnostic models, the set of genetic variants that are important for predictions (called a biosignature), and an estimate of the out-of-sample predictive power. Compared to standard GWAS methods, this AutoML approach discovers variants with higher predictive performance. It also computes an individual risk prediction score, generalizes to new, unseen data, is shown to better differentiate causal variants from other highly correlated variants, and enhances knowledge discovery and interpretability by reporting multiple equivalent biosignatures.
Availability and implementation: Code for this study is available at: https://github.com/mensxmachina/autoML-GWAS. JADBio offers a free version at: https://jadbio.com/sign-up/. SNP data can be downloaded from the EGA repository (https://ega-archive.org/). PRS data are found at: https://www.aicrowd.com/challenges/opensnp-height-prediction. Simulation data to study population structure can be found at: https://easygwas.ethz.ch/data/public/dataset/view/1/.
© The Author(s) 2023. Published by Oxford University Press.
Conflict of interest statement
I.T., P.C., Z.P., S.F., and V.L. are or were directly or indirectly affiliated with Gnosis Data Analysis that offers the JADBio service commercially.