Disease prediction with multi-omics and biomarkers empowers case-control genetic discoveries in the UK Biobank
- PMID: 39261665
- PMCID: PMC11390475
- DOI: 10.1038/s41588-024-01898-1
Abstract
The emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease. Here, we present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) utilizing a range of biomarkers to predict 3,213 diseases in the UK Biobank. Leveraging the UK Biobank's longitudinal health record data, MILTON predicts incident disease cases undiagnosed at time of recruitment, largely outperforming available polygenic risk scores. We further demonstrate the utility of MILTON in augmenting genetic association analyses in a phenome-wide association study of 484,230 genome-sequenced samples, along with 46,327 samples with matched plasma proteomics data. This resulted in improved signals for 88 known (P < 1 × 10⁻⁸) gene-disease relationships alongside 182 gene-disease relationships that did not achieve genome-wide significance in the nonaugmented baseline cohorts. We validated these discoveries in the FinnGen biobank alongside two orthogonal machine-learning methods built for gene-disease prioritization. All extracted gene-disease associations and incident disease predictive biomarkers are publicly available (http://milton.public.cgr.astrazeneca.com).
© 2024. The Author(s).
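The abstract does not specify the model internals, so the following is only a toy illustration of the general idea of augmenting a case cohort with model-predicted cases. It is not MILTON's implementation. Everything here is an assumption made for illustration: the nearest-centroid ensemble members, the balanced control subsampling, the 0.8 threshold, the function names and the two-biomarker toy data.

```python
import random
from statistics import mean

# Hypothetical sketch: NOT the MILTON method, only the augmentation idea.

def fit_centroid_model(cases, controls):
    """Return per-feature centroids for the case and control groups."""
    n_features = len(cases[0])
    case_centroid = [mean(row[j] for row in cases) for j in range(n_features)]
    control_centroid = [mean(row[j] for row in controls) for j in range(n_features)]
    return case_centroid, control_centroid

def score(model, row):
    """Score in [0, 1]; rows closer to the case centroid score higher."""
    case_centroid, control_centroid = model
    d_case = sum((x - c) ** 2 for x, c in zip(row, case_centroid))
    d_ctrl = sum((x - c) ** 2 for x, c in zip(row, control_centroid))
    total = d_case + d_ctrl
    return 0.5 if total == 0 else d_ctrl / total

def fit_ensemble(cases, controls, n_models=10, seed=0):
    """Train each member on all cases plus an equally sized control subsample."""
    rng = random.Random(seed)
    k = min(len(cases), len(controls))
    return [fit_centroid_model(cases, rng.sample(controls, k)) for _ in range(n_models)]

def ensemble_score(models, row):
    """Average the member scores."""
    return mean(score(m, row) for m in models)

def augment_cases(models, cases, undiagnosed, threshold=0.8):
    """Add undiagnosed participants whose averaged score meets the threshold."""
    putative = [row for row in undiagnosed if ensemble_score(models, row) >= threshold]
    return cases + putative, putative

# Toy biomarker profiles (two made-up measurements per participant).
cases = [[5.0, 5.2], [5.5, 4.8], [4.9, 5.1]]
controls = [[1.0, 1.1], [0.9, 1.2], [1.2, 0.8], [1.1, 1.0], [0.8, 0.9]]
undiagnosed = [[5.1, 5.0], [1.0, 1.0], [3.0, 3.0]]

models = fit_ensemble(cases, controls)
augmented, putative = augment_cases(models, cases, undiagnosed)
print(len(augmented), putative)
```

In this sketch, only the undiagnosed participant whose profile sits near the known cases is added to the cohort; the augmented cohort would then be passed to a downstream association test in place of the baseline cases.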
Conflict of interest statement
M.G., M.K., D.M., L.M., O.S.B., F.H., E.W., K.R.S., M.A.F., J.M., A.O’N., E.A.A., A.R.H., Q.W., R.S.D., S.P. and D.V. are current employees and/or stockholders of AstraZeneca. E.A.A. is a founder of Personalis, Inc., DeepCell, Inc. and Svexa Inc.; a founding advisor of Nuevocor; a nonexecutive director at AstraZeneca; and an advisor to SequenceBio, Novartis, Medical Excellence Capital, Foresite Capital and Third Rock Ventures.