Learning the natural history of human disease with generative transformers
- PMID: 40963019
- PMCID: PMC12589094
- DOI: 10.1038/s41586-025-09529-3
Erratum in
- Author Correction: Learning the natural history of human disease with generative transformers. Nature. 2025 Nov;647(8091):E8. doi: 10.1038/s41586-025-09879-y. PMID: 41225015.
Abstract
Decision-making in healthcare relies on understanding patients' past and current health states to predict and, ultimately, change their future course [1-3]. Artificial intelligence (AI) methods promise to aid this task by learning patterns of disease progression from large corpora of health records [4,5]. However, their potential has not been fully investigated at scale. Here we modify the GPT [6] (generative pretrained transformer) architecture to model the progression and competing nature of human diseases. We train this model, Delphi-2M, on data from 0.4 million UK Biobank participants and validate it using external data from 1.9 million Danish individuals with no change in parameters. Delphi-2M predicts the rates of more than 1,000 diseases, conditional on each individual's past disease history, with accuracy comparable to that of existing single-disease models. Delphi-2M's generative nature also enables sampling of synthetic future health trajectories, providing meaningful estimates of potential disease burden for up to 20 years, and enabling the training of AI models that have never seen actual data. Explainable AI methods [7] provide insights into Delphi-2M's predictions, revealing clusters of co-morbidities within and across disease chapters and their time-dependent consequences on future health, but also highlight biases learnt from training data. In summary, transformer-based models appear to be well suited for predictive and generative health-related tasks, are applicable to population-scale datasets and provide insights into temporal dependencies between disease events, potentially improving the understanding of personalized health risks and informing precision medicine approaches.
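The abstract describes predicting per-disease rates and sampling synthetic future trajectories under competing risks, but does not spell out the sampling procedure. The sketch below illustrates one standard way such sampling can work, assuming each disease has a constant rate (events per year) so that waiting times are exponential. The function names, the fixed rates and the example disease labels are hypothetical and are not taken from the paper; a model like the one described would presumably re-predict the rates after every sampled event, which this simplified sketch does not do.

```python
import random


def sample_next_event(rates, rng):
    """Sample (waiting_time_years, disease) under competing exponential risks.

    rates: dict mapping a disease label to a rate in events per year
           (hypothetical values, not model output).
    """
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    # The time to the first of several competing exponential events is
    # itself exponential, with rate equal to the sum of the rates.
    wait = rng.expovariate(total)
    # The event that occurs first is chosen with probability rate / total.
    diseases = list(rates)
    event = rng.choices(diseases, weights=[rates[d] for d in diseases], k=1)[0]
    return wait, event


def sample_trajectory(rates, start_age, horizon_years, seed=0):
    """Sample a synthetic list of (age, disease) events up to a time horizon.

    Simplification (assumption): rates stay fixed and each disease is
    recorded at most once, at its first occurrence.
    """
    rng = random.Random(seed)
    age = start_age
    remaining = dict(rates)
    trajectory = []
    while remaining:
        wait, event = sample_next_event(remaining, rng)
        age += wait
        if age > start_age + horizon_years:
            break  # next event falls beyond the horizon
        trajectory.append((age, event))
        del remaining[event]
    return trajectory


if __name__ == "__main__":
    # Made-up illustrative rates per year; not estimates from any dataset.
    example_rates = {"hypertension": 0.05, "type 2 diabetes": 0.02, "asthma": 0.01}
    for age, disease in sample_trajectory(example_rates, start_age=60.0, horizon_years=20.0):
        print(f"{age:.1f} years: {disease}")
```

Repeating `sample_trajectory` with different seeds gives a set of synthetic trajectories from which a disease burden over the horizon could be estimated, for example as the fraction of samples that contain a given disease.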
© 2025. The Author(s).
Conflict of interest statement
Competing interests: A patent has been filed for the use of generative transformer architectures to model competing risk and timings of diseases (application number: PCT/EP2025/065771; applicants: DKFZ, EMBL), with M.G., A.S., T.F., E.B., K.G. and A.W.J. listed as inventors. S.B. has ownership interests in Hoba Therapeutics Aps, Novo Nordisk, Lundbeck and Eli Lilly. E.B. is a consultant and shareholder of Oxford Nanopore. The other authors declare no competing interests.
References
- Link, B. G. & Phelan, J. Social conditions as fundamental causes of disease. J. Health Soc. Behav. https://doi.org/10.2307/2626958 (1995).
- Kraljevic, Z., Yeung, J. A., Bean, D., Teo, J. & Dobson, R. J. Large language models for medical forecasting—foresight 2. Preprint at https://arxiv.org/abs/2412.10848 (2024).
- Yang, L. et al. Advancing multimodal medical capabilities of Gemini. Preprint at https://arxiv.org/abs/2405.03162 (2024).
