Nature. 2025 May;641(8065):1180-1187. doi: 10.1038/s41586-025-09005-y. Epub 2025 May 21.

A foundation model for the Earth system

Cristian Bodnar et al. Nature. 2025 May.

Abstract

Reliable forecasting of the Earth system is essential for mitigating natural disasters and supporting human progress. Traditional numerical models, although powerful, are extremely computationally expensive [1]. Recent advances in artificial intelligence (AI) have shown promise in improving both predictive performance and efficiency [2,3], yet their potential remains underexplored in many Earth system domains. Here we introduce Aurora, a large-scale foundation model trained on more than one million hours of diverse geophysical data. Aurora outperforms operational forecasts in predicting air quality, ocean waves, tropical cyclone tracks and high-resolution weather, all at orders of magnitude lower computational cost. With the ability to be fine-tuned for diverse applications at modest expense, Aurora represents a notable step towards democratizing accurate and efficient Earth system predictions. These results highlight the transformative potential of AI in environmental forecasting and pave the way for broader accessibility to high-quality climate and weather information.


Conflict of interest statement

Competing interests: C.B., W.P.B., M.S., J.B., P.G., M.R., J.A.W., H.D., J.K.G., K.T., E.H., M.W. and P.P. own Microsoft stock. W.P.B., M.S., P.G., M.R., J.A.W., H.D. and K.T. are Microsoft employees. C.B. and J.K.G. are employees of Silurian AI Inc. and own Silurian AI Inc. stock. The remaining authors declare no competing interests.

Figures

Fig. 1. Aurora is a 1.3-billion-parameter foundation model for the Earth system.
Icons are for illustrative purposes only. a, Aurora is pretrained on several heterogeneous datasets with different resolutions, variables and pressure levels. The model is then fine-tuned for several operational forecasting scenarios at different resolutions: atmospheric chemistry and air quality at 0.4°, wave modelling at 0.25°, hurricane tracking at 0.25° and weather forecasting at 0.1°. b, Aurora is a flexible 3D Swin Transformer with 3D Perceiver-based atmospheric encoders and decoders. The model is able to ingest inputs with different spatial resolutions, numbers of pressure levels and variables.
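As a rough illustration of the Perceiver-style flexibility described in b (a hypothetical sketch, not the authors' implementation; the class name, dimensions and number of latent levels are invented for the example), the snippet below shows how cross-attention from a fixed set of learned latent levels lets inputs with different numbers of pressure levels map onto one common latent shape.

# Hypothetical sketch of Perceiver-style level compression (illustrative only).
import torch
import torch.nn as nn

class LevelCompressor(nn.Module):
    """Cross-attend from a fixed set of latent pressure levels to an arbitrary
    number of input levels, so differently shaped inputs share one latent shape."""
    def __init__(self, dim: int = 256, n_latent_levels: int = 4, n_heads: int = 8):
        super().__init__()
        self.latent = nn.Parameter(torch.randn(n_latent_levels, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim))

    def forward(self, level_tokens: torch.Tensor) -> torch.Tensor:
        # level_tokens: (batch * patches, n_input_levels, dim)
        q = self.latent.unsqueeze(0).expand(level_tokens.shape[0], -1, -1)
        z, _ = self.attn(q, level_tokens, level_tokens)   # queries are the latents
        return z + self.mlp(z)                            # (batch * patches, n_latent_levels, dim)

# Inputs with 13 or 25 pressure levels both land in the same latent shape.
enc = LevelCompressor()
for n_levels in (13, 25):
    tokens = torch.randn(10, n_levels, 256)
    print(enc(tokens).shape)   # torch.Size([10, 4, 256]) in both cases

Applied along the level (and, analogously, the variable) dimension, this kind of cross-attention is one way a single encoder can ingest datasets with heterogeneous inputs, as the caption describes.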
Fig. 2. In an operational setting, Aurora matches or outperforms CAMS in most comparisons, at orders of magnitude smaller computational expense.
a, Aurora's predictions of TC NO2 closely match CAMS analysis. Predicting atmospheric gases correctly is extremely challenging owing to their spatially heterogeneous nature. In particular, NO2, like most air pollution variables, is skewed towards high values in areas with large anthropogenic emissions, such as densely populated regions of East Asia. NO2 also exhibits a strong diurnal cycle; for example, sunlight reduces background levels of NO2 through a process called photolysis. Aurora accurately captures both the extremes and the background levels. Aurora and CAMS forecasts are initialized with CAMS analysis on 1 September 2022 at 00 UTC. b, Across all lead times, Aurora matches or outperforms CAMS on 74% of all targets. c, At a lead time of 3 days, Aurora matches or outperforms CAMS on 89% of all variables. See Supplementary Information Section I.1 for the full results.
Fig. 3. In an operational setting, Aurora matches or outperforms HRES-WAM in most comparisons.
a, Aurora accurately predicts significant wave height and mean wave direction for Typhoon Nanmadol, the most intense tropical cyclone in 2022. The red box shows the location of the typhoon and the number is the peak significant wave height. Aurora’s prediction and HRES-WAM analysis are for 17 September 2022 at 12 UTC, when Typhoon Nanmadol reached peak intensity. Aurora was initialized on 16 September 2022 at 12 UTC. b, Across all lead times, Aurora matches or outperforms HRES-WAM on 86% of all wave variables. c, At a lead time of 3 days, Aurora matches or outperforms HRES-WAM on 91% of all surface-level variables. See Supplementary Information Section I.2 for the full results.
Fig. 4. In an operational setting, Aurora outperforms state-of-the-art tropical cyclone prediction systems from several agencies and regions worldwide.
a, Aurora attains better track prediction MAE than several agencies in various regions. Official forecasts are given by OFCL, PGTW, CWA, BABJ, RJTD, RKSL and BoM (in bold). For the North Atlantic and East Pacific, we also compare with various models used in creating OFCL (not bold). Not every model issues a forecast for every storm, so different columns are computed over different subsets of the data; the columns therefore indicate only performance relative to Aurora, not absolute model performance. Here '≈' indicates that the 95% confidence interval for the cell contains zero (see Supplementary Information Section I.3.4 for details). On average, Aurora is 20% better than other agencies in the North Atlantic and East Pacific, 18% better in the Northwest Pacific and 24% better in the Australian region (Aus.). b, On 21 July 2023, a tropical depression intensified into a tropical storm and was named Doksuri. Typhoon Doksuri would become the costliest Pacific typhoon on record, inflicting more than US$28 billion in damage. The black lines show its ground-truth path extracted from IBTrACS. Aurora correctly predicts that Typhoon Doksuri will make landfall in the northern Philippines, whereas PGTW predicts that it will pass over Taiwan.
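Track-prediction error of the kind compared in a is commonly measured as the great-circle distance between forecast and observed cyclone centres at matching lead times. The sketch below is a generic illustration of that computation on made-up coordinates; the function name, the example values and the exact evaluation protocol are assumptions, not taken from the paper.

# Hypothetical sketch: mean track error as the great-circle distance between
# predicted and observed cyclone centres (names and values are illustrative).
import numpy as np

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in kilometres between two lat/lon points (degrees)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# Predicted vs. IBTrACS-style ground-truth centres at matching lead times.
pred = np.array([[14.1, 128.9], [15.0, 126.2], [16.2, 123.5]])   # (lat, lon)
truth = np.array([[14.0, 129.0], [15.3, 125.8], [16.8, 122.9]])
errors = great_circle_km(pred[:, 0], pred[:, 1], truth[:, 0], truth[:, 1])
print(errors.round(1), "-> track MAE:", errors.mean().round(1), "km")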
Fig. 5. In an operational setting, Aurora outperforms IFS HRES in most comparisons and is the only AI model to accurately estimate the maximum wind speeds in Storm Ciarán.
a, Aurora outperforms IFS HRES at 0.1° on more than 92% of targets. The scorecard is limited to pressure levels lower in the atmosphere owing to the restricted availability of test-year data. b, Wind speed RMSE computed against measurements at weather stations. Aurora greatly outperforms IFS HRES. c, Operational predictions for Storm Ciarán compared with IFS HRES analysis at 0.1°. Black dots show the location of minimum MSL and therefore trace the path of the storm. The maximum 10-m wind speed of the storm is shown in the bottom-left corner of each prediction. To better capture extreme events, Aurora was run without LoRA. See Supplementary Information Section I.7 for details. d, Operational predictions for maximum 10-m wind speed during Storm Ciarán by Aurora, FourCastNet, GraphCast and Pangu-Weather. Aurora is able to predict the sudden increase in 10-m wind speed, unlike the other AI models. The numbers for all AI models except Aurora have been extracted from Fig. 3 of the cited reference.
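As a rough sketch of how a storm can be traced from forecasts via the location of minimum MSL (as the black dots in c do), the snippet below picks the grid point with the lowest mean sea-level pressure inside a search box around the previous fix at each step. Grid spacing, box size, variable names and the synthetic fields are all illustrative assumptions.

# Hypothetical sketch: trace a storm by the grid point of minimum mean
# sea-level pressure (MSL) inside a search box around the previous fix.
import numpy as np

def track_min_msl(msl_fields, lats, lons, start, box_deg=5.0):
    """msl_fields: (time, lat, lon) MSL in Pa; start: (lat, lon) first guess."""
    path, (lat0, lon0) = [], start
    for field in msl_fields:
        box = (np.abs(lats[:, None] - lat0) <= box_deg) & (np.abs(lons[None, :] - lon0) <= box_deg)
        masked = np.where(box, field, np.inf)
        i, j = np.unravel_index(np.argmin(masked), masked.shape)
        lat0, lon0 = lats[i], lons[j]
        path.append((lat0, lon0))
    return path

lats = np.arange(40.0, 60.1, 0.1)          # illustrative 0.1-degree regional grid
lons = np.arange(-20.0, 10.1, 0.1)
fields = 101000.0 + np.random.randn(4, lats.size, lons.size) * 100.0   # synthetic MSL
print(track_min_msl(fields, lats, lons, start=(48.0, -8.0)))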
Extended Data Fig. 1. Pretraining on diverse data and increasing model size improves performance.
a, Performance on ERA5 2021 at 6-h lead time for models pretrained on different dataset configurations, labelled C1–C4, without fine-tuning. Adding low-fidelity simulation data from CMIP6 (that is, CMCC and IFS-HR) improves performance almost uniformly (C2). Adding even more simulation data improves performance further on most surface variables and for the atmospheric levels present in this newly added data (C3). Finally, configuration C4, which includes comprehensive atmospheric coverage and analysis data from GFS, achieves the best overall performance, with improvements across the board. b, For the same configurations considered in a, performance for extreme values on IFS HRES 2022 at 6-h lead time. Shown are RMSEs computed only on data below (left panels) or above (right panels) a threshold b, together with 95% confidence intervals obtained through bootstrapping. Pretraining on many diverse data sources also improves the forecasting of extreme values. c, Bigger models obtain lower IFS HRES validation loss for the same number of GPU hours. At 5,000 GPU hours, we find that the validation loss behaves like L(N) ∝ N^−0.026, in which N is the number of parameters, corresponding to a 6% reduction in validation loss for every tenfold increase in model size.
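To make the quoted scaling concrete: if the validation loss follows L(N) ∝ N^−0.026, a tenfold increase in parameter count multiplies the loss by 10^−0.026 ≈ 0.94, that is, roughly a 6% reduction. A two-line check in Python:

# A 10x increase in parameter count N scales the loss by 10**(-0.026).
ratio = 10 ** (-0.026)
print(f"loss ratio = {ratio:.3f}, i.e. about {100 * (1 - ratio):.1f}% lower per 10x parameters")
# loss ratio = 0.942, i.e. about 5.8% lower per 10x parameters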
Extended Data Fig. 2. Validation curves for all surface-level variables during pretraining.
For every surface-level variable, at 5,000 GPU hours, we find that the validation loss roughly behaves like f(N) ∝ N^−α, in which N is the number of parameters and α > 0 is an estimated exponent.
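An exponent such as α is typically estimated by a straight-line fit in log-log space. The sketch below does this with NumPy on made-up (parameter count, validation loss) pairs, so the specific numbers are purely illustrative and not taken from the paper.

# Hypothetical sketch: estimate the power-law exponent alpha from
# (parameter count, validation loss) pairs via a log-log linear fit.
import numpy as np

n_params = np.array([1.2e8, 6.6e8, 1.3e9])   # illustrative model sizes
val_loss = np.array([0.250, 0.239, 0.235])   # illustrative validation losses

slope, intercept = np.polyfit(np.log10(n_params), np.log10(val_loss), deg=1)
alpha = -slope                               # loss ~ N**(-alpha), so alpha > 0
print(f"estimated alpha = {alpha:.3f}")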
Extended Data Fig. 3. Aurora is an encoder–decoder model with a 3D latent representation.
The colours are for illustrative purposes only. a, Aurora’s encoder module. Input weather states are tokenized and compressed into a 3D latent representation using Perceiver-style cross-attention blocks. The resulting latent tokens are augmented with appropriate encodings that provide spatial, temporal and scale information. b, Aurora’s decoder module. The target output variables are reconstructed in spatial patches by decoding Aurora’s 3D latent state using Perceiver-style cross-attention blocks.
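The tokenization into spatial patches mentioned in a and b can be pictured as cutting a lat-lon field into non-overlapping patches and reassembling it after decoding. The round trip below is a minimal illustration for a single variable; the patch size and grid shape are arbitrary choices for the example, not the model's actual settings.

# Hypothetical sketch: split a lat-lon field into non-overlapping patches
# (tokenization) and reassemble it (patch-wise decoding), for one variable.
import torch

def patchify(field: torch.Tensor, p: int) -> torch.Tensor:
    """(lat, lon) -> (n_patches, p*p) with non-overlapping p x p patches."""
    H, W = field.shape
    patches = field.reshape(H // p, p, W // p, p).permute(0, 2, 1, 3)
    return patches.reshape(-1, p * p)

def unpatchify(tokens: torch.Tensor, H: int, W: int, p: int) -> torch.Tensor:
    """Inverse of patchify: (n_patches, p*p) -> (lat, lon)."""
    patches = tokens.reshape(H // p, W // p, p, p).permute(0, 2, 1, 3)
    return patches.reshape(H, W)

field = torch.randn(720, 1440)          # e.g. a 0.25-degree global grid
tokens = patchify(field, p=4)           # (64800, 16) patch tokens
print(torch.allclose(unpatchify(tokens, 720, 1440, p=4), field))  # True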

References

    1. European Centre for Medium-Range Weather Forecasts (ECMWF). IFS Documentation CY48R1, Vol. 8. https://doi.org/10.21957/0f360ba4ca (2023).
    2. Bi, K. et al. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 533–538 (2023).
    3. Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).
    4. Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
    5. OpenAI et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2024).
