Philos Trans A Math Phys Eng Sci. 2025 Mar 13;383(2292):20240216.
doi: 10.1098/rsta.2024.0216. Epub 2025 Mar 13.

Emulating computer models with high-dimensional count output


James M Salter et al. Philos Trans A Math Phys Eng Sci.

Abstract

Computer models are used to study the real world. They often contain a large number of uncertain input parameters, produce a large number of outputs, may be expensive to run, and need calibrating to real-world observations to be useful for decision-making. Emulators are often used as cheap surrogates for the expensive simulator, trained on a small number of simulations to provide predictions with uncertainty at unseen inputs. In epidemiological applications, for example compartmental or agent-based models of the spread of infectious diseases, the output is usually spatially and temporally indexed, stochastic, and made up of counts rather than continuous variables. Here, we consider emulating high-dimensional count output from a complex computer model using a Poisson lognormal PCA (PLNPCA) emulator. We apply the PLNPCA emulator to output fields from a COVID-19 model for England and Wales and compare this to fitting emulators to aggregations of the full output. We show that performance is generally comparable, while the PLNPCA emulator has desirable properties: it allows the full output to be predicted while capturing correlations between outputs, and it provides high-dimensional samples of counts that are representative of the true model output.

This article is part of the theme issue 'Uncertainty quantification for healthcare and biological systems (Part 1)'.

Keywords: Gaussian processes; Poisson lognormal; basis emulation; uncertainty quantification.
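
The generative structure behind a Poisson lognormal PCA model can be sketched in a few lines. The Python example below is a minimal illustration, not the authors' implementation. It assumes the usual PLNPCA form, in which each count is Poisson with a log-rate equal to an offset plus a low-rank basis multiplied by latent coefficients. The names `sample_counts`, `mu`, `basis`, `coeff_mean` and `coeff_sd` are hypothetical, all numerical values are made up, and the Gaussian draws for the coefficients merely stand in for predictions from fitted emulators of those coefficients.

```python
import numpy as np


def sample_counts(mu, basis, coeff_mean, coeff_sd, n_samples, rng):
    """Draw count fields from an assumed Poisson lognormal PCA structure.

    mu         : (p,) offset on the log scale
    basis      : (p, q) low-rank basis, with q much smaller than p
    coeff_mean : (q,) predictive mean of the latent coefficients (placeholder)
    coeff_sd   : (q,) predictive standard deviation of the coefficients (placeholder)
    Returns an integer array of shape (n_samples, p).
    """
    q = basis.shape[1]
    # Sample latent coefficients; in an emulator these draws would come
    # from the fitted coefficient emulators at a new input.
    coeffs = coeff_mean + coeff_sd * rng.standard_normal((n_samples, q))
    # Map the coefficients to log-rates for every output.
    log_rate = mu + coeffs @ basis.T
    # Counts are conditionally independent Poisson given the log-rates.
    return rng.poisson(np.exp(log_rate))


rng = np.random.default_rng(0)
p, q = 339, 4  # 339 outputs as implied by the Figure 3 caption; q = 4 is illustrative
mu = np.full(p, np.log(20.0))                # made-up baseline rate of 20
basis = 0.3 * rng.standard_normal((p, q))    # made-up basis
samples = sample_counts(mu, basis, np.zeros(q), 0.5 * np.ones(q), 1000, rng)

# Outputs share the same latent coefficients, so the samples are correlated.
corr = np.corrcoef(samples, rowvar=False)
```

Because every output depends on the same small set of latent coefficients, each sampled field is a full vector of non-negative integers whose between-output correlations are induced by the basis. This is the property the abstract refers to when it mentions capturing correlations between outputs.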


Conflict of interest statement

We declare we have no competing interests.

Figures

Figure 1.
(a) and (b) Predictions for the 50 validation inputs from the PLNPCA-LAD hetGP emulator, aggregated to the overall total (a) and the northeast region (b), plotted against the true simulator output. Points are coloured green if the truth lies within the 95% error bars, blue otherwise. (c) and (d) Comparing mean predictions from the PLNPCA-LAD and PLNPCA-ward hetGP emulators for the overall total (c) and northeast region (d).
Figure 2.
Samples from the PLNPCA-LAD hetGP emulator for nine validation inputs across the 12 LADs in the northeast region. For each, 50 individual samples are plotted in grey, with the mean (solid line) and 95% interval (dashed lines) across 1000 samples in red and true simulator output in black.
Figure 3.
Comparing LAD-level correlations in the training data and from the PLNPCA-LAD emulator. For each region, a single LAD was randomly chosen, and the correlations between this LAD and all 338 others were calculated in both the training data and the emulator samples. In each panel, the LADs are ordered from most to least correlated in the training data.
Figure 4.
Latent coefficients c1, c2, c3, c4 on the PLNPCA-LAD basis for 12 of the ensemble members with 10 replicates. The box plots show the variability in coefficient values across the replicates of these 12 input vectors.
Figure 5.
RMSE between simulator deaths and emulator predictions across the validation set for different sizes of training set (n = 100, 150, 200), for samples from the PLNPCA-LAD emulator aggregated to a regional level compared with individually emulating each region. Each point represents a particular split into training and validation sets (50 per n).
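
The summaries named in the captions above (aggregation of sampled counts to regions, 95% intervals, and RMSE against the simulator) are standard computations and can be sketched as follows. This is an illustrative Python example with a tiny made-up data set, not the authors' code; the function names `aggregate_to_regions` and `rmse`, and the variable `region_index`, are hypothetical.

```python
import numpy as np


def aggregate_to_regions(samples, region_index, n_regions):
    """Sum sampled counts over the outputs belonging to each region.

    samples      : (n_samples, p) sampled counts
    region_index : (p,) integer region label for each output
    Returns an array of shape (n_samples, n_regions).
    """
    # One-hot membership matrix: row j marks the region of output j.
    membership = np.eye(n_regions)[np.asarray(region_index)]
    return samples @ membership


def rmse(prediction, truth):
    """Root mean squared error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((prediction - truth) ** 2)))


# Toy data: 2 samples of 3 outputs, the first two outputs in region 0.
samples = np.array([[1, 2, 3],
                    [3, 2, 1]])
region_index = [0, 0, 1]
totals = aggregate_to_regions(samples, region_index, n_regions=2)

mean = totals.mean(axis=0)                                # predictive mean per region
lower, upper = np.percentile(totals, [2.5, 97.5], axis=0)  # 95% interval per region
truth = np.array([4.0, 4.0])                              # made-up "simulator" totals
covered = (truth >= lower) & (truth <= upper)             # is the truth inside the interval?
error = rmse(mean, truth)
```

Aggregating each sample first and summarising afterwards keeps the between-output correlations in the regional uncertainty, which summing marginal intervals would not.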

