Emulating computer models with high-dimensional count output

James M Salter¹, Trevelyan J McKinley², Xiaoyu Xiong¹, Daniel B Williamson³

Affiliations

¹ Department of Mathematics and Statistics, University of Exeter, Exeter, UK.
² University of Exeter Medical School, University of Exeter, Exeter, UK.
³ Department of Economics, Land Environment Economics and Policy Institute, University of Exeter, Exeter, UK.

PMID: 40078142
PMCID: PMC11904617
DOI: 10.1098/rsta.2024.0216

James M Salter et al. Philos Trans A Math Phys Eng Sci. 2025 Mar 13;383(2292):20240216. doi: 10.1098/rsta.2024.0216. Epub 2025 Mar 13.

Abstract

Computer models are used to study the real world, and often contain a large number of uncertain input parameters, produce a large number of outputs, may be expensive to run and need calibrating to real-world observations to be useful for decision-making. Emulators are often used as cheap surrogates for the expensive simulator, trained on a small number of simulations to provide predictions with uncertainty at unseen inputs. In epidemiological applications, for example compartmental or agent-based models for modelling the spread of infectious diseases, the output is usually spatially and temporally indexed, stochastic and consists of counts rather than continuous variables. Here, we consider emulating high-dimensional count output from a complex computer model using a Poisson lognormal PCA (PLNPCA) emulator. We apply the PLNPCA emulator to output fields from a COVID-19 model for England and Wales and compare this to fitting emulators to aggregations of the full output. We show that performance is generally comparable, while the PLNPCA emulator inherits desirable properties, including allowing the full output to be predicted while capturing correlations between outputs, providing high-dimensional samples of counts that are representative of the true model output. This article is part of the theme issue 'Uncertainty quantification for healthcare and biological systems (Part 1)'.

Keywords: Gaussian processes; Poisson lognormal; basis emulation; uncertainty quantification.
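
To make the Poisson lognormal PCA idea in the abstract concrete, the sketch below simulates counts from a generic low-rank Poisson lognormal model: a Gaussian latent layer on the log scale, built from a few basis coefficients, with Poisson counts drawn conditionally on it. This is an illustration of the general model class only, not the authors' implementation. The function name `simulate_pln_pca`, the dimensions and all numerical values are assumptions chosen for the example. In an emulator, the latent coefficients would presumably be predicted from the simulator inputs (for instance by Gaussian processes) rather than drawn at random as here.

```python
import numpy as np

def simulate_pln_pca(n, p, q, rng):
    """Draw n count vectors of length p from a rank-q Poisson lognormal model.

    All parameter values below are hypothetical and chosen only for illustration.
    """
    mu = rng.normal(loc=1.0, scale=0.5, size=p)  # log-scale mean for each output
    C = rng.normal(scale=0.3, size=(p, q))       # loading matrix (the low-rank basis)
    W = rng.standard_normal(size=(n, q))         # latent coefficients, one row per run
    Z = mu + W @ C.T                             # latent Gaussian layer, shape (n, p)
    Y = rng.poisson(np.exp(Z))                   # counts, independent given Z
    return Y, W

rng = np.random.default_rng(0)
Y, W = simulate_pln_pca(n=200, p=30, q=4, rng=rng)
```

Because every output shares the same `q` latent coefficients, the simulated counts are correlated across outputs, which is the property the abstract highlights when it refers to capturing correlations between outputs.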

Conflict of interest statement

We declare we have no competing interests.

Figures

**Figure 1.**
(a) and (b) Predictions for the 50 validation inputs from the PLNPCA-LAD hetGP emulator, aggregated to the overall total (a) and the northeast region (b), plotted against the true simulator output. Points are coloured green if the truth lies within the 95% error bars, blue otherwise. (c) and (d) Comparing mean predictions from the PLNPCA-LAD and PLNPCA-ward hetGP emulators for the overall total (c) and northeast region (d).

**Figure 2.**
Samples from the PLNPCA-LAD hetGP emulator for nine validation inputs across the 12 LADs in the northeast region. For each, 50 individual samples are plotted in grey, with the mean (solid line) and 95% interval (dashed lines) across 1000 samples in red and true simulator output in black.

**Figure 3.**
Comparing LAD-level correlations in the training data and from the PLNPCA-LAD emulator. For each region, a single LAD was randomly chosen, and the correlations between this LAD and all 338 others were calculated in both the training data and the emulator samples. In each panel, the LADs are ordered from most to least correlated in the training data.

**Figure 4.**
Latent coefficients $c_{1}, c_{2}, c_{3}, c_{4}$ on the PLNPCA-LAD basis for 12 of the ensemble members with 10 replicates. The box plots show the variability in coefficient values across the replicates of these 12 input vectors.

**Figure 5.**
RMSE between simulator deaths and emulator predictions across the validation set for different sizes of training set ($n = 100, 150, 200$), for samples from the PLNPCA-LAD emulator aggregated to a regional level compared with individually emulating each region. Each point represents a particular split into training and validation sets (50 per $n$).
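
The figure captions above describe aggregating LAD-level emulator samples to regions, then checking 95% intervals and RMSE against the simulator. A minimal sketch of that kind of summary is given below. It uses synthetic Poisson draws in place of real emulator samples, so the numbers it produces say nothing about the paper's results. The function `regional_summary`, the 0/1 `membership` matrix and the toy dimensions are all hypothetical.

```python
import numpy as np

def regional_summary(samples, membership, truth):
    """Aggregate fine-scale count samples to regions and score them.

    samples:    (n_samples, n_inputs, n_lads) sampled counts
    membership: (n_lads, n_regions) 0/1 matrix assigning each LAD to a region
    truth:      (n_inputs, n_lads) simulator counts
    """
    agg = samples @ membership        # (n_samples, n_inputs, n_regions)
    truth_agg = truth @ membership    # (n_inputs, n_regions)
    mean = agg.mean(axis=0)
    lower, upper = np.percentile(agg, [2.5, 97.5], axis=0)
    # RMSE of the mean prediction, per region, across validation inputs
    rmse = np.sqrt(((mean - truth_agg) ** 2).mean(axis=0))
    # Fraction of validation inputs whose truth lies inside the 95% interval
    coverage = ((truth_agg >= lower) & (truth_agg <= upper)).mean(axis=0)
    return rmse, coverage

# Toy stand-in data: 12 fine-scale units assigned to 3 regions (assumed layout)
rng = np.random.default_rng(1)
membership = np.eye(3)[np.arange(12) % 3]
truth = rng.poisson(5.0, size=(50, 12))
samples = rng.poisson(5.0, size=(1000, 50, 12))
rmse, coverage = regional_summary(samples, membership, truth)
```

Summing samples before taking quantiles, rather than summing per-LAD quantiles, keeps the between-LAD correlation in the regional interval, which is why full joint samples are useful for aggregated predictions.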

Grants and funding

Engineering and Physical Sciences Research Council
