Bioinformatics. 2024 Mar 4;40(3):btae131.

doi: 10.1093/bioinformatics/btae131.

Incorporating temporal information during feature engineering bolsters emulation of spatio-temporal emergence

Jason Y Cain¹, Jacob I Evarts², Jessica S Yu², Neda Bagheri¹,²

Affiliations

¹ Department of Chemical Engineering, University of Washington, Seattle, WA 98195, United States.
² Department of Biology, University of Washington, Seattle, WA 98195, United States.

PMID: 38444088
PMCID: PMC10957516
DOI: 10.1093/bioinformatics/btae131

Jason Y Cain et al. Bioinformatics. 2024.


Abstract

Motivation: Emergent biological dynamics derive from the evolution of lower-level spatial and temporal processes. A long-standing challenge for scientists and engineers is identifying simple low-level rules that give rise to complex higher-level dynamics. High-resolution biological data acquisition enables this identification and has evolved at a rapid pace for both experimental and computational approaches. Simultaneously harnessing the resolution and managing the expense of emerging technologies (e.g. live-cell imaging, scRNA-seq, and agent-based models) requires a deeper understanding of how spatial and temporal axes impact biological systems. Effective emulation is a promising way to manage the expense of increasingly complex high-resolution computational models. In this research, we focus on emulating an agent-based model of the tumor microenvironment to examine the relationship between spatial and temporal environment features and emergent tumor properties.

Results: Despite significant feature engineering, we find that initial system representations have limited capacity to predict tumor properties. However, incorporating temporal information derived from intermediate simulation states dramatically improves the predictive performance of machine learning models. We train a deep-learning emulator on intermediate simulation states and observe promising enhancements over emulators trained solely on initial conditions. Our results underscore the importance of incorporating temporal information when evaluating spatio-temporal emergent behavior. Nevertheless, the emulators exhibit inconsistent performance, suggesting that the underlying model characterizes unique cell population dynamics that are not easily replaced.

Availability and implementation: All source code for the agent-based model, emulation, and analyses is publicly available at the corresponding DOIs: 10.5281/zenodo.10622155, 10.5281/zenodo.10611675, and 10.5281/zenodo.10621244, respectively.


Conflict of interest statement

J.S.Y. is a Scientist at the Allen Institute for Cell Science. N.B. is an Adjunct Associate Professor of Chemical & Biological Engineering at Northwestern University and Senior Advisor of Modeling at the Allen Institute for Cell Science.

Figures

**Figure 1.**
Emulation workflow: a summary of the overall emulation workflow. (1) Vasculature structures are generated from a starting root geometry (single-point versus line roots, and the number of initializing arteries and veins) and a random seed (seeds 0–99 were used). (2) ARCADE, an agent-based model (ABM) of the tumor microenvironment, receives *in silico* vasculature networks and initial cell population colonies as inputs. ARCADE simulates intra- and inter-cellular interactions among diverse agents to predict the evolution of vascular architecture and function, as well as the evolution of cell populations, over space and time. Two simulation contexts were used to initialize populations: colony and tissue. (3) Spatio-temporal dynamics are summarized with output metrics that evaluate emergent tumor properties at the end of the simulations: activity, growth, and symmetry. (4) Network metric-based feature sets are extracted from vascular architectures. Nodes represent junctions in the vasculature; edges represent sources of nutrients in the simulation. (5) Feature sets are aggregated based on the information used. Topological features are extracted from the unweighted structure of the network. Hemodynamic features are extracted from the network topology with hemodynamic characteristics as edge weights. Spatial features account for the distance of network elements from the center of the simulation. (6) Statistical learning models use network metric-based feature sets to predict emergent tumor output metrics.
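The three feature families in step (5) can be illustrated with a small sketch. It assumes a hypothetical toy encoding (node id mapped to an (x, y) junction position, and undirected edges carrying one hemodynamic weight such as flow) and computes a handful of illustrative metrics per family. The feature names and the graph encoding are assumptions for illustration only; they are not ARCADE's data format or the authors' actual feature set.

```python
import math
from collections import defaultdict


def network_features(positions, edges, center=(0.0, 0.0)):
    """Compute illustrative topological, hemodynamic, and spatial features.

    positions: dict mapping node id -> (x, y) coordinates of a vessel junction
    edges: list of undirected (u, v, weight) tuples; weight is a hemodynamic
           quantity such as flow (hypothetical encoding, not ARCADE's format)
    """
    n_nodes, n_edges = len(positions), len(edges)
    strength = defaultdict(float)  # weighted degree per node
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w

    # Topological: uses only the unweighted structure (who connects to whom).
    topological = {
        "n_nodes": n_nodes,
        "n_edges": n_edges,
        "mean_degree": 2 * n_edges / n_nodes if n_nodes else 0.0,
    }

    # Hemodynamic: same topology, with hemodynamic values as edge weights.
    weights = [w for _, _, w in edges]
    hemodynamic = {
        "mean_edge_weight": sum(weights) / n_edges if n_edges else 0.0,
        "max_node_strength": max(strength.values(), default=0.0),
    }

    # Spatial: distance of network nodes from the simulation center.
    cx, cy = center
    dists = [math.hypot(x - cx, y - cy) for x, y in positions.values()]
    spatial = {
        "mean_center_distance": sum(dists) / n_nodes if n_nodes else 0.0,
        "min_center_distance": min(dists, default=0.0),
    }
    return {**topological, **hemodynamic, **spatial}


# Toy network (hypothetical): three junctions on a line, two vessel segments.
example = network_features(
    positions={0: (0.0, 0.0), 1: (3.0, 4.0), 2: (6.0, 8.0)},
    edges=[(0, 1, 2.0), (1, 2, 1.0)],
)
print(example)
```

Keeping the three families in separate dictionaries mirrors the caption's grouping, so feature sets can be combined (e.g. topological plus spatial) before being passed to a learning model.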

**Figure 2.**
Spatial information does not support emulation. (A) Bar plots show the predictive performance for emergent outputs (A: activity, G: growth, S: symmetry) across feature sets for different models (MLR, RF, SVR, and MLP). Feature engineering offered limited improvement. Bar chart values range from −0.1 to 1.0; the horizontal axis is at 0.0. Asterisks mark significant results (Bonferroni-adjusted P < .05 from a two-way ANOVA). (B) Parity plots show differences between the variance in the predicted response and in the true response, comparing the topological and spatial feature sets. (C) Additional training data offered diminishing returns on the predictive performance of MLP models trained on both spatial and topological features. These subplots show the average RMSE as a function of training-set size; individual points represent the RMSE from randomized test sets.
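Panel (C) plots a learning curve: average RMSE over several randomized test sets for each training-set size. The sketch below shows that procedure under stated assumptions. A one-feature least-squares line stands in for the MLR, RF, SVR, and MLP models, and the function names, `n_repeats`, and `seed` are hypothetical choices, not the authors' settings.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted outputs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; a stand-in for MLR, RF, SVR, or MLP."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def learning_curve(xs, ys, train_sizes, test_size, n_repeats=10, seed=0):
    """Map each training-set size to (mean RMSE, per-split RMSEs).

    Each repeat reshuffles the data, holds out `test_size` points as a
    randomized test set, and trains on the next `n` points.
    """
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    curve = {}
    for n in train_sizes:
        if test_size + n > len(idx):
            raise ValueError(f"not enough samples for train size {n}")
        scores = []
        for _ in range(n_repeats):
            rng.shuffle(idx)
            test, train = idx[:test_size], idx[test_size:test_size + n]
            model = fit_line([xs[i] for i in train], [ys[i] for i in train])
            preds = [model(xs[i]) for i in test]
            scores.append(rmse([ys[i] for i in test], preds))
        curve[n] = (statistics.fmean(scores), scores)
    return curve
```

The per-split scores correspond to the individual points in panel (C) and the mean to the plotted line; a curve that flattens as `n` grows is what "diminishing returns" from additional training data looks like.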

**Figure 3.**
Temporal information improves the accuracy of ML models. (A) Parity plots show the predictive performance of colony-context ML models trained exclusively on features from later timepoints. The trends are consistent for all three predicted outputs. The week 1 parity plots for growth and symmetry, and the corresponding results for the tissue context, are included in Supplementary Figs S7 and S8. (B) Line plots show that ML models improve in both colony and tissue contexts when trained on features from timepoints later in the simulation. A one-way Dunnett (1955) test with Bonferroni correction identifies timepoint features that perform better than the initial timepoint; hash marks signify adjusted P-values <.01. (C) Predictive performance as a function of training data for MLP models at later timepoints. These subplots show the average RMSE as a function of training-set size; individual points represent the RMSE from randomized test sets.
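Panel (B) compares each later timepoint against the initial timepoint and flags those that remain significant after correction. The sketch below shows only the many-to-one comparison structure and the Bonferroni step (each raw p-value multiplied by the number of comparisons, capped at 1). It assumes raw p-values are already available; computing Dunnett's statistic itself needs the multivariate t distribution, which SciPy 1.11+ exposes as `scipy.stats.dunnett`. The function names and the "higher score is better" convention are assumptions.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: scale each raw p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return {key: min(1.0, p * m) for key, p in p_values.items()}


def timepoints_better_than_initial(raw_p, mean_score, initial, alpha=0.01):
    """Later timepoints that beat the initial timepoint after correction.

    raw_p: dict timepoint -> raw p-value from comparing that timepoint's
           scores with the initial timepoint's (one comparison per timepoint)
    mean_score: dict timepoint -> mean performance score (higher is better,
                e.g. R^2; an assumption about the metric)
    """
    adjusted = bonferroni(raw_p)
    return sorted(
        t for t, p in adjusted.items()
        if p < alpha and mean_score[t] > mean_score[initial]
    )
```

Requiring both a small adjusted p-value and a higher mean score keeps the test directional, so a timepoint that differs significantly but performs worse is not flagged.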

**Figure 4.**
Incorporating temporal information can improve emulation performance. (A) A summary of the vascular network structure and ML model workflow. Network metrics from vascular structures on consecutive simulation days are used as a sequence to train the RNN. Initial vasculatures (left network below the RNN structure box) are then used to predict network properties of the vasculatures after two simulation weeks (right network below the RNN structure box); the two-week predictions are then used to predict emergent behaviors with ML models. (B) Bar plots compare the performance of emulators trained on forecasted network metrics with that of the top-performing naive emulation models (noted beneath bars). Associated parity plots show the prediction performance of the RNN models across contexts for each emergent output.
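The two-stage emulator in panel (A) can be sketched as a forecast step followed by a downstream predictor. The caption does not specify the RNN architecture or how the forecast is rolled out, so this is a minimal sketch assuming a single-layer Elman RNN with already-trained weights that advances one simulated day per step and feeds its predicted metrics back as the next input. The parameter names (`Wx`, `Wh`, `bh`, `Wy`, `by`), the 14-step default, and the `regressor` hook are hypothetical; training is not shown.

```python
import math


def matvec(W, v):
    """Dense matrix-vector product using nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(x, h, params):
    """One Elman RNN update: h' = tanh(Wx x + Wh h + bh); next metrics = Wy h' + by."""
    pre = [a + b + c for a, b, c in
           zip(matvec(params["Wx"], x), matvec(params["Wh"], h), params["bh"])]
    h_new = [math.tanh(z) for z in pre]
    x_next = [a + b for a, b in zip(matvec(params["Wy"], h_new), params["by"])]
    return x_next, h_new


def forecast_metrics(x0, params, n_steps=14):
    """Roll forward from initial network metrics, one simulated day per step (assumption)."""
    h = [0.0] * len(params["bh"])
    x = list(x0)
    for _ in range(n_steps):
        x, h = rnn_step(x, h, params)
    return x


def emulate(x0, params, regressor, n_steps=14):
    """Two-stage emulator: forecast day-14 metrics, then predict an emergent output."""
    return regressor(forecast_metrics(x0, params, n_steps))
```

Separating `forecast_metrics` from `regressor` matches the caption's split between forecasting network properties and predicting emergent behaviors, so any of the downstream ML models could be plugged in as `regressor`.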


References

1. Alden K, Cosgrove J, Coles M. et al. Using emulation to engineer and understand simulations of biological systems. IEEE/ACM Trans Comput Biol Bioinform 2020;17:302–15. doi: 10.1109/TCBB.2018.2843339.
2. Alves AP, Mesquita ON, Gómez-Gardeñes J. et al. Graph analysis of cell clusters forming vascular networks. R Soc Open Sci 2018;5:171592. doi: 10.1098/rsos.171592.
3. Amat-Roldan I, Berzigotti A, Gilabert R. et al. Assessment of hepatic vascular network connectivity with automated graph analysis of dynamic contrast-enhanced US to evaluate portal hypertension in patients with cirrhosis: a pilot study. Radiology 2015;277:268–76. doi: 10.1148/radiol.2015141941.
4. Angione C, Silverman E, Yaneske E. Using machine learning as a surrogate model for agent-based simulations. PLoS One 2022;17:e0263150. doi: 10.1371/journal.pone.0263150.
5. Bagheri N, Carpenter AE, Lundberg E. et al. The new era of quantitative cell imaging—challenges and opportunities. Mol Cell 2022;82:241–7. doi: 10.1016/j.molcel.2021.12.024.
