Review
PLoS Comput Biol. 2023 Apr 20;19(4):e1010988.
doi: 10.1371/journal.pcbi.1010988. eCollection 2023 Apr.

Bridging the gap between mechanistic biological models and machine learning surrogates

Ioana M Gherman et al. PLoS Comput Biol.

Abstract

Mechanistic models have been used for centuries to describe complex interconnected processes, including biological ones. As the scope of these models has widened, so have their computational demands. This complexity can limit their suitability when running many simulations or when real-time results are required. Surrogate machine learning (ML) models can be used to approximate the behaviour of complex mechanistic models, and once built, their computational demands are several orders of magnitude lower. This paper provides an overview of the relevant literature, both from an applicability and a theoretical perspective. For the latter, the paper focuses on the design and training of the underlying ML models. Application-wise, we show how ML surrogates have been used to approximate different mechanistic models. We present a perspective on how these approaches can be applied to models representing biological processes with potential industrial applications (e.g., metabolism and whole-cell modelling) and show why surrogate ML models may hold the key to making the simulation of complex biological systems possible using a typical desktop computer.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Schematic representation for training and using an ML-based surrogate model.
The mechanistic model is simulated (the top process, connected by red arrows) to obtain the input-output pairs that are used to train the ML surrogate. This training stage (the middle process, connected by orange arrows) is of intermediate speed. Its cost depends on the ML algorithm used, the complexity of the data preprocessing steps, and the number of training iterations needed to reach satisfactory accuracy. Once this is achieved, the ML model can be used for all future predictions, effectively approximating the mechanistic model while running several orders of magnitude faster. The green arrows at the bottom of the figure represent this final (fast) process.
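The three stages in Fig 1 (simulate, train, predict) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any work cited here: `mechanistic_model` is a hypothetical toy simulation (logistic growth integrated with many Euler steps) standing in for an expensive mechanistic model, and `KNNSurrogate` is a deliberately simple nearest-neighbour regressor standing in for whatever ML algorithm would be used in practice.

```python
import random


def mechanistic_model(growth_rate, capacity=10.0, x0=0.1, t_end=5.0, n_steps=5000):
    """Hypothetical 'expensive' simulation: logistic growth via explicit Euler steps."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x += dt * growth_rate * x * (1.0 - x / capacity)
    return x


class KNNSurrogate:
    """Illustrative surrogate: k-nearest-neighbour regression, inverse-distance weighted."""

    def __init__(self, k=2):
        self.k = k
        self.data = []

    def fit(self, X, y):
        # "Training" a k-NN model only stores the input-output pairs.
        self.data = list(zip(X, y))
        return self

    def predict(self, x):
        nearest = sorted(self.data, key=lambda p: abs(p[0] - x))[: self.k]
        weights = [1.0 / (abs(px - x) + 1e-12) for px, _ in nearest]
        return sum(w * py for w, (_, py) in zip(weights, nearest)) / sum(weights)


# Stage 1 (slow): run the mechanistic model to generate input-output pairs.
train_X = [0.5 + 0.05 * i for i in range(31)]
train_y = [mechanistic_model(r) for r in train_X]

# Stage 2 (intermediate): train the surrogate on those pairs.
surrogate = KNNSurrogate(k=2).fit(train_X, train_y)

# Stage 3 (fast): use the surrogate for new predictions.
# The mechanistic model is rerun here only to measure the approximation error.
rng = random.Random(0)
test_X = [rng.uniform(0.5, 2.0) for _ in range(20)]
max_error = max(abs(surrogate.predict(r) - mechanistic_model(r)) for r in test_X)
print(f"max abs error on held-out inputs: {max_error:.4f}")
```

In this toy setting each surrogate prediction inspects 31 stored points, whereas each mechanistic run takes 5000 integration steps. The gap for a real whole-cell or metabolic model would depend on the model and the surrogate chosen.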
Fig 2. Schematic representation of how active learning and ML surrogates can work together.
Initially, an ML model is trained on a set of data generated by some initial simulations of a mechanistic model (X_init, y_init), which are equivalent to (X, y) for this initial step. The ML model is used to make predictions (y_pred). The estimated error between the prediction of the mechanistic model (y) and that of the ML model (y_pred) is used to select a subset (X′) of the possible input data that has not yet been run through the mechanistic model. The mechanistic model is run using X′ as input to obtain a new set of input-output pairs (X, y), equivalent to the newly generated (X′, y′), that when included in the ML pipeline are expected to reduce the estimated error (y − y_pred).
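The loop in Fig 2 can be sketched as follows. This is an illustrative sketch only, and every name in it is hypothetical. It assumes a toy `mechanistic_model` (logistic growth by Euler integration) and a piecewise-linear interpolator as the stand-in surrogate. The estimated error is approximated by the disagreement between that interpolator and a nearest-neighbour predictor. That is one simple heuristic among many, and the caption does not specify how the error should be estimated.

```python
from bisect import bisect_left


def mechanistic_model(growth_rate, capacity=10.0, x0=0.1, t_end=5.0, n_steps=2000):
    """Hypothetical 'expensive' simulation: logistic growth via explicit Euler steps."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x += dt * growth_rate * x * (1.0 - x / capacity)
    return x


def linear_predict(data, x):
    """Stand-in surrogate: piecewise-linear interpolation over sorted (x, y) pairs."""
    xs = [px for px, _ in data]
    i = min(max(bisect_left(xs, x), 1), len(data) - 1)
    (xa, ya), (xb, yb) = data[i - 1], data[i]
    return ya + (yb - ya) * (x - xa) / (xb - xa)


def nearest_predict(data, x):
    """Second, cruder surrogate used only to estimate the error."""
    return min(data, key=lambda p: abs(p[0] - x))[1]


def estimated_error(data, x):
    # Assumed heuristic: disagreement between the two surrogates.
    return abs(linear_predict(data, x) - nearest_predict(data, x))


def max_true_error(data, reference):
    return max(abs(linear_predict(data, x) - y) for x, y in reference)


# Candidate inputs, and the initial simulations (X_init, y_init).
pool = [0.5 + 0.01 * i for i in range(151)]
init_idx = {0, 50, 100, 150}
labelled = sorted((pool[i], mechanistic_model(pool[i])) for i in init_idx)
unlabelled = [x for i, x in enumerate(pool) if i not in init_idx]

# For evaluation only: in practice the true error over the pool is unknown.
reference = [(x, mechanistic_model(x)) for x in pool]
initial_error = max_true_error(labelled, reference)

n_rounds, batch_size = 6, 2
for _ in range(n_rounds):
    # Select X': the unlabelled inputs with the largest estimated error.
    ranked = sorted(unlabelled, key=lambda x: estimated_error(labelled, x), reverse=True)
    X_new = ranked[:batch_size]
    # Run the mechanistic model on X' to obtain (X', y') and add them to (X, y).
    labelled = sorted(labelled + [(x, mechanistic_model(x)) for x in X_new])
    unlabelled = [x for x in unlabelled if x not in X_new]

final_error = max_true_error(labelled, reference)
print(f"max abs error: {initial_error:.3f} -> {final_error:.3f} "
      f"using {len(labelled)} simulations")
```

Taking the top `batch_size` candidates tends to pick neighbouring inputs. A practical selection rule would usually also spread each batch across the input space.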
Fig 3. An example of the DBTL pipeline where the metabolic or whole-cell models can be replaced by surrogate models.
