Massive computational acceleration by using neural networks to emulate mechanism-based biological models

Shangying Wang et al. Nat Commun. 2019 Sep 25;10(1):4354. doi: 10.1038/s41467-019-12342-y.

Abstract

For many biological applications, exploration of the massive parametric space of a mechanism-based model can impose a prohibitive computational demand. To overcome this limitation, we present a framework to improve computational efficiency by orders of magnitude. The key concept is to train a neural network using a limited number of simulations generated by a mechanistic model. This number is small enough that the simulations can be completed in a short time frame but large enough to enable reliable training. The trained neural network can then be used to explore a much larger parametric space. We demonstrate this notion by training neural networks to predict pattern formation and stochastic gene expression. We further demonstrate that using an ensemble of neural networks enables the self-contained evaluation of the quality of each prediction. Our work can serve as a platform for fast parametric-space screening of biological models with user-defined objectives.
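The core workflow described in the abstract (run a limited number of mechanistic simulations, train a network on them, then use the fast network as an emulator) can be sketched as follows. This is a minimal, hypothetical stand-in, not the authors' implementation: the "mechanistic model" here is a cheap closed-form curve of a single parameter, and the emulator is a one-hidden-layer network trained by plain full-batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "mechanistic model": a real simulator would be expensive; here a
# cheap closed-form function of one parameter k (hypothetical, for illustration).
def mechanistic_model(k):
    x = np.linspace(0, 1, 50)
    return np.exp(-k * x) * np.sin(6 * x)

# Generate a limited training set of (parameter, simulated output) pairs.
K = rng.uniform(0.5, 3.0, size=200)
Y = np.stack([mechanistic_model(k) for k in K])   # (200, 50) simulated curves
X = K[:, None]                                    # (200, 1) input parameters

# One-hidden-layer network trained by full-batch gradient descent on MSE.
W1 = rng.normal(0, 0.5, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, (32, 50)); b2 = np.zeros(50)
lr = 0.02
for _ in range(3000):
    H = np.tanh(X @ W1 + b1)          # hidden activations
    P = H @ W2 + b2                   # predicted curves
    G = 2 * (P - Y) / len(X)          # dLoss/dP for mean-squared-error loss
    W2 -= lr * H.T @ G; b2 -= lr * G.sum(0)
    GH = (G @ W2.T) * (1 - H**2)      # backpropagate through tanh
    W1 -= lr * X.T @ GH; b1 -= lr * GH.sum(0)

rmse = np.sqrt(np.mean((P - Y) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

Once trained, evaluating the network for a new parameter value is a handful of matrix multiplications, which is what makes exploring a much larger parametric space feasible.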


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Using an artificial neural network to emulate a mechanism-based model. Here a hypothetical biological network and the corresponding mechanistic model are shown. The mechanistic model is used to generate a training data set, which is used to train a neural network. Depending on the specific mechanistic model, the trained neural network can be orders of magnitude faster, enabling exploration of a much larger parametric space of the system.
Fig. 2
Neural network training and performance. We generated 10⁵ simulated spatial distributions using our partial differential equation (PDE) model and split the data into three groups: 80% for training, 10% for validation, and 10% for testing. We used root mean squared errors (RMSEs) to evaluate the differences between data generated by the mechanism-based model and data generated by the neural network. a Accuracy of the trained neural network. The top panel shows the distributions predicted by the neural network plotted against the distributions generated by numerical simulations. The bottom panel shows the peak values predicted by the neural network plotted against the peak values generated by numerical simulations. Perfect alignment corresponds to the y = x line. The test sample size is 10,000. Each spatial distribution consists of 501 discrete points; thus, the top panel consists of 5,010,000 points. b Representative distributions predicted by the neural network from the test dataset. Each blue line represents a distribution predicted by the trained neural network; the corresponding red dashed line represents the distribution generated by a numerical simulation. Additional examples are shown in Supplementary Figs. 3 and 4. c Identifying the appropriate data size for reliable training. The top panel shows the RMSE between distributions generated by the neural network and distributions generated by numerical simulations as a function of increasing training data size. The bottom panel shows the RMSE of peak-value predictions as a function of increasing training data size. The RMSEs are calculated based on predictions of a test dataset containing 20,000 samples.
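The 80/10/10 split and the RMSE metric from the caption can be sketched as follows. Only the proportions, the 501-point distributions, and the RMSE definition come from the text; the data and "predictions" are random stand-ins purely to illustrate the bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for simulated spatial distributions: each row is one
# distribution sampled at 501 discrete points, as in the caption.
data = rng.random((1000, 501))

# 80/10/10 split into training, validation, and test sets.
idx = rng.permutation(len(data))
n_train, n_val = int(0.8 * len(data)), int(0.1 * len(data))
train = data[idx[:n_train]]
val   = data[idx[n_train:n_train + n_val]]
test  = data[idx[n_train + n_val:]]

# RMSE between mechanism-based output and emulator output (here: fake
# predictions offset by small noise, purely to exercise the metric).
predictions = test + rng.normal(0, 0.05, test.shape)
rmse = np.sqrt(np.mean((predictions - test) ** 2))
print(len(train), len(val), len(test), round(rmse, 3))
```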
Fig. 3
The trained neural network predicts novel patterns. We used the neural network to screen 10⁸ parameter combinations in search of three-ring patterns, then used the mechanism-based model to test the accuracy of the predicted patterns. We tested 1284 three-ring patterns; the mean of the RMSEs between the neural-network-predicted distributions and the PDE simulations is 0.079, with a standard deviation of 0.008. The distributions shown in red are from the training data set. The other distributions are from the screening process (top) and the corresponding results generated by the mechanism-based model for validation (bottom). In four examples, the neural network predictions are validated; in one, the prediction is incorrect. The RMSE values of these distributions (from left to right) are 0.0099, 0.015, 0.0097, 0.039, 0.031, and 0.41.
Fig. 4
Ensemble predictions enable self-contained evaluation of the prediction accuracy. a Schematic of ensemble prediction. For each new parameter set (a different combination of parameters), we used several trained neural networks to predict the distribution independently. Although these networks share the same architecture, the training process has an intrinsically stochastic component due to random initialization and backpropagation, and the nonlinearity of the network allows multiple parameterizations that produce similar outputs. Despite the differences in the trainable variables (Supplementary Fig. 6a), the different neural networks trained from the same data make broadly similar predictions. From these independent predictions, we obtain a final prediction using a voting algorithm. b The disagreement in predictions (DP) is positively correlated with the error in prediction (EP). We calculated the disagreement in predictions (the averaged RMSE between predictions from different neural networks) and the error of the final prediction (the RMSE between the final ensemble prediction and the PDE simulation) for all samples in the test data set (red dots). We then divided the data into 15 equally spaced bins (in log scale) and calculated the mean and standard deviation for each bin (red line). Each error bar represents one standard error. The positive correlation suggests that the consensus between neural networks provides a self-contained metric for the reliability of each prediction. Three sample predictions with different degrees of accuracy are shown: sample 1: DP = 0.14, EP = 0.19; sample 2: DP = 0.024, EP = 0.038; sample 3: DP = 0.0068, EP = 0.0056.
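The disagreement metric (DP) from panel b, the averaged pairwise RMSE across ensemble members, can be illustrated with a small sketch. The ensemble outputs here are random stand-ins, and a simple mean across networks is used as a minimal stand-in for the paper's voting algorithm, which may differ.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical predictions from an ensemble of 5 independently trained
# networks for one parameter set: each row is one network's 501-point output.
preds = rng.normal(0.5, 0.02, (5, 501))

# Final prediction: mean "vote" across the ensemble (minimal stand-in).
final = preds.mean(axis=0)

# Disagreement in predictions (DP): averaged pairwise RMSE between networks.
n = len(preds)
pair_rmses = [np.sqrt(np.mean((preds[i] - preds[j]) ** 2))
              for i in range(n) for j in range(i + 1, n)]
dp = float(np.mean(pair_rmses))
print(f"DP = {dp:.4f}")
```

A large DP flags a parameter set on which the ensemble members diverge, which (per the caption's positive DP-EP correlation) signals a prediction that should not be trusted without running the mechanistic model.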
Fig. 5
Neural network predictions enable comprehensive exploration of pattern formation dynamics. a An ensemble of deep neural networks enables screening of a vast parametric space. The parametric space consists of 13 parameters that were varied uniformly within the provided ranges (Supplementary Table 1). For each instance, we randomly generated all of the varying parameters and used the neural network to predict the peak and distributional values for each parameter combination. We collected 10⁸ instances and discarded predictions for which the disagreement between ensemble predictions was larger than 0.1. We then projected all instances onto every possible two-parameter plane. The majority of instances generated patterns with no ring (gray) and were distributed across the projected parametric planes. Owing to the large number of instances, the parametric distributions of the no-ring (gray), one-ring (green), and two-ring (blue) patterns on the projected 2D planes partially overlap. From the distribution of neural-network-predicted three-ring patterns (orange) over all possible 2D parameter planes, the critical constraints for generating three-ring patterns are revealed: a large domain radius (D), a large synthesis rate of T7RNAP (α_T), a small synthesis rate of T7 lysozyme (α_L), a small half-activation constant of T7RNAP (K_T), and a small half-activation distance for gene expression (K_φ). The analysis also suggested correlations between K_T and α_C (the cell growth rate on agar), between K_T and α_T, and between D and α_C. b–d Neural network predictions facilitate the evaluation of the objective function of interest (generation of three-ring patterns). Based on the analysis above, we sought to further identify the correlations between K_T and α_C, between K_T and α_T, and between D and α_C. For each screen, we varied the two parameters of interest and fixed the rest. We collected 10⁷ instances and discarded predictions for which the disagreement between ensemble predictions was larger than 0.1. Generation of three-ring patterns requires a negative correlation between D and α_C and a negative correlation between K_T and α_C; we also found a positive linear correlation between K_T and α_T. α_A = 0.5, α = 0.5, β = 0.5, K = 0.3, n = 0.5, α_L = 0.3, K_C = 0.5, K_P = 0.5, d_A = 0.5. b α_C = 0.5, D = 1.0. c α_T = 0.8, D = 1.0. d α_T = 0.8, K_T = 0.3.
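The screen-and-filter procedure (sample parameter sets uniformly, predict with the ensemble, discard predictions whose disagreement exceeds 0.1) can be sketched as follows. Only the 13-parameter uniform sampling and the 0.1 disagreement threshold come from the caption; the emulator is a hypothetical stand-in, and the screen is scaled down from 10⁸ to 200 instances.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for an ensemble of 5 trained emulators: a shared base curve
# (a function of the parameters) plus independent per-network noise.
def ensemble_predict(params):
    base = np.sin(params.sum() + np.linspace(0, np.pi, 50))
    return base + rng.normal(0, 0.03, (5, 50))

# Disagreement: averaged pairwise RMSE across ensemble members.
def disagreement(preds):
    n = len(preds)
    return np.mean([np.sqrt(np.mean((preds[i] - preds[j]) ** 2))
                    for i in range(n) for j in range(i + 1, n)])

kept = 0
for _ in range(200):                      # 10^8 instances in the paper; tiny here
    params = rng.uniform(0.0, 1.0, 13)    # 13 uniformly varied parameters
    preds = ensemble_predict(params)
    if disagreement(preds) <= 0.1:        # keep only confident predictions
        kept += 1
print(f"kept {kept} of 200 screened parameter sets")
```

The kept instances would then be classified by predicted pattern (no-ring, one-ring, two-ring, three-ring) and projected onto two-parameter planes to expose the constraints and correlations discussed in the caption.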
