Front Bioeng Biotechnol. 2021 Feb 9;9:612893.
doi: 10.3389/fbioe.2021.612893. eCollection 2021.

Multiomics Data Collection, Visualization, and Utilization for Guiding Metabolic Engineering

Somtirtha Roy et al. Front Bioeng Biotechnol.

Abstract

Biology has changed radically in the past two decades, growing from a purely descriptive science into one that is also a design science. The availability of tools that enable the precise modification of cells, together with the ability to collect large amounts of multimodal data, opens the possibility of sophisticated bioengineering to produce fuels, specialty and commodity chemicals, materials, and other renewable bioproducts. However, despite new tools and exponentially increasing data volumes, synthetic biology cannot yet fulfill its true potential due to our inability to predict the behavior of biological systems. Here, we showcase a set of computational tools that, combined, provide the ability to store, visualize, and leverage multiomics data to predict the outcome of bioengineering efforts. We show how to upload multiomics data and strain information into online repositories, and how to visualize and export them, for several isoprenol-producing strain designs. We then use these data to train machine learning algorithms that recommend new strain designs that are correctly predicted to improve isoprenol production by 23%. This demonstration uses synthetic data provided by a novel library that can produce credible multiomics data for testing algorithms and computational tools. In short, this paper provides a step-by-step tutorial on using these computational tools to improve production in bioengineered strains.

Keywords: biofuels; flux analysis; machine learning; metabolic engineering; multiomics analysis; synthetic biology.

Conflict of interest statement

NH declares financial interests in TeselaGen Biotechnologies, and Ansa Biotechnologies. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Combining several tools to guide metabolic engineering. The combination of ICE, EDD, and ART provides the ability to store, visualize and leverage multiomics data to guide bioengineering. Here, we showcase how to use this collection of tools to improve the production of isoprenol in E. coli for a simulated data set.
Figure 2
Demonstrating ICE, EDD, and ART using synthetic data. To demonstrate how ICE, EDD, and ART work together, we use a synthetic data set of multiomics data (transcriptomics, proteomics, metabolomics, fluxomics) for several time points, created by the Omics Mock Generator (OMG) library (see Methods section). We start with a base strain (wild type, or WT) that is bioengineered according to several designs (e.g., knock out malate dehydrogenase, overexpress citrate synthase) suggested by ART. The result is 95 bioengineered strains (BE1, BE2, etc.) for which experimental data (isoprenol production levels) are simulated through OMG and stored in EDD and ICE. These data are then leveraged by ART to recommend, using machine learning, new designs that are expected to improve isoprenol production (REC1, REC2, ...). These recommendations and production predictions are compared with the ground truth provided by OMG. Each of these steps (in orange) is demonstrated through screencasts and Jupyter notebooks (Table 1).
Figure 3
Visualizing data in EDD. EDD provides data visualization in the form of bar and line charts. The lower menu provides filtering options to facilitate comparison of lines. More sophisticated visualization can be achieved by pulling the data from EDD through the REST API.
Figure 4
Generating multiomics time series data. For each time point, we generate fluxes by solving an FBA problem, until glucose is fully consumed. MOMA is used in conjunction with the design (e.g., increase MDH flux 2-fold, knock CS out, maintain PTAr) to predict fluxes for the strain bioengineered according to the design.
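The time-stepping loop in this caption can be sketched roughly as follows. This is a minimal illustration, not the OMG library's code: `solve_toy_fba` and `simulate_batch` are hypothetical names, the yields and uptake cap are made-up constants, and the "FBA" step is a one-pathway toy whose optimum sits at the uptake bound, so it is computed directly instead of calling an LP solver. A genome-scale model would need a real solver, and the MOMA step for engineered strains is not shown.

```python
def solve_toy_fba(max_uptake):
    """Stand-in for an FBA solve on a one-pathway toy network.

    Maximizing growth subject to uptake <= max_uptake puts the optimum at the
    uptake bound, so no LP solver is needed here (assumption for illustration).
    """
    biomass_yield = 0.1    # gDW per mmol glucose (hypothetical value)
    isoprenol_yield = 0.2  # mmol isoprenol per mmol glucose (hypothetical value)
    uptake = max_uptake
    return {
        "glucose_uptake": uptake,              # mmol/gDW/h
        "growth": biomass_yield * uptake,      # 1/h
        "isoprenol": isoprenol_yield * uptake, # mmol/gDW/h
    }


def simulate_batch(glucose=20.0, biomass=0.01, dt=1.0, uptake_cap=10.0, max_steps=1000):
    """Re-solve the fluxes at each time point until glucose is fully consumed."""
    time, isoprenol, series = 0.0, 0.0, []
    for _ in range(max_steps):
        if glucose <= 1e-9:
            break
        # The cells cannot take up more glucose than is left in this time step.
        max_uptake = min(uptake_cap, glucose / (biomass * dt))
        fluxes = solve_toy_fba(max_uptake)
        # Explicit Euler update of the extracellular pools and the biomass.
        glucose = max(glucose - fluxes["glucose_uptake"] * biomass * dt, 0.0)
        isoprenol += fluxes["isoprenol"] * biomass * dt
        biomass += fluxes["growth"] * biomass * dt
        time += dt
        series.append({
            "time": time,
            "glucose": glucose,
            "biomass": biomass,
            "isoprenol": isoprenol,
            "fluxes": fluxes,
        })
    return series
```

Calling `simulate_batch()` returns one record per time point, each holding the fluxes solved at that point and the resulting concentrations.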
Figure 5
Using machine learning to predict production and recommend new designs. The ART library takes a DataFrame containing the input designs (for each flux: 2 to overexpress, 1 to keep the same, 0 to knock out) and the isoprenol production (response). The trained model recommends the new designs predicted to have the highest production. Each recommendation comes with a probabilistic prediction of production, i.e., the probability of producing 10, 15, 25, 40 mMol, etc.
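To make the 2/1/0 encoding concrete, here is a small sketch of how design rows might be laid out before being handed to a model. It does not use the ART API: `encode_design`, `make_row`, and `untested_designs` are hypothetical helpers, the three reaction names are borrowed from the Figure 4 caption, and the production value in the usage example is a placeholder, not a measurement.

```python
from itertools import product

# Encoding from the caption: 2 = overexpress, 1 = keep the same, 0 = knock out.
CODES = {"knockout": 0, "keep": 1, "overexpress": 2}

# Reaction names borrowed from the Figure 4 caption; a real study has its own list.
REACTIONS = ["MDH", "CS", "PTAr"]


def encode_design(design):
    """Turn {'MDH': 'overexpress', ...} into a list of 0/1/2 codes.

    Reactions that the design does not mention are left unchanged (code 1).
    """
    return [CODES[design.get(reaction, "keep")] for reaction in REACTIONS]


def make_row(design, production):
    """Build one table row: one column per reaction plus the response column."""
    row = dict(zip(REACTIONS, encode_design(design)))
    row["isoprenol"] = production
    return row


def untested_designs(tested_rows):
    """List every 0/1/2 combination that is not among the tested designs."""
    tested = {tuple(row) for row in tested_rows}
    return [list(combo)
            for combo in product((0, 1, 2), repeat=len(REACTIONS))
            if combo not in tested]


# Usage: the design named in the Figure 4 caption, with a placeholder response.
example = {"MDH": "overexpress", "CS": "knockout", "PTAr": "keep"}
row = make_row(example, production=0.0)  # 0.0 is a placeholder, not real data
candidates = untested_designs([encode_design(example)])
```

A list of such rows can be passed to `pandas.DataFrame` to obtain the tabular layout the caption describes, and `candidates` is the pool from which a trained model would pick recommendations.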
Figure 6
ART recommendations display production levels very similar to predictions. The left panel compares cross-validated predictions for isoprenol production from ART versus the values obtained through the OMG library for the training data set. Cross-validation keeps a part of the data set hidden from training and compares it against the predictions, which gives a good idea of the quality of predictions for new data sets. The right panel compares the predicted production for the recommended strain (#97) vs the actual production as generated through the OMG library. The comparison indicates very good agreement between prediction and observation.
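The cross-validation idea in this caption can be illustrated with a generic k-fold sketch. This is not ART's implementation: `k_fold_indices` and `cross_validated_predictions` are hypothetical helpers, and the model used in the usage example is a deliberately trivial mean predictor chosen only to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k interleaved folds."""
    return [list(range(start, n_samples, k)) for start in range(k)]


def cross_validated_predictions(X, y, fit, predict, k=5):
    """Predict every sample with a model that never saw it during training."""
    predictions = [None] * len(y)
    for held_out in k_fold_indices(len(y), k):
        hidden = set(held_out)
        train = [i for i in range(len(y)) if i not in hidden]
        # Train only on the samples outside the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            predictions[i] = predict(model, X[i])
    return predictions


# Usage with a placeholder model that always predicts the training mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


X = [[0], [1], [2], [3]]    # toy inputs (illustrative only)
y = [1.0, 2.0, 3.0, 4.0]    # toy responses (illustrative only)
cv_predictions = cross_validated_predictions(X, y, fit_mean, predict_mean, k=2)
```

Plotting `cv_predictions` against `y` gives the kind of predicted-versus-observed comparison shown in the left panel.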
Figure 7
Storing strain information in ICE. ICE provides a standardized repository to store information for DNA parts and plasmids, proteins, microbial host strains, and plant seeds. These data are linked to the experimental data contained in EDD through the part ID, which should be present in the experiment description file.
Figure 8
Importing data into EDD. The new data import into EDD is divided into three parts: an initial choice of the data category, the protocol used to gather the data, and the file format used for the data. Once these are chosen, the data is uploaded for future visualization and use with, e.g., machine learning algorithms or mechanistic models.
Figure 9
Exporting data from EDD into an executable Jupyter notebook for downstream processing. The EDD study web address (A) provides the server (magenta) and the slug (red) to export the study data in the form of a pandas DataFrame into a Jupyter notebook (B). Once in a DataFrame format in a Jupyter notebook, a plethora of Python libraries are available for visualization, mechanistic modeling or machine learning.
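As a small illustration of the server and slug mentioned in this caption, the sketch below splits a study web address into those two pieces. It assumes the address has the shape `https://<server>/s/<slug>/`; that shape, the helper name `split_study_url`, and the example address are assumptions for illustration, not taken from the figure. The download itself (authenticating and fetching the study into a pandas DataFrame) is not shown.

```python
from urllib.parse import urlparse


def split_study_url(url):
    """Return (server, slug) from a study address shaped like https://<server>/s/<slug>/.

    The '/s/<slug>/' layout is an assumption made for this sketch.
    """
    parsed = urlparse(url)
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2 or parts[0] != "s":
        raise ValueError(f"Unexpected study address: {url!r}")
    return parsed.netloc, parts[1]


# Usage with a made-up address on a reserved example domain.
server, slug = split_study_url("https://edd.example.org/s/my-study-slug/")
```

The resulting `server` and `slug` are the two values a notebook would need to identify which study to export.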
Figure 10
ART also provides a frontend that does not require coding. The frontend can be found at https://art.agilebiofoundry.org/ and provides the main functionality of the ART library (Figure 5) in an intuitive interface. The frontend also provides a REST API that users with coding experience can leverage to use Berkeley Lab's compute resources for running ART, or to trigger ART runs automatically from other code.
