Review
ACS Omega. 2024 Feb 19;9(9):9921-9945.
doi: 10.1021/acsomega.3c05913. eCollection 2024 Mar 5.

Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges

Manoj Kumar Goshisht. ACS Omega.

Abstract

Machine learning (ML), particularly deep learning (DL), has made rapid and substantial progress in synthetic biology in recent years. Biotechnological applications of biosystems, including pathways, enzymes, and whole cells, are being explored with increasing frequency. The intricacy and interconnectedness of biosystems make it challenging to design them with the desired properties. ML/DL and synthetic biology are synergistic: synthetic biology can be employed to produce large data sets for training models (for instance, by utilizing DNA synthesis), and ML/DL models can be employed to inform design (for example, by generating new parts or recommending the best experiments to perform). This potential has recently been brought to light by research at the intersection of engineering biology and ML/DL through achievements such as the design of novel biological components, optimal experimental design, automated analysis of microscopy data, protein structure prediction, and biomolecular implementations of artificial neural networks (ANNs). I have divided this review into three sections. In the first section, I describe the basics and predictive potential of ML along with its many applications in synthetic biology, especially in engineering cells, protein activity, and metabolic pathways. In the second section, I describe fundamental DL architectures and their applications in synthetic biology. Finally, I describe the challenges that hinder progress in ML/DL and synthetic biology, along with their solutions.

Conflict of interest statement

The author declares no competing financial interest.

Figures

Figure 1
An overview of the advances in ML/DL and synthetic biology since the 1960s.
Figure 2
Schematic representation of machine learning scenarios and mathematical frameworks. (A) Supervised machine learning (SML), in which data sets include ground truth labels. (B) Unsupervised machine learning (UML), in which data sets do not include ground truth labels. (C) Reinforcement learning, in which an algorithmic agent interacts with a simulated environment. (D) Linear regression/classification, which can be employed to fit models in which the output is a scalar value and the data can be described by a straight line. (E) Support vector machines locate a separating hyperplane that divides the data into classes. (F) Random forests (RFs) employ the “bagging” technique to construct complete decision trees (DTs) in parallel using random bootstrap samples of the data sets and attributes. RFs select the label predicted by the majority of the randomized DTs. (G) k-nearest neighbors (k-NN) is employed for both regression and classification; the input comprises the k nearest training instances in the data set, and the output depends on whether k-NN is employed for regression or classification. (H) Neural networks (NNs) generally form a feedforward network of weights in which inputs activate the hidden layers, which in turn produce the output. NNs learn by back-propagation of errors through the network.
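As a minimal sketch of panel (G), the following Python snippet shows how the same k nearest training instances can yield either a class label (majority vote) or a numeric value (average). The function names and the toy data are hypothetical and are not taken from the review; Euclidean distance is assumed as the similarity measure.

```python
from collections import Counter
import math


def k_nearest(train, query, k):
    """Return the targets of the k training points closest to `query`.

    `train` is a list of (feature_tuple, target) pairs; distance is Euclidean.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    return [target for _, target in ranked[:k]]


def knn_classify(train, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: average of the k nearest neighbors' values."""
    values = k_nearest(train, query, k)
    return sum(values) / len(values)


# Hypothetical toy data: feature vectors paired with a label or a response.
labeled = [
    ((0.0, 0.0), "low"), ((0.1, 0.2), "low"),
    ((1.0, 1.0), "high"), ((0.9, 1.1), "high"), ((1.2, 0.8), "high"),
]
measured = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((3.0,), 4.0)]

print(knn_classify(labeled, (0.95, 0.9)))   # label voted by the 3 neighbors
print(knn_regress(measured, (1.1,), k=2))   # mean of the 2 nearest responses
```

The only difference between the two modes is the final aggregation step, which is why the caption describes the output as depending on the task.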
Figure 3
Applications of ML in cell engineering. ML can be employed for (i) improving gene expression, (ii) improving tools for altering cellular functions, and (iii) enhancing protein search and design.
Figure 4
The Automated Recommendation Tool (ART) gives predictions and recommendations for the next cycle. ART employs experimental data to (i) construct a probabilistic predictive model that predicts the response from input variables and (ii) use this model to provide a set of recommended inputs for the next experiment that will help reach the desired goal. The predicted response for the recommended inputs is given as a full probability distribution, effectively quantifying prediction uncertainty. Instances refer to each of the different examples of input and response used to train the algorithm. Reproduced with permission from ref (54). (Licensed under a Creative Commons Attribution 4.0 license: http://creativecommons.org/licenses/by/4.0/). Copyright 2020, Radivojević et al. Nature Research.
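The predict-then-recommend cycle described in this caption can be illustrated with a deliberately simplified sketch. This is not ART's implementation: as a stand-in for its probabilistic model, the code below assumes a bootstrap ensemble of one-variable least-squares fits, and the spread of the ensemble's predictions serves as a rough uncertainty estimate. All names and data values are hypothetical.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit y = intercept + slope * x for (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: every x is identical
        return my, 0.0
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - slope * mx, slope


def bootstrap_models(points, n_models=200, seed=0):
    """Fit one line per bootstrap resample of the experimental data."""
    rng = random.Random(seed)
    return [fit_line(rng.choices(points, k=len(points)))
            for _ in range(n_models)]


def predict(models, x):
    """Return (mean, standard deviation) of the ensemble's predictions at x."""
    preds = [intercept + slope * x for intercept, slope in models]
    return statistics.fmean(preds), statistics.stdev(preds)


def recommend(models, candidates, n=2):
    """Rank candidate inputs by predicted mean response, highest first."""
    return sorted(candidates, key=lambda x: predict(models, x)[0],
                  reverse=True)[:n]


# Hypothetical (input level, measured response) pairs from a previous cycle.
data = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]
models = bootstrap_models(data)

for x in recommend(models, [0.5, 2.5, 5.0]):
    mean, sd = predict(models, x)
    print(f"input={x}: predicted response {mean:.2f} +/- {sd:.2f}")
```

Ranking by the mean alone is pure exploitation; a real design loop would typically also weigh the uncertainty when choosing which experiments to run next.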
Figure 5
Applications of ML in metabolic engineering systems. In general, a metabolic engineering project can be divided into three parts: (i) designing metabolic pathways, (ii) optimizing cells for production, and (iii) improving industrial operations for product yield. Numerous computational tools have been developed to guide design throughout the process. (A) Pathways for the synthesis of target products can be designed from predicted genomic functions or proven chemical reactions. This can assist in identifying hosts with inherent industrial applicability. (B) Strains are engineered to increase production titer, frequency, and productivity. Mechanistic techniques leverage the understanding of fundamental biology to predict metabolite synthesis, whereas data-driven methods use patterns found in massive data sets to recommend improvements. Subsequent efforts have attempted to integrate the two methodologies to boost predictive power. (C) The output of downstream bioprocesses is maximized. In silico prediction can significantly decrease the time needed to adapt a lab strain for industrial production.
Figure 6
Library of 280,000 random 50-nucleotide oligomers used as 5′ untranslated regions (UTRs) for enhanced green fluorescent protein (eGFP). (A) Use of 5′ UTRs to assess the effect of 5′ UTR single nucleotide variants (SNVs) and to engineer new sequences for optimal protein expression. (B) The 280,000-member library was constructed by inserting a T7 promoter followed by 25 nucleotides of defined 5′ UTR sequence, a random 50-nucleotide sequence, and the eGFP coding sequence (CDS) into a plasmid backbone. In vitro transcribed (IVT) library mRNA was generated by in vitro transcription from a linear DNA template obtained from the plasmid library by polymerase chain reaction. HEK293T cells were transfected with the IVT library mRNA and collected after 12 h; polysome profiling was then performed, and the polysome fractions were collected and sequenced. For each UTR, read counts per fraction were used to calculate the mean ribosome load (MRL), and the resulting data were used to train a convolutional neural network (CNN). (C) Out-of-frame upstream start codons (uAUGs) decrease ribosome loading (vertical lines show positions that are in frame with the eGFP CDS). A similar but much weaker periodicity was observed for GUGs and CUGs. (D) Repressive strength of all out-of-frame variants of NNNAUGNN. (E) Nucleotide frequencies calculated for the 20 least repressive (weak) and most repressive (strong) translation initiation site sequences. Adapted with permission from ref (90). Copyright 2019, Nature Publishing Group.
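The caption does not spell out how MRL is derived from the per-fraction read counts. One plausible reading, assumed here purely for illustration, is a read-weighted average of the number of ribosomes associated with each polysome fraction. The function name, the fraction-to-ribosome assignment, and the counts below are hypothetical.

```python
def mean_ribosome_load(read_counts, ribosomes_per_fraction):
    """Read-weighted average ribosome number for one UTR.

    read_counts[i] is the (normalized) read count of the UTR in fraction i,
    and ribosomes_per_fraction[i] is the ribosome number assigned to that
    fraction. Both conventions are assumptions of this sketch.
    """
    if len(read_counts) != len(ribosomes_per_fraction):
        raise ValueError("one ribosome number is required per fraction")
    total = sum(read_counts)
    if total == 0:
        raise ValueError("UTR has no reads in any fraction")
    weighted = sum(c * r for c, r in zip(read_counts, ribosomes_per_fraction))
    return weighted / total


# Hypothetical UTR observed in four fractions holding 0, 1, 2, and 3 ribosomes.
counts = [10, 30, 40, 20]
ribosomes = [0, 1, 2, 3]
print(mean_ribosome_load(counts, ribosomes))
```

A UTR whose reads concentrate in heavier polysome fractions receives a higher MRL, which is the scalar target a sequence model such as a CNN can then be trained to predict.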
Figure 7
DL-enabled applications of synthetic biology. (A) Representative examples of relevant inputs to DL networks and their associated output predictions. (B) Given a new input, DL models can make predictions. Starting from a desired output, models can likewise be used in reverse to produce new designs.
Figure 8
Learning a policy in 24 h. (A) A reinforcement learning agent was trained online for 24 h on a model comprising five parallel chemostats. (B) The reward obtained from the environment. Despite a small variation in reward, all five chemostats had been moved to the target population levels by the end of the simulation. (C) The population curve of one chemostat. During the exploration phase, random actions are taken and the population levels fluctuate. As the exploration rate declines, the population levels approach the target values. Reproduced from ref (150) (an open access article distributed under the terms of the Creative Commons Attribution License). Copyright 2020, Treloar et al.
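The shift from random actions to goal-directed control as the exploration rate declines can be sketched with a generic epsilon-greedy rule. The caption does not state which exploration scheme or decay schedule was used in ref (150), so the exponential decay, the parameter values, and the function names below are assumptions for illustration only.

```python
import random


def exploration_rate(step, start=1.0, end=0.05, decay=0.995):
    """Exponentially decaying exploration rate with a lower bound (assumed)."""
    return max(end, start * decay ** step)


def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy choice: random action with probability epsilon,
    otherwise the action with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q_values = [0.1, 0.7, 0.3]  # hypothetical value estimates for three actions

for step in (0, 200, 2000):
    eps = exploration_rate(step)
    action = choose_action(q_values, eps, rng)
    print(f"step={step} epsilon={eps:.3f} action={action}")
```

Early on, nearly every action is random, which matches the fluctuating population levels in panel (C); once epsilon reaches its floor, the agent mostly follows its learned value estimates.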
Figure 9
(A) Simple genetic circuits distributed among bacterial populations are applied to solve chemically generated 2 × 2 maze problems by selectively expressing four distinct fluorescent proteins. Reproduced with permission from ref (158). Copyright 2021, American Chemical Society (https://pubs.acs.org/doi/10.1021/acssynbio.1c00279, further permissions related to the material excerpted should be directed to the ACS). (B) Synthetic in vitro transcription-translation (TxTl)-based perceptron consisting of a weighted sum operation (WSO) linked to a thresholding function. Reproduced with permission from ref (159). Copyright 2022, American Chemical Society (https://pubs.acs.org/doi/10.1021/acssynbio.1c00596, further permissions related to the material excerpted should be directed to the ACS).
Figure 10
(A) Challenges of integrating ML/DL techniques with applications of synthetic biology. (B) A standard ML/DL framework can help synthetic biology research. The intermediate stages are typically the center of attention, yet the foundation is critical and requires massive resource investment.
