PLoS One. 2023 Nov 13;18(11):e0294469.
doi: 10.1371/journal.pone.0294469. eCollection 2023.

Enabling AI in synthetic biology through Construction File specification

Nassim Ataii et al. PLoS One. 2023.

Abstract

The Construction File (CF) specification establishes a standardized interface for molecular biology operations, laying a foundation for automation and enhanced efficiency in experiment design. It is implemented across three distinct software projects: PyDNA_CF_Simulator, a Python project featuring a ChatGPT plugin for interactive parsing and simulating experiments; ConstructionFileSimulator, a field-tested Java project that showcases 'Experiment' objects expressed as flat files; and C6-Tools, a JavaScript project integrated with Google Sheets via Apps Script, providing a user-friendly interface for authoring and simulation of CF. The CF specification not only standardizes and modularizes molecular biology operations but also promotes collaboration, automation, and reuse, significantly reducing potential errors. The potential integration of CF with artificial intelligence, particularly GPT-4, suggests innovative automation strategies for synthetic biology. While challenges such as token limits, data storage, and biosecurity remain, proposed solutions promise a way forward in harnessing AI for experiment design. This shift from human-driven design to AI-assisted workflows, steered by high-level objectives, charts a potential future path in synthetic biology, envisioning an environment where complexities are managed more effectively.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Shorthand representations of a cloning strategy.
(A) Conventional illustration of a cloning strategy, visually detailing PCR, Digestion, and Ligation steps. (B) Equivalent strategy represented in Construction File Shorthand. Each step begins with an operation (blue), followed by operation-specific inputs, often sequence names (magenta). The final token in each step (orange) denotes the product, encapsulating the outcome of the operation. This shorthand format provides a structured, machine-readable alternative to traditional illustrations.
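
The token layout described in this caption (operation first, operation-specific inputs next, product last) can be illustrated with a minimal parsing sketch. It assumes one step per line and whitespace-separated tokens, which the caption does not state. The names `Step` and `parse_shorthand` and the sequence names in the example are hypothetical, and this is not the parser shipped with PyDNA_CF_Simulator, ConstructionFileSimulator, or C6-Tools.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One shorthand step: an operation, its inputs, and the product name."""
    operation: str
    inputs: List[str]
    product: str


def parse_shorthand(text: str) -> List[Step]:
    """Split each non-blank line into operation, inputs, and product.

    Assumption: tokens are whitespace-separated and each step sits on one line.
    """
    steps = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if len(tokens) < 3:
            raise ValueError(f"expected operation, input(s), product: {line!r}")
        # First token is the operation, last token is the product,
        # everything in between is operation-specific input.
        steps.append(Step(tokens[0], tokens[1:-1], tokens[-1]))
    return steps


# Hypothetical three-step strategy mirroring the PCR, Digestion, Ligation flow.
example = """
PCR oligoF oligoR templateA pcrA
Digest pcrA EcoRI,BamHI 0 digA
Ligate digA vecDig plasmidA
"""
steps = parse_shorthand(example)
for step in steps:
    print(step.operation, step.inputs, "->", step.product)
```

Because the product of one step can appear by name as an input of a later step, a parsed list like this is enough to trace dependencies between steps before any sequence-level simulation is attempted.
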
Fig 2. Polynucleotide object representation for simulating molecular biology operations.
The hypothetical DNA ’polyA’ is a linear, double-stranded DNA previously cut with BamHI, dephosphorylated, and subsequently cut with XmaI. In the Polynucleotide object representation, the fully duplexed DNA portion is captured as the "sequence". Single-stranded overhangs are represented by the coding strand sequence as ext5 and ext3, denoting the overhangs on the left and right of the diagram, respectively. Modifications at the ends are indicated by enumerated types as mod_ext5 and mod_ext3. The simulation of an EcoRI digestion of this DNA would yield two fragments, indexed as 0 and 1. The ’fragmentSelection’ field of the shorthand statement is set to 0, resulting in ’polyB’ being returned as depicted. In the simulation software, Polynucleotides serve as dynamic representations of DNAs, reflecting their states as they undergo operation-specific transformations to yield expected products. Simulation software currently supports PCR, Digest, Ligate, GoldenGate, Gibson, and Transform operations.
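
The fields named in this caption can be sketched as a small data structure, together with a simplified EcoRI digestion and fragment selection. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `Modification` enum values, the `digest_ecori` helper, the example sequence, and the choice of which end carries the BamHI overhang are all hypothetical. The sketch handles only the first EcoRI site in the duplex region and does not model how 5' and 3' overhangs are distinguished.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Modification(Enum):
    """Hypothetical end-modification states (the real enum may differ)."""
    HYDROXYL = "hydroxyl"
    PHOSPHATE = "phosphate"


@dataclass
class Polynucleotide:
    sequence: str            # fully duplexed portion, coding strand
    ext5: str = ""           # left overhang, written as coding-strand sequence
    ext3: str = ""           # right overhang, written as coding-strand sequence
    mod_ext5: Modification = Modification.HYDROXYL
    mod_ext3: Modification = Modification.HYDROXYL


def digest_ecori(dna: Polynucleotide) -> List[Polynucleotide]:
    """Cut at the first EcoRI site (G^AATTC) found in the duplex region.

    Simplification: a single site, and new ends are assumed phosphorylated.
    """
    idx = dna.sequence.find("GAATTC")
    if idx == -1:
        return [dna]  # no site, nothing to cut
    left = Polynucleotide(
        sequence=dna.sequence[: idx + 1],   # duplex ends with the G
        ext5=dna.ext5, ext3="AATT",
        mod_ext5=dna.mod_ext5, mod_ext3=Modification.PHOSPHATE,
    )
    right = Polynucleotide(
        sequence=dna.sequence[idx + 5:],    # duplex resumes at the final C
        ext5="AATT", ext3=dna.ext3,
        mod_ext5=Modification.PHOSPHATE, mod_ext3=dna.mod_ext3,
    )
    return [left, right]


# Hypothetical 'polyA': BamHI end (dephosphorylated) on the left, XmaI end on the right.
poly_a = Polynucleotide(
    sequence="CAAAGAATTCTTTC", ext5="GATC", ext3="CCGG",
    mod_ext5=Modification.HYDROXYL, mod_ext3=Modification.PHOSPHATE,
)
fragments = digest_ecori(poly_a)
fragment_selection = 0                      # mirrors the 'fragmentSelection' field
poly_b = fragments[fragment_selection]
print(poly_b)
```

Selecting index 0 returns the left fragment, which keeps the original left overhang and modification and gains the new EcoRI overhang on its right end.
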
Fig 3. Zero-shot natural language processing interpretation of construction files by ChatGPT.
After being prompted with the shorthand specification document, ChatGPT (GPT-4) demonstrates its ability to interpret plasmid construction text from a scientific paper into a construction file with high accuracy. This demonstration underscores the potential of AI to automatically extract construction files from scientific literature, opening new possibilities for large-scale, automated analysis of genetic engineering experiments from unstructured archival text. The figure shows a partial, illustrative excerpt; see the supplementary material for the complete chat.
Fig 4. Simulation of Invasin Construction File in a script editor.
SimulatorView, a simple GUI included with ConstructionFileSimulator, accepts the text of a construction file and outputs the product of the final step. In this instance, the GUI is provided with the steps parsed by ChatGPT, along with the sequences of the three input plasmid sequences. The complete document can be found in the supplementary file ’invasin_cf.txt’. Upon clicking ’run’, the construction file is simulated step-by-step. The resulting sequence of pBACr-AraInvasin aligns with the expected map and is consistent with sequenced isolates, demonstrating the accuracy and utility of the simulation.
