Review
Trends Biotechnol. 2024 Aug;42(8):1002-1016.
doi: 10.1016/j.tibtech.2024.02.002. Epub 2024 Feb 27.

Cryptographic approaches to authenticating synthetic DNA sequences

Casey-Tyler Berezin et al. Trends Biotechnol. 2024 Aug.

Abstract

In a bioeconomy that relies on synthetic DNA sequences, the ability to ensure their authenticity is critical. DNA watermarks can encode identifying data in short sequences and can be combined with error correction and encryption protocols to ensure that sequences are robust to errors and securely communicated. New digital signature techniques allow for public verification that a sequence has not been modified and can contain sufficient information for synthetic DNA to be self-documenting. In translating these techniques from bacteria to more complex genetically modified organisms (GMOs), special considerations must be made to allow for public verification of these products. We argue that these approaches should be widely implemented to assert authorship, increase the traceability, and detect the unauthorized use of synthetic DNA.

Keywords: DNA cryptography; DNA watermark; cyberbiosecurity; digital signature; genetic engineering attribution.

Conflict of interest statement

Declaration of interests J.P. and S.P. have financial interests in GenoFAB, Inc. This company may benefit or be perceived as benefiting from this publication.

Figures

Figure 1. Watermarking techniques allow information storage in DNA.
(A, B, C) Substitution ciphers (yellow) encode messages in DNA by assigning each character to a short DNA sequence. Throughout, every other character of a message and its respective DNA/binary sequences are bolded for visibility. (A) DNA watermarks can be transformed into bacteria [21,22], (B) made more economical by decreasing the number of bases per character [23] and/or (C) more secure by splitting the message across multiple fragments [24]. (D, E, F) Methods using binary code (green) to encode messages in DNA add a layer of protection against interceptors. (D) DNA binary strings composed of START, STOP, 0-bit and 1-bit sequences can be visualized on a gel. The message X is mixed with dummy fragments Y1 and Y2 (collectively Y) and the message is decrypted by subtracting the image of the dummy strands from the mixed image (X+Y). Each method relies on the interpreter knowing the key primer sequences (half arrows) and/or the 0- and 1-bit primer sequences (D), which produce fragments from START to each 0 or 1, which are read from the bottom up [25]. (E) Binary messages can be encrypted with single-use keys. An “exclusive OR” calculation is performed wherein different bits produce 0 and matching bits produce 1 (bolded) [26]. (F) Characters encoded by 8-bit ASCII values can be encrypted with a secret key, converted into 6-bit binary sequences and embedded only into synonymous codons (black) wherein 0 and 1 represent how common the codon is [27]. (G) Discrete wavelet transform (DWT) coefficients can be calculated to find optimal synonymous codon subsequences (blue) [29]. Created with BioRender.com.
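The binary encoding and single-use key steps described for panels (D-F) can be sketched in a few lines. This is a minimal illustration, not the method of any cited reference: the two-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T), the 8-bit ASCII encoding, and all function names are assumptions made for the example. The sketch uses the conventional XOR, in which differing bits give 1; the complementary convention described in the caption (matching bits give 1) is equally reversible with the same key.

```python
import secrets

# Assumed mapping for illustration only: 00->A, 01->C, 10->G, 11->T.
BASES = "ACGT"


def text_to_bits(message: str) -> str:
    """Encode an ASCII message as a string of 8-bit binary values."""
    return "".join(f"{byte:08b}" for byte in message.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("ascii")


def bits_to_dna(bits: str) -> str:
    """Write each pair of bits as one nucleotide."""
    return "".join(BASES[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))


def dna_to_bits(dna: str) -> str:
    """Inverse of bits_to_dna."""
    return "".join(f"{BASES.index(base):02b}" for base in dna)


def random_key(n_bits: int) -> str:
    """Generate a single-use key with as many bits as the message."""
    return "".join(secrets.choice("01") for _ in range(n_bits))


def xor_bits(bits: str, key: str) -> str:
    """Conventional XOR: differing bits give 1, matching bits give 0."""
    if len(bits) != len(key):
        raise ValueError("a single-use key must be as long as the message")
    return "".join("1" if a != b else "0" for a, b in zip(bits, key))


message = "GMO"
plain_bits = text_to_bits(message)
key = random_key(len(plain_bits))

# Encrypt, then write the ciphertext as DNA (4 bases per character).
watermark = bits_to_dna(xor_bits(plain_bits, key))

# Applying the same key again recovers the message.
recovered = bits_to_text(xor_bits(dna_to_bits(watermark), key))

print(bits_to_dna(text_to_bits("A")))  # CAAC (unencrypted: 01000001)
print(watermark, recovered)
```

Without the key, the watermark is indistinguishable from random sequence, which is the protection against interceptors that the caption refers to; the key must never be reused.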
Figure 2. Error detection and correction techniques in DNA watermarks.
(A) A predictable arrangement of one nucleotide makes it easy to confirm watermark integrity [23]. (B) Discrepancies between a watermark sequence and an obtained sequence can be visualized on a DNA dot plot [33]. (D) The presence of a parity bit (bold) in the binary code of each character verifies sequence identity [32]. While error detection can be rather simple (A, B, D; yellow), error correction techniques (C, E, F; green) are more involved. (C) Four information-containing bits are interspersed with four parity bits (underlined) in the construction of 1-base error-correcting 8/4 Hamming codes. The integrity of a resulting byte (blue) can be confirmed through a series of calculations on individual bits (h0-h7) [31]. (E) Hamming codes can be combined with additional encryption methods, such as cyclic permutation, to increase the security of the message [37]. (F) Groups of six characters can be arranged in 2D space and converted to base-four digits (0–3), and a block sum check is performed for error correction. The sum of each column and row (after modulo 4) is converted to a nucleotide and appended to the linear DNA sequence, which is embedded into single nucleotide polymorphisms (SNPs) in the genome. Decryption follows the encryption process in reverse, and a single error can be corrected based on the expected parity nucleotides for a certain row and column [38]. Created with BioRender.com.
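The 8/4 Hamming code in panel (C) can be illustrated with the standard textbook construction: three parity bits that each cover a subset of positions, plus one overall parity bit. This is a sketch under assumptions; the bit layout below (parity at positions 1, 2 and 4, overall parity last) is the conventional one and may not match the h0-h7 arrangement used in [31], and the function names are hypothetical.

```python
def hamming84_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as 8 bits: positions 1-7 plus an overall parity bit."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]


def hamming84_decode(code: list[int]) -> tuple[list[int], str]:
    """Return (data bits, status); corrects one flipped bit, detects two."""
    word = list(code[:7])

    # The syndrome is the XOR of the (1-based) positions holding a 1.
    # It is 0 for a valid codeword and equals the error position otherwise.
    syndrome = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= position

    overall = 0
    for bit in code:
        overall ^= bit

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and repair it.
        if syndrome:
            word[syndrome - 1] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: cannot be repaired.
        status = "double error detected"

    return [word[2], word[4], word[5], word[6]], status


data = [1, 0, 1, 1]
codeword = hamming84_encode(data)

damaged = list(codeword)
damaged[4] ^= 1  # simulate a single point mutation in the encoded byte
print(hamming84_decode(damaged))
```

In a DNA watermark each encoded byte would then be written out as nucleotides, so that a single base substitution affecting one bit of a byte can be repaired on readout.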
Figure 3. Digital signature techniques allow DNA sequences to be publicly verified.
Signature techniques aim to securely encode more information into DNA than watermarks and are based on public-key cryptography methods that allow for wide, public verification of DNA sequences. An identity-based signature method developed for plasmids compresses the original message (the sequence itself) into a hash and signs it with one of two algorithms, which produce signatures of two different sizes. Each algorithm uses the signer’s private key, which is derived from their ORCID and issued by a trusted third party, to encrypt the hashed message. Additional information (e.g., plasmid ID, an error correction code (ECC), or a self-documenting fragment encoding a GenBank file with gene annotations) is added to the signature cassette. The signature and plasmid itself are verified through a simple process on a publicly available web server [48]. Created with BioRender.com.
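The hash-then-sign principle behind this figure can be sketched with textbook RSA. This is a toy illustration under loud assumptions and is not the identity-based scheme of [48]: the key is built from two small, publicly known Mersenne primes (insecure by design), no ORCID-derived key or trusted third party is modeled, and the base-4 encoding of the signature into nucleotides and all function names are hypothetical. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

BASES = "ACGT"

# Toy RSA key: both primes are small and publicly known, so this offers
# no real security. It only demonstrates sign-with-private, verify-with-public.
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent
SIGNATURE_LENGTH = 46                  # N < 4**46, so 46 bases always suffice


def digest(sequence: str) -> int:
    """Compress a DNA sequence to an integer below N using SHA-256."""
    raw = hashlib.sha256(sequence.upper().encode("ascii")).digest()
    return int.from_bytes(raw, "big") % N


def int_to_dna(value: int, length: int) -> str:
    """Write an integer as a fixed-length base-4 string of nucleotides."""
    digits = []
    for _ in range(length):
        value, remainder = divmod(value, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))


def dna_to_int(dna: str) -> int:
    """Inverse of int_to_dna."""
    value = 0
    for base in dna:
        value = value * 4 + BASES.index(base)
    return value


def sign(sequence: str) -> str:
    """Sign the hashed sequence with the private key; return it as DNA."""
    return int_to_dna(pow(digest(sequence), D, N), SIGNATURE_LENGTH)


def verify(sequence: str, signature: str) -> bool:
    """Check a DNA-encoded signature using only the public key (N, E)."""
    return pow(dna_to_int(signature), E, N) == digest(sequence)


plasmid = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTT"
signature = sign(plasmid)

print(verify(plasmid, signature))               # unmodified sequence
print(verify(plasmid[:-1] + "A", signature))    # one base changed
```

Anyone holding the public key can run `verify`, but only the holder of the private exponent can produce a valid signature, which is what allows public verification that a sequence has not been modified.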
Figure 4, Key Figure. Watermark and digital signature methods vary in terms of security and reliability.
Security includes features like the encryption and secrecy of the message, whereas reliability represents the ease with which the message can be retrieved (i.e., mutation resistance, error correction, availability of tools). Watermarks using substitution ciphers are generally insecure due to their simple nature [21–24,32,33], while more complex systems like binary code [25–28,31,37,38,54] or discrete wavelet transform [29] increase the security of the message. Digital signatures are more secure than watermarks due to the use of more complex encryption algorithms and public-key cryptography but vary in terms of their reliability [48]. The presence of error-correcting mechanisms is critical for long-term information storage in DNA.

References

    1. Hughes RA and Ellington AD (2017) Synthetic DNA Synthesis and Assembly: Putting the Synthetic in Synthetic Biology. Cold Spring Harbor Perspectives in Biology 9, a023812. doi: 10.1101/cshperspect.a023812
    2. Goeddel DV et al. (1979) Expression in Escherichia coli of chemically synthesized genes for human insulin. Proceedings of the National Academy of Sciences 76, 106–110. doi: 10.1073/pnas.76.1.106
    3. Voigt CA (2020) Synthetic biology 2020–2030: six commercially-available products that are changing our world. Nature Communications 11, 6379. doi: 10.1038/s41467-020-20122-2
    4. Chiarabelli C et al. (2013) Chemical synthetic biology: a mini-review. Frontiers in Microbiology 4, 285.
    5. Peccoud J et al. (2018) Cyberbiosecurity: from naive trust to risk awareness. Trends in Biotechnology 36, 4–7.