Nat Commun. 2020 Oct 16;11(1):5246.
doi: 10.1038/s41467-020-18681-5.

Photon-directed multiplexed enzymatic DNA synthesis for molecular digital data storage

Howon Lee et al. Nat Commun. 2020.

Abstract

New storage technologies are needed to keep up with the global demands of data generation. DNA is an ideal storage medium due to its stability, information density and ease of readout with advanced sequencing techniques. However, progress in writing DNA is stifled by the continued reliance on chemical synthesis methods. The enzymatic synthesis of DNA is a promising alternative, but thus far has not been well demonstrated in a parallelized manner. Here, we report a multiplexed enzymatic DNA synthesis method using maskless photolithography. Rapid uncaging of Co²⁺ ions by patterned UV light activates terminal deoxynucleotidyl transferase (TdT) for spatially selective synthesis on an array surface. Spontaneous quenching of reactions by the diffusion of excess caging molecules confines synthesis to light patterns and controls the extension length. We show that our multiplexed synthesis method can be used to store digital data by encoding video game music in 12 unique DNA oligonucleotide sequences, which is equivalent to 84 trits or 110 bits of data.

Conflict of interest statement

H.L., K.G., and G.M.C. have filed a patent application for the method described in this paper. The remaining authors declare no conflict of interest.

Figures

Fig. 1. Overview of photon-directed multiplexed enzymatic DNA synthesis system.
a An array surface derivatized with single-stranded DNA initiator oligonucleotide is brought into contact with a master mix containing the appropriate buffers, Co²⁺ divalent cation cofactor, TdT enzyme, the desired nucleotide to be incorporated (dXTP), and photolabile DMNP-EDTA caging molecule provided in excess. All Co²⁺ ions are initially complexed with DMNP-EDTA, which keeps TdT in an inactive state until needed. Using photolithography, patterned UV light at 365 nm illuminates the array surface and degrades the complexed DMNP-EDTA, which releases Co²⁺ ions and activates TdT in a spatially selective manner. The UV light is then turned off and the reaction is allowed to incubate for a short period of time. During this incubation, excess non-complexed DMNP-EDTA spontaneously quenches the extension reaction by chelating free Co²⁺, which returns active TdT to an inactive state. The array surface is then washed and either the next synthesis cycle begins or material is retrieved from the surface for downstream applications. b Arrays are mounted into a simple flow cell with a reaction chamber inlet and an outlet to waste or collection. Individually addressable patterning is a major advantage of our synthesis method; it is provided by reflective, on-demand dynamic masks generated by the digital micromirror device (DMD) and projected from a collimated UV light source through a ×10 objective.
Fig. 2. Multiplex enzymatic synthesis optimization and base transition normalization.
a Demonstration of UV irradiation of the array surface with 100 µm circular spots arranged in a (3 × 4) pattern on 1.2 mm² of surface area. UV irradiation is not limited to this particular pattern and may be changed pixel-wise (1920 × 1080) on demand with our photolithographic system. Any spot on the surface is individually addressable in terms of spatial location and total UV irradiation time. b Visualization of G homopolymeric oligonucleotide synthesis after system optimization, via the splint-end ligation of a probe sequence containing a 3′-Cy3 fluorophore, using the (3 × 4) pattern. c Results of base transition normalization, in which the total illumination time was adjusted for any base transition that may be encountered during multiplexed synthesis. The left axis indicates the composition of the last 4 bases on the 3′-terminus of the surface initiator oligonucleotide and the top axis indicates the nucleotide that was incorporated onto the respective initiator oligonucleotide. The optimal illumination time is indicated in each base transition box and was determined by the fluorescent signal of the splint-end ligated 3′-Cy3 probe. d Box plots indicating NGS analysis of base transition normalization. Graphs show the normalized nucleotide (nt) extension length distribution for all possible base transitions, with red pluses being statistical outliers. Source data are provided as a Source Data file.
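As a rough illustration of the individually addressable patterning in panel a, the sketch below builds a binary mirror mask for chosen spots on the (3 × 4) grid. Only the 1920 × 1080 pixel grid and the 100 µm spot diameter come from the caption. The projected pixel size, the spot spacing (`UM_PER_PIXEL`, `PITCH_UM`) and all function names are assumptions made for this example and do not describe the authors' optics or software.

```python
# Hypothetical optical parameters; only the pixel grid and the
# spot diameter are taken from the caption.
DMD_WIDTH, DMD_HEIGHT = 1920, 1080   # pixel grid quoted in the caption
SPOT_DIAMETER_UM = 100.0             # spot diameter quoted in the caption
UM_PER_PIXEL = 1.0                   # ASSUMED projected pixel size
PITCH_UM = 300.0                     # ASSUMED centre-to-centre spacing


def spot_centres(rows=3, cols=4):
    """Return {spot_index: (cx, cy)} in pixels, grid centred on the frame."""
    pitch_px = PITCH_UM / UM_PER_PIXEL
    x0 = DMD_WIDTH / 2 - (cols - 1) * pitch_px / 2
    y0 = DMD_HEIGHT / 2 - (rows - 1) * pitch_px / 2
    return {r * cols + c: (x0 + c * pitch_px, y0 + r * pitch_px)
            for r in range(rows) for c in range(cols)}


def build_mask(active_spots, centres=None):
    """Return the mask as a list of rows; 1 marks a mirror switched on."""
    centres = centres or spot_centres()
    radius = SPOT_DIAMETER_UM / UM_PER_PIXEL / 2
    mask = [bytearray(DMD_WIDTH) for _ in range(DMD_HEIGHT)]
    for idx in active_spots:
        cx, cy = centres[idx]
        # Only visit the bounding box of each circular spot.
        for y in range(int(cy - radius), int(cy + radius) + 1):
            for x in range(int(cx - radius), int(cx + radius) + 1):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    mask[y][x] = 1
    return mask


# Illuminate spots 0 and 5 only; every other spot stays dark.
mask = build_mask([0, 5])
lit_pixels = sum(sum(row) for row in mask)
```

Because each spot is selected by index, the total irradiation time per spot could be controlled by how long a given mask is displayed, which is the other degree of freedom the caption mentions.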
Fig. 3. Demonstration of multi-cycle and multiplex enzymatic synthesis.
a An overview of multi-cycle, single-plex synthesis of a heteropolymer oligonucleotide composed of 8 unique base transitions. Each cycle represents the addition of a single nucleotide base type at all 12 spots on the (3 × 4) array. Synthesis results in short homopolymeric blocks of A, T, G, or C of variable length. b Following multi-cycle synthesis and NGS, raw sequence reads were extracted and filtered by the presence of the adapter sequence added by splint-end ligation after the final C extension. The box plot summarizes the number of extension events for each homopolymeric block of the synthesized sequence GATGTAGAC, with red pluses being statistical outliers. Note that only sequencing reads that contained all eight base transitions were used to generate the box plot and that synthesis started with a string of four G on the 3′-terminus of the initiator oligonucleotide. c Denaturing gel electrophoresis analysis of each individual cycle of synthesis. Each lane represents the material retrieved from an individual flow cell after the appropriate number of cycles was performed. No final C extension or splint-end ligation occurred for this analysis. d An overview of multi-cycle, multiplex synthesis of 12 unique heteropolymer oligonucleotides on the (3 × 4) array. Each synthesis step contained the individual cycles for the addition of A, T, G, and C at the appropriate spots on the array. Sequence barcodes indicating the physical location of each oligonucleotide on the array can be built into the initiator oligonucleotide or synthesized by TdT. For each cycle, a unique mask was generated by the system's DMD to locally activate TdT based on the desired sequence to be synthesized. e For example, the masks needed for the 4th step of an eight-step multiplex oligonucleotide synthesis run are shown with the corresponding post-synthesis sequencing data to verify spatially selective synthesis. Source data are provided as a Source Data file.
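The per-step masks described in panels d and e can be sketched as a simple grouping problem: at each synthesis step, every spot is assigned to the cycle of the base it needs next. The code below is a minimal sketch under that reading. The three target sequences are made up for illustration and the function name `step_masks` is hypothetical; neither comes from the paper.

```python
def step_masks(targets, step):
    """Group spot indices by the base each spot needs at a given step.

    targets: one target string per spot, written as a sequence of
             base transitions (one character per synthesis step).
    step:    zero-based synthesis step.
    Returns a dict mapping each base to the list of spots to illuminate
    during that base's cycle.
    """
    masks = {base: [] for base in "ATGC"}
    for spot, sequence in enumerate(targets):
        if step < len(sequence):          # spot already finished otherwise
            masks[sequence[step]].append(spot)
    return masks


# Made-up targets for three spots (not sequences from the paper).
example_targets = ["ATGC", "TAGC", "AGTC"]
first_step = step_masks(example_targets, 0)
# first_step == {"A": [0, 2], "T": [1], "G": [], "C": []}
```

Each non-empty list would correspond to one mask shown while the matching nucleotide is in the flow cell, so a single step needs at most four cycles.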
Fig. 4. Demonstration of digital music data storage in DNA oligonucleotides enzymatically synthesized in multiplex: music data storage process.
a A snapshot of the simplified melody from the first two measures of the 1985 Nintendo Entertainment System video game Super Mario Brothers "Overworld Theme" piano sheet music. Each note and rest of the melody was indexed as #0 through #11 and stored in one of the 12 unique DNA oligonucleotides synthesized in multiplex on the (3 × 4) array. The full piano sheet music is shown in Supplementary Fig. 7. b Indexed notes and rests were assigned a note number based on a modified Musical Instrument Digital Interface (MIDI) note chart, which also indicates note octave. In addition to the note, the encoding scheme also stores the order in which each note is to be played and its duration. All rests are assigned to the note G#0, which is inaudible (25.9 Hz) to normal human hearing. A full overview of the MIDI conversion of sheet music to digital data is shown in Supplementary Fig. 8. c Digital music data are translated to ternary and the resulting DNA sequences are mapped to unique base transitions using a conversion map. The left axis of the table represents the string of bases at the 3′-terminus of the initiator oligonucleotide and the top axis indicates the nucleotide to be incorporated next. Supplementary Fig. 9 shows a table mapping the "Overworld Theme" melody to DNA sequences with all relevant data and information. d Generated DNA sequences are enzymatically synthesized with TdT using our multiplex photolithographic system. e Synthesized oligonucleotides are retrieved from the flow cell surface and stored in tubes for sequencing or other downstream applications. f To read and decode stored digital music data, DNA oligonucleotides synthesized in multiplex can be sequenced with NGS methods such as Illumina or nanopore sequencing. g From sequencing data, the decoding process converts sequencing information back to musical notes. A sinusoidal wave generator is used to play the "Overworld Theme" melody in the correct note order and with the proper duration.
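The ternary-to-transition mapping in panel c and the decoding in panel g can be sketched as a round trip. The paper's actual conversion map is only shown in the figure, so the example below substitutes a common rotating code in which each trit selects one of the three bases that differ from the previous base. That substitution, the 5-trit width, the starting base `G` and the example note number 20 are all assumptions for illustration. Decoding first collapses homopolymer runs, since TdT adds blocks of variable length and only the transitions carry data.

```python
from itertools import groupby

BASES = "ACGT"


def int_to_trits(value, width):
    """Write a non-negative integer as `width` base-3 digits, high digit first."""
    trits = []
    for _ in range(width):
        value, trit = divmod(value, 3)
        trits.append(trit)
    if value:
        raise ValueError("value does not fit in the requested width")
    return trits[::-1]


def trits_to_int(trits):
    """Inverse of int_to_trits."""
    value = 0
    for trit in trits:
        value = value * 3 + trit
    return value


def encode(trits, prev="G"):
    """Map trits to bases; each base always differs from the one before it."""
    out = []
    for trit in trits:
        choices = [b for b in BASES if b != prev]  # three candidates
        prev = choices[trit]
        out.append(prev)
    return "".join(out)


def decode(read, prev="G"):
    """Collapse homopolymer runs, then invert the rotating code."""
    trits = []
    for base, _run in groupby(read):
        if base == prev:          # leftover initiator tail, carries no data
            continue
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


# Hypothetical note number 20 stored in 5 trits.
transitions = encode(int_to_trits(20, 5))        # "ACTAT"
recovered = trits_to_int(decode("AACCCTATT"))    # 20, despite uneven block lengths
```

A code of this shape never asks for the same base twice in a row, which matters here because a repeated base would be indistinguishable from a longer homopolymer block.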
Fig. 5. Demonstration of digital music data storage in DNA oligonucleotides enzymatically synthesized in multiplex: decoding and data analysis.
a During the decoding process, several filters are applied to extract the reads that contain the digital music data and align them to the expected template sequences. Template alignment was performed using the Smith–Waterman algorithm. b From this, error analysis can be performed to determine the quality of multiplex synthesis. The upper histogram indicates the percentage of insertions, deletions, or mismatches that occurred in the filtered sequencing reads. The bottom histogram indicates the number of reads containing errors for each possible base transition across the array. c Sequencing data also yield statistical information regarding the extension length distribution for each base transition for all 12 oligonucleotides synthesized in multiplex. For example, subset 6, which corresponds to sequence index 6, is shown in the box plot. All other subsets are shown in Supplementary Figs. 10 and 11. d Additional statistics, such as the extension length distribution for all possible transitions from the entire array, were analyzed as shown in the indicated box plot, with red pluses being statistical outliers. Source data are provided as a Source Data file.
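For readers unfamiliar with the alignment step named in panel a, the following is a minimal Smith–Waterman local alignment with a linear gap penalty. The scoring values (match +2, mismatch −1, gap −2) are assumptions chosen for the example; the caption does not state the parameters or the implementation the authors used.

```python
def smith_waterman(read, template, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_read, aligned_template) for the best
    local alignment, using a linear gap penalty."""
    rows, cols = len(read) + 1, len(template) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; local alignment floors every cell at 0.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if read[i - 1] == template[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(read[i - 1])
            bottom.append(template[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(read[i - 1])       # extra base in the read (insertion)
            bottom.append("-")
            i -= 1
        else:
            top.append("-")               # base missing from the read (deletion)
            bottom.append(template[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# A read with flanking bases still aligns cleanly to the template.
result = smith_waterman("TTGATGTAGACTT", "GATGTAGAC")
# result == (18, "GATGTAGAC", "GATGTAGAC")
```

Gap characters in the returned strings mark insertions and deletions relative to the template, which is the kind of per-read information an error histogram like the one in panel b would be built from.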
