Sci Adv. 2021 Nov 26;7(48):eabi6714.
doi: 10.1126/sciadv.abi6714. Epub 2021 Nov 24.

Scaling DNA data storage with nanoscale electrode wells

Bichlien H Nguyen et al. Sci Adv.

Abstract

Synthetic DNA is an attractive medium for long-term data storage because of its density, ease of copying, sustainability, and longevity. Recent advances have focused on the development of new encoding algorithms, automation, preservation, and sequencing technologies. Despite progress in these areas, the most challenging hurdle in deploying DNA data storage remains write throughput, which limits data storage capacity. We have developed the first nanoscale DNA storage writer, which we expect to scale DNA write density to 25 × 10⁶ sequences per square centimeter, a three-order-of-magnitude improvement over existing DNA synthesis arrays. We show that DNA synthesis can be confined to an area under 1 square micrometer and parallelized over millions of nanoelectrode wells, and we then successfully write and decode a message in DNA. DNA synthesis at this scale will enable write throughputs to reach megabytes per second and is a key enabler of a practical DNA data storage system.
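The link the abstract draws between write density and write throughput can be made concrete with a back-of-the-envelope model. This Python sketch assumes that an array extends every active site by one base per synthesis cycle, that each base carries a fixed number of bits, and that primer, address, and redundancy overhead can be ignored. The function name, the site counts, and the cycle time are hypothetical illustration values, not figures from the paper; absolute results depend entirely on the assumed cycle time, so the sketch reports only the ratio, which shows that throughput scales linearly with the number of sites written in parallel.

```python
def write_throughput_bytes_per_s(active_sites: int, bits_per_base: float, cycle_time_s: float) -> float:
    """Hypothetical model: bytes written per second when every active site
    grows by one base per cycle (primer, address, and redundancy overhead ignored)."""
    if active_sites <= 0 or bits_per_base <= 0 or cycle_time_s <= 0:
        raise ValueError("all parameters must be positive")
    bits_per_cycle = active_sites * bits_per_base  # one new base at every site per cycle
    return bits_per_cycle / 8 / cycle_time_s  # 8 bits per byte


# Hypothetical inputs: same chemistry and cycle time, 1,000x more sites on the array.
sparse = write_throughput_bytes_per_s(active_sites=25_000, bits_per_base=2, cycle_time_s=60)
dense = write_throughput_bytes_per_s(active_sites=25_000_000, bits_per_base=2, cycle_time_s=60)
print(f"{dense / sparse:.0f}x more bytes per second")
```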

Figures

Fig. 1. DNA data storage requires higher synthesis throughput than is possible with current techniques.
(A to D) Overview of the DNA data storage pipeline. (A) Digital data are encoded from their binary representation into sequences of DNA bases, with an identifier that correlates them with a data object, addressing information that is used to reorder the data when reading, and redundant information that is used for error correction. (B) These sequences are synthesized into DNA oligonucleotides and stored. (C) At retrieval time, the DNA molecules are selected and copied via PCR or other methods and sequenced back into electronic representations of the bases in these sequences. (D) The decoding process takes this noisy and sometimes incomplete set of sequencing reads, corrects for errors and missing sequences, and decodes the information to recover the data. (E) Summary of the commercial synthesis processes and corresponding estimated oligonucleotide densities, as reported in the literature or by the companies themselves (see text S2). The density of our electrochemical method is highlighted in dark red.
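Steps (A) and (D) of the pipeline can be illustrated with a deliberately minimal codec. This Python sketch assumes a plain 2-bit-per-base mapping (A = 00, C = 01, G = 10, T = 11), carries each strand's address as an integer rather than as encoded bases, and omits the identifier and error-correcting redundancy described in the caption, so it is not the authors' encoding; all names are hypothetical. The usage reuses the numbers from Fig. 3C: a 64-byte message split into four strands gives 16 bytes, or 64 bases, per strand, and adding the 20 primer bases at each end noted in the Fig. 3 caption yields 104 bases, matching the stated strand length, although the caption does not say that the payload was laid out this way.

```python
import random

BASES = "ACGT"  # assumed 2-bit mapping: 00->A, 01->C, 10->G, 11->T


def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def bases_to_bytes(seq: str) -> bytes:
    """Decode each group of four bases back into one byte (inverse of bytes_to_bases)."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    vals = [BASES.index(base) for base in seq]
    return bytes(
        (vals[i] << 6) | (vals[i + 1] << 4) | (vals[i + 2] << 2) | vals[i + 3]
        for i in range(0, len(vals), 4)
    )


def encode(message: bytes, n_strands: int) -> list[tuple[int, str]]:
    """Split a message into equal payloads, each tagged with its address (strand index)."""
    if len(message) % n_strands:
        raise ValueError("message length must divide evenly among strands")
    size = len(message) // n_strands
    return [(addr, bytes_to_bases(message[addr * size:(addr + 1) * size])) for addr in range(n_strands)]


def decode(strands: list[tuple[int, str]]) -> bytes:
    """Reorder strands by address, then decode and concatenate their payloads."""
    return b"".join(bases_to_bytes(seq) for _, seq in sorted(strands))


message = bytes(range(64))  # stand-in for a 64-byte message
strands = encode(message, n_strands=4)
print([len(seq) for _, seq in strands])  # payload bases per strand -> [64, 64, 64, 64]
random.shuffle(strands)  # reads come back in no particular order
assert decode(strands) == message
```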
Fig. 2. Overview of the 650-nm electrode array at a 2-μm pitch.
(A and B) Finite element analysis of anodic acid generation and diffusion at a 650-nm-diameter electrode with a 200-nm well, depicted (A) in cross section along the y = x plane and (B) in top-down view on the z = 0 plane. Blue and yellow represent regions of relatively low and high acid concentration, respectively. (C) An overview of the nanoscale DNA synthesis array, with scanning electron microscopy images of the 650-nm electrode array and an enlarged view of one electrode. (D) A fluorescence image in which the well surrounding each activated anode is patterned with AAA-fluorescein; the cartoon diagram shows which electrodes in the layout were activated. (E) Illustration of the wells patterned with AAA-fluorescein and AAA-AquaPhluor and (F) the corresponding image overlay of the two fluorophores at the end of DNA synthesized on the same 650-nm electrode array.
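The 2-μm pitch in the title of Fig. 2 lines up with the density projected in the abstract, assuming one sequence per electrode on a square grid: each site occupies (2 × 10⁻⁴ cm)² = 4 × 10⁻⁸ cm², so a square centimeter holds 1 / (4 × 10⁻⁸) = 2.5 × 10⁷, or 25 × 10⁶, sites. A short Python check of that arithmetic follows; the helper name is hypothetical.

```python
def sites_per_cm2(pitch_um: float) -> float:
    """Sites per cm^2 on a square grid with the given center-to-center pitch.

    Assumes one sequence per site and ignores any unused border area.
    """
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    pitch_cm = pitch_um * 1e-4  # 1 um = 1e-4 cm
    return 1.0 / pitch_cm**2  # one site per pitch x pitch cell


print(f"{sites_per_cm2(2.0):.2e} sites per cm^2")  # -> 2.50e+07
```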
Fig. 3. Errors stemming from synthesis followed by sequencing.
(A) Insertions (Ins), deletions (Del), and substitutions (Sub) per position for a synthesized and PCR-amplified 180-base sequence. (B) Electrophoresis image of synthesis products after PCR amplification. (C) Message encoded in 64 bytes and split into four unique sequences of 104 bases (top), and insertions, deletions, and substitutions per locus for each of the four sequences in the multiplex synthesis run. In every error analysis graph, the terminal 20 bases at both the 3′ and 5′ ends come from the PCR primers and are not representative of synthesis errors.
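Per-position counts like those in Fig. 3A and 3C can be produced by aligning each read to its reference sequence and tallying the edits. This Python sketch uses a plain Levenshtein (minimum edit distance) alignment with one deterministic traceback; it is a stand-in, not the authors' analysis pipeline, and the function names are hypothetical. Ties between equally good alignments are broken by a fixed preference (match or substitution, then deletion, then insertion), and each inserted base is charged to the following reference position, or to the last position for trailing insertions. For real 104-base reads, the first and last 20 positions would be excluded, since the caption notes they come from the PCR primers.

```python
def align_errors(ref: str, read: str) -> list[tuple[str, int]]:
    """Return (error_type, ref_position) pairs from one minimum-edit-distance alignment."""
    if not ref:
        raise ValueError("reference must be non-empty")
    n, m = len(ref), len(read)
    # dp[i][j] = edit distance between ref[:i] and read[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == read[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Trace back one optimal path: prefer match/substitution, then deletion, then insertion.
    errors = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != read[j - 1]):
            if ref[i - 1] != read[j - 1]:
                errors.append(("Sub", i - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            errors.append(("Del", i - 1))  # reference base missing from the read
            i -= 1
        else:
            errors.append(("Ins", min(i, n - 1)))  # extra read base, charged to the next ref position
            j -= 1
    return errors


def per_position_counts(ref: str, reads: list[str]) -> dict[str, list[int]]:
    """Tally Ins/Del/Sub at each reference position across all reads."""
    counts = {kind: [0] * len(ref) for kind in ("Ins", "Del", "Sub")}
    for read in reads:
        for kind, pos in align_errors(ref, read):
            counts[kind][pos] += 1
    return counts


# Toy reads: one perfect, one substitution, one deletion, one insertion.
counts = per_position_counts("ACGTACGT", ["ACGTACGT", "ACCTACGT", "ACGACGT", "ACGTTACGT"])
for kind, row in counts.items():
    print(kind, row)
```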
