. 2023 Sep 12;120(37):e2217330120.

doi: 10.1073/pnas.2217330120. Epub 2023 Sep 5.

Parallel molecular computation on digital data stored in DNA

Boya Wang^#¹, Siyuan Stella Wang^#^{2

3}, Cameron Chalk¹, Andrew D Ellington^{2

3}, David Soloveichik¹

Affiliations

¹ Electrical and Computer Engineering, University of Texas at Austin, Austin, TX 78712.
² Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712.
³ Department of Chemistry and Biochemistry, University of Texas at Austin, Austin, TX 78712.

^# Contributed equally.

PMID: 37669382
PMCID: PMC10500265
DOI: 10.1073/pnas.2217330120

Parallel molecular computation on digital data stored in DNA

Boya Wang et al. Proc Natl Acad Sci U S A. 2023.

. 2023 Sep 12;120(37):e2217330120.

doi: 10.1073/pnas.2217330120. Epub 2023 Sep 5.

Authors

Boya Wang^#¹, Siyuan Stella Wang^#^{2

3}, Cameron Chalk¹, Andrew D Ellington^{2

3}, David Soloveichik¹

Affiliations

¹ Electrical and Computer Engineering, University of Texas at Austin, Austin, TX 78712.
² Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712.
³ Department of Chemistry and Biochemistry, University of Texas at Austin, Austin, TX 78712.

^# Contributed equally.

PMID: 37669382
PMCID: PMC10500265
DOI: 10.1073/pnas.2217330120

Abstract

DNA is an incredibly dense storage medium for digital data. However, computing on the stored information is expensive and slow, requiring rounds of sequencing, in silico computation, and DNA synthesis. Prior work on accessing and modifying data using DNA hybridization or enzymatic reactions had limited computation capabilities. Inspired by the computational power of "DNA strand displacement," we augment DNA storage with "in-memory" molecular computation using strand displacement reactions to algorithmically modify data in a parallel manner. We show programs for binary counting and Turing universal cellular automaton Rule 110, the latter of which is, in principle, capable of implementing any computer algorithm. Information is stored in the nicks of DNA, and a secondary sequence-level encoding allows high-throughput sequencing-based readout. We conducted multiple rounds of computation on 4-bit data registers, as well as random access of data (selective access and erasure). We demonstrate that large strand displacement cascades with 244 distinct strand exchanges (sequential and in parallel) can use naturally occurring DNA sequence from M13 bacteriophage without stringent sequence design, which has the potential to improve the scale of computation and decrease cost. Our work merges DNA storage and DNA computing, setting the foundation of entirely molecular algorithms for parallel manipulation of digital information preserved in DNA.

Keywords: DNA computation; DNA storage; molecular programming; strand displacement.

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interest.

Figures

**Fig. 1.**
Overview of SIMD $| |$ DNA. (A) Instead of outsourcing the computation process to an electronic computer with required sequencing and synthesis steps, SIMD $| |$ DNA allows in-memory computation performed by DNA itself. (B) Analogous to SIMD computation in classical computing, in SIMD $| |$ DNA, a set of instruction strands is added to the solution to simultaneously react with all the registers (multistranded complexes with information encoded in the pattern of nicks and exposed single-stranded regions). Magnetic beads (blue) labeling of registers allows separation of unreacted instruction strands and displaced reaction products. (C) Notation: Vertically aligned Top and Bottom domains are complementary unless specifically labeled (e.g., $a$ and $a^{*}$ , with “*” indicating complementarity). We assume that domains bind only their complements. Dashed strands share the same sequence as their vertically aligned Bottom strands. (D) Instruction strands induce three types of events: attachment, displacement, and detachment. The toehold exchange reaction in (iv) is considered irreversible since instruction strands are at high concentration. (E) Experimental workflow: Registers are assembled through annealing and attached to magnetic beads, followed by computation. The postcomputation process of ligation and PCR amplification prepares for sequencing readout. Yellow dots indicate Top-Bottom strand mismatches, allowing data readout after ligation erases the nick information (secondary sequence–based encoding).

**Fig. 2.**
Binary counting program on naturally occurring (M13) sequences. (A) Molecular program implementing binary counting. The example register (Top) encoding the initial value updates to the new value (Bottom) after 7 instructions. Strands are labeled by colors: value $1$ (purple), value $0$ (pink), intermediates (others). Solid boxes show the instruction strands and the configuration of the register before reacting. Dashed boxes explain the logical meaning of the instructions. (B) Locations of registers on M13mp18 and the encoding of 0 and 1. In the text, M13. $i$ refers to registers at location choice $i$ . The yellow circle on one of the Top strands encoding the value 1 indicates a single-base mismatch with respect to the Bottom strand. (C) NGS results of SISD binary counting (40 ℃) on M13.8 registers. For each initial value, the distribution of the output values is represented in the heat-map matrix. The lower bar plot shows an example of the distribution of output values in one row (input 1011) of the heat map. (D) NGS results of SIMD binary counting (40 ℃) on M13.8 registers. (E) NGS results of SIMD binary counting (50 ℃) on M13.7 and M13.9 registers in parallel. In the heat maps, black borders indicate the correct output value; the percent value is shown if the output value is at least $25 %$ of all reads for a given sample.

**Fig. 3.**
SIMD cellular automaton Rule 110 computation. (A) Postcomputation process for the Rule 110 program with chemically synthesized DNA. After computation, “seal” strands are added to fill in the gap for cells representing bit $1$ for the following ligation step. Strands complementary to the barcode sequences are added to selectively displace the register with certain initial values. (B) NGS results of SIMD Rule 110 computation (25 ℃) on $16$ registers with unique initial values. The Leftmost and Rightmost cells of the input only have one neighbor; thus, they undergo a modified Rule 110 update which proceeds as if the missing neighbor were $0$ (see *SI Appendix*, section S10 for the implementation of this boundary condition). The registers were composed of chemically synthesized DNA. The correct output value is indicated with a black border; values that appear in at least $25 %$ of all reads for a given sample are printed.

**Fig. 4.**
NGS results for random access: Series of selective retrieval operations following Rule 110 computation and selective erasure after the binary counting computation. (A) Registers with initial value 0011 were accessed first (*Left*), 1001 second (*Middle*), and all remaining values last (*Right*) by adding the query strands corresponding to their barcodes. The query strand concentration was lower than the estimated register concentration; thus, displacement achieved partial extraction of registers. Registers reacted with instruction strands at 25 ℃. (B) The remaining registers after selective erasure are plotted. The query strand concentration was higher than the estimated register concentration; thus, displacement achieved full erasure of registers. The green bars indicate the fraction of the readout registers with the corresponding barcode. Registers reacted with Instruction 1 strands at 40 ℃ and all other instruction strands at 25 ℃. For both (A) and (B), the registers were composed of chemically synthesized DNA.

**Fig. 5.**
Multiple rounds of SIMD computation. (A) Three rounds of Rule 110 computation were performed on an input mix with 5 distinct initial values (columns). *Lower* panels show the distribution of output values for initial values 0001 and 1101. Registers reacted with instruction strands at 25 ℃. The reaction time for every instruction was 2 min except for Instruction 2, which was 10 min. (B) The mix of inputs that underwent selective erasure from Fig. 4B went through another round of binary counting computation. Registers reacted with Instruction 1 strands at 40 ℃, and all other instruction strands at 25 ℃. For (A) and (B), the registers were composed of chemically synthesized DNA.

See this image and copyright information in PMC

References

1. Church G. M., Gao Y., Kosuri S., Next-generation digital information storage in DNA. Science 337, 1628 (2012). - PubMed
1. Ceze L., Nivala J., Strauss K., Molecular digital data storage using DNA. Nat. Rev. Genet. 20, 456–466 (2019). - PubMed
1. Meiser L. C., et al. , Synthetic DNA applications in information technology. Nat. Commun. 13, 1–13 (2022). - PMC - PubMed
1. Organick L., et al. , Random access in large-scale DNA data storage. Nat. Biotechnol. 36, 242–248 (2018). - PubMed
1. S. M. H. T. Yazdi et al., Random-access DNA-based storage system. Sci. Rep. 5, 14138 (2015). - PMC - PubMed

Publication types

Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions

Substances

Actions

LinkOut - more resources

Full Text Sources

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Parallel molecular computation on digital data stored in DNA

Affiliations

Parallel molecular computation on digital data stored in DNA

Authors

Affiliations

Abstract

Conflict of interest statement

Figures

References

Publication types

MeSH terms

Substances

LinkOut - more resources

Full Text Sources