3 Biotech. 2019 Sep;9(9):342.
doi: 10.1007/s13205-019-1868-4. Epub 2019 Aug 24.

A high storage density strategy for digital information based on synthetic DNA

Shufang Zhang et al. 3 Biotech. 2019 Sep.

Abstract

DNA has been recognized as a promising natural medium for information storage. Because DNA synthesis is expensive, using nucleotides efficiently and increasing the storage density is an important challenge. Thus, a novel scheme is proposed for storing digital information in synthetic DNA with high storage density and perfect error correction capability. The proposed strategy uses quaternary Huffman coding to compress the binary stream of an original file before it is converted into a DNA sequence. The quaternary Huffman coding is based on the statistical properties of the source and can achieve a very high compression ratio for files whose source symbols have a non-uniform probability distribution. Consequently, the amount of information that each base can store increases, and the storage density improves. In addition, a quaternary Hamming code with low redundancy is proposed to correct errors that occur during synthesis and sequencing. We have successfully converted a total of 5.2 KB of files into 3934 bits in DNA bases. The results of the biological experiment indicate that the storage density of the proposed scheme is higher than that of state-of-the-art schemes.

Keywords: DNA information storage; Quaternary Huffman code; Storage density; Synthetic DNA.
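
The quaternary Huffman step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the source symbols are whole bytes (the paper varies the number of "scanning bits"), it assumes the digits 0 to 3 map to the bases A, C, G, T in that order, and the function names quaternary_huffman, encode, and decode are hypothetical. It omits the quaternary Hamming code and any biochemical constraints on the sequence.

```python
import heapq
from collections import Counter
from itertools import count

BASES = "ACGT"  # assumed digit-to-base mapping, not taken from the paper


def quaternary_huffman(data):
    """Build a 4-ary Huffman table mapping each byte value to a string of bases."""
    freq = Counter(data)
    if len(freq) == 1:
        # Degenerate source with one distinct symbol: give it a single base.
        return {next(iter(freq)): BASES[0]}
    tie = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    # A full 4-ary tree needs (leaves - 1) divisible by 3; pad with
    # zero-weight dummy leaves (symbol None) until that holds.
    while (len(heap) - 1) % 3 != 0:
        heap.append((0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the four lightest nodes into one internal node (a list).
        children = [heapq.heappop(heap) for _ in range(4)]
        total = sum(c[0] for c in children)
        heapq.heappush(heap, (total, next(tie), [c[2] for c in children]))

    table = {}

    def walk(node, prefix):
        if isinstance(node, list):       # internal node: one branch per base
            for digit, child in enumerate(node):
                walk(child, prefix + BASES[digit])
        elif node is not None:           # real leaf; dummy leaves are skipped
            table[node] = prefix

    walk(heap[0][2], "")
    return table


def encode(data, table):
    """Concatenate the base string of every byte."""
    return "".join(table[b] for b in data)


def decode(dna, table):
    """Invert encode() by reading bases until a complete codeword is seen."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = bytearray(), ""
    for base in dna:
        current += base
        if current in inverse:           # valid because the code is prefix-free
            out.append(inverse[current])
            current = ""
    return bytes(out)


if __name__ == "__main__":
    sample = b"aaaaaaaabbbbccd"
    table = quaternary_huffman(sample)
    dna = encode(sample, table)
    print(len(sample) * 4, "bases with a fixed 2-bit-per-base mapping")
    print(len(dna), "bases after quaternary Huffman coding")
    assert decode(dna, table) == sample
```

For this 15-byte sample with four distinct symbols, every symbol receives a one-base codeword, so the sketch emits 15 bases where a fixed mapping of two bits per base would need 60. The gain shrinks as the symbol distribution becomes more uniform, which matches the abstract's remark about non-uniform sources. The code table itself must also be stored or transmitted for decoding, a cost this sketch does not count.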

Conflict of interest statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Figures

Fig. 1. The framework of the proposed DNA information storage strategy
Fig. 2. The proposed DNA encoding process
Fig. 3. An example of quaternary Huffman coding with eight source symbols
Fig. 4. The process of quaternary Huffman coding
Fig. 5. Compression ratios of Song of Mulan, circbw.tif, sp01.wav, and file.rar with different scanning bits
Fig. 6. (a) Coding time of Song of Mulan, circbw.tif, sp01.wav, and file.rar with different scanning bits. (b) Coding time of Song of Mulan and circbw.tif with different scanning bits
Fig. 7. Input data for the biological experiment: (a) Song of Mulan, (b) Twinkle Twinkle Little Star, and (c) circbw.tif
Fig. 8. Biological experiment: (a) DNA pool, (b) the electrophoretogram of enzyme samples, and (c) marker
Fig. 9. Theoretical and actual storage densities of previous storage schemes and the proposed scheme

