Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Jul 4:PP.
doi: 10.1109/TCBBIO.2025.3586008. Online ahead of print.

Stable DNA Storage Encoding Scheme Based on Repeating Substring Tree

Stable DNA Storage Encoding Scheme Based on Repeating Substring Tree

Jieqiong Wu et al. IEEE Trans Comput Biol Bioinform. .

Abstract

DNA storage is considered to be a promising storage media in the current era of data explosion. DNA encoding is the beginning of the DNA storage process and lays the foundation for subsequent processes. However, many encoding methods suffer from low encoding rate, do not satisfy important constraints, or have insufficient sequence stability. To address these issues and improved sequences stability, this paper proposes a novel approach called the Repeating Substring Tree Encoding (RSTE) method. The method begins by applying the Longest Substring Backtracking Method (LSBM) to identify the longest repeated substrings within the binary file. These substrings are then encoded into compact DNA motifs using Huffman encoding. In contrast to the ideal coding density of 2 bits per nucleotide (2 bit/nt) targeted by previous studies, RSTE enhances the encoding rate by 13% through efficient utilization of repeated substrings. Furthermore, the DNA sequences generated by the RSTE method successfully meet three biological constraints: run-length limitation, GC content balance and end constraints. The experimental results of minimum free energy and melting temperature indicate that the stability of the sequences encoded by RSTE is also greatly improved. A series of experiments showed that the sequences encoded by RSTE have a higher coding rate, satisfy constraints, and are more stable.

PubMed Disclaimer

Similar articles

LinkOut - more resources