Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Review
. 2025 Aug;32(8):761-780.
doi: 10.1089/cmb.2024.0879. Epub 2025 May 30.

Enhancing Data Compression: Recent Innovations in LZ77 Algorithms

Affiliations
Review

Enhancing Data Compression: Recent Innovations in LZ77 Algorithms

Aaron Hong et al. J Comput Biol. 2025 Aug.

Abstract

The growing volume of genomic data, driven by advances in sequencing technologies, demands efficient data compression solutions. Traditional algorithms, such as Lempel-Ziv77 (LZ77), have been fundamental in offering lossless compression, yet they often fall short when applied to the highly repetitive structures typical of genomic sequences. This review explores the evolution of LZ77 and its adaptations for genomic data compression, highlighting specialized algorithms designed to handle redundancy in large-scale sequencing datasets efficiently. Innovations in this field have enhanced compression ratios and processing efficiencies leveraging intrinsic redundancy within genomic datasets. We critically examine a spectrum of LZ77-based algorithms, including newer adaptations for external and semi-external memory settings, and contrast their efficacy in managing large-scale genomic data. We conducted experiments to evaluate the performance of several algorithms, including KKP2, RLE-LZ, SE-KKP, BGone, and PFP-LZ77, on both real-world datasets from the Pizza&Chili repetitive corpus, Salmonella genomes, and human chromosome 19 genomes. These results underscore the trade-offs between time and memory consumption between algorithms. This article aims to provide a comprehensive guide on the current landscape and future directions of data compression technologies, equipping bioinformaticians and other practitioners with insight to tackle the escalating data challenges in genomics and beyond.

Keywords: Burrows–Wheeler transform; LZ77; data compression; data structures; suffix array.

PubMed Disclaimer

References

    1. Bingmann T, Fischer J, Osipov V, et al. Inducing Suffix and LCP Arrays in External Memory. ACM J Exp Algorithmics 2016;21:1–27.
    1. Boucher C, et al. PFP Compressed Suffix Trees. In: Proceedings of the SIAM Symposium on Algorithm Engineering and Experiments (ALENEX). 2021; pp. 60–72. - PMC - PubMed
    1. Büren F, et al. Suffix Array Construction on Multi-GPU Systems. In: Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing. 2019; pp. 183–194.
    1. Burrows M, Wheeler DJ. A Block-Sorting Lossless Data Compression Algorithm. Technical Report 0769518966, Digital Equipment Corporation; 1994.
    1. Castelli R, et al. Lossless compression of nanopore sequencing raw signals. In: Proceedings of the Bioinformatics and Biomedical Engineering (IWBBIO). Springer Nature; 2024; pp. 130–141.

LinkOut - more resources