Simplitigs as an efficient and scalable representation of de Bruijn graphs
- PMID: 33823902
- PMCID: PMC8025321
- DOI: 10.1186/s13059-021-02297-z
Abstract
de Bruijn graphs play an essential role in bioinformatics, yet they lack a universal scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable representation, and ProphAsm, a fast algorithm for their computation. Using assemblies of model organisms and two bacterial pan-genomes as examples, we compare simplitigs to unitigs, the best existing representation, and demonstrate that simplitigs provide a substantial improvement in both the cumulative sequence length and the number of sequences. When combined with the commonly used Burrows-Wheeler Transform index, simplitigs reduce memory usage as well as index loading and query times, as demonstrated with large-scale examples of GenBank bacterial pan-genomes.
Keywords: Data compression; Indexing; Pan-genomes; Scalability; Sequence analysis; Simplitigs; Storage; de Bruijn graph representation; de Bruijn graphs; k-mers.
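The abstract does not spell out how simplitigs are computed. As a rough illustration only, assume that simplitigs are a set of sequences that together contain every k-mer of the input exactly once, with a k-mer and its reverse complement treated as the same k-mer. Under that assumption, a greedy procedure in the spirit of ProphAsm might look like the sketch below. All function names are hypothetical, and this is not the ProphAsm implementation.

```python
# Minimal sketch of greedy simplitig computation (assumption: k >= 2,
# input over the alphabet ACGT, k-mers identified with their reverse
# complements). Function names are illustrative, not from ProphAsm.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def kmer_set(sequences, k):
    """Collect the canonical k-mers of all input sequences."""
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(canonical(seq[i:i + k]))
    return kmers


def extend_forward(simplitig, remaining, k):
    """Append bases while an unused k-mer overlaps the last k-1 bases."""
    while True:
        suffix = simplitig[-(k - 1):]
        for base in "ACGT":
            candidate = canonical(suffix + base)
            if candidate in remaining:
                remaining.remove(candidate)
                simplitig += base
                break
        else:
            # No unused k-mer extends the current end.
            return simplitig


def greedy_simplitigs(sequences, k):
    """Greedily cover the k-mer set with sequences that use each k-mer once."""
    if k < 2:
        raise ValueError("this sketch assumes k >= 2")
    kmers = kmer_set(sequences, k)
    remaining = set(kmers)
    result = []
    for seed in sorted(kmers):  # sorted only to make the output reproducible
        if seed not in remaining:
            continue
        remaining.remove(seed)
        simplitig = extend_forward(seed, remaining, k)
        # Extend backward by extending the reverse complement forward.
        simplitig = reverse_complement(
            extend_forward(reverse_complement(simplitig), remaining, k)
        )
        result.append(simplitig)
    return result


if __name__ == "__main__":
    reads = ["ACGTACGGATTACA", "GGATTACAGT"]
    k = 5
    simplitigs = greedy_simplitigs(reads, k)
    print("number of sequences:", len(simplitigs))
    print("cumulative sequence length:", sum(len(s) for s in simplitigs))
```

In this sketch, the two quantities printed at the end correspond to the number of sequences and the cumulative sequence length mentioned in the abstract. Because each k-mer is used exactly once, the cumulative length equals the number of distinct k-mers plus (k - 1) times the number of sequences, so fewer sequences means a shorter representation.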
Conflict of interest statement
The authors declare that they have no competing interests.