A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

Timo Beller¹, Enno Ohlebusch¹

Affiliations

PMID: 27437028
PMCID: PMC4950428
DOI: 10.1186/s13015-016-0083-7

A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

Timo Beller et al. Algorithms Mol Biol. 2016.

. 2016 Jul 18:11:20.

doi: 10.1186/s13015-016-0083-7. eCollection 2016.

Authors

Timo Beller¹, Enno Ohlebusch¹

Affiliation

¹ Institute of Theoretical Computer Science, Ulm University, James-Franck-Ring O27/537, 89069 Ulm, Germany.

PMID: 27437028
PMCID: PMC4950428
DOI: 10.1186/s13015-016-0083-7

Erratum in

Erratum to: A representation of a compressed de Bruijn graph for pan-genome analysis that enables search.
Beller T, Ohlebusch E. Beller T, et al. Algorithms Mol Biol. 2016 Nov 28;11:28. doi: 10.1186/s13015-016-0090-8. eCollection 2016. Algorithms Mol Biol. 2016. PMID: 27933096 Free PMC article.

Abstract

Background: Recently, Marcus et al. (Bioinformatics 30:3476-83, 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an [Formula: see text] time algorithm called splitMEM that constructs this graph directly (i.e., without using the uncompressed de Bruijn graph) based on a suffix tree, where n is the total length of the genomes and g is the length of the longest genome. Baier et al. (Bioinformatics 32:497-504, 2016) improved their result.

Results: In this paper, we propose a new space-efficient representation of the compressed de Bruijn graph that adds the possibility to search for a pattern (e.g. an allele-a variant form of a gene) within the pan-genome. The ability to search within the pan-genome graph is of utmost importance and is a design goal of pan-genome data structures.

Keywords: Backward search; Burrows–Wheeler transform; Compressed de Bruijn graph; Pan-genome analysis.

PubMed Disclaimer

Figures

**Fig. 1**
The de Bruijn graph for $k = 3$ and the string ACTACGTACGTACG$ is shown on the *left*, while its compressed counterpart is shown on the *right*

**Fig. 2**
Explicit representation of the compressed de Bruijn graph from Fig. 1

**Fig. 3**
Implicit representation of the compressed de Bruijn graph from Fig. 1

**Fig. 4**
The string $ω$ must be split if the length k prefix of $c ω$ is a *right-maximal* repeat or the length k prefix of $ω$ is a *left-maximal* repeat

**Fig. 5**
The string $ω$ has u as suffix and u has $P [m - k + 1 . . m]$ as prefix

See this image and copyright information in PMC

References

1. Schneeberger K, Hagmann J, Ossowski S, Warthmann N, Gesing S, Kohlbacher O, Weigel D. Simultaneous alignment of short reads against multiple genomes. Genome Biol. 2009;10(9):98. doi: 10.1186/gb-2009-10-9-r98. - DOI - PMC - PubMed
1. Huang L, Popic V, Batzoglou S. Short read alignment with populations of genomes. Bioinformatics. 2013;29(13):361–370. doi: 10.1093/bioinformatics/btt215. - DOI - PMC - PubMed
1. Rahn R, Weese D, Reinert K. Journaled string tree-a scalable data structure for analyzing thousands of similar genomes on your laptop. Bioinformatics. 2014;30(24):3499–3505. doi: 10.1093/bioinformatics/btu438. - DOI - PubMed
1. Dilthey A, Cox C, Iqbal Z, Nelson MR, McVean G. Improved genome inference in the MHC using a population reference graph. Nat Genet. 2015;47(6):682–688. doi: 10.1038/ng.3257. - DOI - PMC - PubMed
1. Marcus S, Lee H, Schatz MC. SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips. Bioinformatics. 2014;30(24):3476–3483. doi: 10.1093/bioinformatics/btu756. - DOI - PMC - PubMed

LinkOut - more resources

Full Text Sources
Other Literature Sources
- scite Smart Citations

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

Affiliation

A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

Authors

Affiliation

Erratum in

Abstract

Figures

References

LinkOut - more resources

Full Text Sources

Other Literature Sources