Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2016 Jul 18:11:20.
doi: 10.1186/s13015-016-0083-7. eCollection 2016.

A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

Affiliations

A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

Timo Beller et al. Algorithms Mol Biol. .

Erratum in

Abstract

Background: Recently, Marcus et al. (Bioinformatics 30:3476-83, 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an [Formula: see text] time algorithm called splitMEM that constructs this graph directly (i.e., without using the uncompressed de Bruijn graph) based on a suffix tree, where n is the total length of the genomes and g is the length of the longest genome. Baier et al. (Bioinformatics 32:497-504, 2016) improved their result.

Results: In this paper, we propose a new space-efficient representation of the compressed de Bruijn graph that adds the possibility to search for a pattern (e.g. an allele-a variant form of a gene) within the pan-genome. The ability to search within the pan-genome graph is of utmost importance and is a design goal of pan-genome data structures.

Keywords: Backward search; Burrows–Wheeler transform; Compressed de Bruijn graph; Pan-genome analysis.

PubMed Disclaimer

Figures

Fig. 1
Fig. 1
The de Bruijn graph for k=3 and the string ACTACGTACGTACG$ is shown on the left, while its compressed counterpart is shown on the right
Fig. 2
Fig. 2
Explicit representation of the compressed de Bruijn graph from Fig. 1
Fig. 3
Fig. 3
Implicit representation of the compressed de Bruijn graph from Fig. 1
Fig. 4
Fig. 4
The string ω must be split if the length k prefix of cω is a right-maximal repeat or the length k prefix of ω is a left-maximal repeat
Fig. 5
Fig. 5
The string ω has u as suffix and u has P[m-k+1..m] as prefix

References

    1. Schneeberger K, Hagmann J, Ossowski S, Warthmann N, Gesing S, Kohlbacher O, Weigel D. Simultaneous alignment of short reads against multiple genomes. Genome Biol. 2009;10(9):98. doi: 10.1186/gb-2009-10-9-r98. - DOI - PMC - PubMed
    1. Huang L, Popic V, Batzoglou S. Short read alignment with populations of genomes. Bioinformatics. 2013;29(13):361–370. doi: 10.1093/bioinformatics/btt215. - DOI - PMC - PubMed
    1. Rahn R, Weese D, Reinert K. Journaled string tree-a scalable data structure for analyzing thousands of similar genomes on your laptop. Bioinformatics. 2014;30(24):3499–3505. doi: 10.1093/bioinformatics/btu438. - DOI - PubMed
    1. Dilthey A, Cox C, Iqbal Z, Nelson MR, McVean G. Improved genome inference in the MHC using a population reference graph. Nat Genet. 2015;47(6):682–688. doi: 10.1038/ng.3257. - DOI - PMC - PubMed
    1. Marcus S, Lee H, Schatz MC. SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips. Bioinformatics. 2014;30(24):3476–3483. doi: 10.1093/bioinformatics/btu756. - DOI - PMC - PubMed

LinkOut - more resources