BMC Bioinformatics. 2018 Sep 4;19(1):311.

doi: 10.1186/s12859-018-2319-7.

BrownieAligner: accurate alignment of Illumina sequencing data to de Bruijn graphs

Mahdi Heydari¹,², Giles Miclotte¹,², Yves Van de Peer²,³,⁴,⁵, Jan Fostier⁶,⁷

Affiliations

¹ Department of Information Technology, Ghent University-imec, IDLab, Ghent, B-9052, Belgium.
² Bioinformatics Institute Ghent, Ghent, B-9052, Belgium.
³ Center for Plant Systems Biology, VIB, Ghent, B-9052, Belgium.
⁴ Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, B-9052, Belgium.
⁵ Department of Genetics, Genome Research Institute, University of Pretoria, Pretoria, South Africa.
⁶ Department of Information Technology, Ghent University-imec, IDLab, Ghent, B-9052, Belgium. jan.fostier@ugent.be.
⁷ Bioinformatics Institute Ghent, Ghent, B-9052, Belgium. jan.fostier@ugent.be.

PMID: 30180801
PMCID: PMC6122196
DOI: 10.1186/s12859-018-2319-7

Mahdi Heydari et al. BMC Bioinformatics. 2018.

Abstract

Background: Aligning short reads to a reference genome is an important task in many genome analysis pipelines. This task is computationally more complex when the reference genome is provided in the form of a de Bruijn graph instead of a linear sequence string.

Results: We present a branch and bound alignment algorithm that uses the seed-and-extend paradigm to accurately align short Illumina reads to a graph. Given a seed, the algorithm greedily explores all branches of the tree until the optimal alignment path is found. To reduce the search space we compute upper bounds to the alignment score for each branch and discard the branch if it cannot improve the best solution found so far. Additionally, by using a two-pass alignment strategy and a higher-order Markov model, paths in the de Bruijn graph that do not represent a subsequence in the original reference genome are discarded from the search procedure.

Conclusions: BrownieAligner is applied to both synthetic and real datasets. It generally outperforms other state-of-the-art tools in terms of accuracy, while having similar runtime and memory requirements. Our results show that using the higher-order Markov model in BrownieAligner improves the accuracy, while the branch and bound algorithm reduces runtime. BrownieAligner is written in standard C++11 and released under the GPL license. It relies on multithreading to take advantage of multi-core/multi-CPU systems. The source code is available at: https://github.com/biointec/browniealigner.

Keywords: Graph alignment; Illumina; Markov Model; Next-generation sequencing; de Bruijn Graph.
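
The branch and bound extension described in the Results can be illustrated with a minimal sketch. It is not the BrownieAligner implementation: the scoring scheme (`MATCH`, `MISMATCH`), the ungapped comparison, and the names `score_segment` and `extend` are assumptions made for illustration. Starting from a seed node, the search visits the most promising branch first and discards any branch whose upper bound (current score plus a perfect match on the remaining read) cannot beat the best alignment found so far.

```python
from typing import Dict, List, Tuple

# Hypothetical scoring scheme (assumption, not taken from the paper).
MATCH, MISMATCH = 1, -1


def score_segment(read: str, seq: str) -> int:
    """Ungapped score of `seq` against the start of `read` (shorter length wins)."""
    return sum(MATCH if a == b else MISMATCH for a, b in zip(read, seq))


def extend(graph: Dict[str, List[str]], labels: Dict[str, str],
           seed: str, read: str) -> Tuple[float, List[str]]:
    """Align `read` along paths leaving `seed`; return (best score, best path)."""
    best_score = float("-inf")
    best_path: List[str] = []

    def dfs(node: str, pos: int, score: int, path: List[str]) -> None:
        nonlocal best_score, best_path
        remaining = len(read) - pos
        # Leaf: the read is consumed or the graph has no further branch.
        if remaining == 0 or not graph.get(node):
            if score > best_score:
                best_score, best_path = score, list(path)
            return
        # Bound: even a perfect match of the rest cannot improve the best score.
        if score + remaining * MATCH <= best_score:
            return
        # Greedy order: try the locally best-scoring branch first.
        children = sorted(graph[node],
                          key=lambda c: score_segment(read[pos:], labels[c]),
                          reverse=True)
        for child in children:
            gained = score_segment(read[pos:], labels[child])
            used = min(len(labels[child]), remaining)
            path.append(child)
            dfs(child, pos + used, score + gained, path)
            path.pop()

    dfs(seed, 0, 0, [])
    return best_score, best_path


# Toy graph: seed S branches to A and B; node labels are made up.
graph = {"S": ["A", "B"], "A": ["C"], "B": ["D"]}
labels = {"A": "ACG", "B": "ACT", "C": "TTT", "D": "GGA"}
print(extend(graph, labels, "S", "ACTGGA"))
```

In this toy graph the branch through B is explored first and reaches the maximum score, so the branch through A is pruned by the bound before its successor C is ever scored. A real aligner would also need to handle insertions and deletions, which this sketch omits.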

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
Association between the de Bruijn graph and the Markov model (MM) tables. On the left, part of a de Bruijn graph is shown. True paths are depicted by blue lines. The number inside each node indicates the multiplicity of that node, i.e., the number of times the node’s sequence is present in the reference genome. A table at each node guides the aligner based on previously observed nodes. The 2-MM and 3-MM tables of node A are shown on the right. Based on the 2-MM table, reads that align to CA are guided to E, as the continuation to node D is not allowed. However, the information in this table is insufficient to guide reads that align to BA, since continuations to E and D are both valid. In contrast, the 3-MM table guides reads that align to FBA to D, and reads that align to GBA to E. The information in the final row of the 3-MM table is redundant because it is also contained in the lower-order 2-MM table.

**Fig. 2**
Peak memory usage. Peak memory usage of the aligner tools for simulated datasets

**Fig. 3**
Runtime. Average runtime of tools to align 1M reads for the simulated datasets

**Fig. 4**
Runtime. The effect of branch and bound strategy on the running time of BrownieAligner

**Fig. 5**
Peak memory usage. Peak memory usage of the aligner tools for real datasets

**Fig. 6**
Runtime. Average runtime of tools to align 1M reads for the real datasets
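
The Markov model tables described in the Fig. 1 caption can be sketched as a lookup from a context of recently visited nodes to the set of allowed next nodes. This is an illustrative reading of the caption, not the tool's actual data structure: the function names, the back-off rule, and the node H (added so that the 3-MM table contains a row that the 2-MM table already covers) are assumptions. Here an "n-MM" table uses a context of n nodes, the last of which is the current node.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Table = Dict[Tuple[str, ...], Set[str]]


def build_mm_table(paths: Sequence[Sequence[str]], order: int) -> Table:
    """Map each context of `order` consecutive nodes to the nodes seen right after it."""
    table: Table = defaultdict(set)
    for path in paths:
        for i in range(len(path) - order):
            context = tuple(path[i:i + order])
            table[context].add(path[i + order])
    return table


def allowed_successors(tables: Dict[int, Table], history: Sequence[str],
                       graph: Dict[str, List[str]]) -> Set[str]:
    """Consult the highest-order table with a matching context; else fall back to the graph."""
    for order in sorted(tables, reverse=True):
        if len(history) >= order:
            context = tuple(history[-order:])
            if context in tables[order]:
                return set(tables[order][context])
    # No table applies: every outgoing edge of the current node is allowed.
    return set(graph.get(history[-1], []))


# True paths through node A, loosely following the caption (H is hypothetical).
true_paths = [["F", "B", "A", "D"], ["G", "B", "A", "E"], ["H", "C", "A", "E"]]
graph = {"A": ["D", "E"]}
tables = {2: build_mm_table(true_paths, 2), 3: build_mm_table(true_paths, 3)}

print(allowed_successors(tables, ["C", "A"], graph))       # 2-MM decides: E only
print(allowed_successors(tables, ["B", "A"], graph))       # 2-MM is ambiguous: D and E
print(allowed_successors(tables, ["F", "B", "A"], graph))  # 3-MM decides: D only
```

The 3-MM row for the context H, C, A adds nothing beyond the 2-MM row for C, A, which mirrors the redundancy noted in the caption.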

Cited by

From the reference human genome to human pangenome: Premise, promise and challenge.
Singh V, Pandey S, Bhardwaj A. Front Genet. 2022 Nov 10;13:1042550. doi: 10.3389/fgene.2022.1042550. eCollection 2022. PMID: 36437921. Free PMC article.
Plant graph-based pangenomics: techniques, applications, and challenges.
Du ZZ, He JB, Jiao WB. aBIOTECH. 2025 Mar 28;6(2):361-376. doi: 10.1007/s42994-025-00206-7. eCollection 2025 Jun. PMID: 40641648. Free PMC article. Review.
Pan-genome de Bruijn graph using the bidirectional FM-index.
Depuydt L, Renders L, Abeel T, Fostier J. BMC Bioinformatics. 2023 Oct 26;24(1):400. doi: 10.1186/s12859-023-05531-6. PMID: 37884897. Free PMC article.
Label-guided seed-chain-extend alignment on annotated De Bruijn graphs.
Mustafa H, Karasikov M, Mansouri Ghiasi N, Rätsch G, Kahles A. Bioinformatics. 2024 Jun 28;40(Suppl 1):i337-i346. doi: 10.1093/bioinformatics/btae226. PMID: 38940164. Free PMC article.
The Human Pangenome Project: a global resource to map genomic diversity.
Wang T, Antonacci-Fulton L, Howe K, Lawson HA, Lucas JK, Phillippy AM, Popejoy AB, Asri M, Carson C, Chaisson MJP, Chang X, Cook-Deegan R, Felsenfeld AL, Fulton RS, Garrison EP, Garrison NA, Graves-Lindsay TA, Ji H, Kenny EE, Koenig BA, Li D, Marschall T, McMichael JF, Novak AM, Purushotham D, Schneider VA, Schultz BI, Smith MW, Sofia HJ, Weissman T, Flicek P, Li H, Miga KH, Paten B, Jarvis ED, Hall IM, Eichler EE, Haussler D; Human Pangenome Reference Consortium. Nature. 2022 Apr;604(7906):437-446. doi: 10.1038/s41586-022-04601-8. Epub 2022 Apr 20. PMID: 35444317. Free PMC article. Review.


Grants and funding

G0C3914N/The Research Foundation - Flanders (FWO)
