On the Hardness of Sequence Alignment on De Bruijn Graphs
- PMID: 36450127
- DOI: 10.1089/cmb.2022.0411
On the Hardness of Sequence Alignment on De Bruijn Graphs
Abstract
The problem of aligning a sequence to a walk in a labeled graph is of fundamental importance to Computational Biology. For an arbitrary graph and a pattern P of length m, a lower bound based on the Strong Exponential Time Hypothesis implies that an algorithm for finding a walk in G exactly matching P significantly faster than time is unlikely. However, for many special graphs, such as de Bruijn graphs, the problem can be solved in linear time. For approximate matching, the picture is more complex. When edits (substitutions, insertions, and deletions) are only allowed to the pattern, or when the graph is acyclic, the problem is solvable in time. When edits are allowed to arbitrary cyclic graphs, the problem becomes NP-complete, even on binary alphabets. Moreover, NP-completeness continues to hold even when edits are restricted to only substitutions. Despite the popularity of the de Bruijn graphs in Computational Biology, the complexity of approximate pattern matching on the de Bruijn graphs remained unknown. We investigate this problem and show that the properties that make the de Bruijn graphs amenable to efficient exact pattern matching do not extend to approximate matching, even when restricted to the substitutions only case with alphabet size four. Specifically, we prove that determining the existence of a matching walk in a de Bruijn graph is NP-complete when substitutions are allowed to the graph. We also demonstrate that an algorithm significantly faster than is unlikely for the de Bruijn graphs in the case where substitutions are only allowed to the pattern. This stands in contrast to pattern-to-text matching where exact matching is solvable in linear time, such as on the de Bruijn graphs, but approximate matching under substitutions is solvable in subquadratic time, where n is the text's length.
Keywords: approximate pattern matching; computational complexity; de Bruijn graphs; sequence alignment.
Similar articles
-
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.BMC Bioinformatics. 2010 Nov 15;11:560. doi: 10.1186/1471-2105-11-560. BMC Bioinformatics. 2010. PMID: 21078174 Free PMC article.
-
AN EFFICIENT ALGORITHM FOR CHINESE POSTMAN WALK ON BI-DIRECTED DE BRUIJN GRAPHS.Discrete Math Algorithms Appl. 2010;1:184-196. doi: 10.1007/978-3-642-17458-2_16. Discrete Math Algorithms Appl. 2010. PMID: 25364472 Free PMC article.
-
Bit-parallel sequence-to-graph alignment.Bioinformatics. 2019 Oct 1;35(19):3599-3607. doi: 10.1093/bioinformatics/btz162. Bioinformatics. 2019. PMID: 30851095 Free PMC article.
-
Pan-genome de Bruijn graph using the bidirectional FM-index.BMC Bioinformatics. 2023 Oct 26;24(1):400. doi: 10.1186/s12859-023-05531-6. BMC Bioinformatics. 2023. PMID: 37884897 Free PMC article.
-
Applications of de Bruijn graphs in microbiome research.Imeta. 2022 Mar 1;1(1):e4. doi: 10.1002/imt2.4. eCollection 2022 Mar. Imeta. 2022. PMID: 38867733 Free PMC article. Review.
Cited by
-
Label-guided seed-chain-extend alignment on annotated De Bruijn graphs.Bioinformatics. 2024 Jun 28;40(Suppl 1):i337-i346. doi: 10.1093/bioinformatics/btae226. Bioinformatics. 2024. PMID: 38940164 Free PMC article.
-
Chaining for accurate alignment of erroneous long reads to acyclic variation graphs.Bioinformatics. 2023 Aug 1;39(8):btad460. doi: 10.1093/bioinformatics/btad460. Bioinformatics. 2023. PMID: 37494467 Free PMC article.
-
Pangenome comparison via ED strings.Front Bioinform. 2024 Sep 26;4:1397036. doi: 10.3389/fbinf.2024.1397036. eCollection 2024. Front Bioinform. 2024. PMID: 39391331 Free PMC article.
-
Haplotype-aware sequence alignment to pangenome graphs.Genome Res. 2024 Oct 11;34(9):1265-1275. doi: 10.1101/gr.279143.124. Genome Res. 2024. PMID: 39013594 Free PMC article.
Publication types
MeSH terms
LinkOut - more resources
Full Text Sources
Miscellaneous