On the complexity of Minimum Path Cover with Subpath Constraints for multi-assembly
- PMID: 25252805
- PMCID: PMC4168716
- DOI: 10.1186/1471-2105-15-S9-S5
On the complexity of Minimum Path Cover with Subpath Constraints for multi-assembly
Abstract
Background: Multi-assembly problems have gathered much attention in the last years, as Next-Generation Sequencing technologies have started being applied to mixed settings, such as reads from the transcriptome (RNA-Seq), or from viral quasi-species. One classical model that has resurfaced in many multi-assembly methods (e.g. in Cufflinks, ShoRAH, BRANCH, CLASS) is the Minimum Path Cover (MPC) Problem, which asks for the minimum number of directed paths that cover all the nodes of a directed acyclic graph. The MPC Problem is highly popular because the acyclicity of the graph ensures its polynomial-time solvability.
Results: In this paper, we consider two generalizations of it dealing with integrating constraints arising from long reads or paired-end reads; these extensions have also been considered by two recent methods, but not fully solved. More specifically, we study the two problems where also a set of subpaths, or pairs of subpaths, of the graph have to be entirely covered by some path in the MPC. We show that in the case of long reads (subpaths), the generalized problem can be solved in polynomial-time by a reduction to the classical MPC Problem. We also consider the weighted case, and show that it can be solved in polynomial-time by a reduction to a min-cost circulation problem. As a side result, we also improve the time complexity of the classical minimum weight MPC Problem. In the case of paired-end reads (pairs of subpaths), the generalized problem becomes NP-hard, but we show that it is fixed-parameter tractable (FPT) in the total number of constraints. This computational dichotomy between long reads and paired-end reads is also a general insight into multi-assembly problems.
Figures





Similar articles
-
MIDAS: Mining differentially activated subpaths of KEGG pathways from multi-class RNA-seq data.Methods. 2017 Jul 15;124:13-24. doi: 10.1016/j.ymeth.2017.05.026. Epub 2017 Jun 1. Methods. 2017. PMID: 28579402
-
Explaining a Weighted DAG with Few Paths for Solving Genome-Guided Multi-Assembly.IEEE/ACM Trans Comput Biol Bioinform. 2015 Nov-Dec;12(6):1345-54. doi: 10.1109/TCBB.2015.2418753. IEEE/ACM Trans Comput Biol Bioinform. 2015. PMID: 26671806
-
Safety in Multi-Assembly via Paths Appearing in All Path Covers of a DAG.IEEE/ACM Trans Comput Biol Bioinform. 2022 Nov-Dec;19(6):3673-3684. doi: 10.1109/TCBB.2021.3131203. Epub 2022 Dec 8. IEEE/ACM Trans Comput Biol Bioinform. 2022. PMID: 34847041
-
Flow Decomposition With Subpath Constraints.IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):360-370. doi: 10.1109/TCBB.2022.3147697. Epub 2023 Feb 3. IEEE/ACM Trans Comput Biol Bioinform. 2023. PMID: 35104222
-
The present and future of de novo whole-genome assembly.Brief Bioinform. 2018 Jan 1;19(1):23-40. doi: 10.1093/bib/bbw096. Brief Bioinform. 2018. PMID: 27742661 Review.
Cited by
-
Evaluating approaches to find exon chains based on long reads.Brief Bioinform. 2018 May 1;19(3):404-414. doi: 10.1093/bib/bbw137. Brief Bioinform. 2018. PMID: 28069635 Free PMC article.
-
Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq.PLoS Comput Biol. 2017 Nov 27;13(11):e1005851. doi: 10.1371/journal.pcbi.1005851. eCollection 2017 Nov. PLoS Comput Biol. 2017. PMID: 29176847 Free PMC article.
-
CircAST: Full-length Assembly and Quantification of Alternatively Spliced Isoforms in Circular RNAs.Genomics Proteomics Bioinformatics. 2019 Oct;17(5):522-534. doi: 10.1016/j.gpb.2019.03.004. Epub 2020 Jan 31. Genomics Proteomics Bioinformatics. 2019. PMID: 32007626 Free PMC article.
References
Publication types
MeSH terms
LinkOut - more resources
Full Text Sources
Other Literature Sources
Research Materials
Miscellaneous