Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2014;15 Suppl 9(Suppl 9):S5.
doi: 10.1186/1471-2105-15-S9-S5. Epub 2014 Sep 10.

On the complexity of Minimum Path Cover with Subpath Constraints for multi-assembly

On the complexity of Minimum Path Cover with Subpath Constraints for multi-assembly

Romeo Rizzi et al. BMC Bioinformatics. 2014.

Abstract

Background: Multi-assembly problems have gathered much attention in the last years, as Next-Generation Sequencing technologies have started being applied to mixed settings, such as reads from the transcriptome (RNA-Seq), or from viral quasi-species. One classical model that has resurfaced in many multi-assembly methods (e.g. in Cufflinks, ShoRAH, BRANCH, CLASS) is the Minimum Path Cover (MPC) Problem, which asks for the minimum number of directed paths that cover all the nodes of a directed acyclic graph. The MPC Problem is highly popular because the acyclicity of the graph ensures its polynomial-time solvability.

Results: In this paper, we consider two generalizations of it dealing with integrating constraints arising from long reads or paired-end reads; these extensions have also been considered by two recent methods, but not fully solved. More specifically, we study the two problems where also a set of subpaths, or pairs of subpaths, of the graph have to be entirely covered by some path in the MPC. We show that in the case of long reads (subpaths), the generalized problem can be solved in polynomial-time by a reduction to the classical MPC Problem. We also consider the weighted case, and show that it can be solved in polynomial-time by a reduction to a min-cost circulation problem. As a side result, we also improve the time complexity of the classical minimum weight MPC Problem. In the case of paired-end reads (pairs of subpaths), the generalized problem becomes NP-hard, but we show that it is fixed-parameter tractable (FPT) in the total number of constraints. This computational dichotomy between long reads and paired-end reads is also a general insight into multi-assembly problems.

PubMed Disclaimer

Figures

Figure 1
Figure 1
In Fig. 1(a), an input DAG G. In Fig. 1(b), the reduction to the maximum matching problem: a bipartite graph B(G) having as vertices two copies of V(G) and an edge between the first copy of v1 ∈ V(G) and the second copy of v2 ∈ V(G) iff there is a directed path in G from v1 to v2. The edges of a maximum matching of B(G) are highlighted, and a MPC for G is obtained by putting v1 and v2 in the same path there is an edge between v1 and v2 and is selected by the maximum matching. In Fig. 1(c), a network flow N(G) corresponding to G; the labels '1' on some edges are the lower bounds on that edges; all other edges have lower bound 0. The min-flow on N(G) has value 2; the edges with flow value 1 are highlighted; any decomposition of this flow into paths gives a MPC.
Figure 2
Figure 2
In Fig. 2(a), subpath constraint P , in Fig. 2(b) the reduction of [20] which replaces path P by node vP connected to the end points of P , removes all internal nodes of P , and adds all transitive edges from and to v1 and v2. In Fig. 2(c), a case not covered by the reduction in [20].
Figure 3
Figure 3
A visual proof of Lemma 1.
Figure 4
Figure 4
In Fig. 4(a), an input DAG G with two subpath constraints P1 and P2; we take. V=VG, E=, S = {a, d, e} and T = {f, c}; weights are not drawn. In Fig. 4(b), the graph transformed by Steps 1-6; the vertices still in V are drawn as circles, other vertices as squares. In Fig. 4(c), the reduction to a min-cost circulation problem; the edges with flow lower bound 1 are labeled as '1'; other edges have flow lower bound 0. In a min-cost circulation of value 3, all highlighted edges have flow value 1, except for (f, t) with flow value 2, and (t, s) with value 3. Any decomposition of the min-cost circulation into 3 paths gives the solution for Problem MW-MPC-SC.
Figure 5
Figure 5
A reduction from chromatic number 3 to the MPC-PSC Problem.

Similar articles

Cited by

References

    1. Xing Y. et al.The multiassembly problem: reconstructing multiple transcript isoforms from EST fragment mixtures. Genome Research. 2004;14(3):426–441. doi: 10.1101/gr.1304504. - DOI - PMC - PubMed
    1. Mortazavi A. et al.Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008;5:621–628. doi: 10.1038/nmeth.1226. - DOI - PubMed
    1. Pepke S, Wold B, Mortazavi A. Computation for ChIP-seq and RNA-seq studies. Nature methods. 2009;6(11):22–32. - PMC - PubMed
    1. Kim E, Goren A, Ast G. Insights into the connection between cancer and alternative splicing. Trends in genetics: TIG. 2008;24(1):7–10. doi: 10.1016/j.tig.2007.10.001. - DOI - PubMed
    1. Lopez-Bigas N, Audit B, Ouzounis C, Parra G, Guigo R. Are splicing mutations the most frequent cause of hereditary disease? FEBS Letters. 2005;579(9):1900–1903. doi: 10.1016/j.febslet.2005.02.047. - DOI - PubMed

Publication types

MeSH terms