Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Nov;29(11):1252-1267.
doi: 10.1089/cmb.2022.0257. Epub 2022 Oct 18.

Efficient Minimum Flow Decomposition via Integer Linear Programming

Affiliations

Efficient Minimum Flow Decomposition via Integer Linear Programming

Fernando H C Dias et al. J Comput Biol. 2022 Nov.

Abstract

Minimum flow decomposition (MFD) is an NP-hard problem asking to decompose a network flow into a minimum set of paths (together with associated weights). Variants of it are powerful models in multiassembly problems in Bioinformatics, such as RNA assembly. Owing to its hardness, practical multiassembly tools either use heuristics or solve simpler, polynomial time-solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Here, we provide the first fast and exact solver for MFD on acyclic flow networks, based on Integer Linear Programming (ILP). Key to our approach is an encoding of all the exponentially many solution paths using only a quadratic number of variables. We also extend our ILP formulation to many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. On both simulated and real-flow splicing graphs, our approach solves any instance in <13 seconds. We hope that our formulations can lie at the core of future practical RNA assembly tools. Our implementations are freely available on Github.

Keywords: flow decomposition; integer linear programming; multiassembly and RNA assembly; network flow.

PubMed Disclaimer

Conflict of interest statement

The authors declare they have no conflicting financial interests.

Figures

FIG. 1.
FIG. 1.
Example of a flow network and of two FDs of it. (a) A flow network. (b) A 3-flow decomposition into paths of weights (4, 2, 7). (c) A 4-flow decomposition into paths of weights (4, 2, 6, 1). FD, flow decomposition.
FIG. 2.
FIG. 2.
Example of the edge variables of the i-th path, satisfying Equations (3a–3c).
FIG. 3.
FIG. 3.
The flow network from Figure 1 with a subpath constraint (which is satisfied by the 4-FD from Fig. 1c, but not by the one in Fig. 1b), and example of a path satisfying the constraint. (a) A flow network with a single subpath constraint R1 = (a, c, t). (b) Constraint R1 is satisfied because for the i-th path we can set ri1 = 1 [and satisfy Eq. (8b)] so that xaci + xcti ≥ 2ri1 holds [and satisfy Eq. (8a)].

References

    1. Ahuja RK, Magnanti TL, Orlin JB. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, Inc., New Jersey, 1993.
    1. Amarasinghe SL, Su S, Dong X, et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol 2020;21(1):30; doi: 10.1186/s13059-020-1935-5. - DOI - PMC - PubMed
    1. Baaijens JA, Stougie L, Schönhuth A. Strain-aware assembly of genomes from mixed samples using flow variation graphs. In: Schwartz, R., ed. Research in Computational Molecular Biology. RECOMB 2020. Lecture Notes in Computer Science, vol 12074. Springer, Cham. 2020; pp. 221–222. 10.1007/978-3-030-45257-5_14 - DOI
    1. Baaijens JA, Van der Roest B, Köster J, et al. Full-length de novo viral quasispecies assembly through variation graph construction. Bioinformatics 2019;35(24):5086–5094; doi: 10.1093/bioinformatics/btz443. - DOI - PubMed
    1. Bernard E, Jacob L, Mairal J, et al. Efficient RNA isoform identification and quantification from RNA-Seq data with network flows. Bioinformatics 2014;30(17):2447–2455; doi: 10.1093/bioinformatics/btu317. - DOI - PMC - PubMed

Publication types

LinkOut - more resources