. 2014 Sep 1;30(17):2447-55.

doi: 10.1093/bioinformatics/btu317. Epub 2014 May 9.

Efficient RNA isoform identification and quantification from RNA-Seq data with network flows

Elsa Bernard¹, Laurent Jacob², Julien Mairal², Jean-Philippe Vert¹

Affiliations

¹ Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Laboratoire Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558, Villeurbanne, France and LEAR Project-Team, INRIA Grenoble Rhône Alpes, 38330 Montbonnot, France Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Laboratoire Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558, Villeurbanne, France and LEAR Project-Team, INRIA Grenoble Rhône Alpes, 38330 Montbonnot, France Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Laboratoire Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558, Villeurbanne, France and LEAR Project-Team, INRIA Grenoble Rhône Alpes, 38330 Montbonnot, France.
² Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Laboratoire Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558, Villeurbanne, France and LEAR Project-Team, INRIA Grenoble Rhône Alpes, 38330 Montbonnot, France.

PMID: 24813214
PMCID: PMC4147886
DOI: 10.1093/bioinformatics/btu317

Efficient RNA isoform identification and quantification from RNA-Seq data with network flows

Elsa Bernard et al. Bioinformatics. 2014.

. 2014 Sep 1;30(17):2447-55.

doi: 10.1093/bioinformatics/btu317. Epub 2014 May 9.

Authors

Elsa Bernard¹, Laurent Jacob², Julien Mairal², Jean-Philippe Vert¹

Affiliations

¹ Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Laboratoire Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558, Villeurbanne, France and LEAR Project-Team, INRIA Grenoble Rhône Alpes, 38330 Montbonnot, France Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Laboratoire Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558, Villeurbanne, France and LEAR Project-Team, INRIA Grenoble Rhône Alpes, 38330 Montbonnot, France Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Laboratoire Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558, Villeurbanne, France and LEAR Project-Team, INRIA Grenoble Rhône Alpes, 38330 Montbonnot, France.
² Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Laboratoire Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558, Villeurbanne, France and LEAR Project-Team, INRIA Grenoble Rhône Alpes, 38330 Montbonnot, France.

PMID: 24813214
PMCID: PMC4147886
DOI: 10.1093/bioinformatics/btu317

Abstract

Motivation: Several state-of-the-art methods for isoform identification and quantification are based on [Formula: see text]-regularized regression, such as the Lasso. However, explicitly listing the-possibly exponentially-large set of candidate transcripts is intractable for genes with many exons. For this reason, existing approaches using the [Formula: see text]-penalty are either restricted to genes with few exons or only run the regression algorithm on a small set of preselected isoforms.

Results: We introduce a new technique called FlipFlop, which can efficiently tackle the sparse estimation problem on the full set of candidate isoforms by using network flow optimization. Our technique removes the need of a preselection step, leading to better isoform identification while keeping a low computational cost. Experiments with synthetic and real RNA-Seq data confirm that our approach is more accurate than alternative methods and one of the fastest available.

Availability and implementation: Source code is freely available as an R package from the Bioconductor Web site (http://www.bioconductor.org/), and more information is available at http://cbio.ensmp.fr/flipflop.

Supplementary information: Supplementary data are available at Bioinformatics online.

PubMed Disclaimer

Figures

**Fig. 1.**
Illustration of the graph construction for a gene with 5 exons. The original splicing graph is represented in (a). The 5 exons are represented as vertices and an arrow between two vertices indicates a junction. The nodes of graph $G'$ in (b) and (c) are bins with positive effective length denoted by gray square, as well as source s and sink t represented as circles. $G'$ in (b) is the resulting graph when all exons are bigger than the read length. In that case, each bin either corresponds to a unique exon, or to a junction between two exons. $G'$ in (c) is the resulting graph when the length of exon 3 is smaller than the read length. Some bins involve then more than two exons, here bins $(2 - 3 - 4)$ and $(2 - 3 - 5)$ . The source links all possible starting bins, and conversely all possible stopping bins are linked to the sink. There is a one-to-one correspondence between (s, t)-paths in $G'$ (paths starting at s and ending at t) and isoform candidates. For example, the path $(s, 1, 1 - 4, 4, 4 - 5, 5, t)$ corresponds to isoform 1-4-5

**Fig. 2.**
Flow interpretation of isoforms using the same graph as in Figure 1b. For the sake of clarity, some edges connecting s and t to internal nodes are not represented, and the length of the different bins is assumed to be equal. In (a), one unit of flow is carried along the path in red, corresponding to an isoform with abundance 1. In (b), another isoform with abundance 3 is added, yielding additional read counts at every node

**Fig. 3.**
Precision and recall of compared methods on simulated reads from the UCSC annotated human transcripts. (a) Single-end reads with different lengths (100, 200, 300 bp). (b) Paired-end reads with different lengths (100, 125, 150, 175 bp). (c) Single-end reads with a fixed 150 bp length and an increasing amout of material (1, 5, 10 million)

**Fig. 4.**
Average CPU times in milliseconds (logarithmic scale) for the compared methods with process a gene from human simulated 100 bp single-end reads

**Fig. 5.**
Precision and recall of compared methods on human embryonic stem cells data

See this image and copyright information in PMC

References

1. Ahuja RK, et al. Network Flows: Theory, Algorithms, and Applications. Englewood Cliffs, New Jersey: Prentice Hall; 1993.
1. Behr J, et al. Mitie: simultaneous RNA-Seq based transcript identification and quantification in multiple samples. Bioinformatics. 2013;29:2529–2538. - PMC - PubMed
1. Bertsekas DP. Network Optimization: Continuous and Discrete Models. Athena Scientific, Belmont, Massachusetts: Athena Scientific; 1998.
1. Bohnert R, Rätsch G. rQuant.web: a tool for RNA-Seq-based transcript quantitation. Nucleic Acids Res. 2010;38:W348–W351. - PMC - PubMed
1. Goldberg AV. An efficient implementation of a scaling minimum-cost flow algorithm. J. Algorithm. 1997;22:1–29.

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Substances

Actions

LinkOut - more resources

Full Text Sources
Other Literature Sources
- scite Smart Citations
Molecular Biology Databases
- NIAID Data Ecosystem - Find datasets on Infectious and Immune-mediated Diseases

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Efficient RNA isoform identification and quantification from RNA-Seq data with network flows

Affiliations

Efficient RNA isoform identification and quantification from RNA-Seq data with network flows

Authors

Affiliations

Abstract

Figures

References

Publication types

MeSH terms

Substances

LinkOut - more resources

Full Text Sources

Other Literature Sources

Molecular Biology Databases