Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Comparative Study
. 2011 Nov;18(11):1681-91.
doi: 10.1089/cmb.2011.0170. Epub 2011 Sep 19.

Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences

Affiliations
Comparative Study

Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences

Song Gao et al. J Comput Biol. 2011 Nov.

Abstract

Scaffolding, the problem of ordering and orienting contigs, typically using paired-end reads, is a crucial step in the assembly of high-quality draft genomes. Even as sequencing technologies and mate-pair protocols have improved significantly, scaffolding programs still rely on heuristics, with no guarantees on the quality of the solution. In this work, we explored the feasibility of an exact solution for scaffolding and present a first tractable solution for this problem (Opera). We also describe a graph contraction procedure that allows the solution to scale to large scaffolding problems and demonstrate this by scaffolding several large real and synthetic datasets. In comparisons with existing scaffolders, Opera simultaneously produced longer and more accurate scaffolds demonstrating the utility of an exact approach. Opera also incorporates an exact quadratic programming formulation to precisely compute gap sizes (Availability: http://sourceforge.net/projects/operasf/ ).

PubMed Disclaimer

Figures

FIG. 1.
FIG. 1.
Paired-read and scaffold graph. (a) Paired-read constraints on order and orientation of contigs (pointed boxes). (b) A set of paired-reads and contigs. (c) The resulting scaffold graph. (d) A scaffold for the graph in (c).
FIG. 2.
FIG. 2.
An algorithm for generating a scaffold for a bounded-width scaffold graph.
FIG. 3.
FIG. 3.
An algorithm for generating a scaffold with at most p discordant edges.
FIG. 4.
FIG. 4.
Contracting the Scaffold Graph. (a) Original scaffold graph G. (b) A fenced subgraph of G (with optimal scaffolds ″3 4 − 5″ and ″8 − 9 10″). (c) The new graph after contraction of optimal scaffolds for the subgraph in (b).
FIG. 5.
FIG. 5.
A recursive graph contraction based algorithm to compute an optimal scaffold for a scaffold graph G.

References

    1. Aparicio S. Chapma J. Stupka E., et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 2002;297:1301–1310. - PubMed
    1. Chaisson M.J. Brinza D. Pevzner P.A. De novo fragment assembly with short mate-paired reads: does the read length matter? Genome Res. 2009;19:336–346. - PMC - PubMed
    1. Dayarian A. Michael T.P. Sengupta A.M. SOPRA: scaffolding algorithm for paired reads via statistical optimization. BMC Bioinform. 2010;11:345. - PMC - PubMed
    1. Eid J. Fehr A. Gray J., et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323:133–138. - PubMed
    1. Husemann P. Stoye J. Phylogenetic comparative assembly. Alg. Mol. Biol. 2010;5:3. - PMC - PubMed

Publication types

LinkOut - more resources