Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2011 Dec 1;27(23):3259-65.
doi: 10.1093/bioinformatics/btr562. Epub 2011 Oct 13.

Fast scaffolding with small independent mixed integer programs

Affiliations

Fast scaffolding with small independent mixed integer programs

Leena Salmela et al. Bioinformatics. .

Abstract

Motivation: Assembling genomes from short read data has become increasingly popular, but the problem remains computationally challenging especially for larger genomes. We study the scaffolding phase of sequence assembly where preassembled contigs are ordered based on mate pair data.

Results: We present MIP Scaffolder that divides the scaffolding problem into smaller subproblems and solves these with mixed integer programming. The scaffolding problem can be represented as a graph and the biconnected components of this graph can be solved independently. We present a technique for restricting the size of these subproblems so that they can be solved accurately with mixed integer programming. We compare MIP Scaffolder to two state of the art methods, SOPRA and SSPACE. MIP Scaffolder is fast and produces better or as good scaffolds as its competitors on large genomes.

Availability: The source code of MIP Scaffolder is freely available at http://www.cs.helsinki.fi/u/lmsalmel/mip-scaffolder/.

Contact: leena.salmela@cs.helsinki.fi.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.
The four possible ways of linking two contigs with a mate pair and the bidirected edge representing each case.
Fig. 2.
Fig. 2.
The two possible placements of contigs i and j when an edge with orientation corresponding to the top left case in Figure 1 is used in a scaffolding. The top figure shows the case when both contigs are forward oriented and the bottom one shows the case when they are reverse oriented.
Fig. 3.
Fig. 3.
Examples of (w, 0.5)-consistent and inconsistent mate pair mappings. The arrows denote contigs and the short lines mate pairs. The mate pair ends are connected with arcs. The mate pairs with solid line arcs connecting them are consistent and the mate pairs with dashed line arcs connecting them are not consistent.
Fig. 4.
Fig. 4.
The effect of filtering (w,p)-consistent mappings for the E.coli dataset. Here we used the maximum component size of 100 edges.
Fig. 5.
Fig. 5.
The effect of maximum component size for (w,p)-consistent mappings for various combinations of w and p for the E.coli dataset.

References

    1. Abouelhoda M. Proceedings of SPIRE′07. Vol. 4726. Heidelberg: Springer; 2007. A chaining algorithm for mapping cdna sequences to multiple genomic sequences; pp. 1–13. of LNCS.
    1. Boetzer M., et al. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011;27:578–579. - PubMed
    1. Butler J., et al. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res. 2008;18:810–820. - PMC - PubMed
    1. Dayarian A., et al. SOPRA: scaffolding algorithm for paired reads via statistical optimization. BMC Bioinformatics. 2010;11:345. - PMC - PubMed
    1. Gnerre S., et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl Acad. Sci. USA. 2011;108:1513–1518. - PMC - PubMed

Publication types

LinkOut - more resources