Bioinformatics. 2010 Jul 15;26(14):1699-703.
doi: 10.1093/bioinformatics/btq268. Epub 2010 May 30.

Gap5--editing the billion fragment sequence assembly

James K Bonfield et al. Bioinformatics.

Abstract

Motivation: Existing sequence assembly editors struggle with the volumes of data now readily available from the latest generation of DNA sequencing instruments.

Results: We describe the Gap5 software along with the data structures and algorithms used that allow it to be scalable. We demonstrate this with an assembly of 1.1 billion sequence fragments and compare the performance with several other programs. We analyse the memory, CPU, I/O usage and file sizes used by Gap5.

Availability and implementation: Gap5 is part of the Staden Package and is available under an Open Source licence from http://staden.sourceforge.net. It is implemented in C and Tcl/Tk. Currently it works on Unix systems only.

Figures

Fig. 1.
Binning tree containing sequences from two libraries (represented by solid and dashed lines). Information about the sequence positions and pairings is stored in the bin records, while the sequence names, DNA and qualities are held in the sequence records.
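The split described in this caption, with positions and pairings in bin records and names, DNA and qualities in separate sequence records, can be illustrated with a minimal sketch. This is not Gap5's actual implementation (which is in C): the class names, the fixed MIN_BIN_SIZE, the binary splitting of bins and the half-open coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SeqRecord:
    """Bulky per-read data, kept apart from the tree (hypothetical layout)."""
    name: str
    dna: str
    quality: List[int]


@dataclass
class BinItem:
    """Lightweight positional entry stored in a bin; coordinates are half-open."""
    start: int
    end: int
    seq_id: int                    # key into a separate table of SeqRecord
    mate_id: Optional[int] = None  # pairing information, if the read has a mate


@dataclass
class Bin:
    start: int
    end: int
    items: List[BinItem] = field(default_factory=list)
    children: List["Bin"] = field(default_factory=list)


MIN_BIN_SIZE = 4  # assumed smallest bin width; chosen small for the demo


def insert(bin_: Bin, item: BinItem) -> None:
    """Place the item in the smallest bin that fully contains it."""
    size = bin_.end - bin_.start
    if size > MIN_BIN_SIZE:
        if not bin_.children:
            mid = bin_.start + size // 2
            bin_.children = [Bin(bin_.start, mid), Bin(mid, bin_.end)]
        for child in bin_.children:
            if child.start <= item.start and item.end <= child.end:
                insert(child, item)
                return
    # Either this bin is a leaf or the item straddles the child boundary.
    bin_.items.append(item)


def query(bin_: Bin, start: int, end: int) -> List[BinItem]:
    """Return items overlapping [start, end), visiting only overlapping bins."""
    if bin_.end <= start or bin_.start >= end:
        return []
    hits = [it for it in bin_.items if it.start < end and it.end > start]
    for child in bin_.children:
        hits.extend(query(child, start, end))
    return hits


# Two paired reads: sequence data lives in `sequences`, positions in the tree.
sequences = {
    1: SeqRecord("read1", "ACGT", [30, 30, 30, 30]),
    2: SeqRecord("read2", "TTGA", [20, 25, 30, 35]),
}
root = Bin(0, 16)
insert(root, BinItem(0, 4, seq_id=1, mate_id=2))
insert(root, BinItem(6, 10, seq_id=2, mate_id=1))
visible = sorted(sequences[it.seq_id].name for it in query(root, 3, 7))
```

In a layout like this, a range query touches only the small bin entries, and the larger sequence records are fetched only for reads that are actually displayed.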
Fig. 2.
Contig editor, showing quality values by gray scales and mismatches to the consensus by base color.
Fig. 3.
Template display showing a mapped assembly with a short insert Illumina library and a long insert capillary library. The Y-axis here shows insert size, while the X-axis is the position within the contig. A genomic insertion is visible at around 5 kb, identified by the jump in average insert size for the Illumina library. Also visible is the filter subwindow. The template colors used are red: inconsistent read-pair orientation; blue: single-ended template; orange: template spanning two contigs; otherwise gray-scale: the mapping quality of the DNA fragments.
