BMC Bioinformatics. 2014 Jun 20;15:211.
doi: 10.1186/1471-2105-15-211.

SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

Marten Boetzer et al. BMC Bioinformatics. 2014.

Abstract

Background: The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information promises to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data.

Results: Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs iteratively, using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, each assembled from either a single Illumina MiSeq or a single Roche 454 library, we show that even 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate that our strategy is better able to produce (nearly) complete bacterial genomes.

Conclusions: The current work describes our SSPACE-LongRead software, which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances in PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, make it possible to scaffold genomes in a fast and reliable manner.

Figures

Figure 1
The effect of PacBio RS long read coverage on genome closure. Results are displayed for SSPACE-LongRead based on the CLCbio draft assembly for 5 organisms. For all samples, the addition of PacBio reads has a positive effect and leads to a significant reduction in the number of contigs. In general, 50× coverage is sufficient to scaffold over most gaps, though ideally 110-160× coverage is required for optimal performance of our software. Arguably, higher coverage (>160×) leads to more fragmented genomes, which is likely due to the increased complexity of the assembly graph.
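As a rough illustration of what a coverage figure such as 50× means, sequencing depth can be estimated as the total number of read bases divided by the genome size. The sketch below is not part of SSPACE-LongRead; the function names and the example numbers are hypothetical and chosen only to show the arithmetic.

```python
import math


def estimate_coverage(read_lengths, genome_size):
    """Estimate mean coverage as total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def reads_needed(target_coverage, mean_read_length, genome_size):
    """Estimate how many reads of a given mean length reach a target coverage."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(target_coverage * genome_size / mean_read_length)


# Hypothetical example: 1,000 reads of 3 kb against a 60 kb replicon.
example_coverage = estimate_coverage([3000] * 1000, 60_000)

# Hypothetical example: reads required for 50x coverage of a 5 Mb genome.
example_reads = reads_needed(50, 3000, 5_000_000)
```

Both helpers assume that reads are distributed evenly over the genome, so they give an average depth rather than the depth at any particular gap.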
Figure 2
Overview of the SSPACE-LongRead scaffolding algorithm. A) The input consists of a set of pre-assembled contigs (or scaffolds) in FASTA format and a set of PacBio CLR reads (in FASTA or FASTQ format). B) The PacBio CLR reads are aligned against the contigs using BLASR and only the best alignment matches are kept. A repeated element is indicated in red. C) Contig pairings and multi-contig linkage information are stored; repeated elements are also detected from this information. D) Based on the pairing and linkage information, contigs are ordered, oriented and connected into scaffolds. E) A post-processing step performs the final linearization and circularization.
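Steps B to D above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the SSPACE-LongRead implementation: the alignment records are hypothetical dictionaries rather than BLASR output, all reads are assumed to lie on the same strand (so contig orientation is ignored), gap sizes are not estimated, and the `min_links` threshold is an assumed parameter. Contigs with ambiguous neighbours, as a repeat would produce, are simply left unjoined.

```python
from collections import Counter, defaultdict


def best_alignments(alignments):
    """Step B (simplified): keep the best-scoring hit per (read, contig)."""
    best = {}
    for aln in alignments:
        key = (aln["read"], aln["contig"])
        if key not in best or aln["score"] > best[key]["score"]:
            best[key] = aln
    return list(best.values())


def count_links(alignments):
    """Step C (simplified): count reads placing contig A directly before B."""
    by_read = defaultdict(list)
    for aln in alignments:
        by_read[aln["read"]].append(aln)
    links = Counter()
    for alns in by_read.values():
        alns.sort(key=lambda a: a["read_start"])
        for left, right in zip(alns, alns[1:]):
            links[(left["contig"], right["contig"])] += 1
    return links


def build_scaffolds(contigs, links, min_links=2):
    """Step D (simplified): chain contigs along unambiguous, supported links."""
    succ, pred = defaultdict(set), defaultdict(set)
    for (a, b), n in links.items():
        if n >= min_links and a != b:
            succ[a].add(b)
            pred[b].add(a)
    # Keep an edge only if it is the sole way out of A and the sole way into B.
    nxt = {}
    for a, successors in succ.items():
        if len(successors) == 1:
            b = next(iter(successors))
            if len(pred[b]) == 1:
                nxt[a] = b
    has_pred = set(nxt.values())
    # Start chains at contigs without a predecessor; leftovers (cycles) follow.
    order = [c for c in contigs if c not in has_pred]
    order += [c for c in contigs if c in has_pred]
    scaffolds, seen = [], set()
    for start in order:
        if start in seen:
            continue
        chain = [start]
        seen.add(start)
        while chain[-1] in nxt and nxt[chain[-1]] not in seen:
            chain.append(nxt[chain[-1]])
            seen.add(chain[-1])
        scaffolds.append(chain)
    return scaffolds


# Hypothetical toy input: four contigs and four long reads.
contigs = ["c1", "c2", "c3", "c4"]
raw = [
    {"read": "r1", "contig": "c1", "read_start": 0, "score": 95},
    {"read": "r1", "contig": "c2", "read_start": 150, "score": 90},
    {"read": "r1", "contig": "c2", "read_start": 400, "score": 10},
    {"read": "r2", "contig": "c1", "read_start": 0, "score": 92},
    {"read": "r2", "contig": "c2", "read_start": 200, "score": 91},
    {"read": "r2", "contig": "c3", "read_start": 500, "score": 93},
    {"read": "r3", "contig": "c2", "read_start": 0, "score": 88},
    {"read": "r3", "contig": "c3", "read_start": 300, "score": 89},
    {"read": "r4", "contig": "c3", "read_start": 0, "score": 90},
    {"read": "r4", "contig": "c4", "read_start": 250, "score": 87},
]
kept = best_alignments(raw)
links = count_links(kept)
scaffolds = build_scaffolds(contigs, links)
```

In this toy input the c3 to c4 join is supported by a single read, so with `min_links=2` it is not made and c4 remains a separate scaffold. Requiring several supporting reads is one plausible way to tolerate the high error rate of uncorrected long reads.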

