2012 Feb 14;13:31.
doi: 10.1186/1471-2105-13-31.

PANDAseq: paired-end assembler for illumina sequences

Andre P Masella et al. BMC Bioinformatics.

Abstract

Background: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information.

Results: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.

Conclusions: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.
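
The overlap-and-correct idea described in the abstract can be illustrated with a minimal sketch. This is not PANDAseq's algorithm: it swaps in a simple greedy rule (accept the longest overlap with few mismatches, then keep the higher-quality base at each disagreement) in place of any probabilistic scoring. The function names, the `min_overlap` and `max_mismatches` parameters, and the toy sequences are all hypothetical, and the sketch does not handle the highly overlapping case where reads extend past each other's start.

```python
# Illustrative sketch only; names, thresholds, and data are assumptions,
# not PANDAseq's actual interface or scoring method.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def assemble_pair(fwd, fwd_qual, rev, rev_qual, min_overlap=4, max_mismatches=1):
    """Merge one read pair; return the merged sequence, or None if no overlap fits.

    fwd_qual and rev_qual are lists of integer quality scores, one per base.
    """
    rc = reverse_complement(rev)
    rc_qual = rev_qual[::-1]  # qualities follow the reversed base order
    # Greedy choice: try the longest candidate overlap first.
    for overlap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        f_tail, f_q = fwd[-overlap:], fwd_qual[-overlap:]
        r_head, r_q = rc[:overlap], rc_qual[:overlap]
        # Uncalled bases (N) are not counted as mismatches.
        mismatches = sum(
            1 for a, b in zip(f_tail, r_head) if a != b and "N" not in (a, b)
        )
        if mismatches > max_mismatches:
            continue
        merged = []
        for a, qa, b, qb in zip(f_tail, f_q, r_head, r_q):
            if a == "N":
                merged.append(b)  # fill an uncalled base from the other read
            elif b == "N":
                merged.append(a)
            else:
                merged.append(a if qa >= qb else b)  # trust the higher quality
        return fwd[:-overlap] + "".join(merged) + rc[overlap:]
    return None


# Toy example: a 20-base template read from both ends with a 4-base overlap.
template = "ACGTTGCAAGGCTTAACCGG"
fwd_read = "ACGTTGCAAGTC"  # base 11 miscalled (G read as T) with low quality
fwd_quality = [30] * 10 + [2, 30]
rev_read = reverse_complement(template[8:])
rev_quality = [30] * len(rev_read)

result = assemble_pair(fwd_read, fwd_quality, rev_read, rev_quality)
print(result == template)
```

In this toy case the low-quality miscall in the forward read falls inside the overlap, so the higher-quality base from the reverse read replaces it and the template is recovered. A real assembler would need a principled score for choosing the overlap rather than a fixed mismatch cap.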

Figures

Figure 1
Schematic of paired-end assembly. Typical scenario: forward and reverse reads are overlapped and the primer regions are removed to reconstruct the sequences. Highly overlapping scenario: for short templates, the overlapping region may include the primer regions.
Figure 2
Quality scores of assembled masked data. A perfect 16S rRNA sequence from Sinorhizobium meliloti was masked using real Illumina quality scores and the resulting paired-end sequences were assembled with PANDAseq. A histogram of quality scores for the assembled sequences is shown.
Figure 3
Comparison of output of various assemblers. A scatter plot of the percentage of paired-end sequence assemblies from sequenced V3-region amplicons of Methylococcus capsulatus strain Bath against the average number of mismatching nucleotides between the assembled sequence and the reference sequence. The comparison was done between PANDAseq and three alternative assemblers (see text).
Figure 4
Rank abundance curves for control libraries. Rank-abundance curves for defined multi-organism libraries [1] assembled at two different quality thresholds using PANDAseq and naïve assembly followed by clustering with CD-HIT into OTUs of 97% identity.

References

    1. Bartram AK, Lynch MDJ, Stearns JC, Moreno-Hagelsieb G, Neufeld JD. Generation of multimillion-sequence 16S rRNA gene libraries from complex microbial communities by assembling paired-end Illumina reads. Appl Environ Microbiol. 2011;77:3846–3852. doi: 10.1128/AEM.02772-10. http://aem.asm.org/cgi/content/abstract/77/11/3846
    2. Gloor GB, Hummelen R, Macklaim JM, Dickson RJ, Fernandes AD, MacPhee R, Reid G. Microbiome profiling by Illumina sequencing of combinatorial sequence-tagged PCR products. PLoS ONE. 2010;5:e15406. doi: 10.1371/journal.pone.0015406.
    3. Degnan PH, Ochman H. Illumina-based analysis of microbial community diversity. ISME J. 2011. http://www.nature.com/ismej/journal/v6/n1/full/ismej201174a.html
    4. Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Lozupone CA, Turnbaugh PJ, Fierer N, Knight R. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci USA. 2011;108(Suppl 1):4516–4522. http://genomebiology.com/2011/12/5/R50
    5. Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48:443–453. doi: 10.1016/0022-2836(70)90057-4.