BMC Genomics. 2013 Oct 1;14:670.
doi: 10.1186/1471-2164-14-670.

An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome


Marco Ferrarini et al. BMC Genomics. 2013.

Abstract

Background: Second generation sequencing has permitted detailed sequence characterisation at the whole genome level of a growing number of non-model organisms, but the data produced have short read-lengths and biased genome coverage leading to fragmented genome assemblies. The PacBio RS long-read sequencing platform offers the promise of increased read length and unbiased genome coverage and thus the potential to produce genome sequence data of a finished quality containing fewer gaps and longer contigs. However, these advantages come at a much greater cost per nucleotide and with a perceived increase in error-rate. In this investigation, we evaluated the performance of the PacBio RS sequencing platform through the sequencing and de novo assembly of the Potentilla micrantha chloroplast genome.

Results: Following error-correction, a total of 28,638 PacBio RS reads were recovered with a mean read length of 1,902 bp, totalling 54,492,250 nucleotides and representing an average depth of coverage of 320× the chloroplast genome. The dataset covered the entire 154,959 bp of the chloroplast genome in a single contig (100% coverage), compared to seven contigs (90.59% coverage) recovered from the Illumina data, and revealed no bias in coverage of GC-rich regions. After assembly, the PacBio data were largely concordant with the Illumina data generated and allowed 187 ambiguities in the Illumina data to be resolved. The additional read length also permitted small differences in the two inverted repeat regions to be assigned unambiguously.
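
The read-length and depth figures above can be cross-checked with simple arithmetic. The Python sketch below uses only the numbers quoted in the Results; the function names are illustrative and do not come from the paper. Dividing total nucleotides by genome length gives a nominal depth of about 352×, somewhat above the reported 320×. The abstract does not state how the reported depth was computed, so the gap may reflect, for example, counting only bases that aligned to the assembly; that explanation is an assumption here.

```python
# Figures quoted in the Results section of the abstract.
TOTAL_READS = 28_638
TOTAL_NUCLEOTIDES = 54_492_250
GENOME_LENGTH_BP = 154_959


def mean_read_length(total_nucleotides: int, total_reads: int) -> float:
    """Mean read length in bp: total sequenced bases divided by read count."""
    return total_nucleotides / total_reads


def nominal_depth(total_nucleotides: int, genome_length: int) -> float:
    """Nominal depth of coverage: total sequenced bases / genome length.

    This ignores unaligned reads and repeat structure, so it is an upper
    bound on the depth actually observed after alignment.
    """
    return total_nucleotides / genome_length


if __name__ == "__main__":
    length = mean_read_length(TOTAL_NUCLEOTIDES, TOTAL_READS)
    depth = nominal_depth(TOTAL_NUCLEOTIDES, GENOME_LENGTH_BP)
    print(f"mean read length: {length:.1f} bp")
    print(f"nominal depth:    {depth:.1f}x")
```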

Conclusions: This is the first report to our knowledge of a chloroplast genome assembled de novo using PacBio sequence data. The PacBio RS data generated here were assembled into a single large contig spanning the P. micrantha chloroplast genome, with a higher degree of accuracy than an Illumina dataset generated at a much greater depth of coverage, due to longer read lengths and lower GC bias in the data. The results we present suggest PacBio data will be of immense utility for the development of genome sequence assemblies containing fewer unresolved gaps and ambiguities and a significantly smaller number of contigs than could be produced using short-read sequence data alone.

Figures

Figure 1
Sequence data coverage of the P. micrantha chloroplast genome. Schematic diagram showing the coverage of the P. micrantha chloroplast genome by the seven Illumina contigs (black) and a single PacBio contig (green) following assembly using ABySS and Celera assembler respectively. The red line across the top of the schematic represents the P. micrantha chloroplast genome sequence; bold blue sections indicate the inverted repeat (IR) regions of the genome. Sections of contig 1 from both the Illumina and PacBio assemblies corresponding to the non-unique section of the IR are shown in red. Illumina contig 1 spans the start/end point of the linear representation of the circular chloroplast genome.
Figure 2
Base-per-base coverage of the P. micrantha chloroplast genome. Graph showing the base per base depth of sequencing coverage across the P. micrantha chloroplast genome with (a) Illumina (black) and PacBio (green) data and (b) PacBio data only, revealing a more uniform coverage of PacBio data across the genome despite the substantially lower depth of coverage, and regions of the genome with poor or zero coverage in the Illumina dataset. The two regions of significantly greater coverage in both datasets represent the two inverted repeat regions.
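
A base-per-base depth profile of the kind plotted in Figure 2 can be derived from read alignment intervals. The Python sketch below is a minimal illustration, assuming alignments are already available as 0-based, half-open (start, end) intervals on a linear representation of the genome; the function names and the toy intervals are hypothetical, and this is not the pipeline used by the authors.

```python
from itertools import accumulate
from typing import Iterable, List, Tuple


def per_base_depth(genome_length: int,
                   alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Return the depth at every base from (start, end) half-open intervals.

    Uses a difference array: +1 where a read starts, -1 just past where it
    ends, then a running sum gives the depth at each position.
    """
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"interval out of range: ({start}, {end})")
        diff[start] += 1
        diff[end] -= 1
    # Drop the sentinel slot at index genome_length before summing.
    return list(accumulate(diff[:-1]))


def breadth_of_coverage(depth: List[int]) -> float:
    """Fraction of positions covered by at least one read."""
    return sum(1 for d in depth if d > 0) / len(depth)


if __name__ == "__main__":
    # Toy 10 bp "genome" with two reads leaving a 2 bp gap.
    depth = per_base_depth(10, [(0, 4), (6, 10)])
    print(depth)
    print(breadth_of_coverage(depth))
```

Regions with zero depth in such a profile correspond to the gaps between contigs described for the Illumina assembly.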
Figure 3
Determination of percentage GC bias in the Illumina and PacBio datasets. Percentage of mean depth of coverage across 987 windows of 157 nucleotides plotted as a function of percentage GC content for (a) Illumina (black) and (b) PacBio (green) data, showing a much stronger positive dependency within the Illumina data (Pearson's correlation coefficient = 0.61, p-value = 2.2e-16) than in the PacBio data (Pearson's correlation coefficient = 0.23, p-value = 5.675e-09). For the purposes of the calculation, high coverage data from the two inverted repeat regions were excluded.
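
The windowed GC-bias analysis described in this caption can be sketched as follows. This Python example is an illustration under stated assumptions, not the authors' code: it interprets "percentage of mean depth" as each window's mean depth divided by the genome-wide mean depth, uses non-overlapping windows, and does not exclude the inverted repeat regions. The function names and the toy sequence are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def window_stats(genome: str, depth: Sequence[int],
                 window: int = 157) -> Tuple[List[float], List[float]]:
    """Per-window GC% and mean depth as a percentage of the overall mean.

    Windows are non-overlapping; a trailing partial window is ignored.
    """
    if len(genome) != len(depth):
        raise ValueError("genome and depth must have the same length")
    overall_mean = sum(depth) / len(depth)
    gc, relative_depth = [], []
    for start in range(0, len(genome) - window + 1, window):
        end = start + window
        gc.append(gc_percent(genome[start:end]))
        window_mean = sum(depth[start:end]) / window
        relative_depth.append(100.0 * window_mean / overall_mean)
    return gc, relative_depth


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy data: GC-rich windows are given twice the depth of AT-rich ones.
    genome = "GGGGAAAAGGCCATAT"
    depth = [4] * 4 + [2] * 4 + [4] * 4 + [2] * 4
    gc, rel = window_stats(genome, depth, window=4)
    print(gc, rel, pearson(gc, rel))
```

A coefficient near zero indicates depth that is largely independent of GC content, while a value closer to 1 indicates that GC-rich windows are systematically over-represented.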
Figure 4
The P. micrantha chloroplast genome sequence. Structural organisation of the gene content of the P. micrantha chloroplast genome, showing genes transcribed clockwise inside the circle and genes transcribed counter-clockwise outside the circle. Genes are coloured according to functional category; the inner circle indicates mean percentage GC content across the genome. IRa and IRb denote inverted repeat regions; LSC and SSC denote long and short single copy regions respectively. Genome map plotted using OGDRAW [15].

References

    1. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen ZT, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2006;441:120–120.
    2. Fuller CW, Middendorf LR, Benner SA, Church GM, Harris T, Huang X, Jovanovich SB, Nelson JR, Schloss JA, Schwartz DC, et al. The challenges of sequencing by synthesis. Nature Biotechnol. 2009;27:1013–1023. doi: 10.1038/nbt.1585.
    3. Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 2008;36:e105. doi: 10.1093/nar/gkn425.
    4. Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323:133–138. doi: 10.1126/science.1162986.
    5. Rasko DA, Webster DR, Sahl JW, Bashir A, Boisen N, Scheutz F, Paxinos EE, Sebra R, Chin C-S, Iliopoulos D, et al. Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany. New England J Med. 2011;365:709–717. doi: 10.1056/NEJMoa1106920.