Plant Cell. 2017 Oct;29(10):2336-2348.
doi: 10.1105/tpc.17.00521. Epub 2017 Oct 12.

De Novo Assembly of a New Solanum pennellii Accession Using Nanopore Sequencing

Maximilian H-W Schmidt et al. Plant Cell. 2017 Oct.

Abstract

Updates in nanopore technology have made it possible to obtain gigabases of sequence data. Prior to this, nanopore sequencing technology was mainly used to analyze microbial samples. Here, we describe the generation of a comprehensive nanopore sequencing data set with a median read length of 11,979 bp for a self-compatible accession of the wild tomato species Solanum pennellii. We describe the assembly of its genome to a contig N50 of 2.5 Mb. The assembly pipeline comprised initial read correction with Canu and assembly with SMARTdenovo. The resulting raw nanopore-based de novo genome was structurally highly similar to that of the reference S. pennellii LA716 accession but had a high error rate and was rich in homopolymer deletions. After polishing the assembly with Illumina reads, we obtained an error rate of <0.02% when assessed versus the same Illumina data. We obtained a gene completeness of 96.53%, slightly surpassing that of the reference S. pennellii. Taken together, our data indicate that such long-read sequencing data can be used to affordably sequence and assemble gigabase-sized plant genomes.
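
The contig N50 quoted in the abstract is the length of the contig at which the contigs, sorted from longest to shortest, first cover at least half of the total assembly length. The following is a minimal Python sketch of that standard definition, not the authors' own tooling; the function name n50 and the toy contig lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches at least
    half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (not data from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150, half of the 300 total
```

Under this definition, a contig N50 of 2.5 Mb means that at least half of the assembled sequence lies in contigs of 2.5 Mb or longer.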

Figures

Figure 1.
Characteristics of the S. pennellii Genome and Its Assembly. (A) Circos visualization of variant distribution between S. pennellii LYC1722 and S. pennellii LA716. Distribution of single nucleotide polymorphisms (outer layer) and InDels (middle layer) is compared with the gene density (inner layer) for each chromosome of S. pennellii LA716 based on generated Illumina data for S. pennellii LYC1722. (B) The effect of randomly downsampling pass reads on the N50 produced by different assemblers. (C) Discrepancies between the assembly and the Illumina data over several rounds of Pilon correction. Dotted lines approximate expected discrepancy rates if Illumina data were mapped to a perfect reference.
Figure 2.
Violin Plots of Read Length per Library for Three Different Size-Selection Protocols. Read length distribution is shown for all 16 S. pennellii MinION libraries and the corresponding pass (blue) and failed (red) classified reads. Libraries are grouped by size-selection protocol: (A) 15-kb cutoff, (B) 12-kb cutoff, and (C) 0.4x bead size selection. Filled dots indicate mean read length.
Figure 3.
The 6-mer Counts in the Polished Assembly versus Those in the Raw Reads. The 6-mers were counted both in the polished assembly and in the raw reads. The count for each 6-mer combines the 6-mer itself and its reverse complement, i.e., AAAAAA represents both AAAAAA and TTTTTT. Red dots represent data from the new Albacore basecaller, whereas blue and gray dots represent the raw and Canu-corrected Metrichor data, respectively. In each case, a trend line is added.
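
The strand-collapsed counting described in the Figure 3 legend can be sketched as follows. This is an illustrative Python example only, assuming that each 6-mer and its reverse complement are stored under whichever of the two sorts first alphabetically; the source does not state which tool or representative the authors used, and the function names here are hypothetical.

```python
from collections import Counter

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmer_counts(seq, k=6):
    """Count k-mers in seq, merging each k-mer with its reverse complement.

    The alphabetically smaller of the two strings is used as the key
    (an assumption made for this sketch), so AAAAAA and TTTTTT are
    both counted under AAAAAA.
    """
    counts = Counter()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


# Hypothetical toy sequences (not data from the paper).
print(canonical_kmer_counts("AAAAAAA"))  # Counter({'AAAAAA': 2})
print(canonical_kmer_counts("TTTTTT"))   # Counter({'AAAAAA': 1})
```

Running the same function over the assembly and over the reads would give two count tables keyed by the same 6-mers, which is the kind of paired data a scatter plot like Figure 3 compares.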

Comment in

  • Nanopore Sequencing Comes to Plant Genomes.
    Hofmann NR. Plant Cell. 2017 Nov;29(11):2677-2678. doi: 10.1105/tpc.17.00863. Epub 2017 Nov 7. PMID: 29114013.
