Microb Genom. 2023 Jan;9(1):mgen000910.
doi: 10.1099/mgen.0.000910.

Comparison of R9.4.1/Kit10 and R10/Kit12 Oxford Nanopore flowcells and chemistries in bacterial genome reconstruction

Nicholas D Sanderson et al. Microb Genom. 2023 Jan.

Abstract

Complete, accurate, cost-effective, and high-throughput reconstruction of bacterial genomes for large-scale genomic epidemiological studies is currently only possible with hybrid assembly, combining long- (typically using nanopore sequencing) and short-read (Illumina) datasets. Being able to use nanopore-only data would be a significant advance. Oxford Nanopore Technologies (ONT) have recently released a new flowcell (R10.4) and chemistry (Kit12), which reportedly generate per-read accuracies rivalling those of Illumina data. To evaluate this, we sequenced DNA extracts from four commonly studied bacterial pathogens, namely Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa and Staphylococcus aureus, using Illumina and ONT's R9.4.1/Kit10, R10.3/Kit12, R10.4/Kit12 flowcells/chemistries. We compared raw read accuracy and assembly accuracy for each modality, considering the impact of different nanopore basecalling models, commonly used assemblers, sequencing depth, and the use of duplex versus simplex reads. 'Super accuracy' (sup) basecalled R10.4 reads - in particular duplex reads - have high per-read accuracies and could be used to robustly reconstruct bacterial genomes without the use of Illumina data. However, the per-run yield of duplex reads generated in our hands with standard sequencing protocols was low (typically <10 %), with substantial implications for cost and throughput if relying on nanopore data only to enable bacterial genome reconstruction. In addition, recovery of small plasmids with the best-performing long-read assembler (Flye) was inconsistent. R10.4/Kit12 combined with sup basecalling holds promise as a singular sequencing technology in the reconstruction of commonly studied bacterial genomes, but hybrid assembly (Illumina+R9.4.1 hac) currently remains the highest throughput, most robust, and cost-effective approach to fully reconstruct these bacterial genomes.

Keywords: Genome sequencing; hybrid assembly; long-read assembly.

Conflict of interest statement

Oxford Nanopore Technologies supplied the R10.3 and R10.4 flowcells free of charge for this study. They were also involved in discussions regarding which data processing approaches to use to optimise basecalling and assembly outputs; however, they did not impact on the presentation of any of the results.

Figures

Fig. 1.
Experimental workflow.
Fig. 2.
Read length distributions (a) by modality and (b) by modality and species. Boxplots show the median (central line) and IQR (box hinges); whiskers extend to the smallest and largest values within 1.5×IQR of the hinges, and dots mark outlying points beyond these ranges. Note that the y-axis is on a log scale. Median differences in read length were significant across the whole dataset (Kruskal-Wallis test, P<0.001); other significance values represent comparisons with the median read length for R9.4 hac as the reference category (two-sample Wilcoxon test; 'ns', not significant; '****', P<0.001).
Fig. 3.
Median and modal raw read accuracy (% identity when reads are mapped to the Illumina-corrected reference) for each of the major nanopore sequencing modalities and flowcell/kit and basecalling combinations. Reads matching the reference with <75 % identity have been excluded. Complete details summarising accuracies across all modality, flowcell/kit and basecalling combinations, stratified by species, are given in Supplementary Table S6.
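The summary described in the Fig. 3 caption can be illustrated with a minimal sketch. It assumes per-read identities (in %) have already been computed from alignments to the reference; the function name, the 1 % binning used to define the mode, and the example values are hypothetical and not taken from the paper.

```python
from statistics import median, multimode


def summarise_identity(identities, min_identity=75.0, bin_width=1.0):
    """Return median and modal per-read identity after excluding poor matches.

    identities   : per-read % identity values (assumed precomputed)
    min_identity : reads below this value are excluded, as in the caption
    bin_width    : hypothetical bin size used to make a mode meaningful
                   for continuous values (an assumption of this sketch)
    """
    kept = [x for x in identities if x >= min_identity]
    if not kept:
        raise ValueError("no reads at or above the identity threshold")
    # Round each value to the nearest bin before taking the mode.
    binned = [round(x / bin_width) * bin_width for x in kept]
    modal = min(multimode(binned))  # lowest bin wins if several bins tie
    return {
        "n_kept": len(kept),
        "n_excluded": len(identities) - len(kept),
        "median": median(kept),
        "mode": modal,
    }


# Made-up identities for illustration only.
example = [60.0, 74.9, 92.2, 95.1, 95.4, 96.8, 99.0]
summary = summarise_identity(example)
print(summary)
```

Reporting both statistics is useful because a long tail of low-identity reads pulls the median below the mode.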
Fig. 4.
Number of (a) insertions and (b) deletions amongst reads mapped to the Illumina-corrected reference for all sequencing modalities.
Fig. 5.
Assembly reference coverage (%) by sequencing modality, assembler and species. (a) Chromosomes; (b) the five plasmids known to occur in the K. pneumoniae reference strain (labelled by their lengths in bp). Data are shown for complete datasets only (i.e. no sub-sampling performed).
Fig. 6.
Assembly accuracy by sequencing modality, assembly strategy and species. Accuracy was evaluated on the basis of contig comparisons to Illumina-corrected references using dnadiff, for (a) indel errors and (b) single nucleotide-level errors (SNPs). NB: SPAdes was only used on Illumina data, and Unicycler hybrid assembly was only performed on R9.4.1+Illumina data. For R10.4, data presented are those from non-multiplexed runs. The dashed black vertical line indicates a threshold of 1 error/100 kb.
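The 1 error/100 kb threshold in the Fig. 6 caption is a simple normalisation of error counts by the number of compared bases. The sketch below shows that arithmetic only; the function names and the choice to treat a rate exactly at the threshold as passing are assumptions of this example, and the counts would in practice come from a tool such as dnadiff.

```python
def errors_per_100kb(n_errors, compared_bases):
    """Normalise an error count (indels or SNPs) to errors per 100 kb."""
    if compared_bases <= 0:
        raise ValueError("compared_bases must be positive")
    return n_errors * 100_000 / compared_bases


def within_threshold(n_errors, compared_bases, threshold=1.0):
    """True if the error rate is at or below the threshold (assumed inclusive)."""
    return errors_per_100kb(n_errors, compared_bases) <= threshold


# Hypothetical counts for a 5 Mb chromosome, for illustration only.
rate_ok = errors_per_100kb(50, 5_000_000)
rate_high = errors_per_100kb(100, 5_000_000)
print(rate_ok, within_threshold(50, 5_000_000))
print(rate_high, within_threshold(100, 5_000_000))
```

On a 5 Mb genome, this threshold corresponds to at most 50 errors across the whole chromosome.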
Fig. 7.
Impact of subsampling of long-read datasets on assembly accuracy, presented by species for indels (top panels) and SNPs (lower panels). For ease of representation, only data for Flye assemblies polished with one round of Medaka are shown, as the effect of additional polishing was shown to be marginal for most modalities (Fig. S6, Table S7). Data for 10× long-read coverage are omitted for Canu assemblies as this coverage was considered too low for default settings and was unlikely to improve results. Panels: (a) E. coli; (b) K. pneumoniae (chromosome only); (c) P. aeruginosa; (d) S. aureus.
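Subsampling to a target coverage, as evaluated in Fig. 7, can be sketched as drawing reads at random until their combined length reaches depth × genome size. This is a generic illustration under that assumption, not necessarily the procedure or tool the authors used; the function name and example values are hypothetical.

```python
import random


def subsample_to_depth(read_lengths, genome_size, depth, seed=0):
    """Pick read indices at random until total bases reach depth x genome size.

    Returns (chosen_indices, total_bases). If the dataset is too small to
    reach the target, every read is returned.
    """
    target = genome_size * depth
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # seeded for reproducibility
    chosen, total = [], 0
    for i in order:
        if total >= target:
            break
        chosen.append(i)
        total += read_lengths[i]
    return chosen, total


# Toy example: 100 reads of 1 kb against a 2 kb "genome" at 10x depth.
chosen, total = subsample_to_depth([1000] * 100, genome_size=2000, depth=10)
print(len(chosen), total)
```

Fixing the seed makes each subsample reproducible, which matters when comparing assemblers on the same reduced dataset.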
Fig. 8.
Coding sequence (CDS) recovery on the basis of exact CDS (amino acid sequence) matches with respect to the Prokka-annotated Illumina-corrected reference (chromosome + all plasmids for K. pneumoniae). The plot shows the percentage of reference coding sequences missed by each modality. For long-read data, only Flye assemblies with one round of Medaka polishing are shown; for R10.3 and R10.4 datasets these were from non-multiplexed evaluations (i.e. only single extracts per flowcell). For Unicycler, the assembly using R9.4 hac + Illumina data is shown. The total number of coding sequences missed by each approach is shown as a number at the top of each bar.
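The exact-match CDS comparison in the Fig. 8 caption amounts to asking which reference amino acid sequences are absent from the assembly's annotated proteins. A minimal sketch follows, assuming both sets of protein sequences have already been extracted as strings; the function name and the toy sequences are hypothetical.

```python
def cds_missed(reference_proteins, assembly_proteins):
    """Count reference proteins with no exact amino acid match in the assembly.

    Returns (number_missed, percentage_missed).
    """
    if not reference_proteins:
        raise ValueError("reference_proteins must not be empty")
    assembly_set = set(assembly_proteins)  # set lookup keeps matching fast
    missed = [p for p in reference_proteins if p not in assembly_set]
    return len(missed), 100.0 * len(missed) / len(reference_proteins)


# Toy sequences for illustration only.
reference = ["MKT", "MSL", "MAA", "MGG"]
assembly = ["MKT", "MAA", "MGH"]
n_missed, pct_missed = cds_missed(reference, assembly)
print(n_missed, pct_missed)
```

Exact matching is strict: a single residual indel that shifts the reading frame causes the whole CDS to count as missed, which is why this metric is sensitive to assembly errors.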
