Comparative Study
PLoS One. 2019 May 9;14(5):e0216471.
doi: 10.1371/journal.pone.0216471. eCollection 2019.

NanoR: A user-friendly R package to analyze and compare nanopore sequencing data

Davide Bolognini et al. PLoS One.

Abstract

MinION and GridION X5 from Oxford Nanopore Technologies are devices for real-time DNA and RNA sequencing. On the one hand, MinION is the only real-time, low-cost and portable sequencing device and, thanks to its unique properties, is becoming increasingly popular among biologists; on the other, GridION X5 is less widespread, mainly because of its cost, but is highly suitable for researchers with large sequencing projects. Although Oxford Nanopore Technologies' devices have been increasingly used in the last few years, there is a lack of high-performing and user-friendly tools to handle the data output by both MinION and GridION X5 platforms. Here we present NanoR, a cross-platform R package designed to simplify and improve nanopore data visualization. NanoR is built on a few functions but surpasses existing tools in extracting meaningful information from MinION sequencing data; in addition, as exclusive features, NanoR can deal with GridION X5 sequencing outputs and allows comparison of both MinION and GridION X5 sequencing data in one command. NanoR is released as a free package for R at https://github.com/davidebolo1993/NanoR.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. NanoR workflow.
NanoR can work with both basecalled .fast5 files and sequencing summary/.fastq files. Users rely on NanoFastqM() to directly extract .fastq sequences from basecalled .fast5 files and on NanoFastqG() to filter .fastq files. The NanoPrepare() functions, as well as NanoTable() and NanoStats(), can be used one after another to generate a complete overview of the sequencing run, starting from basecalled .fast5 files (“M” version) or from sequencing summary and .fastq files (“G” version). Finally, NanoCompare() allows one-command comparison of analyzed MinION/GridION X5 sequencing experiments.
Fig 2. Reads number, base pairs number, reads length and reads quality per-time bin.
The plots show the number of reads (Panel A) and basepairs (Panel B), the maximum, average and minimum length of reads in log10 scale (Panel C) and the maximum, average and minimum quality of reads (Panel D), all calculated every 30 minutes of an experimental MinION run.
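
The per-bin summaries described in this caption can be illustrated with a short Python sketch. This is not NanoR's implementation (NanoR is an R package). The function name `per_bin_stats`, the tuple layout and the example reads are hypothetical and chosen only to show how reads might be grouped into 30-minute bins and summarized.

```python
from collections import defaultdict

BIN_SECONDS = 30 * 60  # 30-minute bins, as stated in the caption


def per_bin_stats(reads, bin_seconds=BIN_SECONDS):
    """Group (start_time_s, length_bp, mean_quality) tuples into time bins.

    Returns a dict mapping bin index (0 = first 30 minutes) to a summary dict.
    The input layout is an assumption made for this sketch.
    """
    bins = defaultdict(list)
    for start_s, length, quality in reads:
        bins[int(start_s // bin_seconds)].append((length, quality))

    stats = {}
    for idx in sorted(bins):
        lengths = [length for length, _ in bins[idx]]
        quals = [quality for _, quality in bins[idx]]
        stats[idx] = {
            "reads": len(lengths),                     # Panel A
            "bases": sum(lengths),                     # Panel B
            "min_len": min(lengths),                   # Panel C
            "mean_len": sum(lengths) / len(lengths),
            "max_len": max(lengths),
            "min_q": min(quals),                       # Panel D
            "mean_q": sum(quals) / len(quals),
            "max_q": max(quals),
        }
    return stats


# Hypothetical reads: (start time in seconds, length in bp, mean quality)
example_reads = [(10, 1000, 9.0), (1700, 3000, 11.0), (1900, 500, 8.0)]
summary = per_bin_stats(example_reads)
```

Here the first two reads fall in bin 0 and the third in bin 1; bins in which no read started are simply absent from the result.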
Fig 3. Reads length and reads quality compared jointly.
The plot shows the correlation between length (x axis) and quality (y axis) for ∼ 100000 MinION reads. The regression line highlights that the longer the reads are, the higher their quality score is.
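
As an illustration of what a regression line over length and quality means, the sketch below fits an ordinary least-squares line in plain Python. The caption does not state which regression method NanoR uses, so ordinary least squares is an assumption here, and `fit_line` and the example values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical read lengths (bp) and mean quality scores
lengths = [1000, 2000, 3000, 4000]
qualities = [8.0, 9.0, 10.0, 11.0]
slope, intercept = fit_line(lengths, qualities)
```

A positive slope corresponds to the trend described in the caption: quality increasing with read length.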
Fig 4. Heatmap of channels and muxes activity.
The plots show the base pair output of channels (Panel A) and muxes (Panel B), arranged according to their physical layout on the Flow Cell (as described in https://community.nanoporetech.com/technical_documents/hardware/v/hwtd_5000_v1_revh_03may2016/flow-cell-chip) for a MinION run; inactive channels and muxes are colored grey.
Fig 5. Violin plots for comparison between experiments.
From top to bottom, the plots compare 3 ONT experiments (the first 2 are GridION X5 experiments, producing multi-read and single-read .fast5 files respectively, and the last is a MinION experiment producing multi-read .fast5 files) in terms of reads number, base pairs number, reads mean length and reads mean quality. The comparison is done every 10 hours of experiment using time bins of 30 minutes.
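
One reading of "every 10 hours of experiment using time bins of 30 minutes" is that each violin summarizes the 20 half-hour bin values that fall inside one 10-hour window. The sketch below shows that grouping under this assumption; `group_bins` is a hypothetical helper, not a NanoR function.

```python
BIN_MINUTES = 30
WINDOW_HOURS = 10
BINS_PER_WINDOW = WINDOW_HOURS * 60 // BIN_MINUTES  # 20 bins per 10-hour window


def group_bins(bin_values, bins_per_window=BINS_PER_WINDOW):
    """Split a list of per-bin values into consecutive 10-hour windows."""
    return [
        bin_values[i:i + bins_per_window]
        for i in range(0, len(bin_values), bins_per_window)
    ]


# A hypothetical 24-hour run has 48 half-hour bins
windows = group_bins(list(range(48)))
```

With 48 bins this yields two full windows of 20 values and a final, shorter window of 8 values.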
Fig 6. LOESS curves comparing the performance of NanoR (dark blue, light blue), poRe (red) and IONiseR (green) when extracting metadata from an increasing number of .fast5 reads (25000, 50000, 100000, 500000, 1000000) randomly sampled from ∼ 2000000 reads coming from 5 MinION runs, using 10 Intel® Xeon® CPU E5-46100 @ 2.40GHz cores.
Each sampling-extraction step was repeated 5 times. Under the same conditions (i.e. without GC content calculation), NanoR is the fastest at extracting metadata for all the groups of .fast5 files considered (e.g. NanoR takes approximately 60 minutes to extract metadata from 1000000 .fast5 files, poRe takes approximately 80 minutes and IONiseR takes over 10 hours), while extracting metadata together with GC content computation (light blue line) makes NanoR slightly slower (it takes approximately 100 minutes to both extract metadata from 1000000 .fast5 files and calculate their GC content). Data on the x and y axes are log10-scaled.
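
GC content, the optional extra computation mentioned in this caption, is the fraction of bases in a read that are G or C. A minimal Python sketch of that definition follows; `gc_content` is a hypothetical helper and says nothing about how NanoR computes the value internally.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a nucleotide sequence.

    Case-insensitive; an empty sequence is reported as 0.0 by convention here.
    """
    seq = sequence.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq)


half = gc_content("ACGT")
```

Because this requires reading every base of every read, it is plausible that enabling it adds to the extraction time, as the light blue curve indicates.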
