Bioinformatics. 2019 Jul 1;35(13):2193-2198.
doi: 10.1093/bioinformatics/bty841.

BulkVis: a graphical viewer for Oxford nanopore bulk FAST5 files

Alexander Payne et al. Bioinformatics. 2019.

Abstract

Motivation: The Oxford Nanopore Technologies (ONT) MinION is used to sequence a wide variety of sample types prepared with diverse extraction methods. Nanopore sequencers output FAST5 files containing signal data, which are subsequently base called to FASTQ format. Optionally, ONT devices can collect data from all sequencing channels simultaneously in a bulk FAST5 file, enabling inspection of the signal in any channel at any point. We sought to visualize this signal to inspect challenging or difficult-to-sequence samples.

Results: BulkVis loads a bulk FAST5 file, overlays MinKNOW (the software that controls ONT sequencers) classifications on the signal trace, and can show mappings to a reference. Users can navigate to a channel and time or, given a FASTQ header from a read, jump to its specific position. BulkVis can export regions as reads compatible with Nanopore base callers. Using BulkVis, we find that MinKNOW can incorrectly divide long reads, so that a single DNA molecule is split into two or more reads. The longest such molecule seen to date is 2 272 580 bases in length and was reported in eleven consecutive reads. We provide helper scripts that identify and reconstruct split reads given a sequencing summary file and an alignment to a reference. Incorrect read splitting appears to vary with input sample type and is more common in 'ultra-long' read preparations.
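The idea behind identifying split reads can be illustrated with a minimal sketch. This is not the authors' helper script: it assumes each read has already been joined with its alignment into a dict, and the field names (`read_id`, `channel`, `start_time`, `duration`, `chrom`, `strand`, `ref_start`, `ref_end`) and both thresholds are hypothetical choices made for illustration. Two reads are treated as candidate fragments of one molecule when they follow each other on the same channel with little time between them and map next to each other on the reference.

```python
from collections import defaultdict


def _adjacent(prev, cur, max_gap_s, max_ref_gap):
    """Return True if `cur` plausibly continues the molecule read as `prev`."""
    if prev["chrom"] != cur["chrom"] or prev["strand"] != cur["strand"]:
        return False
    # Time between the end of the previous read and the start of the next one.
    time_gap = cur["start_time"] - (prev["start_time"] + prev["duration"])
    if not 0 <= time_gap <= max_gap_s:
        return False
    # On the forward strand the next fragment maps downstream of the previous
    # one; on the reverse strand it maps upstream.
    if cur["strand"] == "+":
        ref_gap = cur["ref_start"] - prev["ref_end"]
    else:
        ref_gap = prev["ref_start"] - cur["ref_end"]
    return abs(ref_gap) <= max_ref_gap


def find_split_read_chains(reads, max_gap_s=1.0, max_ref_gap=10_000):
    """Group read IDs that look like one molecule reported as several reads.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    by_channel = defaultdict(list)
    for read in reads:
        by_channel[read["channel"]].append(read)

    chains = []
    for channel_reads in by_channel.values():
        channel_reads.sort(key=lambda r: r["start_time"])
        chain = [channel_reads[0]]
        for prev, cur in zip(channel_reads, channel_reads[1:]):
            if _adjacent(prev, cur, max_gap_s, max_ref_gap):
                chain.append(cur)
            else:
                if len(chain) > 1:
                    chains.append([r["read_id"] for r in chain])
                chain = [cur]
        if len(chain) > 1:
            chains.append([r["read_id"] for r in chain])
    return chains
```

Each returned chain lists read IDs in sequencing order, so a reconstruction step could concatenate the corresponding sequences in that order. Real data would need the thresholds tuned to the run.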

Availability and implementation: The software is available freely under an MIT license at https://github.com/LooseLab/bulkvis.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Screenshot of the BulkVis application running. The vertical dashed lines indicate different annotations overlaid by MinKNOW on the signal trace in real time. The left panel provides configuration and navigation options for the selected bulk FAST5 file.
Fig. 2.
Illustrative segments from a bulk FAST5 file visualized with BulkVis. (a) The start of a read mapping to chromosome 6. Open channel ‘pore’, followed by an ‘adapter’, and ‘strand’ as annotated by MinKNOW. (b) Read ending with an ‘unblock’ followed by ‘pore’ and then a new read. (c) Adjacent reads from a channel separated by unusual current patterns. MinKNOW reports these two reads as distinct molecules, but they map consecutively to the reference. (d) Two adjacent reads separated by an ‘unblock’ signal. The unblock does not successfully remove the DNA; instead, the read continues to sequence, again mapping adjacently to the reference.
Fig. 3.
Read mappings. (a) Longest single read. (b) Longest fused read (>2 Mb), sequenced in 11 reads. (c) Longest fused read sequenced with a bulk FAST5 file. (d) Fused read comprising 38 individual sequences. (a–d) Reads mapped and visualized with last (-m 1) and last-dotplot (Kiełbasa et al., 2011). Horizontal lines indicate breaks between individual reads. (e) Illustration of the reads from (a) to (d), shown as red rectangles, mapped against GRCh38 in ENSEMBL.
Fig. 4.
MinKNOW classifications. Here we show selected classifications (see Supplementary Table S1 for classification definitions) captured from an entire bulk FAST5 file. (a) Shows the labels used for reads. Unique Read Starts and Split Read Starts are genuine new molecules being sequenced. Unique Read Ends and Split Read Ends are the real end of a read. Internal Read End and Internal Read Start refer only to incorrectly split reads. (b) Shows the density of each selected MinKNOW classification in a 10-second window before and after each of these read labels.
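
The windowed summary described for Fig. 4(b) can be sketched roughly as follows. This is an illustrative assumption about how such a tally might be computed, not the authors' code: it counts classification labels (raw counts rather than a normalized density) that fall within a fixed window before and after each read boundary on one channel. The function name, the `(time, label)` tuple layout, and the choice to count a classification at exactly the boundary time as "after" are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def classifications_around(boundaries, states, window_s=10.0):
    """Count classification labels in a window before and after each boundary.

    boundaries: times (seconds) of read starts or ends on one channel.
    states: list of (time_s, label) tuples sorted by time.
    A state at exactly the boundary time is counted as "after".
    """
    times = [t for t, _ in states]
    before, after = Counter(), Counter()
    for b in boundaries:
        lo = bisect_left(times, b - window_s)   # first state inside the window
        mid = bisect_left(times, b)             # first state at or after b
        hi = bisect_right(times, b + window_s)  # one past the last state inside
        before.update(label for _, label in states[lo:mid])
        after.update(label for _, label in states[mid:hi])
    return before, after
```

Dividing each count by the number of boundaries would give a per-boundary rate, which is one way to compare read labels that occur with different frequencies.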

References

    1. Collette A. (2013) Python and HDF5. O’Reilly Media, Incorporated.
    2. Euskirchen P. et al. (2017) Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing. Acta Neuropathol., 134, 691–703.
    3. Ip C.L. et al. (2015) MinION Analysis and Reference Consortium: Phase 1 data release and analysis [version 1; referees: 2 approved]. F1000Res., 4, 1075.
    4. Jain M. et al. (2015) Improved data analysis for the MinION nanopore sequencer. Nat. Methods, 12, 351–356.
    5. Jain M. et al. (2018) Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol., 36, 338–345.