Genome Biol. 2021 Oct 27;22(1):298.
doi: 10.1186/s13059-021-02511-y.

SquiggleNet: real-time, direct classification of nanopore signals

Yuwei Bao et al. Genome Biol. 2021.

Abstract

We present SquiggleNet, the first deep-learning model that can classify nanopore reads directly from their electrical signals. SquiggleNet operates faster than DNA passes through the pore, allowing real-time classification and read ejection. Using 1 s of sequencing data, the classifier achieves significantly higher accuracy than base calling followed by sequence alignment. Our approach is also faster and requires an order of magnitude less memory than alignment-based approaches. SquiggleNet distinguished human from bacterial DNA with over 90% accuracy, generalized to unseen bacterial species in a human respiratory metagenome sample, and accurately classified sequences containing human long interspersed repeat elements.

Keywords: Deep learning; Oxford Nanopore; Raw signal; Read-until; Real-time.
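The read-until loop described in the abstract can be sketched as follows. This is a minimal illustration, not SquiggleNet's code: the 4 kHz sampling rate, the median/MAD normalization, the 0.5 threshold, and the names `decide` and `mad_normalize` are assumptions, and `classifier` stands in for any trained model that returns the probability that a window of signal comes from a target molecule.

```python
from statistics import median
from typing import Callable, List, Sequence

# Assumption: a typical MinION raw-signal sampling rate; check your device settings.
SAMPLE_RATE_HZ = 4000
# The abstract reports classification from 1 s of sequencing data.
WINDOW_S = 1.0


def mad_normalize(signal: Sequence[float]) -> List[float]:
    """Center a raw current trace on its median and scale by the median absolute deviation."""
    med = median(signal)
    mad = median(abs(x - med) for x in signal)
    if mad == 0:
        return [0.0 for _ in signal]
    return [(x - med) / mad for x in signal]


def decide(signal: Sequence[float],
           classifier: Callable[[List[float]], float],
           threshold: float = 0.5) -> str:
    """Return 'wait', 'accept', or 'eject' for the signal read from one pore so far."""
    needed = int(SAMPLE_RATE_HZ * WINDOW_S)
    if len(signal) < needed:
        return "wait"  # not enough signal yet: keep sequencing and ask again later
    window = mad_normalize(signal[:needed])
    p_target = classifier(window)  # hypothetical model: P(molecule is a target)
    # Accepted molecules are sequenced to full length; others are ejected to free the pore.
    return "accept" if p_target >= threshold else "eject"
```

In a live run, `decide` would be called repeatedly for each pore as signal arrives; the approach pays off only if the classifier returns faster than DNA passes through the pore, which is the property the abstract reports for SquiggleNet.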

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Read-until pipeline overview. a A DNA molecule translocates through a nanopore, generating electrical signals (squiggles). SquiggleNet rapidly classifies the molecule to determine whether it is a sequence of interest. If the classifier accepts the molecule, it is sequenced to full length; otherwise, the molecule is ejected from the pore, freeing the pore to sequence another molecule. b SquiggleNet employs 1D ResNet-style bottleneck blocks with increasing numbers of filters. Average pooling and a final fully connected layer follow the last convolutional block.
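The architecture in panel b can be illustrated with a single bottleneck block in plain Python. This sketch assumes the conventional ResNet bottleneck layout (1x1 reduce, size-3 convolution, 1x1 expand, identity shortcut) applied along the time axis, followed by global average pooling and a two-class output. The channel counts, random weights, and the omission of batch normalization and of the stem convolution that would turn the one-channel signal into a feature map are simplifications, not details taken from the paper.

```python
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]          # [channels][time]
Kernel = List[List[List[float]]]    # [out_channels][in_channels][kernel_size]
Layer = Tuple[Kernel, List[float]]  # (weights, biases)


def conv1d(x: Matrix, w: Kernel, b: List[float]) -> Matrix:
    """'Same'-padded 1D convolution with stride 1 (zero padding at both ends)."""
    k = len(w[0][0])
    pad = k // 2
    length = len(x[0])
    out = []
    for o in range(len(w)):
        row = []
        for t in range(length):
            acc = b[o]
            for i in range(len(x)):
                for j in range(k):
                    src = t + j - pad
                    if 0 <= src < length:
                        acc += w[o][i][j] * x[i][src]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def init(out_ch: int, in_ch: int, k: int, rng: random.Random, scale: float = 0.1) -> Layer:
    """Small uniform random weights and zero biases (hypothetical initialization)."""
    w = [[[rng.uniform(-scale, scale) for _ in range(k)] for _ in range(in_ch)]
         for _ in range(out_ch)]
    return w, [0.0] * out_ch


def bottleneck(x: Matrix, p: Dict[str, Layer]) -> Matrix:
    """1x1 reduce -> size-3 conv -> 1x1 expand, add the identity shortcut, then ReLU."""
    h = relu(conv1d(x, *p["reduce"]))
    h = relu(conv1d(h, *p["mid"]))
    h = conv1d(h, *p["expand"])
    return relu([[a + c for a, c in zip(xa, hc)] for xa, hc in zip(x, h)])


def head(x: Matrix, fc_w: Matrix, fc_b: List[float]) -> List[float]:
    """Global average pooling over time, then a fully connected layer producing logits."""
    pooled = [sum(row) / len(row) for row in x]
    return [b + sum(wi * pi for wi, pi in zip(w, pooled)) for w, b in zip(fc_w, fc_b)]


rng = random.Random(0)
C, R = 4, 2  # hypothetical: 4 channels, reduced to 2 inside the bottleneck
params = {"reduce": init(R, C, 1, rng), "mid": init(R, R, 3, rng), "expand": init(C, R, 1, rng)}
features = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(C)]  # stand-in feature map
fc_w = [[rng.uniform(-0.1, 0.1) for _ in range(C)] for _ in range(2)]
logits = head(bottleneck(features, params), fc_w, [0.0, 0.0])  # target vs. non-target
```

Stacking several such blocks with increasing channel counts, as panel b describes, and moving the loops into a tensor library would give a practical model. The identity shortcut means a block whose convolution weights are all zero simply returns the ReLU of its input.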
Fig. 2
Overall performance across five test datasets. Accuracy, true positive rate (TPR, i.e. recall), true negative rate (TNR), precision, and AUROC score of the model trained on the HeLa&Zymo training set and tested on five test sets, with bacterial sequences as the target.
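The five metrics follow the usual definitions for a binary task with bacterial reads as the positive class. The sketch below assumes per-read labels (1 = bacterial, 0 = human), per-read scores, and a 0.5 decision threshold. AUROC is computed as the probability that a random positive read outscores a random negative one; this pairwise form is quadratic in the number of reads and meant only for illustration.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of random positive > score of random negative); ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def safe_div(num: int, den: int) -> float:
    return num / den if den else float("nan")


def metrics(labels: Sequence[int], scores: Sequence[float],
            threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, TPR (recall), TNR, precision, and AUROC; label 1 is the target class."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    tn = sum(1 for y, p in zip(labels, preds) if not y and not p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    return {
        "accuracy": safe_div(tp + tn, len(labels)),
        "tpr": safe_div(tp, tp + fn),        # bacterial reads correctly classified
        "tnr": safe_div(tn, tn + fp),        # human reads correctly classified
        "precision": safe_div(tp, tp + fp),  # predicted-bacterial reads that are bacterial
        "auroc": auroc(labels, scores),
    }
```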
Fig. 3
Taxonomy tree and accuracy per species. Taxonomy tree of the eight species in our dataset, with groups marked by color, and the corresponding per-species accuracy. Accuracy in distinguishing bacterial sequences from human was highest for the red branch, intermediate for the blue group, and lowest for the brown group.
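A per-species breakdown amounts to grouping reads by source species and computing accuracy within each group. The record layout below, a tuple of species name, true label, and predicted label, is a hypothetical choice for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_species(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """records: one (species, true_label, predicted_label) tuple per read."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for species, truth, pred in records:
        totals[species] += 1
        hits[species] += int(truth == pred)
    return {species: hits[species] / totals[species] for species in totals}
```

Because every read of a given bacterial species shares the same true label, its per-species accuracy equals the TPR restricted to that species.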
Fig. 4
Performance of SquiggleNet on unseen species. Each column (except “All”) is a model trained on a Zymo/HeLa 1:1 mix without the held-out species. For each species, the red bar shows the test accuracy on all species minus the held-out species; this number provides a baseline against which to compare performance on the held-out species. Blue bars show the accuracy of each trained model on Test-Uniform/HeLa, a test set with all eight Zymo bacterial species included and HeLa in a 1:1 ratio. Brown bars show the accuracy of each model on Test-One/HeLa, a test set with only the single unseen species and HeLa in a 1:1 ratio.
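The held-out-species protocol can be sketched as a few sampling steps per held-out species. The sketch assumes reads are represented as hypothetical (species, read_id) pairs, with the species "human" marking HeLa reads, and that training and test reads come from disjoint pools. It balances bacterial and human reads 1:1 by downsampling but does not attempt to equalize counts across bacterial species, which the name Test-Uniform may imply.

```python
import random
from typing import Dict, List, Tuple

Read = Tuple[str, int]  # hypothetical record: (species, read_id); "human" marks HeLa reads


def balanced(bacterial: List[Read], human: List[Read], rng: random.Random) -> List[Read]:
    """Mix bacterial and human reads 1:1 by downsampling the larger group."""
    n = min(len(bacterial), len(human))
    return rng.sample(bacterial, n) + rng.sample(human, n)


def held_out_splits(train_pool: List[Read], test_pool: List[Read],
                    held_out: str, rng: random.Random) -> Dict[str, List[Read]]:
    def bact(pool: List[Read]) -> List[Read]:
        return [r for r in pool if r[0] != "human"]

    def human(pool: List[Read]) -> List[Read]:
        return [r for r in pool if r[0] == "human"]

    return {
        # the model never sees the held-out species during training
        "train": balanced([r for r in bact(train_pool) if r[0] != held_out],
                          human(train_pool), rng),
        # red bars: test reads from every species except the held-out one
        "test_seen": balanced([r for r in bact(test_pool) if r[0] != held_out],
                              human(test_pool), rng),
        # blue bars (Test-Uniform/HeLa): all bacterial species, held-out one included
        "test_uniform": balanced(bact(test_pool), human(test_pool), rng),
        # brown bars (Test-One/HeLa): only the held-out species
        "test_one": balanced([r for r in bact(test_pool) if r[0] == held_out],
                             human(test_pool), rng),
    }
```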
Fig. 5
SquiggleNet accuracy by species in a human respiratory metagenome sample [19]. Dominant bacterial groups include Neisseria (23%), Bacteroidales (21%), and Firmicutes (20%). Less than 3% of the bacterial species overlap with the training dataset.
Fig. 6
Processing time and accuracy comparison. With 300 bp of input, SquiggleNet's processing time is among the lowest, yet its accuracy is the highest of the three methods. For the two alignment-based methods, processing time grows drastically with longer input, whereas the accuracy gain is limited.
Fig. 7
Identifying reads containing human long interspersed repeat elements. a Diagram of the experimental strategy for enriching human mobile elements, including interspersed repeats. A guide RNA specific to each repeat class directs Cas9 to cut the DNA, and a sequencing adapter is ligated at the cut site. However, adapters are also ligated to some sequences without repeat elements, so subsequent nanopore sequencing produces both target and non-target reads. b Pie charts of the proportion of L1Hs repeat elements with Cas9 enrichment alone vs. after SquiggleNet classification. c Classification metrics demonstrating SquiggleNet’s ability to distinguish reads with and without L1Hs repeats.
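Assuming the pie charts in panel b report the fraction of reads that contain an L1Hs element, first among all reads after Cas9 enrichment and then among the reads SquiggleNet accepts, the second fraction is simply the classifier's precision on that sample. The helper names and toy labels below are hypothetical.

```python
from typing import Sequence


def target_fraction(is_target: Sequence[bool]) -> float:
    """Fraction of reads that contain the target (e.g. an L1Hs repeat)."""
    return sum(is_target) / len(is_target)


def fraction_after_filter(is_target: Sequence[bool], accepted: Sequence[bool]) -> float:
    """Target fraction among the reads the classifier accepts, i.e. its precision."""
    kept = [t for t, a in zip(is_target, accepted) if a]
    return target_fraction(kept) if kept else float("nan")
```

Raising the decision threshold usually raises this fraction at the cost of rejecting more true L1Hs reads, which is why precision and TPR are worth reading together.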
Fig. 8
Throughput and sequencing time comparison without and with read-until. When the average non-target read is about 20 times longer than the average target read and the sample contains over 90% non-target reads, a standard sequencing pipeline must sequence ∼10 times more base pairs (left) than the Read-Until pipeline with SquiggleNet to obtain a fixed number of target reads. The ratio of required sequencing time is also about 10 (right).
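The base-pair savings can be approximated with a simple expected-value model. The sketch assumes a perfect classifier (every target read accepted, every non-target read ejected), that an ejected read contributes a fixed number of bases `ejected_len` before it leaves the pore, and no per-read capture overhead. These are simplifying assumptions rather than the calculation behind the figure, and the resulting ratio depends strongly on `ejected_len` relative to the target read length.

```python
from typing import Optional


def bases_per_target_read(target_len: float, nontarget_len: float,
                          nontarget_frac: float,
                          ejected_len: Optional[float] = None) -> float:
    """Expected bases sequenced to collect one target read.

    ejected_len=None models a standard run in which every read is sequenced in full;
    otherwise non-target reads are assumed to be ejected after ejected_len bases.
    """
    target_frac = 1.0 - nontarget_frac
    nontarget_bases = nontarget_len if ejected_len is None else ejected_len
    bases_per_read = target_frac * target_len + nontarget_frac * nontarget_bases
    return bases_per_read / target_frac  # on average 1 / target_frac reads per target read


# Hypothetical numbers: non-target reads 20x longer than targets, 90% non-target reads.
standard = bases_per_target_read(1_000, 20_000, 0.9)
read_until = bases_per_target_read(1_000, 20_000, 0.9, ejected_len=2_000)
savings = standard / read_until
```

Shortening `ejected_len`, for example through faster classification, increases the savings. If sequencing speed is constant and per-read overheads are ignored, the ratio of sequencing times equals the ratio of bases sequenced.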

References

    1. Oxford Nanopore: MinION. https://nanoporetech.com/products/minion. Accessed 10 Sept 2019.
    2. Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, Brockman W, Fennell T, Giannoukos G, Fisher S, Russ C, Gabriel S, Jaffe DB, Lander ES, Nusbaum C. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol. 2009;27(2):182–89. doi: 10.1038/nbt.1523.
    3. Kozarewa I, Armisen J, Gardner AF, Slatko BE, Hendrickson CL. Overview of target enrichment strategies. Curr Protoc Mol Biol. 2015;112:7.21.1–7.21.23. doi: 10.1002/0471142727.mb0721s112.
    4. Rand AC, Jain M, Eizenga JM, Musselman-Brown A, Olsen HE, Akeson M, Paten B. Mapping DNA methylation with high-throughput nanopore sequencing. Nat Methods. 2017;14(4):411–13. doi: 10.1038/nmeth.4189.
    5. Simpson JT, Workman RE, Zuzarte PC, David M, Dursi LJ, Timp W. Detecting DNA cytosine methylation using nanopore sequencing. Nat Methods. 2017;14(4):407–10. doi: 10.1038/nmeth.4184.
