Pan-genomic matching statistics for targeted nanopore sequencing

Omar Ahmed¹, Massimiliano Rossi², Sam Kovaka¹, Michael C Schatz¹, Travis Gagie³, Christina Boucher², Ben Langmead¹

Affiliations

¹ Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
² Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA.
³ Faculty of Computer Science, Dalhousie University, Halifax, NS, USA.

PMID: 34195571
PMCID: PMC8237286
DOI: 10.1016/j.isci.2021.102696

Omar Ahmed et al. iScience. 2021 Jun 8;24(6):102696. doi: 10.1016/j.isci.2021.102696. eCollection 2021 Jun 25.

Abstract

Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion: as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject "nontarget" DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 16 and 4 times smaller than those of minimap2, respectively. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously.

Keywords: Biocomputational Method; Bioinformatics; Biotechnology; Genomics.
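The matching statistics mentioned in the abstract can be illustrated with a small sketch. For a read P and a reference T, MS[i] is the length of the longest prefix of P[i:] that occurs somewhere in T. The code below is a naive, uncompressed illustration that uses plain substring search; it is not SPUMONI's implementation, which relies on a compressed index. The function names `matching_statistics` and `mean_ms` are hypothetical and chosen only for this example.

```python
from statistics import mean


def matching_statistics(pattern: str, text: str) -> list[int]:
    """Return MS, where MS[i] is the length of the longest prefix of
    pattern[i:] that occurs as a substring of text.

    Naive illustration only: each extension uses Python's substring
    search instead of a compressed index.
    """
    ms = []
    length = 0
    for i in range(len(pattern)):
        # If pattern[i-1 : i-1+k] occurs in text, so does pattern[i : i-1+k],
        # hence MS[i] >= MS[i-1] - 1 and we can resume from there.
        length = max(length - 1, 0)
        # Extend the match one character at a time while it still occurs.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


def mean_ms(pattern: str, text: str) -> float:
    """Average matching statistic of a non-empty read against a reference."""
    return mean(matching_statistics(pattern, text))


if __name__ == "__main__":
    reference = "ACGTACGT"   # toy reference, assumed for illustration
    read = "CGTTAC"          # toy read with one mismatch relative to it
    print(matching_statistics(read, reference))  # [3, 2, 1, 3, 2, 1]
    print(mean_ms(read, reference))
```

Long matching statistics suggest that a read resembles the indexed sequences, while uniformly short ones suggest it does not. A summary such as the mean is only one simple way to turn the values into a keep-or-eject signal.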

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Distribution of matching statistics from positive and null indexes on simulated ZymoMC reads at accuracies of (A) 85%, (B) 90%, (C) 95%, and (D) 98%. Each plot contains the density curves for the first 720 bases ($\sim$1.6 s) for three randomly chosen simulated *Escherichia coli* reads.

**Figure 2**
Distribution of matching statistics across three randomly chosen reads from (A) the human simulation and (B) the microbiome study (Moss et al. 2020). A single curve represents the first 720 bases ($\sim$1.6 Read Until seconds) of a read.
