This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2024 Mar 10:2024.03.05.583511.

doi: 10.1101/2024.03.05.583511.

Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment

Sam Kovaka, Paul W Hook, Katharine M Jenike, Vikram Shivakumar, Luke B Morina, Roham Razaghi, Winston Timp, Michael C Schatz

PMID: 38496646
PMCID: PMC10942365
DOI: 10.1101/2024.03.05.583511

Sam Kovaka et al. bioRxiv. 2024.

Update in

Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment.
Kovaka S, Hook PW, Jenike KM, Shivakumar V, Morina LB, Razaghi R, Timp W, Schatz MC. Nat Methods. 2025 Apr;22(4):681-691. doi: 10.1038/s41592-025-02631-4. Epub 2025 Mar 28. PMID: 40155722.

Abstract

Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic/transcriptomic and epigenetic information without additional library preparation. Presently, only a limited set of modifications can be directly basecalled (e.g., 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis, and visualization. Uncalled4 features an efficient banded signal alignment algorithm, a BAM signal alignment file format, statistics for comparing signal alignment methods, and a reproducible de novo training method for k-mer-based pore models, revealing potential errors in ONT's state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, where m6Anet identifies 26% more modifications from Uncalled4 alignments than from Nanopolish alignments, including in several genes where m6A has known implications in cancer. Uncalled4 is available as open source at github.com/skovaka/uncalled4.
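To illustrate what a banded signal alignment does in general, the following is a minimal sketch of banded dynamic time warping between raw current samples and the expected current at each reference position. It is not Uncalled4's implementation: the function name `banded_align`, the fixed band around the diagonal, and the absolute-difference cost are all assumptions chosen for brevity.

```python
import math


def banded_align(signal, expected, band=8):
    """Align each signal sample to one reference position (illustrative only).

    signal:   normalized current samples, in sequencing order
    expected: expected current per reference position (e.g. from a pore model)
    band:     half-width of the band around the diagonal (hypothetical parameter)

    Returns (total_cost, path), where path[i] is the reference index of sample i.
    Each step either stays on the same reference position or advances by one.
    """
    n, m = len(signal), len(expected)
    if m == 0 or n < m:
        raise ValueError("need at least one sample per reference position")

    inf = math.inf
    # Full matrices are allocated for clarity; a real banded implementation
    # would store only the cells inside the band.
    cost = [[inf] * m for _ in range(n)]
    advanced = [[False] * m for _ in range(n)]
    cost[0][0] = abs(signal[0] - expected[0])

    for i in range(1, n):
        center = i * m // n  # reference index where the diagonal crosses row i
        lo = max(0, center - band)
        hi = min(m - 1, center + band)
        for j in range(lo, hi + 1):
            stay = cost[i - 1][j]
            step = cost[i - 1][j - 1] if j > 0 else inf
            best = min(stay, step)
            if best == inf:
                continue  # cell is unreachable from inside the band
            cost[i][j] = best + abs(signal[i] - expected[j])
            advanced[i][j] = step < stay

    if cost[n - 1][m - 1] == inf:
        raise ValueError("no alignment found within the band")

    # Trace back from the last sample and last reference position.
    path = [0] * n
    j = m - 1
    for i in range(n - 1, -1, -1):
        path[i] = j
        if advanced[i][j]:
            j -= 1
    return cost[n - 1][m - 1], path


total, path = banded_align([1.1, 0.9, 5.2, 4.8, 5.0, 2.1], [1.0, 5.0, 2.0], band=1)
```

Restricting the search to a band reduces the number of cells evaluated from roughly n × m to roughly n × (2 × band + 1), at the cost of missing alignments that stray far from the diagonal.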

Figures

**Figure 1. Pore model and alignment methods overview.**
**(a)** Schematics of Nanopore sequencing chemistries and their pore k-mer substitution profiles. Heatmaps show the mean normalized current difference observed by substituting each base (y-axis) at each k-mer position (x-axis) averaged over all k-mers in the model. **(b)** A signal-to-reference dotplot of an *Escherichia coli* 16S ribosomal RNA (rRNA) read sequenced using ONT r9.4 direct RNA sequencing. Top panel shows the raw samples (black) plotted over the reference bases they were aligned to, with the expected pore model current in white. Main panel shows the Uncalled4 read alignment (purple line) over the projected basecaller metadata alignment (orange dots). Side panels show per-reference coordinate summary statistics for the alignment. **(c)** A comparative signal-to-reference dotplot and distance metrics between the alignments. **(d)** A trackplot displaying heatmaps of many native (top) and *in vitro* transcribed (IVT, bottom) *E. coli* 16S rRNA reads aligned by Uncalled4, colored by the difference between the observed and expected normalized current level. Top bar is colored by reference base, and an O6-methylguanine site is known to occur at position 526. **(e)** A refplot summarizing the distributions of differences between observed and expected normalized current levels for native (purple) and IVT (green) reads. **(f)** Schematic of Uncalled4 inputs, outputs, and subcommands (see Methods).

**Figure 2. Current distribution and nucleotide composition of k-mers in Uncalled4 trained pore models.**
Plots represent **(a)** r9.4.1 DNA, **(b)** r10.4.1 DNA, and **(c)** r9.4.1 RNA (RNA002). ONT pore models are highly similar and produce nearly identical figures. **(d)** Mean and standard deviation of current surrounding a 9-mer adenine homopolymer in the *D. melanogaster* genome, based on Uncalled4 alignments of r9.4.1 and r10.4.1 DNA reads. **(e)** Fraction of basecalled reads containing a deletion within homopolymers of length nine or longer in the *D. melanogaster* X chromosome, computed using samtools mpileup.

**Figure 3. 5mCpG signal characteristics.**
**(a)** Normalized current levels for Uncalled4 5-methylcytosine (x-axis) and unmodified control (y-axis) r10.4.1 pore models, reduced to 4-mers by averaging k-mers sharing their last four bases. Each point is colored by the identity of the central base, with diamonds representing CpG-containing k-mers. Outlined diamonds indicate k-mers with the modified cytosine at the central position (C[G]) or one base upstream ([C]G). **(b)** Current-level KS statistic mean and interquartile ranges surrounding 5mCpG sites in the *D. melanogaster* X chromosome, computed from Uncalled4 and f5c r10.4.1 signal alignments using the ONT r10.4.1 400bps model.
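The two summaries named in this caption can be sketched in a few lines. The snippet below is an illustrative assumption, not the authors' code: `reduce_by_suffix` averages pore-model levels over k-mers sharing their last four bases, and `ks_statistic` computes the standard two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical distribution functions). The function names and toy values are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def reduce_by_suffix(model, suffix_len=4):
    """Average k-mer current levels over k-mers sharing their last `suffix_len` bases.

    model: dict mapping k-mer string -> normalized current level
    """
    groups = defaultdict(list)
    for kmer, level in model.items():
        groups[kmer[-suffix_len:]].append(level)
    return {suffix: sum(levels) / len(levels) for suffix, levels in groups.items()}


def ks_statistic(a, b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all observed x."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    # The supremum of the ECDF difference is attained at an observed data point.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Toy pore model with hypothetical values: two 5-mers share the suffix "ACGT".
toy_model = {"AACGT": 1.0, "CACGT": 3.0, "GGGGG": -1.0}
reduced = reduce_by_suffix(toy_model)

# Toy current levels at one reference position for modified vs. control reads.
modified = [1.0, 2.0, 3.0, 4.0]
control = [3.0, 4.0, 5.0, 6.0]
separation = ks_statistic(modified, control)
```

A KS statistic near 0 means the two current distributions at a position are nearly indistinguishable, while a value near 1 means they barely overlap.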

**Figure 4. RNA modification detection.**
**(a)** Gene-level comparative m6A detection in DRACH contexts. “Uncalled4 (spliced)” (magenta) is based on spliced genome alignments, while all others use transcriptome alignments averaged to the gene level. **(b)** Number of m6A sites found in each cell line which occur in the m6A-Atlas v2. Solid bars indicate the number of sites found with the default probability threshold 0.9, and shaded bars indicate the count at the threshold where the precision is 85%. Uncalled4 with NA12878 has reduced recall at 85% precision, as indicated by dashed line. Precision with default probability threshold of 0.9. **(c)** Coverage distribution of true positive (TP) sites (top) and precision of sites within coverage bins. **(d)** Number of sites shared by Uncalled4, Nanopolish, and m6A-Atlas v2 across all cell lines. **(e)** Difference in per-gene m6A count found by Uncalled4 and Nanopolish across all seven cell lines. **(f)** Difference in aggregated gene m6A count from Uncalled4 and Nanopolish alignments in COSMIC tier 1 genes with m6A modification found in every cell line (51 genes). **(g)** Transcript-level m6A calls in an ABL1 transcript alongside BCR fusion, and **(h)** gene-level m6A calls in the TTC4 gene.
