Identifying and correcting repeat-calling errors in nanopore sequencing of telomeres
- PMID: 36028900
- PMCID: PMC9414165
- DOI: 10.1186/s13059-022-02751-6
Abstract
Nanopore long-read sequencing is an emerging approach for studying genomes, including long repetitive elements like telomeres. Here, we report extensive basecalling-induced errors at telomere repeats across nanopore datasets, sequencing platforms, basecallers, and basecalling models. We find that telomeres in many organisms are frequently miscalled. We demonstrate that tuning nanopore basecalling models improves the recovery and analysis of telomeric regions, with minimal negative impact on other genomic regions. We highlight the importance of verifying nanopore basecalls in long, repetitive, and poorly defined regions, and showcase how such artefacts can be resolved by improvements in nanopore basecalling models.
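One simple way to start verifying basecalls in telomeric regions is to measure how much of each read end is made up of the expected telomere repeat. The sketch below is a hypothetical illustration, not the authors' method: it assumes the vertebrate canonical repeat TTAGGG (other organisms use different motifs, so `motif` is a parameter), an arbitrary 500-base window, and exact matching only. The helper names `telomeric_fraction` and `end_scores` are invented for this example. A low score at an end expected to be telomeric is only a prompt to inspect the basecalls, not proof of an error.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def telomeric_fraction(seq: str, motif: str = "TTAGGG") -> float:
    """Fraction of bases in seq covered by exact, non-overlapping copies of
    motif or its reverse complement (hypothetical helper; TTAGGG is an
    assumed default, not a universal telomere motif)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    motif = motif.upper()
    # Match either strand, e.g. TTAGGG or its reverse complement CCCTAA.
    pattern = re.compile(f"{re.escape(motif)}|{re.escape(reverse_complement(motif))}")
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq))
    return covered / len(seq)


def end_scores(read: str, window: int = 500, motif: str = "TTAGGG") -> dict:
    """Score both terminal windows of a read (window size is an assumption)."""
    return {
        "start": telomeric_fraction(read[:window], motif),
        "end": telomeric_fraction(read[-window:], motif),
    }


if __name__ == "__main__":
    read = "TTAGGG" * 100 + "ACGT" * 200
    print(end_scores(read, window=600))  # {'start': 1.0, 'end': 0.0}
```

Because only exact copies are counted, repeats that were basecalled as a different sequence lower the score rather than being recognised, which is why a drop in telomeric content at chromosome ends can be a useful signal to look more closely.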
Keywords: Basecalling; Long-reads; Nanopore-sequencing; Telomere.
© 2022. The Author(s).
Conflict of interest statement
H.L. is a consultant of Integrated DNA Technologies and on the SAB of Sentieon, Innozeen and BGI. M.M. is a consultant for Interline, Isabl, and Bayer; receives research support from Bayer, Janssen, and Ono; has a patent for
Similar articles
- Estimated Nucleotide Reconstruction Quality Symbols of Basecalling Tools for Oxford Nanopore Sequencing. Sensors (Basel). 2023 Jul 29;23(15):6787. doi: 10.3390/s23156787. PMID: 37571570. Free PMC article.
- RODAN: a fully convolutional architecture for basecalling nanopore RNA sequencing data. BMC Bioinformatics. 2022 Apr 20;23(1):142. doi: 10.1186/s12859-022-04686-y. PMID: 35443610. Free PMC article.
- SACall: A Neural Network Basecaller for Oxford Nanopore Sequencing Data Based on Self-Attention Mechanism. IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):614-623. doi: 10.1109/TCBB.2020.3039244. Epub 2022 Feb 3. PMID: 33211664.
- Beyond sequencing: machine learning algorithms extract biology hidden in Nanopore signal data. Trends Genet. 2022 Mar;38(3):246-257. doi: 10.1016/j.tig.2021.09.001. Epub 2021 Oct 25. PMID: 34711425. Review.
- Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol. 2021 Nov;39(11):1348-1365. doi: 10.1038/s41587-021-01108-x. Epub 2021 Nov 8. PMID: 34750572. Free PMC article. Review.
Cited by
- Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision. ArXiv [Preprint]. 2024 Aug 22:arXiv:2311.02333v3. Update in: Bioinform Adv. 2024 Aug 12;4(1):vbae117. doi: 10.1093/bioadv/vbae117. PMID: 38410643. Free PMC article. Updated. Preprint.
- The fate of artificial transgenes in Acanthamoeba castellanii. BMC Genomics. 2025 Apr 13;26(1):368. doi: 10.1186/s12864-025-11552-7. PMID: 40223056. Free PMC article.
- Atlas of telomeric repeat diversity in Arabidopsis thaliana. Genome Biol. 2024 Sep 16;25(1):244. doi: 10.1186/s13059-024-03388-3. PMID: 39285474. Free PMC article.
- Techniques for assessing telomere length: A methodological review. Comput Struct Biotechnol J. 2024 Apr 10;23:1489-1498. doi: 10.1016/j.csbj.2024.04.011. eCollection 2024 Dec. PMID: 38633384. Free PMC article. Review.
- Digital telomere measurement by long-read sequencing distinguishes healthy aging from disease. Nat Commun. 2024 Jun 18;15(1):5148. doi: 10.1038/s41467-024-49007-4. PMID: 38890274. Free PMC article.