Genome Biol. 2022 Aug 26;23(1):180.
doi: 10.1186/s13059-022-02751-6.

Identifying and correcting repeat-calling errors in nanopore sequencing of telomeres

Kar-Tong Tan et al. Genome Biol. 2022.

Abstract

Nanopore long-read sequencing is an emerging approach for studying genomes, including long repetitive elements like telomeres. Here, we report extensive basecalling-induced errors at telomere repeats across nanopore datasets, sequencing platforms, basecallers, and basecalling models. We find that telomeres in many organisms are frequently miscalled. We demonstrate that tuning nanopore basecalling models improves the recovery and analysis of telomeric regions, with minimal negative impact on other genomic regions. We highlight the importance of verifying nanopore basecalls in long, repetitive, and poorly defined regions, and showcase how artefacts can be resolved by improvements in nanopore basecalling models.

Keywords: Basecalling; Long-reads; Nanopore-sequencing; Telomere.

Conflict of interest statement

H.L. is a consultant of Integrated DNA Technologies and on the SAB of Sentieon, Innozeen and BGI. M.M. is a consultant for Interline, Isabl, and Bayer; receives research support from Bayer, Janssen, and Ono; has a patent for EGFR mutations for lung cancer diagnosis issued, licensed, and with royalties paid from LabCorp and has issued patents and patents pending licensed to Bayer; and was a founding advisor of, consultant to, and equity holder in Foundation Medicine, shares of which were sold to Roche.

Figures

Fig. 1
Strand-specific nanopore basecalling errors are pervasive at telomeres. a, b IGV screenshots illustrating the three types of basecalling errors found on the forward and reverse strands of telomeres in nanopore sequencing. (TTAGGG)n on the forward strand of nanopore sequencing data was basecalled as (TTAAAA)n, while (CCCTAA)n on the reverse strand was basecalled as (CTTCTT)n and (CCCTGG)n. PacBio HiFi data generated from the same cell line (CHM13) are shown as a control. The reference genome indicated in the plot is the CHM13 draft genome assembly (v1.0). c Co-occurrence heatmap illustrating how frequently repeats corresponding to natural telomeres or to basecalling errors co-occur in PacBio HiFi and nanopore long reads found at chromosomal ends (within 10 kb of the annotated end of the reference genome). The diagonal of the co-occurrence matrix represents counts of long reads in which only a single type of repeat was observed. d Basecalling errors at telomeres are observed across different nanopore datasets and sequencing platforms. e Basecalling errors at telomeres are observed for different nanopore basecallers and basecalling models. The Guppy5 and Bonito basecallers, and different basecalling models for each basecaller, were used to basecall telomeric reads in the CHM13 PromethION dataset (reads that mapped to the flanking 10 kb regions of the CHM13 reference genome). f Basecalling errors have nanopore current profiles similar to those of telomeric repeats. Current profiles for telomeric and basecalling error repeats were plotted based on known mean current profiles for each k-mer (“Methods”). g Summary of organisms assessed and the types of repeat errors observed. Note that S. pombe and D. melanogaster could not be readily assessed for the presence of error repeats by visualization in IGV, as these sequences are more complex.
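The repeat units named in the Fig. 1 caption lend themselves to a simple screen of basecalled reads. The following is a minimal Python sketch, not the authors' method: it counts tandem copies of the telomeric and error repeats in each read and tallies a co-occurrence table in the spirit of panel c. The function names and the `MIN_COPIES` threshold are assumptions made for illustration; the paper's actual criteria are described in its Methods.

```python
import re
from collections import Counter
from itertools import combinations

# Repeat units taken from the Fig. 1 caption.
TELOMERE_UNITS = ("TTAGGG", "CCCTAA")           # forward / reverse strand
ERROR_UNITS = ("TTAAAA", "CTTCTT", "CCCTGG")    # reported miscalls

# Assumption: a run must contain at least this many back-to-back copies.
MIN_COPIES = 4


def count_tandem_copies(read, unit, min_copies=MIN_COPIES):
    """Count copies of `unit` inside runs of at least `min_copies` copies."""
    pattern = re.compile("(?:%s){%d,}" % (unit, min_copies))
    return sum(len(m.group(0)) // len(unit) for m in pattern.finditer(read))


def repeats_in_read(read):
    """Map each repeat unit found in the read to its tandem copy count."""
    found = {}
    for unit in TELOMERE_UNITS + ERROR_UNITS:
        n = count_tandem_copies(read, unit)
        if n:
            found[unit] = n
    return found


def cooccurrence(reads):
    """Tally repeat co-occurrence across reads.

    Off-diagonal keys (a, b) count reads containing both a and b;
    diagonal keys (a, a) count reads containing only repeat a.
    """
    table = Counter()
    for read in reads:
        units = sorted(repeats_in_read(read))
        if len(units) == 1:
            table[(units[0], units[0])] += 1
        for a, b in combinations(units, 2):
            table[(a, b)] += 1
    return table
```

A read that mixes (TTAGGG)n with (TTAAAA)n would land in the off-diagonal cell for that pair, which is the pattern the heatmap is meant to expose.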
Fig. 2
Selective re-basecalling of telomeric reads resolves basecalling errors at telomeres. a Approach for tuning the bonito basecalling model to improve basecalls at telomeres. b The tuned bonito basecalling model improves basecalls at telomeric regions. IGV screenshots of the telomeric region (chr2q) in the CHM13 dataset, basecalled using the default bonito basecaller and the tuned bonito basecalling model, are depicted. c Overall approach for selecting and fixing telomeric reads in nanopore sequencing datasets. Telomeric reads are selected (“Methods”) and re-basecalled using the tuned bonito basecalling model. d The selective tuning approach leads to improved recovery of telomeric reads and a decrease in the number of reads with basecalling artefacts. Evaluation was performed on the held-out test dataset (run226). e The “selective tuning” approach has little detected negative impact on basecalling of other genomic regions. The sequence similarity of all reads to the reference genome was evaluated for three basecalling approaches: applying the default bonito basecalling model to all reads (untuned bonito model), applying the tuned bonito basecalling model to all reads (tuned bonito model), and applying the tuned bonito basecalling model selectively to telomeric reads only (selective tuning of telomeric reads). The density plot depicts the sequence similarity of each read against the CHM13 reference genome as assessed using minimap2.
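The selective workflow in panel c can be outlined in a few lines of Python. This is a hedged sketch only: `looks_telomeric` is a hypothetical selector based on the repeat units from Fig. 1 (the paper's actual selection criteria are in its Methods), and `rebasecall_tuned` is a placeholder callable standing in for running the tuned bonito model on a read's raw signal.

```python
import re
from typing import Callable, Dict

# Telomeric and error repeat units from the Fig. 1 caption.
CANDIDATE_UNITS = ("TTAGGG", "CCCTAA", "TTAAAA", "CTTCTT", "CCCTGG")


def looks_telomeric(seq: str, min_copies: int = 4) -> bool:
    """Hypothetical selector: flag reads with a tandem run of any unit.

    The threshold of 4 copies is an assumption, not a value from the paper.
    """
    return any(
        re.search("(?:%s){%d,}" % (unit, min_copies), seq)
        for unit in CANDIDATE_UNITS
    )


def selective_rebasecall(
    default_calls: Dict[str, str],
    rebasecall_tuned: Callable[[str], str],
    select: Callable[[str], bool] = looks_telomeric,
) -> Dict[str, str]:
    """Keep default basecalls, except for reads picked by `select`.

    `default_calls` maps read IDs to sequences from the default model.
    `rebasecall_tuned` takes a read ID and returns the sequence produced
    by the tuned model (a placeholder for the real basecaller call).
    """
    merged = {}
    for read_id, seq in default_calls.items():
        merged[read_id] = rebasecall_tuned(read_id) if select(seq) else seq
    return merged
```

Restricting the tuned model to selected reads is what keeps non-telomeric basecalls identical to the default output, which matches the comparison described in panel e.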
