Brief Bioinform. 2025 Aug 31;26(5):bbaf504.
doi: 10.1093/bib/bbaf504.

TELLBASE: a novel tool of TELL-seq barcode-assisted scaffold assembler for bacterial genomes

Yutong Li et al. Brief Bioinform.

Abstract

Transposase enzyme linked long-read sequencing (TELL-seq) technology generates barcode-linked reads, facilitating whole-genome sequencing (WGS) and complete assembly with improved accuracy and reduced cost. Unlike mate-pair sequencing, TELL-seq employs a near-full-sequence tagging strategy that captures comprehensive genomic information more efficiently. However, assembly algorithms and software that fully exploit the characteristics of TELL-seq to assemble genomic sequences at the megabase scale are lacking, particularly for bacteria and their plasmids. In this study, we present the TELL-seq barcode-assisted scaffold assembler (TELLBASE), a de novo genome assembler designed specifically for assembling bacterial genomes from TELL-seq-derived linked reads. In assembly tests on bacteria such as Acinetobacter baumannii, Klebsiella pneumoniae, Mycobacterium tuberculosis, and Staphylococcus aureus, TELLBASE was highly effective at producing chromosome-level bacterial genomic sequences and successfully identified the plasmids present in the sequenced strains. Comparative analysis revealed that TELLBASE significantly outperforms existing assemblers tailored for TELL-seq-derived linked reads, such as TuringAssembler and Ariadne, in the completeness and accuracy of the assembled genomes. TELLBASE therefore shows promise for refining draft bacterial genomes and for further applications in related fields. The TELLBASE package is freely available on GitHub (https://github.com/sosie1/TELLBASE).

Keywords: assemble algorithm for TELL-seq platform; bacterial genome assembly; barcoded NGS for bacterial genome; complete de novo assembly.

Figures

Graphical Abstract
Figure 1
Overview of the TELLBASE workflow. (A) DNA sequences are combined with barcoded TELL beads. (B) Paired-end reads carry different barcodes; different colors represent different barcode sequences. (C) Contigs are assembled by SPAdes and classified into three categories. (D) Paired-end reads are mapped onto the heads and tails of long contigs. (E) The positional relationships among long contigs are identified using Jaccard correlation and Sørensen-Dice correlation. (F) Short contigs and paired-end reads are used to fill the gaps in the super-scaffolds.
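Step (E) compares the barcodes observed at the ends of long contigs. The sketch below illustrates the two standard set-similarity measures named in the caption, applied to barcode sets. It is not TELLBASE's implementation: the function names and the example barcodes are hypothetical, and the way TELLBASE combines or thresholds the two scores is not stated here.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sorensen_dice(a: set, b: set) -> float:
    """Sørensen-Dice coefficient: 2|A ∩ B| / (|A| + |B|) (0.0 if both sets are empty)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical barcodes of reads mapped to the tail of one contig
# and to the head of another.
tail_of_a = {"BC1", "BC2", "BC3", "BC4"}
head_of_b = {"BC3", "BC4", "BC5"}

print(jaccard(tail_of_a, head_of_b))        # 2 shared / 5 total
print(sorensen_dice(tail_of_a, head_of_b))  # 2 * 2 shared / (4 + 3)
```

Two contig ends that originate from the same stretch of a DNA molecule tend to share barcodes, so a high score suggests that the ends are adjacent in the genome.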
Figure 2
Details of the TELLBASE workflow and principle. The input to TELLBASE is FASTQ data obtained from the TELL-seq platform; the output comprises organized plasmid sequences and chromosome-level bacterial genome sequences.
Figure 3
Schematic diagram of contig extension and gap filling. (A) To minimize the influence of contig length when calculating the correlation between contigs, barcode information for contigs longer than 20 kb is taken from the 10 kb region at each end; contigs shorter than 20 kb are split into two parts. (B) Only four directional relationships between contigs are possible: back-front, back-back, front-front, and front-back. (C) Adjacent contigs may be separated by a gap or may overlap at their start or end sequences. (D) Several paths may be available to fill a gap between contigs. Score (P) is used to select the optimal path that fills the gap completely. In addition, a second formula [formula image] is used to select a suboptimal path. Paired-end reads are used to fill the gap if the suboptimal path still contains “N” bases.
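Panels (A) and (B) can be illustrated with a short sketch. It assumes that each contig has a list of (position, barcode) pairs from mapped reads, that "split into two parts" means splitting at the midpoint, and that the Jaccard index is the score used to compare ends. All names and data are hypothetical, and the scores defined in panel (D) are not reproduced.

```python
from itertools import product

END_WINDOW = 10_000   # 10 kb window at each end (panel A)
MIN_FULL = 20_000     # contigs shorter than this are split in two (panel A)


def end_windows(length: int):
    """Return ((front_start, front_end), (back_start, back_end)), half-open."""
    if length >= MIN_FULL:
        return (0, END_WINDOW), (length - END_WINDOW, length)
    mid = length // 2  # assumption: short contigs are split at the midpoint
    return (0, mid), (mid, length)


def end_barcodes(length: int, hits):
    """Collect the barcode sets seen in the front and back windows of a contig."""
    (f0, f1), (b0, b1) = end_windows(length)
    return {
        "front": {bc for pos, bc in hits if f0 <= pos < f1},
        "back": {bc for pos, bc in hits if b0 <= pos < b1},
    }


def best_orientation(ends_a, ends_b):
    """Score the four end pairings (panel B) with the Jaccard index."""
    scores = {}
    for ea, eb in product(("back", "front"), repeat=2):
        a, b = ends_a[ea], ends_b[eb]
        union = a | b
        scores[f"{ea}-{eb}"] = len(a & b) / len(union) if union else 0.0
    return max(scores, key=scores.get), scores


# Hypothetical example: a 30 kb contig and a 12 kb contig.
contig_a = end_barcodes(30_000, [(500, "BC1"), (15_000, "BC9"),
                                 (25_000, "BC2"), (29_000, "BC3")])
contig_b = end_barcodes(12_000, [(1_000, "BC2"), (3_000, "BC3"), (9_000, "BC7")])
best, scores = best_orientation(contig_a, contig_b)
print(best, scores)
```

In this toy data the back of the first contig and the front of the second share all of their barcodes, so the back-front pairing scores highest. The read at position 15,000 on the longer contig falls outside both end windows and is ignored, which is how the windowing limits the influence of contig length.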
Figure 4
Heatmaps illustrating the correlation between contigs at various input DNA concentrations. Heatmaps are shown for the AB ATCC19606 strain at 20 pg (A) and 100 pg (B), the KP HS11286 strain at 100 pg (C) and 200 pg (D), and the KP 2044 strain at 10 pg (E) and 50 pg (F). All contig correlations are calculated according to the formula [formula image]. The leftmost column shows the lengths of the contigs at the corresponding positions.
Figure 5
Distribution of Jaccard similarity for AB ATCC19606 (A), KP HS11286 (B), and KP 2044 (C) at different DNA input concentrations. In each panel, the orange dashed curve corresponds to the higher input and the blue solid curve to the lower input: 50 pg and 10 pg for AB ATCC19606, 200 pg and 100 pg for KP HS11286, and 50 pg and 10 pg for KP 2044.
Figure 6
Comparison of the assembly results of TELLBASE, TuringAssembler, and Ariadne. (A) Bar chart of the largest contigs for all bacterial strains shown. For each strain, from left to right, the first bar represents the reference genome size, the second bar represents the TELLBASE assembled genome size, the third bar represents the TuringAssembler assembled genome size, and the fourth bar represents the Ariadne assembled genome size. The dashed portion of the figure represents the genome size of a highly homologous strain found at NCBI, which serves as the reference genome. Mauve collinearity plots for the strain C.Diff 1326 (B) and the strain Staph.Aureus FPR (C). BRIG ring images for the strain C.Diff 1326 (D) and the strain Staph.Aureus FPR (E). In the BRIG ring images, from inside to outside, the first circle represents the reference genome, the second circle is GC content, the third circle is GC skew, the fourth circle is the genome assembled by TELLBASE, and the remaining circles are the broken scaffolds assembled by TuringAssembler.
