TELLBASE: a novel tool of TELL-seq barcode-assisted scaffold assembler for bacterial genomes

Yutong Li¹, Tianlong Kuang¹, Tao Xu¹,²,³, Hanxiao Du¹, Yi Zhang²,³, Yu Qian¹, Yiwen Chen⁴, Zhenxian Xiao⁴, Chen Chen²,³, Jing Wu²,³, Wen-Hong Zhang¹,²,³, Chenqi Lu¹, Ning Jiang¹,²,³

Affiliations

¹ State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Yangpu district, Shanghai 200433, China.
² Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Yangpu district, Shanghai 200433, China.
³ Department of Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, No. 12 Wulumuqi Zhong Road, Jingan district, Shanghai 200040, China.
⁴ Department of Bioinformatics, Wuxi Universal Sequencing Co., Ltd., No. 35 South Changjiang Road, Xinwu district, Wuxi 214013, China.

PMID: 41016011
PMCID: PMC12476840
DOI: 10.1093/bib/bbaf504

Brief Bioinform. 2025 Aug 31;26(5):bbaf504.

Abstract

Transposase enzyme-linked long-read sequencing (TELL-seq) technology generates barcode-linked reads, facilitating whole-genome sequencing (WGS) and complete assembly with improved accuracy and reduced costs. Unlike mate-pair sequencing technology, TELL-seq employs a near-full-sequence tagging strategy that captures comprehensive genomic information more efficiently. However, assembly algorithms and software that fully exploit the characteristics of TELL-seq to assemble genomic sequences at the megabase scale are lacking, particularly for bacteria and their plasmids. In this study, we present the TELL-seq barcode-assisted scaffold assembler (TELLBASE), a de novo genome assembler designed specifically for assembling bacterial genomes from TELL-seq-derived linked reads. In assembly tests on bacteria including *Acinetobacter baumannii*, *Klebsiella pneumoniae*, *Mycobacterium tuberculosis*, and *Staphylococcus aureus*, TELLBASE produced chromosome-level bacterial genome sequences and successfully identified the plasmids present in the sequenced strains. Comparative analysis revealed that TELLBASE significantly outperforms existing assemblers tailored for TELL-seq-derived linked reads, such as TuringAssembler and Ariadne, in the completeness and accuracy of the assembled genomes. TELLBASE therefore shows promise for refining draft bacterial genomes and for further applications in related fields. The TELLBASE package is freely available on GitHub (https://github.com/sosie1/TELLBASE).

Keywords: assemble algorithm for TELL-seq platform; bacterial genome assembly; barcoded NGS for bacterial genome; complete de novo assembly.

Figures

**Figure 1**
Overview of the TELLBASE workflow. (A) Combining DNA sequences with barcoded TELL beads. (B) Paired-end reads with different barcodes. Different colors represent different barcode sequences. (C) Contigs are assembled by SPAdes and classified into three categories. (D) Mapping paired-end reads to the heads and tails of long contigs. (E) Identifying the positional relationships among long contigs using Jaccard correlation and Sørensen-Dice correlation. (F) Using short contigs and paired-end reads to fill the gaps in the super-scaffolds.
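
Step (E) can be illustrated with a minimal sketch, assuming each contig end is summarized as the set of barcodes carried by the reads mapped to it. The barcode labels and variable names below are hypothetical, and the code shows only the two standard set-similarity measures named in the caption, not TELLBASE's actual implementation.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: shared barcodes divided by all distinct barcodes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sorensen_dice(a: set, b: set) -> float:
    """Sørensen-Dice coefficient: twice the shared barcodes over the summed set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical barcode sets for the tail of one contig and the head of another.
tail_of_contig_1 = {"BC01", "BC02", "BC03", "BC04"}
head_of_contig_2 = {"BC03", "BC04", "BC05"}

j = jaccard(tail_of_contig_1, head_of_contig_2)        # 2 shared / 5 distinct
d = sorensen_dice(tail_of_contig_1, head_of_contig_2)  # 2 * 2 shared / (4 + 3)
print(f"Jaccard = {j:.3f}, Sorensen-Dice = {d:.3f}")
```

Two contig ends that lie close together on the genome tend to be tagged by the same barcodes, so a high score on either measure suggests adjacency.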

**Figure 2**
Details of the TELLBASE workflow and principle. The input to TELLBASE is FASTQ data obtained from the TELL-seq platform; the output consists of the assembled plasmids and the chromosome-level bacterial genome sequence, reported separately.

**Figure 3**
Schematic diagram of contig extension and gap filling. (A) When calculating the correlation between contigs, the influence of contig length is minimized as follows: for contigs longer than 20 kb, barcode information is taken from the 10 kb region at each end; contigs shorter than 20 kb are split into two parts. (B) There are only four directional relationships between contigs: back-front, back-back, front-front, and front-back. (C) The relationship between two contigs may involve a gap between them or an overlap of their end or start sequences. (D) Various paths can be used to fill the gaps between contigs. Score (P) is used to select the optimal path that fills a gap perfectly. Additionally, a second score (given as a formula image in the original) is formulated to select a suboptimal path. Paired-end reads are used to fill the gap if “N” bases remain in the suboptimal path.
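
Panels (A) and (B) can be sketched as follows, assuming read alignments are available as (position, barcode) pairs per contig and that Jaccard similarity is used to score each pairing. All function names, data, and the choice to split short contigs at their midpoint are illustrative assumptions, not TELLBASE's actual code.

```python
from itertools import product

END_WINDOW = 10_000  # 10 kb end region, as described in the caption


def end_barcodes(length, hits, window=END_WINDOW):
    """Collect the barcode sets at the front and back of one contig.

    hits is an iterable of (position, barcode) pairs (hypothetical input format).
    Contigs shorter than two windows (20 kb) are split at their midpoint
    (an assumption about how "split into two parts" is done).
    """
    if length < 2 * window:
        front_end = back_start = length // 2
    else:
        front_end, back_start = window, length - window
    front = {bc for pos, bc in hits if pos < front_end}
    back = {bc for pos, bc in hits if pos >= back_start}
    return {"front": front, "back": back}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_orientation(ends_a, ends_b):
    """Score the four end pairings (back-back, back-front, front-back, front-front)."""
    scores = {
        f"{ea}-{eb}": jaccard(ends_a[ea], ends_b[eb])
        for ea, eb in product(("back", "front"), repeat=2)
    }
    return max(scores, key=scores.get), scores


# Hypothetical data: contig A is 30 kb, contig B is 12 kb (short, so split in half).
hits_a = [(500, "b1"), (15_000, "b9"), (29_000, "b2"), (29_500, "b3")]
hits_b = [(100, "b2"), (200, "b3"), (11_000, "b7")]

ends_a = end_barcodes(30_000, hits_a)
ends_b = end_barcodes(12_000, hits_b)
best, scores = best_orientation(ends_a, ends_b)
print(best, scores)
```

Restricting the comparison to fixed-size end regions keeps a very long contig from accumulating barcodes along its whole length and inflating or diluting the similarity score.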

**Figure 4**
Heatmaps illustrating the correlation between contigs at various input DNA concentrations. Heatmap for the *AB ATCC19606* strain at 20 pg (A), the *AB ATCC19606* strain at 100 pg (B), the *KP HS11286* strain at 100 pg (C), the *KP HS11286* strain at 200 pg (D), the *KP 2044* strain at 10 pg (E), and the *KP 2044* strain at 50 pg (F). All contig correlations are calculated according to the correlation formula (shown as an image in the original article). The leftmost column shows the length of the contig at the corresponding position.

**Figure 5**
The distribution of Jaccard similarity for *AB ATCC19606* (A), *KP HS11286* (B), and *KP 2044* (C) at different DNA input concentrations. For the *AB ATCC19606* strain, the orange dashed curve corresponds to the distribution at 50 pg DNA input, and the blue solid curve corresponds to the distribution at 10 pg DNA input. For the *KP HS11286* strain, the orange dashed curve corresponds to the distribution at 200 pg DNA input, and the blue solid curve corresponds to the distribution at 100 pg DNA input. For the *KP 2044* strain, the orange dashed curve corresponds to the distribution at 50 pg DNA input, and the blue solid curve corresponds to the distribution at 10 pg DNA input.

**Figure 6**
Comparison of the assembly results of TELLBASE, TuringAssembler, and Ariadne. (A) Bar chart of the largest contigs for all plotted bacterial strains. For each strain, from left to right, the first bar represents the reference genome size, the second bar represents the TELLBASE assembled genome size, the third bar represents the TuringAssembler assembled genome size, and the fourth bar represents the Ariadne assembled genome size. The dashed portion of the figure represents the genome size of a highly homologous strain found at NCBI, which serves as the reference genome. The Mauve collinearity plots for the strain *C.Diff 1326* (B) and the strain *Staph.Aureus FPR* (C). The BRIG ring images for the strain *C.Diff 1326* (D) and the strain *Staph.Aureus FPR* (E). In the BRIG ring images, from inside to outside, the first circle represents the reference genome, the second circle is GC content, the third circle is GC skew, the fourth circle is the genome assembled by TELLBASE, and the remaining circles are the broken scaffolds assembled by TuringAssembler.
