Bioinformatics. 2025 Feb 4;41(2):btaf049.
doi: 10.1093/bioinformatics/btaf049.

tidk: a toolkit to rapidly identify telomeric repeats from genomic datasets

Max R Brown et al. Bioinformatics.

Abstract

Summary: "tidk" (short for telomere identification toolkit) uses a simple, fast algorithm to scan long DNA reads for the presence of short tandemly repeated DNA in runs, and to aggregate them based on canonical DNA string representation. These are telomeric repeat candidates. Our algorithm is shown to be accurate in genomes for which the telomeric repeat unit is known and is tested across a wide variety of newly assembled genomes to uncover new telomeric repeat units. Tools are provided to identify telomeric repeats de novo, scan genomes for known telomeric repeats, and to visualize telomeric repeats on the assembly. "tidk" is implemented in Rust and is available as a command line tool which can be compiled using the Rust toolchain or downloaded as a binary from bioconda.
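The scan-and-aggregate idea described in the summary can be illustrated with a minimal Python sketch. This is not the tidk implementation (which is written in Rust). The function names, the fixed unit length `k`, the `min_copies` threshold, and the choice of canonical form (the lexicographically smallest rotation of a unit or of its reverse complement) are all assumptions made for illustration.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(unit: str) -> str:
    """Assumed canonical form: the smallest string among all rotations
    of the unit and all rotations of its reverse complement."""
    candidates = []
    for s in (unit, reverse_complement(unit)):
        candidates.extend(s[i:] + s[:i] for i in range(len(s)))
    return min(candidates)


def tandem_runs(read: str, k: int, min_copies: int = 3):
    """Yield (unit, copies) for each run of at least `min_copies`
    back-to-back copies of the same k-mer in `read`."""
    i = 0
    while i + k <= len(read):
        unit = read[i:i + k]
        copies = 1
        # Extend the run while the next k bases equal the unit.
        while read[i + copies * k:i + (copies + 1) * k] == unit:
            copies += 1
        if copies >= min_copies:
            yield unit, copies
            i += copies * k  # skip past the whole run
        else:
            i += 1


def candidate_repeats(reads, k: int, min_copies: int = 3) -> Counter:
    """Aggregate tandem runs from all reads under their canonical form."""
    counts = Counter()
    for read in reads:
        for unit, copies in tandem_runs(read.upper(), k, min_copies):
            counts[canonical(unit)] += copies
    return counts


if __name__ == "__main__":
    # Toy reads: the same repeat seen on opposite strands.
    toy_reads = ["GATTACA" + "TTAGGG" * 5 + "C", "CCCTAA" * 4]
    print(candidate_repeats(toy_reads, k=6))
```

In this toy input, "TTAGGG" and "CCCTAA" are the same repeat read from opposite strands, so both runs are counted under the single key "AACCCT". A real scan would also need to consider several unit lengths and to filter low-complexity units such as homopolymers, which this sketch does not do.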

Availability and implementation: The "tidk" Rust crate is freely available under the MIT license (https://crates.io/crates/tidk), and the source code is available at https://github.com/tolkit/telomeric-identifier.

Conflict of interest statement

None declared.

Figures

Figure 1.
(A) Occurrence of the repeat “AACCT,” which is a substring of the canonical telomeric repeat “AACCCGAACCT,” across the ten largest pseudomolecules (chromosomes) of the Bombus sylvestris genome. In eight molecules, telomeric repeat is found at both ends. Where there are no peaks at the ends of the chromosomes, the telomere has not assembled. The x-axis represents the position along each pseudomolecule, while the y-axis denotes the frequency of the identified repeat. Plot is generated from “tidk plot.” (B) Sequence composition at the telomeric region of chromosome 1. The first 100 base pairs of chromosome 1 reveal an alternating pattern between the repeats “AACCT” (larger text) and “AACCCG” (smaller text). The repeats are numbered sequentially from 1 to 19, corresponding to their order along the chromosome.
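The kind of data behind panel (A), repeat frequency against position along a pseudomolecule, can be approximated with a windowed count. The sketch below is an illustration only, not the code behind "tidk plot". The function name `windowed_counts`, the window size, and the decision to count the repeat on both strands are assumptions.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def windowed_counts(sequence: str, repeat: str, window: int = 10_000):
    """Return (window_start, count) pairs, where count is the number of
    non-overlapping matches of `repeat` or its reverse complement inside
    each consecutive window. Matches that straddle a window boundary are
    not counted in this simplified version."""
    sequence = sequence.upper()
    # A set avoids double counting when the repeat is its own reverse complement.
    patterns = {repeat, reverse_complement(repeat)}
    rows = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        rows.append((start, sum(chunk.count(p) for p in patterns)))
    return rows


if __name__ == "__main__":
    # Toy "chromosome": telomeric repeat at both ends, filler in the middle.
    left = "AACCCGAACCT" * 5 + "G" * 45
    middle = "ACGTACGTAC" * 10
    right = "G" * 45 + reverse_complement("AACCCGAACCT") * 5
    for start, count in windowed_counts(left + middle + right, "AACCT", window=100):
        print(start, count)
```

On the toy sequence, the first and last windows each contain five matches and the middle window contains none, which mirrors the end-of-chromosome peaks described in the caption. A pseudomolecule whose telomere has not assembled would show no such peak at that end.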
