Brief Bioinform. 2025 Jul 2;26(4):bbaf442.
doi: 10.1093/bib/bbaf442.

Terrier: a deep learning repeat classifier

Robert Turnbull et al. Brief Bioinform.

Abstract

Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Poor representation of taxa within repeat databases often limits the accuracy and reproducibility of current repeat annotation methods, which in turn limits our understanding of repeat evolution and function. Terrier is a deep learning model designed to overcome these challenges: it classifies repetitive DNA sequences into RepeatMasker categories after training on a publicly available, curated repeat sequence library. Terrier is trained on Repbase, which includes over 100,000 repeat families (four times more than Dfam), and maps 97.1% of Repbase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice, fruit flies, humans, and mice), Terrier achieved higher accuracy while classifying a broader range of sequences. Further validation in non-model amphibian, flatworm, and Northern krill genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.

Keywords: DNA sequence classification; Northern krill; amphibians; deep learning; flatworms; transposable elements (TEs).
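The benchmark claim (higher accuracy while classifying a broader range of sequences) and Figure 5 both describe a trade-off between accuracy and the proportion of sequences a tool is willing to classify. The following is a minimal sketch of one common way to compute such a pair of numbers, assuming that each prediction carries a probability and that a sequence counts as "classified" only when that probability meets a threshold (otherwise it is left as "Unknown"). The helper name `accuracy_coverage` and the toy labels are hypothetical; this is not Terrier's evaluation code.

```python
from typing import List, Tuple


def accuracy_coverage(
    predictions: List[Tuple[str, float]],
    truths: List[str],
    threshold: float,
) -> Tuple[float, float]:
    """Return (accuracy among classified, proportion classified).

    Hypothetical helper: a prediction is kept only if its probability is
    at least `threshold`; all other sequences are treated as 'Unknown'
    and excluded from the accuracy calculation.
    """
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have equal length")
    if not truths:
        return 0.0, 0.0

    classified = 0
    correct = 0
    for (label, prob), truth in zip(predictions, truths):
        if prob >= threshold:
            classified += 1
            if label == truth:
                correct += 1

    proportion = classified / len(truths)
    # By convention in this sketch, accuracy is 0.0 when nothing is classified.
    accuracy = correct / classified if classified else 0.0
    return accuracy, proportion


if __name__ == "__main__":
    # Toy predictions using RepeatMasker-style "Class/Family" labels.
    preds = [("LINE/L1", 0.98), ("LTR/Gypsy", 0.95), ("DNA/hAT", 0.60), ("LINE/L1", 0.92)]
    truth = ["LINE/L1", "LTR/Copia", "DNA/hAT", "LINE/L1"]
    for t in (0.5, 0.9, 0.97):
        acc, prop = accuracy_coverage(preds, truth, t)
        print(f"threshold={t:.2f} accuracy={acc:.2f} classified={prop:.2f}")
```

Sweeping the threshold traces a curve of (proportion classified, accuracy) points; in the toy data, raising the threshold from 0.5 to 0.97 moves accuracy from 0.75 to 1.00 while the proportion classified falls from 1.00 to 0.25, which is why points in the top right corner of such a plot are preferred.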

Conflict of interest statement

No competing interest is declared.

Figures

Graphical Abstract
Figure 1
The tree used to classify the repeat families, showing the number of sequences in Repbase mapped to each node.
Figure 2
The Terrier neural network architecture.
Figure 3
Results for the five cross-validation folds.
Figure 4
The confusion matrix for the concatenated predictions on the five cross-validation sets at the Order level.
Figure 5
Superfamily classification accuracy versus proportion classified across four software packages. Preferred results are in the top right corner.
Figure 6
Confusion matrices for Terrier on the Fruit Fly and Rice test datasets using a threshold of 0.9, with predictions at the ‘Superfamily’ level. Confusion matrices for Terrier on the larger Human and Mouse datasets are available in the Terrier GitHub repository and online documentation.
Figure 7
Computation time for running Terrier, TERL, and DeepTE on the flatworm and amphibian TE datasets. File size refers to the uncompressed FASTA input files, in megabytes. Linear trendlines are shown, with their equations written on the right.
Figure 8
Comparison between results from RepeatModeler (left) and Terrier (right) on experimental data from 8 flatworm and 51 amphibian species. The percentage of sequences classified as ‘Unknown’ is labeled for each species.
Figure 9
Extra classifications by Terrier, at different probability thresholds, of previously unclassified repeat families from Northern krill. The percentage of sequences remaining unknown is labeled for each threshold.
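Figures 1, 6, and 9 together describe classification over a tree of repeat categories with a probability threshold, where low-confidence sequences stay "Unknown". The following is a minimal sketch of one way such thresholded tree prediction could work, assuming the model outputs a conditional probability P(child | parent) for every node and that a path's probability is the product of those conditionals. The `Node` class, the `predict_with_threshold` function, and the greedy descent rule are all hypothetical illustrations, not Terrier's actual inference procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node in a hypothetical repeat-classification tree."""
    name: str
    children: List["Node"] = field(default_factory=list)


def predict_with_threshold(root: Node, cond_probs: Dict[str, float], threshold: float) -> str:
    """Greedily descend the tree, stopping when confidence drops below threshold.

    Assumption: cond_probs[name] is P(node | parent) for every non-root node,
    and the probability of reaching a node is the product of the conditionals
    along its path. Returns the deepest node reached, or "Unknown" if even the
    best first-level node falls below the threshold.
    """
    node = root
    path_prob = 1.0
    prediction = "Unknown"
    while node.children:
        # Pick the most probable child of the current node.
        best = max(node.children, key=lambda c: cond_probs.get(c.name, 0.0))
        candidate_prob = path_prob * cond_probs.get(best.name, 0.0)
        if candidate_prob < threshold:
            break  # not confident enough to go one level deeper
        node, path_prob, prediction = best, candidate_prob, best.name
    return prediction


if __name__ == "__main__":
    # Toy tree with RepeatMasker-style labels (illustrative only).
    tree = Node("root", [
        Node("LINE", [Node("LINE/L1"), Node("LINE/RTE")]),
        Node("LTR", [Node("LTR/Gypsy"), Node("LTR/Copia")]),
    ])
    probs = {"LINE": 0.95, "LTR": 0.05, "LINE/L1": 0.97, "LINE/RTE": 0.03,
             "LTR/Gypsy": 0.5, "LTR/Copia": 0.5}
    for t in (0.9, 0.93, 0.99):
        print(t, predict_with_threshold(tree, probs, t))
```

With the toy probabilities, a threshold of 0.9 yields "LINE/L1", 0.93 stops at the intermediate label "LINE", and 0.99 leaves the sequence "Unknown". Under this design, raising the threshold makes labels less specific rather than wrong, which mirrors the kind of threshold-dependent "Unknown" percentages reported in Figure 9.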
