BMC Genomics. 2019 Jun 3;20(1):450.
doi: 10.1186/s12864-019-5796-9.

LtrDetector: A tool-suite for detecting long terminal repeat retrotransposons de-novo

Joseph D Valencia et al. BMC Genomics.

Abstract

Background: Long terminal repeat retrotransposons are the most abundant transposons in plants. They play important roles in alternative splicing, recombination, gene regulation, and defense mechanisms. Large-scale sequencing projects for plant genomes are currently underway, and software tools are important for annotating long terminal repeat retrotransposons in these newly available genomes. However, the available tools are not very sensitive to known elements and perform inconsistently across genomes. Some are hard to install or are obsolete, and they may struggle to process large plant genomes. None can be executed in parallel out of the box, and very few have features to support visual review of new elements. To overcome these limitations, we developed LtrDetector, which uses techniques inspired by signal processing.

Results: We compared LtrDetector to LTR_Finder and LTRharvest, the two most successful predecessor tools, on six plant genomes. For each organism, we constructed a ground-truth data set based on queries from a consensus sequence database. According to this evaluation, LtrDetector was the most sensitive tool, achieving a 16–23% improvement in sensitivity over LTRharvest and a 21% improvement over LTR_Finder. All three tools had low false positive rates; LtrDetector achieved 98.2% precision, between the precision values of its two competitors. Overall, LtrDetector provides the best compromise between high sensitivity and a low false positive rate while requiring moderate run time and no more memory than is available on a personal computer.
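Sensitivity, precision, and the F1 measure (used in Fig. 3) follow standard definitions. The sketch below assumes, for illustration, that each predicted element has already been classified against the ground truth as a true positive (TP) or false positive (FP), and that each missed ground-truth element is counted as a false negative (FN). This abstract does not describe the matching criterion used in the study, and the counts in the usage lines are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of ground-truth elements that were detected (recall)."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted elements that match a ground-truth element."""
    return tp / (tp + fp) if tp + fp else 0.0


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts, not taken from the study.
tp, fp, fn = 90, 10, 30
print(f"sensitivity={sensitivity(tp, fn):.3f} precision={precision(tp, fp):.3f} F1={f1(tp, fp, fn):.3f}")
# sensitivity=0.750 precision=0.900 F1=0.818
```

Writing F1 as 2TP / (2TP + FP + FN) avoids computing precision and sensitivity first and is algebraically identical to 2PR / (P + R).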

Conclusions: LtrDetector uses a novel methodology based on k-mer distributions, which allows it to produce high-quality results using relatively lightweight procedures. It is easy to install and use. It is not species-specific and performs well with its default parameters on genomes of varying size and repeat content. It is automatically configured for parallel execution and runs efficiently on an ordinary personal computer. It includes a k-mer score visualization tool to facilitate manual review of the identified elements. These features make LtrDetector an attractive tool for future annotation projects involving long terminal repeat retrotransposons.

Keywords: Long terminal repeat retrotransposons; Repeats; Signal processing; Software.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Method overview: LtrDetector is a software tool for locating long terminal repeat (LTR) retrotransposons (RTs). (a) A sequence of scores reflects the distance to the closest exact copy of the k-mer starting at each nucleotide. (b) Smoothed scores are produced after adjacent spikes are merged into a contiguous region. (c) Plateau regions are identified; separate plateaus are shown here as black and red lines. (d) Plateaus are paired and their boundaries are adjusted. The red triangles denote the start and end coordinates of each LTR.
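Steps (a) and (d) of this overview can be sketched in Python under stated assumptions. For step (a), the score at each position is taken as the unsigned distance to the nearest other exact copy of the k-mer starting there, in either direction, with 0 when no copy lies within a hypothetical `max_dist`. For step (d), two plateaus are paired when the second starts roughly one plateau height downstream of the first and both have similar heights, within a hypothetical tolerance `tol`. The function names and parameters are illustrative, not LtrDetector's interface, and the tool's actual scoring and pairing rules may differ.

```python
from bisect import bisect_left
from collections import defaultdict


def kmer_distance_scores(seq: str, k: int, max_dist: int) -> list[int]:
    """Step (a), assumed form: score each position by the distance to the nearest
    other exact copy of the k-mer starting there; 0 if none lies within max_dist."""
    seq = seq.upper()
    n_kmers = len(seq) - k + 1
    # Hash table keyed by k-mer; positions are appended in increasing order.
    index = defaultdict(list)
    for i in range(n_kmers):
        index[seq[i:i + k]].append(i)

    scores = [0] * len(seq)
    for i in range(n_kmers):
        positions = index[seq[i:i + k]]
        j = bisect_left(positions, i)  # positions[j] == i
        # The nearest other copy is an immediate neighbour in the sorted list.
        dists = [abs(positions[nb] - i) for nb in (j - 1, j + 1) if 0 <= nb < len(positions)]
        if dists and min(dists) <= max_dist:
            scores[i] = min(dists)
    return scores


def pair_plateaus(plateaus: list[tuple[int, int, int]], tol: int) -> list[tuple[int, int, int, int]]:
    """Step (d), simplified: pair plateau A = (start, end, height) with a plateau B that
    starts about `height` bases after A and has a similar height. Returns
    (5' LTR start, 5' LTR end, 3' LTR start, 3' LTR end) with no boundary adjustment."""
    pairs = []
    used = set()
    for a_idx, (a_start, a_end, a_h) in enumerate(plateaus):
        if a_idx in used:
            continue
        for b_idx, (b_start, b_end, b_h) in enumerate(plateaus):
            if b_idx == a_idx or b_idx in used:
                continue
            if abs(b_start - (a_start + a_h)) <= tol and abs(b_h - a_h) <= tol:
                pairs.append((a_start, a_end, b_start, b_end))
                used.update((a_idx, b_idx))
                break
    return pairs


print(kmer_distance_scores("ACGTTTTTACGT", k=4, max_dist=10))
# [8, 0, 0, 1, 1, 0, 0, 0, 8, 0, 0, 0]
print(pair_plateaus([(100, 400, 5000), (5100, 5400, 5000), (9000, 9100, 300)], tol=50))
# [(100, 400, 5100, 5400)]
```

The toy call uses k = 4 only to keep the output readable; Fig. 3 indicates that much larger values (around 11 or 12) work better on real genomes. With unsigned nearest-copy distances, both LTRs of an element separated by distance d produce a plateau of height about d, which is why the pairing rule compares heights as well as positions.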
Fig. 2
(a) Contiguous stretches of the same non-zero score are identified and marked as keep (K) or delete (D). (b) The forward pass merges K sections toward each other and adjacent D sections. (c) The backward pass merges remaining D sections that are close to K sections.
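Panel (a), plus a simplified stand-in for the merging passes, can be sketched in Python. The caption does not state the keep/delete criterion, so this sketch assumes a hypothetical minimum run length `min_keep_len`. It also replaces the separate forward and backward passes with a single pass that joins K runs separated by at most a hypothetical `max_gap` positions; D runs that fall inside a joined region are absorbed, and all other D runs are dropped. It illustrates the idea only and is not LtrDetector's procedure.

```python
from typing import NamedTuple


class Run(NamedTuple):
    start: int   # inclusive
    end: int     # exclusive
    score: int
    keep: bool   # True = K, False = D


def find_runs(scores: list[int], min_keep_len: int) -> list[Run]:
    """Identify contiguous stretches of the same non-zero score and mark each as K
    (length >= min_keep_len, an assumed criterion) or D."""
    runs = []
    i, n = 0, len(scores)
    while i < n:
        if scores[i] == 0:
            i += 1
            continue
        j = i
        while j < n and scores[j] == scores[i]:
            j += 1
        runs.append(Run(i, j, scores[i], (j - i) >= min_keep_len))
        i = j
    return runs


def merge_keep_runs(runs: list[Run], max_gap: int) -> list[tuple[int, int]]:
    """Single-pass simplification: join K runs whose gap is at most max_gap into
    (start, end) intervals; D runs between joined K runs are absorbed."""
    merged: list[tuple[int, int]] = []
    for r in runs:
        if not r.keep:
            continue
        if merged and r.start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], r.end)
        else:
            merged.append((r.start, r.end))
    return merged


scores = [0, 5, 5, 5, 5, 0, 7, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 9, 9, 9, 9]
runs = find_runs(scores, min_keep_len=3)
print([("K" if r.keep else "D", r.start, r.end) for r in runs])
print(merge_keep_runs(runs, max_gap=2))  # [(1, 11), (18, 22)]
```

In the toy input, the short spike of 7 at position 6 is marked D but ends up inside the first merged region because it lies between two nearby K runs.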
Fig. 3
The effect of k, the length of the short words used as keys in the hash table, on the F1 measure (higher is better). As k increases from 9 to 11 or 12, F1 increases; performance does not change markedly beyond that. (a) A. thaliana; (b) O. sativa.
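A back-of-the-envelope calculation suggests why very short words can hurt. Assume a uniformly random sequence, which is a simplification because real genomes, especially repeat-rich ones, are far from random. In a genome of length G, each of the other G - k positions then matches a given k-mer with probability 4^-k, so the expected number of chance copies is (G - k) / 4^k. Each increase of k by one cuts this number by a factor of four, which means fewer spurious nearest copies distort the scores. The genome length below is hypothetical, and this argument alone does not explain where the curve levels off.

```python
def expected_chance_copies(genome_len: int, k: int) -> float:
    """Expected number of other positions matching a fixed k-mer by chance,
    assuming every base is independent and uniform over A, C, G, T."""
    return (genome_len - k) / 4 ** k


G = 100_000_000  # hypothetical 100 Mb genome, not a value from the study
for k in range(9, 14):
    print(k, round(expected_chance_copies(G, k), 2))
```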
