PeerJ. 2017 Jan 26;5:e2942.
doi: 10.7717/peerj.2942. eCollection 2017.

RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing

Jinfeng Chen et al. PeerJ. 2017.

Abstract

Background: Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs on gene regulation and on generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation are underappreciated. Inexpensive, deep sequencing has made it affordable to apply population genetic methods to whole genomes, using tools that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites, can be challenging because the repetitive nature of TE sequences hampers both the sensitivity and the specificity of analysis tools.

Methods: We have developed RelocaTE2, a tool for identifying TE insertion sites with high sensitivity and specificity. RelocaTE2 searches whole genome sequencing reads from second-generation sequencing platforms, such as Illumina, for known TE sequences. Reads that match a TE are used as seeds to pinpoint the chromosomal locations where TEs have transposed. Because RelocaTE2 detects the target site duplications (TSDs) created by TE insertions, it can report TE polymorphism loci with single-base-pair precision.
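The seed-and-TSD idea can be illustrated with a minimal Python sketch. It is not RelocaTE2's code: it assumes the TE terminal sequence occurs as an exact substring of a junction read (real tools tolerate mismatches and rely on aligners), and the names `flank_from_junction_read`, `Junction`, and `infer_tsd`, as well as the short TE termini used in the demo, are hypothetical. The key observation is that a TE inserted at a k-bp target site duplicates those k bases, so the genomic flanks seen on the two sides of the TE overlap by exactly the TSD.

```python
from dataclasses import dataclass
from typing import Optional


def flank_from_junction_read(read: str, te_terminus: str, te_on_right: bool) -> Optional[str]:
    """Return the genomic part of a read that spans a TE/genome junction.

    Assumption: the TE terminus occurs as an exact substring of the read.
    te_on_right=True:  read is <genomic flank><TE 5' terminus ...>
    te_on_right=False: read is <... TE 3' terminus><genomic flank>
    """
    idx = read.find(te_terminus)
    if idx == -1:
        return None  # not a junction read for this TE end
    flank = read[:idx] if te_on_right else read[idx + len(te_terminus):]
    return flank or None  # an empty flank carries no position information


@dataclass
class Junction:
    """Hypothetical summary of flank alignments at one candidate insertion."""
    chrom: str
    left_end: int     # 0-based exclusive end of the flank upstream of the TE
    right_start: int  # 0-based start of the flank downstream of the TE


def infer_tsd(reference: str, j: Junction) -> Optional[str]:
    """Infer the target site duplication from overlapping flank coordinates.

    Inserting a TE at a k-bp target site reference[p:p+k] gives
    reference[:p+k] + TE + reference[p:], so the upstream flank ends at p+k,
    the downstream flank starts at p, and their overlap is the TSD.
    """
    if j.right_start > j.left_end:
        return None  # flanks do not meet (gap, deletion, or too few reads)
    return reference[j.right_start:j.left_end]  # "" means no TSD


if __name__ == "__main__":
    ref = "GGGGAAAACCGGTTTTCCCC"
    up = flank_from_junction_read("AAAACCGGTGTTAA", "TGTT", te_on_right=True)
    down = flank_from_junction_read("TTAACACCGGTTTT", "AACA", te_on_right=False)
    # Exact-match "mapping" with str.find stands in for a real short-read aligner.
    j = Junction("chr1", ref.find(up) + len(up), ref.find(down))
    print(j, infer_tsd(ref, j))
```

In the demo, the upstream flank `AAAACCGG` lands at `ref[4:12]` and the downstream flank `CCGGTTTT` at `ref[8:16]`, so the flanks overlap at `ref[8:12]` and the inferred TSD is `CCGG`. Reporting the overlap rather than a single breakpoint is what allows base-pair-level placement of the insertion.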

Results and discussion: The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates high sensitivity and specificity, particularly when sequence coverage is not shallow. Compared with the other tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity, and it performs best at predicting the TSDs of TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 reports up to 95% of simulated TE insertions with a false positive rate below 0.1% using 10-fold genome coverage resequencing data. RelocaTE2 provides a robust solution for identifying TE insertion sites and can be incorporated into analysis workflows to help describe the complete genotype from low-coverage genome sequencing.

Keywords: Annotation; Bioinformatics; Diversity; Parallel processing; Population genomics; Resequencing; Rice; Short read; Transposons.


Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Workflow for identification of transposable element insertions in population resequencing data using Illumina paired-end reads.
Figure 2. Performance of RelocaTE2, RelocaTE, TEMP and ITIS on simulated rice data.
Comparison of tool performance on rice chromosome 3 (OsChr3) for sensitivity (A), specificity (B), and recall rate of target site duplications (TSDs) (C), and on rice chromosome 4 (OsChr4) for sensitivity (D), specificity (E), and TSD recall rate (F). Three replicate simulations of 200 random transposable element (TE) insertions were generated for OsChr3 and OsChr4. From each simulated TE dataset, a series of datasets was constructed by sampling at sequence depths from 1× to 40×. Sensitivity (SN), specificity (SP), and TSD recall of each tool were estimated on each simulated dataset at each sequence depth. Error bars show the standard deviation among the three replicates, each of which contained a different set of 200 random TE insertions. SN was defined as the percentage of the 200 simulated TE insertions that were recalled within 100 base pairs of the simulated insertion site. SP was defined as the percentage of all calls that fell within 100 base pairs of one of the 200 simulated TE insertions. TSD recall rate was defined as the percentage of true positive calls whose TSD correctly matched the simulated TSD.
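The SN, SP, and TSD recall definitions in this caption can be expressed as a short Python sketch. The `Insertion` record, the `evaluate` function, and the rule of pairing each call with the nearest simulated insertion on the same chromosome are assumptions made for illustration; the caption does not state how multiple nearby calls or ties were handled. Note that SP as defined here is the fraction of calls that are correct, a quantity often called precision or positive predictive value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Insertion:
    """Hypothetical record for a simulated or called TE insertion."""
    chrom: str
    pos: int  # insertion coordinate on the chromosome
    tsd: str  # target site duplication sequence ("" if none reported)


def nearest(query: Insertion, pool: List[Insertion], window: int) -> Optional[Insertion]:
    """Closest member of `pool` on the same chromosome within `window` bp, or None."""
    hits = [p for p in pool if p.chrom == query.chrom and abs(p.pos - query.pos) <= window]
    return min(hits, key=lambda p: abs(p.pos - query.pos), default=None)


def percent(numerator: int, denominator: int) -> float:
    """Percentage, defined as 0.0 when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


def evaluate(simulated: List[Insertion], calls: List[Insertion],
             window: int = 100) -> Tuple[float, float, float]:
    """Return (SN, SP, TSD recall) as percentages, following the caption's definitions."""
    # SN: simulated insertions with at least one call within `window` bp.
    recalled = sum(nearest(s, calls, window) is not None for s in simulated)
    # True positives: calls within `window` bp of a simulated insertion,
    # each paired with the nearest simulated insertion (an assumption).
    pairs = [(c, nearest(c, simulated, window)) for c in calls]
    true_pos = [(c, s) for c, s in pairs if s is not None]
    # TSD recall: true positives whose TSD equals the simulated TSD.
    tsd_correct = sum(c.tsd == s.tsd for c, s in true_pos)
    return (percent(recalled, len(simulated)),
            percent(len(true_pos), len(calls)),
            percent(tsd_correct, len(true_pos)))
```

Calling `evaluate` once per replicate and per sampled depth, then summarizing across the three replicates, would yield the kind of per-depth values and error bars plotted in each panel.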
Figure 3. Performance of RelocaTE2 and TEMP on biological datasets: the HuRef genome, the IR64 genome, and 50 rice and wild rice strains.
(A) Venn diagram of the overlap between non-reference TE insertions identified by RelocaTE2 and TEMP in the HuRef genome and in the rice IR64 genome. Sensitivity (SN) and specificity (SP) were assessed by comparing the assembled HuRef genome with the GRCh36 reference genome and the assembled IR64 genome with the MSU7 reference genome. SN was defined as the percentage of a tool's validated calls out of all validated calls made by either RelocaTE2 or TEMP. SP was defined as the percentage of each tool's calls that were validated. (B) Comparison of the number of non-reference TE insertions from 14 TE families identified by RelocaTE2 and TEMP in 50 rice and wild rice strains. Strains are color-coded by subpopulation classification.
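Unlike the simulation in Figure 2, the biological datasets have no complete truth set, so SN here is measured against the pooled validated calls of both tools. A minimal Python sketch of these two definitions follows; it assumes calls have already been matched across tools and given shared identifiers (a hypothetical preprocessing step, for example by genomic position), and the function name `cross_tool_metrics` is illustrative only.

```python
from typing import Dict, Set, Tuple


def cross_tool_metrics(calls: Dict[str, Set[str]],
                       validated: Set[str]) -> Dict[str, Tuple[float, float]]:
    """Per-tool (SN, SP) percentages under the Figure 3 definitions.

    calls:     tool name -> identifiers of that tool's insertion calls
               (assumes the same insertion has the same identifier in every tool)
    validated: identifiers confirmed by the genome-assembly comparison
    """
    # Denominator for SN: every validated call made by any tool.
    pooled_validated = set().union(*calls.values()) & validated
    metrics = {}
    for tool, tool_calls in calls.items():
        correct = tool_calls & validated
        sn = 100.0 * len(correct) / len(pooled_validated) if pooled_validated else 0.0
        sp = 100.0 * len(correct) / len(tool_calls) if tool_calls else 0.0
        metrics[tool] = (sn, sp)
    return metrics
```

Because SN shares a pooled denominator, it measures each tool's share of all validated insertions found by either tool; true insertions missed by both tools do not enter the calculation.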
