Genome Biol. 2017 May 12;18(1):91.
doi: 10.1186/s13059-017-1232-0.

EpiTEome: Simultaneous detection of transposable element insertion sites and their DNA methylation levels


Josquin Daron et al. Genome Biol.

Abstract

The genome-wide investigation of DNA methylation levels has been limited to reference transposable element positions. The methylation analysis of non-reference and mobile transposable elements has only recently been performed, but required both genome resequencing and MethylC-seq datasets. We have created epiTEome, a program that detects both new transposable element insertion sites and their methylation states from a single MethylC-seq dataset. EpiTEome outperforms other split-read insertion site detection programs, even while functioning on bisulfite-converted reads. EpiTEome characterizes the previously discarded fraction of DNA methylation at sites of new insertions, enabling future investigation into the epigenetic regulation of non-reference and transposed elements.

Keywords: Bioinformatics; Bisulfite; Insertion site; MethylC-seq; Methylome; Split reads; Transposable elements.

Figures

Fig. 1
Design of epiTEome function. a Workflow of methodology developed to identify non-reference insertions of TEs using filtered MethylC-seq reads that fail to align to the reference genome. b Principle behind split-read detection of new TE insertion sites. Reads that fail to fully map to the reference genome are used to identify the sites of new TE insertion. Non-mapping reads are split and mapped to the reference genome to identify reads with one end that maps to a TE and the other end to the site of insertion. c Example of a new TE insertion detected by epiTEome in Arabidopsis: ddm1 mutants undergo TE transcriptional reactivation and transposition [30]. Split reads not present in wild-type (wt Col-0) identify a TE insertion into the gene At2g34840 in two biological replicates of ddm1 MethylC-seq (RepA and RepB). The 5′ and 3′ flanking split reads overlap (dashed lines) at the target site duplication (gold sequence) generated by TE insertion. d In addition to identifying new TE insertion sites, epiTEome detects the cytosine methylation status at these loci. Sequence alignments of split MethylC-seq reads at the insertion site are used to determine the cytosine DNA methylation status. Unconverted cytosines represent methylated bases, while C → T transitions (bold) in the MethylC-seq reads represent unmethylated cytosines. The sequence context of each cytosine is displayed (CG = red, CHG = blue, CHH = green)
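The methylation-calling rule described in panel d can be illustrated with a minimal sketch. This is not epiTEome's code: the function names `cytosine_context` and `call_methylation` are hypothetical, and the sketch assumes an ungapped, forward-strand alignment in which the read and the reference segment have the same length.

```python
# Hypothetical helper names; a sketch of the rule in panel d, not epiTEome itself.
# Assumption: read and reference are aligned base-for-base on the forward strand.

H_BASES = {"A", "C", "T"}  # IUPAC "H" means any base except G


def cytosine_context(ref, i):
    """Return 'CG', 'CHG', or 'CHH' for the cytosine at ref[i], else None."""
    if ref[i] != "C" or i + 1 >= len(ref):
        return None
    if ref[i + 1] == "G":
        return "CG"
    if ref[i + 1] not in H_BASES or i + 2 >= len(ref):
        return None  # ambiguous base or not enough downstream sequence
    if ref[i + 2] == "G":
        return "CHG"
    if ref[i + 2] in H_BASES:
        return "CHH"
    return None


def call_methylation(ref, read):
    """List (position, context, is_methylated) for each callable reference cytosine."""
    if len(ref) != len(read):
        raise ValueError("sketch assumes an ungapped alignment of equal length")
    calls = []
    for i, (ref_base, read_base) in enumerate(zip(ref, read)):
        context = cytosine_context(ref, i)
        if context is None:
            continue
        if read_base == "C":      # unconverted cytosine: methylated
            calls.append((i, context, True))
        elif read_base == "T":    # C -> T transition: unmethylated
            calls.append((i, context, False))
        # any other read base is treated as a mismatch and skipped
    return calls


calls = call_methylation("ACGTCAGTCTTA", "ACGTTAGTCTTA")
```

In this toy alignment the cytosine at position 4 (CHG context) reads as T and is called unmethylated, while the cytosines at positions 1 (CG) and 8 (CHH) remain C and are called methylated. Reverse-strand reads, where unmethylated cytosines appear as G → A changes, are deliberately left out of the sketch.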
Fig. 2
Validation of epiTEome on simulated data. a Bar plot of sensitivity of detection for simulated TE insertions at three different TE insertion contexts (gene, intergenic, TE). SPLITREADER and TEPID use non-bisulfite-converted reads, while epiTEome utilizes bisulfite-converted MethylC-seq reads. b FDR of epiTEome, SPLITREADER, and TEPID calculated from the same simulated data as in (a). Error bars in (a) and (b) represent the 95% confidence interval (CI) generated using five replicates. c Analysis of how the variables of sequencing depth, read length, methylation level, and number of SNPs affect epiTEome sensitivity. Throughout the analysis in (c), epiTEome produced an average false-positive rate of 2.88%, with a standard deviation of 1.45
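For readers unfamiliar with the two metrics plotted in (a) and (b), the sketch below computes sensitivity (detected simulated insertions / all simulated insertions) and FDR (false calls / all calls) from lists of insertion coordinates. The function name, the coordinate format, and the 10 bp matching tolerance are assumptions made for illustration; the caption does not state how calls were matched to simulated sites.

```python
# Illustrative only: names, data layout, and the matching tolerance are assumptions.

def sensitivity_and_fdr(simulated, predicted, tolerance=10):
    """Compare predicted insertion sites with simulated (true) sites.

    Both arguments are lists of (chromosome, position) tuples. A prediction
    counts as a true positive if it lies within `tolerance` bp of a simulated
    site on the same chromosome that has not already been matched.
    """
    matched = set()
    true_positives = 0
    for chrom, pos in predicted:
        hit = next(
            (site for site in simulated
             if site not in matched
             and site[0] == chrom
             and abs(site[1] - pos) <= tolerance),
            None,
        )
        if hit is not None:
            matched.add(hit)
            true_positives += 1
    false_positives = len(predicted) - true_positives
    sensitivity = true_positives / len(simulated) if simulated else 0.0
    fdr = false_positives / len(predicted) if predicted else 0.0
    return sensitivity, fdr


simulated_sites = [("Chr1", 100), ("Chr1", 500), ("Chr2", 250), ("Chr3", 900)]
predicted_sites = [("Chr1", 103), ("Chr2", 250), ("Chr4", 10)]
sensitivity, fdr = sensitivity_and_fdr(simulated_sites, predicted_sites)
```

Here two of the four simulated sites are recovered (sensitivity 0.5) and one of the three calls is false (FDR of one third).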
Fig. 3
Validation of epiTEome using published MethylC-seq data. a Venn diagram comparing three independent programs created to identify TE insertion sites in the Arabidopsis ecotype Ha-0. EpiTEome is the only program that utilizes MethylC-seq data. Color codes are maintained throughout panels (b)–(d). Split reads identify the insertion and target site duplication of a TE insertion detected by all three programs (b) and a TE insertion specifically detected by epiTEome (c). The split-read analysis is confirmed by the decrease in coverage of un-split MethylC-seq reads in Ha-0 (insertion present) vs. the ecotype Rou-0 (insertion absent). d Meta-plot of MethylC-seq un-split read coverage at the TE insertion sites and flanking regions uniquely detected by each program or detected by all three. e MethylC-seq un-split read coverage z-score for each of the 175 TE insertions uniquely identified by epiTEome, plus the 16 detected by all three programs (asterisks). Seven percent of the insertion sites with high un-split read coverage (bracket) at the TE insertion site are likely false positives (FP)
Fig. 4
EpiTEome detects new TE insertions in repetitive crop genomes. a, b Bar plot of sensitivity of detection for simulated TE insertions at three different potential TE insertion contexts (gene, intergenic, TE) in the maize (a) or rice (b) genomes using in silico generated bisulfite-converted reads. Results are divided by TE copy number. c FDR of epiTEome calculated from the same simulated data as in (a). Error bars in (a)–(c) represent the 95% CI generated using five replicates. d Genome browser visualization of a non-reference (not in the reference B73 genome) LTR retrotransposon TE insertion into the PFK5 GRMZM2G127717 gene in the Oh43 inbred line identified by epiTEome
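As a rough illustration of what "in silico generated bisulfite-converted reads" involves, the sketch below converts cytosines in a read to thymines unless they are treated as methylated. It is a simplified model, not the simulation procedure used for this figure: the function name `bisulfite_convert` is hypothetical, each cytosine is methylated independently with a single probability, and only forward-strand reads are handled.

```python
import random

# Hypothetical sketch: per-cytosine independent methylation, forward strand only.

def bisulfite_convert(read, methylation_level, rng=None):
    """Return a bisulfite-converted copy of `read`.

    Each C is kept (methylated) with probability `methylation_level` and
    converted to T (unmethylated) otherwise. Other bases are unchanged.
    """
    if not 0.0 <= methylation_level <= 1.0:
        raise ValueError("methylation_level must be between 0 and 1")
    rng = rng or random.Random()
    converted = []
    for base in read:
        if base == "C" and rng.random() >= methylation_level:
            converted.append("T")  # unmethylated cytosine reads as thymine
        else:
            converted.append(base)
    return "".join(converted)


fully_unmethylated = bisulfite_convert("ACCGTC", 0.0)
fully_methylated = bisulfite_convert("ACCGTC", 1.0)
partially_methylated = bisulfite_convert("ACCGTC", 0.5, random.Random(1))
```

With a methylation level of 0 every cytosine is converted, and with a level of 1 the read is returned unchanged. Passing a seeded `random.Random` makes intermediate levels reproducible.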
Fig. 5
EpiTEome detects DNA methylation at new TE insertion sites. Average DNA methylation at new insertions in the Arabidopsis Ha-0 ecotype (middle), these same sites without TE insertion in the reference Col-0 ecotype (top), and the parental TEs in Ha-0 that produced each transposed TE (bottom). DNA methylation is split between cytosine sequence contexts (colors) and location at the insertion site (x-axis). Bar plots are shown on the left and a metaplot combining both the 5′ and 3′ TE ends and insertion sites is shown on the right. Error bars (left) and transparent colors (right) represent the 95% CI. N/A not applicable
