Mob Genet Elements. 2012 Jan 1;2(1):51-54.
doi: 10.4161/mge.19479.

A proposal for the reference-based annotation of de novo transposable element insertions

Casey M Bergman. Mob Genet Elements.

Abstract

Understanding the causes and consequences of transposable element (TE) activity in the genomic era requires sophisticated bioinformatics approaches to accurately identify individual insertion sites. Next-generation sequencing technology now makes it possible to rapidly identify new TE insertions using resequencing data, opening up new possibilities to study the nature of TE-induced mutation and the target site preferences of different TE families. While the identification of new TE insertion sites is seemingly a simple task, the mechanisms of transposition present unique challenges for the annotation of de novo transposable element insertions mapped to a reference genome. Here I discuss these challenges and propose a framework for the annotation of de novo TE insertions that accommodates known mechanisms of TE insertion and established coordinate systems for genome annotation.

Figures

Figure 1. Genome coordinate systems and the annotation of TE insertions. The location of an arbitrary genomic feature encoded by the sequence GGGCCC is represented differently in base and interbase coordinate systems (A). Because de novo TE insertions occur between bases in the reference genome, they are more naturally represented in interbase coordinate systems. In the widely used base coordinate system, mapping a de novo TE insertion requires an arbitrary rule that assigns it to the base either before or after the insertion site (B). Such arbitrary rules can lead to ambiguity in the mapping and interpretation of de novo TE insertions.
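A minimal Python sketch of the two coordinate systems in Figure 1 may make the contrast concrete. It assumes the common convention that base coordinates are 1-based and fully closed while interbase coordinates are 0-based and half-open (the convention used by BED files); the toy reference `REF` and the helper names are hypothetical and not taken from the paper.

```python
from __future__ import annotations

# Hypothetical toy reference (not from the paper); 1-based bases 1..10.
REF = "AAGGGCCCTT"


def base_to_interbase(start: int, end: int) -> tuple[int, int]:
    """1-based fully closed [start, end] -> 0-based half-open [start, end)."""
    return start - 1, end


def interbase_to_base(start: int, end: int) -> tuple[int, int]:
    """0-based half-open [start, end) -> 1-based fully closed [start, end]."""
    return start + 1, end


def insertion_to_base(gap: int, rule: str) -> int:
    """Express an insertion at interbase position `gap` as a single base coordinate.

    Interbase position `gap` lies between 1-based bases `gap` and `gap + 1`,
    so the base system needs an arbitrary rule to pick one of them.
    """
    if rule == "before":  # report the last reference base before the insertion
        return gap
    if rule == "after":  # report the first reference base after the insertion
        return gap + 1
    raise ValueError(f"unknown rule: {rule!r}")


# The feature GGGCCC: interbase [2, 8), i.e. 1-based bases 3-8.
start0 = REF.find("GGGCCC")
feature = (start0, start0 + len("GGGCCC"))
print("interbase:", feature, "base:", interbase_to_base(*feature))

# An insertion between the G at base 5 and the C at base 6 is the zero-length
# interbase interval [5, 5); the base system has to choose 5 or 6.
gap = 5
print("insertion interbase:", (gap, gap),
      "base (before):", insertion_to_base(gap, "before"),
      "base (after):", insertion_to_base(gap, "after"))
```

In interbase coordinates the insertion is a single unambiguous point, whereas the base-coordinate answer depends entirely on which rule is chosen.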
Figure 2. Target site duplications (TSDs) create ambiguity in the annotation of de novo TE insertion sites. Unique DNA in the reference genome (e.g., positions 3–7 for a 5 bp TSD) is duplicated upon insertion of a TE, for insertions on both the positive strand (> > >) and the negative strand (< < <). When NGS reads (solid gray arrows) that span the junction between the TE and its flanking region are used to map de novo TE insertions on the positive strand, the placement of the insertion relative to the TSD differs for reads from the 5′ end (after the TSD) and the 3′ end (before the TSD) of the TE. Differential annotation of TE insertion sites is also observed for negative-strand insertions, but placement relative to the TSD is the reverse of that for positive-strand insertions. These TSD-induced effects can lead to ambiguity in the mapping and interpretation of de novo TE insertions.
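The TSD effect in Figure 2 can be reproduced with a toy example. The sketch below is illustrative only: the reference `REF`, the TE sequence `TE`, the helper names, and the assumption of exact, unambiguous junction alignments are all hypothetical, and the code does not implement the annotation framework proposed in the paper. It builds the post-insertion sequence for each strand and reports the interbase position at which a read spanning the TE's 5′ or 3′ end would place the insertion.

```python
from __future__ import annotations

# Hypothetical toy data (not from the paper).
REF = "ACGTACCGAT"           # unique reference; 1-based bases 1..10
TE = "TTTTTGGGGA"            # toy TE, written 5'->3' on its own strand
TSD_START, TSD_END = 2, 7    # interbase span of the 5 bp target site (1-based 3-7)

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMP)[::-1]


def insertion_allele(strand: str) -> str:
    """Post-insertion sequence: the target site appears on both sides of the TE."""
    te = TE if strand == "+" else revcomp(TE)
    return REF[:TSD_END] + te + REF[TSD_START:]


def junction_calls(allele: str, strand: str) -> dict[str, int]:
    """Interbase position at which a read over each TE end places the insertion.

    A left-junction read aligns its flank to a prefix of REF, so it places the
    insertion where that prefix ends; a right-junction read aligns its flank to
    a suffix of REF, so it places the insertion where that suffix begins.
    """
    te = TE if strand == "+" else revcomp(TE)
    i = allele.find(te)
    if i < 0:
        raise ValueError("TE not found in allele")
    left_flank, right_flank = allele[:i], allele[i + len(te):]
    assert REF.startswith(left_flank) and REF.endswith(right_flank)
    left_call = len(left_flank)               # end of left flank on REF
    right_call = len(REF) - len(right_flank)  # start of right flank on REF
    # On the + strand the TE's 5' end faces left; on the - strand it faces right.
    if strand == "+":
        return {"5prime": left_call, "3prime": right_call}
    return {"5prime": right_call, "3prime": left_call}


for strand in "+-":
    calls = junction_calls(insertion_allele(strand), strand)
    lo, hi = sorted(calls.values())
    print(strand, calls, "TSD:", REF[lo:hi], f"interbase [{lo}, {hi})")
# + {'5prime': 7, '3prime': 2} TSD: GTACC interbase [2, 7)
# - {'5prime': 2, '3prime': 7} TSD: GTACC interbase [2, 7)
```

Under these assumptions the two junction calls differ by exactly the TSD length, and which TE end lands after the TSD flips between strands, as the caption describes. Real aligners may shift these calls when the TE ends share bases with the flanking sequence; the toy sequences were chosen to avoid that case.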

