Nucleic Acids Res. 2013 Mar 1;41(5):3190-200.
doi: 10.1093/nar/gkt011. Epub 2013 Jan 21.

Turning gold into 'junk': transposable elements utilize central proteins of cellular networks

György Abrusán et al. Nucleic Acids Res.

Abstract

The many known cases of domesticated transposable element (TE) proteins have led to the recognition that TEs are a significant source of evolutionary innovation. However, much less is known about the reverse process: whether and to what degree the evolution of TEs is influenced by the genome of their hosts. We addressed this issue by searching for cases of incorporation of host genes into the sequence of TEs and examined the systems-level properties of these genes using the Saccharomyces cerevisiae and Drosophila melanogaster genomes. We identified 51 cases where the evolutionary scenario was the incorporation of a host gene fragment into a TE consensus sequence, and we show that both the yeast and fly homologues of the incorporated protein sequences have central positions in the cellular networks. An analysis of selective pressure (Ka/Ks ratio) detected significant selection in 37% of the cases. Recent research on retrovirus-host interactions shows that virus proteins preferentially target hubs of the host interaction networks, enabling them to take over the host cell using only a few proteins. We propose that TEs face a similar evolutionary pressure to evolve proteins with high interacting capacities and take some of the necessary protein domains directly from their hosts.


Figures

Figure 1.
(A) Examples of TEs with significant homology to yeast and Drosophila genes. Black bars indicate TE ORFs, gray bars indicate the location of the homologous yeast/Drosophila genes. Wherever present, we used the ORF coordinates provided by RepBase; for the repeats where RepBase provides no gene prediction, we predicted the ORFs with Glimmer3. (B) The workflow used to determine whether the homology between a TE and a gene is a result of domestication or the incorporation of a host protein into a TE.
Figure 2.
Evolution of Arc proteins. (A) Examples of Arc genes in vertebrates (human, mouse, chicken) and invertebrates (Drosophila melanogaster, Stomoxys calcitrans, Drosophila silvestris). Black bars indicate the location of the regions that are homologous both to the domesticated Gypsy transposon and other Arc genes and contain a Retrotrans_gag conserved domain (pfam accession PF03732). (B) A maximum likelihood tree of the homologous regions of the Arc proteins and several Gypsy retrotransposons. Although the bootstrap support (1000 replications) is low for many branches, the presence of two Gypsy families between the Arc genes and the absence of Arc proteins in deuterostomes other than tetrapods and protostomes other than insects indicate that Gypsy gag proteins were domesticated twice independently.
Figure 3.
Characteristics of the genes that were incorporated into TEs. We performed Monte Carlo simulations to test whether fitness and network characteristics such as degree (the number of interactions with other nodes), betweenness centrality (the fraction of shortest paths that pass through a node) and closeness centrality (the inverse of the mean distance between node v and all other nodes reachable from it) differ significantly between the incorporated genes and the random expectation. (A) Distribution of the median degree (PPIs) in 100 000 random samples; the arrow represents the median of the 35 Drosophila genes for which PPI information was available. (B) Distribution of the median betweenness centrality in 100 000 random samples, and the observed level in Drosophila. (C) Statistical summary of Monte Carlo simulations. We did not perform tests for the fitness effect for Drosophila gene knockouts, as we are not aware of studies that provide such data at a genomic scale.
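The resampling test described in the Figure 3 caption can be illustrated with a short sketch. This is not the authors' code: the function name `monte_carlo_median_test`, sampling without replacement, the one-sided comparison, and the +1 correction on the p-value are all assumptions made for illustration. The idea is to draw many random gene sets of the same size as the observed set from a genome-wide list of values (for example, PPI degree) and to count how often the random median reaches the observed median.

```python
import random
import statistics


def monte_carlo_median_test(all_values, observed_values, n_samples=100_000, seed=0):
    """Estimate how unusual the observed median is under random sampling.

    all_values      -- network statistic (e.g. degree) for every gene in the background
    observed_values -- the same statistic for the genes of interest
    Returns (observed_median, one_sided_p_value).
    """
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    k = len(observed_values)           # random sets match the observed set size
    observed = statistics.median(observed_values)

    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(all_values, k)   # draw k genes without replacement
        if statistics.median(sample) >= observed:
            hits += 1

    # The +1 correction (an assumption here) keeps the estimate above zero.
    p_value = (hits + 1) / (n_samples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy background: hypothetical degrees 1..100, not data from the paper.
    background = list(range(1, 101))
    focal = [91, 93, 95, 97, 99]
    print(monte_carlo_median_test(background, focal, n_samples=2000))
```

A small p-value in this sketch means that random gene sets rarely have a median as high as the focal set, which is the sense in which the caption's arrow lies in the tail of the simulated distribution.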
Figure 4.
(A) The network of all PPIs of Drosophila genes with homology to a TE, for which PPI data were available in FlyBase (35 genes, highlighted in black). The median number of PPIs is 20. (B) An example of a PPI network for a randomly selected set of Drosophila genes (also 35 genes, highlighted in black). The median number of PPIs is 9, corresponding to the average of the random samples (see Figure 3).
Figure 5.
Examples of chimeric TE protein structures with an incorporated fragment of a host gene. The structures were predicted with I-TASSER; their function and catalytic centers were predicted with COFACTOR. In all cases, the estimated TM-score with the true topology is >0.5; thus, the models have an essentially correct topology. Alpha helices are highlighted in blue and beta sheets in yellow; green regions indicate the fraction of the sequence that is homologous to a non-TE protein, and red highlights the catalytic core predicted by COFACTOR. (A) DNA8-8B_Mad from the apple genome; the estimated TM-score with the correct fold is 0.53. The incorporated fragment shows 80% sequence similarity to a short chain dehydrogenase (B9RTW7) with oxidoreductase activity (GO:0016491). (B) Helitron_N3_ZM from maize; the estimated TM-score is 0.64. The incorporated fragment shows 90% amino acid sequence similarity to maize fibrillarin (B6T4G7). The predicted highest scoring gene ontology terms for the molecular function of the protein are methyltransferase activity (GO:0008168) and RNA binding (GO:0003723). (C) MARINER2_DM transposon from Drosophila; the estimated TM-score is 0.74. The incorporated fragment is only 30% similar to the Drosophila gene CG18367-PA. The highest scoring molecular function GO term is DNA binding (GO:0003677).
