Mob DNA. 2020 May 6;11:17.
doi: 10.1186/s13100-020-00214-y. eCollection 2020.

Identification of RAG-like transposons in protostomes suggests their ancient bilaterian origin

Eliza C Martin et al. Mob DNA. 2020.

Abstract

Background: V(D)J recombination is essential for adaptive immunity in jawed vertebrates and is initiated by the RAG1-RAG2 endonuclease. The RAG1 and RAG2 genes are thought to have evolved from a RAGL (RAG-like) transposon containing convergently oriented RAG1-like (RAG1L) and RAG2-like (RAG2L) genes. Elements resembling this presumptive evolutionary precursor have thus far been detected convincingly only in deuterostomes, leading to the model that the RAGL transposon first appeared in an early deuterostome.

Results: We have identified numerous RAGL transposons in the genomes of protostomes, including oysters and mussels (phylum Mollusca) and a ribbon worm (phylum Nemertea), and in the genomes of several cnidarians. Phylogenetic analyses are consistent with vertical evolution of RAGL transposons within the Bilateria clade and with its presence in the bilaterian ancestor. Many of the RAGL transposons identified in protostomes are intact elements containing convergently oriented RAG1L and RAG2L genes flanked by terminal inverted repeats (TIRs) and target site duplications with striking similarities with the corresponding elements in deuterostomes. In addition, protostome genomes contain numerous intact RAG1L-RAG2L adjacent gene pairs that lack detectable flanking TIRs. Domains and critical active site and structural amino acids needed for endonuclease and transposase activity are present and conserved in many of the predicted RAG1L and RAG2L proteins encoded in protostome genomes.

Conclusions: Active RAGL transposons were present in multiple protostome lineages and many were likely transmitted vertically during protostome evolution. It appears that RAGL transposons were broadly active during bilaterian evolution, undergoing multiple duplication and loss/fossilization events, with the RAGL genes that persist in present day protostomes perhaps constituting both active RAGL transposons and domesticated RAGL genes. Our findings raise the possibility that the RAGL transposon arose earlier in evolution than previously thought, either in an early bilaterian or prior to the divergence of bilaterians and non-bilaterians, and alter our understanding of the evolutionary history of this important group of transposons.

Keywords: Adaptive immunity; Evolution; RAG; Recombination activating genes; Transposon; Transposon molecular domestication.


Conflict of interest statement

Competing interests: The authors declare that they have no competing interests.

Figures

Fig. 1
Genomic organization and phylogenetic tree of RAG and RAGL transposons. a Genomic organization of the mouse RAG locus and the amphioxus RAGL (ProtoRAG) transposon. The legend for panels (a), (b), and (c) is provided at the bottom of panel (c). b Tree depicting phyla within which RAG loci or RAGL or Transib transposons have been identified. Blue, orange and green shading indicate Deuterostomia, Protostomia and Cnidaria, respectively. Branches lacking evidence for RAG/RAGL sequences were omitted. With the exception of Cnidaria, all phyla with identified RAGL sequences contain at least one complete RAGL transposon with the configuration TSD-TIR5′-RAG1L-RAG2L-TIR3′-TSD. Prior to this study, potentially active RAG1-RAG2 gene pairs and RAGL transposons had been reported only in deuterostomes [9], while outside this clade, only a single deteriorated RAG1L-RAG2L locus (in C. gigas) had been reported [10]. c Genomic organization of the most complete RAGL copies detected in Mollusca, Nemertea, and Cnidaria. Predicted RAG1L/RAG2L coding regions, TIRs, and TSDs are depicted, using symbols explained in the legend at bottom. Supporting transcriptomic data are indicated along with the corresponding TSA entry (Additional file 8: File S1 and Additional file 9: S2). Green and gray arrows indicate transcripts corresponding to coding and untranslated regions, respectively. Unmapped regions of transcripts are shown unfilled, while the black star in GFRY01002319.1 indicates a frameshift caused by an 8 bp deletion.
Fig. 2
TIRs, TSDs, and phylogenetic tree. a TIRs and TSDs of protostome and deuterostome RAGL transposons. Sequences of 5’-TIR and 3’-TIR (reverse complement) pairs are aligned with Transib TIRs and the consensus RSS heptamer. The protostome consensus TIR, shown at top, was generated using only nonredundant TIR sequences (asterisk). Most intact TIR sequence pairs detected in protostomes are flanked by 5 bp TSDs, displayed at right, with TIRs indicated by black triangles and identities indicated with dark gray shading. b Phylogenetic trees of RAG1 and RAG1L. Phylogenetic trees were built using the Maximum Likelihood method as described in Methods, using MEGA X [27] and WAG with Freqs. (+) correction model [28]. The bootstrap numbers next to branches are the percentage of replicate trees in which the associated proteins clustered together. Branches with a bootstrap value below 50% were collapsed together. Branch lengths represent the number of substitutions per site, with positions with gaps or missing data being ignored. The protein sequences used for these analyses were chosen as representative of the monophyletic group to which they belong. Bold indicates proteins from protostomes.
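The comparison described in panel a of Fig. 2 (a 5' TIR against the reverse complement of the 3' TIR, plus matching 5 bp flanks) can be illustrated with a minimal Python sketch. The function names and the flank sequences below are hypothetical and are not taken from the paper; the only real sequence used is the canonical RSS heptamer CACAGTG.

```python
from typing import Optional

# Translation table for DNA complementation (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_tsd(left_flank: str, right_flank: str, length: int = 5) -> Optional[str]:
    """Return the target site duplication if the `length` bases just before
    the 5' TIR match the `length` bases just after the 3' TIR, else None.

    `left_flank` ends where the 5' TIR begins; `right_flank` starts where
    the 3' TIR ends. Both are hypothetical inputs for illustration.
    """
    left = left_flank[-length:].upper()
    right = right_flank[:length].upper()
    if len(left) == length and left == right:
        return left
    return None


if __name__ == "__main__":
    # Canonical RSS heptamer read on the opposite strand.
    print(reverse_complement("CACAGTG"))          # CACTGTG
    # Made-up flanks that share a 5 bp duplication.
    print(shared_tsd("TTGACGTAC", "CGTACAAGT"))   # CGTAC
```

In practice the 3' TIR would be reverse-complemented first so that both ends read in the same orientation before being aligned.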
Fig. 3
Bilaterian evolutionary tree and RAG evolutionary history. The tree was built with species for which there is at least one WGS and/or TSA project in the NCBI database. Species in which no RAG-like sequences were found were regrouped into larger groups (Xenacoelomorpha, Platyhelminthes, Nematoda, Tardigrada, Arthropoda phyla), and the Gnathostome species were grouped as well, as RAG is domesticated in all of these species. Several species contain RAG1L-RAG2L copies with different statuses (e.g., one copy is potentially active and another is fossilized), and in such cases, the species are annotated as having more than one status within the status box. Red lines indicate branches in which RAGL transposon activity might have been present.
Fig. 4
Protein domains found in protostome RAGL proteins. a Predicted domains present in the best preserved RAG1L and RAG2L proteins identified in this study (blue font) compared with RAG and RAGL protein sequences from deuterostomes (black font) and representative Transib transposase proteins (red font). All RAG1L and RAG2L proteins shown exist in tandem pairs with the exception of M. philippinarum RAG1L and RAG2L. Black lines, RAG1(L) and RAG2(L) core regions. Domains are not depicted to scale. RAG1L from S. glomerata is intact except for a premature stop codon in the ZnH2 domain. b, c Cartoon representations of the apo mouse RAG (i.e., in the absence of DNA) and B. belcheri RAGL (with DNA removed) tetramer structures, with domains colored as indicated and darker and lighter tones used to discriminate between subunits. Boundaries between domains are indicated with residue numbers. RAG1(L) domain abbreviations used: N-terminal zinc finger motifs, C1(*), C2(*), C3(*); ring zinc finger dimerization domain, ZDD(*); nonamer binding domain, NBD(*); dimerization and DNA binding domain, DDBD(*); pre-RNase H domain, PreR(*); catalytic RNase H domain, RNH(*); zinc finger, ZnC2(*); zinc finger, ZnH2(*); C-terminal domain, CTD(*); and C-terminal tail, CTT(*), which contains either the CCGHC motif of invertebrate RAG1L or the acidic amino acids of vertebrate RAG1. RAG2(L) domain abbreviations used: 6-bladed kelch-type beta propeller domain, 6-Kelch(*); and plant homeodomain, PHD(*).
Fig. 5
Alignment of RAG1L sequences from protostomes and deuterostomes. RAG1L sequences from 3 deuterostomes (the cephalochordate amphioxus (Bbe), echinoderm purple sea urchin (Spu), and hemichordate P. flava (Pfl)), 2 mollusk RAGL_B subfamilies (eastern oyster (Cvi) and pearly oyster (Pim)), and a nemertean N. geniculatus RAGL_D family representative (Nge) were aligned to mouse (Mmu) RAG1. Domains, sequence motifs, secondary structure assignment (helices - wavy lines; beta sheet - arrows, other - straight line), protein-protein and protein-DNA contact interactions (within 5 Å) displayed above the alignment derive from the BbeRAG1L cryo-EM structure (PDB: 6B40). Acidic catalytic residues, red; active site residue mouse H795, purple; zinc coordinating residues within ZDD (*) and ZnC2 and ZnH2 (#) are indicated above the sequences. Locations at which coding sequences span exon boundaries are underlined. Amino acid color code: hydrophobic aliphatic, yellow; hydrophobic aromatic, orange; positively charged, blue; negatively charged, red; neutral polar, light blue; glycine and prolines, grey; cysteine, purple; histidine, dark purple. Sequences displayed are BbeRAG1L_B (GenBank: KJ748699.1), PflRAG1L_B (TSA:GDGM01438088.1), SpuRAG1L_B_Ech1 (Uniprot: Q45ZT6), and CviRAG1L_B_Biv1_0007, PimRAG1L_B_Biv2_3145, and NgeRAG1L_D_2322 from this study (Additional file 7: Alignment S1a).
Fig. 6
Alignment of RAG2L sequences from protostomes and deuterostomes. RAG2L sequences are aligned and displayed as in Fig. 5. Domains, beta sheet regions of each kelch-type blade, the GG motif, secondary structure (helices - wavy lines; beta sheet - arrows, other - straight line), and protein-protein interactions (5 Å threshold) displayed above the alignment derive from the BbeRAG2L cryo-EM structure (PDB: 6B40). Sequences displayed are: BbeRAG2L_B (GenBank: KJ748699.1), PflRAG2L_B (TSA:GDGM01438088.1), SpuRAG2L_B_Ech1 (Uniprot: Q45ZT5) and CviRAG2L_B_Biv1_0007, PimRAG2L_B_Biv2_5135, and NgeRAG1L_D_2322 from this study (Additional file 7: Alignment S1b). These RAG2L proteins are the transposon pairs of the RAG1L sequences displayed in Fig. 5, except that PimRAG2L_B_Biv2_5135 was used instead of PimRAG2L_B_Biv2_3145 due to merge uncertainties in the 3145 sequence; these two sequences are 98% identical on their counterpart RAG1L core. Species abbreviations as in Fig. 5.
Fig. 7
Identity and similarity matrices of the (a) RAG1L and (b) RAG2L core regions. Identity (upper right region) and similarity (lower left region) percentages were computed using the protein multiple sequence alignments shown in Additional file 7: Alignment S1a, b, spanning RAG1L from the beginning of NBD(*) to the end of CTD(*) and the RAG2L kelch-type domain, respectively, as described in Methods. Two sequences from Additional file 7: Alignment S1a (SglRAG1L_B_Biv1_1405 and NgeRAG1L_D_0727) were not included because they are incomplete in the core region interval.
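As an illustration of how a pairwise identity matrix like the one in Fig. 7 can be derived from a multiple sequence alignment, here is a minimal Python sketch. It assumes that columns where either sequence has a gap are skipped; the paper's Methods may use a different convention, and the sequence names and residues below are hypothetical toy data. A similarity matrix would additionally need a residue grouping or substitution matrix, which is not shown.

```python
from typing import Dict, Tuple


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two rows of the same alignment.

    Assumption: columns where either row has a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """All-against-all percent identity for a name -> aligned sequence map."""
    names = list(alignment)
    return {
        (p, q): percent_identity(alignment[p], alignment[q])
        for p in names
        for q in names
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not real RAG1L data.
    toy = {"seqA": "ACDE-", "seqB": "ACDFG", "seqC": "ACDEG"}
    for pair, value in identity_matrix(toy).items():
        print(pair, round(value, 1))
```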
Fig. 8
Sequence variability mapped onto the BbeRAG1L/2L cryo-EM structure (PDB: 6B40). a, b, c, d Surface representation of sequence variability of the protein-DNA and protein-protein contact interfaces of RAG1L (a, b) or RAG2L (c, d) in a lateral view of a RAG1L-RAG2L heterodimer (a, c) or a top view of the RAG1L-RAG2L tetramer (b, d). Jensen-Shannon divergence (JSD) conservation score is displayed using a rainbow color code as indicated with the scale bar, with blue and red indicating highly conserved and highly variable positions, respectively. RAG2L and RAG1L are shown in gray in (a, b) and (c, d), respectively, while TIR DNA and TIR flanking DNA are shown in black and white, respectively. e, f Alternative models for the evolutionary relationship between Transib and the RAGL transposon. In the current model [16] (e), Transib was the ancestral element and the RAGL transposon was derived from Transib through acquisition of a RAG2L gene. In the alternative model (f), the RAGL transposon was ancestral and the first Transib transposon arose from a RAGL transposon by loss of RAG2L.
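For readers unfamiliar with the Jensen-Shannon divergence score mentioned in Fig. 8, the following Python sketch computes a per-column JSD for an alignment column. It is a simplified illustration, not the authors' pipeline: it assumes a uniform background over the 20 standard amino acids, ignores gaps and nonstandard residues, and applies no sequence weighting or pseudocounts. Under these assumptions a higher value means the column departs more from the background, i.e. is more conserved.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_jsd(column: str) -> float:
    """Jensen-Shannon divergence (log base 2, range 0 to 1) between the
    residue distribution of one alignment column and a uniform background.

    Assumptions: uniform background, gaps and nonstandard residues ignored.
    """
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    q = 1.0 / len(AMINO_ACIDS)  # uniform background frequency
    jsd = 0.0
    for aa in AMINO_ACIDS:
        p = counts.get(aa, 0) / total
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        jsd += 0.5 * q * math.log2(q / m)
    return jsd


if __name__ == "__main__":
    # Hypothetical columns: fully conserved, mixed, and maximally diverse.
    for col in ("DDDDDD", "DDEEKK", AMINO_ACIDS):
        print(col, round(column_jsd(col), 3))
```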

References

    1. Flajnik MF. Re-evaluation of the immunological big bang. Curr Biol. 2014;24(21):R1060–R1065. doi: 10.1016/j.cub.2014.09.070.
    2. Litman GW, Rast JP, Fugmann SD. The origins of vertebrate adaptive immunity. Nat Rev Immunol. 2010;10(8):543–553. doi: 10.1038/nri2807.
    3. Gellert M. V(D)J recombination: RAG proteins, repair factors, and regulation. Annu Rev Biochem. 2002;71:101–132. doi: 10.1146/annurev.biochem.71.090501.150203.
    4. Schatz DG, Swanson PC. V(D)J recombination: mechanisms of initiation. Annu Rev Genet. 2011;45:167–202. doi: 10.1146/annurev-genet-110410-132552.
    5. Kim MS, Lapkouski M, Yang W, Gellert M. Crystal structure of the V(D)J recombinase RAG1-RAG2. Nature. 2015;518(7540):507–511. doi: 10.1038/nature14174.
