Nucleic Acids Res. 2022 Nov 11;50(20):11696-11711.
doi: 10.1093/nar/gkac1038.

An updated definition of V(D)J recombination signal sequences revealed by high-throughput recombination assays

Walker Hoolehan et al. Nucleic Acids Res.

Abstract

In the adaptive immune system, V(D)J recombination initiates the production of a diverse antigen receptor repertoire in developing B and T cells. Recombination activating proteins, RAG1 and RAG2 (RAG1/2), catalyze V(D)J recombination by cleaving adjacent to recombination signal sequences (RSSs) that flank antigen receptor gene segments. Previous studies defined the consensus RSS as containing conserved heptamer and nonamer sequences separated by a less conserved 12 or 23 base-pair spacer sequence. However, many RSSs deviate from the consensus sequence. Here, we developed a cell-based, massively parallel assay to evaluate V(D)J recombination activity on thousands of RSSs where the 12-RSS heptamer and adjoining spacer region contained randomized sequences. While the consensus heptamer sequence (CACAGTG) was marginally preferred, V(D)J recombination was highly active on a wide range of non-consensus sequences. Select purine/pyrimidine motifs that may accommodate heptamer unwinding in the RAG1/2 active site were generally preferred. In addition, while different coding flanks and nonamer sequences affected recombination efficiency, the relative dependency on the purine/pyrimidine motifs in the RSS heptamer remained unchanged. Our results suggest RAG1/2 specificity for RSS heptamers is primarily dictated by DNA structural features dependent on purine/pyrimidine pattern, and to a lesser extent, RAG:RSS base-specific interactions.

Figures

Figure 1.
Schematic overview of SARP-seq method. (A) Flow chart of SARP-seq protocol. (B) Diagram of episomal V(D)J recombination. The 12- and 23-RSSs are represented as red and black triangles, respectively. Half arrows represent PCR primers used to selectively amplify recombination products. The P5 adapter (light blue) and P7 adapter (dark blue) sequences are listed in Supplementary Table S1. Coupled cleavage of 12- and 23-RSSs is followed by coding end (rectangles) joining and signal end joining by NHEJ. Joining of coding and signal ends results in an inversional recombination event. (C) Diagram of partially degenerate 12-RSS in pSARP-12R4-9 input library flanked by P5 adapter sequence. The non-degenerate spacer sequence is from murine VκL8 12-RSS, shown previously to impart good recombination efficiency (17). (D) Diagram of final PCR amplicon of the recombined pSARP-12R4-9 that was subjected to NGS. (E) Electropherogram charts of 12-RSS in pSARP-12R4-9 input library and of V(D)J recombination products (recombined library), showing the reverse complement sequence of the 12-RSS.
Figure 2.
DNA sequence selectivity determined by SARP-seq. SARP-seq was replicated 3 times on two different sequencing platforms: twice on the iSeq 100 (iSeq 1 and iSeq 2) and once on the miSeq (miSeq 1). (A) Recombination frequencies of individual RSSs expressed as a percentage of total recombination events. Every RSS with reproducible V(D)J recombination activity was ranked by mean recombination frequency, so more efficacious RSSs occupied higher ranks and less efficacious RSSs occupied lower ranks. Efficacy was determined by calculating the mean recombination frequency of all three replicates (Supplementary Dataset S1). Specific RSSs are indicated by black pointers and listed from most efficacious to least efficacious: the top-ranked consensus R/Y motif (CACTATGAT), top-ranked RSS that completely lacks canonical consensus RSS base identity for heptamer positions 4–7 (CACGTCATT), median-ranked consensus R/Y motif (CACTATAGA), top-ranked anti-consensus R/Y motif (CACGTACAT), bottom-ranked canonical consensus RSS (CACAGTGGG), and median-ranked anti-consensus R/Y motif (CACGTACTT). The top 100 RSSs are magnified in the panel inset. (B) Sequence logos depicting RAG1/2 specificity expressed as the probability of finding each base in precise signal joints. Total information content for each position of the degenerate 12-RSS region was calculated and expressed in nats. (C) Bar chart showing V(D)J recombination frequency of different purine/pyrimidine sequence motifs. Recombination frequencies are expressed as log2(O/E). O is the observed frequency of recombination events, and E is the expected frequency of random, non-specific recombination. Positive values indicate positive selection, and negative values indicate negative selection. Statistically significant differences were determined by ordinary one-way ANOVA with Dunnett's multiple comparisons test (n = 3) (****P < 0.0001).
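The log2(O/E) values in panel C can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy read counts, and the choice to estimate E from input-library counts are all assumptions made for illustration. Sequences are grouped by the purine (R) / pyrimidine (Y) pattern of heptamer positions 4–7, and each group's observed frequency is compared with its expected frequency.

```python
import math
from collections import defaultdict

PURINES = frozenset("AG")


def ry_pattern(seq):
    """Map a DNA sequence to its purine (R) / pyrimidine (Y) pattern."""
    return "".join("R" if base in PURINES else "Y" for base in seq.upper())


def log2_enrichment_by_motif(output_counts, input_counts, start=3, end=7):
    """Return log2(O/E) per R/Y motif.

    O is the motif's share of reads in the recombined (output) library.
    E is the motif's share of reads in the input library; using the input
    library as the expectation is an assumption of this sketch.
    The default slice [3:7] covers heptamer positions 4-7.
    """
    observed = defaultdict(int)
    expected = defaultdict(int)
    for seq, n in output_counts.items():
        observed[ry_pattern(seq[start:end])] += n
    for seq, n in input_counts.items():
        expected[ry_pattern(seq[start:end])] += n

    obs_total = sum(observed.values())
    exp_total = sum(expected.values())
    result = {}
    for motif, n in observed.items():
        e = expected.get(motif, 0)
        if n > 0 and e > 0:  # skip motifs where the ratio is undefined
            result[motif] = math.log2((n / obs_total) / (e / exp_total))
    return result


# Toy read counts, invented for illustration only.
output_reads = {"CACAGTG": 6, "CACTTAT": 2}
input_reads = {"CACAGTG": 1, "CACTTAT": 1}
enrichment = log2_enrichment_by_motif(output_reads, input_reads)
```

With these toy counts, the motif that makes up a larger share of the output than of the input gets a positive value (positive selection), and the other gets a negative value.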
Figure 3.
Coding flank effects on selectivity of 12-RSSs in V(D)J recombination. (A) The pSARP-cf12R4-7 input library contained three separate inserts that differed in coding flank sequence. The inserts are referred to as Coding Flank (CF)1, CF2 and CF3 with an AC, TTT or CTT sequence immediately flanking the RSS heptamer, respectively. The different coding flank sequences are in red text and index sequences for each insert are in green text. The degenerate portion of the RSS is shown in blue text with positions 4–7 of the heptamer fully randomized, and the second position of the spacer either a T or G (designated as K). (B) Pie chart depicting relative V(D)J recombination activity for each insert. The percentage values are the sum of the signal joint read counts for the respective insert divided by the total signal joint read counts for the pSARP-cf12R4-7 output library multiplied by 100. (C) Bar charts showing mean V(D)J recombination frequency of two technical replicates for indicated R/Y sequence motifs for the CF1 (left plot), CF2 (middle plot), and CF3 (right plot) sequences. Recombination frequencies are expressed as log2(O/E), as described in Figure 2 legend. Below each bar chart is the corresponding sequence logo. The probability of finding each base at positions 4–7 and bases G or T at position 9 is shown on the y-axis of the sequence logo. Positions 1–3 (CAC) and position 8 (A) are identical in all sequences.
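The sequence logos in these figures plot the probability of finding each base at each degenerate position, and the Figure 2 legend reports information content in nats. The Python sketch below shows one common way to compute both from read counts. It assumes a uniform four-base background and applies no small-sample correction; the legends do not specify either choice, and the function names and toy data are hypothetical.

```python
import math

BASES = "ACGT"


def base_probabilities(read_counts):
    """Per-position base probabilities from {sequence: read count}.

    All sequences are assumed to have the same length and to contain
    only the characters A, C, G and T.
    """
    length = len(next(iter(read_counts)))
    columns = [dict.fromkeys(BASES, 0) for _ in range(length)]
    for seq, n in read_counts.items():
        for i, base in enumerate(seq):
            columns[i][base] += n
    total_reads = sum(read_counts.values())
    return [{b: c / total_reads for b, c in col.items()} for col in columns]


def information_nats(probs):
    """Information content of one position in nats.

    Computed as ln(4) minus the Shannon entropy (natural log), i.e.
    relative to a uniform background. This is an assumed definition.
    """
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return math.log(len(BASES)) - entropy


# Toy signal-joint read counts for a two-base degenerate region.
toy_counts = {"AC": 3, "AG": 1}
probs = base_probabilities(toy_counts)
info = [information_nats(p) for p in probs]
```

A fully conserved position reaches the maximum of ln(4), about 1.386 nats, while a position where all four bases are equally likely carries no information.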
Figure 4.
Effects of nonamer sequences on 12-RSS selectivity in V(D)J recombination. (A) The three different inserts in the pSARP-MAX-cNON input library included the CF1 insert (in Figure 3A), and inserts containing the cryptic nonamers present in known cRSSs found in the murine Pax3 and LMO2 gene loci. The heptamer, spacer and nonamer regions are denoted below the sequences. The degenerate sequences are in blue text as described in Figure 3 legend. Red text denotes sequence differences in the 12-RSS nonamer. (B) Pie chart depicting relative V(D)J recombination activity for each insert. The percentage values are the sum of the signal joint read counts for the respective insert divided by the total signal joint read counts for the pSARP-MAX-cNON output library multiplied by 100. The total signal joint read count is derived from the sum of the signal joint reads for the Pax3 cNON, LMO2 cNON, and 15X the total read counts for the consensus nonamer insert. (C) Bar charts showing mean V(D)J recombination frequency of two technical replicates for indicated R/Y sequence motifs for the consensus nonamer (left plot), Pax3 (middle plot), and LMO2 (right plot) sequences. Recombination frequencies are expressed as log2(O/E), as described in Figure 2 legend. Below each bar chart is the corresponding sequence logo. The probability of finding each base at positions 4–7 and bases G or T at position 9 is shown on the y-axis of the sequence logo. Positions 1–3 (CAC) and position 8 (A) are identical in all sequences.
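The pie-chart percentages in panel B can be sketched in a few lines of Python. The read counts below are invented, the function name is hypothetical, and the sketch assumes the 15X factor is applied to the consensus nonamer insert in both the numerator and the denominator so that the percentages sum to 100; the legend states the weighting only for the total.

```python
def insert_percentages(signal_joint_reads, weights=None):
    """Percentage of (weighted) signal joint reads contributed by each insert.

    signal_joint_reads: {insert name: summed signal joint read count}
    weights: optional {insert name: multiplier}; inserts not listed get 1.
    """
    weights = weights or {}
    weighted = {
        name: n * weights.get(name, 1) for name, n in signal_joint_reads.items()
    }
    total = sum(weighted.values())
    return {name: 100 * w / total for name, w in weighted.items()}


# Toy read counts, for illustration only.
reads = {"consensus": 10, "Pax3 cNON": 30, "LMO2 cNON": 20}
percentages = insert_percentages(reads, weights={"consensus": 15})
```

Calling the function without `weights` gives the unweighted form described for the coding flank inserts in Figure 3B.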
Figure 5.
Molecular dynamics simulations of a consensus R/Y RSS (CACAATGAT) and an anti-consensus R/Y RSS (CACTTATGT). (A) Base position nomenclature for the 12-RSS heptamer and 5′ spacer region as used in subsequent panels. Violin plots depict probability distributions for (B) twist, (C) slide and (D) roll. Base-pair steps between heptamer positions 5–7 are colored red. Diagrams illustrating each base-pair step parameter are shown adjacent to the corresponding plots. (E) Roll angle probability distributions for base-pair steps H5–H6 (left) and H6–H7 (right). Plots for additional base-pair steps are shown in Supplementary Figure S8.
Figure 6.
Endogenous RSS information content (RIC) scores and endogenous Tcra J-gene recombination compared with SARP-seq recombination. (A) Calculated RIC scores (blue dots, left y-axis) for each RSS analyzed in SARP-seq. Red line indicates mean RSS recombination frequency for each RSS characterized by SARP-seq (right y-axis, n = 3), and RSSs were ranked along the x-axis with most efficacious RSSs occupying higher ranks and less efficacious RSSs occupying lower ranks. (B) Log-transformed mean recombination frequencies for each RSS characterized by SARP-seq (n = 3) were plotted against RIC score. Black trend-line was generated using a linear regression model of log-transformed RSS recombination frequency expressed as a function of RIC score. (C) Mean normalized SARP-seq count frequencies and normalized END-seq count frequencies expressed as O/E where O is the observed frequency and E is the expected frequency if V(D)J recombination is completely random and nonspecific. END-seq expected frequency E was calculated by dividing the total END-seq counts for each RSS included in the analysis by the total number of unique RSSs being counted. Chromosomal position-specific effects were accounted for in the END-seq data by analyzing RAG cleavage efficiency of each J-gene relative to the cleavage efficiency of two flanking J-genes on either side (quantifications are provided in Supplementary Dataset S2).
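Two calculations in this legend lend themselves to a short Python sketch: the O/E ratio with a uniform expectation (panel C) and a least-squares trend line of log-transformed recombination frequency against RIC score (panel B). All labels and numbers below are hypothetical, the base-10 logarithm is an assumption because the legend does not name the base, and the position-specific normalization against flanking J-genes is not modeled.

```python
import math


def observed_over_expected(counts):
    """O/E per RSS, where E = total counts / number of unique RSSs counted."""
    expected = sum(counts.values()) / len(counts)
    return {rss: n / expected for rss, n in counts.items()}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical END-seq counts for three RSSs (labels are placeholders).
end_seq_counts = {"J_a": 30, "J_b": 10, "J_c": 20}
ratios = observed_over_expected(end_seq_counts)

# Hypothetical RIC scores and mean recombination frequencies.
ric_scores = [-40.0, -30.0, -20.0]
frequencies = [0.001, 0.01, 0.1]
slope, intercept = fit_line(ric_scores, [math.log10(f) for f in frequencies])
```

An O/E above 1 means an RSS was recovered more often than a completely random, nonspecific process would predict, and a value below 1 means it was recovered less often.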

References

    1. Schatz D.G., Swanson P.C. V(D)J recombination: mechanisms of initiation. Annu. Rev. Genet. 2011; 45:167–202.
    2. Rodgers K.K. Riches in RAGs: revealing the V(D)J recombinase through high-resolution structures. Trends Biochem. Sci. 2017; 42:72–84.
    3. Lu J., Van Laethem F., Bhattacharya A., Craveiro M., Saba I., Chu J., Love N.C., Tikhonova A., Radaev S., Sun X. et al. Molecular constraints on CDR3 for thymic selection of MHC-restricted TCRs from a random pre-selection repertoire. Nat. Commun. 2019; 10:1019.
    4. Gellert M. V(D)J recombination: RAG proteins, repair factors, and regulation. Annu. Rev. Biochem. 2002; 71:101–132.
    5. Hiom K., Gellert M. Assembly of a 12/23 paired signal complex: a critical control point in V(D)J recombination. Mol. Cell. 1998; 1:1011–1019.
