Genome Res. 2020 Nov;30(11):1547-1558.
doi: 10.1101/gr.259598.119. Epub 2020 Sep 18.

V(DD)J recombination is an important and evolutionarily conserved mechanism for generating antibodies with unusually long CDR3s

Yana Safonova et al. Genome Res. 2020 Nov.

Abstract

The V(DD)J recombination is currently viewed as an aberrant and inconsequential variant of the canonical V(D)J recombination. Moreover, since the classical 12/23 rule for the V(D)J recombination fails to explain the V(DD)J recombination, the molecular mechanism of tandem D-D fusions has remained unknown since they were discovered three decades ago. Revealing this mechanism is a biomedically important goal since tandem fusions contribute to broadly neutralizing antibodies with ultralong CDR3s. We reveal previously overlooked cryptic nonamers in the recombination signal sequences of human IGHD genes and demonstrate that these nonamers explain the vast majority of tandem fusions in human repertoires. We further reveal large clonal lineages formed by tandem fusions in antigen-stimulated immunosequencing data sets, suggesting that such data sets contain many more tandem fusions than previously thought and that about a quarter of large clonal lineages with unusually long CDR3s are generated through tandem fusions. Finally, we developed the SEARCH-D algorithm for identifying D genes in mammalian genomes and applied it to the recently completed Vertebrate Genomes Project assemblies, nearly doubling the number of mammalian species with known D genes. Our analysis revealed cryptic nonamers in RSSs of many mammalian genomes, thus demonstrating that the V(DD)J recombination is not a "bug" but an important feature preserved throughout mammalian evolution.

Figures

Figure 1.
Cryptic nonamers explain V(DD)J recombination via the 1-turn/2-turn and 1-turn/3-turn mechanisms. (A) Canonical heptamers and nonamers in RSSs are shown by green and yellow rectangles, respectively. The 12/23 rule (1-turn/2-turn) explains the V-D and D-J recombination but fails to explain the D-D recombination using canonical nonamers (upper row). Cryptic nonamers (shown as red and blue rectangles) enable both the canonical 12/23 rule and the alternative 12/34 mechanism (1-turn/3-turn) and explain the V(DD)J recombination (lower row). (B) The left and right figures correspond to nonamers in the left and right RSSs. Sequence logos for canonical nonamers with 12-spacers for the human IGHD genes. Cryptic nonamers (with spacers shorter than 40 nt) in the RSSs of all 27 human D genes. D genes are shown on the left and are ordered according to the order in the IGHD locus. Canonical and cryptic nonamers (with likelihoods exceeding minLikelihood) are shown as red cells.
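The spacer logic in this caption can be illustrated with a minimal sketch. It scans the sequence downstream from the heptamer of a right RSS for nonamer-like 9-mers at spacer lengths near 12, 23, and 34 nt (1, 2, and 3 turns). The consensus ACAAAAACC is the textbook canonical nonamer. Everything else is an assumption made for illustration: the function names, the match-count score (a stand-in for the likelihood and minLikelihood threshold the caption mentions), the `min_matches` cutoff, and the 1-nt spacer tolerance. It is not the authors' procedure.

```python
CONSENSUS_NONAMER = "ACAAAAACC"       # textbook consensus of the canonical RSS nonamer
TURN_SPACERS = {1: 12, 2: 23, 3: 34}  # spacer lengths (nt) named in the caption


def match_score(candidate, consensus=CONSENSUS_NONAMER):
    """Number of positions at which a 9-mer agrees with the consensus."""
    return sum(a == b for a, b in zip(candidate, consensus))


def find_nonamers(downstream, min_matches=7, tolerance=1):
    """Scan the sequence that immediately follows a right-RSS heptamer.

    Returns (turns, spacer_length, nonamer, score) for every 9-mer that
    starts within `tolerance` nt of a 12-, 23-, or 34-nt spacer and agrees
    with the consensus at `min_matches` or more positions.
    min_matches and tolerance are hypothetical illustration parameters.
    """
    hits = []
    for turns, spacer in TURN_SPACERS.items():
        for s in range(spacer - tolerance, spacer + tolerance + 1):
            candidate = downstream[s:s + 9]
            if len(candidate) == 9 and match_score(candidate) >= min_matches:
                hits.append((turns, s, candidate, match_score(candidate)))
    return hits


# Synthetic sequence (not a real RSS): a canonical nonamer after a 12-nt
# spacer and a one-mismatch nonamer after a 23-nt spacer.
synthetic = "G" * 12 + "ACAAAAACC" + "TT" + "ACAAAAACG" + "T" * 20
for hit in find_nonamers(synthetic):
    print(hit)
```

A left RSS would be handled the same way after reverse-complementing the upstream sequence, which this sketch leaves out.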
Figure 2.
Fusion graph and fusion matrix for 12 D genes with at least 2% usage computed for the ALLERGY data set. (A) Vertices of the fusion graph are arranged clockwise along the circle according to the order in the IGHD locus, from D2-2 to D3-22. Vertices are colored according to the usage of the corresponding D genes: from pale (D2-8, usage 2.0%) to dark (D3-10, usage 15.5%). Each directed edge connects a vertex D with a vertex D*, where D* follows D in the IGHD locus. The width of an edge (D, D*) is proportional to coeff(D, D*). Only edges corresponding to coupled D genes are shown. (B) The matrix on the right shows values of coeff(D, D*) for fusions of the selected twelve commonly used IGHD genes, where genes D and D* correspond to rows and columns, respectively. Cells are colored according to the values of tandem coefficients: from low (dark blue) through medium (pale) to high (dark red).
Figure 3.
Fusion graph on genes D2-2, D3-10, D2-15, and D3-16 (A), cryptic nonamers in RSSs of these genes that explain this graph (B), and four optimal configurations for this graph (C). (A) Fusion graph on genes D2-2, D3-10, D2-15, and D3-16. Each cryptic nonamer is shown as either a blue left half-vertex or a red right half-vertex of the corresponding vertex in the fusion graph. Edges represent tandem fusions and are labeled with the tandem coefficient for the corresponding fusion. The edge between D3-10 and D2-15 is not shown since these genes do not form tandem fusions. (B) The table shows that all cryptic nonamers in RSSs of genes D2-2, D3-10, D2-15, and D3-16, found among the top 12 nonamers, correspond to 2- and 3-turning nonamers; 2- and 3-turning cryptic nonamers explain all edges of the fusion graph and do not “trigger” any other edges. Conserved positions in these nonamers are shown by uppercase letters. Positions coinciding with the consensus sequence of the canonical nonamers are bolded and underlined. (C) The table shows four optimal configurations of cryptic RSSs (i.e., configurations explaining all edges of the fusion graph) for the fusion graph on genes D2-2, D3-10, D2-15, and D3-16. Each configuration is shown as a binary vector, where 1 (0) means that the corresponding cryptic nonamer forms (does not form) tandem fusions.
Figure 4.
RSSs with cryptic nonamers corresponding to 2- and 3-turn spacers explain the fusion graph in Figure 2A. (A) 2- and 3-turning nonamers that “explain” the fusion graph in Figure 2A are highlighted in red and blue, respectively. (B,C) The fusion matrices with each cell classified as explained (green), unexplained (purple), or false (orange) based on the optimal configuration with nine (6 left + 3 right) cryptic nonamers (B) and the observed configuration with 12 (7 left + 5 right) cryptic nonamers (C). A D gene on the y-axis (x-axis) is colored red (blue) if its right (left) RSS contributes to the optimal configuration. (D) Each tandem fusion in Figure 2 is classified by the number of turns in cryptic nonamers that can explain it. For example, the fusion (D2-2, D3-3) can be explained by both 2-turning and 3-turning nonamers and thus is classified as “2,3 fusion type.” In total, we generate three groups: “2” (fusions are explained by the 2-turning nonamers), “2,3” (fusions are explained either by the 2-turning or by the 3-turning nonamers), and “3” (fusions are explained by the 3-turning nonamers) with average values of tandem coefficients 10.7, 5.0, and 2.2, respectively. The y-axis shows tandem coefficients of tandem fusions. Group “2” has higher values of tandem coefficients than groups “2,3” and “3” (P-value = 0.0026 according to the one-way ANOVA test [Heiman 2001]).
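The explained/unexplained/false labeling in panels B and C can be sketched as follows. The sketch assumes a simple trigger rule that the captions suggest but do not state outright: a fusion (D, D*) is triggered when the right RSS of D or the left RSS of D* carries an active cryptic nonamer. The gene names, the observed fusions, and the function name are hypothetical toy inputs, not data from the paper.

```python
from itertools import combinations


def classify_cells(genes, observed, right_active, left_active):
    """Label each ordered gene pair (D, D*) of a fusion matrix.

    genes        -- D genes listed in locus order
    observed     -- set of (D, D*) pairs seen as tandem fusions
    right_active -- genes whose right RSS has an active cryptic nonamer
    left_active  -- genes whose left RSS has an active cryptic nonamer

    Assumed trigger rule (an illustration, not the authors' stated model):
    (D, D*) is triggered if D is in right_active or D* is in left_active.
    """
    labels = {}
    for d, d_star in combinations(genes, 2):  # D precedes D* in the locus
        triggered = d in right_active or d_star in left_active
        seen = (d, d_star) in observed
        if seen and triggered:
            labels[(d, d_star)] = "explained"
        elif seen:
            labels[(d, d_star)] = "unexplained"
        elif triggered:
            labels[(d, d_star)] = "false"
    return labels


# Hypothetical toy locus with four genes and four observed fusions.
genes = ["Da", "Db", "Dc", "Dd"]
observed = {("Da", "Db"), ("Da", "Dc"), ("Da", "Dd"), ("Dc", "Dd")}
labels = classify_cells(genes, observed, right_active={"Da"}, left_active={"Dd"})
for pair, label in sorted(labels.items()):
    print(pair, label)
```

Under this rule, a configuration that explains every observed fusion is one with no "unexplained" cells. Enumerating all binary vectors over the candidate nonamers and keeping those would be the brute-force route to such configurations.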
Figure 5.
Clonal trees derived from V(DD)J recombinations in the INTESTINAL data set. Two large clonal trees derived from tandem fusions of D2-15 with D5-24 (A) and D2-2 with D2-15 (B) genes in the INTESTINAL data set. Genes D2-15 (AGGATATTGTAGTGGTGGTAGCTGCTACTCC), D5-24 (GTAGAGATGGCTACAATTAC), and D2-2*01 (AGGATATTGTAGTAGTACCAGCTGCTATGCC) have lengths of 31, 20, and 31 nt, respectively (substrings occurring in CDR3s are underlined). A clonal tree for each clonal lineage was constructed using the IgEvolution tool (Safonova and Pevzner 2019b) applied to all reads from these lineages. These clonal lineages originated from a V(DD)J recombination that resulted in long CDR3s of length 72 and 78 nt, respectively. For the tree in B, we showed 20 out of 131 CDR3s. Blue, orange, and green vertices represent sequences of IgA memory, IgA plasma, and IgM plasma B cells, respectively. Violet vertices represent sequences found in both plasma and memory B cells. Alignments of CDR3s corresponding to the lineages are shown below the trees: green/red substrings correspond to fragments of D2-15/D5-24 genes in A and D2-2/D2-15 in B, respectively. Somatic hypermutations in green and red substrings are shown in blue. Plus and minus signs before each sequence in the alignment of the CDR3s in A indicate whether IgScout identified it as a tandem CDR3 or not (with the default k-mer-size parameter). CDR3s in B were semimanually annotated, as IgScout (with the default k-mer-size parameter) failed to identify them as tandem.
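To make the idea of a "tandem CDR3" concrete, here is a toy sketch that flags a CDR3 as tandem when it contains long exact fragments of two different D genes that do not overlap. The three gene sequences are the ones quoted in the caption. The CDR3, the threshold `k=10`, and the function names are hypothetical. This is not the IgScout algorithm, only an illustration of matching germline D fragments inside a CDR3.

```python
D_GENES = {  # sequences as quoted in the caption
    "D2-15": "AGGATATTGTAGTGGTGGTAGCTGCTACTCC",
    "D5-24": "GTAGAGATGGCTACAATTAC",
    "D2-2*01": "AGGATATTGTAGTAGTACCAGCTGCTATGCC",
}


def longest_shared_substring(cdr3, gene):
    """Return (start_in_cdr3, length) of the longest exact shared substring."""
    best = (0, 0)
    prev = [0] * (len(gene) + 1)
    for i in range(1, len(cdr3) + 1):
        cur = [0] * (len(gene) + 1)
        for j in range(1, len(gene) + 1):
            if cdr3[i - 1] == gene[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the match ending here
                if cur[j] > best[1]:
                    best = (i - cur[j], cur[j])
        prev = cur
    return best


def d_genes_in_cdr3(cdr3, d_genes=D_GENES, k=10):
    """Names of D genes with a non-overlapping exact match of length >= k,
    in CDR3 order. k is a hypothetical threshold, not IgScout's default."""
    hits = []
    for name, seq in d_genes.items():
        start, length = longest_shared_substring(cdr3, seq)
        if length >= k:
            hits.append((length, start, start + length, name))
    accepted = []
    for length, start, end, name in sorted(hits, reverse=True):  # longest first
        if all(end <= s or start >= e for s, e, _ in accepted):
            accepted.append((start, end, name))
    return [name for _, _, name in sorted(accepted)]


# Synthetic CDR3 (hypothetical): flanks plus fragments of D2-15 and D5-24.
cdr3 = "GCGAGA" + D_GENES["D2-15"][8:28] + "CC" + D_GENES["D5-24"][4:18] + "GACTAC"
genes_found = d_genes_in_cdr3(cdr3)
print(genes_found, "tandem" if len(genes_found) >= 2 else "not tandem")
```

Exact matching is deliberately strict. As the caption notes for panel B, heavily mutated tandem CDR3s can evade k-mer-based detection, and this sketch would miss them as well.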
Figure 6.
Tandem repeats in the IGHD loci of human (A,B), mouse (C), and common marmoset (D). Duplicated and identical IGHD genes are shown by the same (nonblack) color. Dot plots were generated by the Gepard tool (Krumsiek et al. 2007). (A) The dot plot shows that the 56-kbp-long human IGHD locus contains a tandem repeat R1-R2-R3-R4 covering 24 out of 27 IGHD genes. Positions of 27 IGHD genes are shown on the left. (B) The structure of units R1-R4. (C) For better resolution, we show only a 97-kbp-long fragment of the 1.1-Mbp-long mouse IGHD locus that covers 22 out of 26 IGHD genes. The shown fragment does not cover genes IGHD4-1, IGHD5-2, and IGHD1-3 that precede the first occurrence of IGHD5-1 (the first gene in the dot plot) and a copy of gene IGHD4-1 that follows IGHD4-1 (the last gene in the dot plot). (D) A dot plot shows the 47-kbp-long IGHD locus of the common marmoset.

References

    1. Achour I, Cavelier P, Tichit M, Bouchier C, Lafaye P, Rougeon F. 2008. Tetrameric and homodimeric camelid IgGs originate from the same IgH locus. J Immunology 181: 2001–2009. doi: 10.4049/jimmunol.181.3.2001
    2. Briney BS, Willis JR, Hicar MD, Thomas JW, Crowe JE. 2012a. Frequency and genetic characterization of V(DD)J recombinants in the human peripheral blood antibody repertoire. Immunology 137: 56–64. doi: 10.1111/j.1365-2567.2012.03605.x
    3. Briney BS, Willis JR, Crowe JE Jr. 2012b. Human peripheral blood antibodies with long HCDR3s are established primarily at original recombination using a limited subset of germline genes. PLoS One 7: e36750. doi: 10.1371/journal.pone.0036750
    4. Burton DR, Poignard P, Stanfield RL, Wilson IA. 2012. Broadly neutralizing antibodies present new prospects to counter highly antigenically diverse viruses. Science 337: 183–186. doi: 10.1126/science.1225416
    5. Corbett SJ, Tomlinson IM, Sonnhammer EL, Buck D, Winter G. 1997. Sequence of the human immunoglobulin diversity (D) segment locus: A systematic analysis provides no evidence for the use of DIR segments, inverted D segments, “minor” D segments or D-D recombination. J Mol Biol 270: 587–597. doi: 10.1006/jmbi.1997.1141