Front Immunol. 2021 May 10;12:671944.
doi: 10.3389/fimmu.2021.671944. eCollection 2021.

Characterization of DNA G-Quadruplex Structures in Human Immunoglobulin Heavy Variable (IGHV) Genes

Catherine Tang et al. Front Immunol. 2021.

Abstract

Activation-induced deaminase (AID) is a key enzyme involved in antibody diversification by initiating somatic hypermutation (SHM) and class-switch recombination (CSR) of the Immunoglobulin (Ig) loci. AID preferentially targets WRC (W=A/T, R=A/G) hotspot motifs and avoids SYC (S=C/G, Y=C/T) coldspots. G-quadruplex (G4) structures are four-stranded DNA secondary structures with key functions in transcription, translation and replication. In vitro studies have shown G4s to form and bind AID in Ig switch (S) regions. Alterations in the gene encoding AID can further disrupt AID-G4 binding and reduce CSR in vivo. However, it is still unclear whether G4s form in the variable (V) region, or how they may affect SHM. To assess the possibility of G4 formation in human V regions, we analyzed germline human Ig heavy chain V (IGHV) sequences, using a pre-trained deep learning model that predicts G4 potential. This revealed that many genes from the IGHV3 and IGHV4 families are predicted to have high G4 potential in the top and bottom strand, respectively. Different IGHV alleles also showed variability in G4 potential. Using a high-resolution (G4-seq) dataset of biochemically confirmed potential G4s in IGHV genes, we validated our computational predictions. G4-seq also revealed variation between S and V regions in the distribution of potential G4s, with the V region having overall reduced G4 abundance compared to the S region. The density of AGCT motifs, where two AGC hotspots overlap on both strands, was roughly 2.6-fold greater in the V region than the Constant (C) region, which does not mutate despite having predicted G4s at similar levels. However, AGCT motifs in both V and C regions were less abundant than in S regions. In silico mutagenesis experiments showed that G4 potentials were generally robust to mutation, although large deviations from germline states were found, mostly in framework regions. G4 potential is also associated with higher mutability of certain WRC hotspots on the same strand. In addition, CCC coldspots opposite a predicted G4 were shown to be targeted significantly more for mutation. Our overall assessment reveals plausible evidence of functional G4s forming in the Ig V region.
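
The motif definitions in the abstract (WRC hotspots, SYC coldspots, and AGCT sites where AGC hotspots overlap on both strands) can be illustrated with a short scan. This is a minimal sketch, not the authors' pipeline: the function names and the example sequence are hypothetical, and density is assumed here to mean motif count divided by sequence length.

```python
import re

# IUPAC codes from the abstract: W = A/T, R = A/G, S = C/G, Y = C/T.
# Lookahead patterns are used so that overlapping motifs are all reported.
WRC = re.compile(r"(?=[AT][AG]C)")   # AID hotspot
SYC = re.compile(r"(?=[CG][CT]C)")   # AID coldspot
AGCT = re.compile(r"(?=AGCT)")       # AGC hotspot present on both strands


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_positions(pattern, seq):
    """0-based start positions of every (possibly overlapping) match."""
    return [m.start() for m in pattern.finditer(seq.upper())]


def motif_density(pattern, seq):
    """Motif count per nucleotide (an assumed definition of density)."""
    return len(motif_positions(pattern, seq)) / len(seq) if seq else 0.0


toy = "AGCTCCC"  # hypothetical sequence, not an IGHV allele
print(motif_positions(WRC, toy))                      # [0]
print(motif_positions(SYC, toy))                      # [2, 4]
print(motif_positions(WRC, reverse_complement(toy)))  # [3]
```

Because AGCT is its own reverse complement, the single AGCT site in the toy sequence yields an AGC hotspot on each strand, which is the overlap the abstract refers to.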

Keywords: G-quadruplex; IGHV genes; activation induced deaminase (AID); immunoglobulin heavy chain (Igh); somatic hypermutation (SHM).

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
G4 potential of the human IGHV alleles. (A) Mean G4 potentials of 232 functional IGHV alleles grouped by IGHV family are shown separately for bottom strand (red) and top strand (blue). Error bars representing one standard deviation are shown in black above each bar. (B) Box plots displaying G4 potentials grouped by IGHV gene. Individual IGHV families are distinguished by different colors.
Figure 2
Assessment of high-throughput experimental G4-seq data in the Ig V region. (A) Genome browser view showing the distribution of percent mismatches according to Chambers et al. for the IGHV3-15 gene. Observed G4s indicate regions reaching a threshold of 25% or above (red and blue bars below mismatches for bottom and top strand, respectively). The gene location is shown as a dark blue bar below. All tracks are aligned to the reverse strand of the human hg19 reference genome. (B) Comparison of the maximum percent mismatch of the 40 IGHV genes located in the genome between the bottom strand (red) and top strand (blue). A two-sided non-parametric Mann-Whitney U test comparing the two strands was performed. (C) Comparison between experimental G4-seq data and G4detector predictions by strand. The Spearman correlation (r) and p-value for each strand are shown in the corresponding strand color.
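
The 25% threshold described in panel (A) can be sketched as a simple run-finding step. This is only an illustration under stated assumptions: the input is taken to be a hypothetical per-position list of percent mismatch values for one strand, the function name is invented here, and the real G4-seq tracks may be binned or formatted differently.

```python
def observed_g4_regions(percent_mismatch, threshold=25.0):
    """Return half-open (start, end) index ranges where every value is
    at or above the threshold. Input format is an assumption."""
    regions = []
    start = None
    for i, value in enumerate(percent_mismatch):
        if value >= threshold:
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            regions.append((start, i))  # the run ended at the previous index
            start = None
    if start is not None:
        regions.append((start, len(percent_mismatch)))  # run reaches the end
    return regions


# Hypothetical mismatch values, not taken from the G4-seq dataset.
track = [0, 30, 40, 10, 25, 5, 26]
print(observed_g4_regions(track))  # [(1, 3), (4, 5), (6, 7)]
```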
Figure 3
Analysis of G4 activity in the IgH locus. (A) Same as Figure 2A, except showing an expanded view of G4 activity in the Ig S and V region. AGCT motifs are represented as black tracks and are shaded using the mean windowing function in IGV. (B) Comparison of the density of AGCT motifs (x-axis; log-scale) and mean percent mismatches (y-axis) representing overall G4 activity within each constant (C) region (yellow shading) and S region (brown shading). Edges, vertices and labels of all regions are colored according to strand location (red: bottom; blue: top). (C) Quantifying G4 activity within C regions (yellow) and S regions (brown) for both bottom and top strands. Error bars above each bar represent ±1 standard deviation. Significant p-values from two-sided Mann-Whitney U tests are indicated by asterisks (*p ≤ 0.05; **p ≤ 0.01). (D) Density of AGCT motifs by isotype as observed within each C region (yellow) and S region (brown).
Figure 4
Influence of predicted G4s on AID targeting to hotspots. Association between the predicted G4 potential (y-axis) and difference (top-bottom) in mutation frequency of the various AID hotspot motifs. For each gene, the difference in the average mutation frequency of the hotspot motif on the bottom strand from the corresponding hotspot on the top strand (x-axis) was calculated. Pearson correlations between the difference in hotspot mutation frequency and predicted G4 potential were computed separately for each strand, as well as for each hotspot motif (the analyzed strand and AID hotspot are both indicated in gray above and to the side of the plot, respectively).
Figure 5
In silico experiments of IGHV alleles. Corresponding G4 potentials from sequences belonging to the IGHV1, IGHV3, and IGHV4 families. Each point represents one sequence containing a single mutation from its germline context at the gapped IMGT position indicated on the x-axis. The y-axis represents the G4 potential of the mutated sequence. Points are colored according to the observable difference in G4 potential in the mutated sequence from its germline. Gray bars above each plot indicate CDRs.
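
The single-mutation design described in this caption can be sketched as an enumeration of every one-nucleotide substitution of a germline sequence, followed by a comparison of each mutant's score against the germline score. This is a rough sketch under assumptions: the function names are hypothetical, `.` is assumed to mark IMGT gaps, and `score` is a stand-in callable, since the pre-trained G4 prediction model is not reproduced here.

```python
def single_mutants(germline, alphabet="ACGT", gap="."):
    """Yield (position, ref, alt, mutated_sequence) for every single
    substitution. Positions index the gapped sequence; gaps are skipped."""
    germline = germline.upper()
    for pos, ref in enumerate(germline):
        if ref == gap or ref not in alphabet:
            continue  # leave gap and ambiguous characters untouched
        for alt in alphabet:
            if alt != ref:
                yield pos, ref, alt, germline[:pos] + alt + germline[pos + 1:]


def potential_shifts(germline, score, gap="."):
    """Score difference (mutant minus germline) for each single mutant.
    `score` is any callable taking an ungapped sequence."""
    baseline = score(germline.upper().replace(gap, ""))
    return [
        (pos, ref, alt, score(mutated.replace(gap, "")) - baseline)
        for pos, ref, alt, mutated in single_mutants(germline, gap=gap)
    ]


def toy_score(seq):
    """Placeholder score (count of G bases); NOT a G4 predictor."""
    return seq.count("G")


print([shift for *_, shift in potential_shifts("A.G", toy_score)])
# [0, 1, 0, -1, -1, -1]
```

Passing the scorer in as a callable keeps the enumeration independent of any particular model, so a real G4 predictor could be substituted for the placeholder without changing the mutagenesis step.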
Figure 6
AID targeting effects on G4 potential. (A) Differences in G4 potential of the sequences used in the in silico experiments were analyzed. The plots contain sequences where the mutation occurred in C nucleotides only. Each sequence is categorized as containing a mutation in one of three AID-associated contexts: SYC coldspot, WRC hotspot, or neutral C base. Strand-specific effects on G4 potential are distinguished by which strand was mutated (bottom or top; strand mutated) and whether the resulting change occurred on the same strand or the opposite strand (strand affected). Opposite strand plots have a red border. P-values from two-sided Mann-Whitney U tests are indicated by asterisks (ns, not significant; ****p ≤ 0.0001). (B) Mean difference in G4 potential caused by mutations at specific SYC and WRC trinucleotide motifs. The strand subjected to mutation is indicated by the bar color (red, bottom strand; blue, top strand). The strand whose G4 potential is affected is labeled at the top of the plot in gray. The different SYC and WRC trinucleotide motifs are labeled at the bottom in blue and yellow, respectively. Error bars drawn at each bar represent ±1 standard deviation. (C) Comparison of mutation frequencies of top strand CCC coldspots in 17 alleles from the IGHV4 family. Individual CCC motifs are separated according to how the bottom strand G4 potential changes when they are mutated. The p-value of a two-sided Mann-Whitney U test comparing coldspots that negatively impacted G4 potential against those that led to no difference is reported.
