J Clin Immunol. 2019 Oct;39(7):688-701.
doi: 10.1007/s10875-019-00670-z. Epub 2019 Aug 6.

Predicting the Occurrence of Variants in RAG1 and RAG2

Dylan Lawless et al. J Clin Immunol. 2019 Oct.

Abstract

While widespread genome sequencing ushers in a new era of preventive medicine, the tools for predictive genomics are still lacking. Time and resource limitations mean that human diseases remain uncharacterized because of an inability to predict clinically relevant genetic variants. A strategy of targeting highly conserved protein regions is commonly used in functional studies. However, this benefit is lost for rare diseases where the attributable genes are mostly conserved. An immunological disorder exemplifying this challenge arises from damaging mutations in RAG1 and RAG2 and presents at an early age with a distinct phenotype of life-threatening immunodeficiency or autoimmunity. Many tools exist for variant pathogenicity prediction, but these cannot account for the probability of variant occurrence. Here, we present a method that predicts the likelihood of mutation for every amino acid residue in the RAG1 and RAG2 proteins. Population genetics data from approximately 146,000 individuals were used for rare variant analysis. Forty-four known pathogenic variants reported in patients and recombination activity measurements from 110 RAG1/2 mutants were used to validate calculated scores. Probabilities were compared with 98 currently known human cases of disease. A genome sequence dataset of 558 patients who have primary immunodeficiency but are negative for RAG deficiency was also used as a validation control. We compared mutation likelihood with pathogenicity prediction. Our method builds a map of the most probable mutations, allowing pre-emptive functional analysis. This method may be applied to other diseases with hopes of improving preparedness for clinical diagnosis.

Keywords: Recombination activating genes 1 and 2 (RAG1, RAG2); genomics; pathogenic variant; predictive.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1
RAG1 (red, left) and RAG2 (blue, right) conservation and mutation rate residue frequency. a Gene conservation score, non-conserved 0 and conserved 1. The color indicates no known mutations in humans. b Histogram, raw MRF score; Heatmap, MRF prediction for conserved residues, graded 0 to 0.05 (scale of increasing mutation likelihood with human disease). c Colored bars indicate most likely clinically relevant variant clusters. MRF score averaged with 1% intervals for each gene and cutoff below the 75th percentile, graded 0 to 0.03 (noise reduction method). d Gene structure with functional domains. Full list of residues and scores available in Table E1.
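The noise reduction described in panel c (averaging over 1% intervals, then applying a 75th-percentile cutoff) can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the function name `smooth_and_threshold`, the use of consecutive non-overlapping windows, and the choice to set sub-cutoff windows to zero are all assumptions made for the example.

```python
from statistics import mean, quantiles


def smooth_and_threshold(scores, window_fraction=0.01, percentile=75):
    """Average per-residue scores in consecutive windows, then zero out
    windows whose mean falls below the given percentile of window means.

    Hypothetical helper: window layout and zeroing are assumptions.
    """
    n = len(scores)
    # Window length is 1% of the protein length (at least one residue).
    window = max(1, round(n * window_fraction))
    # Mean score of each consecutive, non-overlapping window.
    window_means = [mean(scores[i:i + window]) for i in range(0, n, window)]
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99.
    cutoff = quantiles(window_means, n=100, method="inclusive")[percentile - 1]
    # Keep windows at or above the cutoff; suppress the rest as noise.
    kept = [m if m >= cutoff else 0.0 for m in window_means]
    # Expand the window values back to one value per residue.
    return [kept[i // window] for i in range(n)]


# Illustrative input only: 150 low-scoring residues followed by 50 high ones.
example_scores = [0.001] * 150 + [0.04] * 50
smoothed = smooth_and_threshold(example_scores)
```

With this made-up input, only the final high-scoring stretch survives the cutoff, which mirrors how the colored bars in panel c isolate clusters of residues.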
Fig. 2
RAG1 and RAG2 MRF scores predict the likelihood of mutations that are clinically relevant. a Known damaging variants (clinically diagnosed with genetic confirmation) reported on GnomAD have significantly higher MRF scores than unreported variants. b GnomAD rare variant allele frequency < 0.0001. No significant difference in allele frequency is found between known damaging and non-clinically reported variants. Unpaired t test, RAG1 P value 0.002** and RAG2 P value 0.0339*. MRF, mutation rate residue frequency; ns, non-significant
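For readers unfamiliar with the unpaired t test used in this comparison, the statistic can be sketched as below. The caption does not say whether equal variances were assumed, so this example assumes the classic pooled-variance (Student) form; the function name `unpaired_t` and the input values are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal group variances,
    which is an assumption of this sketch, not a detail from the paper.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance across both groups.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    # Difference in means scaled by its standard error.
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Made-up MRF-like scores for two groups of variants (illustration only).
known_damaging = [0.041, 0.038, 0.043, 0.040]
not_reported = [0.012, 0.020, 0.015, 0.018]
t_stat, dof = unpaired_t(known_damaging, not_reported)
```

Converting the statistic to a P value requires the t distribution with the returned degrees of freedom, which a statistics package would normally supply.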
Fig. 3
RAG1 and RAG2 MRF score categories and variants assayed to date. Protein residues are ranked and stacked into categories based on their MRF score. High scores (0.043 and 0.038 in RAG1 and RAG2, respectively) represent a greater mutation likelihood. Functional assays have measured recombination activity (as its inverse; % loss of activity) in a total of 110 mutants. The severity of protein loss of function is represented by a red gradient. Residues that have not been functionally tested are shown in grey. While many protein residues are critical to protein function, their mutation is less probable than many of the top MRF candidates. Data further expanded in Fig. E3. MRF, mutation rate residue frequency
Fig. 4
False positives in Transib domains do not worsen probability prediction. The Transib domains contain critical conserved protein residues. a False positives were simulated by scoring Transib domains MRF without omitting Boolean conservation weight C = 0. b Allele frequencies on GnomAD had conservation levels inversely proportional to simulated false-positive MRF scoring. c When testing for all Boolean component C > 0 after MRF calculation, the effect of false positives remained non-significant, illustrating the non-negative impact of MRF for predicting the mutation rate. Unpaired t test, *P = 0.0195 and ***P < 0.0001. MRF, mutation rate residue frequency; ns, non-significant
Fig. 5
A linear regression model of RAG1/2 MRF scoring in cases of primary immune deficiency. MRF prediction correlates with clinical presentation. Damaging variants identified in confirmed RAG deficiency cases. Non-damaging variants sourced from cases of PID with rare variants but not responsible for disease. An MRF > 0.04 was seen for 31 cases of damaging RAG1 variants. (Slopes of RAG1: damaging, 0.0008* (± 0.0004), P < 0.05, intercept 5.82e-05***; non-damaging, −0.0007 (± 0.001). Slopes of RAG2: damaging, 0.0023 (± 0.0018), intercept 0.0312*; non-damaging, 0.0001 (± 0.0008). Source data and script in Supplemental material)
Fig. 6
RAG1 PHRED-scaled CADD score versus GnomAD conservation rate and MRF score. Allele frequency conservation rate (top) is vastly important for identifying critical structural and functional protein regions. The impact of mutation in one of these conserved regions is often estimated using CADD scoring (middle). CADD score heatmap is aligned by codon and separated into three layers for individual nucleotide positions. The MRF score (bottom) (visualized using the 75th percentile with 1% averaging) highlights protein regions that are most likely to present clinically and may require pre-emptive functional investigation
Fig. 7
RAG1 PHRED-scaled CADD score versus MRF score against HGMD data. a A high CADD score is a predictor of deleteriousness. Both reported (red) and non-reported residues (pink) have a high density of high CADD score. b MRF scores only show a high-density cluster for high-likelihood variants, reflected by the high MRF score observed for known RAG deficiency variants. The number of pathogenic variants is outweighed by conserved residues; a, b shows the density of scores to normalize between groups. AUC overlap difference in CADD score of 21.43% and MRF score of 74.28% (above intersects > 22.84 and > 0.0409, in a and b respectively). c The number of residues per MRF category shows that disease reported on HGMD accounts for 36% of top MRF candidates. AUC, area under curve; CADD, Combined Annotation Dependent Depletion; HGMD, Human Gene Mutation Database

