[Preprint]. 2024 Jun 25:2023.07.12.548370.
doi: 10.1101/2023.07.12.548370.

Deep Mutational Scanning in Disease-related Genes with Saturation Mutagenesis-Reinforced Functional Assays (SMuRF)

Kaiyue Ma et al. bioRxiv.

Abstract

Interpretation of disease-causing genetic variants remains a challenge in human genetics. Current costs and complexity of deep mutational scanning methods hamper crowd-sourcing approaches toward genome-wide resolution of variants in disease-related genes. Our framework, Saturation Mutagenesis-Reinforced Functional assays (SMuRF), addresses these issues by offering simple and cost-effective saturation mutagenesis, as well as streamlining functional assays to enhance the interpretation of unresolved variants. Applying SMuRF to neuromuscular disease genes FKRP and LARGE1, we generated functional scores for all possible coding single nucleotide variants, which aid in resolving clinically reported variants of uncertain significance. SMuRF also demonstrates utility in predicting disease severity, resolving critical structural regions, and providing training datasets for the development of computational predictors. Our approach opens new directions for enabling variant-to-function insights for disease genes in a manner that is broadly useful for crowd-sourcing implementation across standard research laboratories.

Keywords: Deep mutational scanning; cost-effective variant interpretation; diagnostics; dystroglycanopathies; genetic diseases; high-throughput functional assays; muscular dystrophies; saturation mutagenesis; variant effect prediction; variants of uncertain significance.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Declaration of generative AI and AI-assisted technologies: We used ChatGPT 3.5 and Gemini to improve the readability and language in this manuscript. The manuscript was first drafted by us and polished with the tools sentence-by-sentence where we deemed necessary. We then reviewed and finalized the text. We take full responsibility for the contents of this manuscript.

Figures

Figure 1. Streamlining the saturation mutagenesis and FACS assay in SMuRF.
(A) A universal workflow of SMuRF. SMuRF accompanies saturation mutagenesis with functional assays. Here, saturation mutagenesis is achieved by delivering variant lentiviral particles to the engineered HAP1 platform where the endogenous gene of interest (GOI) was knocked out and stable DAG1 overexpression was established through lentiviral integration. A fluorescence-activated cell sorting (FACS) assay was employed to separate the high-function population and the low-function population. (B) Lenti-GOI constructs used for the saturation mutagenesis. The GOI CDS expression is driven by a weak promoter, UbC. (C) PALS-C is simple and accessible to most molecular biology laboratories. To accommodate the requirements of downstream short-read NGS, the GOI variants were separated into multiple blocks (6 blocks for FKRP and 10 blocks for LARGE1). PALS-C generates block-specific lentiviral plasmid pools from 1 oligo pool per GOI. The steps are massively multiplexed: Step 1 requires only a single-tube reaction; the following steps can be done in a single-tube reaction for each block. Step 8 (not shown) uses electrotransformation to deliver the assembled plasmid pools to bacteria for amplification. (D) A representative example shows the gating strategy; 20k flow cytometry events of FKRP block1 were recorded and reanalyzed with FlowJo. (E) A 3-round PCR strategy to build the NGS library. Samples from the high and low glycosylation groups were barcoded differently in PCR2. PCR2 products of all samples were multiplexed for a single PCR3 reaction. (F) A universal pipeline to generate SMuRF scores from raw NGS data. Steps colored yellow use customized scripts. Cleaning is a critical step in which reads carrying co-occurring variants are filtered out.
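The cleaning step in panel F can be illustrated with a minimal sketch. It assumes ungapped reads that already span one block and have the same length as the block's reference sequence; the function names, the `max_variants` parameter, and the toy sequences are hypothetical and are not taken from the authors' scripts.

```python
def count_mismatches(read: str, reference: str) -> int:
    """Count positions where an ungapped read differs from the reference."""
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same length")
    return sum(1 for r, w in zip(read, reference) if r != w)


def clean_reads(reads, reference, max_variants=1):
    """Keep reads carrying at most `max_variants` single-nucleotide changes.

    Reads with more than one change are treated as carrying co-occurring
    variants and are dropped (an assumed criterion, for illustration only).
    """
    return [r for r in reads if count_mismatches(r, reference) <= max_variants]


# Toy data (hypothetical): a 6-nt "block" and four reads.
reference = "ATGCGT"
reads = ["ATGCGT", "ATGAGT", "TTGCGA", "ATGCGA"]
kept = clean_reads(reads, reference)
```

Here the third read differs from the reference at two positions, so it is the only one removed. A real pipeline would also need to handle indels, base qualities, and read pairing, which this sketch ignores.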
Figure 2. SMuRF recapitulated and expanded the knowledge gained from variant databases.
(A and B) High-confidence SMuRF scores align with variant types (A, FKRP; B, LARGE1). The mean of synonymous variants of each block was used to normalize the scores across blocks. The box boundaries represent the 25th/75th percentiles, with a horizontal line indicating the median and a vertical line marking an additional 1.5 times interquartile range (IQR) above and below the box boundaries. p-values were calculated using the two-sided Wilcoxon test. Counts of variants were labeled below the boxes. (C and D) SMuRF revealed functional constraints based on variants reported in gnomAD v4.0.0 genome sequencing data (C, FKRP; D, LARGE1): Low allele frequency variants had diverse functional scores, while high allele frequency variants converged towards wild-type (WT) due to selection pressures (Gray box: AF < 1.5e-05). Dashed lines represent the WT functional score. Dots were jittered with geom_jitter (width = 0.05, height = 0.05).
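The per-block normalization mentioned for panels A and B can be sketched as follows. The caption states only that the synonymous mean of each block was used; this sketch assumes the normalization is a subtraction of that mean (which would suit log-scale scores), and the record layout and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def normalize_by_synonymous(records):
    """Subtract each block's mean synonymous score from every score in it.

    `records` is a list of dicts with keys "block", "variant_type", "score".
    Subtraction is an assumption; the source does not state the operation.
    """
    syn_scores = defaultdict(list)
    for rec in records:
        if rec["variant_type"] == "synonymous":
            syn_scores[rec["block"]].append(rec["score"])
    block_mean = {block: mean(vals) for block, vals in syn_scores.items()}
    # A block without synonymous variants raises KeyError here by design.
    return [
        {**rec, "norm_score": rec["score"] - block_mean[rec["block"]]}
        for rec in records
    ]


# Hypothetical raw scores for two blocks.
records = [
    {"block": 1, "variant_type": "synonymous", "score": 0.5},
    {"block": 1, "variant_type": "synonymous", "score": 1.5},
    {"block": 1, "variant_type": "missense", "score": -1.0},
    {"block": 2, "variant_type": "synonymous", "score": -0.25},
    {"block": 2, "variant_type": "synonymous", "score": 0.25},
    {"block": 2, "variant_type": "nonsense", "score": -3.0},
]
normalized = normalize_by_synonymous(records)
```

After this step the synonymous variants of every block average to zero, so scores from different blocks share a common reference point.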
Figure 3. SMuRF improved the scope of clinical interpretation of genetic variants.
(A and B) SMuRF scores correlate well with clinical classification in ClinVar (A, FKRP; B, LARGE1). (B/LB: Benign, Benign/Likely benign or Likely benign in ClinVar; VUS: Uncertain significance in ClinVar; P/LP: Pathogenic, Pathogenic/Likely pathogenic or Likely pathogenic in ClinVar.) Counts of variants were labeled below the violins. (C) Real patient data from eight well-curated cohorts demonstrated that SMuRF scores have the potential to predict disease severity. The additive SMuRF scores of the variant pairs associated with mild cases were significantly higher than those of intermediate and severe cases. Counts of cases were labeled below the violins. p-values were calculated using the Wilcoxon test. FS1, the SMuRF functional score of the variant on Allele1; FS2, the SMuRF functional score of the variant on Allele2. (D) The SMuRF scores are correlated with the disease onset age. Dashed trendlines represent linear regression. Spearman’s rank correlation rho: 0.72 (all data), 0.70 (male), 0.73 (female).
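The additive score in panel C (FS1 + FS2) and the Spearman correlation in panel D can be reproduced in outline with the sketch below. It uses only the standard library; the helper names and the example numbers are hypothetical and do not come from the patient cohorts.

```python
def additive_score(fs1: float, fs2: float) -> float:
    """Additive score of a biallelic variant pair: FS1 + FS2."""
    return fs1 + fs2


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical (FS1, FS2) pairs and onset ages in years.
pairs = [(-2.0, -1.5), (-1.0, -1.0), (-0.5, -0.25), (0.0, -0.25)]
onset_age = [1, 6, 15, 30]
scores = [additive_score(a, b) for a, b in pairs]
rho = spearman_rho(scores, onset_age)
```

In this toy example the additive scores and onset ages are perfectly monotonic, so rho equals 1; the caption reports 0.72 for the real data.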
Figure 4. SMuRF scores can be employed to validate and improve computational predictors.
(A and B) Receiver operating characteristic (ROC) curves of SMuRF and computational predictors (A, FKRP; B, LARGE1). AUC: Area Under Curve. Higher AUC indicates better performance in classifying pathogenic variants. (C) The correlation coefficient was calculated between SMuRF scores and scores generated by computational predictors. The figure depicts the absolute correlation coefficient. (D and E) Among all the predictors examined, AlphaMissense has the strongest correlation with SMuRF (rho = −0.70, FKRP; −0.54, LARGE1). Density was calculated with contour_var = “count” in R. White dashed lines represent linear regression. (F) IIH6C4 blots indicate matriglycan synthesis activity of selected LARGE1 enzyme variants. The laminin overlay provides a different probe for matriglycan, with an arrow indicating the expected band size. The protein loading was controlled by Coomassie gel. Experiments were conducted with the myogenic cell line, MB135. Control: WT MB135. KO Rescue: endogenous LARGE1 was knocked out and the cells were rescued with individual lentiviral transduction. p.Ala145= (SMuRF = −0.43) has the highest AF (0.41) in gnomAD v4 and was used as a high-function reference. p.Ser331Phe (SMuRF = −2.58) is Pathogenic in ClinVar and was used as a low-function reference.
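The AUC values in panels A and B summarize how well a score separates pathogenic from benign variants. A minimal sketch of the calculation is given below, using the rank-based definition of AUC (the probability that a randomly chosen pathogenic variant is ranked above a randomly chosen benign one, with ties counted as one half). It assumes that lower functional scores indicate pathogenicity, so scores are negated before ranking; the example scores are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Higher scores are taken to predict the positive class; ties count 0.5.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical functional scores (lower = less functional).
pathogenic = [-2.6, -1.9, -0.4]
benign = [-0.5, 0.1, 0.2]

# Negate so that a larger value means "more likely pathogenic".
auc = roc_auc([-s for s in pathogenic], [-s for s in benign])
```

The pairwise loop is quadratic in the number of variants, which is acceptable for a few hundred classified variants; a rank-sum formulation would be preferable for larger sets.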
Figure 5. SMuRF highlights the critical structural regions.
(A) SMuRF scores showed higher functional disruption by missense variants in the catalytic domain of FKRP compared to the stem domain. The zinc finger loop (Znf) within the catalytic domain exhibited greater disruption by missense variants. Box plots depict the 25th/75th percentiles (box boundaries), median (horizontal line), and an additional 1.5 times IQR (vertical line) above and below the box boundaries. p-values were calculated using the two-sided Wilcoxon test. Counts of variants were labeled below the violins. Dashed lines represent the WT functional score. (B) The SMuRF scores of synonymous FKRP variants in different domains. (C) Missense variants in the catalytic domains of LARGE1 showed higher disruption compared to the N-terminal domain. Missense variants in the XylT domain were more disruptive than those in the GlcAT domain. (D) The SMuRF scores of synonymous LARGE1 variants in different domains. (E and F) Mean SMuRF scores were utilized to map SNV-generated single amino acid substitutions onto the 1D structures of the enzymes (E, FKRP; F, LARGE1). The mean SMuRF score per amino acid residue was calculated and visualized using a color scale, where red indicates positions sensitive to substitutions and green indicates tolerant positions. (G and H) Mean SMuRF scores were utilized to map SNV-generated single amino acid substitutions onto the 3D structures of the enzymes (G, FKRP; H, LARGE1). The crystal structure of human FKRP (PDB:6KAM, codon: 45–495) and the electron microscopy structure of LARGE1 (PDB:7UI7, codon: 34–756) were used. The same color scale is used as in E and F. (I and J) Heatmap representation of the mean SMuRF scores for each codon (I, FKRP; J, LARGE1). Amino acids were categorized into five groups: nonpolar, aliphatic (G, A, V, L, M, I); polar, uncharged (S, T, C, P, N, Q); positively charged (K, R, H); negatively charged (D, E); and nonpolar, aromatic (F, Y, W). Each cell in the heatmap corresponds to a codon position (x-axis) and an amino acid group (y-axis). The black dots indicate the wild-type amino acid group for each residue. Grey squares denote the scenario where the amino acid change is not possible with a single SNV within the codon, and a red cross marks positions where variants were filtered out due to low confidence.
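The heatmap cells in panels I and J can be thought of as means over (codon position, amino acid group) pairs. The sketch below encodes the five groups listed in the caption and averages scores per cell; the input tuple layout, function name, and example scores are hypothetical.

```python
from collections import defaultdict

# The five amino acid groups as listed in the caption.
AA_GROUPS = {
    "nonpolar, aliphatic": "GAVLMI",
    "polar, uncharged": "STCPNQ",
    "positively charged": "KRH",
    "negatively charged": "DE",
    "nonpolar, aromatic": "FYW",
}
AA_TO_GROUP = {aa: group for group, aas in AA_GROUPS.items() for aa in aas}


def group_means(variants):
    """Mean score per (codon position, substituted amino acid group).

    `variants` is an iterable of (codon_position, alt_amino_acid, score).
    Cells with no observed variant are simply absent from the result.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for position, alt_aa, score in variants:
        cell = (position, AA_TO_GROUP[alt_aa])
        totals[cell][0] += score
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


# Hypothetical missense scores at one codon.
cells = group_means([(10, "A", -1.0), (10, "V", -3.0), (10, "K", 0.5)])
```

Cells that cannot be reached by a single SNV, or whose variants were filtered out, stay missing in the returned dictionary, which corresponds to the grey squares and red crosses described in the caption.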
Figure 6. Validations confirmed SMuRF findings in the myogenic context.
(A) Validation of individual FKRP variants using an IIH6C4 IF assay. The myoblasts underwent transduction and drug selection, followed by differentiation into myotubes, which were subsequently used for IF. “.r” denotes lentiviral transduction of an individual variant. Blue: DAPI. Green: IIH6C4, the glycosylation level of α-DG. Red: SMuRF scores; White: ClinVar clinical significance. The brightness and contrast of the photos were adjusted in Adobe Photoshop with the same settings. (B) Immunofluorescence intensity was quantified using integrated density (IntDen) of IIH6C4 relative to DAPI in differentiated myotubes by ImageJ. Analyses were conducted on 8 representative images. *p < 0.05, ****p < 0.0001 (compared with FKRP KO group). ##p < 0.01, ###p < 0.001, ####p < 0.0001 (compared with WT.r group). Multiple comparisons between groups were performed using analysis of variance (ANOVA) followed by Bonferroni post hoc test through GraphPad Prism 10.2.2. Experiments were conducted independently three times. (C) An orthogonal assay based on α-DG-dependent viral entry. Vesicular stomatitis virus (VSV) with Lassa fever virus glycoprotein complex (LASV-GPC) can infect cells in a glycosylated-α-DG-dependent manner. Variant enrichment before/after VSV infection can be used to quantify their performances regarding α-DG glycosylation. (D and E) The ppVSV assay can be employed to validate the findings from the flow cytometry assay (D, FKRP; E, LARGE1). Ten Lenti-FKRP variants were mixed with Lenti-WT-FKRP to rescue FKRP-KO MB135. Eleven Lenti-LARGE1 variants were mixed with Lenti-WT-LARGE1 to rescue LARGE1-KO MB135. The functional score was quantified by the ratio of a variant’s enrichment in the non-infected group to its enrichment in the ppVSV-infected group. A higher functional score indicates better performance in α-DG glycosylation. FKRP c.135C>T (p.Ala45=) and LARGE1 c.435C>T (p.Ala145=) have the highest AFs in gnomAD v4. FKRP c.663C>A (p.Ser221Arg) and LARGE1 c.992C>T (p.Ser331Phe) are Pathogenic in ClinVar. Biological replicate N = 3: Lentiviral transduction and ppVSV infection were both performed independently. Figures display mean values with SEM. Additional discussions for the ppVSV results are included in the Supplemental Note. Multiple comparisons were performed using ANOVA and Dunnett’s test.
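The ppVSV functional score in panels D and E is described as a ratio of enrichments. A minimal sketch of that arithmetic follows. It assumes "enrichment" means the variant's fraction of all reads in a sample, which the caption does not specify; the function names and read counts are hypothetical.

```python
def enrichment(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads in a sample that carry the variant (assumed definition)."""
    return variant_reads / total_reads


def ppvsv_score(non_infected, infected):
    """Ratio of enrichment in the non-infected group to the infected group.

    Each argument is a (variant_reads, total_reads) tuple. A variant with
    zero reads in the infected group raises ZeroDivisionError.
    """
    return enrichment(*non_infected) / enrichment(*infected)


# Hypothetical read counts: the variant makes up 10% of reads without
# infection and 5% of reads after ppVSV infection.
score = ppvsv_score(non_infected=(100, 1000), infected=(50, 1000))
```

Under this definition a variant that is depleted after infection has a score above 1, and a variant whose share of reads is unchanged has a score of 1.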

