Nat Commun. 2023 Dec 6;14(1):7702.
doi: 10.1038/s41467-023-43041-4.

Saturation genome editing of DDX3X clarifies pathogenicity of germline and somatic variation

Elizabeth J Radford et al. Nat Commun.

Abstract

Loss-of-function of DDX3X is a leading cause of neurodevelopmental disorders (NDD) in females. DDX3X is also a somatically mutated cancer driver gene proposed to have both tumour-promoting and tumour-suppressing effects. We perform saturation genome editing of DDX3X, testing in vitro the functional impact of 12,776 nucleotide variants. We identify 3432 functionally abnormal variants, in three distinct classes. We train a machine learning classifier to identify functionally abnormal variants of NDD relevance. This classifier has at least 97% sensitivity and 99% specificity to detect variants pathogenic for NDD, substantially outperforming in silico predictors, and resolving up to 93% of variants of uncertain significance. Moreover, functionally abnormal variants can account for almost all of the excess nonsynonymous DDX3X somatic mutations seen in DDX3X-driven cancers. Systematic maps of variant effects generated in experimentally tractable cell types have the potential to transform clinical interpretation of both germline and somatic disease-associated variation.
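
As a rough illustration of how sensitivity and specificity figures of the kind quoted above are derived, the sketch below computes both from confusion-matrix counts. The function name and the example counts are hypothetical, chosen only for illustration, and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp / fn: pathogenic variants called abnormal / called normal.
    tn / fp: benign variants called normal / called abnormal.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one pathogenic and one benign variant")
    sensitivity = tp / (tp + fn)  # fraction of pathogenic variants detected
    specificity = tn / (tn + fp)  # fraction of benign variants left unflagged
    return sensitivity, specificity


# Hypothetical counts for illustration only (not study data).
sens, spec = sensitivity_specificity(tp=97, fn=3, tn=99, fp=1)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Both measures depend on the truth set used, so a classifier evaluated on a small set of known pathogenic and benign variants will have wide uncertainty around each estimate.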

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Experimental design overview.
a Two independent sgRNAs and associated HDR variant libraries are designed at the 5′ and 3′ end of each exon. b The sgRNA and the HDR template library are transfected into LIG4-KO Cas9-expressing HAP1 cells. HDR uses the library as a template for repair of the sgRNA-directed double-stranded DNA cut, incorporating a DDX3X variant of interest. Damaging DDX3X variants reduce cell viability or proliferation. Variant abundance is assessed at five timepoints. Functional missense (purple) and synonymous (Syn, blue) variants remain abundant, while loss-of-function (LOF, red) and damaging missense (yellow) variants are depleted.
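
The depletion readout described in this caption can be sketched as a log2 fold change in a variant's read frequency between an early and a late timepoint. This is a minimal sketch under stated assumptions: it is a plain log fold change, not the study's cLFC statistic, and the function name, the pseudocount and the read counts are all hypothetical.

```python
import math


def log2_fold_change(count_early, count_late, total_early, total_late,
                     pseudocount=0.5):
    """Log2 ratio of a variant's read frequency (late / early).

    The pseudocount (an assumption of this sketch) avoids log(0)
    when a variant drops out completely at the late timepoint.
    """
    freq_early = (count_early + pseudocount) / (total_early + pseudocount)
    freq_late = (count_late + pseudocount) / (total_late + pseudocount)
    return math.log2(freq_late / freq_early)


# Hypothetical read counts: one variant that holds steady, one that is depleted.
stable = log2_fold_change(200, 210, 100_000, 100_000)
depleted = log2_fold_change(200, 20, 100_000, 100_000)
print(f"stable: {stable:.2f}, depleted: {depleted:.2f}")
```

A value near zero means the variant kept pace with the rest of the library, while a strongly negative value means cells carrying it were lost over time.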
Fig. 2. Functional classification of DDX3X variants.
a Day 7 and Day 15 cLFC of variant abundance. b Day 15 cLFC of variant abundance. c Average cLFC for all variants in each SGE functional class for each time point. Number of variants per class: unchanged n = 9344, enriched n = 1095, fast-depleting n = 1546, slow-depleting n = 791. Error bars represent the 95% CI. d Proportions of SGE functional classes within single nucleotide variant synonymous (Syn), missense (Miss), codon deletion (Cdel), canonical splice acceptor/donor (SpA/D) and nonsense (NonS) variants. e Day 15 cLFC of variant abundance for synonymous (n = 1244) and nonsense (n = 280) variants only. Source data are provided as a Source Data file.
Fig. 3. Properties of SGE-depleted and SGE-enriched variants.
a For each SGE functional class: Top panel: Observed/Expected number of DDX3X SNVs in UK Biobank (UKBB) and Genome Aggregation Database (GnomAD); number of variants per class: unchanged n = 6732, enriched n = 710, fast-depleting n = 1108, slow-depleting n = 710. χ² test, degrees of freedom (df) = 3. Second panel: Amino acid conservation, Kruskal–Wallis (KW) test p = 1.1 × 10⁻¹⁴¹. Third panel: CADD PHRED scores, KW test p ≈ 0. Fourth panel: ΔΔG for missense variants, KW test p = 1.8 × 10⁻⁵⁴. Lower panel: distance from the centroid of DDX3X to the amino-acid side chain centroid (ångströms), missense variants, KW test p = 2.0 × 10⁻²³⁴. Dunn’s post-test FDR, corrected for multiple testing by the Benjamini–Hochberg (BH) method, is shown in panels 2–5. Internal boxplots within each violin plot show median and interquartile range (IQR); whiskers denote 1.5 × IQR. b The proportion of SGE functional classes in DDX3X missense variants stratified by their position in the protein. Interaction interface: residues in contact with RNA, magnesium ion or ATP. Buried residues: all residues with total solvent accessible surface area <25%. χ² p-values relative to all missense are shown, df = 3. c The proportion of SGE functional classes in DDX3X codon-deletion variants stratified by their position in the protein. χ² p-values relative to all codon deletions are shown, df = 3. d AlphaFold2 DDX3X structure together with ATP, magnesium ion and RNA. Coloured according to the modal SGE functional class for missense variants at each residue. Spheres: residue main chain. Sticks: residue side chain. Source data are provided as a Source Data file.
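
The Benjamini–Hochberg correction named in this caption can be sketched in a few lines. This is a minimal pure-Python version of the standard step-up procedure, not the authors' analysis code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from a set of pairwise post-tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each adjusted value can be compared directly with the chosen false discovery rate threshold, for example 0.05.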
Fig. 4. Functional characterisation of DDX3X variant types.
a Top panel: DDX3X exon structure with locations of key domains and protein annotations. Lower panel: cLFC-trend plotted against chromosome coordinate for variants in four exons. b cLFC-trend of non-coding (nc), synonymous (Syn), in-frame codon-deletion (Cdel), missense (Miss), nonsense (NonS) and canonical splice acceptor/donor SNVs (SpA/D) in the same four exons, coloured by SGE functional class. c Proportion of SGE functional classes across DDX3X exons for nonsense, missense, codon-deletion and canonical splice acceptor/donor variants. d SpliceAI Delta score distributions for canonical splice acceptor/donor variants, split by SGE functional class. Kruskal–Wallis p = 0.002. Compared to SGE-unchanged variants: FD: Fast-depleting n = 130, Dunn’s BH-corrected FDR = 0.001; SD: Slow-depleting n = 35, FDR = 0.3; U: SGE-unchanged n = 51; E: SGE-enriched n = 8, FDR = 0.66. Internal boxplots within each violin plot show median and interquartile range (IQR); whiskers denote 1.5 × IQR. e SpliceAI Delta score distributions for intronic variants outside canonical splice sites, split by SGE functional class. FD: Fast-depleting n = 38; SD: Slow-depleting n = 66; U: SGE-unchanged n = 1677; E: SGE-enriched n = 104. Internal boxplots within each violin plot show median and interquartile range (IQR); whiskers denote 1.5 × IQR. f cLFC-trend for intronic and synonymous SNVs within 2 bp of the end of the exon, plotted according to position relative to the splice site. Triangles denote pyrimidine to purine variants (Py > Pu). Source data are provided as a Source Data file.
Fig. 5. SGE functional classification of DDX3X variants observed in clinical and population databases.
a SGE functional classification of DDX3X variants in individuals with NDDs, split by clinical interpretation and proband sex; and in individuals in UKBB and GnomAD. b Vineland composite scores for individuals with DDX3X-related NDD carrying a protein-truncating variant (PTV) or a missense or in-frame variant (M/IF) in three clinical phenotyping studies. c Vineland composite scores for individuals with DDX3X-related NDD carrying fast-depleting (FD) and slow-depleting (SD) variants. d Composite phenotypic score for individuals with DDX3X-related NDD carrying fast-depleting and slow-depleting variants. e, f Age at which first words (e) and first independent steps (f) were taken for individuals with DDX3X-related NDD carrying fast-depleting variants and slow-depleting variants, compared to children without an NDD. Number of individuals: first words: n = 24 fast-depleting variants, n = 16 slow-depleting variants; first steps: n = 31 fast-depleting variants, n = 17 slow-depleting variants. Source data are provided as a Source Data file.
Fig. 6. Performance of a machine learning classifier of NDD-relevance.
a Day 15 cLFC of variant abundance. b Day 7 and Day 15 cLFC of variant abundance coloured by variants’ NDD-relevance. c Modelling the impact of SGE data on DDX3X clinical variant interpretation for a hypothetical female patient with moderate intellectual disability for a variant of unknown inheritance status. NS nonsense, F/S frameshift, Splice A/D canonical splice acceptor/donor sites. d, e Comparison of in silico variant effect predictor scores and Random Forest classifier posterior probability for (d) all likely pathogenic/pathogenic, likely benign and GnomAD/UKBB variants and (e) missense likely pathogenic/pathogenic, likely benign and GnomAD/UKBB variants. Source data are provided as a Source Data file.
Fig. 7. SGE functional classification of DDX3X variants observed in cancers.
a The proportion of SGE functional classes in DDX3X missense variants observed in cancers, stratified by whether or not DDX3X had been identified as a putative driver gene in each cancer type. b Comparison of the proportion of missense variants classified as SGE-enriched with the estimated percentage of missense variants that are drivers, in different sets of cancer types. c Comparison of the proportion of missense variants classified as SGE-depleted with the estimated percentage of missense variants that are drivers, in different sets of cancer types. Error bars in b, c show 95% confidence intervals. Source data are provided as a Source Data file.
