PLoS Genet. 2022 Jun 21;18(6):e1010278.
doi: 10.1371/journal.pgen.1010278. eCollection 2022 Jun.

Missense variants causing Wiedemann-Steiner syndrome preferentially occur in the KMT2A-CXXC domain and are accurately classified using AlphaFold2

Tinna Reynisdottir et al. PLoS Genet. 2022.

Abstract

Wiedemann-Steiner syndrome (WDSTS) is a neurodevelopmental disorder caused by de novo variants in KMT2A, which encodes a multi-domain histone methyltransferase. To gain insight into the currently unknown pathogenesis of WDSTS, we examined the spatial distribution of likely WDSTS-causing variants across the 15 different domains of KMT2A. Compared to variants in healthy controls, WDSTS variants exhibit a 61.9-fold overrepresentation within the CXXC domain, which mediates binding to unmethylated CpGs, suggesting a major role for this domain in mediating the phenotype. In contrast, we find no significant overrepresentation within the catalytic SET domain. Corroborating these results, we find that hippocampal neurons from Kmt2a-deficient mice demonstrate disrupted histone methylation (H3K4me1 and H3K4me3) preferentially at CpG-rich regions, but this has no systematic impact on gene expression. Motivated by these results, we combine accurate prediction of the CXXC domain structure by AlphaFold2 with prior biological knowledge to develop a classification scheme for missense variants in the CXXC domain. Our classifier achieved 92.6% positive and 92.9% negative predictive value on a hold-out test set. This classification performance enabled us to subsequently perform an in silico saturation mutagenesis and classify a total of 445 variants according to their functional effects. Our results yield a novel insight into the mechanistic basis of WDSTS and provide an example of how AlphaFold2 can contribute to the in silico characterization of variant effects with very high accuracy, suggesting a paradigm potentially applicable to many other Mendelian disorders.
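
For readers less familiar with the two metrics reported above, the following minimal sketch shows how positive and negative predictive value are computed from a confusion matrix. The counts are an assumption, not taken from the paper: they are one split of a 41-variant hold-out set (the size given in the Fig 3D caption) that happens to reproduce the reported percentages. The function name `predictive_values` is likewise hypothetical.

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (PPV, NPV) computed from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # fraction of predicted-damaging calls that are truly damaging
    npv = tn / (tn + fn)  # fraction of predicted-benign calls that are truly benign
    return ppv, npv


# ASSUMED counts (not reported in the abstract): 27 positive and 14 negative
# predictions, summing to a 41-variant test set.
tp, fp, tn, fn = 25, 2, 13, 1

ppv, npv = predictive_values(tp, fp, tn, fn)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 92.6%, NPV = 92.9%
```

Note that both metrics depend on the proportion of damaging variants in the test set, so they would not necessarily carry over to a set with a different composition.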

Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: HTB is a consultant for Mahzi therapeutics. No other authors have any potential conflict of interest.

Figures

Fig 1. The distribution of likely pathogenic Wiedemann-Steiner syndrome missense variants across the different domains of KMT2A.
(A) KMT2A missense variants in gnomAD (top) and WDSTS (bottom). See Methods for filtering criteria. (B) The percentage of missense variants from gnomAD (grey dots) and WDSTS (red dots) that fall in each of the different domains of KMT2A. (C) The percentage of missense variants in gnomAD (grey dots) and likely pathogenic variants (blue dots) that fall in the CXXC domain of different epigenetic regulators. (D) Multiple sequence alignment of the amino-acid sequence of the CXXC domain of KMT2A in eight eukaryotic species. Residues known to be important for DNA binding are marked with red asterisks at the top (see Methods for details). The eight zinc ion-binding cysteines are marked with red asterisks at the bottom.
Fig 2. The relationship between disrupted H3K4me1/3, regional observed-to-expected CpG ratio, and gene expression in Kmt2a-deficient mice.
(A) The percentage of disrupted H3K4me1 peaks (left) and H3K4me3 peaks (right), stratified based on the observed-to-expected CpG ratio of the underlying peak sequence. (B) Scatterplot of the log2 fold change of H3K4me1 peaks against the log2 fold change of H3K4me3 peaks at promoters (±1 kb from the TSS) that harbor peaks for both marks. (C) The percentage of differentially expressed genes, stratified based on the p-value of associated promoter peaks (±1 kb from the TSS) from the differential H3K4me1 analysis (left) and H3K4me3 (right) analysis. (D) Scatterplot of the log2 fold change of H3K4me1 peaks (left) and H3K4me3 (right) against the log2 fold change of gene expression of the downstream gene. Each point corresponds to a gene-promoter pair. In cases where multiple peaks were present at the same promoter, the average log2 fold change was computed.
Fig 3. An AlphaFold2-based variant effect classification scheme for the CXXC domain of KMT2A.
(A) Predicted LDDT values for the CXXC domain of KMT2A. (B) The AlphaFold2-predicted and experimentally determined structures of the CXXC domain of KMT2A. (C) The variant effect classification scheme. See Methods for details on the derivation of the scheme. (D) The positive and negative predictive value that the classifier shown in (C) attained on a hold-out test set consisting of 41 missense variants (see main text and Methods for details). (E) The position of two different missense variants in the 3D structure of the CXXC domain, in conjunction with a 3D representation of the engaged DNA backbone. For comparison, the same representation is shown for the normal protein as well (PDB ID: 4NW3). The surface of the domain is color-coded based on the electrostatic potential, with red indicating a negative charge and blue a positive charge. Both variants lead to a decreased electrostatic potential in a residue important for DNA binding.
Fig 4. An in silico saturation mutagenesis of the CXXC domain of KMT2A.
(A) Heatmap depicting the predicted effect of each nucleotide substitution within the CXXC domain. (B) The percentage of variants for each type of predicted effect. (C) The distribution of the phyloP score of the nucleotides coding for the CXXC domain, stratified according to the number of substitutions predicted to have a damaging effect (unfolding, compromised DNA binding, stop-gain).
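
The enumeration step behind an in silico saturation mutagenesis can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it lists every single-nucleotide substitution of a coding sequence and assigns only a sequence-level consequence (synonymous, missense, stop-gain) from the standard genetic code. The structure-based classes named in the caption (unfolding, compromised DNA binding) need the AlphaFold2 model and are not reproduced here. The input `toy_cds` is a made-up 9-nucleotide sequence, not the KMT2A CXXC coding sequence, and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def saturation_mutagenesis(cds: str):
    """Yield (pos, ref, alt, ref_aa, alt_aa, consequence) for every possible SNV.

    `cds` must be an in-frame coding sequence; `pos` is 1-based.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    for i, ref in enumerate(cds):
        offset = i % 3                      # position of the base within its codon
        codon = cds[i - offset:i - offset + 3]
        for alt in BASES:
            if alt == ref:
                continue
            mutant = codon[:offset] + alt + codon[offset + 1:]
            ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
            if alt_aa == ref_aa:
                consequence = "synonymous"
            elif alt_aa == "*":
                consequence = "stop-gain"
            else:
                consequence = "missense"
            yield (i + 1, ref, alt, ref_aa, alt_aa, consequence)


# HYPOTHETICAL toy input encoding Cys-Lys-Trp; not taken from KMT2A.
toy_cds = "TGTAAATGG"
variants = list(saturation_mutagenesis(toy_cds))
print(len(variants))  # 3 alternative bases at each of 9 positions
```

Each sequence has three possible substitutions per nucleotide, so the number of enumerated variants grows linearly with the length of the domain's coding sequence.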
