Genome Res. 2019 Apr;29(4):532-542.
doi: 10.1101/gr.239442.118. Epub 2019 Mar 11.

Coexpression patterns define epigenetic regulators associated with neurological dysfunction

Leandros Boukas et al. Genome Res. 2019 Apr.

Abstract

Coding variants in epigenetic regulators are emerging as causes of neurological dysfunction and cancer. However, a comprehensive effort to identify disease candidates within the human epigenetic machinery (EM) has not been performed; it is unclear whether features exist that distinguish between variation-intolerant and variation-tolerant EM genes, and between EM genes associated with neurological dysfunction versus cancer. Here, we rigorously define 295 genes with a direct role in epigenetic regulation (writers, erasers, remodelers, readers). Systematic exploration of these genes reveals that although individual enzymatic functions are always mutually exclusive, readers often also exhibit enzymatic activity (dual-function EM genes). We find that the majority of EM genes are very intolerant to loss-of-function variation, even when compared to the dosage-sensitive transcription factors, and we identify 102 novel EM disease candidates. We show that this variation intolerance is driven by the protein domains encoding the epigenetic function, suggesting that disease is caused by a perturbed chromatin state. We then describe a large subset of EM genes that are coexpressed within multiple tissues. This subset is almost exclusively populated by extremely variation-intolerant genes and shows enrichment for dual-function EM genes. It is also highly enriched for genes associated with neurological dysfunction, even when accounting for dosage sensitivity, but not for cancer-associated EM genes. Finally, we show that regulatory regions near epigenetic regulators are genetically important for common neurological traits. These findings prioritize novel disease candidate EM genes and suggest that this coexpression plays a functional role in normal neurological homeostasis.

Figures

Figure 1.
The modular composition of the epigenetic machinery. (A) Venn diagram illustrating the three broad categories of the epigenetic machinery (histone machinery, DNA methylation machinery, and remodelers), their relative sizes, and their mutual relationships. (B) Venn diagram illustrating the four broad “action” categories of the machinery (writers, erasers, remodelers, and readers), their relative sizes, and their mutual relationships. The modularity of this organization is evident, with some reader components exhibiting enzymatic functions and/or more than one reading function. In contrast, the individual enzymatic component types are pairwise mutually exclusive.
Figure 2.
A large subset of epigenetic regulators are very intolerant to variation. (A) The pLI distributions of EM genes (red curve), TF genes (green curve), and all other genes (blue curve). (B) The pLI distributions of EM genes (red curve), genes encoding accessory subunits of EM protein complexes (black curve), and TF genes (green curve). (C) The pLI distribution of disease-associated EM genes versus non-disease–associated EM genes. (D–G) The pLI distributions (D,F) and percentage of genes with pLI > 0.9 (E,G) of individual classes of EM genes. The shaded gray area (A–D,F) indicates highly constrained genes (pLI > 0.9). The vertical dashed gray line (E,G) corresponds to the percentage of all other genes with pLI > 0.9.
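The "percentage of genes with pLI > 0.9" summarized in panels E and G can be illustrated with a minimal sketch. The function name, the gene labels, and the scores below are hypothetical and chosen only for illustration; the 0.9 threshold is the one stated in the caption.

```python
def fraction_constrained(pli_scores, threshold=0.9):
    """Return the fraction of genes whose pLI exceeds `threshold`.

    `pli_scores` is any iterable of pLI values in [0, 1].
    """
    scores = list(pli_scores)
    if not scores:
        raise ValueError("need at least one pLI score")
    # Booleans sum as 0/1, so this counts genes above the threshold.
    return sum(score > threshold for score in scores) / len(scores)


# Hypothetical toy data, not values from the paper.
toy_pli = {"GENE_A": 0.99, "GENE_B": 0.95, "GENE_C": 0.40, "GENE_D": 0.02}
toy_fraction = fraction_constrained(toy_pli.values())
print(f"{toy_fraction:.0%} of toy genes have pLI > 0.9")
```

Applying the same function to each gene class and to all other genes would give the per-class bars and the reference line, respectively.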
Figure 3.
The protein domains known to mediate epigenetic functions drive the observed constraint of EM genes. (A) The number of constrained and unconstrained EM-specific protein domains of high pLI (>0.9) EM genes versus low pLI (<0.1) EM genes. (B) The within-gene differences in the total number of EM-specific constrained domains versus other constrained domains. Each dot corresponds to a gene. Red dots indicate genes with more EM-specific constrained domains; blue dots indicate genes with more other constrained domains; black dots indicate genes with an equal number of constrained EM-specific and other domains. (C) The percentage of high pLI EM genes with at least one constrained EM-specific domain versus the corresponding percentage with at least one constrained other domain.
Figure 4.
A large subset of the components of the epigenetic machinery exhibit unusually high levels of coexpression. (A) Schematic illustrating our definition and identification of module partners. WGCNA was used to construct tissue-specific coexpression networks and modules for 28 tissues profiled in GTEx. We determined if two EM genes were module partners (part of the same module in 10–14 tissues) or stable module partners (part of the same module in >14 tissues). (B,C) The number of module partners for each EM gene and the module partner matrix, where rows and columns are ordered as in B. We define three groups of EM genes—highly coexpressed, coexpressed, and not coexpressed—based on their number of module partners. (D) The pLI for each EM gene, ordered by its number of module partners as in B. (E) The size of the (highly) coexpressed group of EM genes compared to 300 draws of 270 random genes, in which the random genes are selected to have a similar expression level across tissues compared to EM genes (Supplemental Fig. S10).
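The module-partner definition in panel A can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the input is assumed to be a precomputed mapping from tissue to per-gene module labels (for example, exported from WGCNA), genes labeled `None` are assumed to be unassigned and never counted as sharing a module, and all function and gene names are hypothetical. Only the tissue thresholds (10–14 and >14) come from the caption.

```python
from collections import Counter
from itertools import combinations


def shared_tissue_counts(module_by_tissue):
    """Count, for every gene pair, the tissues in which both share a module.

    `module_by_tissue` maps tissue -> {gene: module label or None}.
    """
    counts = Counter()
    for assignments in module_by_tissue.values():
        genes_by_module = {}
        for gene, module in assignments.items():
            if module is None:  # assumption: unassigned genes share nothing
                continue
            genes_by_module.setdefault(module, []).append(gene)
        for genes in genes_by_module.values():
            # Sorting gives each pair one canonical (a, b) key.
            for pair in combinations(sorted(genes), 2):
                counts[pair] += 1
    return counts


def classify_pair(n_tissues):
    """Apply the caption's thresholds to a shared-tissue count."""
    if n_tissues > 14:
        return "stable module partners"
    if n_tissues >= 10:
        return "module partners"
    return "not partners"


# Hypothetical toy input: 28 tissues, three genes.
toy = {
    f"tissue_{i}": {
        "A": "m1",
        "B": "m1" if i < 16 else "m2",
        "C": "m1" if i < 12 else "m3",
    }
    for i in range(28)
}
pair_counts = shared_tissue_counts(toy)
for pair, n in sorted(pair_counts.items()):
    print(pair, n, classify_pair(n))
```

Counting, for each gene, the pairs classified as partners would then give the per-gene number of module partners plotted in panel B.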
Figure 5.
Dual-function EM genes are enriched within the highly coexpressed group. (A) The distribution of dual-function EM genes (collectively and separately for each enzymatic group) within the three coexpression categories. (B) Log odds ratios and 95% confidence intervals for enrichment of dual-function EM genes (collectively and separately for each enzymatic group) in the highly coexpressed category. The vertical gray line at 0 marks a log odds ratio of 0 (no enrichment); a confidence interval that excludes 0 indicates statistical significance. (C) Blue dots correspond to randomly chosen genes, sampled in sets of 270 genes from genes with a median expression (log(RPKM + 1)) greater than 0.5 in at least half the tissues, to match the expression of EM genes (as in Fig. 4D). Orange, green, and pink dots correspond to EM genes, TF genes, and protein kinases/phosphatases, respectively. Each dot corresponds to a single gene, and its position along the y-axis corresponds to the number of other genes with which it partners. The genes are ordered on the x-axis according to the number of their partners. This figure also serves as a sensitivity analysis with respect to the number of partners for this particular tissue cutoff.
Figure 6.
EM genes linked to disorders with neurological dysfunction demonstrate significant enrichment within the highly coexpressed category. (A) The percentage of EM genes with pLI > 0.9 in each of the coexpression categories. (B) The percentage of EM genes that are associated with different types of disease; individual disease categories are mutually exclusive. (MDEM) Mendelian disorders of the epigenetic machinery; (Neuro) includes autism, schizophrenia, developmental disorders, and MDEM whose phenotype includes dysfunction of the central nervous system (Methods). (C) Log odds ratios and 95% confidence intervals for enrichment of different subsets of EM genes in the highly coexpressed category. The dashed vertical line at 0 marks a log odds ratio of 0 (no enrichment); a confidence interval that excludes 0 indicates statistical significance. (D) The percentage of EM genes that are associated with neurological dysfunction and have pLI > 0.9 in each of the coexpression categories. (E) Odds ratio (black line) and 95% confidence interval (shaded area) for enrichment of EM genes associated with neurological dysfunction in the highly coexpressed group, as a function of the size of the highly coexpressed group. For all sizes, the comparison was performed against the not coexpressed group. (F) Estimates for enrichment of explained heritability, and unadjusted P-values, for eight traits and two sets of regulatory features: regions marked by H3K27ac in brain within 1 Mb of the transcription start site of all-EM (red dots) or highly coexpressed (orange dots) EM genes.
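To make the log odds ratios and 95% confidence intervals in Figures 5B and 6C concrete, here is a minimal sketch that computes them from a 2×2 contingency table using the standard Wald approximation. The choice of the Wald interval is an assumption for illustration; the captions do not state which interval the authors used, and the counts below are hypothetical.

```python
import math


def log_odds_ratio_ci(a, b, c, d, z=1.96):
    """Log odds ratio and Wald confidence interval for a 2x2 table.

    a: in subset,     highly coexpressed
    b: in subset,     not coexpressed
    c: not in subset, highly coexpressed
    d: not in subset, not coexpressed
    All four counts must be positive; z=1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("Wald interval needs all four counts > 0")
    log_or = math.log((a * d) / (b * c))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or - z * standard_error, log_or + z * standard_error


# Hypothetical counts, not taken from the paper.
log_or, lower, upper = log_odds_ratio_ci(30, 20, 10, 40)
print(f"log OR = {log_or:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("interval excludes 0:", lower > 0 or upper < 0)
```

An interval lying entirely to one side of 0 corresponds to a point that does not cross the vertical reference line in the plots.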
