Nat Commun. 2018 May 30;9(1):2138.
doi: 10.1038/s41467-018-04552-7.

Distinct epigenomic patterns are associated with haploinsufficiency and predict risk genes of developmental disorders

Xinwei Han et al. Nat Commun. 2018.

Abstract

Haploinsufficiency is a major mechanism of genetic risk in developmental disorders. Accurate prediction of haploinsufficient genes is essential for prioritizing and interpreting deleterious variants in genetic studies. Current methods based on mutation intolerance in population data suffer from inadequate power for genes with short transcripts. Here we show haploinsufficiency is strongly associated with epigenomic patterns, and develop a computational method (Episcore) to predict haploinsufficiency leveraging epigenomic data from a broad range of tissue and cell types by machine learning methods. Based on data from recent exome sequencing studies on developmental disorders, Episcore achieves better performance in prioritizing likely-gene-disrupting (LGD) de novo variants than current methods. We further show that Episcore is less-biased by gene size, and complementary to mutation intolerance metrics for prioritizing LGD variants. Our approach enables new applications of epigenomic data and facilitates discovery and interpretation of novel risk variants implicated in developmental disorders.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Epigenomic profiles are associated with gene haploinsufficiency. a Heatmap showing Spearman correlation between epigenomic features. Three groups of epigenomic features are included: active promoter, repressive promoter, and enhancer features. Epigenomic features inside each group strongly correlate with each other. Different feature types, including various histone modifications, a histone variant, and DNase I hypersensitivity sites, are color-coded. Above the heatmap, a bar denoting the Spearman correlation between epigenomic features and pLI shows that many epigenomic features relate to HIS to varying degrees. Data from stem cells or fetal tissues are also marked by colored lines. b, c Known HIS and HS genes have different distributions of peak length of promoter features (b H3K4me3; c H3K27me3). For each gene, peak length was averaged across tissues. d HIS and HS genes have different distributions of the number of interacting enhancers inferred by Epitensor. For each gene, the number of interacting enhancers was averaged across tissues.
Fig. 2
A Random Forest model to predict haploinsufficiency. a A flowchart of the method. b ROC curve from tenfold cross-validation. The red curve is the average of 100 randomized cross-validation runs, with error bars showing the standard deviation. The mean and median AUC of the 100 runs are 0.88 and 0.89, respectively.
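The AUC reported for panel b can be read as the probability that a randomly chosen positive gene receives a higher score than a randomly chosen negative gene. The following is a minimal sketch of that calculation and of summarizing it over repeated runs. It is not the authors' code: the function names, labels, and scores are hypothetical and chosen only for illustration.

```python
from statistics import mean, median


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_runs(aucs):
    """Mean and median AUC across repeated cross-validation runs."""
    return mean(aucs), median(aucs)


# Hypothetical toy data: 1 = known HIS gene, 0 = known HS gene.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

In a repeated cross-validation setting, `roc_auc` would be applied to the held-out predictions of each run and the resulting list passed to `summarize_runs`.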
Fig. 3
Performance of Episcore in variant prioritization. a, b Comparison of Episcore, pLI, Shet, and heart expression level (HE) in variant prioritization using CHD exome sequencing data. In a, burden refers to the ratio between the number of de novo LGD variants observed in top genes ranked by each metric and the number of de novo LGD variants expected from background mutation. Episcore has higher enrichment in the top 1000–2500 genes and similar enrichment afterwards. The gray dashed line indicates the burden of de novo LGD variants in all genes. b Precision-recall-like curves. The number of true positives is the difference between the observed and expected numbers of de novo LGD variants. Precision is calculated by dividing the number of true positives by the number of observed de novo LGD variants. The blue curve for Episcore is shifted up and to the right relative to those for pLI and Shet, showing that Episcore has better recall at a given precision, and vice versa. c, d Episcore has less bias than pLI towards genes with longer CDS length (c) or higher background mutation rate (d). The gray histogram in the background represents the CDS length or mutation rate of all genes in the genome. The blue curve for pLI shifts right, while the curves for Episcore and HE are similar to the distributions of all genes and known HIS genes. e, f A combination of Episcore and pLI, the meta-score, has better performance in variant prioritization when benchmarked using DDD exome sequencing data. The meta-score is the output of a logistic regression model that uses Episcore and pLI as input. Enrichment, true positives, and precision were calculated as in a, b.
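The burden, true-positive, and precision definitions in this caption reduce to simple arithmetic on two counts. The sketch below restates them as code. The function names and the example counts are hypothetical and not taken from the paper, and the caption does not say how a negative difference (observed below expected) was handled, so none is assumed here.

```python
def burden(observed, expected):
    """Ratio of observed to expected de novo LGD variants in a gene set."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def true_positives(observed, expected):
    """Excess of observed over expected de novo LGD variants."""
    return observed - expected


def precision(observed, expected):
    """True positives divided by the number of observed variants."""
    if observed <= 0:
        raise ValueError("observed count must be positive")
    return true_positives(observed, expected) / observed


# Hypothetical counts for the top-ranked genes under some metric.
obs, exp = 60, 20.0
example = {
    "burden": burden(obs, exp),              # 60 / 20
    "true_positives": true_positives(obs, exp),  # 60 - 20
    "precision": precision(obs, exp),        # 40 / 60
}
```

Evaluating these quantities at successive rank cutoffs (for example the top 500, 1000, 1500 genes) gives the points of an enrichment curve and a precision-recall-like curve of the kind the caption describes.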
Fig. 4
Contribution of epigenomic features to Episcore prediction. a Spearman correlation between each epigenomic feature and Episcore. Features used in the Random Forest model, including H2A.Z, H3K27me3, H3K4me3, H3K9ac, and the number of interacting enhancers, all correlate positively with Episcore. Spearman correlation coefficients between gene expression level, measured in RPKM (reads per kilobase per million reads), and Episcore are also plotted for comparison. b The importance of each tissue in generating Episcore is measured by the average Z-score, which is converted from the Spearman correlation coefficients between epigenomic features and Episcore. Each dot represents one cell line or tissue type, indicated by color. Stem cells and neural and fetal tissues are the most important tissue and cell types in Episcore prediction. c The epigenomic profile of an example HIS gene, RBFOX2, and a housekeeping gene, CWC22. Each small box represents a 100 bp region around transcription start sites (TSSs), and the shade of the color reflects the averaged fold change of reads between the ChIP-seq library and control samples. RBFOX2 has a broad expansion of epigenomic marks while CWC22 does not, and RBFOX2 shows more tissue-specific regulation, whereas CWC22 has narrow peaks in active marks across all tissues.
