Nat Commun. 2024 Jul 17;15(1):6027.
doi: 10.1038/s41467-024-50380-3.

MethNet: a robust approach to identify regulatory hubs and their distal targets from cancer data

Theodore Sakellaropoulos et al. Nat Commun. 2024.

Abstract

Aberrations in the capacity of DNA/chromatin modifiers and transcription factors to bind non-coding regions can lead to changes in gene regulation and impact disease phenotypes. However, identifying distal regulatory elements and connecting them with their target genes remains challenging. Here, we present MethNet, a pipeline that integrates large-scale DNA methylation and gene expression data across multiple cancers to uncover cis-regulatory elements (CREs) in a 1 Mb region around every promoter in the genome. MethNet identifies clusters of highly ranked CREs, referred to as 'hubs', which contribute to the regulation of multiple genes and significantly affect patient survival. Promoter-capture Hi-C confirmed that highly ranked associations involve physical interactions between CREs and their gene targets, and CRISPR interference-based single-cell RNA Perturb-seq validated the functional impact of CREs. Thus, MethNet-identified CREs represent a valuable resource for unraveling complex mechanisms underlying gene expression, and for prioritizing the verification of predicted non-coding disease hotspots.

Conflict of interest statement

Aristotelis Tsirigos is a scientific advisor to Intelligencia AI. The rest of the authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Outline of the MethNet pipeline.
Paired RNA-seq and DNA-methylation data were accessed from TCGA. The expression of every protein-coding gene in every cancer dataset was modeled as a function of the methylation status in a 1 Mb radius surrounding the gene to generate a set of regulatory networks across all cancers. These were aggregated to produce a network of robust associations that were used to identify regulatory elements and hubs.
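The windowing step described in this caption can be illustrated with a minimal sketch. It is not the MethNet model itself: the caption does not specify the regression used, so the sketch only selects probes within a 1 Mb radius of a gene and scores each with a Spearman correlation between methylation and expression. All names, positions and values below (`probes_in_window`, `cg_near`, the toy data) are hypothetical.

```python
from statistics import mean

WINDOW_BP = 1_000_000  # 1 Mb radius around the gene, as stated in the caption


def probes_in_window(tss, probe_positions, radius=WINDOW_BP):
    """Return ids of probes whose position lies within `radius` bp of `tss`."""
    return [pid for pid, pos in probe_positions.items() if abs(pos - tss) <= radius]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical toy data: one gene, three probes, five samples.
tss = 5_000_000
probe_positions = {"cg_near": 5_200_000, "cg_up": 4_100_000, "cg_far": 7_500_000}
methylation = {
    "cg_near": [0.1, 0.3, 0.5, 0.7, 0.9],
    "cg_up": [0.2, 0.4, 0.6, 0.8, 0.5],
    "cg_far": [0.5, 0.5, 0.4, 0.6, 0.5],
}
expression = [9.0, 7.5, 6.0, 4.0, 2.5]

candidates = probes_in_window(tss, probe_positions)
scores = {pid: spearman(methylation[pid], expression) for pid in candidates}
```

A negative score here would be read as a candidate repressive association and a positive one as a candidate activating association; the actual pipeline fits a model per gene and cancer and then aggregates across cancers.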
Fig. 2
Fig. 2. MethNet identifies putative activating and repressing distal associations.
a Distribution of HumanMethylation450 probes neighboring a protein-coding gene as a function of the window size. At 1 Mbp the average gene has 400 potential regulators. b Histogram of the robustness of MethNet associations, as measured by the number of TCGA cancers each appears in. c Performance of MethNet models, measured by the proportion of explained variance (R²), as a function of dataset size. Trend line is fit with linear regression. Shaded area corresponds to 95% confidence interval of the mean performance given the number of samples in a cancer study (n = 24). d MethNet association effect as a function of distance. Associations are grouped based on their ranked distance to a gene, where −1 includes associations from the first CpG island or non-island upstream of the promoter, and positive distance refers to associations downstream of the TES. For every group of potential associations, we calculated the average distance, mean coefficient, and probability of a MethNet association. e Examples of regulatory effects recovered by MethNet in BRCA. Left: A repressive association between IFNγ and a CTCF binding site 250 kb upstream of the promoter. Right: An activating association between GSTT1 and a non-coding RNA 10 kb downstream of the promoter. The fitted linear regression line models the mean expression as a function of the methylation status of the CRE. Shaded area corresponds to 95% confidence interval (n = 868). f Spearman correlation coefficient for the associations shown in panel e across all TCGA cancers. We observe that the signal is robust across all cancers (n = 24).
Fig. 3
Fig. 3. The regulatory potential of CREs is correlated with chromatin context and contact frequency.
a Schematic depiction of regulatory potential. We quantified the importance of an association as the excess of its relative effect size with respect to a null model where all elements contribute proportionally to their distance from the promoter. b Enrichment or depletion of regulatory potential by ChromHMM state of the CREs (n = 245,511). Bar length represents mean effect compared to the Low Signal state and error bars correspond to ± the standard error (for details see Methods). c Enrichment or depletion of chromatin remodelers, the transcription machinery, and transcription factor binding sites. d Enrichment or depletion of H3K27ac chromatin loops from CD4-naïve T cells, GM12878 and K562 cells for CREs that do not overlap with protein-coding promoters (n = 166,552). Bar height represents mean effect on MethNet potential for each group compared to “0” loops, which is the intercept. Error bars correspond to 95% confidence interval.
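The idea in panel a, an association's relative effect size in excess of a distance-based null, can be sketched as follows. The caption does not give the exact formula (see Methods in the paper), so everything here is an assumption made for illustration: the function name `regulatory_potential`, the use of absolute coefficients, and in particular the null weighting, which is passed in as a caller-supplied function. The inverse-distance weight in the example is a hypothetical choice, not the paper's null model.

```python
def regulatory_potential(coefs, distances, null_weight):
    """Excess of each element's share of effect over its share under a null.

    coefs       : model coefficients of the elements associated with one gene
    distances   : distance (bp) of each element from the promoter
    null_weight : function mapping a distance to a non-negative null weight
                  (hypothetical; the real null model is defined in Methods)
    """
    abs_effects = [abs(c) for c in coefs]
    total_effect = sum(abs_effects)
    weights = [null_weight(d) for d in distances]
    total_weight = sum(weights)
    return [
        a / total_effect - w / total_weight
        for a, w in zip(abs_effects, weights)
    ]


# Hypothetical example: three elements regulating one gene.
coefs = [0.6, -0.3, 0.1]
distances = [10_000, 100_000, 500_000]
potential = regulatory_potential(coefs, distances, null_weight=lambda d: 1.0 / d)
```

Because both the observed and the null shares sum to one, the potentials of a gene's elements sum to zero under this sketch: an element scores positive only when it explains more of the gene's regulation than its distance alone would suggest.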
Fig. 4
Fig. 4. MethNet hubs control multiple genes and have an impact on patient survival.
a Distribution of regulatory potential as a function of methylation variance across cancers. b Comparison of the distribution of MethNet associations per element for hubs (n = 6,139) versus non-hubs (n = 239,416). Boxplots were drawn with the following parameters: box bounds correspond to the 1st and 3rd quartiles, center mark corresponds to the median, and whisker length is 1.5 times the height of the box (interquartile range) or up to the extrema of the distribution if they are closer to the box bound. c Mean effect of hub methylation (excluding cancer-specific effects) on overall survival across TCGA cancers. A two-sided Wilcoxon rank sum test with continuity correction is used to calculate the p-value (p-value = 5 × 10⁻¹¹, nHub = 574, nCRE = 174,139). Boxplots were drawn with default parameters as in panel b. d–f Enrichment of regulatory potential hubs versus non-hub CREs with positive potential (see Methods for details). d Enrichment of hub versus non-hub CREs across ChromHMM states (n = 121,517). Bar length corresponds to the mean effect of each state versus the Low Signal state and error bars correspond to 95% confidence interval. e Enrichment of hub versus non-hub CREs across chromatin remodelers and transcription factor binding sites. f Enrichment of hub versus non-hub non-promoter CREs (n = 73,204) as a function of H3K27ac chromatin connectivity from CD4-Naive, GM12878 and K562 cells. Bar height represents the log odds (logit) effect size on the probability of a CRE being a hub as a function of its connectivity group; no loops is the intercept of the model. Error bars correspond to 95% confidence interval.
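The boxplot convention described in panel b can be made concrete with a small sketch. It follows the wording of the caption (whiskers reach 1.5 times the box height, or stop at the extrema if those are closer). The quartile interpolation method is an assumption, since plotting libraries differ on it, and the function name `boxplot_stats` and the toy data are hypothetical.

```python
from statistics import quantiles


def boxplot_stats(data):
    """Box bounds, median and whisker ends as described in the caption."""
    # Assumption: "inclusive" quartile interpolation; other tools may differ.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box
    lower_whisker = max(min(data), q1 - 1.5 * iqr)
    upper_whisker = min(max(data), q3 + 1.5 * iqr)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
    }


# Hypothetical values with one extreme point (50) beyond the upper whisker.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
```

In this example the lower whisker stops at the minimum because it is closer to the box than 1.5 box heights, while the upper whisker is capped and the value 50 falls outside it.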
Fig. 5
Fig. 5. MethNet hubs uncover known and potentially novel regulatory elements in the Protocadherin gene cluster.
(chr5:140,000,000−141,000,000) MethNet associations are shown in the top track. Red associations are activating and black associations are repressing. All other tracks are from the UCSC Genome Browser. Chromatin marks and CTCF binding site data are provided by ENCODE. In-situ Hi-C data were generated by Rao et al. and processed with Juicebox to compute contact enrichment (colormap = log2(observed/expected normalized counts), range [−4, 4]). The HS5-1 enhancer (highlighted in blue) is a known regulator of the PCDHA cluster. The enhancer on the left (highlighted in orange) is a MethNet discovery that is identified as a regulator of the PCDHA, PCDHB and PCDHG clusters. The region of the previously unreported enhancer has increased contact frequency with all three Protocadherin families (highlighted in red in the Hi-C heatmap).
Fig. 6
Fig. 6. High scoring MethNet associations are mediated by long-range chromatin interactions.
a Example of MethNet associations regulating TP53 that overlap chromatin loops identified using promoter-capture Hi-C from the A549 and K562 cell lines. The Genome Browser session shows the chromatin context around the promoter of TP53 (chr17:6,886,465-8,364,371) for both cell lines. This includes RNA-seq (ENCODE/Caltech for K562, ENCODE/HAIB for A549 ETOH), methylation status (ENCODE/HAIB; orange and blue correspond to methylated and unmethylated regions, respectively), CTCF binding sites, and ChromHMM chromatin states for K562. A549 ChromHMM states were downloaded from Roadmap Epigenomics (15-state core model). All tracks were loaded with default settings, except RNA-seq, which was capped at the top for a better overview. Red and black bars correspond to predicted activating and repressive MethNet associations, respectively. Promoter-capture Hi-C loops (shown as arcs) that overlap with MethNet CRE predictions are shown in virtual 4C format. Darker arcs correspond to loops called in both cell lines. b Bar graph depicting the probability of a MethNet association overlapping with a chromatin loop in either cell line as a function of its score (n = 1,585,070). Bar heights correspond to the probability of an association of the corresponding group overlapping a loop, computed using a logistic regression model. Error bars correspond to the 95% confidence interval. c ROC curves showing the ability of MethNet potential to predict chromatin hubs. We only considered gene promoters because of experimental bias. The AUC increases for stricter criteria of hub calling.
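The AUC reported for the ROC curves in panel c can be computed directly from scores and labels. The sketch below uses the standard pairwise definition (the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counted as one half). The function name `roc_auc` and the toy scores and labels are hypothetical and are not taken from the paper's data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores : predictor values (here standing in for MethNet potential)
    labels : 1 for a positive (e.g. a called chromatin hub), 0 otherwise
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical example: five promoters, three of them called as hubs.
auc = roc_auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0])
```

The quadratic loop is fine for a toy example; for genome-scale inputs a rank-based computation gives the same value in O(n log n).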
Fig. 7
Fig. 7. Perturbation of MethNet hubs results in altered target gene expression.
a Outline of the Perturb-seq validation experiment. b MA plot showing the differential expression induced by CRISPRi targeting of regulatory regions. Points correspond to genes targeted by sgRNAs. Point color indicates whether the interaction was predicted by MethNet. Point shape indicates whether the interaction was considered significant at a False Discovery Rate (FDR) of 0.05. c Bootstrap distribution of the number of targeted regions that show significant changes in gene expression. The number of observed validated targets is marked by a red arrow. d Genome Browser session showing an example of a CRE hub validated by expression changes in two predicted gene targets. A zoomed-in version of the browser around the hub regulatory element is shown. The effect of all sgRNA guides targeting the CRE on expression (log of normalized counts) of target genes across all cells is shown in the violin plots and boxplots (nUntargeted = 17,790, nTargeted = 1,301). Boxplots were drawn with the following parameters: box bounds correspond to the 1st and 3rd quartiles, center mark corresponds to the median, and whisker length is 1.5 times the height of the box (interquartile range) or up to the extrema of the distribution if they are closer to the box bound. A Welch two-sided t-test is used to calculate the p-value (pANXA2 = 2.8 × 10⁻⁶, CI ANXA2 = [0.028, 0.069], pGCNT3 = 1.3 × 10⁻⁴, CI GCNT3 = [0.034, 0.106]). Panel a was created with BioRender.com and is released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).
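The Welch test named in panel d compares two groups without assuming equal variances. The sketch below computes only the t statistic and the Welch-Satterthwaite degrees of freedom using the standard formulas; turning these into a p-value needs the Student t distribution, which a statistics library would normally supply. The function name `welch_t` and the toy expression values are hypothetical.

```python
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    a = variance(x) / n1  # squared standard error of group 1 (sample variance / n)
    b = variance(y) / n2  # squared standard error of group 2
    t = (mean(x) - mean(y)) / (a + b) ** 0.5
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical log-normalized expression in targeted vs untargeted cells.
targeted = [1.0, 2.0, 3.0, 4.0, 5.0]
untargeted = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
t_stat, dof = welch_t(targeted, untargeted)
```

The degrees of freedom are generally non-integer and never exceed n1 + n2 - 2, which is what makes the test robust when group sizes and variances differ as much as they do between targeted and untargeted cells.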
