Bioinformatics. 2021 Nov 18;37(22):4014-4022.
doi: 10.1093/bioinformatics/btab443.

Methylation-eQTL analysis in cancer research

Yusha Liu et al. Bioinformatics.

Abstract

Motivation: DNA methylation is a key epigenetic factor regulating gene expression. While promoter methylation has been well studied, recent publications have revealed that functionally important methylation also occurs in intergenic and distal regions, and varies across genes and tissue types. Given the growing importance of inter-platform integrative genomic analyses, there is an urgent need to develop methods to discover and characterize gene-level relationships between methylation and expression.

Results: We introduce a novel sequential penalized regression approach to identify methylation-expression quantitative trait loci (methyl-eQTLs), a term that we have coined to represent, for each gene and tissue type, a sparse set of CpG loci that best explains gene expression, together with weights indicating the direction and strength of association. Using TCGA and MD Anderson colorectal cohorts to build and validate our models, we demonstrate that our strategy explains expression variability better than the gene-level methylation summaries in common use. The methyl-eQTLs identified by our approach can be used to construct gene-level methylation summaries that are maximally correlated with gene expression for use in integrative models, and to produce a tissue-specific summary of which genes appear to be strongly regulated by methylation. Our results introduce an important resource to the biomedical community for integrative genomics analyses involving DNA methylation.
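The idea of a sparse, weighted set of CpG loci can be illustrated with a minimal sketch. It uses an ordinary lasso fitted by coordinate descent on simulated data, not the sequential procedure of the paper, whose steps the abstract does not give. The function names, the penalty value and the simulated effect sizes are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t (the lasso proximal step)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain lasso by cyclic coordinate descent.

    Minimizes (1/2n)||y - X b||^2 + lam * ||b||_1.
    Assumes the columns of X are standardized and y is centered.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()        # residual y - X @ beta
    col_sq = (X ** 2).sum(axis=0) / n     # equals 1 for standardized columns
    for _ in range(n_iter):
        for j in range(p):
            # correlation of CpG j with the partial residual (CpG j left out)
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

# Simulated data (hypothetical): 200 samples, 50 CpGs near one gene,
# of which three truly drive expression.
rng = np.random.default_rng(0)
n_samples, n_cpgs = 200, 50
X = rng.normal(size=(n_samples, n_cpgs))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_idx = [3, 17, 30]
true_w = np.array([-1.5, 1.0, -0.8])
y = X[:, true_idx] @ true_w + 0.5 * rng.normal(size=n_samples)
y = y - y.mean()

beta = lasso_cd(X, y, lam=0.2)
selected = np.flatnonzero(beta)           # the sparse set of CpGs
summary = X @ beta                        # weighted gene-level methylation summary
r = np.corrcoef(summary, y)[0, 1]
print("selected CpGs:", selected.tolist())
print("weights:", np.round(beta[selected], 2).tolist())
print("correlation of summary with expression:", round(float(r), 3))
```

The sign of each nonzero weight gives the direction of association and its magnitude the strength, which is the role the abstract assigns to the methyl-eQTL weights.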

Availability and implementation: We produce an R Shiny app (https://rstudio-prd-c1.pmacs.upenn.edu/methyl-eQTL/) that interactively presents methyl-eQTL results for colorectal, breast and pancreatic cancer. The source R code for this work is provided in the Supplementary Material.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Distribution of the number of CpG probes located within ±500 kb of the gene region. For the 9569 genes satisfying our selection criteria, the plot shows the distribution of the per-gene number of Illumina 450K methylation array CpG probes located within the gene body or the ±500 kb flanking region on either side
Fig. 2.
The probability of selection as a function of a CpG’s epigenomic characteristics in CRC. Each subfigure shows the probability that a CpG probe is selected (y-axis) as a function of its relative distance to the TSS and TES of the associated gene (x-axis), by CpG type (CpG island, CpG shore, CpG shelf, open sea) and gene size (small, medium, large), separately for CpGs negatively (top row) or positively (bottom row) associated with gene expression
Fig. 3.
The odds ratios of selection for each pair of CpG types. For every pair of CpG types, the odds ratios (OR) of selection are shown for CpGs (A) inside the gene body and (B) outside the gene body, with the P-value for testing whether OR = 1 (corrected for multiple testing) given in parentheses. In each table, the odds ratios for marginally positively correlated CpGs are shown in the upper half (CpG type in the column header as numerator; CpG type in the row header as denominator), and the odds ratios for marginally negatively correlated CpGs are shown in the lower half (CpG type in the row header as numerator; CpG type in the column header as denominator). For example, for a marginally positively correlated CpG in the gene body, a CpG located in a shelf is 40% more likely to be selected than one from a CpG shore (OR = 1.40, P-value < 1e-4); for marginally negatively correlated CpG sites in the gene body, a CpG located in a shelf is 18% less likely to be selected than one from a CpG shore (OR = 0.82, P-value = 1e-4)
Fig. 4.
Performance comparison of Seq-Lasso with other methods in the simulation study. For the 500 randomly selected genes, the subfigures compare the performance of Seq-Lasso with alternatives for identifying methyl-eQTLs in terms of model sparsity as measured by the number of selected CpGs on the training data in (A), and predictive accuracy on the test data as measured by the Spearman correlation between actual and predicted gene expressions in (B) and relative prediction error in (C). We take the median across 100 replicates to compute a gene-specific average for each performance measure
Fig. 5.
Performance comparison of Seq-Lasso with other methods in the CRC data. For the 5586 genes with a Spearman correlation of at least 0.40 between actual and predicted gene expressions in the TCGA cohort using at least one method, the subfigures compare the performance of Seq-Lasso with alternatives for identifying methyl-eQTLs in terms of model sparsity as measured by the number of selected CpGs in (A), and the ability to explain expression variability as measured by the Spearman correlation between actual and predicted gene expressions in the TCGA cohort in (B) and in the MDACC cohort in (C)
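The accuracy measures named in the captions of Figs 4 and 5 can be sketched as follows. The Spearman correlation is computed as the Pearson correlation of ranks, assuming no tied values. The captions do not define relative prediction error, so the definition used here (squared error divided by the squared deviation of the actual values from their mean) is an assumption, as are the function names and the simulated replicates.

```python
import numpy as np

def spearman(a, b):
    """Spearman correlation as Pearson correlation of ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

def relative_prediction_error(actual, predicted):
    """Assumed definition: sum of squared errors over total sum of squares."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((actual - predicted) ** 2)
    sst = np.sum((actual - actual.mean()) ** 2)
    return float(sse / sst)

# Hypothetical simulation for one gene: 100 replicates, 80 test samples each.
rng = np.random.default_rng(42)
rhos, errors = [], []
for _ in range(100):
    actual = rng.normal(size=80)
    predicted = actual + 0.5 * rng.normal(size=80)   # noisy prediction
    rhos.append(spearman(actual, predicted))
    errors.append(relative_prediction_error(actual, predicted))

# Gene-specific summary: the median across replicates, as in the Fig. 4 caption.
median_rho = float(np.median(rhos))
median_error = float(np.median(errors))
print("median Spearman correlation:", round(median_rho, 3))
print("median relative prediction error:", round(median_error, 3))
```

Under this definition a value of 0 means perfect prediction and a value of 1 means the model does no better than predicting the mean expression.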
Fig. 6.
Fold enrichment of annotated chromatin states in CpG sites identified by Seq-Lasso in CRC. The plot shows the estimated odds ratio of each annotated chromatin state for CpG sites identified by Seq-Lasso, relative to the unselected CpG sites. The open circles represent the point estimates of odds ratio, and the horizontal bars denote 95% confidence intervals. The chromatin state annotations are based on 17 CRC primary tumor samples with epigenomes generated at MDACC
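An odds ratio with a 95% confidence interval, as plotted in Fig. 6, can be computed from a 2×2 table of counts. The sketch below uses the standard Wald interval on the log odds ratio; whether the authors used this particular interval is not stated in the caption, and the counts and the function name are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: selected CpGs in the chromatin state
    b: selected CpGs not in the state
    c: unselected CpGs in the state
    d: unselected CpGs not in the state
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts for one chromatin state.
or_est, ci_low, ci_high = odds_ratio_ci(a=120, b=880, c=600, d=8400)
print(f"OR = {or_est:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that lies entirely above 1 indicates enrichment of the state among selected CpG sites; one entirely below 1 indicates depletion.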
Fig. 7.
Methyl-eQTLs of AREG in CRC, and heatmap of absolute correlations across CpGs. (A)–(C) show the methyl-eQTLs identified by different methods, with the marks at the top indicating all CpG sites in the region (CpG island-red; CpG shore-pink; CpG shelf-green; open sea-black), and the marks at the bottom indicating active chromatin states predicted using ChromHMM (active TSS-red; flanking TSS-orange red; active enhancer-orange; transcribed enhancer-green yellow). The heatmaps (D)–(F) show the pairwise absolute Pearson correlation coefficients between CpGs selected by Seq-Lasso (arranged in columns) and all CpGs within ±500 kb of AREG (arranged in rows). The red marks on the left of each heatmap denote the CpGs identified by Seq-Lasso, and the black marks denote the CpGs that are selected by Lasso but not Seq-Lasso
Fig. 8.
Methyl-eQTLs of EREG in CRC, and heatmap of absolute correlations across CpGs. (A)–(C) show the methyl-eQTLs identified by different methods, with the marks at the top indicating all CpG sites in the region (CpG island-red; CpG shore-pink; CpG shelf-green; open sea-black), and the marks at the bottom indicating active chromatin states predicted using ChromHMM (active TSS-red; flanking TSS-orange red; active enhancer-orange; transcribed enhancer-green yellow). The heatmaps (D)–(F) show the pairwise absolute Pearson correlation coefficients between CpGs selected by Seq-Lasso (arranged in columns) and all CpGs within ±500 kb of EREG (arranged in rows). The red marks on the left of each heatmap denote the CpGs identified by Seq-Lasso, and the black marks denote the CpGs that are selected by Lasso but not Seq-Lasso
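The heatmaps in Figs 7 and 8 display absolute Pearson correlations between the selected CpGs (columns) and all CpGs in the window (rows). A minimal sketch of how such a matrix can be formed is given below, using simulated methylation values; the sample size, the number of CpGs and the indices of the selected CpGs are hypothetical.

```python
import numpy as np

# Simulated methylation matrix (hypothetical): rows are samples, columns are CpGs.
rng = np.random.default_rng(1)
n_samples, n_cpgs = 100, 12
M = rng.normal(size=(n_samples, n_cpgs))
# Make CpG 5 nearly a copy of CpG 4 to mimic correlated neighboring sites.
M[:, 5] = M[:, 4] + 0.1 * rng.normal(size=n_samples)

selected = [4, 9]                        # indices of the selected CpGs (assumed)

corr = np.corrcoef(M, rowvar=False)      # n_cpgs x n_cpgs Pearson correlations
abs_corr = np.abs(corr[:, selected])     # rows: all CpGs; columns: selected CpGs

print(np.round(abs_corr, 2))
```

A row with a large value in some column marks a CpG that is strongly correlated with a selected CpG, which is one way a site can carry similar information without being selected itself.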
