Nat Commun. 2015 Oct 12:6:8555.
doi: 10.1038/ncomms9555.

Bayesian integration of genetics and epigenetics detects causal regulatory SNPs underlying expression variability

Avinash Das et al. Nat Commun. 2015.

Abstract

Standard expression quantitative trait loci (eQTL) analysis detects polymorphisms associated with gene expression without revealing causality. We introduce eQTeL, a coupled Bayesian regression approach that leverages epigenetic data to estimate regulatory and gene interaction potential and identifies combinations of regulatory single-nucleotide polymorphisms (SNPs) that explain gene expression variance. On human heart data, eQTeL not only explains a significantly greater proportion of expression variance but also predicts gene expression more accurately than other methods. Based on realistic simulated data, we demonstrate that eQTeL accurately detects causal regulatory SNPs, including those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, potentially disrupt binding of core cardiac transcription factors and are spatially proximal to their target genes. eQTeL SNPs capture a substantial proportion of the genetic determinants of expression variance, and we estimate that 58% of these SNPs are putatively causal.


Figures

Figure 1. Overview of eQTeL model.
(a) Input and output of eQTeL. eQTeL takes as input genotype and gene expression across samples, together with epigenetic and interaction features for each SNP and LD block. It outputs regulatory SNPs and their target genes, their effect sizes and regulatory-interaction potentials, as well as the estimated importance of each epigenetic and interaction feature. (b) eQTeL is composed of two coupled regression models. (i) A Bayesian variable selection model with informative priors models expression as a linear combination of SNPs. Given the regulatory and interaction priors, this hierarchical model first identifies LD blocks and then combinations of SNPs that explain expression variance and that also have high regulatory and interaction potentials. (ii) A Bayesian logistic regression specifies the regulatory and interaction potential as a linear model of epigenetic and interaction features in a semi-supervised manner. The logistic regression passes the regulatory and interaction potentials to the variable selection model, while the variable selection model passes expression-regulators to the logistic regression model.
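The coupling described in panel (b) can be illustrated for a single SNP with a minimal sketch. This is not the authors' sampler: the function names, the feature weights and the use of a single Bayes factor to summarise the expression evidence are all assumptions made for illustration. The logistic model turns features into a prior inclusion probability, and that prior is then combined with the evidence from expression.

```python
import math

def regulatory_potential(features, weights, intercept=0.0):
    """Logistic model: potential = sigmoid(intercept + sum_k w_k * f_k).

    `features` and `weights` are hypothetical epigenetic/interaction
    feature values and their weights for one SNP.
    """
    z = intercept + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def posterior_inclusion(prior, bayes_factor):
    """Combine a prior inclusion probability (0 < prior < 1) with a Bayes
    factor summarising how well the SNP explains expression (assumed given)."""
    odds = bayes_factor * prior / (1.0 - prior)
    return odds / (1.0 + odds)

# A SNP with the same expression evidence is more likely to be selected
# when its epigenetic features give it a higher prior.
low_prior = regulatory_potential([0.0, 0.0], [1.0, 1.0])   # 0.5
high_prior = regulatory_potential([1.0, 1.0], [1.0, 1.0])  # about 0.88
assert posterior_inclusion(high_prior, 2.0) > posterior_inclusion(low_prior, 2.0)
```

Under these assumptions, a SNP with weak expression evidence can still reach a high posterior if its prior is high, which is the intuition behind detecting small-effect regulatory SNPs.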
Figure 2. Comparative performance of different methods applied to human heart data (MAGNet).
The analysis is based on the 2,428 SNPs identified by eQTeL with posterior probability of selection >0.5. To ensure that the same total number of SNPs was selected by eQTeL, eqtnminer and LASSO, we sorted SNPs by posterior probability for eqtnminer and by absolute estimated effect size for LASSO, and then selected the top 2,428 SNPs from each. (a) Explained expression variance based on three representative methods on human heart data. (b) Accuracy of predicted expression for the three methods. (c) Explained expression variance for human heart data by potentially functional (approximated by overlap with a footprint) genotyped SNPs and imputed SNPs. (d) Cross-data set generalization of MAGNet eeSNPs: expression predictability in GTEx by eeSNPs identified in MAGNet.
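The count-matching step in this caption can be sketched as follows. The SNP identifiers and scores are made up, and both helper names are hypothetical. The threshold rule stands in for the eQTeL selection and the top-n rule for the eqtnminer and LASSO selections.

```python
def select_by_threshold(posteriors, threshold=0.5):
    """IDs of SNPs whose posterior probability of selection exceeds threshold."""
    return [snp for snp, p in posteriors.items() if p > threshold]

def top_n_snps(scores, n, key=abs):
    """IDs of the n SNPs with the largest key(score).

    With key=abs this ranks effect sizes by magnitude (as for LASSO);
    posterior probabilities are non-negative, so the same default works.
    """
    ranked = sorted(scores, key=lambda snp: key(scores[snp]), reverse=True)
    return ranked[:n]

# Hypothetical toy values, not data from the paper.
posteriors = {"snp1": 0.9, "snp2": 0.4, "snp3": 0.7}
effects = {"snp1": -0.9, "snp2": 0.2, "snp3": 0.5, "snp4": -0.1}

chosen = select_by_threshold(posteriors)           # two SNPs pass 0.5
matched = top_n_snps(effects, n=len(chosen))       # same count for the other method
```

Matching counts this way means that differences in explained variance cannot be attributed to one method simply selecting more SNPs.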
Figure 3. eQTeL identifies causal SNPs accurately in semi-simulated data.
(a) Design of the simulation study. The simulation uses (i) 174,800 SNPs from MAGNet genotype data (874 SNPs per gene) for 313 samples, (ii) the distribution of the number of expression-regulators per gene from MAGNet data, (iii) the distribution of explained expression variance estimated from MAGNet data, (iv) ENCODE epigenetic data for heart cell lines and (v) the distribution of epigenetic data for regulators, estimated from VISTA heart enhancers. Expression-regulators for each gene were chosen from among the regulators (1% of MAGNet SNPs). Using the allele status of the expression-regulators in the 313 samples, expression of 200 genes was generated such that the distribution of explained variance matches that of MAGNet. Epigenetic data for regulators were generated using the epigenetic distribution estimated from VISTA heart enhancers. (b) Comparative performance assessment on simulated data. Methods include (i) Matrix-eQTL (univariate-eQTL): univariate regression, (ii) LASSO: L1-regularized multivariate regression, (iii) variable selection: Bayesian variable selection, (iv) eqtnminer: Bayesian variable selection with empirical priors, (v) epigenetic-only: epigenetic feature weights derived from verified enhancers and used to prioritize SNPs, (vi) eQTeL: the proposed method and (vii) known-epigenetic-priors-eQTeL: eQTeL with fixed epigenetic priors as in epigenetic-only. The number of SNPs selected by each method was controlled.
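One way to generate expression with a chosen explained variance, as in the design in panel (a), is to add Gaussian noise whose variance is scaled to the variance of the genetic component. The sketch below assumes an additive model with Gaussian noise; the caption does not state the noise model, and all function names and toy values are hypothetical.

```python
import math
import random

def genetic_component(genotypes, effects):
    """Per-sample sum of effect * allele dosage over the causal SNPs.

    genotypes: one row per sample, each a list of dosages (0, 1 or 2).
    """
    return [sum(b * g for b, g in zip(effects, row)) for row in genotypes]

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def noise_sd(var_genetic, explained):
    """Noise s.d. such that var_genetic / (var_genetic + noise_var) == explained."""
    return math.sqrt(var_genetic * (1.0 - explained) / explained)

def simulate_expression(genotypes, effects, explained, seed=0):
    """Simulated expression for one gene (assumed additive Gaussian model)."""
    rng = random.Random(seed)
    g = genetic_component(genotypes, effects)
    sd = noise_sd(variance(g), explained)
    return [gi + rng.gauss(0.0, sd) for gi in g]

# Toy example: 4 samples, 2 causal SNPs, 30% target explained variance.
toy_genotypes = [[0, 1], [2, 0], [1, 1], [0, 2]]
toy_expression = simulate_expression(toy_genotypes, [0.5, -0.3], explained=0.3)
```

In a full simulation the target `explained` value would be drawn per gene from the empirical distribution, and the causal SNPs from the regulator set.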
Figure 4. eQTeL increases statistical power to detect small-effect regulatory SNPs: comparison of effect sizes of SNPs detected by eQTeL and eqtnminer.
The number of SNPs for each method was controlled. eQTeL can detect SNPs with small effect sizes if the regulatory potential of the SNP is high. eQTeL-high-potential is the subset of eeSNPs with interacting-regulatory potential=1 and eQTeL-low-potential is the subset with interacting-regulatory potential<0.1.
Figure 5. A larger fraction of eeSNPs overlaps DNAse footprints relative to other methods, particularly for heart-related tissues (highlighted in red).
This analysis is based on the 2,428 SNPs identified by eQTeL with posterior probability of selection >0.5. For eqtnminer, we selected the best SNP reported for each gene. For LASSO, we selected 2,428 SNPs by sorting the effect sizes. For each method, we used bedtools to identify footprints in 42 cell lines lying within 25 bp of each SNP locus. The heart-related tissues are highlighted in red in the figure. The left-most bar represents pooled data from all heart-related cell types. Note that the relative enrichment of each method remains the same even if we control for the number of SNPs per gene in each method.
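The overlap criterion above can be sketched in plain Python. This is an illustration of the interval logic only, not a bedtools invocation, and it rests on assumptions: positions are 0-based, footprints are half-open `(start, end)` intervals as in BED files, and "within 25 bp" means that some footprint base lies at most 25 positions from the SNP. The function names are hypothetical.

```python
def near_footprint(snp_pos, footprints, window=25):
    """True if any footprint has a base within `window` bp of the SNP.

    footprints: iterable of half-open (start, end) intervals on the
    same chromosome as the SNP (assumed; chromosomes are not modelled).
    """
    lo = snp_pos - window          # first position inside the window
    hi = snp_pos + window + 1      # one past the last position inside it
    return any(start < hi and end > lo for start, end in footprints)

def fraction_near_footprint(snp_positions, footprints, window=25):
    """Fraction of SNPs with a footprint within `window` bp."""
    if not snp_positions:
        return 0.0
    hits = sum(near_footprint(p, footprints, window) for p in snp_positions)
    return hits / len(snp_positions)

# Toy example: one footprint, two SNPs; only the first SNP is close enough.
toy_fraction = fraction_near_footprint([100, 500], [(120, 130)])
```

Computing this fraction per cell line and per method gives the kind of quantity that the bars in the figure compare.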
Figure 6. DNAse hypersensitivity at eeSNPs shows greater allele specificity in HCM.
X axis: rank of DHS read counts, Y axis: absolute log-ratio of read counts mapping to the two alleles at a SNP. SNPs from different methods are selected similarly to Fig. 5. The analysis was performed on the subset of SNPs that were heterozygous in the sample. The median "white" lines represent LOESS (local regression) fits for each method. Confidence intervals for each median line are estimated using bootstrapping and are represented either by thin lines, each showing the LOESS fit of one bootstrap, or by coloured shading showing the standard deviation across bootstraps. Note that the allele specificity at SNPs detected by eQTeL and eqtnminer remains the same even if we control for the number of SNPs per gene.
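The Y-axis quantity and the bootstrap idea can be sketched as below. Several choices here are assumptions not stated in the caption: the log base (base 2), the pseudocount that avoids division by zero, and the use of a bootstrap of the plain median in place of a LOESS fit, which is a simpler stand-in for the smoothing the figure uses. The function names are hypothetical.

```python
import math
import random
import statistics

def allelic_imbalance(ref_reads, alt_reads, pseudocount=1.0):
    """Absolute log2 ratio of read counts on the two alleles of a
    heterozygous SNP; 0 means perfectly balanced."""
    ratio = (ref_reads + pseudocount) / (alt_reads + pseudocount)
    return abs(math.log2(ratio))

def bootstrap_sd_of_median(values, n_boot=200, seed=0):
    """Standard deviation of the median across bootstrap resamples.

    A stand-in for bootstrapping a LOESS curve: resample with
    replacement, recompute the statistic, summarise its spread.
    """
    rng = random.Random(seed)
    medians = [
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    ]
    return statistics.pstdev(medians)

# Toy read counts (ref, alt) for three heterozygous SNPs.
toy_scores = [allelic_imbalance(r, a) for r, a in [(7, 1), (5, 5), (1, 7)]]
toy_spread = bootstrap_sd_of_median(toy_scores)
```

The score is symmetric in the two alleles, so which allele is labelled reference does not affect it.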
Figure 7. eeSNP-gene pairs are spatially proximal.
X axis: the rank of eeSNP-gene distance (log10), Y axis: ChIA-PET support. SNPs from eQTeL and eqtnminer are selected as in Fig. 8. The random SNP-gene pairs were selected so as to have the same distance distribution as the eeSNPs. SNP-gene pairs closer than 100 bp were excluded. The median "white" lines represent LOESS (local regression) fits for each method. Confidence was estimated for each method just as in Fig. 6.
Figure 8. Regulatory motifs disrupted by eeSNPs include several cardiac TFs.
Only the motifs with average allele-specific binding score ratio>1.5 and Wilcoxon test P value<0.05 are shown, ordered by the ratio. Motifs corresponding to known cardiac TF families are shown in red and additional motifs with literature evidence of involvement in cardiac development or function are shown in blue.
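The display filter described in this caption can be sketched as a simple filter-and-sort. The record layout, the motif names and the descending sort order are assumptions; the caption gives only the two thresholds and says the motifs are ordered by the ratio.

```python
def disrupted_motifs(motifs, min_ratio=1.5, max_p=0.05):
    """Keep motifs with ratio > min_ratio and p_value < max_p,
    ordered by ratio (descending order is an assumption)."""
    kept = [
        m for m in motifs
        if m["ratio"] > min_ratio and m["p_value"] < max_p
    ]
    return sorted(kept, key=lambda m: m["ratio"], reverse=True)

# Hypothetical records: average allele-specific binding score ratio
# and Wilcoxon test P value per motif. Not values from the paper.
toy_motifs = [
    {"name": "motif_A", "ratio": 1.8, "p_value": 0.01},
    {"name": "motif_B", "ratio": 2.4, "p_value": 0.20},   # fails the P value cut
    {"name": "motif_C", "ratio": 1.2, "p_value": 0.001},  # fails the ratio cut
    {"name": "motif_D", "ratio": 3.1, "p_value": 0.03},
]
shown = [m["name"] for m in disrupted_motifs(toy_motifs)]
```

Both conditions must hold, so a motif with a large ratio but a non-significant test is not shown.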

