PLoS Genet. 2015 Apr 23;11(4):e1005176.
doi: 10.1371/journal.pgen.1005176. eCollection 2015 Apr.

Cross-population joint analysis of eQTLs: fine mapping and functional annotation

Xiaoquan Wen et al. PLoS Genet.

Abstract

Mapping expression quantitative trait loci (eQTLs) has been shown to be a powerful tool for uncovering the genetic underpinnings of many complex traits at the molecular level. In this paper, we present an integrative analysis approach that leverages eQTL data collected from multiple population groups. In particular, our approach effectively identifies multiple independent cis-eQTL signals that are consistent across populations, accounting for population heterogeneity in allele frequencies and linkage disequilibrium patterns. Furthermore, by integrating genomic annotations, our analysis framework enables high-resolution functional analysis of eQTLs. We applied our statistical approach to the GEUVADIS data, which consist of samples from five population groups. From this analysis, we concluded that i) joint analysis across population groups greatly improves the power of eQTL discovery and the resolution of fine mapping of causal eQTLs; ii) many genes harbor multiple independent eQTLs in their cis regions; and iii) genetic variants that disrupt transcription factor binding are significantly enriched in eQTLs (p-value = 4.93 × 10⁻²²).

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The comparison of powers in identifying independent causal eQTL regions.
We compare the performance of the three competing methods in determining the regions harboring the true causal eQTL in a cross-population meta-analytic setting: the proposed Bayesian multi-SNP analysis method (brown line), a conditional meta-analysis approach (dark green line) and a single SNP meta-analysis approach (navy blue line). Each plotted point on the figure represents the number of true positive findings versus the number of false positive findings of a given method at a particular threshold. For any false positive value, the proposed Bayesian approach always yields the most true positive findings.
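A curve of the kind described for Fig 1 can be traced by sweeping a threshold over a method's per-region evidence and counting true and false positives at each step. The following is a minimal Python sketch under stated assumptions: the inputs `scores` and `is_causal` are hypothetical names for a method's evidence per candidate region and the simulation truth, and the paper's own evaluation code is not shown in this text.

```python
def tp_fp_curve(scores, is_causal):
    """Return (false_positives, true_positives) pairs, one per distinct threshold.

    scores    -- hypothetical evidence per candidate region (higher = stronger)
    is_causal -- simulation truth: True if the region harbors a causal eQTL
    """
    # Rank regions from strongest to weakest evidence.
    ranked = sorted(zip(scores, is_causal), key=lambda pair: -pair[0])
    curve = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every region tied at this threshold before recording a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        curve.append((fp, tp))
    return curve


# Toy data, invented for illustration only.
curve = tp_fp_curve([0.9, 0.8, 0.8, 0.3], [True, False, True, False])
```

A method dominates another in this plot when, at every false positive count, its curve sits at or above the other's.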
Fig 2. An example of modest yet consistent eQTL signals across population groups.
The forest plot shows the genetic effects of SNP rs7207370 with respect to the expression levels of gene NME1 (Ensembl ID: ENSG00000239672). SNP rs7207370 is one of the top associated cis-SNPs in all population groups, yet the strengths of the association signals are modest in all groups (the maximum single SNP Bayes factor, among five groups, is 18.0 in FIN; the corresponding gene-level Bayes factor is 1.8). As a consequence, the gene is not identified as an eGene in any of the separate analyses. Across populations, the SNP exhibits a strongly consistent association pattern. In the cross-population meta-analysis, the gene-level Bayes factor reaches 1.1 × 10⁴ (single SNP Bayes factor for rs7207370 is 9.5 × 10⁵).
Fig 3. Histogram of posterior expected number of cis-eQTLs in 6,555 identified eGenes.
The figure indicates that we identified only a single cis-eQTL for most eGenes. However, for a non-trivial proportion of eGenes, multiple independent eQTL signals were identified.
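The quantity histogrammed in Fig 3 can be illustrated with a short calculation. By linearity of expectation, the posterior expected number of causal SNPs in a region equals the sum of the per-SNP posterior inclusion probabilities (PIPs). The sketch below assumes that definition; the function name and the PIP values are hypothetical and are not taken from the paper.

```python
def expected_num_eqtls(pips):
    """Posterior expected number of causal cis-SNPs: the sum of per-SNP PIPs."""
    return sum(pips)


# Invented PIPs for one gene: two groups of SNPs in LD, each summing to 1.
gene_pips = [0.5, 0.5, 0.25, 0.25, 0.5]
n_expected = expected_num_eqtls(gene_pips)
```

In this toy example the PIPs sum to 2, which corresponds to a gene with two independent signals even though no single SNP has a PIP near 1.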
Fig 4. An example of a gene harboring four independent cis-eQTL signals.
The top left panel plots the cis-SNPs with PIP ≥ 0.02. The locations of the SNPs are labeled with respect to the TSS of gene LHPP. The ticks on the x-axis indicate all interrogated cis-SNPs in the region. The SNPs with the same color are in high LD and represent the same eQTL signal. In the plot, the sums of the PIPs from the SNPs in the same colors are all ∼ 1, indicating that we are confident of the existence of each signal. The heatmaps show the LD patterns in each of the population groups. They are qualitatively similar, except that the SNPs representing the first signals are monomorphic in GBR. In the bottom panel, we plot the effect sizes of eQTLs jointly estimated from one of the high posterior probability models. Each of the SNPs plotted belongs to a different colored cluster in the PIP plot (as indicated by the color coding of the error bars). The effect sizes and standard errors are estimated from the multiple linear regression models (containing all four SNPs) separately fitted in each population group. All the signals show strong effect size consistency across populations.
Fig 5. An example of automatic LD filtering across population groups.
The top panel shows the result of multiple cis-eQTL analysis for gene AGO3 using only the data from TSI. The SNPs with PIPs ≥ 0.02 are plotted. All SNPs plotted are in high LD in TSI, and the sum of the PIPs across the genomic region is close to 1. The region spanned by the signals is ∼ 140 kb. We repeated the analysis jointly across all five populations; the SNPs with PIPs ≥ 0.02 are plotted in the middle panel. The genomic region harboring the eQTL is narrowed down to a 1.2 kb region enclosed by three SNPs each with PIP ∼ 0.33. The bottom panel shows the LD heatmaps between the 41 SNPs plotted in the top panel in TSI, GBR and YRI, respectively. The multiple cis-eQTL mapping method takes advantage of the varying LD patterns across populations, and automatically narrows down the region harboring the true causal cis-eQTL.
Fig 6. Histogram of reduction in 95% credible region length by cross-population joint analysis.
The histogram shows the differences in the credible region lengths between the average of the five separate population analyses and the joint analysis. The analysis is performed using the set of 526 eGenes that very likely harbor exactly one cis-eQTL. Only 3 out of 526 eGenes show (slightly) increased credible region lengths (negative reduction values) in the joint analysis.
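The exact construction of the 95% credible region is not given in this text. One common construction, assumed here, ranks SNPs by PIP, accumulates them until the running sum reaches 0.95, and takes the genomic span of the selected SNPs as the region length. The sketch below follows that assumption for a gene with a single signal; all names and numbers are hypothetical, and it compares against one separate analysis rather than the average of five.

```python
def credible_region_length(positions, pips, level=0.95):
    """Span in bp of the highest-PIP SNPs whose PIPs sum to at least `level`.

    Returns None if the PIPs never reach `level`.
    """
    # Visit SNPs from highest to lowest PIP.
    ranked = sorted(zip(pips, positions), reverse=True)
    total = 0.0
    chosen = []
    for pip, pos in ranked:
        chosen.append(pos)
        total += pip
        if total >= level:
            break
    else:
        # The loop finished without reaching the requested level.
        return None
    return max(chosen) - min(chosen)


# Invented example: the same four SNPs in a separate and a joint analysis.
positions = [100, 200, 300, 400]
separate_len = credible_region_length(positions, [0.25, 0.25, 0.25, 0.25])
joint_len = credible_region_length(positions, [0.0, 0.5, 0.5, 0.0])
reduction = separate_len - joint_len
```

A positive `reduction` means the joint analysis narrowed the region, matching the sign convention of the histogram, where negative values indicate a longer region in the joint analysis.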
Fig 7. Multi-SNP analysis explains strong effect size heterogeneity observed in single SNP analysis.
SNP rs6006800 and the SNPs in LD with it are labeled by the purple triangles in the top panel. They display strong but opposite effects in the European and YRI populations when analyzed alone. The middle left panel shows the effect sizes of rs6006800 separately estimated in the five populations by the single SNP analysis. The top panel shows the multiple cis-eQTL analysis result. SNPs with PIPs ≥ 0.02 are plotted. The result suggests that there are two independent signals in the region, with one represented by SNPs colored in green, and the other represented by SNPs colored in brown. The sums of the PIPs of the green SNPs and of the brown SNPs are both very close to 1. SNP rs6006800 and the SNPs in LD with it all have PIPs ∼ 0 in the multiple cis-eQTL analysis. The middle right panel shows the effect sizes of rs6006800 estimated from the multiple linear regression models controlling for the two independent signals in each population: the genetic association observed in the single SNP analysis is seemingly “explained away” by the two independent signals identified by the fine mapping analysis. The bottom panel shows LD heatmaps between SNPs highlighted in the top panel (green, brown SNPs and the SNPs labeled by the purple triangles) in the five populations. Some of the green SNPs are monomorphic in YRI. The opposite effects of rs6006800 are clearly explained by the varying LD patterns: rs6006800 is in high LD with the brown SNPs in YRI, whereas in European populations it tags the green SNPs.
Fig 8. Distributions of SNP effect size heterogeneity from single and multi-SNP analyses.
The heterogeneity of SNP effects across the five populations is measured by log10[BFmaxH/BFfix], for which large values suggest highly heterogeneous (i.e., inconsistent) genetic effects in different populations. Each point on the plot represents a SNP in a unique gene showing large effect size heterogeneity in the single SNP analysis (log10[BFmaxH/BFfix] ≥ 5). When controlling for the consistent cis-eQTL signals identified by the fine mapping approach in the multi-SNP analysis, we observe a great reduction of effect size heterogeneity.
Fig 9. Enrichment of cis-eQTLs with respect to distance to TSS.
The top panel shows the distribution of posterior expected number of cis-eQTLs with respect to SNP distance to TSS. The bottom panel shows the estimated cumulative percentage of cis-eQTLs with respect to SNP distance to TSS from two different methods. The brown line represents the estimate using the PIPs from the proposed Bayesian approach. The dark green line represents the estimate from a standard method based on single SNP association results. More specifically, the dark green line is estimated by counting the number of SNPs that exceed the 5% FDR threshold in each distance bin. The brown line shows a much faster decay in abundance of cis-eQTL signals away from TSS.
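A PIP-based cumulative curve like the brown line can be sketched by weighting each SNP by its PIP instead of counting SNPs that pass a significance threshold. The Python below is only an illustration: it assumes absolute distance to the TSS and hypothetical distance cutoffs, neither of which is specified in this text, and all values are invented.

```python
def cumulative_percentage(distances, pips, cutoffs):
    """Percent of the posterior expected eQTLs lying within each distance cutoff.

    distances -- signed SNP distances to the TSS in bp (assumed convention)
    pips      -- posterior inclusion probability of each SNP
    cutoffs   -- increasing absolute-distance cutoffs in bp (hypothetical bins)
    """
    total = sum(pips)
    result = []
    for cutoff in cutoffs:
        # Each SNP contributes its PIP, not a 0/1 significance call.
        within = sum(p for d, p in zip(distances, pips) if abs(d) <= cutoff)
        result.append(100.0 * within / total)
    return result


# Invented data for illustration.
distances = [-500, 2_000, 40_000, -300_000]
pips = [0.5, 0.25, 0.125, 0.125]
pct = cumulative_percentage(distances, pips, [1_000, 10_000, 100_000, 1_000_000])
```

Because SNPs in LD with a causal variant share its PIP mass rather than each counting as a full discovery, a PIP-weighted curve is less inflated by distant tagging SNPs than a count of significant single-SNP associations, which is consistent with the faster decay reported for the brown line.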
Fig 10. Comparison of fine mapping results of gene LY86 before and after incorporating functional annotations.
The top panel is based on the analysis using prior (Eq 5) without annotation information. The bottom panel is based on the analysis using prior (Eq 6) incorporating SNP distances to TSS and the annotations for binding variants and footprint SNPs. In both panels, SNPs with PIPs ≥ 0.02 are plotted. There are clearly three independent cis-eQTLs in the region, represented by different colors of SNPs. SNPs in the same color are in high LD. The sums of PIPs from SNPs in the same color are all close to 1. The circled points are predicted binding variants. It is clear that binding variants are up-weighted when annotation information is incorporated into the fine mapping analysis.
Fig 11. The comparison between the realized false discovery rate and the estimated false discovery rate using the PIPs from the proposed Bayesian method in the simulation study.
The plot indicates that the FDRs estimated by PIPs are mostly accurate and become slightly conservative for large FDR values.
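The comparison in Fig 11 can be illustrated with the standard Bayesian FDR estimate: among discoveries declared at a PIP threshold, the estimated FDR is the average of (1 − PIP), while the realized FDR is the fraction of those discoveries that are not truly causal. The sketch below assumes SNP-level discoveries and simulated truth labels. The unit of discovery used in the paper is not stated in this text, and all names and values are hypothetical.

```python
def estimated_fdr(pips, threshold):
    """Bayesian FDR estimate: mean posterior probability of being a false call."""
    selected = [p for p in pips if p >= threshold]
    if not selected:
        return 0.0
    return sum(1.0 - p for p in selected) / len(selected)


def realized_fdr(pips, is_causal, threshold):
    """Fraction of selected items that are not causal in the simulation truth."""
    selected = [c for p, c in zip(pips, is_causal) if p >= threshold]
    if not selected:
        return 0.0
    return sum(1 for c in selected if not c) / len(selected)


# Invented simulation output for illustration.
pips = [1.0, 0.75, 0.75, 0.5, 0.25]
truth = [True, True, False, True, False]
est = estimated_fdr(pips, 0.5)
real = realized_fdr(pips, truth, 0.5)
```

When the estimate exceeds the realized value, the PIP-based FDR is conservative, which is the behavior the caption reports for large FDR values.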
