Joint analysis of GWAS and multi-omics QTL summary statistics reveals a large fraction of GWAS signals shared with molecular phenotypes

Yang Wu¹, Ting Qi²,³, Naomi R Wray¹,⁴, Peter M Visscher¹, Jian Zeng¹, Jian Yang²,³

Affiliations

¹ Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.
² School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China.
³ Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China.
⁴ Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia.

PMID: 37601976
PMCID: PMC10435383
DOI: 10.1016/j.xgen.2023.100344

Yang Wu et al. Cell Genom. 2023 Jun 19;3(8):100344. doi: 10.1016/j.xgen.2023.100344. eCollection 2023 Aug 9.

Abstract

Molecular quantitative trait loci (xQTLs) are often harnessed to prioritize genes or functional elements underpinning variant-trait associations identified from genome-wide association studies (GWASs). Here, we introduce OPERA, a method that jointly analyzes GWAS and multi-omics xQTL summary statistics to enhance the identification of molecular phenotypes associated with complex traits through shared causal variants. Applying OPERA to summary-level GWAS data for 50 complex traits (n = 20,833-766,345) and xQTL data from seven omics layers (n = 100-31,684) reveals that 50% of the GWAS signals are shared with at least one molecular phenotype. GWAS signals shared with multiple molecular phenotypes, such as those at the MSMB locus for prostate cancer, are particularly informative for understanding the genetic regulatory mechanisms underlying complex traits. Future studies with more molecular phenotypes, measured considering spatiotemporal effects in larger samples, are required to obtain a more saturated map linking molecular intermediates to GWAS signals.

Keywords: Bayesian analysis; complex trait; gene discovery; genetic regulatory mechanisms; genome-wide association study; joint analysis; molecular phenotype; molecular quantitative trait locus; multi-omics; summary statistics.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Schematic overview of OPERA. OPERA combines GWAS summary statistics, xQTL summary statistics from multiple omics layers, and reference LD to identify molecular phenotypes and their combinations that are associated with a complex trait of interest because of pleiotropy. The OPERA analysis consists of three steps. OPERA first estimates the global proportions of possible configurations (i.e., $π$) using a set of quasi-independent loci across the genome. Using the estimated $π$ and the data likelihood under each configuration, OPERA then computes the posterior probability for each configuration (${PPC}_{γ}$) for all possible combinations of molecular phenotypes at a GWAS locus. Thereafter, PPCs across configurations are combined to compute the marginal posterior probability of association (PPA) for each exposure and the joint PPA for multiple exposures (STAR Methods). For associations with high PPAs (e.g., PPA > 0.9), OPERA performs a heterogeneity test (the multi-exposure HEIDI test) to reject associations that are caused by linkage. Marginal and joint molecular phenotype associations that have high PPAs and pass the multi-exposure HEIDI test are accepted as pleiotropic/causal associations. The solid symbols represent the exposure sites associated with the outcome, and the hollow symbols indicate that the exposure sites have a null effect on the outcome.
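
The second and third steps in the Figure 1 caption can be illustrated with a minimal Python sketch. It assumes that each PPC is the prior proportion times the data likelihood, normalized across configurations, and that a PPA is obtained by summing the PPCs of every configuration in which the exposures of interest are all associated. The function names and the toy numbers are hypothetical and are not taken from the OPERA software or the paper.

```python
def posterior_per_configuration(pi, likelihood):
    """Return PPC for each configuration.

    pi and likelihood are dicts keyed by configuration tuples, where each
    tuple holds one 0/1 indicator per exposure (1 = associated with the trait).
    Assumption: PPC is proportional to prior proportion times data likelihood.
    """
    weights = {cfg: pi[cfg] * likelihood[cfg] for cfg in pi}
    total = sum(weights.values())
    return {cfg: w / total for cfg, w in weights.items()}


def posterior_of_association(ppc, exposures):
    """Sum PPC over configurations in which all listed exposures are associated.

    One exposure index gives a marginal PPA; several indices give a joint PPA.
    """
    return sum(p for cfg, p in ppc.items() if all(cfg[i] == 1 for i in exposures))


# Toy example with two exposures (all values are made up for illustration).
pi = {(0, 0): 0.7, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.1}
likelihood = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 4.0, (1, 1): 8.0}

ppc = posterior_per_configuration(pi, likelihood)
marginal_first = posterior_of_association(ppc, [0])
marginal_second = posterior_of_association(ppc, [1])
joint_both = posterior_of_association(ppc, [0, 1])
```

Under this assumption a joint PPA can never exceed either of the corresponding marginal PPAs, because the configurations counted for the joint hypothesis are a subset of those counted for each marginal hypothesis. The multi-exposure HEIDI test is a separate filtering step and is not sketched here.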

**Figure 2**
Performance of OPERA in estimating the proportion of loci under each configuration and detecting multi-omics associations. (A) Estimation of $π$ from OPERA using simulations based on the imputed genotype data from the UK Biobank, where $π$ is the vector of proportions attributed to each configuration. Shown are the results based on 400 simulated independent loci, with three molecular phenotypes at each locus. Each dot shows the mean difference between the estimated and true proportions across 100 simulation replicates, and each solid line represents ±1 SD. The x axis shows the configurations with their true proportions in parentheses. For example, the proportion of the configuration in which none of the three exposures is associated with the complex trait, i.e., H:0:0:0, is 0.78. The red dashed line represents zero difference between the estimated and true proportions. The color of the box denotes the sample size of xQTL studies for the three exposures. (B) Relationship between PPA and the true discovery rate across ten PPA bins. Colors show the results of different association hypotheses, e.g., H:1 indicates the marginal association of the first exposure with the outcome, and H:123 indicates the joint association of three exposures with the outcome. The red line represents $y = x$. (C, E, and G) The observed false discovery rate (FDR), false positive rate (FPR), and power as the PPA threshold increases. In (C) and (E), the red dashed line represents an FDR or FPR of 0.05. (D and F) Consistency of the estimated and observed FDR (or FPR) from OPERA given a PPA threshold, with the red line being $y = x$. Shown in (B) to (G) are the results computed from 100 simulations using a sample size of 300,000 for GWASs and 1,000 for the xQTL studies.

**Figure 3**
Increased discovery power using joint analysis of multiple xQTL datasets. (A) The statistical power of OPERA for detecting causal exposures in simulations using all xQTL data jointly or each xQTL dataset separately. The error bar is the estimated standard error of the mean across simulation replicates. (B) The power of OPERA in simulations with different numbers of causal exposures.

**Figure 4**
Comparison of OPERA with three existing methods: MOLOC, Primo, and HyPrColoc. (A–C) The x axis represents the false positive rate, and the y axis represents the statistical power (as measured by the true positive rate) of the methods. (D–F) The power of the methods (y axis) at different levels of false positive rate (x axis). OPERA cannot be compared with HyPrColoc on a full ROC curve because HyPrColoc reports only significant PPAs that pass its internal selection criterion.

**Figure 5**
Proportion of GWAS loci explained by the detected pleiotropic associations in the joint analysis with seven molecular phenotypes. (A) Proportion of GWAS loci explained by each combinatorial pleiotropic association. The quantified proportions are based on associations that passed the PPA threshold and the multi-exposure HEIDI test rather than on PPC; thus, these proportions are not independent and do not sum to 1. The x axis shows the association hypotheses with different molecular phenotype combinations; only the 32 combinations with the largest proportions of GWAS loci explained are shown in the plot. The purple (gray) block represents the presence (absence) of each molecular phenotype, including protein, RNA splicing (Splicing), 3′ UTR alternative polyadenylation (APA), DNA methylation (DNAm), histone modification (HistM), chromatin accessibility (ChromA), and gene expression (Gene). The y axis shows the proportion of independent GWAS loci explained by each marginal or joint association hypothesis across 50 complex traits. In each violin box, the center line shows the median, box limits are the upper and lower quartiles, whiskers represent 1.5× interquartile range, and individual points are outliers. (B) Relationship between the proportion of GWAS loci explained by each type of molecular phenotype and the number of tested exposure sites for each molecular phenotype. (C) Proportion of GWAS loci explained by any molecular phenotype or by gene expression only for the 50 analyzed complex traits. The estimated proportions depend on the sample sizes of xQTL studies in two ways. First, the estimated proportion for each marginal or joint association is expected to be a lower bound given the limited xQTL sample sizes for most molecular phenotypes. Second, the relative differences between molecular phenotypes partially reflect the differences in power between xQTL studies in addition to the differences between the true proportions. The acronyms for complex traits can be found in STAR Methods. (D) Proportion of overlap of molecular phenotypes in the identified joint associations. The proportion is calculated as the number of identified joint associations for each pair of molecular phenotypes divided by the square root of the product of the numbers of identified marginal associations for the two molecular phenotypes. Similar to (A), the estimated overlapping proportions are likely affected by the sample sizes and coverage of the molecular phenotypes.
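
The overlap proportion described for panel (D) can be written as a short Python function. This is only a sketch of the formula as stated in the caption; the function name, argument names, and example counts are hypothetical.

```python
import math


def overlap_proportion(n_joint, n_marginal_a, n_marginal_b):
    """Joint associations for a pair of molecular phenotypes, divided by the
    square root of the product of their marginal association counts."""
    if n_marginal_a <= 0 or n_marginal_b <= 0:
        raise ValueError("marginal association counts must be positive")
    return n_joint / math.sqrt(n_marginal_a * n_marginal_b)


# Made-up counts: 6 joint associations, 9 and 16 marginal associations.
example = overlap_proportion(6, 9, 16)  # 6 / sqrt(144) = 0.5
```

Dividing by the geometric mean of the two marginal counts keeps the measure symmetric in the two molecular phenotypes, so a pair is not penalized simply because one layer has many more detected associations than the other.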

**Figure 6**
Prioritizing genes at the *MSMB* locus for prostate cancer. The top track shows the −log₁₀(p values) of the GWAS SNPs (gray dots) for prostate cancer. The red diamonds represent the OPERA marginal PPAs for associations of genes using eQTL data from the GTEx prostate tissue, and circles with different colors are the OPERA marginal PPAs for associations of chromatin peaks, histone modification, DNA methylation, and protein with prostate cancer, respectively. The second track shows the −log₁₀(p values) of SNP-gene associations for the *MSMB* gene from the GTEx prostate tissue. The subsequent tracks show the −log₁₀(p values) of SNP associations for other molecular phenotypes from the corresponding xQTL datasets (STAR Methods). The bottom track shows 14 chromatin state annotations inferred from the 127 Roadmap Epigenomics Mapping Consortium samples.
