Comparative Study
Brief Bioinform. 2021 Sep 2;22(5):bbaa442.
doi: 10.1093/bib/bbaa442.

Comparison of methods for estimating genetic correlation between complex traits using GWAS summary statistics

Yiliang Zhang et al. Brief Bioinform.

Abstract

Genetic correlation is the correlation between the effects of genetic variants across the genome on two phenotypes. It is an informative metric for quantifying the overall genetic similarity between complex traits, and it provides insights into their polygenic genetic architecture. Several methods have been proposed to estimate genetic correlation from data collected in genome-wide association studies (GWAS). Because GWAS summary statistics are easy to access and computationally efficient to analyze, methods that require only summary statistics as input have become more popular than methods that use individual-level genotype data. Here, we present a benchmark study of summary-statistics-based genetic correlation estimation methods through simulation and real data applications. We focus on two major technical challenges in estimating genetic correlation: marker dependency caused by linkage disequilibrium (LD) and sample overlap between different studies. To assess the performance of different methods in the presence of these two challenges, we first conducted comprehensive simulations with diverse LD patterns and sample overlaps. We then applied these methods to real GWAS summary statistics for a wide spectrum of complex traits. Based on these experiments, we conclude that methods relying on accurate LD estimation are less robust in real data applications because LD obtained from reference panels is imprecise. Our findings offer guidance on how to choose appropriate methods for genetic correlation estimation in post-GWAS analysis.
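
To make the quantity being estimated concrete, the following is a minimal sketch of genetic covariance and genetic correlation computed from known per-variant effects. It is an illustration only and is not any of the benchmarked methods (LDSC, GNOVA, HDL, REML): it assumes standardized, mutually independent variants (no LD) and true effect sizes that are directly observed, neither of which holds for real GWAS data. The function names, parameters and the chosen heritability values are hypothetical.

```python
import math
import random


def simulate_effects(n_variants, h2_1, h2_2, r_g, seed=0):
    """Draw per-variant effects on two traits with effect correlation r_g.

    Assumption (illustrative): variants are standardized and independent,
    so each trait's heritability is the sum of its squared effects.
    """
    rng = random.Random(seed)
    sd_1 = math.sqrt(h2_1 / n_variants)
    sd_2 = math.sqrt(h2_2 / n_variants)
    effects = []
    for _ in range(n_variants):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        beta_1 = sd_1 * z1
        # Mix z1 and z2 so that corr(beta_1, beta_2) equals r_g.
        beta_2 = sd_2 * (r_g * z1 + math.sqrt(1.0 - r_g ** 2) * z2)
        effects.append((beta_1, beta_2))
    return effects


def genetic_covariance_and_correlation(effects):
    """Return (genetic covariance, genetic correlation) from true effects."""
    cov = sum(b1 * b2 for b1, b2 in effects)
    var_1 = sum(b1 * b1 for b1, _ in effects)
    var_2 = sum(b2 * b2 for _, b2 in effects)
    return cov, cov / math.sqrt(var_1 * var_2)


if __name__ == "__main__":
    effects = simulate_effects(n_variants=20000, h2_1=0.5, h2_2=0.3, r_g=0.5)
    cov, corr = genetic_covariance_and_correlation(effects)
    print(f"genetic covariance = {cov:.3f}, genetic correlation = {corr:.3f}")
```

In practice the true effects are unknown, marginal GWAS estimates are correlated through LD, and shared samples induce correlated estimation noise, which is why the summary-statistics methods compared here are needed.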

Keywords: GWAS summary statistics; benchmarking; complex traits; genetic correlation.

Figures

Figure 1
Comparisons of genetic covariance and correlation estimation using in-sample reference panel. The estimates for (A) genetic covariance and (B) genetic correlation from LDSC, GNOVA, HDL and REML are shown as boxplots, which display the quantiles of the estimates. The red dashed lines represent true values.
Figure 2
Comparisons of genetic covariance and correlation estimation using external reference panel with matched LD. We compare the estimation of genetic (A) covariance and (B) correlation when the two GWASs were simulated on the same dataset with a 100% sample overlap. The red dashed lines represent true values.
Figure 3
Comparisons of genetic covariance and correlation estimation using external reference panel with mismatched LD. The estimates for (A) genetic covariance and (B) genetic correlation from LDSC, GNOVA and HDL are shown as boxplots, which display the quantiles of the estimates. The red dashed lines represent true values.
Figure 4
Trait pairs with significant genetic correlation identified by LDSC, GNOVA, HDL and REML for real GWAS data in UKBB, WTCCC and NFBC. This plot uses bars to break down the Venn diagram of overlapped regions in different categories. The four categories shown in the lower panel are correlated trait pairs in (A) UKBB (in-sample reference panel) and (B) WTCCC + NFBC (external reference panel) identified by LDSC, GNOVA, HDL and REML.
Figure 5
Comparisons of point estimates of genetic correlation among LDSC, GNOVA and HDL. The comparisons are presented as scatter plots for (A) LDSC versus GNOVA, (B) LDSC versus HDL and (C) GNOVA versus HDL, with R² of 0.91, 0.85 and 0.73, respectively. Each point represents a trait pair. Color and shape of each data point denote the significance level. The grey dashed lines are [formula image].
