[Preprint]. 2026 Jan 22:rs.3.rs-8585052.
doi: 10.21203/rs.3.rs-8585052/v1.

A resource of "bottom-line" variant associations for 1,281 complex traits by integrating data across published genome-wide association studies

Trang Nguyen et al. Res Sq.

Abstract

Through an analysis of 2,602 genome-wide association studies (GWAS) across 830 human traits, we find that most (56% of) well-studied traits have at least two published GWAS, and many (29%) have at least five. We show that the lack of an established approach for adjudicating variant association estimates across multiple published studies can lead to uncertainty and invalid inferences: using all associations ever published for a trait increases true positives (by 12%) but also false positives (by 55%) relative to using associations from the largest published GWAS for the trait. We employ a "bottom-line" procedure for meta-analyzing published GWAS while inferring and accounting for sample overlap, which identifies a more accurate and comprehensive list of associations relative to existing approaches. Five commonly used bioinformatic methods for post-GWAS analyses produce reliable results when applied to the bottom-line associations. We present these results for 1,281 human complex traits, including 1,839 single-ancestry and 576 trans-ancestry analyses, for browsing or download via the NHGRI Association to Function Knowledge Portal. This resource of "consensus" GWAS results is intended to increase replicability, reuse, and interpretation of GWAS and downstream analyses.
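The abstract does not spell out the estimator behind the "bottom-line" procedure, so the following is only a minimal sketch of one generic way to meta-analyze per-variant effect estimates while accounting for sample overlap: a fixed-effects generalized least squares combination in which overlap appears as correlation between the studies' estimates. The function names `solve_linear` and `meta_analyze`, the example numbers, and the assumption that the between-study correlation matrix is already known (rather than inferred from the data, as the authors do) are all illustrative and not taken from the paper.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def meta_analyze(betas, ses, corr):
    """Combine one variant's effect estimates from several studies.

    betas, ses: per-study effect sizes and standard errors.
    corr: assumed correlation matrix of the estimates; off-diagonal
          entries are nonzero when studies share samples (hypothetical
          input here, not inferred from data).
    Returns the combined effect and its standard error.
    """
    k = len(betas)
    # Covariance of the estimates: cov[i][j] = corr[i][j] * se_i * se_j.
    cov = [[corr[i][j] * ses[i] * ses[j] for j in range(k)] for i in range(k)]
    # weights = cov^{-1} 1; with an identity corr this reduces to 1 / se^2.
    weights = solve_linear(cov, [1.0] * k)
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se


# Two hypothetical studies of the same variant with equal standard errors.
betas = [0.10, 0.20]
ses = [0.05, 0.05]
beta_indep, se_indep = meta_analyze(betas, ses, [[1.0, 0.0], [0.0, 1.0]])
beta_overlap, se_overlap = meta_analyze(betas, ses, [[1.0, 0.5], [0.5, 1.0]])
```

In this toy case the combined effect is the same either way, but the standard error is larger once the correlation is acknowledged. Ignoring overlap would therefore overstate significance, which is consistent with the abstract's point that naive pooling of published associations inflates false positives.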

Conflict of interest statement

Competing interest statement: P. D. and G. A. are employees and stockholders of Regeneron Pharmaceuticals. The remaining authors declare no conflicts of interest relevant to this study.

Figures

Figure 1. Overview of the study.
(a) Review of the number of GWAS per trait. (b) Different approaches for estimating the single-ancestry consensus association signals. (c) Comparison of “post-GWAS” analyses using the largest GWAS vs. the overlap-corrected GWAS. (d) Evaluation of trans-ancestry association signals against the largest published trans-ancestry GWAS. EU (European) and HS (Hispanic or Latin American) are two example ancestries. (e) The “bottom-line” procedure and the resource of consensus GWAS publicly available on the A2FKP.
Figure 2. Summary of GWAS available for various traits.
(a) Number of traits with different numbers of studies available across 830 traits in the A2FKP. (b) Minimum sample size vs. maximum sample size of studies across 261 trait-ancestry pairs with more than one study in the A2FKP. (c) Number of traits with different numbers of studies available across 15,217 traits in the GWAS Catalog. (d) Number of traits with at least one study available in (at least) one to nine ancestries (blue) and number of traits with at least two studies available in (at least) one to nine ancestries (orange) across 830 traits in the A2FKP.
Figure 3. Comparison of single-ancestry association signals produced by four consensus estimation approaches.
(a) Number of association signals significant in the largest GWAS vs. any GWAS. (b) Number of association signals (original clumps instead of the merged clumps) significant in the largest GWAS (blue), any GWAS (orange), and any GWAS but not the largest GWAS (green) at various significance levels. Dark areas: signals replicated in the gold-standard GWAS, light areas: signals not replicated in the gold-standard GWAS. Numbers on top of bars represent the proportion of replicated signals. (c) Overlap of association signals significant in the largest GWAS (blue, top-left), any GWAS (orange, top-right) and uncorrected meta-analysis (green, bottom). (d) Overlap of association signals significant in the largest GWAS (blue, left) and uncorrected meta-analysis (green, right). Percentages represent the proportions of signals replicated in the gold-standard GWAS. (e) Overlap of association signals significant in the largest GWAS (blue, top-left), any GWAS (orange, top-right) and overlap-corrected meta-analysis (green, bottom). (f) Overlap of association signals significant in the largest GWAS (blue, left) and overlap-corrected meta-analysis (green, right). Percentages represent the proportions of signals replicated in the gold-standard GWAS. (a), (c), (e): Main analysis of 270 trait-ancestry pairs, (b), (d), (f): Validation analysis of 97 trait-ancestry pairs.
Figure 4. Comparison of single-ancestry polygenic analysis results produced from the overlap-aware GWAS summary statistics versus the largest GWAS summary statistics.
(a) Number of traits with different numbers of independent signals when using the overlap-aware GWAS versus the largest GWAS. (b) Correlation between observed heritability estimates produced by the overlap-aware GWAS and the largest GWAS for 243 trait-ancestry pairs. Black, dashed diagonal lines represent equality; red, solid lines represent linear regressions. (c) Correlation between genetic correlation estimates produced by the overlap-aware GWAS and the largest GWAS for 8,097 pairs of traits within the same ancestry. Black, dashed diagonal lines represent equality; red, solid lines represent linear regressions. (d) Spearman’s correlation coefficients between tissue-specific functional annotation enrichment produced by the overlap-aware GWAS and the largest GWAS for 174 significant trait-ancestry pairs. (e) Spearman’s correlation coefficients between gene-level associations produced by the overlap-aware GWAS and the largest GWAS for 218 significant trait-ancestry pairs. (f) Enrichment scores of the top 50 enriched gene sets produced from the largest GWAS in the enriched gene sets produced from the overlap-aware GWAS for 268 trait-ancestry pairs.
Figure 5. Summary of the resource of consensus GWAS.
(a) Number of traits with at least one study available in (at least) one to nine ancestries (blue) and number of traits with at least two studies available in (at least) one to nine ancestries (orange) across 1,281 traits in the A2FKP. (b) Total number of single-ancestry significant association signals for 870 traits (1,347 trait-ancestry pairs) across 14 phenotype groups produced by the consensus GWAS (blue), the largest GWAS (dark orange), and any GWAS other than the largest GWAS (light orange). Locus zoom plots of the associations in the 2Mb region around rs241430 and POAG produced from the Japan Biobank GWAS (c) versus the bottom-line procedure for East Asian ancestry (d). Locus zoom plots of the associations in the 2Mb region around rs7636836 and POAG produced from the Japan Biobank GWAS (e) versus the bottom-line procedure for East Asian ancestry (f). Locus zoom plots of the associations in the 2Mb region around rs1547725 and POAG produced from the FinnGen DF1 GWAS (g) versus the bottom-line procedure for European ancestry (h).
Figure 6. Trans-ancestry bottom-line association and downstream results on the A2FKP for type 2 diabetes (T2D; https://a2f.hugeamp.org/phenotype.html?phenotype=T2D).
(a) Manhattan plot and QQ plot of the bottom-line GWAS. (b) Top variant association signals. (c) Gene-level associations produced by MAGMA.
