[Preprint]. 2024 Oct 23:2024.10.23.619767.
doi: 10.1101/2024.10.23.619767.

The NHGRI-EBI GWAS Catalog: standards for reusability, sustainability and diversity

Maria Cerezo et al. bioRxiv.

Update in

  • The NHGRI-EBI GWAS Catalog: standards for reusability, sustainability and diversity.
    Cerezo M, Sollis E, Ji Y, Lewis E, Abid A, Bircan KO, Hall P, Hayhurst J, John S, Mosaku A, Ramachandran S, Foreman A, Ibrahim A, McLaughlin J, Pendlington Z, Stefancsik R, Lambert SA, McMahon A, Morales J, Keane T, Inouye M, Parkinson H, Harris LW. Nucleic Acids Res. 2025 Jan 6;53(D1):D998-D1005. doi: 10.1093/nar/gkae1070. PMID: 39530240. Free PMC article.

Abstract

The NHGRI-EBI GWAS Catalog serves as a vital resource for the genetic research community, providing access to the most comprehensive database of human GWAS results. Currently, it contains close to 7,000 publications for more than 15,000 traits, from which more than 625,000 lead associations have been curated. Additionally, 85,000 full genome-wide summary statistics datasets - containing association data for all variants in the analysis - are available for downstream analyses such as meta-analysis, fine-mapping, Mendelian randomisation or development of polygenic risk scores. As a centralised repository for GWAS results, the GWAS Catalog sets and implements standards for data submission and harmonisation, and encourages the use of consistent descriptors for traits, samples and methodologies. We share processes and vocabulary with the PGS Catalog, improving interoperability for a growing user group. Here, we describe the latest changes in data content, improvements in our user interface, and the implementation of the GWAS-SSF standard format for summary statistics. We address the challenges of handling the rapid increase in large-scale molecular quantitative trait GWAS and the need for sensitivity in the use of population and cohort descriptors while maintaining data interoperability and reusability.


Conflict of interest statement

M.I. is a trustee of the Public Health Genomics (PHG) Foundation, a member of the Scientific Advisory Board of Open Targets, and has research collaborations with AstraZeneca, Nightingale Health and Pfizer, which are unrelated to this study. No other conflicts of interest were reported by the authors.

Figures

Figure 1.
(A) Number of studies added to the GWAS Catalog per year between 2019 and 2024 (until end of August), showing the growing proportion of molecular quantitative trait studies, defined as all studies annotated with the ontology terms “protein measurement” (EFO_0004747), “metabolite measurement” (EFO_0004725), “lipid measurement” (EFO_0004529) or their child terms. (B) Of all studies annotated with these terms, 54% were annotated with a protein term, 32% with a metabolite term and 11% with a lipid term, while 3% included two or more categories.
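The grouping rule in the Figure 1 caption (a study counts towards a category if it is annotated with one of the three named ontology terms or any of their child terms) can be sketched as below. This is only an illustration under stated assumptions, not the Catalog's own pipeline: the three root identifiers come from the caption, while the `CHILD_*`, `GRANDCHILD_*` and `UNRELATED_TERM` identifiers, the single-parent `PARENT_OF` map, the toy `studies` dictionary and all function names are hypothetical. The real ontology allows a term to have several parents, which this sketch ignores.

```python
from collections import Counter

# Root terms named in the Figure 1 caption.
ROOT_CATEGORIES = {
    "EFO_0004747": "protein",
    "EFO_0004725": "metabolite",
    "EFO_0004529": "lipid",
}

# Hypothetical child -> parent links. These identifiers are placeholders,
# not real ontology terms, and each term has one parent for simplicity.
PARENT_OF = {
    "CHILD_PROTEIN_A": "EFO_0004747",
    "CHILD_METABOLITE_A": "EFO_0004725",
    "CHILD_LIPID_A": "EFO_0004529",
    "GRANDCHILD_LIPID_A": "CHILD_LIPID_A",
}


def root_category(term):
    """Walk up the parent links and return the category label, or None."""
    seen = set()
    while term is not None and term not in seen:
        if term in ROOT_CATEGORIES:
            return ROOT_CATEGORIES[term]
        seen.add(term)  # guard against accidental cycles in the toy map
        term = PARENT_OF.get(term)
    return None


def classify_study(terms):
    """Return one category label, 'two or more', or None if no term matches."""
    categories = {root_category(t) for t in terms} - {None}
    if not categories:
        return None
    if len(categories) > 1:
        return "two or more"
    return categories.pop()


def category_shares(studies):
    """Percentage of matching studies per category, as in panel (B)."""
    labels = (classify_study(terms) for terms in studies.values())
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100 * n / total for label, n in counts.items()}


# Toy input: study identifier -> list of annotated ontology terms.
studies = {
    "study_1": ["CHILD_PROTEIN_A"],
    "study_2": ["EFO_0004725"],
    "study_3": ["GRANDCHILD_LIPID_A"],
    "study_4": ["EFO_0004747", "CHILD_METABOLITE_A"],
    "study_5": ["UNRELATED_TERM"],
}

shares = category_shares(studies)
print(shares)
```

Studies with no matching term are excluded from the denominator, mirroring the caption's "of all studies annotated with these terms".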
Figure 2.
(A) New layout of GWAS Catalog UI trait page for “lung cancer”. Data is loaded into each tab (e.g. Full summary statistics) only when users click, to improve performance and load times [1]. Improved pagination allows rows to be loaded in batches, with the user able to choose how many rows to display [2]. If required, pages with many rows can be downloaded for analysis offline without having to load the full dataset in the browser. (B) New page displaying all studies in the Catalog, with checkboxes to filter the table to show only specific study types (currently GxE or seqGWAS studies). (C) The Study Information panel on each Study page now includes a flag to indicate if it is a GxE study.
Figure 3.
Contribution of individuals from UK Biobank (UKB) to the breakdown by ancestry label of (a) individuals and (b) associations in the GWAS Catalog. The effect on publications and studies can be found in Supplementary Figure 4.

