BMC Med Genomics. 2022 Apr 1;15(1):74.
doi: 10.1186/s12920-022-01216-w.

The landscape of GWAS validation; systematic review identifying 309 validated non-coding variants across 130 human diseases

Ammar J Alsheikh et al. BMC Med Genomics. 2022.

Abstract

Background: The remarkable growth of genome-wide association studies (GWAS) has created a critical need to experimentally validate disease-associated variants, 90% of which are non-coding.

Methods: To determine how the field is addressing this urgent need, we performed a comprehensive literature review identifying 36,676 articles. These were reduced to 1454 articles through a set of filters using natural language processing and ontology-based text-mining. This was followed by manual curation and cross-referencing against the GWAS Catalog, yielding a final set of 286 articles.
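The filtering funnel described in the Methods can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' pipeline: it uses only the three stage totals stated above, and the stage labels and the `funnel` helper are illustrative names introduced here.

```python
# Stage totals as stated in the Methods paragraph; labels are illustrative.
stages = [
    ("Initial literature search", 36_676),
    ("After NLP and ontology-based text-mining filters", 1_454),
    ("After manual curation and GWAS Catalog cross-referencing", 286),
]


def funnel(stages):
    """Return (label, kept, excluded_since_previous, share_of_initial) per stage."""
    initial = stages[0][1]
    previous = initial
    rows = []
    for label, kept in stages:
        rows.append((label, kept, previous - kept, kept / initial))
        previous = kept
    return rows


for label, kept, excluded, share in funnel(stages):
    print(f"{label}: kept {kept}, excluded {excluded}, {share:.2%} of initial")
```

Under these numbers, the automated filters remove the large majority of the initial search results, and fewer than 1% of the retrieved articles reach the final set.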

Results: We identified 309 experimentally validated non-coding GWAS variants, regulating 252 genes across 130 human disease traits. These variants covered a variety of regulatory mechanisms. Interestingly, 70% (215/309) acted through cis-regulatory elements, with the remainder acting through promoters (22%, 70/309) or non-coding RNAs (8%, 24/309). Several validation approaches were utilized in these studies, including gene expression (n = 272), transcription factor binding (n = 175), reporter assays (n = 171), in vivo models (n = 104), genome editing (n = 96) and chromatin interaction (n = 33).
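The proportions in the Results can be recomputed from the reported counts. The sketch below assumes that the "n" values for validation approaches count articles out of the 286 included (an interpretation consistent with the Fig. 5 caption, but still an assumption); the dictionary and function names are hypothetical. The abstract reports whole-number percentages, so the exact fractions computed here may differ slightly from them.

```python
TOTAL_VARIANTS = 309   # validated non-coding variants (from the Results)
TOTAL_ARTICLES = 286   # included articles (from the Methods)

# Variant classes are mutually exclusive, so their counts should sum to the total.
mechanism_counts = {
    "cis-regulatory element": 215,
    "promoter": 70,
    "non-coding RNA": 24,
}

# Validation approaches overlap (one article can use several),
# so these counts are not expected to sum to TOTAL_ARTICLES.
approach_counts = {
    "gene expression": 272,
    "transcription factor binding": 175,
    "reporter assays": 171,
    "in vivo models": 104,
    "genome editing": 96,
    "chromatin interaction": 33,
}


def shares(counts, total):
    """Map each category to its fraction of the given total."""
    return {name: n / total for name, n in counts.items()}


mechanism_shares = shares(mechanism_counts, TOTAL_VARIANTS)
approach_shares = shares(approach_counts, TOTAL_ARTICLES)

for name, frac in mechanism_shares.items():
    print(f"{name}: {frac:.1%} of validated variants")
for name, frac in approach_shares.items():
    print(f"{name}: {frac:.1%} of included articles")
```

Keeping the two totals separate matters here: variant classes partition the 309 variants, whereas validation approaches are counted per article and can co-occur.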

Conclusions: This review of the literature is the first to systematically evaluate the status and the landscape of experimentation being used to validate non-coding GWAS-identified variants. Our results clearly underscore the multifaceted approach needed for experimental validation and have practical implications for variant prioritization and target gene nomination. While the field has a long way to go to validate the thousands of GWAS associations, we show that progress is being made and provide exemplars of validation studies covering a wide variety of mechanisms, target genes, and disease areas.

Keywords: Experimental validation; Functional variant; GWAS; Non-coding; Systematic review.

Conflict of interest statement

AA, SW, EK, JR, SG, ST and HJ are employees of AbbVie. LS, JWD and JL were employees of AbbVie at the time of the study.

Figures

Fig. 1
Systematic literature search and validation approach. Flow diagram demonstrating the systematic literature search strategy starting with A broad Medline search including all potentially related articles. The search included several concepts related to GWAS, non-coding contexts and other related terms detailed in Additional file 1. B Using text-mining of article titles, abstracts and metadata, we built seven filters to narrow down the search results which excluded 35,222 articles. Exact search terms and their combinations used in the filters are provided in Additional file 2. C 1454 articles of interest that passed all the filters were manually screened and evaluated for eligibility. D Through manual curation an additional set of 579 articles was excluded. E 875 eligible articles that passed manual curation were annotated to identify key information from each study. F These articles proceeded to cross-referencing against the GWAS Catalog to ensure that the validated variants and their reported associated disease trait match known GWAS associations. G Cross-referencing excluded 589 articles with poor GWAS trait matches or no variant match. H The final systematic review includes 286 articles. Reasons for exclusion at each stage are shown in red on the right side and described in more detail in the main text
Fig. 2
Map of 309 validated GWAS non-coding variants. The Circos plot displays the 309 experimentally validated variants studied within the 286 included articles. The outermost layer (i) shows the validated variants’ 252 target genes, (ii) the chromosomal map, (iii) the location of validated variants marked by their rsIDs, (iv) using higher order ontology mapping, we display inner links between variants associated with diseases in the same category. Disease systems that contain ten or more validated variants are displayed, while those containing fewer than ten validated variants are grouped in the “Others” category, and (v) the manually annotated validated variant class. Additional File 3 contains all variant details and annotations
Fig. 3
Functional validation remains the bottleneck of GWAS follow-up. A Comparison of the number of published studies in the GWAS Catalog and non-coding variant validation studies over time. B Relationship between the ratio of validated non-coding variants to the total GWAS variants and disease category mean heritability. C Linkage disequilibrium between the variant reported in the GWAS Catalog and the validated variant. D Distance between validated variant and GWAS Catalog-reported variant. E Global minor allele frequency (MAF) of validated variants in 1000 Genomes Phase 3. F Location of experimentally validated non-coding GWAS variants in relation to all protein-coding genes compared to GWAS lead variants
Fig. 4
Non-coding variants regulate 252 target genes through diverse mechanisms. A Illustration of some of the diverse mechanisms of regulation within each variant category. Examples of each mechanism from included studies are discussed in the text. B Cumulative number of validated variants grouped by non-coding variant categories over time. C We used Encode’s Biomart and hg38 to calculate the distance (in kb) between validated variants and their target gene’s closest transcription start site (TSS). The graph plots the number of variant-gene pairs grouped by variant class. Variants more than 200 kb away are plotted at 200 kb. D Distribution of CRE variants relative to their target gene. CRE = Cis-Regulatory Element, ncRNA = non-coding RNA
Fig. 5
Studies utilize multiple avenues in validating non-coding variants. Using text-mining of abstracts and metadata, we examined the utilization of different avenues for non-coding variant validation across 286 included articles. The six broad categories were gene expression, reporter assays, transcription factor binding, in vivo or animal models, genome editing, and chromatin interaction. The upset plot shows the overlap of the variant validation avenues and the number of articles. The intersection size denotes the number of articles that used the combination of validation categories below it. The color denotes the number of avenues used: pink = 6, orange = 5, green = 4, black = 3, blue = 2, red = 1. The set size bars on the right reflect the total number of studies that used each category
