Hum Genet. 2022 Feb;141(2):229-238.
doi: 10.1007/s00439-021-02407-8. Epub 2022 Jan 4.

SNP characteristics and validation success in genome wide association studies


Olga Y Gorlova et al. Hum Genet. 2022 Feb.

Abstract

Genome-wide association studies (GWASs) have identified tens of thousands of single nucleotide polymorphisms (SNPs) associated with human diseases and traits. A significant fraction of GWAS findings may be false positives, and the gold standard for establishing a true positive is independent validation. The goal of this study was to identify SNP features associated with validation success. We analyzed summary statistics from the Catalog of Published GWASs. Because our goal was to analyze reproducibility, we focused on diseases/phenotypes targeted by at least 10 GWASs. GWASs were arranged into discovery-validation pairs by time of publication, with the discovery GWAS published before the validation GWAS. We used four definitions of validation success that differ in stringency. Associations of SNP features with validation success were consistent across the definitions. The strongest predictor of SNP validation was the level of statistical significance in the discovery GWAS. The magnitude of the effect size was associated with validation success in a non-linear manner. SNPs with risk allele frequencies in the range 30-70% showed a higher validation success rate than rarer or more common SNPs. Missense, 5'UTR, and stop-gained SNPs, as well as SNPs located in transcription factor binding sites, had a higher validation success rate than intergenic, intronic, and synonymous SNPs. Validation success was positively associated with the level of evolutionary conservation of the sites. In addition, validation success was higher when the discovery and validation GWASs targeted the same ethnicity. All predictors of validation success remained significant in a multivariate logistic regression model, indicating that each contributes independently. In conclusion, we identified SNP features that predict validation success of GWAS hits. These features can be used to select SNPs for validation and downstream functional studies.
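The pairing step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the Study record and its field names are hypothetical (they are not the GWAS Catalog's schema), every earlier/later combination within a phenotype is treated as a candidate pair, and studies published on the same day are assumed not to form a pair.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Study:
    """A GWAS record; the field names are hypothetical, not the Catalog's schema."""
    study_id: str
    phenotype: str
    published: date


def discovery_validation_pairs(studies, min_studies=10):
    """Return (discovery, validation) pairs of GWASs on the same phenotype.

    Only phenotypes with at least `min_studies` GWASs are kept, and the
    discovery study must be published strictly before the validation study.
    Pairing every earlier study with every later one is an assumption.
    """
    by_phenotype = defaultdict(list)
    for study in studies:
        by_phenotype[study.phenotype].append(study)

    pairs = []
    for group in by_phenotype.values():
        if len(group) < min_studies:
            continue  # too few GWASs to assess reproducibility
        group.sort(key=lambda s: s.published)
        # combinations() preserves the sorted order, so `earlier` is never later.
        for earlier, later in combinations(group, 2):
            if earlier.published < later.published:  # same-day pairs skipped (assumption)
                pairs.append((earlier, later))
    return pairs
```

With 10 GWASs on one phenotype published on distinct dates, this yields 45 ordered pairs; a phenotype with only 3 GWASs is dropped by the at-least-10 filter.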


Conflict of interest statement

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Figures

Figure 1.
Validation rate for different diseases/traits across all SNPs. Vertical black bars show 95% confidence intervals (CIs). Phenotypes are arranged from lowest to highest validation rate. (a) Strict, relaxed, and soft definitions of validation success. (b) The same as (a) plus the ultra-soft definition of validation success.
Figure 2.
The proportion of validated SNPs in deciles of −log(p) in the discovery GWAS. X-coordinates of the dots represent the median −log(p) in each decile. Bars represent 95% CIs for the proportion of validated SNPs. (a) Strict, relaxed, and soft definitions of validation success. (b) The same as (a) plus the ultra-soft definition of validation success.
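The decile summaries plotted in Figures 2, 3, 4, 5, and 7 can be sketched as below: sort SNPs by a feature, split them into ten equal-count groups, and report each group's median feature value (the x-coordinate) and proportion validated with a 95% CI. The captions do not name the CI method, so the Wilson score interval used here is an assumption, as are the function names; ties at a decile boundary are split by sort order rather than kept together.

```python
import math
import statistics


def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion (assumed CI method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def decile_summary(values, validated):
    """Return (median feature, proportion validated, CI low, CI high) per decile.

    `values` holds one feature per SNP (e.g. -log(p) or OR in the discovery
    GWAS); `validated` holds a matching bool for validation success.
    """
    if len(values) != len(validated):
        raise ValueError("values and validated must have the same length")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    rows = []
    for d in range(10):
        idx = order[d * n // 10:(d + 1) * n // 10]  # equal-count rank groups
        if not idx:
            continue  # possible only when there are fewer than 10 SNPs
        hits = sum(1 for i in idx if validated[i])
        low, high = wilson_ci(hits, len(idx))
        rows.append((statistics.median(values[i] for i in idx),
                     hits / len(idx), low, high))
    return rows
```

For example, with feature values 1 to 100 and only values above 50 validated, the first decile has median 5.5 and proportion 0.0, and the last has median 95.5 and proportion 1.0. The Wilson interval is chosen here because, unlike the simple normal approximation, it stays inside [0, 1] and does not collapse to zero width when a decile is all validated or all unvalidated.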
Figure 3.
The proportion of validated SNPs in deciles of the odds ratio (OR) in the discovery GWAS (Supplementary Table S2). X-coordinates of the dots represent the median OR in each decile. Bars represent 95% CIs for the proportion of validated SNPs. (a) Strict, relaxed, and soft definitions of validation success. (b) The same as (a) plus the ultra-soft definition of validation success.
Figure 4.
The proportion of validated SNPs in deciles of risk allele frequency in the discovery GWAS. X-coordinates of the dots represent the median risk allele frequency in each decile. Bars represent 95% CIs for the proportion of validated SNPs. (a) Strict, relaxed, and soft definitions of validation success. (b) The same as (a) plus the ultra-soft definition of validation success.
Figure 5.
The proportion of validated SNPs in deciles of minor allele frequency (MAF) in the discovery GWAS. X-coordinates of the dots represent the median MAF in each decile. Bars represent 95% CIs for the proportion of validated SNPs. (a) Strict, relaxed, and soft definitions of validation success. (b) The same as (a) plus the ultra-soft definition of validation success.
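Figures 4 and 5 bin the same SNPs by two related quantities. Assuming MAF here is the usual minor allele frequency of a biallelic SNP, it is obtained by folding the risk allele frequency around 0.5, so a risk allele at 20% and one at 80% both have MAF 20%. The sketch below shows that folding and the 30-70% risk allele frequency band named in the abstract; the function names are hypothetical.

```python
def minor_allele_frequency(risk_allele_freq):
    """Fold a risk allele frequency into an MAF (biallelic SNP assumed)."""
    if not 0.0 <= risk_allele_freq <= 1.0:
        raise ValueError("frequency must lie in [0, 1]")
    # The minor allele is whichever of the two alleles is rarer.
    return min(risk_allele_freq, 1.0 - risk_allele_freq)


def in_intermediate_range(risk_allele_freq, low=0.30, high=0.70):
    """True if the risk allele frequency lies in the 30-70% band."""
    return low <= risk_allele_freq <= high
```

Because the folding merges common risk alleles with rare ones, a trend across risk allele frequency deciles need not look the same across MAF deciles.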
Figure 6.
Validation rate for different types of SNPs. (a) Strict, relaxed, and soft definitions of validation success. (b) The same as (a) plus the ultra-soft definition of validation success. Vertical bars show 95% CIs.
Figure 7.
The proportion of validated SNPs in deciles of PhyloP score. X-coordinates of the dots represent the median PhyloP score in each decile. Bars represent 95% CIs for the proportion of validated SNPs. (a) Strict, relaxed, and soft definitions of validation success. (b) The same as (a) plus the ultra-soft definition of validation success.
Figure 8.
The proportion of validated SNPs under the strict, relaxed, soft, and ultra-soft definitions of validation success. Vertical bars show 95% CIs.

