Hum Mol Genet. 2024 Jul 6;33(14):1207-1214.
doi: 10.1093/hmg/ddae062.

Impact of the inaccessible genome on genotype imputation and genome-wide association studies

Eva König et al. Hum Mol Genet.

Abstract

Genotype imputation is widely used in genome-wide association studies (GWAS). However, both the genotyping chips and imputation reference panels are dependent on next-generation sequencing (NGS). Due to the nature of NGS, some regions of the genome are inaccessible to sequencing. To date, there has been no complete evaluation of these regions and their impact on the identification of associations in GWAS remains unclear. In this study, we systematically assess the extent to which variants in inaccessible regions are underrepresented on genotyping chips and imputation reference panels, in GWAS results and in variant databases. We also determine the proportion of genes located in inaccessible regions and compare the results across variant masks defined by the 1000 Genomes Project and the TOPMed program. Overall, fewer variants were observed in inaccessible regions in all categories analyzed. Depending on the mask used and normalized for region size, only 4%-17% of the genotyped variants are located in inaccessible regions and 52 to 581 genes were almost completely inaccessible. From the Cooperative Health Research in South Tyrol (CHRIS) study, we present a case study of an association located in an inaccessible region that is driven by genotyped variants and cannot be reproduced by imputation in GRCh37. We conclude that genotyping, NGS, genotype imputation and downstream analyses such as GWAS and fine mapping are systematically biased in inaccessible regions, due to missed variants and spurious associations. To help researchers assess gene and variant accessibility, we provide an online application (https://gab.gm.eurac.edu).

Keywords: GWAS; NGS; accessibility; genotyping chips; web tool.
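
The phrase "normalized for region size" in the abstract can be illustrated with a small sketch. The abstract does not spell out the authors' exact procedure, so the code below rests on assumptions: the mask is a list of half-open [start, end) intervals on one chromosome, and the normalized share is the variant density in inaccessible sequence divided by the sum of the densities in inaccessible and accessible sequence. The function names, the toy coordinates and the toy positions are all hypothetical.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def count_in_mask(positions, mask):
    """Count positions that fall inside any interval of a merged mask."""
    starts = [s for s, _ in mask]
    count = 0
    for pos in positions:
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < mask[i][1]:
            count += 1
    return count


def size_normalized_share(positions, inaccessible, chrom_length):
    """Share of variant density found in inaccessible sequence.

    Assumed definition (not taken from the paper):
    density_in / (density_in + density_out), with density = variants per bp.
    """
    mask = merge_intervals(inaccessible)
    bp_in = sum(e - s for s, e in mask)
    bp_out = chrom_length - bp_in
    if bp_in <= 0 or bp_out <= 0:
        raise ValueError("both region classes must have non-zero size")
    n_in = count_in_mask(positions, mask)
    n_out = len(positions) - n_in
    dens_in = n_in / bp_in
    dens_out = n_out / bp_out
    if dens_in + dens_out == 0:
        raise ValueError("no variants supplied")
    return dens_in / (dens_in + dens_out)


if __name__ == "__main__":
    # Toy data only: a 1000 bp "chromosome" with three inaccessible intervals.
    toy_mask = [(100, 200), (150, 300), (800, 900)]
    toy_variants = [10, 50, 120, 350, 400, 500, 650, 700, 850, 950]
    print(size_normalized_share(toy_variants, toy_mask, 1000))
```

Under this definition a value of 0.5 would mean variants are equally dense in both region classes, and values below 0.5 indicate underrepresentation in inaccessible regions. A real analysis would run per chromosome on a published mask and a chip manifest or VCF rather than on toy lists.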

Figures

Figure 1
Characteristics of accessible and inaccessible regions. Proportion of variants (for panels “reference panel”, “pathogenic ClinVar”, “genotyping chips”, “EBI GWAS Catalog”) or proportion of base pairs (for panels “genes”, “exons”) that fall into accessible and inaccessible regions as defined by the five masks, stratified by the size of the inaccessible region in the different masks. TM = TOPMed.
Figure 2
Number and proportion of genes and exons that are inaccessible. (a) Number of genes that are inaccessible by at least a certain proportion of the gene. (b) Number of genes by the proportion of exonic sequence within this gene that is inaccessible. TM = TOPMed.
Figure 3
Regional association plots (LocusZoom) of the aspartate aminotransferase genome-wide association results 500 kb around reference SNP rs2477642. The linkage disequilibrium between rs2477462 and all other variants is displayed as r² values calculated from the 1000 Genomes Europeans. Genotyped variants are represented as filled circles, imputed variants as filled triangles. The genome-wide significance line is indicated at 5 × 10⁻⁸. The regions of the genome defined as accessible according to the different masks are shown in black; inaccessible regions are shown in white. (a) 1000 Genomes phase 3 (GRCh37), (b) HRC (GRCh37), (c) TOPMed (GRCh38), (d) 1000 Genomes deep (GRCh38).
