Eur J Hum Genet. 2014 Nov;22(11):1321-6.
doi: 10.1038/ejhg.2014.19. Epub 2014 Jun 4.

Improved imputation quality of low-frequency and rare variants in European samples using the 'Genome of The Netherlands'


Patrick Deelen et al. Eur J Hum Genet. 2014 Nov.

Abstract

Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated comprehensively. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputed genotypes with 'true' genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed a significant improvement in imputation quality for rare variants (MAF 0.05-0.5%) compared with 1000G. In Dutch samples, the mean observed squared Pearson correlation, r², increased from 0.61 to 0.71. We also saw improved imputation accuracy for other European populations (in the British samples, r² improved from 0.58 to 0.65, and in the Italian samples from 0.43 to 0.47). A combined reference set comprising 1000G and GoNL improved the imputation of rare variants even further. The Italian samples benefitted the most from this combined reference (the mean r² increased from 0.47 to 0.50). We conclude that the creation of a large population-specific reference is advantageous for imputing rare variants and that a combined reference panel across multiple populations yields the best imputation results.
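The benchmark metric described in the abstract, a per-variant r² between imputed and 'true' genotypes averaged within a MAF range, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the input layout (one list of true genotypes coded 0/1/2 and one list of imputed dosages per variant), and the choice to estimate MAF from the true genotypes are all assumptions made for the example.

```python
def squared_pearson(x, y):
    """Squared Pearson correlation between two equal-length sequences.

    Returns None when either sequence has zero variance (r² is undefined).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return None
    return (sxy * sxy) / (sxx * syy)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as alternate-allele counts (0, 1, 2)."""
    allele_freq = sum(genotypes) / (2 * len(genotypes))
    return min(allele_freq, 1 - allele_freq)


def mean_r2_in_maf_bin(variants, maf_low, maf_high):
    """Mean r² over variants whose MAF falls in [maf_low, maf_high).

    `variants` is a list of (true_genotypes, imputed_dosages) pairs;
    this layout is hypothetical. MAF is estimated from the true
    genotypes, which is an assumption of this sketch.
    """
    values = []
    for true_gt, dosage in variants:
        maf = minor_allele_frequency(true_gt)
        if maf_low <= maf < maf_high:
            r2 = squared_pearson(true_gt, dosage)
            if r2 is not None:
                values.append(r2)
    return sum(values) / len(values) if values else None
```

With this layout, the rare-variant range quoted in the abstract (MAF 0.05-0.5%) would correspond to a call such as `mean_r2_in_maf_bin(variants, 0.0005, 0.005)`.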


Figures

Figure 1
Comparison of imputation quality of rare variants using the 1000G data, GoNL, and the combined reference panel.
Figure 2
Clustering of reference and study samples. PC1 and PC2 reveal three main clusters (a): Tuscans from Italy (TSI), Finnish (FIN), and a Western European cluster comprising the CEU (Utah residents with Northern and Western European ancestry), GBR (British), and GoNL samples. Most of our GWAS samples clustered in a similar way to the corresponding 1000G/GoNL samples (b).
Figure 3
Calibration of posterior probabilities. The posterior probabilities were, in general, well calibrated, although there were a few deviations from the expected accuracy (a). For common and low-frequency variants (b and c), we observed a strong correlation (r² = 0.97 and 0.91, respectively) between the IMPUTE2 info metric and the observed r². For the rare variants (d), however, the relation between predicted and observed quality was weaker: the correlation was 0.70, with several large deviations from the diagonal.
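One common way to check calibration of the kind shown in panel (a) is to bin best-guess genotype calls by their posterior probability and compare the mean posterior in each bin with the fraction of calls that match the true genotype. The sketch below illustrates that idea only; the page does not describe the authors' exact procedure, and the function name, the input layout, and the default of ten equal-width bins are assumptions.

```python
def calibration_table(calls, n_bins=10):
    """Compare posterior probabilities with observed concordance.

    `calls` is a list of (posteriors, true_genotype) pairs, where
    `posteriors` holds the probabilities of genotypes 0, 1 and 2 and
    `true_genotype` is 0, 1 or 2. This layout is hypothetical.

    Returns one (mean_posterior, observed_accuracy, n_calls) tuple per
    non-empty bin, ordered from the lowest to the highest bin.
    """
    totals = [0] * n_bins
    hits = [0] * n_bins
    prob_sums = [0.0] * n_bins
    for posteriors, true_gt in calls:
        # Best-guess genotype: the one with the highest posterior.
        best = max(range(3), key=lambda g: posteriors[g])
        p = posteriors[best]
        # Equal-width bins over [0, 1]; p == 1.0 goes in the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        totals[b] += 1
        prob_sums[b] += p
        hits[b] += int(best == true_gt)
    return [
        (prob_sums[b] / totals[b], hits[b] / totals[b], totals[b])
        for b in range(n_bins)
        if totals[b] > 0
    ]
```

For well-calibrated posteriors, the first two values in each tuple should be close to each other, which corresponds to points lying near the diagonal in a calibration plot.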

