PLoS One. 2015 Jan 26;10(1):e0116487.
doi: 10.1371/journal.pone.0116487. eCollection 2015.

Performance of genotype imputation for low frequency and rare variants from the 1000 genomes


Hou-Feng Zheng et al. PLoS One.

Abstract

Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have used HapMap samples as the reference, and the imputation of low frequency and rare variants (minor allele frequency (MAF) < 5%) has not been systematically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genomes panel) are available to facilitate imputation of these variants. To estimate how well low frequency and rare variants can be imputed, we imputed 153 individuals, each genotyped on three arrays (317K, 610K and 1M SNPs), against three reference panels using IMPUTE version 2: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1). These three releases of the 1000 Genomes data differ in sample size, ancestry diversity, number of variants, and frequency spectrum. We found that both the reference panel and the GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other two panels, with a higher concordance rate, a higher proportion of well-imputed variants (info>0.4), and a higher mean info score in each MAF bin. Similarly, the 1M array outperformed the 610K and 317K arrays. However, for very rare variants (MAF ≤ 0.3%), only 0-1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and denser genome-wide genotyping arrays. Yet even with a large reference panel and a dense genotyping array, very rare variants remain difficult to impute.
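
The per-bin summaries described in the abstract (proportion of well-imputed variants and mean info score in each MAF bin) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `summarize_by_maf_bin`, the bin edges in `MAF_BINS`, and the input format (a list of `(maf, info)` pairs) are assumptions made for illustration. Only the info>0.4 threshold and the 0.3% and 5% MAF cut points come from the abstract.

```python
from statistics import mean

# Hypothetical MAF bins as (lower, upper] intervals. The 0.3% and 5% cut
# points appear in the abstract; the 1% edge is an assumption.
MAF_BINS = [(0.0, 0.003), (0.003, 0.01), (0.01, 0.05), (0.05, 0.5)]

# "Well-imputed" threshold stated in the abstract (info > 0.4).
INFO_THRESHOLD = 0.4


def summarize_by_maf_bin(variants, bins=MAF_BINS, threshold=INFO_THRESHOLD):
    """Summarize imputation quality per MAF bin.

    variants: iterable of (maf, info) pairs, one per imputed variant.
    Returns one dict per bin with the variant count, the proportion of
    variants with info above the threshold, and the mean info score.
    """
    variants = list(variants)  # allow generators; we iterate once per bin
    summary = []
    for low, high in bins:
        infos = [info for maf, info in variants if low < maf <= high]
        if not infos:
            # Empty bin: report the count only, with no proportion or mean.
            summary.append({"bin": (low, high), "n": 0,
                            "prop_well_imputed": None, "mean_info": None})
            continue
        n_well = sum(1 for info in infos if info > threshold)
        summary.append({"bin": (low, high), "n": len(infos),
                        "prop_well_imputed": n_well / len(infos),
                        "mean_info": mean(infos)})
    return summary


# Toy input, made up purely to exercise the function (not data from the study).
toy_variants = [(0.002, 0.1), (0.002, 0.5), (0.02, 0.9), (0.02, 0.3), (0.2, 0.95)]
toy_summary = summarize_by_maf_bin(toy_variants)
```

Running the same summary once per reference panel and per array would give the kind of side-by-side comparison the abstract reports, assuming each run's per-variant MAF and info values have been parsed into this form.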


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The proportion of variants by minor allele frequency (MAF) across imputation reference panels.
Figure 2. The proportion of well-imputed SNPs (info>0.4) in different MAF bins across imputation reference panels (Panel A: 317K genotyping array; Panel B: 610K genotyping array; Panel C: 1M genotyping array). Panels D, E and F compare the median info score across the three reference panels for the 317K, 610K and 1M genotyping arrays, respectively.


