PLoS One. 2012;7(11):e50610.
doi: 10.1371/journal.pone.0050610. Epub 2012 Nov 30.

Assessment of genotype imputation performance using 1000 Genomes in African American studies


Dana B Hancock et al. PLoS One. 2012.

Abstract

Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared with less admixed populations. Imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, including African Americans (ASW), but its imputation performance has been evaluated only to a limited extent. Using 595 African Americans genotyped on Illumina's HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs at which the imputed and true genotypes agree); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated squared correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%-93%), but IMPUTE2 had the highest IQS (81%-83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). For most programs, imputation quality decreased as more distantly related reference populations were added, due entirely to the introduction of low-frequency SNPs (MAF ≤ 2%) that are monomorphic in the more closely related panels. While imputation was optimized by using IMPUTE2 with the ALL panel (average r2hat = 0.86 for SNPs with MAF > 2%), use of the ALL panel for African American studies requires careful interpretation of the population specificity and imputation quality of low-frequency SNPs.
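The abstract contrasts the average r2hat over all imputed SNPs with the average restricted to SNPs with MAF > 2%. A minimal Python sketch of that calculation follows, assuming per-SNP (MAF, r2hat) pairs are already available from the imputation output; the function name, input layout, and toy values are hypothetical and not taken from the study.

```python
from statistics import mean


def average_r2hat(snps, min_maf=None):
    """Average r2hat over imputed SNPs.

    snps: list of (maf, r2hat) pairs (hypothetical input layout).
    min_maf: if given, keep only SNPs with MAF strictly greater than this value.
    Returns NaN when no SNP passes the filter.
    """
    values = [r2 for maf, r2 in snps if min_maf is None or maf > min_maf]
    return mean(values) if values else float("nan")


# Hypothetical toy data: two common SNPs imputed well, two rare SNPs imputed poorly.
toy_snps = [(0.30, 0.75), (0.10, 1.0), (0.01, 0.25), (0.005, 0.0)]
print(average_r2hat(toy_snps))        # 0.5
print(average_r2hat(toy_snps, 0.02))  # 0.875
```

In this toy example, the rare SNPs with low r2hat pull the overall average down even though the common SNPs are imputed just as well, which resembles the pattern the abstract attributes to adding low-frequency SNPs from larger reference panels.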


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Concordance resulting from four different imputation programs and three different 1000 Genomes (February 2012 release) reference panels.
Concordance rates were based on masking 2% of the genotyped SNPs on chromosome 22 and comparing imputed and true genotypes. The number of subjects corresponding to each reference panel is shown in parentheses.
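The masking procedure in this caption can be sketched as follows: hide a random 2% of genotyped SNPs, impute them, then score the percentage of imputed calls that match the held-out genotypes. This is a hypothetical Python illustration, not the authors' pipeline. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and that imputed dosages have already been converted to best-guess calls; the helper names and the seed are illustrative.

```python
import random


def choose_masked_snps(snp_ids, fraction=0.02, seed=0):
    """Randomly select a fraction of genotyped SNPs to mask before imputation."""
    snp_ids = list(snp_ids)
    k = max(1, round(fraction * len(snp_ids)))  # mask at least one SNP
    rng = random.Random(seed)  # fixed seed so the mask is reproducible
    return set(rng.sample(snp_ids, k))


def concordance(true_calls, imputed_calls):
    """Percentage of masked genotypes whose imputed call equals the true call."""
    if len(true_calls) != len(imputed_calls) or not true_calls:
        raise ValueError("need two non-empty genotype lists of equal length")
    agree = sum(t == i for t, i in zip(true_calls, imputed_calls))
    return 100.0 * agree / len(true_calls)


masked = choose_masked_snps(range(1000))
print(len(masked))  # 20
print(concordance([0, 1, 2, 1, 0], [0, 1, 1, 1, 0]))  # 80.0
```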
Figure 2. Imputation quality score (IQS) resulting from four different imputation programs and three different 1000 Genomes (February 2012) reference panels.
IQS results were based on masking 2% of the genotyped SNPs and adjusting the concordance rate for chance agreement between imputed and true genotypes. The number of subjects corresponding to each reference panel is shown in parentheses.
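IQS removes the agreement expected by chance alone from the concordance rate. A minimal sketch follows, assuming IQS takes the form of Cohen's kappa computed over hard genotype calls for a single SNP; that exact formula is an assumption, since the caption states only that chance agreement is removed. The function name is hypothetical, and the result is a fraction (multiply by 100 for the percentages quoted in the abstract).

```python
from collections import Counter


def imputation_quality_score(true_calls, imputed_calls):
    """Kappa-style IQS for one SNP: (observed - chance) / (1 - chance).

    Genotypes are hard calls such as 0, 1, 2 (minor-allele counts).
    Returns NaN when chance agreement is 1 (one genotype class in both lists).
    """
    n = len(true_calls)
    if n == 0 or n != len(imputed_calls):
        raise ValueError("need two non-empty genotype lists of equal length")
    observed = sum(t == i for t, i in zip(true_calls, imputed_calls)) / n
    true_counts = Counter(true_calls)
    imputed_counts = Counter(imputed_calls)
    classes = set(true_counts) | set(imputed_counts)
    # Chance agreement: probability that independent draws from each marginal match.
    chance = sum(true_counts[g] * imputed_counts[g] for g in classes) / n**2
    if chance == 1.0:
        return float("nan")
    return (observed - chance) / (1.0 - chance)


# Hypothetical rare SNP: 1 carrier among 10 people, imputed as all homozygous major.
rare_true = [0] * 9 + [1]
rare_imputed = [0] * 10
print(imputation_quality_score(rare_true, rare_imputed))  # 0.0
```

Concordance for this toy SNP is 90%, yet its IQS is 0, because always calling the common genotype already reaches 90% agreement by chance. This illustrates why the abstract describes IQS as particularly informative for low-MAF SNPs.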
Figure 3. Average r2hat values resulting from four different imputation programs and three different 1000 Genomes (February 2012) reference panels.
r2hat values were averaged across all imputed SNPs on chromosome 22. The number of subjects corresponding to each reference panel is shown in parentheses.
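The caption does not say how each program computes r2hat. One widely used estimator, in the style of MaCH's Rsq, compares the variance of the imputed allele dosages with the variance expected under Hardy-Weinberg equilibrium if every genotype were known. The Python sketch below assumes that form and dosages on a 0-2 scale; IMPUTE2 and BEAGLE report their own related quality measures, which this sketch does not reproduce, and the function name is hypothetical.

```python
from statistics import mean, pvariance


def r2hat_from_dosages(dosages):
    """Estimate r2 for one SNP as var(dosage) / (2p(1 - p)).

    dosages: expected minor-allele counts in [0, 2], one per individual.
    p: allele frequency estimated from the dosages themselves.
    Returns NaN for SNPs that look monomorphic in the imputed data.
    """
    p = mean(dosages) / 2.0
    expected_var = 2.0 * p * (1.0 - p)  # genotype variance under HWE
    if expected_var == 0.0:
        return float("nan")
    return pvariance(dosages) / expected_var


# Confident calls keep the full variance; uncertain dosages shrink toward the mean.
print(r2hat_from_dosages([0.0, 1.0, 1.0, 2.0]))  # 1.0
print(r2hat_from_dosages([0.5, 1.0, 1.0, 1.5]))  # 0.25
```

Averaging such a per-SNP value over all imputed SNPs on chromosome 22 would give a single summary per program and reference panel.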
Figure 4. Average r2hat, based on imputation using IMPUTE2, across the minor allele frequency (MAF) spectrum.
Imputation was conducted for all SNPs available on the YRI+CEU+ASW (N = 234, in red), AFR+EUR (N = 625, in green), or the ALL (N = 1,092, in blue) reference panel from 1000 Genomes. Imputed polymorphic SNPs were divided into MAF intervals of 1%, and their average r2hat values were calculated within each interval.
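The binning step in this caption can be sketched as follows: polymorphic SNPs are grouped into 1% MAF intervals and r2hat is averaged within each interval. This is a hypothetical Python illustration. It assumes per-SNP (MAF, r2hat) pairs as input and half-open intervals [k%, (k+1)%), since the caption does not say which side of each boundary is inclusive.

```python
import math
from collections import defaultdict
from statistics import mean


def average_r2hat_by_maf_bin(snps, bin_width=0.01):
    """Average r2hat within MAF intervals of width bin_width (1% by default).

    snps: iterable of (maf, r2hat) pairs (hypothetical input layout).
    Bin k holds SNPs with k * bin_width <= maf < (k + 1) * bin_width; a MAF lying
    exactly on a boundary may land in the neighbouring bin because of
    floating-point rounding.
    """
    bins = defaultdict(list)
    for maf, r2 in snps:
        if maf <= 0.0:
            continue  # keep only polymorphic SNPs
        bins[math.floor(maf / bin_width)].append(r2)
    return {k: mean(v) for k, v in sorted(bins.items())}


toy = [(0.015, 0.25), (0.018, 0.75), (0.252, 1.0), (0.257, 0.5)]
print(average_r2hat_by_maf_bin(toy))  # {1: 0.5, 25: 0.75}
```

Computing this once per reference panel and plotting bin index against mean r2hat would give curves laid out like those in the figure.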
