J Hum Genet. 2012 Jul;57(7):411-21.
doi: 10.1038/jhg.2012.43. Epub 2012 May 31.

Comprehensive evaluation of imputation performance in African Americans

Pritam Chanda et al. J Hum Genet. 2012 Jul.

Abstract

Imputation of genome-wide single-nucleotide polymorphism (SNP) arrays to a larger known reference panel of SNPs has become a standard and essential part of genome-wide association studies. However, little is known about the behavior of imputation in African Americans with respect to the different imputation algorithms, the reference population(s) and the reference SNP panels used. Genome-wide SNP data (Affymetrix 6.0) from 3207 African American samples in the Atherosclerosis Risk in Communities Study (ARIC) were used to systematically evaluate imputation quality and yield. Imputation was performed with the imputation algorithms MACH, IMPUTE and BEAGLE using several combinations of three reference panels of HapMap III (ASW, YRI and CEU) and 1000 Genomes Project (pilot 1 YRI June 2010 release, EUR and AFR August 2010 and June 2011 releases) panels with SNP data on chromosomes 18, 20 and 22. About 10% of the directly genotyped SNPs from each chromosome were masked, and SNPs common between the reference panels were used for evaluating the imputation quality using two statistical metrics: concordance accuracy and Cohen's kappa (κ) coefficient. The dependencies of these metrics on the minor allele frequencies (MAF) and specific genotype categories (minor allele homozygotes, heterozygotes and major allele homozygotes) were thoroughly investigated to determine the best panel and method for imputation in African Americans. In addition, the power to detect imputed SNPs associated with simulated phenotypes was studied using the mean genotype of each masked SNP in the imputed data. Our results indicate that the genotype concordances after stratification into each genotype category and Cohen's κ coefficient are considerably better equipped to differentiate imputation performance than the traditionally used total concordance statistic, and both statistics improved with increasing MAF irrespective of the imputation method.
We also find that MACH and IMPUTE performed equally well and consistently better than BEAGLE irrespective of the reference panel used. Of the various combinations of reference panels, for both HapMap III and 1000 Genomes Project reference panels, the multi-ethnic panels had better imputation accuracy than those containing samples from a single ethnic group only. The most recent 1000 Genomes Project release (June 2011) yielded a substantially higher number of imputed SNPs than HapMap III and performed as well as or better than the best combined HapMap III reference panels and previous releases of the 1000 Genomes Project.
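
To make the two metrics concrete, the following is a minimal Python sketch of total concordance, genotype-stratified concordance and Cohen's κ for a single masked SNP. It is an illustration only, not the authors' pipeline: the function names are hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1, 2), and imputed genotypes are assumed to be hard calls.

```python
from collections import Counter


def concordance_by_genotype(true_g, imputed_g):
    """Return (total concordance, per-class concordance) for one SNP.

    Genotypes are assumed to be coded as minor-allele counts:
    0 = major allele homozygote, 1 = heterozygote, 2 = minor allele homozygote.
    A class with no true genotypes gets None rather than a made-up value.
    """
    if len(true_g) != len(imputed_g) or not true_g:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    total = sum(t == i for t, i in zip(true_g, imputed_g)) / len(true_g)
    per_class = {}
    for g in (0, 1, 2):
        calls = [i for t, i in zip(true_g, imputed_g) if t == g]
        per_class[g] = sum(i == g for i in calls) / len(calls) if calls else None
    return total, per_class


def cohen_kappa(true_g, imputed_g):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(true_g)
    p_obs = sum(t == i for t, i in zip(true_g, imputed_g)) / n
    true_counts, imp_counts = Counter(true_g), Counter(imputed_g)
    # Chance agreement from the marginal genotype frequencies of both vectors.
    p_exp = sum(true_counts[g] * imp_counts[g] for g in (0, 1, 2)) / n**2
    if p_exp == 1.0:
        return float("nan")  # kappa is undefined when only one class occurs
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical rare SNP: 18 major homozygotes, 2 heterozygotes,
# and an imputation that calls every sample a major homozygote.
true_g = [0] * 18 + [1] * 2
imputed_g = [0] * 20
total, per_class = concordance_by_genotype(true_g, imputed_g)
print(total, per_class, cohen_kappa(true_g, imputed_g))
```

In this toy case total concordance is 0.9 even though no heterozygote is recovered, while heterozygote concordance and κ are both 0. This is consistent with the abstract's finding that stratified concordance and κ separate imputation performance better than total concordance.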

Figures

Figure 1
Distribution of concordance accuracy (CA) of minor allele homozygotes and heterozygotes for (a, b) MACH, (c, d) IMPUTE and (e, f) BEAGLE. (M=MACH, I=IMPUTE and B=BEAGLE).
Figure 2
Distribution of kappa for (a) MACH, (c) IMPUTE and (e) BEAGLE. Power is shown in (b) for MACH, (d) for IMPUTE and (f) for BEAGLE. (M=MACH, I=IMPUTE and B=BEAGLE).
Figure 3
Kappa vs yield for the three algorithms with ASW + CEU + YRI III for minor allele frequency (MAF) bins (a) ⩽ 0.05, (b) 0.05–0.1, (c) 0.1–0.3 and (d) 0.3–0.5. (M=MACH, I=IMPUTE and B=BEAGLE).
Figure 4
For panel ASW + CEU + YRI III, comparison of (a, b) mean concordance accuracy (CA) for minor allele homozygotes and heterozygotes and (c) mean kappa for each method at different minor allele frequency (MAF) bins. (M=MACH, I=IMPUTE and B=BEAGLE).
Figure 5
With MACH, (a–c) mean concordance accuracy (CA) for each genotype and (d) mean kappa using masked single-nucleotide polymorphisms (SNPs) exceeding a given r² cutoff for panel ASW + CEU + YRI III. Four minor allele frequency (MAF) bins are shown: ⩽ 0.05 (red), 0.05–0.1 (blue), 0.1–0.3 (green) and 0.3–0.5 (magenta).
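
The stratification used in Figures 3 to 5 can be sketched as follows: SNPs are grouped into the four MAF bins and, as in Figure 5, only SNPs whose imputation r² exceeds a cutoff are kept before averaging κ. This is a hedged illustration, not the authors' code. The function names, the tuple layout `(maf, r2, kappa)` and the example values are hypothetical, and treating each bin's upper bound as inclusive is an assumption, since the captions only state "⩽ 0.05" for the first bin.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean

# Upper edges of the four MAF bins named in the figure captions.
MAF_EDGES = [0.05, 0.1, 0.3, 0.5]
MAF_LABELS = ["<=0.05", "0.05-0.1", "0.1-0.3", "0.3-0.5"]


def maf_bin(maf):
    """Map a minor allele frequency to its bin label (upper bound inclusive)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    return MAF_LABELS[bisect_left(MAF_EDGES, maf)]


def mean_kappa_by_bin(snps, r2_cutoff):
    """Average kappa per MAF bin over SNPs whose r2 exceeds r2_cutoff.

    `snps` is an iterable of (maf, r2, kappa) tuples; bins left empty
    after filtering are omitted from the result.
    """
    grouped = defaultdict(list)
    for maf, r2, kappa in snps:
        if r2 > r2_cutoff:
            grouped[maf_bin(maf)].append(kappa)
    return {label: mean(grouped[label]) for label in MAF_LABELS if label in grouped}


# Hypothetical masked SNPs: (MAF, imputation r2, kappa).
example_snps = [
    (0.03, 0.40, 0.50),
    (0.04, 0.90, 0.80),
    (0.20, 0.95, 0.90),
    (0.25, 0.85, 0.96),
    (0.45, 0.20, 0.60),  # dropped by an r2 cutoff of 0.3
]
print(mean_kappa_by_bin(example_snps, r2_cutoff=0.3))
```

Raising `r2_cutoff` trades yield (the number of SNPs retained) against mean quality, which is the relationship the kappa-versus-yield panels in Figure 3 describe.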

