Hum Hered. 2012;73(1):18-25.
doi: 10.1159/000334084. Epub 2011 Dec 30.

Performance of genotype imputations using data from the 1000 Genomes Project

Yun Ju Sung et al. Hum Hered. 2012.

Abstract

Genotype imputations based on 1000 Genomes (1KG) Project data have the advantage of imputing many more SNPs than imputations based on HapMap data. They also provide an opportunity to discover associations with relatively rare variants. Recent investigations are increasingly using 1KG data for genotype imputations, but only limited evaluations of the performance of this approach are available. In this paper, we empirically evaluated imputation performance using 1KG data by comparing the results with those obtained using the widely used HapMap Phase II data. We used three reference panels: the CEU panels, each consisting of 120 haplotypes, from HapMap II and from 1KG data (June 2010 release), and the EUR panel consisting of 566 haplotypes, also from 1KG data (August 2010 release). We used 324,607 autosomal SNPs genotyped on an Illumina platform in 501 individuals of European ancestry. Our most important finding was that both 1KG reference panels provided much higher imputation yield than the HapMap II panel: more than twice as many SNPs were successfully imputed (6.7 million vs. 2.5 million). Our second most important finding was that accuracy using both 1KG panels was high and almost identical to accuracy using the HapMap II panel. Furthermore, after removing SNPs with MACH Rsq < 0.3, accuracy for both rare and low frequency SNPs was very high and almost identical to accuracy for common SNPs. We found that imputation using the 1KG-EUR panel had advantages in successfully imputing rare, low frequency and common variants. Our findings suggest that 1KG-based imputation can increase the opportunity to discover significant associations for SNPs across the allele frequency spectrum. Because the 1KG Project is still underway, we expect that later versions will provide even better imputation performance.
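As a rough illustration of the filtering rule described in the abstract, the sketch below counts the SNPs that survive a MACH Rsq cutoff of 0.3 in each frequency class. It is not the authors' pipeline: the function names, the input layout (pairs of minor allele frequency and MACH Rsq), and the class boundaries (MAF below 1% for rare, 1% to 5% for low frequency) are assumptions made for the example, since the abstract does not state the cutoffs.

```python
from collections import Counter

# Filtering rule taken from the abstract: SNPs with MACH Rsq < 0.3 are removed.
RSQ_THRESHOLD = 0.3


def maf_class(maf):
    """Assign a SNP to a frequency class.

    The 1% and 5% boundaries are assumptions for this sketch;
    the abstract does not give the cutoffs.
    """
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low frequency"
    return "common"


def imputation_yield(snps, threshold=RSQ_THRESHOLD):
    """Count SNPs per frequency class that pass the Rsq filter.

    snps: iterable of (maf, mach_rsq) pairs (hypothetical input layout).
    """
    counts = Counter()
    for maf, rsq in snps:
        if rsq >= threshold:  # keep only SNPs at or above the cutoff
            counts[maf_class(maf)] += 1
    return counts


# Made-up values, for illustration only.
example = [(0.005, 0.25), (0.008, 0.62), (0.03, 0.81), (0.20, 0.97), (0.35, 0.10)]
print(dict(imputation_yield(example)))
```

Raising the threshold trades yield for accuracy, which is why the two quantities are reported together.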

Figures

Fig. 1
Venn diagram showing 14,163,455 SNPs on chromosomes 1 through 22 across the three reference panels from HMII and 1KG Projects. For the overlap between 1KG-CEU and 1KG-EUR, the hg18 map positions of 1KG-CEU were converted into hg19 positions, using the liftOver program on the UCSC Genome Browser web site.
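The region counts behind a three-panel Venn diagram like this one can be sketched with plain set operations. The example assumes that every panel's SNPs have already been placed on one genome build and are keyed by (chromosome, position); it does not run liftOver, and the function name and toy data are hypothetical.

```python
def venn3_counts(a, b, c):
    """Return the size of each region of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
        "union": len(a | b | c),
    }


# Toy (chromosome, position) keys, not real SNP coordinates.
hmii = {(1, 100), (1, 200), (2, 50)}
kg_ceu = {(1, 100), (1, 300), (2, 50)}
kg_eur = {(1, 100), (1, 300), (3, 10), (2, 50)}

counts = venn3_counts(hmii, kg_ceu, kg_eur)
print(counts)
```

The seven region counts always sum to the size of the union, which is a convenient sanity check on the totals in the figure.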
Fig. 2
Imputation yield (left) and accuracy (right) across the MAF spectrum for the 5% masked data using the three reference panels. Colored bars show yield and accuracy using filtered SNPs. Gray bars show total number of imputed SNPs and accuracy using all imputed SNPs. Online supplementary figure 1 shows the same information for all masked data.
Fig. 3
Imputation accuracy (dosage Rsq) values versus MACH Rsq values for the 5% masked data. Red solid circles are rare SNPs and black solid squares are low frequency SNPs. The black line indicates where MACH Rsq equals dosage Rsq. The magenta line is the regression line. The vertical dashed line is the filtering rule that we used. Imputation accuracy (table 4, fig. 2) was computed as the average of the dosage Rsq values shown on the Y-axis. Colors refer to the online version only.
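The caption does not define dosage Rsq. A common reading, assumed here, is the squared Pearson correlation between the imputed allele dosages and the true genotypes at a masked SNP, with overall accuracy taken as the mean over SNPs. The sketch below follows that assumption; the function names and input layout are hypothetical.

```python
import math


def dosage_rsq(dosages, genotypes):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    Returns NaN when either vector has zero variance (correlation undefined).
    """
    n = len(dosages)
    mean_d = sum(dosages) / n
    mean_g = sum(genotypes) / n
    cov = sum((d - mean_d) * (g - mean_g) for d, g in zip(dosages, genotypes))
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    if var_d == 0 or var_g == 0:
        return float("nan")
    return cov * cov / (var_d * var_g)


def mean_accuracy(per_snp):
    """Average dosage Rsq over SNPs, skipping SNPs where it is undefined.

    per_snp: list of (dosages, genotypes) pairs, one pair per masked SNP.
    """
    values = [dosage_rsq(d, g) for d, g in per_snp]
    values = [v for v in values if not math.isnan(v)]
    return sum(values) / len(values)


# Made-up dosages and genotypes, for illustration only.
masked = [
    ([0.0, 1.0, 2.0], [0, 1, 2]),            # imputed perfectly
    ([0.0, 2.0, 0.0, 2.0], [0, 2, 2, 0]),    # uncorrelated with the truth
]
print(mean_accuracy(masked))
```

Comparing this empirical quantity with the MACH Rsq estimate, SNP by SNP, is what the scatter plot in the figure displays.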
